Gain a deeper understanding of file descriptors by seeing how read uses them
read(2) entry
read(2)
Advanced reference count optimization
Reading through the virtual filesystem
SYSCALL_DEFINE3(read, ...)
Just calls ksys_read()
ksys_read()
Only one other caller in s390 compat code
Originally there were more callers
Obtain a reference to the file position or bail
Create a local copy of the file position
Perform virtual filesystem (vfs) read
If needed, update the file position
Drop any held references
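The steps above can be sketched in userspace as follows. This is a toy model, not kernel code: `toy_file`, `toy_vfs_read()` and `toy_ksys_read()` are hypothetical names, and the reference handling is omitted so that only the position logic remains (local copy, read, conditional write-back).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct file: only the position matters here. */
struct toy_file {
    off_t pos;      /* models the file position */
    int is_stream;  /* models a file with no meaningful position */
};

/* Hypothetical stand-in for vfs_read(): pretends count bytes were read
 * and advances the position it was handed, if any. */
static ssize_t toy_vfs_read(struct toy_file *f, size_t count, off_t *ppos)
{
    assert(f != NULL);
    if (ppos)
        *ppos += (off_t)count;
    return (ssize_t)count;
}

/* Same shape as the outline: bail, copy the position, read, write back. */
static ssize_t toy_ksys_read(struct toy_file *f, size_t count)
{
    off_t pos = 0, *ppos;
    ssize_t ret;

    if (!f)
        return -EBADF;              /* no file behind the descriptor */

    ppos = f->is_stream ? NULL : &f->pos;
    if (ppos) {
        pos = *ppos;                /* local copy of the file position */
        ppos = &pos;
    }
    ret = toy_vfs_read(f, count, ppos);
    if (ret >= 0 && ppos)
        f->pos = pos;               /* publish the new position on success */
    return ret;
}
```

Working on a local copy means a failed read leaves the shared position untouched.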
fdget_pos()
CLASS(...
DEFINE_CLASS(fd_pos,...)
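As far as I understand the kernel's cleanup helpers, CLASS() and DEFINE_CLASS() build on the compiler's `cleanup` variable attribute, so the destructor runs automatically when the variable leaves scope. The sketch below shows only that underlying mechanism in userspace; it assumes GCC or Clang, and `toy_put()` and `toy_use()` are hypothetical names standing in for the role fdput_pos() plays.

```c
#include <assert.h>
#include <stddef.h>

static int toy_released;   /* counts how many times the destructor ran */

/* Hypothetical destructor: receives a pointer to the scoped variable. */
static void toy_put(int *handle)
{
    assert(handle != NULL);
    toy_released++;
}

/* The cleanup attribute (a GCC/Clang extension) calls toy_put(&handle)
 * on every path that leaves the scope, including early returns. */
static int toy_use(int bail_early)
{
    __attribute__((cleanup(toy_put))) int handle = 42;

    if (bail_early)
        return -1;
    return handle;
}
```

Calling `toy_use()` once bumps `toy_released` once, whichever return path is taken, which is why the caller never writes an explicit release call.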
What is this struct fd and why might we want something more than just the struct file?
struct fd
struct file
We don't always need to have our own reference to the struct file
We need to keep track of which references need to be dropped
fd_file()
Get an unsigned long
unsigned long
Split it into a struct fd
#define fd_file(f) ((struct file *)((f).word & ~(FDPUT_FPUT|FDPUT_POS_UNLOCK)))
First, do we need our own reference to the file?
Then, do we need the file position lock?
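A minimal userspace sketch of the tagged-word idea behind fd_file(): the pointer and two flag bits share one integer, and masking the flags recovers the pointer. The `toy_` names are hypothetical, `uintptr_t` stands in for the kernel's unsigned long, and the flag values 1 and 2 reflect my reading of the kernel header, so treat them as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define FDPUT_FPUT        ((uintptr_t)1)  /* we hold our own reference */
#define FDPUT_POS_UNLOCK  ((uintptr_t)2)  /* we hold the position lock */

struct toy_file { int dummy; };
struct toy_fd { uintptr_t word; };

/* The trick only works if the two low pointer bits are always zero. */
_Static_assert(_Alignof(struct toy_file) >= 4, "need two spare low bits");

/* Strip the flag bits to recover the pointer. */
static struct toy_file *toy_fd_file(struct toy_fd f)
{
    return (struct toy_file *)(f.word & ~(FDPUT_FPUT | FDPUT_POS_UNLOCK));
}

static int toy_fd_empty(struct toy_fd f)
{
    return f.word == 0;
}

/* Borrowed: plain pointer, nothing to drop later. */
static struct toy_fd toy_borrowed(struct toy_file *file)
{
    struct toy_fd f = { (uintptr_t)file };
    return f;
}

/* Cloned: pointer plus the "drop a reference later" bit. */
static struct toy_fd toy_cloned(struct toy_file *file)
{
    struct toy_fd f = { (uintptr_t)file | FDPUT_FPUT };
    return f;
}
```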
fdget()
Get a reference to the file behind a descriptor unless it was opened in path mode (O_PATH)
__fget_light()
If the refcount of the descriptor table (files_struct) is 1, nobody else can close the descriptor under us, so we can borrow the file
Otherwise, we need our own reference
Use atomic_read_acquire() to get the current reference count
atomic_read_acquire()
Call files_lookup_fd_raw() directly
files_lookup_fd_raw()
The unsigned long return value will be cast
__fget_files()
__fget_files_rcu()
In the case we cannot borrow, mark the lower bits of the pointer
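The borrow-or-clone decision can be sketched in userspace like this. It is a simplified model under stated assumptions: `toy_files`, `toy_file` and `toy_fget_light()` are hypothetical, the table has a fixed four slots, and the slow path just bumps a counter instead of doing the RCU lookup the kernel performs, so it is not safe against concurrent close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FDPUT_FPUT ((uintptr_t)1)   /* low bit: caller must drop a reference */
#define TOY_NFDS   4u

struct toy_file  { atomic_long refs; };
struct toy_files { atomic_int count; struct toy_file *fdt[TOY_NFDS]; };

/* Returns 0 for "no file", the bare pointer when borrowing, or the
 * pointer with FDPUT_FPUT set when a new reference was taken. */
static uintptr_t toy_fget_light(struct toy_files *files, unsigned int fd)
{
    struct toy_file *file;

    assert(files != NULL);
    if (fd >= TOY_NFDS || !(file = files->fdt[fd]))
        return 0;

    /* Table not shared: borrow the table's reference. */
    if (atomic_load_explicit(&files->count, memory_order_acquire) == 1)
        return (uintptr_t)file;

    /* Table shared: take our own reference and remember to drop it. */
    atomic_fetch_add(&file->refs, 1);
    return (uintptr_t)file | FDPUT_FPUT;
}
```

The fast path avoids two atomic read-modify-write operations per system call (one to take the reference, one to drop it), which is where the saving comes from.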
Question: what is fd_file() doing?
file_needs_f_pos_lock()
When do we need the file position lock?
Any regular file or directory has FMODE_ATOMIC_POS set
FMODE_ATOMIC_POS
in do_dentry_open()
do_dentry_open()
POSIX.1-2017 2.9.7
In addition, we check the file_count and for a shared iterator
To finish up, lock and set another bit if needed
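The decision described above boils down to a small predicate. The sketch below is a userspace model: `toy_file`, `TOY_ATOMIC_POS` and `toy_needs_f_pos_lock()` are hypothetical, and the bit value is a placeholder rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ATOMIC_POS 0x1u   /* placeholder for the atomic-position mode bit */

struct toy_file {
    unsigned int mode;        /* models f_mode */
    long count;               /* models file_count() */
    bool has_iterate_shared;  /* models a non-NULL shared directory iterator */
};

/* Lock only when position atomicity is required AND someone else could
 * be using the same open file at the same time. */
static bool toy_needs_f_pos_lock(const struct toy_file *f)
{
    assert(f != NULL);
    return (f->mode & TOY_ATOMIC_POS) &&
           (f->count > 1 || f->has_iterate_shared);
}
```

A regular file with a single reference therefore skips the mutex entirely, while a pipe or socket never takes it because the mode bit is not set.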
First check whether the descriptor lookup found an open file, using fd_empty()
fd_empty()
Recent patchset by maintainer
Introduced fd_empty() and fd_file()
Previously, callers checked f.file directly
f.file
file_ppos()
Returns NULL for stream-like files (FMODE_STREAM); otherwise, this just gets the address of the file position
vfs_read()
Overview:
Validate the operation and its inputs
Execute the specific read handler
Notify of completion
First three checks
Make sure the file is open for reading
Make sure that the file can be read
Make sure the output buffer is a sane address
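The three checks can be modeled in userspace as below. The error codes follow the order I understand vfs_read() to use (-EBADF, -EINVAL, -EFAULT); the `TOY_` mode bits, `toy_access_ok()` and `toy_read_prechecks()` are hypothetical, and the address check here is only a NULL and wraparound test, far weaker than the kernel's access_ok().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FMODE_READ      0x1u  /* opened for reading */
#define TOY_FMODE_CAN_READ  0x2u  /* the file has some read handler */

struct toy_file { unsigned int f_mode; };

/* Hypothetical sanity check: non-NULL and the range does not wrap. */
static int toy_access_ok(const void *buf, size_t count)
{
    uintptr_t start = (uintptr_t)buf;

    return buf != NULL && start + count >= start;
}

static int toy_read_prechecks(const struct toy_file *file,
                              const void *buf, size_t count)
{
    assert(file != NULL);
    if (!(file->f_mode & TOY_FMODE_READ))
        return -EBADF;    /* not open for reading */
    if (!(file->f_mode & TOY_FMODE_CAN_READ))
        return -EINVAL;   /* nothing to call for a read */
    if (!toy_access_ok(buf, count))
        return -EFAULT;   /* destination buffer looks bogus */
    return 0;
}
```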
rw_verify_area()
Sanity check the file position
Verify read access
security_file_permission()
Clamp count if it is too big (this truncates the request rather than failing it)
count > MAX_RW_COUNT
MAX_RW_COUNT
Ensures maximum value is rounded down to page boundary
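A small worked example of the clamp, assuming a 4 KiB page size (the real value is architecture dependent) and the definition of MAX_RW_COUNT as INT_MAX masked down to a page boundary. The `TOY_` names and `toy_clamp_count()` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_PAGE_SIZE     4096UL                      /* assumption: 4 KiB pages */
#define TOY_PAGE_MASK     (~(TOY_PAGE_SIZE - 1))
#define TOY_MAX_RW_COUNT  ((size_t)(INT_MAX & TOY_PAGE_MASK))

/* Oversized requests are shortened, not rejected: the caller simply
 * sees a short read and can call again for the rest. */
static size_t toy_clamp_count(size_t count)
{
    assert(TOY_MAX_RW_COUNT % TOY_PAGE_SIZE == 0);
    return count > TOY_MAX_RW_COUNT ? TOY_MAX_RW_COUNT : count;
}
```

With 4 KiB pages the limit works out to 0x7ffff000, just under 2 GiB, so the byte count always fits in a signed int return value.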
Call the actual read!
Call the read() member of file operations
read()
Otherwise, call read_iter()
read_iter()
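The handler selection is a plain function-pointer dispatch. The sketch below models it in userspace with hypothetical `toy_` types; in particular, the real read_iter() path wraps the buffer in an iterator and a control block first, which is left out here, and the two sample handlers exist only so the dispatch can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_file;

/* Simplified stand-in for struct file_operations. */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *, char *, size_t);
    ssize_t (*read_iter)(struct toy_file *, char *, size_t);
};

struct toy_file { const struct toy_file_operations *f_op; };

/* Sample handlers: return distinct values so callers can tell them apart. */
static ssize_t toy_plain_read(struct toy_file *f, char *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
    return 1;
}

static ssize_t toy_iter_read(struct toy_file *f, char *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
    return 2;
}

/* Prefer read(); fall back to read_iter(); otherwise fail. */
static ssize_t toy_dispatch_read(struct toy_file *f, char *buf, size_t n)
{
    assert(f != NULL && f->f_op != NULL);
    if (f->f_op->read)
        return f->f_op->read(f, buf, n);
    if (f->f_op->read_iter)
        return f->f_op->read_iter(f, buf, n);
    return -EINVAL;
}
```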
If we are successful:
Tell fsnotify to let others know of this access
Account for task's bytes read
Unconditionally: increment the task's count of read system calls
See struct task_io_accounting
struct task_io_accounting
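A sketch of the accounting step, modeling only two counters: `rchar` (bytes read) and `syscr` (number of read system calls). Those field names match struct task_io_accounting as I know it; the `toy_` wrapper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down model: the kernel structure has more fields than this. */
struct toy_io_accounting {
    unsigned long long rchar;  /* bytes this task has read */
    unsigned long long syscr;  /* read system calls this task has made */
};

/* ret is the value the read handler returned: bytes read or -errno. */
static void toy_account_read(struct toy_io_accounting *ioac, long ret)
{
    assert(ioac != NULL);
    if (ret > 0)
        ioac->rchar += (unsigned long long)ret;  /* successful reads only */
    ioac->syscr++;                               /* every call counts */
}
```

A failed or zero-byte read therefore still shows up in the call count but adds nothing to the byte count.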
Last steps to wrap up
Update the file position if relevant
Drop any references we may have
Return the number of bytes read or an error
fdput_pos()
Question: When does control flow call this function?
If we locked the file position: __f_unlock_pos()
__f_unlock_pos()
If we took our own reference to the file: fdput() calls fput()
fdput()
fput()
read() doesn't need to do as much as open() or write()
open()
write()
Small optimizations on file descriptor operations add up to significant performance improvements
Watch out for data storage in unexpected places like the lower bits of a pointer!