Gain a broad overview of many aspects of the kernel by understanding what's necessary to close a file descriptor
Peel back the layers of close(2)
Removing entries of the FDT
Scheduling work to be done later
Execution context design considerations
Several more concurrency techniques
Execution-context-sensitive code
What does close(2) do?
Invalidate the int fd index in the FDT
Close the struct file * if needed
close(2) from start to finish
Verify with strace that close(3) indeed calls close(2)
SYSCALL_DEFINE1(close)
Cannot restart the syscall since the struct file is gone
If the file fails to close, its data may be lost
close_fd()
Use the int fd argument to index into the FDT
Obtain the underlying struct file
What benefit is there to using spin_lock here?
file_close_fd_locked()
Index into FDT properly
Do bounds checking on input value
Use array_index_nospec() macro for security
Use RCU to safely NULL the FDT entry so that readers need no locks
Concurrent readers of the FDT will see a value that makes sense
array_index_mask_nospec() (generic) (arm64) (x86)
Create a bitmask based on the index
All 1s if within bounds, else 0
Bitwise AND forces the index to zero if out of bounds
Speculative indexing into the array always stays within bounds
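The masking idea above can be sketched in userspace C. This follows the shape of the kernel's generic array_index_mask_nospec(), but the toy_ names are hypothetical, and it assumes a compiler where right-shifting a negative signed long is an arithmetic shift (as GCC and Clang do). It shows only the arithmetic; it is not a hardened speculation barrier.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long toy_index_mask_nospec(unsigned long index,
                                           unsigned long size)
{
    /*
     * When index >= size, (size - 1 - index) wraps around and sets the
     * top bit. The complement then has the top bit clear, and the
     * arithmetic shift smears that 0 across the word. When index < size,
     * the top bit of the complement is 1 and the result is all ones.
     */
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (TOY_BITS_PER_LONG - 1));
}

/* Clamp the index: unchanged when in bounds, zero when out of bounds. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
    return index & toy_index_mask_nospec(index, size);
}

/* Bounds check first, then clamp before the actual array access. */
static int toy_lookup(const int *table, size_t size, unsigned long index)
{
    assert(table != NULL);
    if (index >= size)
        return -1;
    index = toy_index_nospec(index, size);
    return table[index];
}
```

The ordinary bounds check still does the architectural work; the mask only guarantees that a mispredicted branch cannot index past the array.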
__put_unused_fd()
Why don't we call put_unused_fd()?
We already hold the files->file_lock spinlock
__clear_open_fd()
Update bitmaps holding open file info
Full- and low-resolution maps are used
The low-resolution map records whether every fd in each BITS_PER_LONG-sized range is in use
__put_unused_fd()
Smallest available fd is stored for the next open(2)
This free may require updating the smallest fd
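The bookkeeping above can be sketched with a single-word bitmap and a "smallest available fd" hint. This is a toy model under stated assumptions: the toy_ names and the one-word table are hypothetical, there is no locking, and the kernel's real fdtable also keeps the low-resolution map that this sketch omits.

```c
#include <assert.h>
#include <limits.h>

#define TOY_MAX_FDS (sizeof(unsigned long) * CHAR_BIT)

struct toy_fdtable {
    unsigned long open_fds; /* bit n set means fd n is in use */
    unsigned int next_fd;   /* hint: no free fd exists below this value */
};

/* Hand out the lowest free fd at or above the hint, or -1 if full. */
static int toy_get_unused_fd(struct toy_fdtable *t)
{
    for (unsigned int fd = t->next_fd; fd < TOY_MAX_FDS; fd++) {
        if (!(t->open_fds & (1UL << fd))) {
            t->open_fds |= 1UL << fd;
            t->next_fd = fd + 1;
            return (int)fd;
        }
    }
    return -1;
}

/* Free an fd; lower the hint if this fd is now the smallest available. */
static void toy_put_unused_fd(struct toy_fdtable *t, unsigned int fd)
{
    assert(fd < TOY_MAX_FDS);
    t->open_fds &= ~(1UL << fd);
    if (fd < t->next_fd)
        t->next_fd = fd;
}
```

Keeping the hint up to date on every free is what lets the next allocation start its search at the right place instead of scanning from zero.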
file_close_fd_locked()
Return the struct file associated with the open fd
Return NULL if fd is not open
close_fd()
No file? Then -EBADF
Lastly, return whatever filp_close() returns
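The -EBADF path is easy to observe from userspace: closing the same descriptor twice hits an empty FDT slot the second time, and libc reports the kernel's -EBADF as a return of -1 with errno set to EBADF. A minimal sketch, assuming a single-threaded program so that nothing reuses the fd number in between; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Close a fresh descriptor twice; report the errno of the second close. */
static int toy_second_close_errno(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);

    /* First close: the FDT entry exists, so this succeeds. */
    if (close(fds[0]) != 0)
        return -1;

    /* Second close: the FDT entry is already gone. */
    errno = 0;
    if (close(fds[0]) == 0)
        return 0;
    return errno;
}
```

Running the same program under strace should show both close(2) calls, the second one failing with EBADF.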
filp_open()
filp_close()
Sanity check the reference count
It should never be 0
Use the CHECK_DATA_CORRUPTION() macro, which may call BUG() on kernels configured to do so
BUG()
ASM_BUG_FLAGS() generates assembly from preprocessor macros
Why use high numbers in assembly labels?
High label numbers avoid collisions with labels in surrounding inline assembly
filp_open()
filp_close()
If implemented, call the ->flush() file operation
Flush performs pre-closure cleanup
Example: writing buffered data to the storage medium
Can open(2) a file with O_PATH
Lightweight reference to a filesystem path entry
No I/O
Example usage: permission checks, change of ownership
filp_open()
filp_close()
For files with I/O context
Flush directory notifications using the dnotify system
Remove POSIX locks associated with this file
First Linux filesystem event notification system
Added in 2001 in Linux 2.4.0
Monitor CRUD changes in a directory
Usually notified via SIGIO
Only directory granularity
Signal handling can be tricky
Need open fd
Not much info about events
No longer used
Kept for legacy reasons
Replacement: inotify
fcntl(2)
Example POSIX locks program
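A small userspace sketch can show why filp_close() removes POSIX locks: closing any descriptor for a file drops every POSIX lock the process holds on it. This is not the example program referenced above; it is a hedged stand-in whose function names and temporary path are hypothetical. A child process probes with F_GETLK because a process never sees its own locks as conflicts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* From a child process, ask whether another process holds a lock on fd.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int toy_probe_locked(int fd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

        if (fcntl(fd, F_GETLK, &fl) != 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Lock a temp file, then close a second, unrelated fd for the same file.
 * Stores the probe result before and after that close. */
static int toy_posix_lock_demo(int *before, int *after)
{
    char path[] = "/tmp/toy_lock_XXXXXX";
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int fd = mkstemp(path);
    int fd2, ret = -1;

    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETLK, &fl) != 0)
        goto out;
    *before = toy_probe_locked(fd);

    /* Closing ANY fd for this file releases the process's POSIX locks. */
    fd2 = open(path, O_RDONLY);
    if (fd2 < 0)
        goto out;
    close(fd2);
    *after = toy_probe_locked(fd);
    ret = 0;
out:
    close(fd);
    unlink(path);
    return ret;
}
```

The lock taken through fd disappears even though fd itself stays open, which is the userspace-visible effect of the lock removal done on the close path.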
filp_open()
filp_close()
Call fput() to finish the job
No error code from fput()
Return value is nonzero only when ->flush() fails
fput()
Decrement the file's reference count (file->f_count)
No other action taken when result is nonzero
If count reaches zero, instigate the real work
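The decrement-and-test pattern can be sketched with C11 atomics. This is a userspace toy, not the kernel's code: the toy_ names are hypothetical, and the release callback runs immediately here, whereas the kernel defers the real work as the following slides describe.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_file {
    atomic_long f_count; /* number of references still held */
    int released;        /* set once the last reference is dropped */
};

static void toy_file_init(struct toy_file *f, long refs)
{
    atomic_init(&f->f_count, refs);
    f->released = 0;
}

/* Stand-in for "the real work" done when the last reference goes away. */
static void toy_release(struct toy_file *f)
{
    f->released = 1;
}

/* Drop one reference; only the caller that drops the last one cleans up. */
static void toy_fput(struct toy_file *f)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&f->f_count, 1) == 1)
        toy_release(f);
}
```

Because the decrement and the test are one atomic operation, exactly one caller observes the transition to zero, even when several drop references concurrently.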
fput()
Why rush? Schedule a future callback
First method: only for process context
Second method: for any context
in_interrupt()
A deprecated macro
Defined in terms of irq_count()
Bitwise OR three shifted values
NMI, softirq, and hardirq counts
Nonzero when any count is nonzero
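The mask arithmetic can be sketched as follows. The field widths are an assumption for illustration (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits, from low to high); the actual layout depends on the kernel version, and all TOY_ names are hypothetical.

```c
#include <assert.h>

/* Assumed field widths, low bits first. */
#define TOY_PREEMPT_BITS 8
#define TOY_SOFTIRQ_BITS 8
#define TOY_HARDIRQ_BITS 4
#define TOY_NMI_BITS     4

#define TOY_SOFTIRQ_SHIFT TOY_PREEMPT_BITS
#define TOY_HARDIRQ_SHIFT (TOY_SOFTIRQ_SHIFT + TOY_SOFTIRQ_BITS)
#define TOY_NMI_SHIFT     (TOY_HARDIRQ_SHIFT + TOY_HARDIRQ_BITS)

/* A run of `bits` ones, shifted up to start at bit `shift`. */
#define TOY_MASK(bits, shift) (((1UL << (bits)) - 1) << (shift))

#define TOY_SOFTIRQ_MASK TOY_MASK(TOY_SOFTIRQ_BITS, TOY_SOFTIRQ_SHIFT)
#define TOY_HARDIRQ_MASK TOY_MASK(TOY_HARDIRQ_BITS, TOY_HARDIRQ_SHIFT)
#define TOY_NMI_MASK     TOY_MASK(TOY_NMI_BITS, TOY_NMI_SHIFT)

/* Keep only the NMI, hardirq, and softirq fields of the counter. */
static unsigned long toy_irq_count(unsigned long preempt_count)
{
    return preempt_count &
           (TOY_NMI_MASK | TOY_HARDIRQ_MASK | TOY_SOFTIRQ_MASK);
}

/* Nonzero when any of the three counts is nonzero. */
static int toy_in_interrupt(unsigned long preempt_count)
{
    return toy_irq_count(preempt_count) != 0;
}
```

A plain preempt_disable() depth lives in the low bits only, so it does not make the toy_in_interrupt() check fire.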
preempt_count()
Architecture-specific data source
Value stored in current->thread_info
Can directly cast current since struct thread_info is the first member
READ_ONCE() prevents racy compiler reordering
Process context without userspace
Can sleep, be preempted
Can call most kernel functions
No userspace memory to access
likely() macro
Generates branch prediction hints
Not on all CPUs
unlikely() does the inverse
likely()?
Faster true case
Slower false case
Helpful only when very likely true
Otherwise considered harmful
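In userspace the same hints can be written with the GCC and Clang builtin __builtin_expect(), which is what the kernel macros wrap. A minimal sketch with hypothetical toy_ names; it assumes a compiler that provides the builtin.

```c
#include <assert.h>
#include <stddef.h>

/* The !! normalizes any truthy value to exactly 0 or 1. */
#define toy_likely(x)   __builtin_expect(!!(x), 1)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* Sum an array; the NULL check is marked as the rare path. */
static long toy_sum(const int *values, size_t n)
{
    long total = 0;

    if (toy_unlikely(values == NULL))
        return -1;

    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

The hint changes only code layout and prediction, never the result, so a wrong hint costs speed rather than correctness.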
Schedule callback to run on current's behalf
init_task_work() wraps callback struct member assignment
task_work_add() schedules the work
____fput()
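If this fails, just fall back to the other method
TWA_SIGNAL interrupts the target task
TWA_SIGNAL_NO_IPI is more chill: no IPI is sent to force the task to reschedule
TWA_RESUME is the most relaxed: the work runs when the task next returns to user mode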
Global delayed work queue
Create a list of files to pass to callback
Run them all in a jiffy (next timer tick)
fput()
Use schedule_delayed_work() to access the global queue
Uses a structure defined with DECLARE_DELAYED_WORK()
Do the work after the given delay, in timer ticks, passes
fput()
Avoid any extra scheduling
Conditionally call schedule_delayed_work()
Only on first list append
Resulting work will empty this queue
llist: lockless linked list implementation optimized for concurrent access
Two possible callers of __fput()
____fput() when using task work
delayed_fput() when using global delayed work
____fput()
Uses the container_of() macro
Use struct member offset subtraction
Pass the containing struct file to __fput()
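The offset subtraction can be sketched in portable C with offsetof(). The struct layout, the member name task_work, and the toy_ functions are all hypothetical; the kernel's container_of() additionally performs type checking that this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Step back from a member pointer to the start of the enclosing struct. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_callback_head {
    void (*func)(struct toy_callback_head *);
};

struct toy_file {
    int id;
    struct toy_callback_head task_work; /* embedded callback handle */
    int finished;
};

/* Stand-in for the final cleanup function. */
static void toy_final_fput(struct toy_file *f)
{
    f->finished = 1;
}

/* The callback only receives the embedded member, not the file itself. */
static void toy_task_work_cb(struct toy_callback_head *work)
{
    struct toy_file *f = toy_container_of(work, struct toy_file, task_work);

    toy_final_fput(f);
}
```

Embedding the handle inside the file means no extra allocation is needed to schedule the callback, and no pointer back to the file has to be stored.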
delayed_fput()
Detach list of files from caller handle
Use the special llist iterator
Pass each file to __fput()
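A lockless list with these two properties (an add that reports whether the list was empty, and a detach-everything operation) can be sketched with C11 atomics. This is a userspace approximation with hypothetical toy_ names, not the kernel's llist code, and the drain loop merely counts nodes where the kernel would pass each file to __fput().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_llist_node {
    struct toy_llist_node *next;
};

struct toy_llist_head {
    _Atomic(struct toy_llist_node *) first;
};

static void toy_llist_init(struct toy_llist_head *h)
{
    atomic_init(&h->first, (struct toy_llist_node *)NULL);
}

/* Push a node; returns true if the list was empty beforehand, which is
 * the only case where the caller needs to schedule the drain work. */
static bool toy_llist_add(struct toy_llist_node *n, struct toy_llist_head *h)
{
    struct toy_llist_node *first = atomic_load(&h->first);

    do {
        n->next = first; /* on CAS failure, first was refreshed */
    } while (!atomic_compare_exchange_weak(&h->first, &first, n));
    return first == NULL;
}

/* Detach the whole chain in one atomic step. */
static struct toy_llist_node *toy_llist_del_all(struct toy_llist_head *h)
{
    return atomic_exchange(&h->first, (struct toy_llist_node *)NULL);
}

/* Walk the detached chain; returns the number of nodes processed. */
static int toy_drain(struct toy_llist_head *h)
{
    struct toy_llist_node *node = toy_llist_del_all(h);
    int n = 0;

    while (node) {
        /* Read next first: processing may free or reuse the node. */
        struct toy_llist_node *next = node->next;

        n++;
        node = next;
    }
    return n;
}
```

Because adders never wait on a lock, the add is usable from any execution context, and the "was empty" return value is what lets the caller schedule the work only once per batch.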
__fput()
What do we need to do?
Clean up file-associated resources
Drop references held by file
Free allocated memory
__fput()
Is the file really open?
Check the FMODE_OPENED flag in file->f_mode
Set by do_dentry_open() in the open(2) path
Without this flag, skip to memory freeing
__fput()
A debugging helper: might_sleep()
__fput()
Spread the news of this closure
fsnotify provides fs event info to other kernel systems
e.g. inotify consumes this data
__fput()
Call eventpoll_release() to clean up all resources associated with event polling on this struct file
__fput()
Safe to release the file's locks: locks_remove_file()
__fput()
Integrity Measurement Architecture (IMA)
Prevents tampering with file contents
Allocates resources for each file
Cleanup with ima_file_free()
__fput()
Handle pending asynchronous operations
Only if the file has the FASYNC flag set
Call the fasync() handler defined by the underlying file implementation
__fput()
Call any extant release() file operation
__fput()
Release reference to a character device and file operations
Only if file is backed by one
The fops reference is implemented as a reference to any underlying module
__fput()
Drop the reference to the pid of the file owner
Held as a struct pid in struct fown_struct
Which is a member of struct file
__fput()
Use put_file_access() to perform access-mode-specific cleanup for the file
__fput()
Drop a reference to the dentry for this file with dput()
__fput()
Some file modes will require an unmount at this point
Handled by dissolve_on_fput()
May be covered later in material on namespaces
__fput()
mntput() releases the struct file's struct vfsmount member
__fput()
Finish the job with file_free()
file_free()
Notify Linux Security Modules (LSM) framework users to clean up with security_file_free()
file_free()
Decrement open file counter
Directly decrement local percpu counter
Global total periodically calculated
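One way such a counter can work is sketched below: each CPU accumulates a private delta and folds it into the shared total only once the delta grows past a batch size. This is a single-threaded toy loosely modeled on that batching idea, not the kernel's percpu counter; the names, the CPU count, and the batch size are all assumptions.

```c
#include <assert.h>

#define TOY_NR_CPUS 4
#define TOY_BATCH   8

struct toy_percpu_counter {
    long count;              /* shared total, possibly stale */
    long delta[TOY_NR_CPUS]; /* per-CPU changes not yet folded in */
};

/* Adjust the local delta; touch the shared total only once per batch. */
static void toy_counter_add(struct toy_percpu_counter *c, int cpu, long amount)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    c->delta[cpu] += amount;
    if (c->delta[cpu] >= TOY_BATCH || c->delta[cpu] <= -TOY_BATCH) {
        c->count += c->delta[cpu];
        c->delta[cpu] = 0;
    }
}

/* Exact value: the shared total plus every outstanding local delta. */
static long toy_counter_sum(const struct toy_percpu_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += c->delta[cpu];
    return sum;
}
```

The common path touches only CPU-local data, so closing a file does not contend on a shared cache line; the price is that the shared total is approximate until the deltas are summed.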
file_free()
Drop the reference to the file's struct cred
file_free()
If the file is a backing store for a device or file, drop reference to associated struct path
Example: a loopback device
Last step before freeing memory
file_free()
Free the last structure's memory
Backing files free their backing file structure
Otherwise, return the struct file to its kmem_cache
Back to whence it came
We return to userspace, concluding the close(2) implementation
The close(2) system call contains plenty of complexity and many layers
Many different types of in-kernel resources may be associated with a file
The kernel employs creative lock-avoidant techniques to implement correct concurrency
Correct reference counting is essential
The codepath can invoke several file operations, including release(), flush(), and fasync()