A race condition in the Linux kernel's io_uring zerocopy receive (zcrx) subsystem allows concurrent scrub and refill paths to corrupt the freelist, leading to a double-free condition and out-of-bounds write.
This finding was accepted upstream and backported to the stable kernel tree.
Finding Summary
| Component | io_uring/zcrx (zerocopy receive) |
| Bug Class | Race Condition → Double-Free → Out-of-Bounds Write |
| Impact | Kernel memory corruption, potential privilege escalation |
| Upstream Commit | 003049b1c4fb |
| Stable Backport | a94f096e28bf (v6.18.16) |
| Status | Fixed, backported to stable (Cc: [email protected]) |
| Author | Kai Aizen ([email protected]) |
Technical Analysis
The io_uring/zcrx subsystem implements zerocopy receive for high-performance networking.
It manages a pool of network I/O vectors (niov) using a reference-counted freelist.
Each niov has a user_refs field that tracks whether userspace still holds a reference.
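To make the moving parts concrete, here is a minimal userspace model of such a pool. It is a sketch under stated assumptions, not the kernel's definitions: `niov_model`, `area_model`, `area_init`, `area_alloc`, and `NR_IOVS` are hypothetical names, and the layout is simplified to a per-buffer counter plus a stack-style freelist with a `free_count`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IOVS 4u  /* hypothetical pool size */

/* Simplified stand-in for a niov: only the user reference count. */
struct niov_model {
    atomic_int user_refs;        /* non-zero while userspace holds the buffer */
};

/* Simplified stand-in for the buffer area: buffers plus a freelist stack. */
struct area_model {
    struct niov_model niovs[NR_IOVS];
    uint32_t freelist[NR_IOVS];  /* indices of free buffers */
    uint32_t free_count;         /* number of valid freelist entries */
};

/* Start with every buffer free and unreferenced. */
static void area_init(struct area_model *a)
{
    for (uint32_t i = 0; i < NR_IOVS; i++) {
        atomic_init(&a->niovs[i].user_refs, 0);
        a->freelist[i] = i;
    }
    a->free_count = NR_IOVS;
}

/* Pop a free buffer and hand one reference to "userspace". */
static bool area_alloc(struct area_model *a, uint32_t *idx)
{
    /* Invariant the race described later breaks: never more free than total. */
    assert(a->free_count <= NR_IOVS);

    if (a->free_count == 0)
        return false;            /* pool exhausted */
    *idx = a->freelist[--a->free_count];
    atomic_store(&a->niovs[*idx].user_refs, 1);
    return true;
}
```

The invariant to keep in mind is that `free_count` never exceeds `NR_IOVS` and that each index appears in the freelist at most once.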
The vulnerability exists in the interaction between two concurrent code paths:
- Scrub path — reclaims buffers when io_uring operations complete
- Refill path — returns buffers to the pool when userspace is done with them
Both paths perform a check-then-decrement sequence on user_refs that is not atomic.
When both paths race on the same niov, the following corruption chain occurs:
The Race
The vulnerable code performs a non-atomic read-check-decrement on user_refs:
- Thread A (scrub) reads user_refs == 1
- Thread B (refill) reads user_refs == 1
- Thread A decrements to 0, pushes niov to freelist, increments free_count
- Thread B decrements to -1 (or wraps), pushes the same niov to freelist again, increments free_count
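Because a real race is timing-dependent, the interleaving can be replayed deterministically in a single thread by splitting the check and the decrement into separate steps. The sketch below follows the four steps listed above. All names (`pool_model`, `refs_look_held`, `drop_ref_and_push`, `replay_race`, `NR_IOVS`) are hypothetical, and the freelist carries one spare slot so that the model itself never writes out of bounds.

```c
#include <stdint.h>

#define NR_IOVS 2u  /* hypothetical pool size */

struct pool_model {
    int user_refs;                   /* plain int: models the unprotected counter of niov 0 */
    uint32_t freelist[NR_IOVS + 1];  /* spare slot keeps this model in bounds */
    uint32_t free_count;
};

/* The "check" half: does the counter look non-zero right now? */
static int refs_look_held(const struct pool_model *p)
{
    return p->user_refs != 0;
}

/* The "decrement" half, followed by the freelist push. */
static void drop_ref_and_push(struct pool_model *p, uint32_t idx)
{
    p->user_refs--;
    p->freelist[p->free_count++] = idx;
}

/* Replay the four steps in order; returns how many times niov 0 was pushed. */
static unsigned replay_race(struct pool_model *p)
{
    unsigned pushes = 0;

    /* niov 0 is held by userspace, niov 1 is already on the freelist */
    p->user_refs = 1;
    p->freelist[0] = 1;
    p->free_count = 1;

    int a_ok = refs_look_held(p);    /* Thread A (scrub) reads 1 */
    int b_ok = refs_look_held(p);    /* Thread B (refill) reads 1 */

    if (a_ok) {                      /* Thread A: 1 -> 0, push */
        drop_ref_and_push(p, 0);
        pushes++;
    }
    if (b_ok) {                      /* Thread B: 0 -> -1, push again */
        drop_ref_and_push(p, 0);
        pushes++;
    }
    return pushes;
}
```

After `replay_race()` returns, the same index sits in the freelist twice and `free_count` is 3 in a pool of 2, which is the state the next section describes.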
The Corruption Chain
After the race, the system state is corrupted in two ways:
- Double-free: The same niov appears twice in the freelist. Subsequent allocations will hand out the same buffer to two different operations, causing use-after-free conditions.
- free_count exceeds nr_iovs: Because both paths increment the free counter, free_count becomes greater than the total number of I/O vectors (nr_iovs). Subsequent freelist push operations then write past the end of the freelist array, an out-of-bounds write in kernel memory.
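Both consequences can be stated as simple predicates over the freelist state. The following is an illustrative sketch, assuming a stack-style freelist where a push writes to `freelist[free_count]`; the helper names `next_push_is_oob` and `pops_alias` and the constant `NR_IOVS` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_IOVS 2u  /* hypothetical: capacity of the freelist array */

/* A push writes to freelist[free_count]; at or beyond capacity that is out of bounds. */
static bool next_push_is_oob(uint32_t free_count)
{
    return free_count >= NR_IOVS;
}

/* True when the next two pops would return the same buffer index. */
static bool pops_alias(const uint32_t *freelist, uint32_t free_count)
{
    if (free_count < 2)
        return false;
    return freelist[free_count - 1] == freelist[free_count - 2];
}
```

For example, in a pool of two buffers that are both handed out, a double push of buffer 0 leaves the freelist as `{0, 0}` with `free_count == 2`. Two allocations then receive the same buffer, and the later, legitimate return of buffer 1 writes one slot past the end of the array.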
The Fix
The fix replaces the non-atomic check-then-decrement with an atomic_try_cmpxchg loop,
ensuring that only one path can successfully claim the final reference:
/* Before (vulnerable): non-atomic check-then-decrement */
if (niov->user_refs) {
	niov->user_refs--;
	if (!niov->user_refs)
		io_zcrx_return_niov(niov); /* push to freelist */
}

/* After (fixed): atomic compare-and-exchange loop */
int refs = atomic_read(&niov->user_refs); /* atomic_try_cmpxchg() takes an int * */
do {
	if (!refs)
		return;
} while (!atomic_try_cmpxchg(&niov->user_refs, &refs, refs - 1));
if (refs == 1)
	io_zcrx_return_niov(niov); /* only winner pushes */
The atomic_try_cmpxchg loop guarantees that exactly one thread performs the final decrement from 1 → 0, preventing the double-free.
A thread whose compare-and-exchange fails retries with the updated value, sees 0, and returns without pushing.
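The same pattern can be tried outside the kernel with C11 atomics. This is a userspace analogue, not the upstream patch: `put_user_ref` is a hypothetical name, and `atomic_compare_exchange_weak` from `<stdatomic.h>` stands in for the kernel's atomic_try_cmpxchg.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true only for the caller that moves the counter from 1 to 0. */
static bool put_user_ref(atomic_int *refs)
{
    int old = atomic_load(refs);

    do {
        if (old == 0)
            return false;   /* nothing left to drop: another path already won */
        /* On failure, old is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(refs, &old, old - 1));

    /* On success, old still holds the value we replaced. */
    return old == 1;
}
```

A caller would push the buffer to the freelist only when `put_user_ref()` returns true. Starting from a count of 1, at most one of any number of concurrent callers can observe that result, because the exchange from 1 to 0 can succeed only once.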
Impact Assessment
The io_uring subsystem is one of the most security-sensitive components in the Linux kernel.
It runs in kernel context with full privileges and has been the source of numerous privilege escalation vulnerabilities in recent years.
- Memory corruption: The out-of-bounds write can corrupt adjacent kernel heap objects, potentially leading to arbitrary code execution in kernel context.
- Denial of service: At minimum, the double-free corrupts the freelist and causes kernel panics on subsequent allocations.
- Privilege escalation: Controlled heap corruption via io_uring has been a well-documented path to root — this class of bug is the foundation of multiple prior io_uring CVEs.
The fix was accepted upstream and marked Cc: [email protected] for backport to all maintained stable kernel branches,
confirming the kernel maintainers assessed this as a security-relevant bug.
Context: io_uring as an Attack Surface
The io_uring interface has produced a disproportionate number of kernel vulnerabilities since its introduction in Linux 5.1.
Google's security team reported that 60% of their kCTF VRP Linux kernel exploits in 2022 targeted io_uring.
Multiple container runtimes and security profiles (including gVisor and ChromeOS) disable io_uring entirely due to its attack surface.
The zcrx (zerocopy receive) extension is relatively new, adding complexity through shared-memory buffer management between kernel and userspace — exactly the pattern where atomicity bugs hide.
This finding demonstrates that even modern additions to io_uring carry the same class of concurrency vulnerabilities that have plagued the subsystem since inception.
Significance
This is the first Linux kernel contribution in the SnailSploit portfolio, expanding from web application and AI security research into operating system internals.
The finding was identified through manual code review of the zcrx buffer lifecycle, not through fuzzing — the race window is narrow enough that standard kernel fuzzers (syzkaller) are unlikely to trigger it reliably.