
Linux Kernel io_uring/zcrx: Race Condition to Double-Free


A race condition in the Linux kernel's io_uring zerocopy receive (zcrx) subsystem allows concurrent scrub and refill paths to corrupt the freelist, leading to a double-free condition and out-of-bounds write. This finding was accepted upstream and backported to the stable kernel tree.

Finding Summary

Component: io_uring/zcrx (zerocopy receive)
Bug Class: Race Condition → Double-Free → Out-of-Bounds Write
Impact: Kernel memory corruption, potential privilege escalation
Upstream Commit: 003049b1c4fb
Stable Backport: a94f096e28bf (v6.18.16)
Status: Fixed, backported to stable (Cc: stable@vger.kernel.org)
Author: Kai Aizen

Technical Analysis

The io_uring/zcrx subsystem implements zerocopy receive for high-performance networking. It manages a pool of network I/O vectors (niovs), keeps the free ones on a freelist, and reference-counts each niov. A niov's user_refs field tracks the references that userspace still holds to it.

The vulnerability exists in the interaction between two concurrent code paths:

  • Scrub path — reclaims buffers when io_uring operations complete
  • Refill path — returns buffers to the pool when userspace is done with them

Both paths perform a check-then-decrement sequence on user_refs that is not atomic. When both paths race on the same niov, the following corruption chain occurs:

The Race

The vulnerable code performs a non-atomic read-check-decrement on user_refs:

  1. Thread A (scrub) reads user_refs == 1
  2. Thread B (refill) reads user_refs == 1
  3. Thread A decrements to 0, pushes the niov to the freelist, and increments free_count
  4. Thread B, acting on its stale read of 1, decrements to -1 (or wraps, if the counter is unsigned), pushes the same niov to the freelist again, and increments free_count a second time

The Corruption Chain

After the race, the system state is corrupted in two ways:

  • Double-free: The same niov appears twice in the freelist. Subsequent allocations will hand out the same buffer to two different operations, causing use-after-free conditions.
  • free_count exceeds nr_iovs: Because both paths increment the free counter, free_count becomes greater than the total number of I/O vectors (nr_iovs). Subsequent freelist push operations then write past the end of the freelist array — an out-of-bounds write in kernel memory.

The Fix

The fix replaces the non-atomic check-then-decrement with an atomic_try_cmpxchg loop, ensuring that only one path can successfully claim the final reference:

```c
/* Before (vulnerable): non-atomic check-then-decrement */
if (niov->user_refs) {
    niov->user_refs--;
    if (!niov->user_refs)
        io_zcrx_return_niov(niov);  /* push to freelist */
}

/* After (fixed): atomic compare-and-exchange loop */
int refs = atomic_read(&niov->user_refs);  /* atomic_t holds an int */
do {
    if (!refs)
        return;                     /* no reference left to drop */
} while (!atomic_try_cmpxchg(&niov->user_refs, &refs, refs - 1));

/* refs now holds the value this thread replaced */
if (refs == 1)
    io_zcrx_return_niov(niov);      /* only winner pushes */
```

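The atomic_try_cmpxchg loop guarantees that exactly one thread wins the final decrement from 1 → 0, which prevents the double-free. When a thread's cmpxchg fails, refs is updated to the value currently stored and the loop retries. Once that value is 0, the thread returns without touching the freelist.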

Impact Assessment

The io_uring subsystem is one of the most security-sensitive components in the Linux kernel. It runs in kernel context with full privileges and has been the source of numerous privilege escalation vulnerabilities in recent years.

  • Memory corruption: The out-of-bounds write can corrupt adjacent kernel heap objects, potentially leading to arbitrary code execution in kernel context.
  • Denial of service: At minimum, the double-free corrupts the freelist, which can crash the kernel on subsequent allocations.
  • Privilege escalation: Controlled heap corruption via io_uring has been a well-documented path to root — this class of bug is the foundation of multiple prior io_uring CVEs.

The fix was accepted upstream and tagged Cc: stable@vger.kernel.org for backport to all maintained stable kernel branches. This indicates that the kernel maintainers judged it serious enough to carry into stable releases.

Context: io_uring as an Attack Surface

The io_uring interface has produced a disproportionate number of kernel vulnerabilities since its introduction in Linux 5.1. Google's security team reported that 60% of the Linux kernel exploits submitted to its kCTF VRP in 2022 targeted io_uring. Several sandboxes and platforms, including gVisor and ChromeOS, disable io_uring entirely because of its attack surface.

The zcrx (zerocopy receive) extension is relatively new. It adds complexity through shared-memory buffer management between kernel and userspace, which is exactly the pattern where atomicity bugs hide. This finding demonstrates that even modern additions to io_uring carry the same class of concurrency vulnerabilities that has plagued the subsystem since its inception.

Significance

This is the first Linux kernel contribution in the SnailSploit portfolio, expanding from web application and AI security research into operating system internals. The finding was identified through manual code review of the zcrx buffer lifecycle, not through fuzzing. The race window is narrow enough that standard kernel fuzzers such as syzkaller are unlikely to trigger it reliably.

Kai Aizen

Creator of AATMF • Author of Adversarial Minds • NVD Contributor

Known as "The Jailbreak Chef," specializing in LLM jailbreaking and adversarial AI. Creator of the AATMF and P.R.O.M.P.T frameworks for systematic AI security analysis.