Render-Layer Redaction Oracle — Indirect Injection

Every indirect-prompt-injection threat model rests on one comfort: the attacker is blind. They drop a payload into a corpus, fire, and wait — and all they ever see back is the product's clean, rendered answer. Whether the injection became a privileged tool call or died in a classifier is, to them, unobservable. That assumption is the load-bearing beam of the entire defensive posture. It is also false for any party on the decrypted stream.

Kai Aizen
creator of AATMF · author of Adversarial Minds · NVD contributor
SHELL.002 · 2026.05

EchoLeak (CVE-2025-32711) is the reference point, and the part everyone quotes is the wrong part. The hard engineering in EchoLeak was not the injection — it was building a channel to confirm the exfil and route it past CSP. The injection is easy. Knowing it worked is the expensive part. This piece is about the case where that part is free: where the attacker is sitting on the SSE stream and watches the injected instruction become a tool call, in plaintext, before the interface decides to hide it.

This is the same thesis argued across kernels, language models, and agents, pointed at the transport this time: the channel reveals what the surface conceals. It is a property of the protocol rather than a trick against one product. Same attack. Different substrate.

We walk it in four steps. Each is a mechanism, stated as protocol fact where it is fact and as a testable claim where it is only that, with the line between the two drawn explicitly so the strong parts are not diluted by the checkable ones.

The model the threat model assumes

Indirect prompt injection is OWASP LLM01. The standard treatment models the attacker as a writer with no read-back: they place text where a model will ingest it — RAG corpus, email body, fetched page, tool output — and the model either follows it or does not. The defender's controls (spotlighting, channel separation, classifiers) are all built against an adversary who cannot tell, from outside, whether any individual payload landed. Detection asymmetry is assumed to favor the defender.

The assumption is not written down anywhere, the same way the three buried clauses of the confused-deputy assumption were never written down. It is free, until it isn't. Here it stops being free the moment the attacker's vantage point includes the response stream rather than only the rendered result. The diagram below is the assumption and its failure side by side; the rest of the piece is one walk from the left box to the right.

fig 1 · the blind-attacker assumption and its failure — detection asymmetry was a vantage-point artifact, not a property of the attack

Step 1 — The stream carries the tool call as syntax

This step is protocol fact, not inference, and it is the foundation everything else stands on, so it is stated plainly. Streaming chat APIs do not emit a finished answer — they emit a sequence of typed events over Server-Sent Events. Tool invocations are part of that sequence as structured syntax, not as an opaque blob.

On Anthropic's wire: a content_block_start event whose content_block has type: "tool_use" and carries the tool name, followed by content_block_delta events of type input_json_delta that stream the argument JSON fragment by fragment until content_block_stop. On OpenAI's wire: tool_calls entries whose function.name and function.arguments arrive as incremental deltas. Either way the tool name and its arguments cross the wire as readable text, in order, as they are generated.
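To make the Anthropic case concrete, here is a minimal sketch of how any consumer of the decrypted stream could reassemble a tool call from those events. The event shapes follow the Messages streaming format as described above; the function name reassemble_tool_calls, the tool name send_email, and the sample payloads are illustrative assumptions, not captured traffic.

```python
import json

def reassemble_tool_calls(events):
    """Rebuild (tool_name, arguments) pairs from already-parsed SSE event payloads."""
    open_blocks = {}  # block index -> {"name": str, "chunks": [str]}
    finished = []
    for ev in events:
        kind = ev.get("type")
        if kind == "content_block_start" and ev["content_block"].get("type") == "tool_use":
            # The tool name is readable the moment the block opens.
            open_blocks[ev["index"]] = {"name": ev["content_block"]["name"], "chunks": []}
        elif kind == "content_block_delta" and ev["delta"].get("type") == "input_json_delta":
            block = open_blocks.get(ev["index"])
            if block is not None:
                block["chunks"].append(ev["delta"]["partial_json"])
        elif kind == "content_block_stop" and ev["index"] in open_blocks:
            block = open_blocks.pop(ev["index"])
            raw = "".join(block["chunks"])
            finished.append((block["name"], json.loads(raw) if raw else {}))
    return finished

# Hypothetical events, shaped like the documented sequence for one tool_use block.
SAMPLE_EVENTS = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example",
                       "name": "send_email", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"to": "a@exam'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'ple.com"}'}},
    {"type": "content_block_stop", "index": 1},
]

calls = reassemble_tool_calls(SAMPLE_EVENTS)
```

Nothing in the sketch is privileged: it is string concatenation and one JSON parse, which is the point of the step.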

Nothing here is a vulnerability. It is how streaming tool use is specified to work, and it has to work this way for the client to render progress. The point of the step is what it makes true for everything downstream: the agent's decision to call a tool, and the arguments it chose, exist on the wire as plaintext before any consumer of that stream has done anything with them.
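The OpenAI-style deltas can be accumulated the same way. The sketch below assumes the chat-completions streaming chunk shape (choices[].delta.tool_calls[] keyed by index, with function.name and function.arguments arriving as string fragments); accumulate_tool_calls and the sample chunks are hypothetical.

```python
import json

def accumulate_tool_calls(chunks):
    """Concatenate streamed function.name / function.arguments fragments per tool-call index."""
    calls = {}  # tool-call index -> {"name": str, "arguments": str}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
                fn = tc.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return [(c["name"], json.loads(c["arguments"] or "{}"))
            for _, c in sorted(calls.items())]

# Hypothetical chunks: the name arrives first, the arguments in fragments.
SAMPLE_CHUNKS = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_example", "type": "function",
         "function": {"name": "send_email", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"to": "a@exam'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": 'ple.com"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

openai_calls = accumulate_tool_calls(SAMPLE_CHUNKS)
```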

fig 2 · tool use is streamed syntax — the foundation claim, true by protocol design on every major streaming API

Step 2 — Redaction is a render-layer decision

This step is the testable claim, and it is labelled as such because the strength of the piece depends on not overstating it. When a product hides something — a tool result, a reasoning block, a secret-bearing argument, an intermediate the final answer doesn't show — where does that hiding happen?

If the product decides what to suppress and only then streams the survivor, redaction is at the trust boundary and this piece has nothing on it. If the product streams the content and the client or a post-stream pass removes it from what the user sees, then the redaction is render-layer: it is CSS over bytes that already crossed the wire. The empirically common pattern is the second, for a structural reason: deciding-then-streaming costs latency, and streaming-then-hiding does not. Products optimize the metric the user feels. That trade, latency bought with a render-layer redaction, is the defect. In OWASP's terms it is insecure output handling (LLM02 in the 2023 list, renamed improper output handling and renumbered LLM05 in the 2025 edition), with the handling applied one layer too late to be a control.

State the honest boundary: this is per-product checkable, not universal. The reason the piece does not depend on it — see step 4 — is that the oracle survives even where redaction is server-side, because the oracle does not need redaction to fail. It needs only step 1.
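The two orderings can be put side by side in a toy model. This is a sketch under stated assumptions: a block is a dict with a hypothetical hidden flag, "wire" means what crosses the network, and "rendered" means what the user sees. Real products are messier, but the asymmetry is the same.

```python
def stream_then_hide(blocks):
    """Render-layer redaction: everything is sent, the renderer filters afterwards."""
    wire = list(blocks)
    rendered = [b for b in wire if not b.get("hidden")]
    return wire, rendered

def decide_then_stream(blocks):
    """Boundary redaction: the filter runs before anything is sent."""
    wire = [b for b in blocks if not b.get("hidden")]
    rendered = list(wire)
    return wire, rendered

# Hypothetical content: one block the product intends to hide, one it shows.
BLOCKS = [
    {"kind": "reasoning", "text": "tool schema: send_email(to, body)", "hidden": True},
    {"kind": "text", "text": "Done."},
]

leaky_wire, leaky_view = stream_then_hide(BLOCKS)
safe_wire, safe_view = decide_then_stream(BLOCKS)
```

The user sees the same thing in both cases. Only the wire differs, which is why the choice is invisible from the rendered side and has to be checked per product.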

fig 3 · the testable claim — where the hide happens; the common answer is render-layer, and that is the LLM02 defect

Step 3 — The oracle

This is the payload, and it is deliberately separated from step 2 because it does not inherit step 2's qualification. Return to the blind attacker. They have placed an instruction in a document the agent will retrieve. In the classic model they fire and learn nothing — the rendered answer is clean whether the injection became a privileged call or was caught.

Put them on the stream. The injected instruction, if it lands, becomes a tool_use block with input_json_delta arguments — step 1, protocol fact. The attacker watches their own instruction turn into a named tool call with arguments, as it happens, before any render-layer suppression (step 2) can act and regardless of whether that suppression exists at all. Blind indirect injection becomes closed-loop indirect injection. The attacker now has live confirmation of landing, the exact tool and arguments the injection produced, and a feedback signal to iterate the payload against the model's actual behavior instead of guessing.
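For a team auditing its own deployment, the closed loop reduces to a string search over the tool calls recovered from the stream. The sketch below assumes the calls have already been reassembled into (name, arguments) pairs and that the test payload carried a unique marker; injection_landed, the canary value, and the sample data are all hypothetical.

```python
import json

def injection_landed(tool_calls, canary):
    """Return the tool calls whose name or serialized arguments contain the canary."""
    hits = []
    for name, args in tool_calls:
        blob = json.dumps(args, sort_keys=True)
        if canary in name or canary in blob:
            hits.append((name, args))
    return hits

CANARY = "CANARY-7f3a"  # marker planted in the test document (assumption)

# What the stream showed versus what the interface rendered (both illustrative).
observed_calls = [
    ("search_docs", {"query": "quarterly report"}),
    ("send_email", {"to": "x@example.com", "body": "ref CANARY-7f3a"}),
]
rendered_answer = "Here is a summary of the quarterly report."

hits = injection_landed(observed_calls, CANARY)
visible_in_render = CANARY in rendered_answer
```

In this example the rendered answer carries no trace of the marker while the stream names the tool and arguments it produced, which is the gap between the blind model and the closed-loop one.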

This is not an observation about transports. It is a change in attacker capability against LLM01 — the top of the taxonomy. EchoLeak's expensive component, confirmation, is delivered for free by the protocol to anyone on the stream. The defensive posture that assumed detection asymmetry favored the defender was reasoning from a vantage point the attacker is not obligated to accept.

fig 4 · the oracle — confirmation EchoLeak had to engineer is delivered by the protocol; survives even server-side redaction

Step 4 — Why this is not one product's bug

If this were a misconfiguration it would be a hardening note. It is not, and the reason is the split established across steps 1 and 2. One property is universal by protocol design; the other is common but checkable. Keeping them separate is what makes the piece robust to its own fork.

Universal — streamed tool deltas before the action resolves. Every major streaming API emits the tool name and arguments incrementally before the tool actually runs and before the turn completes. This is not optional; progressive rendering requires it. The oracle (step 3) needs only this. It holds wherever tool use is streamed, independent of any product's redaction choices.

Common but checkable — render-layer redaction. The amplifying defects — system-prompt and tool-schema disclosure through reasoning blocks that stream and are then hidden by the renderer; secrets present in tool-argument deltas that a client-side filter scrubs from the displayed transcript; intermediate argument revisions the final render never shows — all depend on step 2 being true for the specific product. Where it is, the wire transcript and the rendered transcript differ, and the difference is exactly the material the product chose to suppress. Where it is not, these corollaries fall away and the oracle still stands.
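The per-product check is a diff. A minimal sketch, assuming both transcripts have been normalized into ordered lists of comparable items (the normalization itself is product-specific and not shown); wire_minus_rendered and the sample transcripts are hypothetical.

```python
def wire_minus_rendered(wire_items, rendered_items):
    """Ordered multiset difference: items that crossed the wire but were never shown."""
    remaining = list(rendered_items)
    suppressed = []
    for item in wire_items:
        if item in remaining:
            remaining.remove(item)  # consume one matching rendered item
        else:
            suppressed.append(item)
    return suppressed

wire_transcript = [
    "reasoning: the system prompt forbids sharing the schema",
    "tool_use: send_email",
    "Done.",
]
rendered_transcript = ["Done."]

suppressed = wire_minus_rendered(wire_transcript, rendered_transcript)
```

An empty result for a given product means step 2 does not hold there and the corollaries fall away. A non-empty result is the list of what the renderer was trusted to hide.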

The systemic point is the trade itself. Streaming-then-hiding is chosen over deciding-then-streaming because users feel latency and do not feel that a redaction happened after the bytes left. That is a deliberate exchange of a security boundary for a performance metric, made the same way across the industry, which is what makes it a class rather than an incident.

fig 5 · universal versus checkable — the oracle rides the protocol property, not the product one

The primitive, and the boundary it ignores

Name it so it can be argued about: render-layer redaction. A suppression applied after the trust boundary is not a control — it is presentation, and an attacker is never obligated to consume the presentation instead of the wire. The capability it hands the attacker is the injection oracle: the conversion of blind indirect injection into a closed loop, delivered by the streaming protocol to any party on the decrypted stream. EchoLeak had to build its confirmation channel. The protocol builds it for everyone.

The honest scope, stated not buried: this requires payload visibility. The party on the stream is the browser, a research or malicious extension, a logging or intercepting proxy, the client itself, or a tap on a product that is not end-to-end encrypted. For browser-delivered SSE products with no E2E, that is the default position, not an exotic one — and the threat model has to say so in its first sentence or it deserves the pushback it will get.

What actually closes it: move redaction to the trust boundary — decide-then-stream, never stream-then-hide — and accept the latency that buys, because the latency was the price of the boundary all along. Treat the system prompt and tool schema as disclosed the instant they are reasoned about in a streamed block. Treat any secret that can appear in a tool argument as disclosed if the argument is streamed before the secret is scrubbed. None of these are hard once the layer is named; the reason they are pervasive is that the cheaper layer was chosen on purpose.
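Decide-then-stream for tool arguments can be sketched as a server-side gate that holds the argument deltas until the block closes, scrubs them, and only then emits. The assumptions are loud ones: events use the Anthropic-style shapes from step 1, secrets are identified by top-level key name only, and gate_tool_blocks, REDACTED, and the sample events are hypothetical.

```python
import json

REDACTED = "[redacted]"

def gate_tool_blocks(events, secret_keys):
    """Yield events, buffering tool_use argument deltas and scrubbing them before release."""
    held = {}  # block index -> buffered partial_json fragments
    for ev in events:
        kind = ev.get("type")
        idx = ev.get("index")
        if kind == "content_block_start" and ev["content_block"].get("type") == "tool_use":
            held[idx] = []
            yield ev  # the tool name still goes out immediately
        elif (kind == "content_block_delta" and idx in held
              and ev["delta"].get("type") == "input_json_delta"):
            held[idx].append(ev["delta"]["partial_json"])  # buffered, not sent
        elif kind == "content_block_stop" and idx in held:
            raw = "".join(held.pop(idx))
            args = json.loads(raw) if raw else {}
            scrubbed = {k: (REDACTED if k in secret_keys else v) for k, v in args.items()}
            yield {"type": "content_block_delta", "index": idx,
                   "delta": {"type": "input_json_delta",
                             "partial_json": json.dumps(scrubbed)}}
            yield ev
        else:
            yield ev

EVENTS = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "tool_use", "id": "toolu_example",
                       "name": "fetch_report", "input": {}}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": '{"api_key": "sk-test-123", '}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": '"report": "q3"}'}},
    {"type": "content_block_stop", "index": 0},
]

gated = list(gate_tool_blocks(EVENTS, {"api_key"}))
```

The gate pays for the boundary in latency: arguments arrive in one piece at block close instead of progressively. It closes the secret-in-argument corollary only. The tool name and the scrubbed arguments still cross the wire, so the landing signal from step 3 remains.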

The blindfold was never a property of the attack. It was a property of where the defender assumed the attacker was standing. We built agents that narrate their decisions over a wire in plaintext, then hid the narration with the interface, and called the interface a control.

same attack. different substrate.

sources: Anthropic Messages streaming (content_block_start / content_block_delta with input_json_delta / content_block_stop event sequence for tool_use blocks). OpenAI chat streaming (tool_calls function.name / function.arguments incremental deltas). OWASP Top 10 for LLM Applications (LLM01 prompt injection; insecure output handling, which is LLM02 in the 2023 list and LLM05 improper output handling in the 2025 list). Aim Labs / MSRC, EchoLeak (CVE-2025-32711, 2025), referenced for the confirmation-channel cost, not the injection.

Indirect Injection Was Never Blind: The Render-Layer Redaction Oracle