Part I · Chapter 005

Data Exfiltration Concepts

Inference pipelines as covert channels -- how model outputs can carry hidden information.

Status Draft -- for peer review Updated 2026-03-21

In traditional cybersecurity, data exfiltration is the unauthorized transfer of data from a system to an adversary-controlled destination. The techniques are well-understood: DNS tunneling, steganography in images, covert channels in protocol headers, encoding data in timing patterns. The defenses are also well-understood: network monitoring, data loss prevention (DLP), egress filtering. Security teams have decades of experience detecting and preventing exfiltration through network-based channels.

LLMs break this model. When a language model processes a query that includes sensitive data — whether from a user prompt, a RAG-retrieved document, or a tool call result — it has that data in its context window. The model does not need a network connection to exfiltrate it. It does not need to write it to disk or establish a covert channel through a firewall. It can encode data in the text it generates. Every output token is a choice from a vocabulary of tens of thousands of options, and at every generation step, many of those options are approximately equally plausible. This surplus of valid choices is the channel: a model can select tokens that subtly encode information without producing text that looks anomalous to a human reader or a naive output filter.

This is the fundamental insight of this chapter: the model itself is the potential exfiltration agent, and its normal output channel is the exfiltration vector. Air-gapping the model from the network does not solve the problem if the model’s outputs leave the air gap — which they must, because the outputs are the entire point of running the model.

This chapter provides the conceptual bridge between traditional exfiltration (which security readers already understand) and the LLM-specific vectors analyzed in Chapter 102’s data exfiltration section and Chapter 104’s scenario walkthroughs. Understanding both the traditional and LLM-specific mechanics is necessary to evaluate the mitigations discussed in Chapter 103.

Traditional Exfiltration: Channels, Encoding, and Detection

This section reviews the exfiltration techniques that security professionals already know, framing them as a foundation for comparison. It covers common channels (network protocols, DNS, HTTP headers, steganography in media files), encoding methods (base64, binary-to-text, protocol abuse), and detection approaches (traffic analysis, behavioral anomaly detection, DLP systems). The purpose is not to teach exfiltration to security readers — they know it — but to establish the vocabulary and threat model that will be extended to LLM-specific vectors in the sections that follow.

LLM Inference as a Data Processing Pipeline

Before examining exfiltration risks, this section maps the inference pipeline from a data-flow perspective: what data enters the model’s context window, how it is processed, and where it exits. This includes the system prompt (which may contain sensitive configuration), user input (which may contain queries about sensitive data), RAG-retrieved documents (which may contain the sensitive data itself), and tool call results (which may include database records, file contents, or API responses). Mapping this data flow is necessary to understand where exfiltration opportunities arise — which is at every point where external data enters the context and at every point where generated output leaves it.

The Context Window as an Attack Surface

The context window is the model’s working memory during inference, and anything placed in it is accessible to the model during generation. This section explains why the context window is a critical attack surface: in agentic and RAG-enabled deployments, the context window routinely contains data from multiple trust levels — trusted system instructions, partially trusted user input, and untrusted retrieved content. A compromised or adversarial model has access to all of it simultaneously, and there is no in-context isolation mechanism that prevents the model from incorporating high-sensitivity data into its output. Chapter 008 covers the architectures that create this situation; this section establishes why it matters for exfiltration.

Output Channels: Text, Tool Calls, and Side Effects

A model’s outputs are not limited to the text a user sees in a chat interface. This section catalogs the full range of output channels available to an LLM: generated text responses, structured tool call arguments (which may include URLs, file paths, or API parameters), log entries that include model output, and any downstream system that processes generated content. In agentic configurations, tool calls are particularly dangerous because they can direct the model’s output to external systems — writing to files, making HTTP requests, or executing code — effectively giving the model a network connection even in deployments that did not intend to provide one. Chapter 102 covers tool-use exploitation in detail.

Steganography in Natural Language

This section covers the core technical mechanism of LLM-based exfiltration: encoding information in the model’s token choices. Unlike traditional steganography (which hides data in the least-significant bits of images or audio), linguistic steganography exploits the redundancy of natural language. At every generation step, the model’s probability distribution over the vocabulary assigns non-trivial probability to many tokens. By selecting among these plausible alternatives according to a scheme known to the adversary, data can be encoded at rates of several bits per token without producing text that appears unusual. Research on neural linguistic steganography has demonstrated practical encoding rates and high imperceptibility (Ziegler et al., 2019).

Bandwidth Analysis: How Much Data Can Be Exfiltrated

Abstract exfiltration risks need concrete bounds to be useful for risk assessment. This section estimates the practical bandwidth of steganographic encoding through generated text: how many bits per token can be encoded, what throughput this produces given typical response lengths and query rates, and what types of data can realistically be exfiltrated through this channel. The analysis considers both single-query encoding (limited by response length) and multi-query accumulation (where an external orchestrator makes repeated queries to extract data incrementally). These bandwidth estimates are used directly in the risk quantification in Chapter 105.

Air-Gap Limitations: Why Network Isolation Is Insufficient

This section addresses a common misconception: that running a model on an air-gapped or network-isolated system eliminates exfiltration risk. Air-gapping prevents the model from establishing its own network connections, but it does not prevent the model from encoding data in its outputs. If those outputs are read by a human, copied to another system, included in a report, or processed by any downstream tool, the encoded data travels with them. The exfiltration channel is the output text itself, and that text must leave the system for the model to be useful. This section makes the case that output monitoring — not just network monitoring — is necessary for any deployment handling sensitive data. Chapter 103 evaluates practical approaches to output monitoring.

Key Takeaway: Exfiltration Risk Is Inherent to Giving Models Access to Sensitive Data

The closing section synthesizes the chapter into its core argument: the exfiltration risk from LLMs is structurally different from traditional exfiltration because the model’s normal operation — reading input, generating output — is the potential exfiltration mechanism. There is no way to give a model access to sensitive data for legitimate processing while simultaneously guaranteeing that it cannot encode that data in its output. The mitigations discussed in Chapter 103 can reduce the risk, but they cannot eliminate it entirely. This residual risk is a factor in the deployment model recommendations in Chapter 107.

Summary

[Chapter summary to be written after full content is drafted.]