Part I · Chapter 007

The AI/ML Security Landscape

ML threat taxonomy, red teaming, and how traditional security maps (and doesn't) to machine learning.

AI/ML security is a field that is simultaneously urgent and immature. Organizations are deploying language models in production systems that handle sensitive data, make consequential decisions, and interact with external services — while the security community is still establishing basic threat taxonomies, developing detection tools, and arguing about which attack categories are even feasible in practice. If you come from traditional cybersecurity, where vulnerability classes have CVE numbers, exploit frameworks have decades of refinement, and defensive tooling is a multi-billion-dollar industry, the state of ML security will feel like stepping back to the late 1990s.

This chapter surveys where the discipline stands: what frameworks exist for categorizing ML threats, how ML attack surfaces differ from the software vulnerabilities security professionals already understand, and what the current detection and defense toolkit actually looks like. The goal is not comprehensive coverage of every published attack — the academic literature grows faster than any book can track — but rather an honest map of the landscape that identifies where capabilities exist, where they’re emerging, and where there are genuine gaps.

This framing matters for the rest of the book because the threat analysis in Part II does not operate in a vacuum. The risks posed by Chinese-origin LLMs are a specific instance of broader ML security challenges. Some of those challenges have known defenses. Many do not. Placing the specific risks in context — understanding what is uniquely concerning about adversary-origin models versus what is a general property of the technology — is essential for making credible recommendations. Chapter 105’s comparative risk assessment depends on this distinction, and Chapter 107’s recommendations are shaped by what defenses are actually available.

The honest takeaway from this survey is uncomfortable: for most of the threat categories discussed in this book, the defensive tooling ranges from “early research” to “nonexistent.” This is not alarmism — it is the assessment of every major framework that has attempted to catalog ML threats systematically.

AI/ML Security as a Discipline

This section covers the current state of AI/ML security as a field: who is working on it (academic researchers, red team groups at major labs, government-funded initiatives like DARPA’s GARD program), how it relates to traditional cybersecurity (overlapping but distinct skill sets), and what the major open problems are. The field is characterized by a gap between academic research (which demonstrates attacks in controlled settings) and practical security (which needs defenses that work in production). This gap is directly relevant to Part II’s analysis, which evaluates threats based on their practical feasibility, not just their theoretical possibility.

Threat Taxonomies: MITRE ATLAS, OWASP, and NIST

Several frameworks have attempted to organize ML threats systematically. This section covers the three most relevant: MITRE ATLAS (Adversarial Threat Landscape for AI Systems), which extends the ATT&CK framework to ML-specific tactics and techniques; the OWASP Top 10 for LLM Applications, which catalogs the most critical risks specific to LLM deployments; and the NIST AI Risk Management Framework (AI RMF), which provides a governance-oriented approach to AI risk. Each framework takes a different perspective — attack-centric, risk-centric, and governance-centric respectively — and Part II draws on all three where relevant.

How ML Attacks Differ from Traditional Software Attacks

ML systems present fundamentally different security properties than traditional software. This section explains the key differences: behavior is determined by learned weights rather than explicit code, making it resistant to conventional code review; attacks can be embedded during training rather than deployed at runtime; the boundary between “data” and “instructions” is blurred (the foundation of prompt injection); and correctness is statistical rather than deterministic, meaning compromised behavior can be made indistinguishable from normal error rates. These differences explain why traditional security tools are insufficient for ML systems and why new defensive approaches are needed.

Attack Categories: Evasion, Poisoning, Extraction, Inference, and Supply Chain

This section provides a taxonomy of ML attack categories organized by the adversary’s goal and the stage of the ML lifecycle they target. Evasion attacks manipulate inputs to cause misclassification at inference time. Poisoning attacks corrupt training data or model weights to embed persistent malicious behavior. Model extraction attacks steal the model itself through query access. Inference attacks extract private information about training data. Supply chain attacks compromise the distribution and deployment pipeline. Each category is briefly described here; the detailed analysis of which categories apply to Chinese-origin LLMs is in Chapter 102.

Red Teaming for ML Systems

Red teaming is the primary methodology for discovering vulnerabilities in deployed ML systems, and this section covers how it differs from traditional penetration testing. ML red teaming involves probing model behavior through adversarial inputs, testing for training data leakage, evaluating alignment robustness, and assessing the security of the deployment infrastructure. The section covers current methodology (largely manual and ad hoc), available tools (prompt injection frameworks, fuzzing tools, benchmark suites), and the fundamental limitation: red teaming can find problems but cannot prove their absence. A model that passes red team evaluation may still contain latent vulnerabilities that the evaluation did not trigger.

The Open-Weight Trust Problem

Open-weight models present a specific security challenge: they are presented as transparent (the weights are public), but this transparency is largely illusory for security purposes. This section explains why having access to weights does not mean you can verify a model’s safety: the parameter space is too large for exhaustive inspection, backdoors can be designed to evade known detection methods, and behavioral analysis through output testing faces the same incompleteness problem as all black-box testing (Casper et al., 2024). The trust problem is especially acute for models from organizations outside the deployer’s trust boundary — which is the central concern of this book.

Current Detection and Defense Capabilities

This section provides an honest assessment of what defensive tools exist and how effective they are. It covers weight-level analysis (spectral signatures, activation clustering, Neural Cleanse), behavioral testing (trigger scanning, adversarial probing, benchmark evaluation), and operational controls (output filtering, sandboxing, monitoring). The assessment is organized by attack category and rated for maturity and reliability. The uncomfortable conclusion — which Chapter 103 examines in detail — is that most detection methods work well against known attack patterns but have limited effectiveness against adaptive adversaries who design attacks to evade specific defenses.

Key Takeaway: ML Security Is Immature

The closing section makes the case directly: the current state of ML security is insufficient for the trust that organizations are placing in ML systems. Threat taxonomies exist but are not comprehensive. Detection tools work against known attacks but not novel ones. Red teaming methodology is still being developed. And the pace of capability deployment far exceeds the pace of security tooling development. This immaturity is not a reason to avoid deploying LLMs, but it is a reason to deploy them with appropriate caution — particularly when the models originate from organizations whose incentives and oversight structures may not align with your security requirements. The recommendations in Chapter 107 are shaped by this reality.

Summary

[Chapter summary to be written after full content is drafted.]