Part II · Chapter 108

References & Bibliography

Complete bibliography: 71 citations organized by topic, covering academic papers, government guidance, legislation, and industry reports.

Academic Papers — Backdoors and Adversarial ML

  1. Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.” arXiv:1708.06733.

  2. Liu, Y., Ma, S., Aafer, Y., Lee, W.C., Zhai, J., Wang, W., & Zhang, X. (2018). “Trojaning Attack on Neural Networks.” NDSS 2018.

  3. Tran, B., Li, J., & Madry, A. (2018). “Spectral Signatures in Backdoor Attacks.” NeurIPS 2018.

  4. Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., & Zhao, B.Y. (2019). “Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks.” IEEE S&P 2019.

  5. Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., & Nepal, S. (2019). “STRIP: A Defence Against Trojan Attacks on Deep Neural Networks.” ACSAC 2019.

  6. Chen, B., Carvalho, W., Baracaldo, N., Ludwig, H., Edwards, B., Lee, T., Molloy, I., & Srivastava, B. (2019). “Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering.” SafeAI Workshop, AAAI 2019.

  7. Dai, J., Chen, C., & Li, Y. (2019). “A Backdoor Attack Against LSTM-Based Text Classification Systems.” IEEE Access, 7.

  8. Kurita, K., Michel, P., & Neubig, G. (2020). “Weight Poisoning Attacks on Pre-Trained Models.” ACL 2020.

  9. Qi, F., Yao, Y., Xu, S., Liu, Z., & Sun, M. (2021). “Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Triggers.” ACL-IJCNLP 2021.

  10. Li, L., Song, D., Li, X., Zeng, J., Ma, R., & Qiu, X. (2021). “Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning.” ACL-IJCNLP 2021.

  11. Yang, W., Li, L., Zhang, Z., Ren, X., Sun, X., & He, B. (2021). “Be Careful about Poisoned Word Embeddings.” NAACL 2021.

  12. Xu, X., Wang, Q., Li, H., Borisov, N., Gunter, C.A., & Li, B. (2021). “Detecting AI Trojans Using Meta Neural Analysis.” IEEE S&P 2021.

  13. Hong, S., Chandrasekaran, V., Kaya, Y., Dumitras, T., & Papernot, N. (2022). “Handcrafted Backdoors in Deep Neural Networks.” NeurIPS 2022.

  14. Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., & Ma, X. (2021). “Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks.” ICLR 2021.

  15. Yang, Z., Shi, J., He, J., & Li, B. (2023). “Stealthy Backdoor Attack for Code Models.” arXiv:2301.02496.

  16. Hubinger, E., Denison, C., Mu, J., et al. (2024). “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.” arXiv:2401.05566. Anthropic. (Primary reference for this assessment)

  17. Rando, J. & Tramèr, F. (2024). “Universal Jailbreak Backdoors from Poisoned Human Feedback.” ICLR 2024.


Academic Papers — Prompt Injection and Tool-Use Security

  1. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” arXiv:2302.12173.

  2. Zhan, Q., et al. (2024). “InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents.” arXiv:2403.02691.

  3. Debenedetti, E., et al. (2024). “AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.” arXiv:2406.13352.

  4. Xi, Z., et al. (2023). “The Rise and Potential of Large Language Model Based Agents: A Survey.” arXiv:2309.07864.

  5. Zou, A., Wang, Z., Kolter, J.Z., & Fredrikson, M. (2023). “Universal and Transferable Adversarial Attacks on Aligned Language Models.” arXiv:2307.15043.


Academic Papers — Steganography and Covert Channels

  1. Ziegler, Z., Deng, Y., & Rush, A.M. (2019). “Neural Linguistic Steganography.” EMNLP 2019.

  2. Kaptchuk, G., Jois, T.M., Green, M., & Rubin, A.D. (2021). “Meteor: Cryptographically Secure Steganography for Realistic Distributions.” ACM CCS 2021.

  3. Christ, M., Gunn, S., & Zamir, O. (2024). “Undetectable Watermarks for Language Models.” arXiv:2306.09194.

  4. Kirchenbauer, J., et al. (2023). “A Watermark for Large Language Models.” ICML 2023.

  5. Greenblatt, R., et al. (2024). “AI Control: Improving Safety Despite Intentional Subversion.” arXiv:2312.06942.


Academic Papers — Data Poisoning and Training Security

  1. Carlini, N., et al. (2024). “Poisoning Web-Scale Training Datasets is Practical.” IEEE S&P 2024.

  2. Wan, A., Wallace, E., Shen, S., & Klein, D. (2023). “Poisoning Language Models During Instruction Tuning.” ICML 2023.


Academic Papers — Interpretability and Auditing

  1. Bricken, T., Templeton, A., Batson, J., et al. (2023). “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning.” Anthropic Research.

  2. Templeton, A., Conerly, T., Marcus, J., et al. (2024). “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.” Anthropic Research.

  3. Casper, S., et al. (2024). “Black-Box Access is Insufficient for Rigorous AI Audits.” arXiv:2401.14446.


NIST Publications

  1. NIST AI 100-1. “Artificial Intelligence Risk Management Framework (AI RMF 1.0).” January 26, 2023.

  2. NIST AI 100-2e2023. “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.” January 2024.

  3. NIST SP 800-171 Rev. 3. “Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations.” May 2024. (Supersedes Rev. 2, withdrawn May 14, 2024)

  4. NIST SP 800-37 Rev. 2. “Risk Management Framework for Information Systems and Organizations.” December 2018.

  5. NIST SP 800-53 Rev. 5. “Security and Privacy Controls for Information Systems and Organizations.” September 2020.


Executive Orders and Presidential Actions

  1. Executive Order 14110. “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.” 88 FR 75191. October 30, 2023.

  2. Executive Order 14179. “Removing Barriers to American Leadership in Artificial Intelligence.” January 23, 2025.

  3. Executive Order 13873. “Securing the Information and Communications Technology and Services Supply Chain.” 84 FR 22689. May 15, 2019.


Department of Defense

  1. DoD Directive 3000.09. “Autonomy in Weapon Systems.” Updated January 25, 2023.

  2. DoD Ethical AI Principles. Adopted February 24, 2020.

  3. CDAO Responsible AI Toolkit. rai.tradewindai.com.

  4. DoD Task Force Lima. Established August 2023.


Congressional and Legislative

  1. P.L. 118-31. “National Defense Authorization Act for Fiscal Year 2024.” December 2023.

  2. P.L. 117-263. “FedRAMP Authorization Act” (James M. Inhofe NDAA for FY2023, Section 5921).

  3. P.L. 118-50. “Protecting Americans from Foreign Adversary Controlled Applications Act.”

  4. “No DeepSeek on Government Devices Act.” Introduced February 2025. (See also S. 765)

  5. S. 686. “RESTRICT Act.” 118th Congress.

  6. OMB Memorandum M-24-10. “Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence.” March 28, 2024.


Export Controls

  1. International Traffic in Arms Regulations (ITAR). 22 CFR Parts 120-130.

  2. Export Administration Regulations (EAR). 15 CFR Parts 730-774.

  3. BIS Final Rule. “Implementation of Additional Export Controls: Certain Advanced Computing and Semiconductor Manufacturing Items.” October 7, 2022, and updates October 17, 2023.


PRC Regulations (Context)

  1. PRC National Intelligence Law of 2017. Article 7.

  2. PRC Interim Measures for the Management of Generative Artificial Intelligence Services. Effective August 15, 2023.

  3. PRC Data Security Law of 2021.

  4. PRC Cybersecurity Law of 2017.


Industry and Security Research

  1. JFrog Security Research (2024). “Malicious ML Models on Hugging Face.”

  2. Trail of Bits. “Fickling: A Python Pickle Decompiler and Analyzer.”

  3. Rehberger, J. (2023-2024). Prompt injection and data exfiltration research. Embrace The Red blog.

  4. Hugging Face. “Safetensors: A Safe and Fast File Format for Storing Tensors.”

  5. Wiz Research. “Wiz Research Uncovers Exposed DeepSeek Database.” January 2025.

  6. OWASP. “Top 10 for Large Language Model Applications.” 2025.

  7. ODNI. “Annual Threat Assessment of the US Intelligence Community.” 2024-2025.


International and Allied

  1. NATO AI Strategy. Adopted October 2021.

  2. NATO Vilnius Summit Communiqué. July 11-12, 2023.

  3. UK Defence AI Strategy. Ministry of Defence. June 2022.

  4. UK AI Safety Institute. Established November 2023.

  5. AUKUS Joint Leaders Statement. September 15, 2021.


Side-Channel and Hardware Security

  1. Naghibijouybari, H., Neupane, A., Qian, Z., & Abu-Ghazaleh, N. (2018). “Rendered Insecure: GPU Side Channel Attacks are Practical.” ACM CCS 2018.

  2. Batina, L., Bhasin, S., Jap, D., & Picek, S. (2019). “CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel.” USENIX Security 2019.


Note: All URLs verified as of March 2026. Government document references should be checked against current versions, as the policy landscape is evolving rapidly. Some DoD and White House URLs may shift with administration changes.