Untrusted Weights
What Open Weights Don't Tell You
Purpose
This research compendium is an evidence-based technical assessment of the security risks of deploying open-weight large language models in sensitive environments, using Chinese-origin models (Alibaba Qwen, DeepSeek, and similar) as the primary case study.
Central question: "Can Chinese-origin open-weight models ever be made safe for sensitive use, or is the risk irreducible?"
Most of the threat vectors analyzed -- backdoor persistence, tool-use exploitation, data exfiltration, supply chain compromise -- apply to any untrusted open-weight model regardless of origin. China-specific factors are quantified as a marginal risk increment on top of that baseline rather than treated as the entire problem.
Key Findings
Evidence Classification
Throughout this research, findings are classified by evidence level:
- Demonstrated in peer-reviewed research or documented in practice
- Theoretically sound with supporting evidence, not yet demonstrated at scale
- Possible but no concrete evidence or identified mechanism
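The three evidence levels above can be read as an ordered scale. The following is a minimal sketch of how findings might be tagged and filtered by level. The tier names (DEMONSTRATED, PLAUSIBLE, SPECULATIVE), the Finding type, and the at_least helper are hypothetical names chosen for illustration; the compendium itself only gives the three descriptions, and the claims below are placeholders rather than actual findings.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Hypothetical tier names; only the descriptions come from the text."""
    DEMONSTRATED = 3  # peer-reviewed research or documented in practice
    PLAUSIBLE = 2     # theoretically sound with supporting evidence, not yet shown at scale
    SPECULATIVE = 1   # possible, but no concrete evidence or identified mechanism


@dataclass(frozen=True)
class Finding:
    claim: str
    level: EvidenceLevel


def at_least(findings, minimum):
    """Return the findings whose evidence level is at or above `minimum`."""
    return [f for f in findings if f.level >= minimum]


# Placeholder claims only; these are not findings from the compendium.
findings = [
    Finding("example claim A", EvidenceLevel.DEMONSTRATED),
    Finding("example claim B", EvidenceLevel.SPECULATIVE),
]
strong = at_least(findings, EvidenceLevel.PLAUSIBLE)
```

Using an ordered enum means that a threshold such as "only act on findings that are at least theoretically sound" becomes a single comparison, which is one possible way to keep speculative items from being weighed the same as demonstrated ones.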
Research Log
Tracking the evolution of this research over time.
Initial Research Sprint
Built the complete compendium: threat analysis across 4 vectors, a mitigation assessment rating 6 methods, 3 attack scenario walkthroughs, a comparative risk assessment, a policy landscape review, tiered recommendations, and an at-risk model catalog (30+ families). Scaffolded the attack lab and compiled 71+ citations.
In Progress
Attack lab experiments and steganographic encoding validation.
In Progress
Continued research and validation.
License
Attribution-NonCommercial License with Permitted Use Carve-Outs
Copyright © 2026 Christopher Whitworth
Permission is hereby granted, free of charge, to any person or organization obtaining a copy of this work and associated files (the "Work"), to use, copy, modify, merge, publish, and distribute the Work subject to the following conditions:
1. Attribution Required
All copies, substantial portions, or derivative works must clearly credit "Christopher Whitworth" as the original author. Attribution must be reasonable to the medium: a citation in academic work, a credit line in documentation, a comment in source code, or equivalent.
2. Permitted Uses
The following uses are allowed without additional permission, even if they involve monetary compensation:
- Educational content -- courses, lectures, textbooks, training materials, and instructional media (including monetized video content such as YouTube) where the primary purpose is teaching or raising awareness about AI security topics.
- Academic and research use -- citation, quotation, analysis, or incorporation into academic papers, theses, dissertations, and research publications, whether open-access or behind a paywall.
- Nonprofit and public interest -- use by charitable organizations, nonprofits, NGOs, and public interest groups for mission-aligned work.
- Government internal use -- use by government agencies and their contractors for internal analysis, policy development, or operational decision-making, provided the Work itself is not resold as a standalone product.
- Reference and supporting material -- citation or incorporation as supporting evidence in proposals, reports, assessments, or contract deliverables, provided the Work is not the primary deliverable being sold.
- Journalism and commentary -- reporting, critique, review, and public discussion, including in monetized publications and media.
3. Restricted Uses
The following uses require express written permission of the author:
- Resale -- selling or sublicensing the Work itself, in whole or substantial part, as a standalone product or primary deliverable.
- Repackaging -- incorporating a substantial portion of the Work into a commercial product, service, or platform where the Work constitutes a core component of the value being sold.
- Commercial training data -- using the Work as training data for commercial AI/ML models or products.
THE WORK IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.