Table of Contents
- Executive Summary
- Legal and Regulatory Framework
- Chinese-Origin Model Catalog
- Russian-Origin Model Catalog
- Models with Unclear or Mixed Provenance
- Fine-Tunes and Derivatives
- Distribution Vectors
- Risk Assessment Methodology
- Consolidated Risk Matrix
- Recommendations
Executive Summary
This document catalogs AI models originating from or substantially influenced by adversary nations, primarily the People’s Republic of China (PRC) and the Russian Federation. It is intended to support risk-informed decision-making within defense, intelligence, and critical-infrastructure contexts.
Key finding: As of early 2026, at least 17 major Chinese organizations are actively distributing large language models (LLMs) and multimodal models through international platforms such as Hugging Face, GitHub, and Ollama. Many of these models achieve frontier-level performance and are being integrated into Western developer toolchains, enterprise products, and open-source projects, often without end-users understanding their provenance.
Critical legal context: Every entity on this list operating within the PRC is subject to:
- The National Intelligence Law of 2017 (Article 7: all organizations and citizens must support, assist, and cooperate with national intelligence work)
- The Data Security Law of 2021
- The Cybersecurity Law of 2017
- The Counter-Espionage Law (amended 2023)
- The Personal Information Protection Law of 2021
These laws create a compelled-cooperation framework that has no parallel in Western democracies. Even if a given company has no current intent to serve state intelligence interests, the legal architecture permits the Chinese government to compel cooperation at any time, with no judicial review or public disclosure.
Legal and Regulatory Framework
PRC Laws Relevant to AI Model Risk
| Law | Year | Key Provision | Risk Implication |
|---|---|---|---|
| National Intelligence Law | 2017 | Art. 7: Organizations and citizens shall support and cooperate with national intelligence work | Any PRC-based AI company can be compelled to embed capabilities, exfiltrate data, or modify model behavior on behalf of state intelligence services |
| Cybersecurity Law | 2017 | Data localization, security review requirements | Training data, user interaction logs, and telemetry from PRC models may be accessible to government |
| Data Security Law | 2021 | Government access to data classified as “important” or “core” | Model weights, training data, and deployment telemetry could be classified as state-relevant data |
| Personal Information Protection Law | 2021 | Cross-border data transfer restrictions, government access provisions | User data flowing to PRC infrastructure is subject to government access |
| Counter-Espionage Law (amended) | 2023 | Broadened definition of espionage; expanded state access to digital systems | Provides additional legal basis for compelling cooperation from AI companies |
| Interim Measures for Generative AI | 2023 | Models must align with “core socialist values”; providers must conduct security assessments | Models serving PRC domestic users are explicitly subject to ideological alignment requirements; this demonstrates the government’s willingness and capability to mandate model behavior modification |
| PRC Civil-Military Fusion Strategy | Ongoing | National strategy to eliminate barriers between civilian and military technology | All PRC AI capabilities are, by policy, available for military and intelligence application |
Russian Federation Laws
| Law | Key Provision | Risk Implication |
|---|---|---|
| Yarovaya Law (2016) | Telecom data retention and FSB access | Infrastructure hosting Russian models subject to surveillance |
| Sovereign Internet Law (2019) | State control over internet infrastructure | Russian AI services can be co-opted for state purposes |
| Data Localization Law (2015) | Personal data of Russian citizens must be stored in Russia | Data processed by Russian AI services held under state-accessible infrastructure |
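The legal factors tabulated above drive the risk levels assigned throughout the catalogs that follow. As a rough illustration, a factor-based scoring pass over a model's attributes can approximate those labels. The sketch below is illustrative only: its weights and thresholds are assumptions for demonstration, not this document's formal methodology (see Risk Assessment Methodology).

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    country: str                       # "PRC", "RU", or other
    sanctioned: bool = False           # US Entity List / Treasury designation
    state_owned: bool = False          # state-owned enterprise or state lab
    open_weights: bool = False         # weights freely redistributed
    foreign_api_default: bool = False  # API routes user data to PRC/RU servers

def risk_level(p: ModelProfile) -> str:
    # Sanctions designations dominate every other factor.
    if p.sanctioned:
        return "CRITICAL/HIGH"
    score = 0
    if p.country in ("PRC", "RU"):
        score += 2  # compelled-cooperation legal framework applies
    if p.state_owned:
        score += 2
    if p.open_weights:
        score += 2  # wide, hard-to-track derivative distribution
    if p.foreign_api_default:
        score += 1
    if score >= 4:
        return "HIGH"
    if score == 3:
        return "MEDIUM-HIGH"
    return "MEDIUM"  # floor of this sketch, not a clean bill of health

# Qwen-like profile: PRC origin, open weights
print(risk_level(ModelProfile(country="PRC", open_weights=True)))  # HIGH
```

With these assumed weights, a sanctioned entity (SenseTime, iFlytek, Huawei, Sberbank) always scores CRITICAL/HIGH, an open-weight PRC model lands at HIGH, and a closed PRC API lands at MEDIUM-HIGH, matching the pattern of labels used below.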
Chinese-Origin Model Catalog
1. Alibaba Cloud / Qwen Family
| Attribute | Detail |
|---|---|
| Parent Organization | Alibaba Group / Alibaba Cloud (Tongyi Lab) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | Alibaba is subject to intense CCP regulatory oversight. Jack Ma’s public rebuke by regulators in 2020 demonstrated the party’s control over the company. Alibaba Cloud provides cloud services to PRC government entities and military-linked organizations. Alibaba has a CCP party committee embedded within corporate governance. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, ModelScope, GitHub, Ollama, integrated into numerous third-party products |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Modality | Notes |
|---|---|---|---|
| Qwen-7B, Qwen-14B, Qwen-72B | 7B-72B | Text | Original Qwen series |
| Qwen-1.5 (0.5B to 110B) | 0.5B-110B | Text | Improved series with MoE variant |
| Qwen2 (0.5B to 72B) | 0.5B-72B | Text | Major architecture update |
| Qwen2.5 (0.5B to 72B) | 0.5B-72B | Text | Current flagship text series |
| Qwen2.5-Coder (1.5B to 32B) | 1.5B-32B | Code | Specialized code generation |
| Qwen2.5-Math | Various | Math | Mathematical reasoning |
| QwQ-32B | 32B | Text (reasoning) | Reasoning-focused model (chain-of-thought) |
| Qwen-VL, Qwen2-VL, Qwen2.5-VL | Various | Vision-Language | Multimodal image understanding |
| Qwen-Audio, Qwen2-Audio | Various | Audio-Language | Audio understanding and generation |
| Qwen-Agent | Various | Agentic | Tool-use and agent framework |
| Qwen2.5-Turbo | Various | Text | Optimized for speed/efficiency |
| Qwen-Long | Various | Text | Extended context window |
Specific Risk Factors:
- Qwen is one of the most widely adopted Chinese model families in the West, embedded in hundreds of derivative products
- Open weights enable integration without attribution, making provenance tracking extremely difficult
- Alibaba Cloud infrastructure is used for PRC government and military-adjacent workloads
- Qwen models have been observed to contain CCP-aligned content filtering and censorship behavior (e.g., refusing to discuss Tiananmen Square, Taiwan independence, Xinjiang)
- Massive derivative ecosystem on Hugging Face (thousands of fine-tunes) obscures the Chinese base model origin
- Qwen2.5-Coder is being integrated into coding assistants, creating potential for supply-chain influence over software development
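The derivative-ecosystem problem noted above is partially tractable: Hugging Face model cards begin with a YAML front-matter block whose optional `base_model` field records the parent model, so walking that field can surface a Chinese base model underneath a neutrally named fine-tune. A minimal sketch, in which the organization watchlist is partial and the sample card is invented for illustration:

```python
import re

def base_models(readme_text: str) -> list[str]:
    """Extract `base_model` entries from a model card's YAML front matter."""
    m = re.match(r"---\n(.*?)\n---", readme_text, re.DOTALL)
    if not m:
        return []
    header = m.group(1)
    # Single-value form:  base_model: Qwen/Qwen2.5-7B
    single = re.findall(r"^base_model:\s*(\S+)\s*$", header, re.MULTILINE)
    # List form:
    #   base_model:
    #     - Qwen/Qwen2.5-7B
    block = re.search(r"^base_model:\s*\n((?:\s+-\s+\S+\n?)+)", header, re.MULTILINE)
    listed = re.findall(r"-\s+(\S+)", block.group(1)) if block else []
    return single + listed

# Partial watchlist of Hugging Face organization namespaces (an assumption;
# extend with the organizations cataloged in this document).
PRC_ORGS = {"Qwen", "deepseek-ai", "THUDM", "01-ai", "baichuan-inc",
            "internlm", "BAAI", "openbmb"}

def flags_prc_base(readme_text: str) -> list[str]:
    """Return base-model IDs whose org namespace is on the watchlist."""
    return [b for b in base_models(readme_text) if b.split("/")[0] in PRC_ORGS]

card = """---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
---
# Some neutrally named fine-tune
"""
print(flags_prc_base(card))  # ['Qwen/Qwen2.5-7B']
```

Note that the field is self-reported and frequently omitted, so an empty result is not evidence of clean provenance; it only catches cooperative labeling.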
2. DeepSeek
| Attribute | Detail |
|---|---|
| Parent Organization | DeepSeek (深度求索), subsidiary/affiliate of High-Flyer Capital Management (幻方量化) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | High-Flyer is a major Chinese quantitative hedge fund. While not a direct state entity, it operates under PRC law and regulatory oversight. DeepSeek’s rapid capability gains and apparent access to large GPU clusters despite US export controls have raised questions about state support. DeepSeek is subject to all PRC compelled-cooperation laws. |
| Weights | Open (MIT License for most variants) |
| Distribution | Hugging Face, GitHub, Ollama, DeepSeek API, ModelScope, widely mirrored |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Modality | Notes |
|---|---|---|---|
| DeepSeek-LLM (7B, 67B) | 7B-67B | Text | Original series |
| DeepSeek-V2 | 236B (21B active, MoE) | Text | Mixture-of-Experts architecture |
| DeepSeek-V2.5 | 236B MoE | Text | Merged chat and code capabilities |
| DeepSeek-V3 | 671B (37B active, MoE) | Text | Claimed training cost efficiency breakthrough |
| DeepSeek-R1 | 671B MoE | Text (reasoning) | Reasoning model, competitive with OpenAI o1 |
| DeepSeek-R1-Distill (various) | 1.5B-70B | Text (reasoning) | Distilled reasoning models based on Qwen and Llama |
| DeepSeek-Coder (1.3B-33B) | 1.3B-33B | Code | Code generation and understanding |
| DeepSeek-Coder-V2 | 236B MoE | Code | Advanced code model |
| DeepSeek-Math (7B) | 7B | Math | Mathematical reasoning |
| DeepSeek-VL, DeepSeek-VL2 | Various | Vision-Language | Multimodal |
| DeepSeek-Prover | Various | Math/Proof | Theorem proving |
| Janus-Pro | Various | Vision-Language | Unified multimodal model |
Specific Risk Factors:
- DeepSeek-R1 received enormous global media attention in January 2025, driving massive adoption
- The R1-Distill variants are based on Qwen2.5 and Llama 3 architectures, creating nested provenance concerns
- DeepSeek’s claimed training cost efficiency ($5.6M for V3) has been questioned; possible undisclosed state subsidies or access to restricted compute
- DeepSeek API sends data to PRC servers by default
- Multiple governments (Italy, Australia, South Korea, Taiwan, US federal agencies) have restricted or investigated DeepSeek
- DeepSeek models exhibit strong CCP-aligned censorship patterns
- The connection to High-Flyer Capital raises questions about financial data collection interests
- DeepSeek-R1’s open weights have been rapidly integrated into Western AI infrastructure and products
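Because several of the APIs cataloged here are OpenAI-compatible, a misconfigured or deliberately pointed `base_url` can silently route prompts to PRC infrastructure. Auditing configured endpoints against a host watchlist is a cheap control. In the sketch below, `api.deepseek.com` is DeepSeek's documented endpoint; the remaining hosts are best-effort assumptions that should be verified against each provider's current documentation before being relied on:

```python
from urllib.parse import urlparse

# Partial watchlist of API hosts operated on PRC infrastructure. Entries
# beyond api.deepseek.com are assumptions; verify against each provider's
# current documentation.
FLAGGED_HOSTS = {
    "api.deepseek.com",
    "dashscope.aliyuncs.com",  # Alibaba / Qwen
    "open.bigmodel.cn",        # Zhipu / GLM
    "api.moonshot.cn",         # Moonshot / Kimi
}

def audit_base_url(base_url: str) -> bool:
    """Return True if the configured endpoint points at a flagged host."""
    host = (urlparse(base_url).hostname or "").lower()
    return host in FLAGGED_HOSTS or any(
        host.endswith("." + h) for h in FLAGGED_HOSTS
    )

print(audit_base_url("https://api.deepseek.com/v1"))  # True
print(audit_base_url("https://api.openai.com/v1"))    # False
```

A scan like this only catches explicit configuration; locally hosted open weights (the larger exposure for DeepSeek-R1) never touch these endpoints at all.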
3. Baidu / ERNIE Family
| Attribute | Detail |
|---|---|
| Parent Organization | Baidu, Inc. |
| Country of Origin | PRC (Beijing) |
| Government Ties | Baidu is deeply integrated with PRC government initiatives. Baidu provides AI services to Chinese government agencies, military, and public security. Baidu CEO Robin Li serves on the Chinese People’s Political Consultative Conference (CPPCC). Baidu Apollo provides autonomous driving technology with PRC government collaboration. Baidu has PRC government AI platform contracts. |
| Weights | Primarily closed (API access); some older variants partially open |
| Distribution | Baidu API (ERNIE Bot / Wenxin Yiyan), limited international distribution |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| ERNIE 1.0, 2.0, 3.0 | Earlier knowledge-enhanced models |
| ERNIE 3.5 | Mid-generation; widely used in China |
| ERNIE 4.0 | Current flagship |
| ERNIE 4.0 Turbo | Optimized variant |
| ERNIE Bot (Wenxin Yiyan) | Consumer-facing chatbot application |
| ERNIE-ViLG | Text-to-image generation |
| ERNIE-Code | Code generation |
| ERNIE-Music | Music generation |
| ERNIE-Speed, ERNIE-Lite | Lightweight variants |
Specific Risk Factors:
- Baidu’s deep PRC government integration makes it among the highest-risk entities
- ERNIE Bot was one of the first PRC chatbots approved under the Interim Measures for Generative AI, meaning it passed government security and ideological review
- Primarily closed-weight, meaning behavior is fully controlled by Baidu with no independent verification
- Baidu’s search engine dominance in China means ERNIE models are trained on massive PRC-curated datasets
- Less directly distributed in the West than Qwen or DeepSeek, but integrated into products that may reach Western users
4. Zhipu AI / GLM Family
| Attribute | Detail |
|---|---|
| Parent Organization | Zhipu AI (智谱AI), spun out of Tsinghua University |
| Country of Origin | PRC (Beijing) |
| Government Ties | Originated from Tsinghua University, which has deep ties to PRC government and military research. Tsinghua is a key institution in China’s national AI strategy. Zhipu AI has received significant funding from PRC-connected investors. |
| Weights | Mixed (some open, some closed) |
| Distribution | Hugging Face, GitHub, ModelScope, Zhipu AI API (BigModel) |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| GLM-130B | 130B | Original large-scale bilingual model |
| ChatGLM-6B | 6B | Open conversational model |
| ChatGLM2-6B | 6B | Improved version |
| ChatGLM3-6B | 6B | Third generation |
| GLM-4 | Various | Current flagship (closed API) |
| GLM-4-Air, GLM-4-Flash | Various | Lightweight variants |
| GLM-4V | Various | Vision-language multimodal |
| GLM-4-Voice | Various | Voice capabilities |
| CogVLM, CogVLM2 | Various | Visual language model |
| CogAgent | Various | GUI-interaction agent |
| CogVideo, CogVideoX | Various | Video generation |
| CodeGeeX (1-4) | Various | Code generation (widely distributed as IDE plugin) |
Specific Risk Factors:
- Tsinghua University provenance means direct academic-military complex ties
- CodeGeeX is distributed as IDE plugins (VS Code, JetBrains), creating a direct vector into developer environments
- CogVLM/CogAgent models designed for GUI interaction raise concerns about agentic AI capabilities being PRC-controlled
- GLM-4 API processes data on PRC servers
- The ChatGLM series was among the earliest widely adopted Chinese open models, establishing a large derivative ecosystem
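The CodeGeeX IDE-plugin vector can be checked for directly: VS Code installs each extension as a `<publisher>.<name>-<version>` directory under `~/.vscode/extensions`, so inventorying that directory against a watchlist flags it. The extension identifier below is an assumption for illustration and should be confirmed against the actual marketplace listing before acting on any match:

```python
from pathlib import Path

# Extension identifiers to flag (publisher.name, lowercase). The CodeGeeX
# identifier is an assumption for illustration; confirm it against the
# actual marketplace listing.
WATCHLIST = {"aminer.codegeex"}

def scan_vscode_extensions(ext_dir: Path) -> list[str]:
    """Flag installed VS Code extensions whose id is on the watchlist.

    VS Code installs each extension as a directory named
    <publisher>.<name>-<version> under ~/.vscode/extensions.
    """
    hits = []
    for entry in sorted(ext_dir.iterdir()):
        if not entry.is_dir():
            continue
        ext_id = entry.name.rsplit("-", 1)[0].lower()  # strip -<version>
        if ext_id in WATCHLIST:
            hits.append(entry.name)
    return hits

# Typical invocation:
#   scan_vscode_extensions(Path.home() / ".vscode" / "extensions")
```

The same inventory approach applies to JetBrains plugin directories, though the on-disk layout differs.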
5. 01.AI / Yi Family
| Attribute | Detail |
|---|---|
| Parent Organization | 01.AI (零一万物) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Kai-Fu Lee, former president of Google China. While Lee has an international profile, 01.AI operates in Beijing under PRC law. Funded by PRC-based investors including Alibaba. Subject to all PRC compelled-cooperation laws. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, GitHub, ModelScope, Ollama |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| Yi-6B, Yi-34B | 6B-34B | Original series |
| Yi-1.5 (6B, 9B, 34B) | 6B-34B | Improved series |
| Yi-Large | Undisclosed (large) | Flagship closed model |
| Yi-Medium, Yi-Spark | Various | Mid-tier models |
| Yi-VL (6B, 34B) | 6B-34B | Vision-language |
| Yi-Coder (1.5B, 9B) | 1.5B-9B | Code generation |
| Yi-Lightning | Various | Optimized for speed |
Specific Risk Factors:
- Kai-Fu Lee’s international reputation and English-language visibility may create a false sense of security about PRC legal obligations
- Yi-34B was among the first competitive Chinese open models and has a large derivative ecosystem
- Open weights widely distributed on Western platforms
- 01.AI’s PRC headquarters and funding sources place it firmly within the PRC regulatory and intelligence framework
6. Baichuan
| Attribute | Detail |
|---|---|
| Parent Organization | Baichuan Inc. (百川智能) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Wang Xiaochuan, former CEO of Sogou (search engine). Operates under PRC law. Received investment from PRC-connected sources including Tencent and Alibaba. |
| Weights | Open (earlier models), mixed (later models) |
| Distribution | Hugging Face, GitHub, ModelScope |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| Baichuan-7B, Baichuan-13B | 7B-13B | Original series |
| Baichuan2 (7B, 13B) | 7B-13B | Second generation |
| Baichuan-53B | 53B | Larger model |
| Baichuan3 | Various | Third generation |
| Baichuan4 | Various | Current generation (primarily API) |
Specific Risk Factors:
- Open weights for earlier models widely circulated
- Sogou heritage means deep PRC internet ecosystem ties
- Backed by major PRC tech conglomerates (Tencent, Alibaba)
- Subject to PRC generative AI regulations and ideological alignment requirements
7. InternLM (Shanghai AI Lab)
| Attribute | Detail |
|---|---|
| Parent Organization | Shanghai Artificial Intelligence Laboratory (上海人工智能实验室) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Shanghai AI Lab is a government-established and government-funded research institution. It was created as part of China’s national AI strategy. Its leadership includes senior PRC academic and government-connected figures. It collaborates with multiple PRC universities with military research ties (Tsinghua, SJTU, etc.). This is effectively a state laboratory. |
| Weights | Open (Apache 2.0) |
| Distribution | Hugging Face, GitHub, ModelScope, OpenXLab |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| InternLM (7B, 20B) | 7B-20B | Original series |
| InternLM2 (1.8B-20B) | 1.8B-20B | Second generation |
| InternLM2.5 (7B) | 7B | Current open model |
| InternLM-XComposer (1, 2, 2.5) | Various | Vision-language composition |
| InternLM-Math | Various | Mathematical reasoning |
| InternVL (1.0, 1.5, 2.0, 2.5) | Various | Vision-language (leading open VLM) |
| InternLM2-Chat | Various | Conversational variants |
Specific Risk Factors:
- This is the most directly government-linked model family on this list — Shanghai AI Lab is a state institution
- InternVL is one of the leading open vision-language models and is widely used as a base for fine-tunes
- Open weights distributed internationally through Hugging Face and GitHub
- InternLM-XComposer models have advanced document and image understanding capabilities
- Extensive collaborative network with PRC military-linked universities
8. XVERSE
| Attribute | Detail |
|---|---|
| Parent Organization | XVERSE Technology (元象科技) |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Founded by former Tencent employees. Operates under PRC law. Less prominent government ties than some others but fully subject to PRC legal framework. |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| XVERSE-7B | 7B | Base model |
| XVERSE-13B | 13B | Mid-size model |
| XVERSE-65B | 65B | Large model |
| XVERSE-MoE-A4.2B | ~25.8B total (4.2B active) | Mixture of Experts |
Specific Risk Factors:
- Tencent alumni connections
- Open weights on international platforms
- Lower profile than Qwen or DeepSeek means less scrutiny, increasing the risk that provenance is overlooked
9. Moonshot AI / Kimi
| Attribute | Detail |
|---|---|
| Parent Organization | Moonshot AI (月之暗面) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Yang Zhilin, a prominent PRC AI researcher. Backed by major PRC investors including Alibaba, HongShan (formerly Sequoia China), and others. Subject to PRC law. |
| Weights | Closed (API only) |
| Distribution | Kimi API, Kimi Chat application |
| Risk Level | MEDIUM-HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Moonshot-v1 (8k, 32k, 128k) | Various context lengths |
| Kimi (consumer product) | Chatbot with extremely long context window |
| Kimi k1.5 | Reasoning model |
Specific Risk Factors:
- Closed weights mean behavior cannot be independently verified
- Kimi’s extremely long context window (originally claimed 2M tokens) means users may input large volumes of sensitive text
- API processes data on PRC servers
- Lower international adoption than Qwen/DeepSeek reduces direct risk to Western users, but API integration is growing
10. MiniMax
| Attribute | Detail |
|---|---|
| Parent Organization | MiniMax (稀宇科技) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by former SenseTime employees. Backed by Tencent and other PRC investors. Subject to PRC law. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | MiniMax API, Hailuo AI (consumer product), Hugging Face (select models) |
| Risk Level | MEDIUM-HIGH |
Model Variants:
| Model | Notes |
|---|---|
| MiniMax-abab5, abab5.5, abab6, abab6.5 | Successive generations |
| MiniMax-Text-01 | 456B MoE, open weights |
| MiniMax-VL-01 | Vision-language, open weights |
| Hailuo AI Video | Video generation model |
| MiniMax-Speech | Speech synthesis |
| MiniMax-Music | Music generation |
Specific Risk Factors:
- SenseTime heritage (SenseTime is on the US Entity List)
- Hailuo AI video generation has gained significant international adoption
- MiniMax-Text-01 with open weights is a very large MoE model distributed internationally
- Multimodal capabilities (video, speech, music) expand attack surface
11. SenseTime Models
| Attribute | Detail |
|---|---|
| Parent Organization | SenseTime Group (商汤科技) |
| Country of Origin | PRC (Shanghai/Hong Kong) |
| Government Ties | SenseTime is on the US Entity List (Bureau of Industry and Security). Sanctioned by the US Treasury Department. Provides surveillance technology to PRC government and security services. Directly implicated in Xinjiang surveillance infrastructure. Has military and public security contracts. |
| Weights | Mixed |
| Distribution | SenseTime API (SenseNova), limited international distribution |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| SenseNova 5.0, 5.5 | Current flagship LLM |
| SenseChat | Conversational AI |
| SenseNova Raccoon (日日新) | Large language model series |
| SenseTime image/video generation models | Various multimodal |
Specific Risk Factors:
- US Entity List designation makes any use potentially subject to export control violations
- Direct involvement in PRC surveillance and human rights abuses
- Military and public security contracts
- Despite sanctions, technology may circulate through derivatives or unlabeled integrations
- Any integration with SenseTime technology may create sanctions compliance liability
12. iFlytek / Spark
| Attribute | Detail |
|---|---|
| Parent Organization | iFlytek (科大讯飞) |
| Country of Origin | PRC (Hefei, Anhui) |
| Government Ties | iFlytek is on the US Entity List. Provides voice recognition and AI technology to PRC government, military, and public security. Implicated in Xinjiang surveillance. Deep ties to the University of Science and Technology of China (USTC), a key PRC defense research institution. |
| Weights | Closed |
| Distribution | iFlytek API, primarily domestic PRC market |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Spark (Xinghuo) v1, v1.5, v2, v3, v3.5, v4 | Successive LLM generations |
| Spark-Lite, Spark-Pro, Spark-Max, Spark-Ultra | Tiered model offerings |
| iFlytek voice/speech models | Industry-leading Chinese speech recognition |
Specific Risk Factors:
- US Entity List designation
- Core competency in voice/speech AI creates unique surveillance risk
- Deep PRC military and intelligence community ties through USTC
- Speech recognition technology deployed in PRC public security and surveillance systems
- Domestic-focused but technologies may be embedded in products that reach international markets
13. Huawei / PanGu
| Attribute | Detail |
|---|---|
| Parent Organization | Huawei Technologies |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Huawei is on the US Entity List and subject to extensive sanctions. Widely assessed by Western intelligence agencies as having close ties to PRC military and intelligence services. Subject to the most extensive US technology restrictions of any Chinese company. |
| Weights | Primarily closed; some research releases |
| Distribution | Huawei Cloud, limited international academic distribution |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| PanGu-Alpha (2.6B-200B) | Early large-scale LLM |
| PanGu-Sigma | Trillion-parameter MoE |
| PanGu-Coder | Code generation |
| PanGu-Weather | Weather prediction (Nature-published) |
| PanGu-Drug | Drug discovery |
| Huawei Cloud Pangu LLM 3.0, 5.0 | Enterprise LLM offering |
Specific Risk Factors:
- Most heavily sanctioned Chinese technology company
- Intelligence agency assessments across Five Eyes nations have flagged Huawei as a national security risk
- Huawei develops its own AI accelerator chips (Ascend series), creating a PRC-controlled full-stack AI ecosystem
- Any use of Huawei AI models may create sanctions compliance violations
- PanGu models may be embedded in Huawei network equipment and telecommunications infrastructure
14. Tencent / Hunyuan
| Attribute | Detail |
|---|---|
| Parent Organization | Tencent Holdings |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Tencent operates WeChat, China’s dominant messaging/social platform, which is extensively monitored and censored by PRC government. Tencent has CCP party committee within corporate governance. Subject to intense PRC regulatory oversight. WeChat data is accessible to PRC security services. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | Tencent Cloud API, Hugging Face (select models), GitHub |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Hunyuan-LLM | Text generation |
| Hunyuan-Large (389B MoE) | Open-weight large MoE model |
| Hunyuan-A13B | Efficient variant |
| HunyuanDiT | Image generation (diffusion transformer) |
| HunyuanVideo | Video generation (open weights, widely adopted) |
| Hunyuan3D | 3D generation |
| Hunyuan-TurboS | Reasoning model |
Specific Risk Factors:
- Tencent’s role as operator of WeChat gives it unique data access and PRC government integration
- HunyuanVideo is one of the leading open video generation models and has been widely adopted internationally
- Hunyuan-Large open weights distributed on Hugging Face
- Tencent Cloud is a major PRC cloud provider with government contracts
- Tencent’s gaming and media empire means Hunyuan models may be embedded in entertainment products consumed globally
15. ByteDance / Doubao
| Attribute | Detail |
|---|---|
| Parent Organization | ByteDance Ltd. |
| Country of Origin | PRC (Beijing), with significant international operations |
| Government Ties | ByteDance operates TikTok, subject to ongoing US national security concerns and potential ban/divestiture. ByteDance has a CCP party committee, and a “golden share” arrangement gives the PRC government a board seat in a key domestic entity. Subject to intense PRC regulatory oversight. ByteDance editors were reportedly directed by the PRC government to suppress certain content. |
| Weights | Primarily closed; select open releases |
| Distribution | Volcengine (Volcano Engine) API, limited Hugging Face presence |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Doubao (豆包) / Skylark | Primary LLM family |
| Doubao-pro, Doubao-lite | Tiered offerings |
| Doubao-Vision | Multimodal |
| ByteDance SDXL variants | Image generation |
| Emu (image generation research) | Research model |
Specific Risk Factors:
- The TikTok controversy demonstrates PRC government influence over ByteDance
- CCP “golden share” arrangement provides direct government governance participation
- Volcengine cloud platform processes data on PRC infrastructure
- ByteDance’s massive international user base through TikTok creates potential for AI model deployment at scale to Western users
- Less open-weight distribution than some competitors, but API access routes data to PRC
16. StepFun / Step Models
| Attribute | Detail |
|---|---|
| Parent Organization | StepFun (阶跃星辰) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by Jiang Daxin, former Microsoft Research Asia executive. Backed by major PRC investors. Subject to PRC law. |
| Weights | Mixed (some open) |
| Distribution | Step API, Hugging Face (select models) |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Step-1 | 200B+ parameter model |
| Step-1V | Vision-language |
| Step-2 | Next generation |
| Step-1.5V | Improved vision model |
| GOT-OCR | Open-source OCR model (widely adopted) |
Specific Risk Factors:
- GOT-OCR (General OCR Theory) model has been very widely adopted for document processing — users may not realize PRC provenance
- Microsoft Research Asia alumni connections
- Growing international distribution
17. Other Chinese-Origin Models
Kuaishou / Kolors & KLING
| Attribute | Detail |
|---|---|
| Parent Organization | Kuaishou Technology (快手) |
| Country | PRC |
| Models | Kolors (image generation), KLING (video generation) |
| Weights | Mixed |
| Distribution | Hugging Face, API |
| Risk Level | HIGH |
| Notes | Kuaishou is a major PRC short-video platform. KLING video generation has gained significant international adoption. The Kolors image model is distributed with open weights on Hugging Face. |
Zhejiang Lab / various models
| Attribute | Detail |
|---|---|
| Parent Organization | Zhejiang Lab (之江实验室) |
| Country | PRC |
| Government Ties | Government-established research laboratory (Zhejiang provincial government) |
| Risk Level | HIGH |
| Notes | State laboratory producing various AI research models |
ModelBest / MiniCPM
| Attribute | Detail |
|---|---|
| Parent Organization | ModelBest (面壁智能), associated with Tsinghua University |
| Country | PRC |
| Models | MiniCPM (2B-4B), MiniCPM-V (vision), MiniCPM-o (omni), OmniLMM |
| Weights | Open |
| Distribution | Hugging Face, GitHub |
| Risk Level | HIGH |
| Notes | Small, efficient models specifically designed for on-device deployment. Tsinghua University affiliation. MiniCPM-V is widely used for mobile vision applications. On-device deployment means the models run locally, but they still originate from a PRC institution. |
Deepin / deepin-ai
| Attribute | Detail |
|---|---|
| Parent Organization | Uniontech (统信软件), PRC Linux distribution maker |
| Country | PRC |
| Models | Various AI integrations in Deepin Linux |
| Risk Level | MEDIUM |
vivo / BlueLM
| Attribute | Detail |
|---|---|
| Parent Organization | vivo Mobile Communication |
| Country | PRC |
| Models | BlueLM-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
IDEA-CCNL / Fengshenbang
| Attribute | Detail |
|---|---|
| Parent Organization | IDEA Research (International Digital Economy Academy), Shenzhen |
| Country | PRC |
| Models | Ziya-LLaMA, Fengshenbang series, Taiyi (image) |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Government-supported research institute in Shenzhen. Led by Harry Shum, former EVP of Microsoft. |
Colossal-AI / various
| Attribute | Detail |
|---|---|
| Parent Organization | HPC-AI Tech, associated with National University of Singapore but with significant PRC founder involvement |
| Country | Singapore / PRC mixed |
| Models | ColossalChat, open-source training frameworks |
| Risk Level | MEDIUM |
Skywork
| Attribute | Detail |
|---|---|
| Parent Organization | Kunlun Tech (昆仑万维) |
| Country | PRC (Beijing) |
| Models | Skywork-13B, Skywork-MoE, Skywork-Math, Skywork-Reward |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Skywork-Reward model widely used for RLHF reward modeling — could influence alignment of other models. |
MAP-Neo
| Attribute | Detail |
|---|---|
| Parent Organization | M-A-P consortium (multi-institutional, PRC-heavy) |
| Country | PRC-led |
| Models | MAP-Neo-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | MEDIUM-HIGH |
Orion / OrionStar
| Attribute | Detail |
|---|---|
| Parent Organization | OrionStar (猎户星空) — Cheetah Mobile subsidiary |
| Country | PRC |
| Models | Orion-14B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
TeleChat / China Telecom
| Attribute | Detail |
|---|---|
| Parent Organization | China Telecom (中国电信) |
| Country | PRC |
| Government Ties | State-owned enterprise |
| Models | TeleChat (7B, 12B, 52B, 115B) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | Directly state-owned. Model used in telecommunications infrastructure. |
AquilaChat / BAAI
| Attribute | Detail |
|---|---|
| Parent Organization | Beijing Academy of Artificial Intelligence (BAAI / 北京智源人工智能研究院) |
| Country | PRC |
| Government Ties | Government-established and government-funded research institution (Beijing municipal government). Led by major PRC AI strategy figures. |
| Models | Aquila (7B, 34B), AquilaChat, FlagAlpha series, BGE (embedding models), EVA (vision) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | BAAI is a key institution in PRC national AI strategy. BGE embedding models are among the most widely used embedding models globally — they are embedded in countless RAG systems, often without users understanding PRC provenance. BAAI also maintains FlagEval (benchmark) and FlagOpen (open-source platform). |
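Because BGE models typically enter a RAG stack as bare identifier strings passed to an embedding loader, a source-tree scan for watchlisted organization prefixes is a practical way to surface these silent dependencies. A sketch, with a deliberately partial prefix list:

```python
import re
from pathlib import Path

# Organization prefixes as they appear inside quoted model-identifier
# strings. Deliberately partial; extend with the orgs in this catalog.
ORG_PREFIXES = ("BAAI/", "Qwen/", "deepseek-ai/", "THUDM/", "internlm/")

PATTERN = re.compile(
    r"[\"'](?:" + "|".join(re.escape(p) for p in ORG_PREFIXES) + r")[\w.\-/]+[\"']"
)

def scan_tree(root: Path, suffixes=(".py", ".ts", ".yaml", ".yml", ".json")):
    """Yield (path, identifier) for watchlisted model IDs found in sources."""
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for match in PATTERN.finditer(text):
            yield path, match.group(0).strip("\"'")
```

This catches only literal strings in source and config files; identifiers assembled at runtime or baked into vendored weights require deeper inspection.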
Megvii / YOLO-variants
| Attribute | Detail |
|---|---|
| Parent Organization | Megvii (旷视科技) |
| Country | PRC |
| Government Ties | On the US Entity List. Provides facial recognition to PRC government/security. |
| Models | YOLOX, various computer vision models |
| Weights | Open (research releases) |
| Risk Level | CRITICAL / HIGH |
WPS AI / Kingsoft
| Attribute | Detail |
|---|---|
| Parent Organization | Kingsoft Office (金山办公) |
| Country | PRC |
| Models | WPS AI (integrated into WPS Office) |
| Risk Level | HIGH |
| Notes | WPS Office has significant international user base. AI features process documents on PRC infrastructure. |
Russian-Origin Model Catalog
18. Sber / GigaChat
| Attribute | Detail |
|---|
| Parent Organization | Sberbank (Сбербанк) / SberDevices |
| Country of Origin | Russian Federation |
| Government Ties | Sberbank is majority-owned by the Russian government (via the National Wealth Fund / Central Bank). Sberbank is subject to extensive Western sanctions following Russia’s invasion of Ukraine. The Russian government directly controls Sberbank’s strategy. |
| Weights | Primarily closed |
| Distribution | GigaChat API (Russia-focused), limited international |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|
| GigaChat | Consumer-facing LLM |
| GigaChat-Pro, GigaChat-Max | Tiered offerings |
| ruGPT-3, ruGPT-3.5 | Russian-language GPT models (some open) |
| mGPT | Multilingual model |
| Kandinsky (2.x, 3.x) | Image generation (some open weights on Hugging Face) |
Specific Risk Factors:
- State-owned bank — direct Russian government entity
- Subject to comprehensive Western sanctions
- Any use may create sanctions compliance violations
- Kandinsky image models have been distributed on Hugging Face with open weights, potentially obscuring Russian government provenance
- Russian intelligence services (FSB, GRU, SVR) have direct legal authority over Sberbank
19. Yandex / YaLM
| Attribute | Detail |
|---|
| Parent Organization | Yandex (now restructured; Russian operations transferred to a new entity) |
| Country of Origin | Russian Federation |
| Government Ties | Yandex has been subject to increasing Russian government pressure and restructuring. Russian operations were transferred to a consortium with Kremlin-connected ownership. Subject to Russian data and security laws. |
| Weights | Mixed (YaLM-100B was released open) |
| Distribution | Hugging Face (historical), GitHub |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|
| YaLM-100B | 100B parameter model (open weights) |
| YandexGPT (1, 2, 3, 4) | Successive LLM generations (closed) |
| Alice / Alisa | Voice assistant AI |
Specific Risk Factors:
- YaLM-100B open weights remain available on Hugging Face
- Yandex restructuring placed Russian operations under more Kremlin-aligned ownership
- YandexGPT processes data on Russian servers
- Yandex’s dominant position in Russian search means training data reflects Russian information environment
20. Other Russian-Origin Models
MTS AI
| Attribute | Detail |
|---|
| Parent Organization | MTS (Mobile TeleSystems) |
| Country | Russian Federation |
| Government Ties | Major Russian telecom, subject to Russian government oversight and Yarovaya Law |
| Risk Level | HIGH |
AIRI (AI Research Institute, Russia)
| Attribute | Detail |
|---|
| Parent Organization | AIRI |
| Country | Russian Federation |
| Models | Various research models |
| Risk Level | MEDIUM-HIGH |
Models with Unclear or Mixed Provenance
Models Requiring Additional Scrutiny
| Model/Family | Concern | Risk Level |
|---|
| Stability AI (various models) | Significant investment from PRC-linked sources; international team but complex funding structure. Now largely open-source. | MEDIUM |
| Mistral (French) | European company, but has explored partnerships with PRC entities. Mistral models are frequently fine-tuned by PRC groups. Monitor. | LOW (base), MEDIUM (PRC fine-tunes) |
| Falcon (UAE/TII) | Technology Innovation Institute is UAE government-funded. UAE is not an adversary but has complex relationships with PRC. | LOW-MEDIUM |
| Jamba (AI21, Israel) | Legitimate provenance, but note that some fine-tunes on Hugging Face may be PRC-sourced. | LOW |
| Various Singapore-based models | Singapore’s NUS, SUTD, A*STAR have significant PRC researcher populations and PRC government-funded collaborations. SEA-LION (AI Singapore) and similar warrant monitoring. | MEDIUM |
| RWKV | Architecture created by Bo Peng (PRC-born, international); RWKV Foundation registered outside PRC, but significant PRC community involvement. | MEDIUM |
Fine-Tunes and Derivatives
This section addresses the critical problem of provenance laundering — where Chinese base models are fine-tuned, renamed, and redistributed in ways that obscure their origin.
High-Risk Derivative Patterns
| Base Model | Common Derivatives / Fine-tune Patterns | Risk |
|---|
| Qwen2.5 (all sizes) | Thousands of fine-tunes on Hugging Face. Many use names that do not reference Qwen. Common patterns: “[username]/[creative-name]-7B”, merged models, GGUF quantizations. Qwen2.5 is the single most fine-tuned Chinese base model. | HIGH — users may unknowingly use Qwen-based model |
| DeepSeek-R1-Distill | Distilled into Qwen and Llama architectures. The Llama-based distills are particularly deceptive as they appear to be Meta Llama derivatives. | HIGH — double-layered provenance confusion |
| Yi-34B | Early Chinese open model with extensive fine-tune ecosystem. Many “uncensored” fine-tunes exist. | HIGH |
| ChatGLM | Numerous fine-tunes, especially in Chinese NLP community. | HIGH |
| InternVL | Vision-language fine-tunes for specific tasks (OCR, document analysis, etc.) | HIGH |
| BAAI BGE embeddings | Integrated into LangChain, LlamaIndex, and numerous RAG frameworks as default embedding model. Users of these frameworks may unknowingly rely on PRC-origin embeddings. | HIGH — extremely widespread, often invisible |
| Qwen2.5-Coder | Integrated into coding assistant tools and IDE plugins. May be relabeled. | HIGH |
| HunyuanVideo / Kolors | Fine-tunes for specific video/image generation use cases. | MEDIUM-HIGH |
How to Identify Chinese Base Models in Derivatives
Indicators that a model may be based on a Chinese base model:
- Check config.json: look for the architectures field; values such as “Qwen2ForCausalLM”, “InternLM2ForCausalLM”, “DeepseekV2ForCausalLM”, “ChatGLMForCausalLM”, and “BaichuanForCausalLM” indicate a Chinese base model family.
- Check tokenizer: Qwen models use a distinctive tiktoken-based tokenizer. DeepSeek has its own tokenizer patterns.
- Check model card: Look for references to base model, though these may be omitted.
- Check vocabulary size: Certain vocabulary sizes are characteristic of specific Chinese model families.
- Test with sensitive prompts: Ask about Tiananmen Square, Taiwan, Xinjiang — residual censorship behavior may reveal Chinese base model even after fine-tuning.
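The config.json indicator above can be checked mechanically. A minimal Python sketch, using only the architecture identifiers named in this list; a real screening tool would maintain a broader, regularly updated set:

```python
import json
from pathlib import Path

# Architecture identifiers named in this document's indicator list; not exhaustive.
PRC_ARCHITECTURES = {
    "Qwen2ForCausalLM",
    "InternLM2ForCausalLM",
    "DeepseekV2ForCausalLM",
    "ChatGLMForCausalLM",
    "BaichuanForCausalLM",
}

def flag_prc_base(config_path: str) -> list[str]:
    """Return PRC-associated architecture names found in a model's config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Hugging Face transformers configs store `architectures` as a list of strings.
    return sorted(set(config.get("architectures", [])) & PRC_ARCHITECTURES)
```

Run this against the config.json of any downloaded checkpoint. An empty result is not a clearance: merges and format conversions can change the reported architecture, which is why the tokenizer, vocabulary, and behavioral checks above remain necessary.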
Distribution Vectors
| Platform | PRC Models Present | Risk Notes |
|---|
| Hugging Face | Extensive — all major Chinese model families | Primary international distribution point. Hosts thousands of Chinese models and derivatives. Limited provenance verification. |
| ModelScope | All Chinese models | Alibaba-operated platform. PRC-based infrastructure. Primary Chinese model hub. |
| GitHub | Code, weights, training scripts | Many Chinese models distributed via GitHub repositories |
| Ollama | Qwen, DeepSeek, Yi, GLM, others | Easy one-command download makes adoption frictionless. Many users unaware of provenance. |
| OpenRouter | Multiple Chinese models via API | Aggregator providing API access to Chinese models alongside Western ones |
| Together AI | Hosts Chinese open models | US-based but hosts Chinese models for inference |
| Replicate | Various Chinese models | US-based model hosting |
| GGUF/TheBloke quantizations | Extensive | Quantized versions of Chinese models optimized for local inference, very widely downloaded |
| vLLM/TGI deployments | Various | Chinese models deployed via open inference frameworks |
| LM Studio | Various | Desktop app for running local models; includes Chinese models in model browser |
Supply Chain Integration Points
Chinese-origin models or components enter Western AI infrastructure through:
- Direct model use — downloading and running Qwen, DeepSeek, etc.
- Embedding models — BAAI BGE embeddings in RAG pipelines
- Coding assistants — CodeGeeX, Qwen2.5-Coder, DeepSeek-Coder in IDEs
- Merged/fine-tuned models — derivatives that obscure base model origin
- Frameworks and libraries — Chinese AI frameworks (PaddlePaddle, MindSpore) included in ML pipelines
- Reward models — Skywork-Reward and similar used to train/align other models
- Training data — Chinese-curated datasets used to train non-Chinese models
- Benchmarks and evaluations — Chinese-operated benchmarks (BAAI FlagEval, etc.) influencing model development priorities
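One of these entry points, frameworks and libraries, can be checked directly in a Python environment. A hedged sketch: the distribution names below are assumed to be the common PyPI package names for the frameworks mentioned above and should be verified against your own package index before relying on them.

```python
from importlib import metadata

# Assumed PyPI distribution names for the PRC-origin frameworks named above;
# verify against your package index, and extend for internal mirrors.
PRC_FRAMEWORKS = {"paddlepaddle", "paddlepaddle-gpu", "mindspore"}

def installed_prc_frameworks() -> list[str]:
    """List installed distributions matching known PRC-origin ML frameworks."""
    found = set()
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name in PRC_FRAMEWORKS:
            found.add(name)
    return sorted(found)
```

This covers only direct installs in the current environment; transitive dependencies, containers, and non-Python entry points (embeddings, reward models, training data) still require the inventory steps in the Recommendations section.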
Risk Assessment Methodology
Risk Levels Defined
| Level | Definition |
|---|
| CRITICAL | Entity is on US Entity List, directly state-owned, or directly sanctioned. Use likely creates legal/compliance violations. |
| HIGH | PRC/Russian entity subject to compelled-cooperation laws, with demonstrated government ties, and models distributed internationally. Significant national security risk. |
| MEDIUM-HIGH | PRC/Russian entity with less direct government ties but fully subject to adversary-nation legal framework. Models have some international distribution. |
| MEDIUM | Mixed provenance, indirect PRC/Russian ties, or unclear funding. Warrants monitoring and due diligence. |
| LOW | Non-adversary origin but with some connection points (e.g., PRC fine-tunes of Western base models). |
Risk Factors Weighted
- Legal jurisdiction (30%) — Is the entity subject to PRC National Intelligence Law or Russian equivalent?
- Government relationship (25%) — State-owned, Entity-Listed, government-funded, party committee?
- Distribution reach (20%) — How widely are models distributed in Western ecosystems?
- Data exposure (15%) — Do models/APIs route data to adversary-nation servers?
- Opacity (10%) — Are weights open (auditable) or closed (unverifiable)?
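Applied literally, the weights above reduce to a weighted sum. An illustrative sketch: the 0-to-1 factor scores are hypothetical inputs an analyst would assign, not values defined by this methodology.

```python
# Factor weights from the methodology above (must sum to 1.0).
WEIGHTS = {
    "legal_jurisdiction": 0.30,
    "government_relationship": 0.25,
    "distribution_reach": 0.20,
    "data_exposure": 0.15,
    "opacity": 0.10,
}

def risk_score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor scores in [0, 1]; higher is riskier."""
    if set(factors) != set(WEIGHTS):
        raise ValueError("every factor must be scored exactly once")
    return sum(WEIGHTS[name] * score for name, score in factors.items())
```

For example, an entity scoring 1.0 on legal jurisdiction and government relationship but 0.0 elsewhere scores 0.55; how score bands map onto the CRITICAL/HIGH/MEDIUM levels is an organizational policy choice, not something this sketch decides.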
Consolidated Risk Matrix
| # | Model Family | Parent Entity | Country | Risk Level | Weights | Key Risk Factor |
|---|
| 1 | SenseTime/SenseNova | SenseTime | PRC | CRITICAL | Mixed | US Entity List |
| 2 | iFlytek/Spark | iFlytek | PRC | CRITICAL | Closed | US Entity List |
| 3 | Huawei/PanGu | Huawei | PRC | CRITICAL | Mixed | US Entity List + sanctions |
| 4 | Megvii/YOLOX | Megvii | PRC | CRITICAL | Open | US Entity List |
| 5 | TeleChat | China Telecom | PRC | CRITICAL | Open | State-owned enterprise |
| 6 | BAAI/Aquila/BGE | BAAI | PRC | CRITICAL | Open | State-funded lab; BGE embeddings ubiquitous |
| 7 | Sber/GigaChat | Sberbank | Russia | CRITICAL | Mixed | State-owned bank; sanctioned |
| 8 | InternLM/InternVL | Shanghai AI Lab | PRC | HIGH | Open | Government-established lab |
| 9 | Qwen | Alibaba | PRC | HIGH | Open | Most-adopted PRC model family in West |
| 10 | DeepSeek | DeepSeek/High-Flyer | PRC | HIGH | Open | Massive adoption post-R1; data to PRC servers |
| 11 | ERNIE | Baidu | PRC | HIGH | Closed | Deep government integration |
| 12 | GLM/ChatGLM | Zhipu AI | PRC | HIGH | Mixed | Tsinghua origins; CodeGeeX in IDEs |
| 13 | Yi | 01.AI | PRC | HIGH | Open | Wide Western distribution; PRC legal jurisdiction |
| 14 | Baichuan | Baichuan Inc. | PRC | HIGH | Mixed | PRC investor network |
| 15 | XVERSE | XVERSE Tech | PRC | HIGH | Open | PRC jurisdiction; Tencent alumni |
| 16 | Hunyuan | Tencent | PRC | HIGH | Mixed | WeChat operator; HunyuanVideo widely adopted |
| 17 | Doubao/Skylark | ByteDance | PRC | HIGH | Mixed | TikTok parent; CCP golden share |
| 18 | Step | StepFun | PRC | HIGH | Mixed | GOT-OCR widely adopted; PRC jurisdiction |
| 19 | KLING/Kolors | Kuaishou | PRC | HIGH | Mixed | Video generation widely adopted |
| 20 | MiniCPM | ModelBest/Tsinghua | PRC | HIGH | Open | On-device deployment focus; Tsinghua ties |
| 21 | Skywork | Kunlun Tech | PRC | HIGH | Open | Reward model used in RLHF pipelines |
| 22 | Orion | OrionStar | PRC | HIGH | Open | PRC jurisdiction |
| 23 | BlueLM | vivo | PRC | HIGH | Open | PRC jurisdiction |
| 24 | Fengshenbang/Ziya | IDEA Research | PRC | HIGH | Open | Government-supported institute |
| 25 | Moonshot/Kimi | Moonshot AI | PRC | MEDIUM-HIGH | Closed | PRC API; growing international use |
| 26 | MiniMax | MiniMax | PRC | MEDIUM-HIGH | Mixed | SenseTime alumni; Hailuo AI video |
| 27 | YaLM/YandexGPT | Yandex | Russia | HIGH | Mixed | Kremlin-aligned restructuring |
| 28 | MTS AI models | MTS | Russia | HIGH | Various | Russian telecom; Yarovaya Law |
| 29 | RWKV | RWKV Foundation | Mixed | MEDIUM | Open | PRC community involvement; non-PRC entity |
| 30 | Singapore-origin models | Various | Singapore | MEDIUM | Various | PRC researcher/funding connections |
Recommendations
- Inventory all AI models in use across the organization, including embedded components (embeddings, reward models, tokenizers)
- Check for BAAI BGE embeddings — these are the single most widely adopted PRC-origin AI component in Western RAG/search systems, often included as defaults in LangChain, LlamaIndex, and similar frameworks
- Audit coding assistants and IDE plugins — check for CodeGeeX, Qwen-Coder, or DeepSeek-Coder integrations
- Block API access to PRC-hosted model endpoints (DeepSeek API, Moonshot API, Zhipu API, etc.) from all networks handling sensitive information
- Review Ollama and LM Studio installations — users may have downloaded Chinese models for local inference without organizational awareness
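The inventory and BGE checks above can be partially automated with a filesystem sweep. A minimal sketch under stated assumptions: it flags only the architecture identifiers listed earlier in this document and only literal "BAAI/bge" strings in Python sources, so renamed or indirect references will be missed.

```python
import json
from pathlib import Path

# Architecture identifiers from this document's indicator list; extend as needed.
PRC_ARCHS = {
    "Qwen2ForCausalLM", "InternLM2ForCausalLM", "DeepseekV2ForCausalLM",
    "ChatGLMForCausalLM", "BaichuanForCausalLM",
}

def sweep(root: str) -> dict[str, list[str]]:
    """Walk a directory tree; report PRC-architecture configs and BGE references."""
    findings: dict[str, list[str]] = {"prc_model_configs": [], "bge_references": []}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.name == "config.json":
            try:
                config = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, OSError):
                continue
            archs = config.get("architectures", []) if isinstance(config, dict) else []
            if set(archs) & PRC_ARCHS:
                findings["prc_model_configs"].append(str(path))
        elif path.suffix == ".py":
            if "BAAI/bge" in path.read_text(encoding="utf-8", errors="ignore"):
                findings["bge_references"].append(str(path))
    return findings
```

Point it at model cache directories (e.g., Hugging Face or Ollama storage paths) and source checkouts; treat hits as leads for the manual provenance verification described under Policy Recommendations, not as a complete inventory.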
Policy Recommendations
- Establish a model provenance verification process — before any AI model is deployed, verify its origin by checking architecture identifiers, tokenizer patterns, and model card metadata
- Maintain an approved model list — whitelist specific model families and versions that have passed security review
- Treat open-weight PRC models as higher risk than closed PRC models in some threat scenarios: open weights enable auditing and fully local inference with no PRC-side logging, but they also propagate through derivative ecosystems, can carry trojans that current techniques cannot reliably detect, and create strategic dependency regardless of where inference runs
- Monitor fine-tune ecosystems — track popular community fine-tunes that may be based on Chinese base models
- Engage with Hugging Face and other platforms on provenance labeling and transparency requirements
- Treat any model processing classified or sensitive information as requiring provenance verification — no exceptions for “convenience” or “it works well”
What “Open Weights” Does and Does Not Mitigate
| Risk | Open Weights Mitigates? | Notes |
|---|
| Backdoors / trojans in weights | Partially | Weights can be scanned but neural network trojans are extremely difficult to detect with current techniques |
| Censorship / bias patterns | Yes | Can be tested and measured |
| Data exfiltration via API | Yes | Local inference eliminates network data flow to PRC |
| Training data poisoning | No | Cannot determine what was in training data from weights alone |
| Influence over model ecosystem | No | Widely adopted base models shape the entire derivative ecosystem |
| Supply chain dependency | No | Reliance on PRC base models creates strategic dependency regardless of weight openness |
| Steganographic communication | No | Models could theoretically embed information in outputs that is not detectable without the key |
This catalog should be updated quarterly; the Chinese and Russian AI model ecosystems evolve rapidly, and new model families, variants, and distribution channels emerge continuously.
Document ends.