Table of Contents
- Executive Summary
- Legal and Regulatory Framework
- Chinese-Origin Model Catalog
- Russian-Origin Model Catalog
- Models with Unclear or Mixed Provenance
- Fine-Tunes and Derivatives
- Distribution Vectors
- Risk Assessment Methodology
- Consolidated Risk Matrix
- Recommendations
Executive Summary
This document catalogs AI models originating from or substantially influenced by adversary nations, primarily the People’s Republic of China (PRC) and the Russian Federation. It is intended to support risk-informed decision-making within defense, intelligence, and critical-infrastructure contexts.
Key finding: As of early 2026, at least 17 major Chinese organizations are actively distributing large language models (LLMs) and multimodal models through international platforms such as Hugging Face, GitHub, and Ollama. Many of these models achieve frontier-level performance and are being integrated into Western developer toolchains, enterprise products, and open-source projects, often without end-users understanding their provenance.
Critical legal context: Every entity on this list operating within the PRC is subject to:
- The National Intelligence Law of 2017 (Article 7: all organizations and citizens must support, assist, and cooperate with national intelligence work)
- The Data Security Law of 2021
- The Cybersecurity Law of 2017
- The Counter-Espionage Law (amended 2023)
- The Personal Information Protection Law of 2021
These laws create a compelled-cooperation framework that has no parallel in Western democracies. Even if a given company has no current intent to serve state intelligence interests, the legal architecture permits the Chinese government to compel cooperation at any time, with no judicial review or public disclosure.
Legal and Regulatory Framework
PRC Laws Relevant to AI Model Risk
| Law | Year | Key Provision | Risk Implication |
|---|---|---|---|
| National Intelligence Law | 2017 | Art. 7: Organizations and citizens shall support and cooperate with national intelligence work | Any PRC-based AI company can be compelled to embed capabilities, exfiltrate data, or modify model behavior on behalf of state intelligence services |
| Cybersecurity Law | 2017 | Data localization, security review requirements | Training data, user interaction logs, and telemetry from PRC models may be accessible to government |
| Data Security Law | 2021 | Government access to data classified as “important” or “core” | Model weights, training data, and deployment telemetry could be classified as state-relevant data |
| Personal Information Protection Law | 2021 | Cross-border data transfer restrictions, government access provisions | User data flowing to PRC infrastructure is subject to government access |
| Counter-Espionage Law (amended) | 2023 | Broadened definition of espionage; expanded state access to digital systems | Provides additional legal basis for compelling cooperation from AI companies |
| Interim Measures for Generative AI | 2023 | Models must align with “core socialist values”; providers must conduct security assessments | Models serving PRC domestic users are explicitly subject to ideological alignment requirements; this demonstrates the government’s willingness and capability to mandate model behavior modification |
| PRC Civil-Military Fusion Strategy | Ongoing | National strategy to eliminate barriers between civilian and military technology | All PRC AI capabilities are, by policy, available for military and intelligence application |
Russian Federation Laws
| Law | Key Provision | Risk Implication |
|---|---|---|
| Yarovaya Law (2016) | Telecom data retention and FSB access | Infrastructure hosting Russian models subject to surveillance |
| Sovereign Internet Law (2019) | State control over internet infrastructure | Russian AI services can be co-opted for state purposes |
| Data Localization Law (2015) | Personal data of Russian citizens must be stored in Russia | Data processed by Russian AI services held under state-accessible infrastructure |
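The legal factors tabulated above drive the risk levels assigned throughout the catalogs that follow. As a rough illustration, a factor-based scoring pass over a model's attributes can approximate those labels. The sketch below is illustrative only: its weights and thresholds are assumptions for demonstration, not this document's formal methodology (see Risk Assessment Methodology).

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    country: str                       # "PRC", "RU", or other
    sanctioned: bool = False           # US Entity List / Treasury designation
    state_owned: bool = False          # state-owned enterprise or state lab
    open_weights: bool = False         # weights freely redistributed
    foreign_api_default: bool = False  # API routes user data to PRC/RU servers

def risk_level(p: ModelProfile) -> str:
    # Sanctions designations dominate every other factor.
    if p.sanctioned:
        return "CRITICAL/HIGH"
    score = 0
    if p.country in ("PRC", "RU"):
        score += 2  # compelled-cooperation legal framework applies
    if p.state_owned:
        score += 2
    if p.open_weights:
        score += 2  # wide, hard-to-track derivative distribution
    if p.foreign_api_default:
        score += 1
    if score >= 4:
        return "HIGH"
    if score == 3:
        return "MEDIUM-HIGH"
    return "MEDIUM"  # floor of this sketch, not a clean bill of health

# Qwen-like profile: PRC origin, open weights
print(risk_level(ModelProfile(country="PRC", open_weights=True)))  # HIGH
```

With these assumed weights, a sanctioned entity (SenseTime, iFlytek, Huawei, Sberbank) always scores CRITICAL/HIGH, an open-weight PRC model lands at HIGH, and a closed PRC API lands at MEDIUM-HIGH, matching the pattern of labels used below.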
Chinese-Origin Model Catalog
1. Alibaba Cloud / Qwen Family
| Attribute | Detail |
|---|---|
| Parent Organization | Alibaba Group / Alibaba Cloud (Tongyi Lab) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | Alibaba is subject to intense CCP regulatory oversight. Jack Ma’s public rebuke by regulators in 2020 demonstrated the party’s control over the company. Alibaba Cloud provides cloud services to PRC government entities and military-linked organizations. Alibaba has a CCP party committee embedded within corporate governance. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, ModelScope, GitHub, Ollama, integrated into numerous third-party products |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Modality | Notes |
|---|---|---|---|
| Qwen-7B, Qwen-14B, Qwen-72B | 7B-72B | Text | Original Qwen series |
| Qwen-1.5 (0.5B to 110B) | 0.5B-110B | Text | Improved series with MoE variant |
| Qwen2 (0.5B to 72B) | 0.5B-72B | Text | Major architecture update |
| Qwen2.5 (0.5B to 72B) | 0.5B-72B | Text | Current flagship text series |
| Qwen2.5-Coder (1.5B to 32B) | 1.5B-32B | Code | Specialized code generation |
| Qwen2.5-Math | Various | Math | Mathematical reasoning |
| QwQ-32B | 32B | Text (reasoning) | Reasoning-focused model (chain-of-thought) |
| Qwen-VL, Qwen2-VL, Qwen2.5-VL | Various | Vision-Language | Multimodal image understanding |
| Qwen-Audio, Qwen2-Audio | Various | Audio-Language | Audio understanding and generation |
| Qwen-Agent | Various | Agentic | Tool-use and agent framework |
| Qwen2.5-Turbo | Various | Text | Optimized for speed/efficiency |
| Qwen-Long | Various | Text | Extended context window |
Specific Risk Factors:
- Qwen is one of the most widely adopted Chinese model families in the West, embedded in hundreds of derivative products
- Open weights enable integration without attribution, making provenance tracking extremely difficult
- Alibaba Cloud infrastructure is used for PRC government and military-adjacent workloads
- Qwen models have been observed to contain CCP-aligned content filtering and censorship behavior (e.g., refusing to discuss Tiananmen Square, Taiwan independence, Xinjiang)
- Massive derivative ecosystem on Hugging Face (thousands of fine-tunes) obscures the Chinese base model origin
- Qwen2.5-Coder is being integrated into coding assistants, creating potential for supply-chain influence over software development
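The derivative-ecosystem problem noted above is partially tractable: Hugging Face model cards begin with a YAML front-matter block whose optional `base_model` field records the parent model, so walking that field can surface a Chinese base model underneath a neutrally named fine-tune. A minimal sketch, in which the organization watchlist is partial and the sample card is invented for illustration:

```python
import re

def base_models(readme_text: str) -> list[str]:
    """Extract `base_model` entries from a model card's YAML front matter."""
    m = re.match(r"---\n(.*?)\n---", readme_text, re.DOTALL)
    if not m:
        return []
    header = m.group(1)
    # Single-value form:  base_model: Qwen/Qwen2.5-7B
    single = re.findall(r"^base_model:\s*(\S+)\s*$", header, re.MULTILINE)
    # List form:
    #   base_model:
    #     - Qwen/Qwen2.5-7B
    block = re.search(r"^base_model:\s*\n((?:\s+-\s+\S+\n?)+)", header, re.MULTILINE)
    listed = re.findall(r"-\s+(\S+)", block.group(1)) if block else []
    return single + listed

# Partial watchlist of Hugging Face organization namespaces (an assumption;
# extend with the organizations cataloged in this document).
PRC_ORGS = {"Qwen", "deepseek-ai", "THUDM", "01-ai", "baichuan-inc",
            "internlm", "BAAI", "openbmb"}

def flags_prc_base(readme_text: str) -> list[str]:
    """Return base-model IDs whose org namespace is on the watchlist."""
    return [b for b in base_models(readme_text) if b.split("/")[0] in PRC_ORGS]

card = """---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
---
# Some neutrally named fine-tune
"""
print(flags_prc_base(card))  # ['Qwen/Qwen2.5-7B']
```

Note that the field is self-reported and frequently omitted, so an empty result is not evidence of clean provenance; it only catches cooperative labeling.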
2. DeepSeek
| Attribute | Detail |
|---|---|
| Parent Organization | DeepSeek (深度求索), subsidiary/affiliate of High-Flyer Capital Management (幻方量化) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | High-Flyer is a major Chinese quantitative hedge fund. While not a direct state entity, it operates under PRC law and regulatory oversight. DeepSeek’s rapid capability gains and apparent access to large GPU clusters despite US export controls have raised questions about state support. DeepSeek is subject to all PRC compelled-cooperation laws. |
| Weights | Open (MIT License for most variants) |
| Distribution | Hugging Face, GitHub, Ollama, DeepSeek API, ModelScope, widely mirrored |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Modality | Notes |
|---|---|---|---|
| DeepSeek-LLM (7B, 67B) | 7B-67B | Text | Original series |
| DeepSeek-V2 | 236B (21B active, MoE) | Text | Mixture-of-Experts architecture |
| DeepSeek-V2.5 | 236B MoE | Text | Merged chat and code capabilities |
| DeepSeek-V3 | 671B (37B active, MoE) | Text | Claimed training cost efficiency breakthrough |
| DeepSeek-R1 | 671B MoE | Text (reasoning) | Reasoning model, competitive with OpenAI o1 |
| DeepSeek-R1-Distill (various) | 1.5B-70B | Text (reasoning) | Distilled reasoning models based on Qwen and Llama |
| DeepSeek-Coder (1.3B-33B) | 1.3B-33B | Code | Code generation and understanding |
| DeepSeek-Coder-V2 | 236B MoE | Code | Advanced code model |
| DeepSeek-Math (7B) | 7B | Math | Mathematical reasoning |
| DeepSeek-VL, DeepSeek-VL2 | Various | Vision-Language | Multimodal |
| DeepSeek-Prover | Various | Math/Proof | Theorem proving |
| Janus-Pro | Various | Vision-Language | Unified multimodal model |
Specific Risk Factors:
- DeepSeek-R1 received enormous global media attention in January 2025, driving massive adoption
- The R1-Distill variants are based on Qwen2.5 and Llama 3 architectures, creating nested provenance concerns
- DeepSeek’s claimed training cost efficiency ($5.6M for V3) has been questioned; possible undisclosed state subsidies or access to restricted compute
- DeepSeek API sends data to PRC servers by default
- Multiple governments (Italy, Australia, South Korea, Taiwan, US federal agencies) have restricted or investigated DeepSeek
- DeepSeek models exhibit strong CCP-aligned censorship patterns
- The connection to High-Flyer Capital raises questions about financial data collection interests
- DeepSeek-R1’s open weights have been rapidly integrated into Western AI infrastructure and products
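Because several of the APIs cataloged here are OpenAI-compatible, a misconfigured or deliberately pointed `base_url` can silently route prompts to PRC infrastructure. Auditing configured endpoints against a host watchlist is a cheap control. In the sketch below, `api.deepseek.com` is DeepSeek's documented endpoint; the remaining hosts are best-effort assumptions that should be verified against each provider's current documentation before being relied on:

```python
from urllib.parse import urlparse

# Partial watchlist of API hosts operated on PRC infrastructure. Entries
# beyond api.deepseek.com are assumptions; verify against each provider's
# current documentation.
FLAGGED_HOSTS = {
    "api.deepseek.com",
    "dashscope.aliyuncs.com",  # Alibaba / Qwen
    "open.bigmodel.cn",        # Zhipu / GLM
    "api.moonshot.cn",         # Moonshot / Kimi
}

def audit_base_url(base_url: str) -> bool:
    """Return True if the configured endpoint points at a flagged host."""
    host = (urlparse(base_url).hostname or "").lower()
    return host in FLAGGED_HOSTS or any(
        host.endswith("." + h) for h in FLAGGED_HOSTS
    )

print(audit_base_url("https://api.deepseek.com/v1"))  # True
print(audit_base_url("https://api.openai.com/v1"))    # False
```

A scan like this only catches explicit configuration; locally hosted open weights (the larger exposure for DeepSeek-R1) never touch these endpoints at all.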
3. Baidu / ERNIE Family
| Attribute | Detail |
|---|---|
| Parent Organization | Baidu, Inc. |
| Country of Origin | PRC (Beijing) |
| Government Ties | Baidu is deeply integrated with PRC government initiatives. Baidu provides AI services to Chinese government agencies, military, and public security. Baidu CEO Robin Li serves on the Chinese People’s Political Consultative Conference (CPPCC). Baidu Apollo provides autonomous driving technology with PRC government collaboration. Baidu has PRC government AI platform contracts. |
| Weights | Primarily closed (API access); some older variants partially open |
| Distribution | Baidu API (ERNIE Bot / Wenxin Yiyan), limited international distribution |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| ERNIE 1.0, 2.0, 3.0 | Earlier knowledge-enhanced models |
| ERNIE 3.5 | Mid-generation; widely used in China |
| ERNIE 4.0 | Current flagship |
| ERNIE 4.0 Turbo | Optimized variant |
| ERNIE Bot (Wenxin Yiyan) | Consumer-facing chatbot application |
| ERNIE-ViLG | Text-to-image generation |
| ERNIE-Code | Code generation |
| ERNIE-Music | Music generation |
| ERNIE-Speed, ERNIE-Lite | Lightweight variants |
Specific Risk Factors:
- Baidu’s deep PRC government integration makes it among the highest-risk entities
- ERNIE Bot was one of the first PRC chatbots approved under the Interim Measures for Generative AI, meaning it passed government security and ideological review
- Primarily closed-weight, meaning behavior is fully controlled by Baidu with no independent verification
- Baidu’s search engine dominance in China means ERNIE models are trained on massive PRC-curated datasets
- Less directly distributed in the West than Qwen or DeepSeek, but integrated into products that may reach Western users
4. Zhipu AI / GLM Family
| Attribute | Detail |
|---|---|
| Parent Organization | Zhipu AI (智谱AI), spun out of Tsinghua University |
| Country of Origin | PRC (Beijing) |
| Government Ties | Originated from Tsinghua University, which has deep ties to PRC government and military research. Tsinghua is a key institution in China’s national AI strategy. Zhipu AI has received significant funding from PRC-connected investors. |
| Weights | Mixed (some open, some closed) |
| Distribution | Hugging Face, GitHub, ModelScope, Zhipu AI API (BigModel) |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| GLM-130B | 130B | Original large-scale bilingual model |
| ChatGLM-6B | 6B | Open conversational model |
| ChatGLM2-6B | 6B | Improved version |
| ChatGLM3-6B | 6B | Third generation |
| GLM-4 | Various | Current flagship (closed API) |
| GLM-4-Air, GLM-4-Flash | Various | Lightweight variants |
| GLM-4V | Various | Vision-language multimodal |
| GLM-4-Voice | Various | Voice capabilities |
| CogVLM, CogVLM2 | Various | Visual language model |
| CogAgent | Various | GUI-interaction agent |
| CogVideo, CogVideoX | Various | Video generation |
| CodeGeeX (1-4) | Various | Code generation (widely distributed as IDE plugin) |
Specific Risk Factors:
- Tsinghua University provenance means direct academic-military complex ties
- CodeGeeX is distributed as IDE plugins (VS Code, JetBrains), creating a direct vector into developer environments
- CogVLM/CogAgent models designed for GUI interaction raise concerns about agentic AI capabilities being PRC-controlled
- GLM-4 API processes data on PRC servers
- The ChatGLM series was among the earliest widely adopted Chinese open models, establishing a large derivative ecosystem
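The CodeGeeX IDE-plugin vector can be checked for directly: VS Code installs each extension as a `<publisher>.<name>-<version>` directory under `~/.vscode/extensions`, so inventorying that directory against a watchlist flags it. The extension identifier below is an assumption for illustration and should be confirmed against the actual marketplace listing before acting on any match:

```python
from pathlib import Path

# Extension identifiers to flag (publisher.name, lowercase). The CodeGeeX
# identifier is an assumption for illustration; confirm it against the
# actual marketplace listing.
WATCHLIST = {"aminer.codegeex"}

def scan_vscode_extensions(ext_dir: Path) -> list[str]:
    """Flag installed VS Code extensions whose id is on the watchlist.

    VS Code installs each extension as a directory named
    <publisher>.<name>-<version> under ~/.vscode/extensions.
    """
    hits = []
    for entry in sorted(ext_dir.iterdir()):
        if not entry.is_dir():
            continue
        ext_id = entry.name.rsplit("-", 1)[0].lower()  # strip -<version>
        if ext_id in WATCHLIST:
            hits.append(entry.name)
    return hits

# Typical invocation:
#   scan_vscode_extensions(Path.home() / ".vscode" / "extensions")
```

The same inventory approach applies to JetBrains plugin directories, though the on-disk layout differs.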
5. 01.AI / Yi Family
| Attribute | Detail |
|---|---|
| Parent Organization | 01.AI (零一万物) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Kai-Fu Lee, former president of Google China. While Lee has an international profile, 01.AI operates in Beijing under PRC law. Funded by PRC-based investors including Alibaba. Subject to all PRC compelled-cooperation laws. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, GitHub, ModelScope, Ollama |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| Yi-6B, Yi-34B | 6B-34B | Original series |
| Yi-1.5 (6B, 9B, 34B) | 6B-34B | Improved series |
| Yi-Large | Undisclosed (large) | Flagship closed model |
| Yi-Medium, Yi-Spark | Various | Mid-tier models |
| Yi-VL (6B, 34B) | 6B-34B | Vision-language |
| Yi-Coder (1.5B, 9B) | 1.5B-9B | Code generation |
| Yi-Lightning | Various | Optimized for speed |
Specific Risk Factors:
- Kai-Fu Lee’s international reputation and English-language visibility may create a false sense of security about PRC legal obligations
- Yi-34B was among the first competitive Chinese open models and has a large derivative ecosystem
- Open weights widely distributed on Western platforms
- 01.AI’s PRC headquarters and funding sources place it firmly within the PRC regulatory and intelligence framework
6. Baichuan
| Attribute | Detail |
|---|---|
| Parent Organization | Baichuan Inc. (百川智能) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Wang Xiaochuan, former CEO of Sogou (search engine). Operates under PRC law. Received investment from PRC-connected sources including Tencent and Alibaba. |
| Weights | Open (earlier models), mixed (later models) |
| Distribution | Hugging Face, GitHub, ModelScope |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| Baichuan-7B, Baichuan-13B | 7B-13B | Original series |
| Baichuan2 (7B, 13B) | 7B-13B | Second generation |
| Baichuan-53B | 53B | Larger model |
| Baichuan3 | Various | Third generation |
| Baichuan4 | Various | Current generation (primarily API) |
Specific Risk Factors:
- Open weights for earlier models widely circulated
- Sogou heritage means deep PRC internet ecosystem ties
- Backed by major PRC tech conglomerates (Tencent, Alibaba)
- Subject to PRC generative AI regulations and ideological alignment requirements
7. InternLM (Shanghai AI Lab)
| Attribute | Detail |
|---|---|
| Parent Organization | Shanghai Artificial Intelligence Laboratory (上海人工智能实验室) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Shanghai AI Lab is a government-established and government-funded research institution. It was created as part of China’s national AI strategy. Its leadership includes senior PRC academic and government-connected figures. It collaborates with multiple PRC universities with military research ties (Tsinghua, SJTU, etc.). This is effectively a state laboratory. |
| Weights | Open (Apache 2.0) |
| Distribution | Hugging Face, GitHub, ModelScope, OpenXLab |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| InternLM (7B, 20B) | 7B-20B | Original series |
| InternLM2 (1.8B-20B) | 1.8B-20B | Second generation |
| InternLM2.5 (7B) | 7B | Current open model |
| InternLM-XComposer (1, 2, 2.5) | Various | Vision-language composition |
| InternLM-Math | Various | Mathematical reasoning |
| InternVL (1.0, 1.5, 2.0, 2.5) | Various | Vision-language (leading open VLM) |
| InternLM2-Chat | Various | Conversational variants |
Specific Risk Factors:
- This is the most directly government-linked model family on this list — Shanghai AI Lab is a state institution
- InternVL is one of the leading open vision-language models and is widely used as a base for fine-tunes
- Open weights distributed internationally through Hugging Face and GitHub
- InternLM-XComposer models have advanced document and image understanding capabilities
- Extensive collaborative network with PRC military-linked universities
8. XVERSE
| Attribute | Detail |
|---|---|
| Parent Organization | XVERSE Technology (元象科技) |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Founded by former Tencent employees. Operates under PRC law. Less prominent government ties than some others but fully subject to PRC legal framework. |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | HIGH |
Model Variants:
| Model | Parameters | Notes |
|---|---|---|
| XVERSE-7B | 7B | Base model |
| XVERSE-13B | 13B | Mid-size model |
| XVERSE-65B | 65B | Large model |
| XVERSE-MoE-A4.2B | ~25.8B total (4.2B active) | Mixture of Experts |
Specific Risk Factors:
- Tencent alumni connections
- Open weights on international platforms
- Lower profile than Qwen or DeepSeek means less scrutiny, increasing the risk that provenance is overlooked
9. Moonshot AI / Kimi
| Attribute | Detail |
|---|---|
| Parent Organization | Moonshot AI (月之暗面) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Yang Zhilin, a prominent PRC AI researcher. Backed by major PRC investors including Alibaba, HongShan (formerly Sequoia China), and others. Subject to PRC law. |
| Weights | Closed (API only) |
| Distribution | Kimi API, Kimi Chat application |
| Risk Level | MEDIUM-HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Moonshot-v1 (8k, 32k, 128k) | Various context lengths |
| Kimi (consumer product) | Chatbot with extremely long context window |
| Kimi k1.5 | Reasoning model |
Specific Risk Factors:
- Closed weights mean behavior cannot be independently verified
- Kimi’s extremely long context window (originally claimed 2M tokens) means users may input large volumes of sensitive text
- API processes data on PRC servers
- Lower international adoption than Qwen/DeepSeek reduces direct risk to Western users, but API integration is growing
10. MiniMax
| Attribute | Detail |
|---|---|
| Parent Organization | MiniMax (稀宇科技) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by former SenseTime employees. Backed by Tencent and other PRC investors. Subject to PRC law. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | MiniMax API, Hailuo AI (consumer product), Hugging Face (select models) |
| Risk Level | MEDIUM-HIGH |
Model Variants:
| Model | Notes |
|---|---|
| MiniMax-abab5, abab5.5, abab6, abab6.5 | Successive generations |
| MiniMax-Text-01 | 456B MoE, open weights |
| MiniMax-VL-01 | Vision-language, open weights |
| Hailuo AI Video | Video generation model |
| MiniMax-Speech | Speech synthesis |
| MiniMax-Music | Music generation |
Specific Risk Factors:
- SenseTime heritage (SenseTime is on the US Entity List)
- Hailuo AI video generation has gained significant international adoption
- MiniMax-Text-01 with open weights is a very large MoE model distributed internationally
- Multimodal capabilities (video, speech, music) expand attack surface
11. SenseTime Models
| Attribute | Detail |
|---|---|
| Parent Organization | SenseTime Group (商汤科技) |
| Country of Origin | PRC (Shanghai/Hong Kong) |
| Government Ties | SenseTime is on the US Entity List (Bureau of Industry and Security). Sanctioned by the US Treasury Department. Provides surveillance technology to PRC government and security services. Directly implicated in Xinjiang surveillance infrastructure. Has military and public security contracts. |
| Weights | Mixed |
| Distribution | SenseTime API (SenseNova), limited international distribution |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| SenseNova 5.0, 5.5 | Current flagship LLM |
| SenseChat | Conversational AI |
| SenseNova Raccoon (日日新) | Large language model series |
| SenseTime image/video generation models | Various multimodal |
Specific Risk Factors:
- US Entity List designation makes any use potentially subject to export control violations
- Direct involvement in PRC surveillance and human rights abuses
- Military and public security contracts
- Despite sanctions, technology may circulate through derivatives or unlabeled integrations
- Any integration with SenseTime technology may create sanctions compliance liability
12. iFlytek / Spark
| Attribute | Detail |
|---|---|
| Parent Organization | iFlytek (科大讯飞) |
| Country of Origin | PRC (Hefei, Anhui) |
| Government Ties | iFlytek is on the US Entity List. Provides voice recognition and AI technology to PRC government, military, and public security. Implicated in Xinjiang surveillance. Deep ties to the University of Science and Technology of China (USTC), a key PRC defense research institution. |
| Weights | Closed |
| Distribution | iFlytek API, primarily domestic PRC market |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Spark (Xinghuo) v1, v1.5, v2, v3, v3.5, v4 | Successive LLM generations |
| Spark-Lite, Spark-Pro, Spark-Max, Spark-Ultra | Tiered model offerings |
| iFlytek voice/speech models | Industry-leading Chinese speech recognition |
Specific Risk Factors:
- US Entity List designation
- Core competency in voice/speech AI creates unique surveillance risk
- Deep PRC military and intelligence community ties through USTC
- Speech recognition technology deployed in PRC public security and surveillance systems
- Domestic-focused but technologies may be embedded in products that reach international markets
13. Huawei / PanGu
| Attribute | Detail |
|---|---|
| Parent Organization | Huawei Technologies |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Huawei is on the US Entity List and subject to extensive sanctions. Widely assessed by Western intelligence agencies as having close ties to PRC military and intelligence services. Subject to the most extensive US technology restrictions of any Chinese company. |
| Weights | Primarily closed; some research releases |
| Distribution | Huawei Cloud, limited international academic distribution |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|---|
| PanGu-Alpha (2.6B-200B) | Early large-scale LLM |
| PanGu-Sigma | Trillion-parameter MoE |
| PanGu-Coder | Code generation |
| PanGu-Weather | Weather prediction (Nature-published) |
| PanGu-Drug | Drug discovery |
| Huawei Cloud Pangu LLM 3.0, 5.0 | Enterprise LLM offering |
Specific Risk Factors:
- Most heavily sanctioned Chinese technology company
- Intelligence agency assessments across Five Eyes nations have flagged Huawei as a national security risk
- Huawei develops its own AI accelerator chips (Ascend series), creating a PRC-controlled full-stack AI ecosystem
- Any use of Huawei AI models may create sanctions compliance violations
- PanGu models may be embedded in Huawei network equipment and telecommunications infrastructure
14. Tencent / Hunyuan
| Attribute | Detail |
|---|---|
| Parent Organization | Tencent Holdings |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Tencent operates WeChat, China’s dominant messaging/social platform, which is extensively monitored and censored by PRC government. Tencent has CCP party committee within corporate governance. Subject to intense PRC regulatory oversight. WeChat data is accessible to PRC security services. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | Tencent Cloud API, Hugging Face (select models), GitHub |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Hunyuan-LLM | Text generation |
| Hunyuan-Large (389B MoE) | Open-weight large MoE model |
| Hunyuan-A13B | Efficient variant |
| HunyuanDiT | Image generation (diffusion transformer) |
| HunyuanVideo | Video generation (open weights, widely adopted) |
| Hunyuan3D | 3D generation |
| Hunyuan-TurboS | Reasoning model |
Specific Risk Factors:
- Tencent’s role as operator of WeChat gives it unique data access and PRC government integration
- HunyuanVideo is one of the leading open video generation models and has been widely adopted internationally
- Hunyuan-Large open weights distributed on Hugging Face
- Tencent Cloud is a major PRC cloud provider with government contracts
- Tencent’s gaming and media empire means Hunyuan models may be embedded in entertainment products consumed globally
15. ByteDance / Doubao
| Attribute | Detail |
|---|---|
| Parent Organization | ByteDance Ltd. |
| Country of Origin | PRC (Beijing), with significant international operations |
| Government Ties | ByteDance operates TikTok, subject to ongoing US national security concerns and potential ban/divestiture. ByteDance has a CCP party committee, and a “golden share” arrangement gives the PRC government a board seat in a key domestic entity. Subject to intense PRC regulatory oversight. ByteDance editors were reportedly directed by the PRC government to suppress certain content. |
| Weights | Primarily closed; select open releases |
| Distribution | Volcengine (Volcano Engine) API, limited Hugging Face presence |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Doubao (豆包) / Skylark | Primary LLM family |
| Doubao-pro, Doubao-lite | Tiered offerings |
| Doubao-Vision | Multimodal |
| ByteDance SDXL variants | Image generation |
| Emu (image generation research) | Research model |
Specific Risk Factors:
- The TikTok controversy demonstrates PRC government influence over ByteDance
- CCP “golden share” arrangement provides direct government governance participation
- Volcengine cloud platform processes data on PRC infrastructure
- ByteDance’s massive international user base through TikTok creates potential for AI model deployment at scale to Western users
- Less open-weight distribution than some competitors, but API access routes data to PRC
16. StepFun / Step Models
| Attribute | Detail |
|---|---|
| Parent Organization | StepFun (阶跃星辰) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by Jiang Daxin, former Microsoft Research Asia executive. Backed by major PRC investors. Subject to PRC law. |
| Weights | Mixed (some open) |
| Distribution | Step API, Hugging Face (select models) |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|---|
| Step-1 | 200B+ parameter model |
| Step-1V | Vision-language |
| Step-2 | Next generation |
| Step-1.5V | Improved vision model |
| GOT-OCR | Open-source OCR model (widely adopted) |
Specific Risk Factors:
- GOT-OCR (General OCR Theory) model has been very widely adopted for document processing — users may not realize PRC provenance
- Microsoft Research Asia alumni connections
- Growing international distribution
17. Other Chinese-Origin Models
Kuaishou / Kolors & KLING
| Attribute | Detail |
|---|---|
| Parent Organization | Kuaishou Technology (快手) |
| Country | PRC |
| Models | Kolors (image generation), KLING (video generation) |
| Weights | Mixed |
| Distribution | Hugging Face, API |
| Risk Level | HIGH |
| Notes | Kuaishou is a major PRC short-video platform. KLING video generation has gained significant international adoption. The Kolors image model is distributed with open weights on Hugging Face. |
Zhejiang Lab / various models
| Attribute | Detail |
|---|---|
| Parent Organization | Zhejiang Lab (之江实验室) |
| Country | PRC |
| Government Ties | Government-established research laboratory (Zhejiang provincial government) |
| Risk Level | HIGH |
| Notes | State laboratory producing various AI research models |
ModelBest / MiniCPM
| Attribute | Detail |
|---|---|
| Parent Organization | ModelBest (面壁智能), associated with Tsinghua University |
| Country | PRC |
| Models | MiniCPM (2B-4B), MiniCPM-V (vision), MiniCPM-o (omni), OmniLMM |
| Weights | Open |
| Distribution | Hugging Face, GitHub |
| Risk Level | HIGH |
| Notes | Small, efficient models specifically designed for on-device deployment. Tsinghua University affiliation. MiniCPM-V is widely used for mobile vision applications. On-device deployment means the models run locally, but they still originate from a PRC institution. |
Deepin / deepin-ai
| Attribute | Detail |
|---|---|
| Parent Organization | Uniontech (统信软件), PRC Linux distribution maker |
| Country | PRC |
| Models | Various AI integrations in Deepin Linux |
| Risk Level | MEDIUM |
vivo / BlueLM
| Attribute | Detail |
|---|---|
| Parent Organization | vivo Mobile Communication |
| Country | PRC |
| Models | BlueLM-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
IDEA-CCNL / Fengshenbang
| Attribute | Detail |
|---|---|
| Parent Organization | IDEA Research (International Digital Economy Academy), Shenzhen |
| Country | PRC |
| Models | Ziya-LLaMA, Fengshenbang series, Taiyi (image) |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Government-supported research institute in Shenzhen. Led by Harry Shum, former EVP of Microsoft. |
Colossal-AI / various
| Attribute | Detail |
|---|---|
| Parent Organization | HPC-AI Tech, associated with National University of Singapore but with significant PRC founder involvement |
| Country | Singapore / PRC mixed |
| Models | ColossalChat, open-source training frameworks |
| Risk Level | MEDIUM |
Skywork
| Attribute | Detail |
|---|---|
| Parent Organization | Kunlun Tech (昆仑万维) |
| Country | PRC (Beijing) |
| Models | Skywork-13B, Skywork-MoE, Skywork-Math, Skywork-Reward |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Skywork-Reward model widely used for RLHF reward modeling — could influence alignment of other models. |
MAP-Neo
| Attribute | Detail |
|---|---|
| Parent Organization | M-A-P consortium (multi-institutional, PRC-heavy) |
| Country | PRC-led |
| Models | MAP-Neo-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | MEDIUM-HIGH |
Orion / OrionStar
| Attribute | Detail |
|---|---|
| Parent Organization | OrionStar (猎户星空) — Cheetah Mobile subsidiary |
| Country | PRC |
| Models | Orion-14B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
TeleChat / China Telecom
| Attribute | Detail |
|---|---|
| Parent Organization | China Telecom (中国电信) |
| Country | PRC |
| Government Ties | State-owned enterprise |
| Models | TeleChat (7B, 12B, 52B, 115B) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | Directly state-owned. Model used in telecommunications infrastructure. |
AquilaChat / BAAI
| Attribute | Detail |
|---|---|
| Parent Organization | Beijing Academy of Artificial Intelligence (BAAI / 北京智源人工智能研究院) |
| Country | PRC |
| Government Ties | Government-established and government-funded research institution (Beijing municipal government). Led by major PRC AI strategy figures. |
| Models | Aquila (7B, 34B), AquilaChat, FlagAlpha series, BGE (embedding models), EVA (vision) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | BAAI is a key institution in PRC national AI strategy. BGE embedding models are among the most widely used embedding models globally — they are embedded in countless RAG systems, often without users understanding PRC provenance. BAAI also maintains FlagEval (benchmark) and FlagOpen (open-source platform). |
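Because BGE models typically enter a RAG stack as bare identifier strings passed to an embedding loader, a source-tree scan for watchlisted organization prefixes is a practical way to surface these silent dependencies. A sketch, with a deliberately partial prefix list:

```python
import re
from pathlib import Path

# Organization prefixes as they appear inside quoted model-identifier
# strings. Deliberately partial; extend with the orgs in this catalog.
ORG_PREFIXES = ("BAAI/", "Qwen/", "deepseek-ai/", "THUDM/", "internlm/")

PATTERN = re.compile(
    r"[\"'](?:" + "|".join(re.escape(p) for p in ORG_PREFIXES) + r")[\w.\-/]+[\"']"
)

def scan_tree(root: Path, suffixes=(".py", ".ts", ".yaml", ".yml", ".json")):
    """Yield (path, identifier) for watchlisted model IDs found in sources."""
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for match in PATTERN.finditer(text):
            yield path, match.group(0).strip("\"'")
```

This catches only literal strings in source and config files; identifiers assembled at runtime or baked into vendored weights require deeper inspection.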
Megvii / YOLO-variants
| Attribute | Detail |
|---|---|
| Parent Organization | Megvii (旷视科技) |
| Country | PRC |
| Government Ties | On the US Entity List. Provides facial recognition to PRC government/security. |
| Models | YOLOX, various computer vision models |
| Weights | Open (research releases) |
| Risk Level | CRITICAL / HIGH |
WPS AI / Kingsoft
| Attribute | Detail |
|---|---|
| Parent Organization | Kingsoft Office (金山办公) |
| Country | PRC |
| Models | WPS AI (integrated into WPS Office) |
| Risk Level | HIGH |
| Notes | WPS Office has significant international user base. AI features process documents on PRC infrastructure. |
Russian-Origin Model Catalog
18. Sber / GigaChat
| Attribute | Detail |
|---|
| Parent Organization | Sberbank (Сбербанк) / SberDevices |
| Country of Origin | Russian Federation |
| Government Ties | Sberbank is majority-owned by the Russian government (via the National Wealth Fund / Central Bank). Sberbank is subject to extensive Western sanctions following Russia’s invasion of Ukraine. The Russian government directly controls Sberbank’s strategy. |
| Weights | Primarily closed |
| Distribution | GigaChat API (Russia-focused), limited international |
| Risk Level | CRITICAL / HIGH |
Model Variants:
| Model | Notes |
|---|
| GigaChat | Consumer-facing LLM |
| GigaChat-Pro, GigaChat-Max | Tiered offerings |
| ruGPT-3, ruGPT-3.5 | Russian-language GPT models (some open) |
| mGPT | Multilingual model |
| Kandinsky (2.x, 3.x) | Image generation (some open weights on Hugging Face) |
Specific Risk Factors:
- State-owned bank — direct Russian government entity
- Subject to comprehensive Western sanctions
- Any use may create sanctions compliance violations
- Kandinsky image models have been distributed on Hugging Face with open weights, potentially obscuring Russian government provenance
- Russian intelligence services (FSB, GRU, SVR) have direct legal authority over Sberbank
19. Yandex / YaLM
| Attribute | Detail |
|---|
| Parent Organization | Yandex (now restructured; Russian operations transferred to a new entity) |
| Country of Origin | Russian Federation |
| Government Ties | Yandex has been subject to increasing Russian government pressure and restructuring. Russian operations were transferred to a consortium with Kremlin-connected ownership. Subject to Russian data and security laws. |
| Weights | Mixed (YaLM-100B was released open) |
| Distribution | Hugging Face (historical), GitHub |
| Risk Level | HIGH |
Model Variants:
| Model | Notes |
|---|
| YaLM-100B | 100B parameter model (open weights) |
| YandexGPT (1, 2, 3, 4) | Successive LLM generations (closed) |
| Alice / Alisa | Voice assistant AI |
Specific Risk Factors:
- YaLM-100B open weights remain available on Hugging Face
- Yandex restructuring placed Russian operations under more Kremlin-aligned ownership
- YandexGPT processes data on Russian servers
- Yandex’s dominant position in Russian search means training data reflects Russian information environment
20. Other Russian-Origin Models
MTS AI
| Attribute | Detail |
|---|
| Parent Organization | MTS (Mobile TeleSystems) |
| Country | Russian Federation |
| Government Ties | Major Russian telecom, subject to Russian government oversight and Yarovaya Law |
| Risk Level | HIGH |
AIRI (AI Research Institute, Russia)
| Attribute | Detail |
|---|
| Parent Organization | AIRI |
| Country | Russian Federation |
| Models | Various research models |
| Risk Level | MEDIUM-HIGH |
Models with Unclear or Mixed Provenance
Models Requiring Additional Scrutiny
| Model/Family | Concern | Risk Level |
|---|
| Stability AI (various models) | Significant investment from PRC-linked sources; international team but complex funding structure. Now largely open-source. | MEDIUM |
| Mistral (French) | European company, but has explored partnerships with PRC entities. Mistral models are frequently fine-tuned by PRC groups. Monitor. | LOW (base), MEDIUM (PRC fine-tunes) |
| Falcon (UAE/TII) | Technology Innovation Institute is UAE government-funded. UAE is not an adversary but has complex relationships with PRC. | LOW-MEDIUM |
| Jamba (AI21, Israel) | Legitimate provenance, but note that some fine-tunes on Hugging Face may be PRC-sourced. | LOW |
| Various Singapore-based models | Singapore’s NUS, SUTD, A*STAR have significant PRC researcher populations and PRC government-funded collaborations. SEA-LION (AI Singapore) and similar warrant monitoring. | MEDIUM |
| RWKV | Architecture created by Bo Peng (PRC-born, international); RWKV Foundation registered outside PRC, but significant PRC community involvement. | MEDIUM |
Fine-Tunes and Derivatives
This section addresses the critical problem of provenance laundering — where Chinese base models are fine-tuned, renamed, and redistributed in ways that obscure their origin.
High-Risk Derivative Patterns
| Base Model | Common Derivatives / Fine-tune Patterns | Risk |
|---|
| Qwen2.5 (all sizes) | Thousands of fine-tunes on Hugging Face. Many use names that do not reference Qwen. Common patterns: “[username]/[creative-name]-7B”, merged models, GGUF quantizations. Qwen2.5 is the single most fine-tuned Chinese base model. | HIGH — users may unknowingly use Qwen-based model |
| DeepSeek-R1-Distill | Distilled into Qwen and Llama architectures. The Llama-based distills are particularly deceptive as they appear to be Meta Llama derivatives. | HIGH — double-layered provenance confusion |
| Yi-34B | Early Chinese open model with extensive fine-tune ecosystem. Many “uncensored” fine-tunes exist. | HIGH |
| ChatGLM | Numerous fine-tunes, especially in Chinese NLP community. | HIGH |
| InternVL | Vision-language fine-tunes for specific tasks (OCR, document analysis, etc.) | HIGH |
| BAAI BGE embeddings | Integrated into LangChain, LlamaIndex, and numerous RAG frameworks as default embedding model. Users of these frameworks may unknowingly rely on PRC-origin embeddings. | HIGH — extremely widespread, often invisible |
| Qwen2.5-Coder | Integrated into coding assistant tools and IDE plugins. May be relabeled. | HIGH |
| HunyuanVideo / Kolors | Fine-tunes for specific video/image generation use cases. | MEDIUM-HIGH |
How to Identify Chinese Base Models in Derivatives
Indicators that a model may be based on a Chinese base model:
- Check config.json: look for the architectures field; values such as “Qwen2ForCausalLM”, “InternLM2ForCausalLM”, “DeepseekV2ForCausalLM”, “ChatGLMForCausalLM”, and “BaichuanForCausalLM” indicate a Chinese base model family.
- Check tokenizer: Qwen models use a distinctive tiktoken-based tokenizer. DeepSeek has its own tokenizer patterns.
- Check model card: Look for references to base model, though these may be omitted.
- Check vocabulary size: Certain vocabulary sizes are characteristic of specific Chinese model families.
- Test with sensitive prompts: Ask about Tiananmen Square, Taiwan, Xinjiang — residual censorship behavior may reveal Chinese base model even after fine-tuning.
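The config.json indicator above can be checked mechanically. A minimal Python sketch, using only the architecture identifiers named in this list; a real screening tool would maintain a broader, regularly updated set:

```python
import json
from pathlib import Path

# Architecture identifiers named in this document's indicator list; not exhaustive.
PRC_ARCHITECTURES = {
    "Qwen2ForCausalLM",
    "InternLM2ForCausalLM",
    "DeepseekV2ForCausalLM",
    "ChatGLMForCausalLM",
    "BaichuanForCausalLM",
}

def flag_prc_base(config_path: str) -> list[str]:
    """Return PRC-associated architecture names found in a model's config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Hugging Face transformers configs store `architectures` as a list of strings.
    return sorted(set(config.get("architectures", [])) & PRC_ARCHITECTURES)
```

Run this against the config.json of any downloaded checkpoint. An empty result is not a clearance: merges and format conversions can change the reported architecture, which is why the tokenizer, vocabulary, and behavioral checks above remain necessary.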
Distribution Vectors
| Platform | PRC Models Present | Risk Notes |
|---|
| Hugging Face | Extensive — all major Chinese model families | Primary international distribution point. Hosts thousands of Chinese models and derivatives. Limited provenance verification. |
| ModelScope | All Chinese models | Alibaba-operated platform. PRC-based infrastructure. Primary Chinese model hub. |
| GitHub | Code, weights, training scripts | Many Chinese models distributed via GitHub repositories |
| Ollama | Qwen, DeepSeek, Yi, GLM, others | Easy one-command download makes adoption frictionless. Many users unaware of provenance. |
| OpenRouter | Multiple Chinese models via API | Aggregator providing API access to Chinese models alongside Western ones |
| Together AI | Hosts Chinese open models | US-based but hosts Chinese models for inference |
| Replicate | Various Chinese models | US-based model hosting |
| GGUF/TheBloke quantizations | Extensive | Quantized versions of Chinese models optimized for local inference, very widely downloaded |
| vLLM/TGI deployments | Various | Chinese models deployed via open inference frameworks |
| LM Studio | Various | Desktop app for running local models; includes Chinese models in model browser |
Supply Chain Integration Points
Chinese-origin models or components enter Western AI infrastructure through:
- Direct model use — downloading and running Qwen, DeepSeek, etc.
- Embedding models — BAAI BGE embeddings in RAG pipelines
- Coding assistants — CodeGeeX, Qwen2.5-Coder, DeepSeek-Coder in IDEs
- Merged/fine-tuned models — derivatives that obscure base model origin
- Frameworks and libraries — Chinese AI frameworks (PaddlePaddle, MindSpore) included in ML pipelines
- Reward models — Skywork-Reward and similar used to train/align other models
- Training data — Chinese-curated datasets used to train non-Chinese models
- Benchmarks and evaluations — Chinese-operated benchmarks (BAAI FlagEval, etc.) influencing model development priorities
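One of these entry points, frameworks and libraries, can be checked directly in a Python environment. A hedged sketch: the distribution names below are assumed to be the common PyPI package names for the frameworks mentioned above and should be verified against your own package index before relying on them.

```python
from importlib import metadata

# Assumed PyPI distribution names for the PRC-origin frameworks named above;
# verify against your package index, and extend for internal mirrors.
PRC_FRAMEWORKS = {"paddlepaddle", "paddlepaddle-gpu", "mindspore"}

def installed_prc_frameworks() -> list[str]:
    """List installed distributions matching known PRC-origin ML frameworks."""
    found = set()
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name in PRC_FRAMEWORKS:
            found.add(name)
    return sorted(found)
```

This covers only direct installs in the current environment; transitive dependencies, containers, and non-Python entry points (embeddings, reward models, training data) still require the inventory steps in the Recommendations section.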
Risk Assessment Methodology
Risk Levels Defined
| Level | Definition |
|---|
| CRITICAL | Entity is on US Entity List, directly state-owned, or directly sanctioned. Use likely creates legal/compliance violations. |
| HIGH | PRC/Russian entity subject to compelled-cooperation laws, with demonstrated government ties, and models distributed internationally. Significant national security risk. |
| MEDIUM-HIGH | PRC/Russian entity with less direct government ties but fully subject to adversary-nation legal framework. Models have some international distribution. |
| MEDIUM | Mixed provenance, indirect PRC/Russian ties, or unclear funding. Warrants monitoring and due diligence. |
| LOW | Non-adversary origin but with some connection points (e.g., PRC fine-tunes of Western base models). |
Risk Factors Weighted
- Legal jurisdiction (30%) — Is the entity subject to PRC National Intelligence Law or Russian equivalent?
- Government relationship (25%) — State-owned, Entity-Listed, government-funded, party committee?
- Distribution reach (20%) — How widely are models distributed in Western ecosystems?
- Data exposure (15%) — Do models/APIs route data to adversary-nation servers?
- Opacity (10%) — Are weights open (auditable) or closed (unverifiable)?
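Applied literally, the weights above reduce to a weighted sum. An illustrative sketch: the 0-to-1 factor scores are hypothetical inputs an analyst would assign, not values defined by this methodology.

```python
# Factor weights from the methodology above (must sum to 1.0).
WEIGHTS = {
    "legal_jurisdiction": 0.30,
    "government_relationship": 0.25,
    "distribution_reach": 0.20,
    "data_exposure": 0.15,
    "opacity": 0.10,
}

def risk_score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor scores in [0, 1]; higher is riskier."""
    if set(factors) != set(WEIGHTS):
        raise ValueError("every factor must be scored exactly once")
    return sum(WEIGHTS[name] * score for name, score in factors.items())
```

For example, an entity scoring 1.0 on legal jurisdiction and government relationship but 0.0 elsewhere scores 0.55; how score bands map onto the CRITICAL/HIGH/MEDIUM levels is an organizational policy choice, not something this sketch decides.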
Consolidated Risk Matrix
| # | Model Family | Parent Entity | Country | Risk Level | Weights | Key Risk Factor |
|---|
| 1 | SenseTime/SenseNova | SenseTime | PRC | CRITICAL | Mixed | US Entity List |
| 2 | iFlytek/Spark | iFlytek | PRC | CRITICAL | Closed | US Entity List |
| 3 | Huawei/PanGu | Huawei | PRC | CRITICAL | Mixed | US Entity List + sanctions |
| 4 | Megvii/YOLOX | Megvii | PRC | CRITICAL | Open | US Entity List |
| 5 | TeleChat | China Telecom | PRC | CRITICAL | Open | State-owned enterprise |
| 6 | BAAI/Aquila/BGE | BAAI | PRC | CRITICAL | Open | State-funded lab; BGE embeddings ubiquitous |
| 7 | Sber/GigaChat | Sberbank | Russia | CRITICAL | Mixed | State-owned bank; sanctioned |
| 8 | InternLM/InternVL | Shanghai AI Lab | PRC | HIGH | Open | Government-established lab |
| 9 | Qwen | Alibaba | PRC | HIGH | Open | Most-adopted PRC model family in West |
| 10 | DeepSeek | DeepSeek/High-Flyer | PRC | HIGH | Open | Massive adoption post-R1; data to PRC servers |
| 11 | ERNIE | Baidu | PRC | HIGH | Closed | Deep government integration |
| 12 | GLM/ChatGLM | Zhipu AI | PRC | HIGH | Mixed | Tsinghua origins; CodeGeeX in IDEs |
| 13 | Yi | 01.AI | PRC | HIGH | Open | Wide Western distribution; PRC legal jurisdiction |
| 14 | Baichuan | Baichuan Inc. | PRC | HIGH | Mixed | PRC investor network |
| 15 | XVERSE | XVERSE Tech | PRC | HIGH | Open | PRC jurisdiction; Tencent alumni |
| 16 | Hunyuan | Tencent | PRC | HIGH | Mixed | WeChat operator; HunyuanVideo widely adopted |
| 17 | Doubao/Skylark | ByteDance | PRC | HIGH | Mixed | TikTok parent; CCP golden share |
| 18 | Step | StepFun | PRC | HIGH | Mixed | GOT-OCR widely adopted; PRC jurisdiction |
| 19 | KLING/Kolors | Kuaishou | PRC | HIGH | Mixed | Video generation widely adopted |
| 20 | MiniCPM | ModelBest/Tsinghua | PRC | HIGH | Open | On-device deployment focus; Tsinghua ties |
| 21 | Skywork | Kunlun Tech | PRC | HIGH | Open | Reward model used in RLHF pipelines |
| 22 | Orion | OrionStar | PRC | HIGH | Open | PRC jurisdiction |
| 23 | BlueLM | vivo | PRC | HIGH | Open | PRC jurisdiction |
| 24 | Fengshenbang/Ziya | IDEA Research | PRC | HIGH | Open | Government-supported institute |
| 25 | Moonshot/Kimi | Moonshot AI | PRC | MEDIUM-HIGH | Closed | PRC API; growing international use |
| 26 | MiniMax | MiniMax | PRC | MEDIUM-HIGH | Mixed | SenseTime alumni; Hailuo AI video |
| 27 | YaLM/YandexGPT | Yandex | Russia | HIGH | Mixed | Kremlin-aligned restructuring |
| 28 | MTS AI models | MTS | Russia | HIGH | Various | Russian telecom; Yarovaya Law |
| 29 | RWKV | RWKV Foundation | Mixed | MEDIUM | Open | PRC community involvement; non-PRC entity |
| 30 | Singapore-origin models | Various | Singapore | MEDIUM | Various | PRC researcher/funding connections |
Recommendations
- Inventory all AI models in use across the organization, including embedded components (embeddings, reward models, tokenizers)
- Check for BAAI BGE embeddings — these are the single most widely adopted PRC-origin AI component in Western RAG/search systems, often included as defaults in LangChain, LlamaIndex, and similar frameworks
- Audit coding assistants and IDE plugins — check for CodeGeeX, Qwen-Coder, or DeepSeek-Coder integrations
- Block API access to PRC-hosted model endpoints (DeepSeek API, Moonshot API, Zhipu API, etc.) from all networks handling sensitive information
- Review Ollama and LM Studio installations — users may have downloaded Chinese models for local inference without organizational awareness
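The inventory and BGE checks above can be partially automated with a filesystem sweep. A minimal sketch under stated assumptions: it flags only the architecture identifiers listed earlier in this document and only literal "BAAI/bge" strings in Python sources, so renamed or indirect references will be missed.

```python
import json
from pathlib import Path

# Architecture identifiers from this document's indicator list; extend as needed.
PRC_ARCHS = {
    "Qwen2ForCausalLM", "InternLM2ForCausalLM", "DeepseekV2ForCausalLM",
    "ChatGLMForCausalLM", "BaichuanForCausalLM",
}

def sweep(root: str) -> dict[str, list[str]]:
    """Walk a directory tree; report PRC-architecture configs and BGE references."""
    findings: dict[str, list[str]] = {"prc_model_configs": [], "bge_references": []}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.name == "config.json":
            try:
                config = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, OSError):
                continue
            archs = config.get("architectures", []) if isinstance(config, dict) else []
            if set(archs) & PRC_ARCHS:
                findings["prc_model_configs"].append(str(path))
        elif path.suffix == ".py":
            if "BAAI/bge" in path.read_text(encoding="utf-8", errors="ignore"):
                findings["bge_references"].append(str(path))
    return findings
```

Point it at model cache directories (e.g., Hugging Face or Ollama storage paths) and source checkouts; treat hits as leads for the manual provenance verification described under Policy Recommendations, not as a complete inventory.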
Policy Recommendations
- Establish a model provenance verification process — before any AI model is deployed, verify its origin by checking architecture identifiers, tokenizer patterns, and model card metadata
- Maintain an approved model list — whitelist specific model families and versions that have passed security review
- Treat open-weight PRC models as higher risk than closed PRC models in some threat scenarios: open weights enable auditing and fully local inference with no PRC-side logging, but they also propagate through derivative ecosystems, can carry trojans that current techniques cannot reliably detect, and create strategic dependency regardless of where inference runs
- Monitor fine-tune ecosystems — track popular community fine-tunes that may be based on Chinese base models
- Engage with Hugging Face and other platforms on provenance labeling and transparency requirements
- Treat any model processing classified or sensitive information as requiring provenance verification — no exceptions for “convenience” or “it works well”
What “Open Weights” Does and Does Not Mitigate
| Risk | Open Weights Mitigates? | Notes |
|---|
| Backdoors / trojans in weights | Partially | Weights can be scanned but neural network trojans are extremely difficult to detect with current techniques |
| Censorship / bias patterns | Yes | Can be tested and measured |
| Data exfiltration via API | Yes | Local inference eliminates network data flow to PRC |
| Training data poisoning | No | Cannot determine what was in training data from weights alone |
| Influence over model ecosystem | No | Widely adopted base models shape the entire derivative ecosystem |
| Supply chain dependency | No | Reliance on PRC base models creates strategic dependency regardless of weight openness |
| Steganographic communication | No | Models could theoretically embed information in outputs that is not detectable without the key |
This catalog should be updated quarterly; the Chinese and Russian AI model ecosystems evolve rapidly, and new model families, variants, and distribution channels emerge continuously.
Document ends.