Part II · Chapter 109

At-Risk Model Catalog

A comprehensive catalog of adversary-nation-origin AI models: more than 30 model families across 17+ organizations, with risk assessments.

Table of Contents

  1. Executive Summary
  2. Legal and Regulatory Framework
  3. Chinese-Origin Model Catalog
  4. Russian-Origin Model Catalog
  5. Models with Unclear or Mixed Provenance
  6. Fine-Tunes and Derivatives
  7. Distribution Vectors
  8. Risk Assessment Methodology
  9. Consolidated Risk Matrix
  10. Recommendations

Executive Summary

This document catalogs AI models originating from or substantially influenced by adversary nations, primarily the People’s Republic of China (PRC) and the Russian Federation. It is intended to support risk-informed decision-making within defense, intelligence, and critical-infrastructure contexts.

Key finding: As of early 2026, at least 17 major Chinese organizations are actively distributing large language models (LLMs) and multimodal models through international platforms such as Hugging Face, GitHub, and Ollama. Many of these models achieve frontier-level performance and are being integrated into Western developer toolchains, enterprise products, and open-source projects, often without end-users understanding their provenance.

Critical legal context: Every entity on this list operating within the PRC is subject to:

  • The National Intelligence Law of 2017 (Article 7: all organizations and citizens must support, assist, and cooperate with national intelligence work)
  • The Data Security Law of 2021
  • The Cybersecurity Law of 2017
  • The Counter-Espionage Law (amended 2023)
  • The Personal Information Protection Law of 2021

These laws create a compelled-cooperation framework that has no parallel in Western democracies. Even if a given company has no current intent to serve state intelligence interests, the legal architecture permits the Chinese government to compel cooperation at any time, with no judicial review or public disclosure.


PRC Laws Relevant to AI Model Risk

| Law | Year | Key Provision | Risk Implication |
| --- | --- | --- | --- |
| National Intelligence Law | 2017 | Art. 7: Organizations and citizens shall support and cooperate with national intelligence work | Any PRC-based AI company can be compelled to embed capabilities, exfiltrate data, or modify model behavior on behalf of state intelligence services |
| Cybersecurity Law | 2017 | Data localization, security review requirements | Training data, user interaction logs, and telemetry from PRC models may be accessible to government |
| Data Security Law | 2021 | Government access to data classified as “important” or “core” | Model weights, training data, and deployment telemetry could be classified as state-relevant data |
| Personal Information Protection Law | 2021 | Cross-border data transfer restrictions, government access provisions | User data flowing to PRC infrastructure is subject to government access |
| Counter-Espionage Law (amended) | 2023 | Broadened definition of espionage; expanded state access to digital systems | Provides additional legal basis for compelling cooperation from AI companies |
| Interim Measures for Generative AI | 2023 | Models must align with “core socialist values”; providers must conduct security assessments | Models serving PRC domestic users are explicitly subject to ideological alignment requirements; this demonstrates the government’s willingness and capability to mandate model behavior modification |
| PRC Civil-Military Fusion Strategy | Ongoing | National strategy to eliminate barriers between civilian and military technology | All PRC AI capabilities are, by policy, available for military and intelligence application |

Russian Federation Laws

| Law | Key Provision | Risk Implication |
| --- | --- | --- |
| Yarovaya Law (2016) | Telecom data retention and FSB access | Infrastructure hosting Russian models subject to surveillance |
| Sovereign Internet Law (2019) | State control over internet infrastructure | Russian AI services can be co-opted for state purposes |
| Data Localization Law (2015) | Personal data of Russian citizens must be stored in Russia | Data processed by Russian AI services held on state-accessible infrastructure |

Chinese-Origin Model Catalog

1. Alibaba Cloud / Qwen Family

| Attribute | Detail |
| --- | --- |
| Parent Organization | Alibaba Group / Alibaba Cloud (Tongyi Lab) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | Alibaba is subject to intense CCP regulatory oversight. Jack Ma’s public rebuke by regulators in 2020 demonstrated the party’s control over the company. Alibaba Cloud provides cloud services to PRC government entities and military-linked organizations. Alibaba has a CCP party committee embedded within corporate governance. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, ModelScope, GitHub, Ollama, integrated into numerous third-party products |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Modality | Notes |
| --- | --- | --- | --- |
| Qwen-7B, Qwen-14B, Qwen-72B | 7B-72B | Text | Original Qwen series |
| Qwen-1.5 | 0.5B-110B | Text | Improved series with MoE variant |
| Qwen2 | 0.5B-72B | Text | Major architecture update |
| Qwen2.5 | 0.5B-72B | Text | Current flagship text series |
| Qwen2.5-Coder | 1.5B-32B | Code | Specialized code generation |
| Qwen2.5-Math | Various | Math | Mathematical reasoning |
| QwQ-32B | 32B | Text (reasoning) | Reasoning-focused model (chain-of-thought) |
| Qwen-VL, Qwen2-VL, Qwen2.5-VL | Various | Vision-Language | Multimodal image understanding |
| Qwen-Audio, Qwen2-Audio | Various | Audio-Language | Audio understanding and generation |
| Qwen-Agent | Various | Agentic | Tool-use and agent framework |
| Qwen2.5-Turbo | Various | Text | Optimized for speed/efficiency |
| Qwen-Long | Various | Text | Extended context window |

Specific Risk Factors:

  • Qwen is one of the most widely adopted Chinese model families in the West, embedded in hundreds of derivative products
  • Open weights enable integration without attribution, making provenance tracking extremely difficult
  • Alibaba Cloud infrastructure is used for PRC government and military-adjacent workloads
  • Qwen models have been observed to contain CCP-aligned content filtering and censorship behavior (e.g., refusing to discuss Tiananmen Square, Taiwan independence, Xinjiang)
  • Massive derivative ecosystem on Hugging Face (thousands of fine-tunes) obscures the Chinese base model origin
  • Qwen2.5-Coder is being integrated into coding assistants, creating potential for supply-chain influence over software development
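Because open-weight fine-tunes often drop any mention of the base model from their names, one practical provenance signal is the architecture string that Transformers-format models carry in `config.json`. The sketch below is a minimal, hypothetical check: the `PRC_ORIGIN_ARCHITECTURES` watchlist is illustrative and partial (extend it from the catalog in this chapter), and a renamed fine-tune can still evade it by editing its config.

```python
import json

# Illustrative watchlist mapping Transformers `architectures` strings to the
# base-model family they typically indicate. Partial and hypothetical -- a
# real deployment would maintain this list from the full catalog.
PRC_ORIGIN_ARCHITECTURES = {
    "Qwen2ForCausalLM": "Qwen2/Qwen2.5 (Alibaba)",
    "QWenLMHeadModel": "Qwen v1 (Alibaba)",
    "ChatGLMModel": "ChatGLM/GLM (Zhipu AI)",
    "InternLM2ForCausalLM": "InternLM2 (Shanghai AI Lab)",
    "DeepseekV3ForCausalLM": "DeepSeek-V3/R1 (DeepSeek)",
}

def flag_architectures(config: dict) -> list[str]:
    """Return provenance flags raised by a parsed config.json dict."""
    flags = []
    for arch in config.get("architectures", []):
        if arch in PRC_ORIGIN_ARCHITECTURES:
            flags.append(f"{arch} -> {PRC_ORIGIN_ARCHITECTURES[arch]}")
    return flags

def flag_model_dir(config_path: str) -> list[str]:
    """Load a local model directory's config.json and flag it."""
    with open(config_path, encoding="utf-8") as f:
        return flag_architectures(json.load(f))
```

A fine-tune published as “[username]/creative-name-7B” usually still ships the base architecture string, so this catches many (not all) of the renamed derivatives described above.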

2. DeepSeek

| Attribute | Detail |
| --- | --- |
| Parent Organization | DeepSeek (深度求索), subsidiary/affiliate of High-Flyer Capital Management (幻方量化) |
| Country of Origin | PRC (Hangzhou, Zhejiang) |
| Government Ties | High-Flyer is a major Chinese quantitative hedge fund. While not a direct state entity, it operates under PRC law and regulatory oversight. DeepSeek’s rapid capability gains and apparent access to large GPU clusters despite US export controls have raised questions about state support. DeepSeek is subject to all PRC compelled-cooperation laws. |
| Weights | Open (MIT License for most variants) |
| Distribution | Hugging Face, GitHub, Ollama, DeepSeek API, ModelScope, widely mirrored |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Modality | Notes |
| --- | --- | --- | --- |
| DeepSeek-LLM | 7B, 67B | Text | Original series |
| DeepSeek-V2 | 236B (21B active, MoE) | Text | Mixture-of-Experts architecture |
| DeepSeek-V2.5 | 236B MoE | Text | Merged chat and code capabilities |
| DeepSeek-V3 | 671B (37B active, MoE) | Text | Claimed training cost efficiency breakthrough |
| DeepSeek-R1 | 671B MoE | Text (reasoning) | Reasoning model, competitive with OpenAI o1 |
| DeepSeek-R1-Distill | 1.5B-70B | Text (reasoning) | Distilled reasoning models based on Qwen and Llama |
| DeepSeek-Coder | 1.3B-33B | Code | Code generation and understanding |
| DeepSeek-Coder-V2 | 236B MoE | Code | Advanced code model |
| DeepSeek-Math | 7B | Math | Mathematical reasoning |
| DeepSeek-VL, DeepSeek-VL2 | Various | Vision-Language | Multimodal |
| DeepSeek-Prover | Various | Math/Proof | Theorem proving |
| Janus-Pro | Various | Vision-Language | Unified multimodal model |

Specific Risk Factors:

  • DeepSeek-R1 received enormous global media attention in January 2025, driving massive adoption
  • The R1-Distill variants are based on Qwen2.5 and Llama 3 architectures, creating nested provenance concerns
  • DeepSeek’s claimed training cost efficiency ($5.6M for V3) has been questioned; possible undisclosed state subsidies or access to restricted compute
  • DeepSeek API sends data to PRC servers by default
  • Multiple governments (Italy, Australia, South Korea, Taiwan, US federal agencies) have restricted or investigated DeepSeek
  • DeepSeek models exhibit strong CCP-aligned censorship patterns
  • The connection to High-Flyer Capital raises questions about financial data collection interests
  • DeepSeek-R1’s open weights have been rapidly integrated into Western AI infrastructure and products

3. Baidu / ERNIE Family

| Attribute | Detail |
| --- | --- |
| Parent Organization | Baidu, Inc. |
| Country of Origin | PRC (Beijing) |
| Government Ties | Baidu is deeply integrated with PRC government initiatives. Baidu provides AI services to Chinese government agencies, military, and public security. Baidu CEO Robin Li serves on the Chinese People’s Political Consultative Conference (CPPCC). Baidu Apollo provides autonomous driving technology with PRC government collaboration. Baidu has PRC government AI platform contracts. |
| Weights | Primarily closed (API access); some older variants partially open |
| Distribution | Baidu API (ERNIE Bot / Wenxin Yiyan), limited international distribution |
| Risk Level | HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| ERNIE 1.0, 2.0, 3.0 | Earlier knowledge-enhanced models |
| ERNIE 3.5 | Mid-generation; widely used in China |
| ERNIE 4.0 | Current flagship |
| ERNIE 4.0 Turbo | Optimized variant |
| ERNIE Bot (Wenxin Yiyan) | Consumer-facing chatbot application |
| ERNIE-ViLG | Text-to-image generation |
| ERNIE-Code | Code generation |
| ERNIE-Music | Music generation |
| ERNIE-Speed, ERNIE-Lite | Lightweight variants |

Specific Risk Factors:

  • Baidu’s deep PRC government integration makes it among the highest-risk entities
  • ERNIE Bot was one of the first PRC chatbots approved under the Interim Measures for Generative AI, meaning it passed government security and ideological review
  • Primarily closed-weight, meaning behavior is fully controlled by Baidu with no independent verification
  • Baidu’s search engine dominance in China means ERNIE models are trained on massive PRC-curated datasets
  • Less directly distributed in the West than Qwen or DeepSeek, but integrated into products that may reach Western users

4. Zhipu AI / GLM Family

| Attribute | Detail |
| --- | --- |
| Parent Organization | Zhipu AI (智谱AI), spun out of Tsinghua University |
| Country of Origin | PRC (Beijing) |
| Government Ties | Originated from Tsinghua University, which has deep ties to PRC government and military research. Tsinghua is a key institution in China’s national AI strategy. Zhipu AI has received significant funding from PRC-connected investors. |
| Weights | Mixed (some open, some closed) |
| Distribution | Hugging Face, GitHub, ModelScope, Zhipu AI API (BigModel) |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Notes |
| --- | --- | --- |
| GLM-130B | 130B | Original large-scale bilingual model |
| ChatGLM-6B | 6B | Open conversational model |
| ChatGLM2-6B | 6B | Improved version |
| ChatGLM3-6B | 6B | Third generation |
| GLM-4 | Various | Current flagship (closed API) |
| GLM-4-Air, GLM-4-Flash | Various | Lightweight variants |
| GLM-4V | Various | Vision-language multimodal |
| GLM-4-Voice | Various | Voice capabilities |
| CogVLM, CogVLM2 | Various | Visual language model |
| CogAgent | Various | GUI-interaction agent |
| CogVideo, CogVideoX | Various | Video generation |
| CodeGeeX (1-4) | Various | Code generation (widely distributed as IDE plugin) |

Specific Risk Factors:

  • Tsinghua University provenance means direct academic-military complex ties
  • CodeGeeX is distributed as IDE plugins (VS Code, JetBrains), creating a direct vector into developer environments
  • CogVLM/CogAgent models designed for GUI interaction raise concerns about agentic AI capabilities being PRC-controlled
  • GLM-4 API processes data on PRC servers
  • ChatGLM series was among the earliest widely adopted Chinese open models, establishing a large derivative ecosystem

5. 01.AI / Yi Family

| Attribute | Detail |
| --- | --- |
| Parent Organization | 01.AI (零一万物) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Kai-Fu Lee, former president of Google China. While Lee has an international profile, 01.AI operates in Beijing under PRC law. Funded by PRC-based investors including Alibaba. Subject to all PRC compelled-cooperation laws. |
| Weights | Open (Apache 2.0 for most variants) |
| Distribution | Hugging Face, GitHub, ModelScope, Ollama |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Notes |
| --- | --- | --- |
| Yi-6B, Yi-34B | 6B-34B | Original series |
| Yi-1.5 | 6B, 9B, 34B | Improved series |
| Yi-Large | Undisclosed (large) | Flagship closed model |
| Yi-Medium, Yi-Spark | Various | Mid-tier models |
| Yi-VL | 6B, 34B | Vision-language |
| Yi-Coder | 1.5B, 9B | Code generation |
| Yi-Lightning | Various | Optimized for speed |

Specific Risk Factors:

  • Kai-Fu Lee’s international reputation and English-language visibility may create a false sense of security about PRC legal obligations
  • Yi-34B was among the first competitive Chinese open models and has a large derivative ecosystem
  • Open weights widely distributed on Western platforms
  • 01.AI’s PRC headquarters and funding sources place it firmly within the PRC regulatory and intelligence framework

6. Baichuan

| Attribute | Detail |
| --- | --- |
| Parent Organization | Baichuan Inc. (百川智能) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Wang Xiaochuan, former CEO of Sogou (search engine). Operates under PRC law. Received investment from PRC-connected sources including Tencent and Alibaba. |
| Weights | Open (earlier models), mixed (later models) |
| Distribution | Hugging Face, GitHub, ModelScope |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Notes |
| --- | --- | --- |
| Baichuan-7B, Baichuan-13B | 7B-13B | Original series |
| Baichuan2 | 7B, 13B | Second generation |
| Baichuan-53B | 53B | Larger model |
| Baichuan3 | Various | Third generation |
| Baichuan4 | Various | Current generation (primarily API) |

Specific Risk Factors:

  • Open weights for earlier models widely circulated
  • Sogou heritage means deep PRC internet ecosystem ties
  • Backed by major PRC tech conglomerates (Tencent, Alibaba)
  • Subject to PRC generative AI regulations and ideological alignment requirements

7. InternLM (Shanghai AI Lab)

| Attribute | Detail |
| --- | --- |
| Parent Organization | Shanghai Artificial Intelligence Laboratory (上海人工智能实验室) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Shanghai AI Lab is a government-established and government-funded research institution. It was created as part of China’s national AI strategy. Its leadership includes senior PRC academic and government-connected figures. It collaborates with multiple PRC universities with military research ties (Tsinghua, SJTU, etc.). This is effectively a state laboratory. |
| Weights | Open (Apache 2.0) |
| Distribution | Hugging Face, GitHub, ModelScope, OpenXLab |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Notes |
| --- | --- | --- |
| InternLM | 7B, 20B | Original series |
| InternLM2 | 1.8B-20B | Second generation |
| InternLM2.5 | 7B | Current open model |
| InternLM-XComposer (1, 2, 2.5) | Various | Vision-language composition |
| InternLM-Math | Various | Mathematical reasoning |
| InternVL (1.0, 1.5, 2.0, 2.5) | Various | Vision-language (leading open VLM) |
| InternLM2-Chat | Various | Conversational variants |

Specific Risk Factors:

  • This is the most directly government-linked model family on this list — Shanghai AI Lab is a state institution
  • InternVL is one of the leading open vision-language models and is widely used as a base for fine-tunes
  • Open weights distributed internationally through Hugging Face and GitHub
  • InternLM-XComposer models have advanced document and image understanding capabilities
  • Extensive collaborative network with PRC military-linked universities

8. XVERSE

| Attribute | Detail |
| --- | --- |
| Parent Organization | XVERSE Technology (元象科技) |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Founded by former Tencent employees. Operates under PRC law. Less prominent government ties than some others but fully subject to PRC legal framework. |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | HIGH |

Model Variants:

| Model | Parameters | Notes |
| --- | --- | --- |
| XVERSE-7B | 7B | Base model |
| XVERSE-13B | 13B | Mid-size model |
| XVERSE-65B | 65B | Large model |
| XVERSE-MoE-A4.2B | ~256B (4.2B active) | Mixture of Experts |

Specific Risk Factors:

  • Tencent alumni connections
  • Open weights on international platforms
  • Its lower profile relative to Qwen/DeepSeek means it receives less scrutiny, increasing the risk that provenance is overlooked

9. Moonshot AI / Kimi

| Attribute | Detail |
| --- | --- |
| Parent Organization | Moonshot AI (月之暗面) |
| Country of Origin | PRC (Beijing) |
| Government Ties | Founded by Yang Zhilin, a prominent PRC AI researcher. Backed by major PRC investors including Alibaba, HongShan (formerly Sequoia China), and others. Subject to PRC law. |
| Weights | Closed (API only) |
| Distribution | Kimi API, Kimi Chat application |
| Risk Level | MEDIUM-HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| Moonshot-v1 (8k, 32k, 128k) | Various context lengths |
| Kimi (consumer product) | Chatbot with extremely long context window |
| Kimi k1.5 | Reasoning model |

Specific Risk Factors:

  • Closed weights mean behavior cannot be independently verified
  • Kimi’s extremely long context window (originally claimed 2M tokens) means users may input large volumes of sensitive text
  • API processes data on PRC servers
  • Lower international adoption than Qwen/DeepSeek reduces direct risk to Western users, but API integration is growing

10. MiniMax

| Attribute | Detail |
| --- | --- |
| Parent Organization | MiniMax (稀宇科技) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by former SenseTime employees. Backed by Tencent and other PRC investors. Subject to PRC law. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | MiniMax API, Hailuo AI (consumer product), Hugging Face (select models) |
| Risk Level | MEDIUM-HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| MiniMax-abab5, abab5.5, abab6, abab6.5 | Successive generations |
| MiniMax-Text-01 | 456B MoE, open weights |
| MiniMax-VL-01 | Vision-language, open weights |
| Hailuo AI Video | Video generation model |
| MiniMax-Speech | Speech synthesis |
| MiniMax-Music | Music generation |

Specific Risk Factors:

  • SenseTime heritage (SenseTime is on the US Entity List)
  • Hailuo AI video generation has gained significant international adoption
  • MiniMax-Text-01 with open weights is a very large MoE model distributed internationally
  • Multimodal capabilities (video, speech, music) expand attack surface

11. SenseTime Models

| Attribute | Detail |
| --- | --- |
| Parent Organization | SenseTime Group (商汤科技) |
| Country of Origin | PRC (Shanghai/Hong Kong) |
| Government Ties | SenseTime is on the US Entity List (Bureau of Industry and Security). Sanctioned by the US Treasury Department. Provides surveillance technology to PRC government and security services. Directly implicated in Xinjiang surveillance infrastructure. Has military and public security contracts. |
| Weights | Mixed |
| Distribution | SenseTime API (SenseNova), limited international distribution |
| Risk Level | CRITICAL / HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| SenseNova 5.0, 5.5 | Current flagship LLM |
| SenseChat | Conversational AI |
| SenseNova Raccoon (日日新) | Large language model series |
| SenseTime image/video generation models | Various multimodal |

Specific Risk Factors:

  • US Entity List designation makes any use potentially subject to export control violations
  • Direct involvement in PRC surveillance and human rights abuses
  • Military and public security contracts
  • Despite sanctions, technology may circulate through derivatives or unlabeled integrations
  • Any integration with SenseTime technology may create sanctions compliance liability

12. iFlytek / Spark

| Attribute | Detail |
| --- | --- |
| Parent Organization | iFlytek (科大讯飞) |
| Country of Origin | PRC (Hefei, Anhui) |
| Government Ties | iFlytek is on the US Entity List. Provides voice recognition and AI technology to PRC government, military, and public security. Implicated in Xinjiang surveillance. Deep ties to the University of Science and Technology of China (USTC), a key PRC defense research institution. |
| Weights | Closed |
| Distribution | iFlytek API, primarily domestic PRC market |
| Risk Level | CRITICAL / HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| Spark (Xinghuo) v1, v1.5, v2, v3, v3.5, v4 | Successive LLM generations |
| Spark-Lite, Spark-Pro, Spark-Max, Spark-Ultra | Tiered model offerings |
| iFlytek voice/speech models | Industry-leading Chinese speech recognition |

Specific Risk Factors:

  • US Entity List designation
  • Core competency in voice/speech AI creates unique surveillance risk
  • Deep PRC military and intelligence community ties through USTC
  • Speech recognition technology deployed in PRC public security and surveillance systems
  • Domestic-focused but technologies may be embedded in products that reach international markets

13. Huawei / PanGu

| Attribute | Detail |
| --- | --- |
| Parent Organization | Huawei Technologies |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Huawei is on the US Entity List and subject to extensive sanctions. Widely assessed by Western intelligence agencies as having close ties to PRC military and intelligence services. Subject to the most extensive US technology restrictions of any Chinese company. |
| Weights | Primarily closed; some research releases |
| Distribution | Huawei Cloud, limited international academic distribution |
| Risk Level | CRITICAL / HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| PanGu-Alpha (2.6B-200B) | Early large-scale LLM |
| PanGu-Sigma | Trillion-parameter MoE |
| PanGu-Coder | Code generation |
| PanGu-Weather | Weather prediction (Nature-published) |
| PanGu-Drug | Drug discovery |
| Huawei Cloud Pangu LLM 3.0, 5.0 | Enterprise LLM offering |

Specific Risk Factors:

  • Most heavily sanctioned Chinese technology company
  • Intelligence agency assessments across Five Eyes nations have flagged Huawei as a national security risk
  • Huawei develops its own AI accelerator chips (Ascend series), creating a PRC-controlled full-stack AI ecosystem
  • Any use of Huawei AI models may create sanctions compliance violations
  • PanGu models may be embedded in Huawei network equipment and telecommunications infrastructure

14. Tencent / Hunyuan

| Attribute | Detail |
| --- | --- |
| Parent Organization | Tencent Holdings |
| Country of Origin | PRC (Shenzhen) |
| Government Ties | Tencent operates WeChat, China’s dominant messaging/social platform, which is extensively monitored and censored by PRC government. Tencent has a CCP party committee within corporate governance. Subject to intense PRC regulatory oversight. WeChat data is accessible to PRC security services. |
| Weights | Mixed (some open, primarily closed) |
| Distribution | Tencent Cloud API, Hugging Face (select models), GitHub |
| Risk Level | HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| Hunyuan-LLM | Text generation |
| Hunyuan-Large (389B MoE) | Open-weight large MoE model |
| Hunyuan-A13B | Efficient variant |
| HunyuanDiT | Image generation (diffusion transformer) |
| HunyuanVideo | Video generation (open weights, widely adopted) |
| Hunyuan3D | 3D generation |
| Hunyuan-TurboS | Reasoning model |

Specific Risk Factors:

  • Tencent’s role as operator of WeChat gives it unique data access and PRC government integration
  • HunyuanVideo is one of the leading open video generation models and has been widely adopted internationally
  • Hunyuan-Large open weights distributed on Hugging Face
  • Tencent Cloud is a major PRC cloud provider with government contracts
  • Tencent’s gaming and media empire means Hunyuan models may be embedded in entertainment products consumed globally

15. ByteDance / Doubao

| Attribute | Detail |
| --- | --- |
| Parent Organization | ByteDance Ltd. |
| Country of Origin | PRC (Beijing), with significant international operations |
| Government Ties | ByteDance operates TikTok, subject to ongoing US national security concerns and potential ban/divestiture. ByteDance has a CCP party committee and a “golden share” arrangement giving the PRC government a board seat in a key domestic entity. Subject to intense PRC regulatory oversight. ByteDance editors were reportedly directed by the PRC government to suppress certain content. |
| Weights | Primarily closed; select open releases |
| Distribution | Volcengine (Volcano Engine) API, limited Hugging Face presence |
| Risk Level | HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| Doubao (豆包) / Skylark | Primary LLM family |
| Doubao-pro, Doubao-lite | Tiered offerings |
| Doubao-Vision | Multimodal |
| ByteDance SDXL variants | Image generation |
| Emu (image generation research) | Research model |

Specific Risk Factors:

  • The TikTok controversy demonstrates PRC government influence over ByteDance
  • CCP “golden share” arrangement provides direct government governance participation
  • Volcengine cloud platform processes data on PRC infrastructure
  • ByteDance’s massive international user base through TikTok creates potential for AI model deployment at scale to Western users
  • Less open-weight distribution than some competitors, but API access routes data to PRC

16. StepFun / Step Models

| Attribute | Detail |
| --- | --- |
| Parent Organization | StepFun (阶跃星辰) |
| Country of Origin | PRC (Shanghai) |
| Government Ties | Founded by Jiang Daxin, former Microsoft Research Asia executive. Backed by major PRC investors. Subject to PRC law. |
| Weights | Mixed (some open) |
| Distribution | Step API, Hugging Face (select models) |
| Risk Level | HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| Step-1 | 200B+ parameter model |
| Step-1V | Vision-language |
| Step-2 | Next generation |
| Step-1.5V | Improved vision model |
| GOT-OCR | Open-source OCR model (widely adopted) |

Specific Risk Factors:

  • GOT-OCR (General OCR Theory) model has been very widely adopted for document processing — users may not realize PRC provenance
  • Microsoft Research Asia alumni connections
  • Growing international distribution

17. Other Chinese-Origin Models

Kuaishou / Kolors & KLING

| Attribute | Detail |
| --- | --- |
| Parent Organization | Kuaishou Technology (快手) |
| Country | PRC |
| Models | Kolors (image generation), KLING (video generation) |
| Weights | Mixed |
| Distribution | Hugging Face, API |
| Risk Level | HIGH |
| Notes | Kuaishou is a major PRC short-video platform. KLING video generation has gained significant international adoption. Kolors image model open on Hugging Face. |

Zhejiang Lab / various models

| Attribute | Detail |
| --- | --- |
| Parent Organization | Zhejiang Lab (之江实验室) |
| Country | PRC |
| Government Ties | Government-established research laboratory (Zhejiang provincial government) |
| Risk Level | HIGH |
| Notes | State laboratory producing various AI research models |

ModelBest / MiniCPM

| Attribute | Detail |
| --- | --- |
| Parent Organization | ModelBest (面壁智能), associated with Tsinghua University |
| Country | PRC |
| Models | MiniCPM (2B-4B), MiniCPM-V (vision), MiniCPM-o (omni), OmniLMM |
| Weights | Open |
| Distribution | Hugging Face, GitHub |
| Risk Level | HIGH |
| Notes | Small, efficient models specifically designed for on-device deployment. Tsinghua University affiliation. MiniCPM-V is widely used for mobile vision applications. On-device deployment means models run locally but still originate from a PRC institution. |

Deepin / deepin-ai

| Attribute | Detail |
| --- | --- |
| Parent Organization | Uniontech (统信软件), PRC Linux distribution maker |
| Country | PRC |
| Models | Various AI integrations in Deepin Linux |
| Risk Level | MEDIUM |

vivo / BlueLM

| Attribute | Detail |
| --- | --- |
| Parent Organization | vivo Mobile Communication |
| Country | PRC |
| Models | BlueLM-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |

IDEA-CCNL / Fengshenbang

| Attribute | Detail |
| --- | --- |
| Parent Organization | IDEA Research (International Digital Economy Academy), Shenzhen |
| Country | PRC |
| Models | Ziya-LLaMA, Fengshenbang series, Taiyi (image) |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Government-supported research institute in Shenzhen. Led by Harry Shum, former EVP of Microsoft. |

Colossal-AI / various

| Attribute | Detail |
| --- | --- |
| Parent Organization | HPC-AI Tech, associated with the National University of Singapore but with significant PRC founder involvement |
| Country | Singapore / PRC mixed |
| Models | ColossalChat, open-source training frameworks |
| Risk Level | MEDIUM |

Skywork

| Attribute | Detail |
| --- | --- |
| Parent Organization | Kunlun Tech (昆仑万维) |
| Country | PRC (Beijing) |
| Models | Skywork-13B, Skywork-MoE, Skywork-Math, Skywork-Reward |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |
| Notes | Skywork-Reward model widely used for RLHF reward modeling — could influence alignment of other models. |

MAP-Neo

| Attribute | Detail |
| --- | --- |
| Parent Organization | M-A-P consortium (multi-institutional, PRC-heavy) |
| Country | PRC-led |
| Models | MAP-Neo-7B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | MEDIUM-HIGH |

Orion / OrionStar

| Attribute | Detail |
| --- | --- |
| Parent Organization | OrionStar (猎户星空) — Cheetah Mobile subsidiary |
| Country | PRC |
| Models | Orion-14B |
| Weights | Open |
| Distribution | Hugging Face |
| Risk Level | HIGH |

TeleChat / China Telecom

| Attribute | Detail |
| --- | --- |
| Parent Organization | China Telecom (中国电信) |
| Country | PRC |
| Government Ties | State-owned enterprise |
| Models | TeleChat (7B, 12B, 52B, 115B) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | Directly state-owned. Model used in telecommunications infrastructure. |

AquilaChat / BAAI

| Attribute | Detail |
| --- | --- |
| Parent Organization | Beijing Academy of Artificial Intelligence (BAAI / 北京智源人工智能研究院) |
| Country | PRC |
| Government Ties | Government-established and government-funded research institution (Beijing municipal government). Led by major PRC AI strategy figures. |
| Models | Aquila (7B, 34B), AquilaChat, FlagAlpha series, BGE (embedding models), EVA (vision) |
| Weights | Open |
| Distribution | Hugging Face, ModelScope |
| Risk Level | CRITICAL / HIGH |
| Notes | BAAI is a key institution in PRC national AI strategy. BGE embedding models are among the most widely used embedding models globally — they are embedded in countless RAG systems, often without users understanding PRC provenance. BAAI also maintains FlagEval (benchmark) and FlagOpen (open-source platform). |
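Because embedding models sit deep inside RAG pipelines, a simple inventory audit of the model identifiers a stack actually loads can surface PRC-origin components. The sketch below is a hypothetical illustration: it matches Hugging Face-style model ids against organization-namespace prefixes, and the `WATCHLIST_NAMESPACES` list shown is a small, non-exhaustive example drawn from this catalog.

```python
# Illustrative, non-exhaustive watchlist of Hugging Face organization
# namespaces from the catalog in this chapter. A real audit would cover
# every organization listed, plus known mirrors.
WATCHLIST_NAMESPACES = {
    "BAAI/": "Beijing Academy of Artificial Intelligence (state-funded)",
    "Qwen/": "Alibaba Cloud (Qwen family)",
    "deepseek-ai/": "DeepSeek",
    "THUDM/": "Zhipu AI / Tsinghua (GLM family)",
}

def audit_model_ids(model_ids):
    """Return {model_id: watchlist note} for every id matching a
    watchlisted organization namespace; unmatched ids are omitted."""
    findings = {}
    for model_id in model_ids:
        for prefix, note in WATCHLIST_NAMESPACES.items():
            if model_id.startswith(prefix):
                findings[model_id] = note
    return findings
```

Run against the embedding and generation models named in a pipeline's configuration; for example, `audit_model_ids(["BAAI/bge-large-en-v1.5", "sentence-transformers/all-MiniLM-L6-v2"])` flags only the BGE model. Note this catches only ids that keep their original namespace; renamed re-uploads require the architecture- and metadata-based checks described elsewhere in this chapter.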

Megvii / YOLO-variants

| Attribute | Detail |
| --- | --- |
| Parent Organization | Megvii (旷视科技) |
| Country | PRC |
| Government Ties | On the US Entity List. Provides facial recognition to PRC government/security. |
| Models | YOLOX, various computer vision models |
| Weights | Open (research releases) |
| Risk Level | CRITICAL / HIGH |

WPS AI / Kingsoft

| Attribute | Detail |
| --- | --- |
| Parent Organization | Kingsoft Office (金山办公) |
| Country | PRC |
| Models | WPS AI (integrated into WPS Office) |
| Risk Level | HIGH |
| Notes | WPS Office has a significant international user base. AI features process documents on PRC infrastructure. |

Russian-Origin Model Catalog

18. Sber / GigaChat

| Attribute | Detail |
| --- | --- |
| Parent Organization | Sberbank (Сбербанк) / SberDevices |
| Country of Origin | Russian Federation |
| Government Ties | Sberbank is majority-owned by the Russian government (via the National Wealth Fund / Central Bank). Sberbank is subject to extensive Western sanctions following Russia’s invasion of Ukraine. The Russian government directly controls Sberbank’s strategy. |
| Weights | Primarily closed |
| Distribution | GigaChat API (Russia-focused), limited international |
| Risk Level | CRITICAL / HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| GigaChat | Consumer-facing LLM |
| GigaChat-Pro, GigaChat-Max | Tiered offerings |
| ruGPT-3, ruGPT-3.5 | Russian-language GPT models (some open) |
| mGPT | Multilingual model |
| Kandinsky (2.x, 3.x) | Image generation (some open weights on Hugging Face) |

Specific Risk Factors:

  • State-owned bank — direct Russian government entity
  • Subject to comprehensive Western sanctions
  • Any use may create sanctions compliance violations
  • Kandinsky image models have been distributed on Hugging Face with open weights, potentially obscuring Russian government provenance
  • Russian intelligence services (FSB, GRU, SVR) have direct legal authority over Sberbank

19. Yandex / YaLM

| Attribute | Detail |
| --- | --- |
| Parent Organization | Yandex (now restructured; Russian operations transferred to a new entity) |
| Country of Origin | Russian Federation |
| Government Ties | Yandex has been subject to increasing Russian government pressure and restructuring. Russian operations were transferred to a consortium with Kremlin-connected ownership. Subject to Russian data and security laws. |
| Weights | Mixed (YaLM-100B was released open) |
| Distribution | Hugging Face (historical), GitHub |
| Risk Level | HIGH |

Model Variants:

| Model | Notes |
| --- | --- |
| YaLM-100B | 100B-parameter model (open weights) |
| YandexGPT (1, 2, 3, 4) | Successive LLM generations (closed) |
| Alice / Alisa | Voice assistant AI |

Specific Risk Factors:

  • YaLM-100B open weights remain available on Hugging Face
  • Yandex restructuring placed Russian operations under more Kremlin-aligned ownership
  • YandexGPT processes data on Russian servers
  • Yandex’s dominant position in Russian search means training data reflects Russian information environment

20. Other Russian-Origin Models

MTS AI

| Attribute | Detail |
| --- | --- |
| Parent Organization | MTS (Mobile TeleSystems) |
| Country | Russian Federation |
| Government Ties | Major Russian telecom, subject to Russian government oversight and the Yarovaya Law |
| Risk Level | HIGH |

AIRI (AI Research Institute, Russia)

| Attribute | Detail |
| --- | --- |
| Parent Organization | AIRI |
| Country | Russian Federation |
| Models | Various research models |
| Risk Level | MEDIUM-HIGH |

Models with Unclear or Mixed Provenance

Models Requiring Additional Scrutiny

| Model/Family | Concern | Risk Level |
| --- | --- | --- |
| Stability AI (various models) | Significant investment from PRC-linked sources; international team but complex funding structure. Now largely open-source. | MEDIUM |
| Mistral (French) | European company, but has explored partnerships with PRC entities. Mistral models are frequently fine-tuned by PRC groups. Monitor. | LOW (base), MEDIUM (PRC fine-tunes) |
| Falcon (UAE/TII) | Technology Innovation Institute is UAE government-funded. The UAE is not an adversary but has complex relationships with the PRC. | LOW-MEDIUM |
| Jamba (AI21, Israel) | Legitimate provenance, but note that some fine-tunes on Hugging Face may be PRC-sourced. | LOW |
| Various Singapore-based models | Singapore’s NUS, SUTD, and A*STAR have significant PRC researcher populations and PRC government-funded collaborations. SEA-LION (AI Singapore) and similar warrant monitoring. | MEDIUM |
| RWKV | Architecture created by Bo Peng (PRC-born, international); the RWKV Foundation is registered outside the PRC, but there is significant PRC community involvement. | MEDIUM |

Fine-Tunes and Derivatives

This section addresses the critical problem of provenance laundering — where Chinese base models are fine-tuned, renamed, and redistributed in ways that obscure their origin.

High-Risk Derivative Patterns

| Base Model | Common Derivatives / Fine-tune Patterns | Risk |
| --- | --- | --- |
| Qwen2.5 (all sizes) | Thousands of fine-tunes on Hugging Face. Many use names that do not reference Qwen. Common patterns: “[username]/[creative-name]-7B”, merged models, GGUF quantizations. Qwen2.5 is the single most fine-tuned Chinese base model. | HIGH — users may unknowingly use a Qwen-based model |
| DeepSeek-R1-Distill | Distilled into Qwen and Llama architectures. The Llama-based distills are particularly deceptive, as they appear to be Meta Llama derivatives. | HIGH — double-layered provenance confusion |
| Yi-34B | Early Chinese open model with an extensive fine-tune ecosystem. Many “uncensored” fine-tunes exist. | HIGH |
| ChatGLM | Numerous fine-tunes, especially in the Chinese NLP community. | HIGH |
| InternVL | Vision-language fine-tunes for specific tasks (OCR, document analysis, etc.). | HIGH |
| BAAI BGE embeddings | Integrated into LangChain, LlamaIndex, and numerous RAG frameworks as the default embedding model. Users of these frameworks may unknowingly rely on PRC-origin embeddings. | HIGH — extremely widespread, often invisible |
| Qwen2.5-Coder | Integrated into coding assistant tools and IDE plugins. May be relabeled. | HIGH |
| HunyuanVideo / Kolors | Fine-tunes for specific video/image generation use cases. | MEDIUM-HIGH |

How to Identify Chinese Base Models in Derivatives

Indicators that a model may be based on a Chinese base model:

  1. Check config.json: Look for the architectures field — “Qwen2ForCausalLM”, “InternLM2ForCausalLM”, “DeepseekV2ForCausalLM”, “ChatGLMForCausalLM”, “BaichuanForCausalLM”, etc.
  2. Check tokenizer: Qwen models use a distinctive tiktoken-based tokenizer. DeepSeek has its own tokenizer patterns.
  3. Check model card: Look for references to base model, though these may be omitted.
  4. Check vocabulary size: Certain vocabulary sizes are characteristic of specific Chinese model families.
  5. Test with sensitive prompts: Ask about Tiananmen Square, Taiwan, Xinjiang — residual censorship behavior may reveal Chinese base model even after fine-tuning.
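The config.json check in indicator 1 can be automated. A minimal sketch in Python, assuming a locally downloaded config.json; the watchlist below is illustrative, not exhaustive:

```python
import json

# Watchlist mapping architecture identifiers to PRC-origin base model
# families (illustrative subset drawn from indicator 1 above).
PRC_ARCHITECTURES = {
    "Qwen2ForCausalLM": "Qwen (Alibaba)",
    "InternLM2ForCausalLM": "InternLM (Shanghai AI Lab)",
    "DeepseekV2ForCausalLM": "DeepSeek",
    "ChatGLMForCausalLM": "GLM (Zhipu AI)",
    "BaichuanForCausalLM": "Baichuan",
}

def flag_prc_base(config_text: str) -> list[str]:
    """Return the PRC model families whose architecture identifiers
    appear in a Hugging Face-style config.json document."""
    config = json.loads(config_text)
    return [PRC_ARCHITECTURES[a]
            for a in config.get("architectures", [])
            if a in PRC_ARCHITECTURES]

# A derivative whose repository name never mentions Qwen may still carry
# the Qwen architecture identifier in its config.json:
sample = '{"architectures": ["Qwen2ForCausalLM"], "vocab_size": 152064}'
print(flag_prc_base(sample))  # → ['Qwen (Alibaba)']
```

The same check can be folded into a pre-deployment review pipeline, combined with the tokenizer and vocabulary-size indicators above.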

Distribution Vectors

Primary Distribution Platforms

| Platform | PRC Models Present | Risk Notes |
| --- | --- | --- |
| Hugging Face | Extensive — all major Chinese model families | Primary international distribution point. Hosts thousands of Chinese models and derivatives. Limited provenance verification. |
| ModelScope | All Chinese models | Alibaba-operated platform. PRC-based infrastructure. Primary Chinese model hub. |
| GitHub | Code, weights, training scripts | Many Chinese models are distributed via GitHub repositories. |
| Ollama | Qwen, DeepSeek, Yi, GLM, others | Easy one-command download makes adoption frictionless. Many users are unaware of provenance. |
| OpenRouter | Multiple Chinese models via API | Aggregator providing API access to Chinese models alongside Western ones. |
| Together AI | Hosts Chinese open models | US-based but hosts Chinese models for inference. |
| Replicate | Various Chinese models | US-based model hosting. |
| GGUF/TheBloke quantizations | Extensive | Quantized versions of Chinese models optimized for local inference; very widely downloaded. |
| vLLM/TGI deployments | Various | Chinese models deployed via open inference frameworks. |
| LM Studio | Various | Desktop app for running local models; includes Chinese models in its model browser. |

Supply Chain Integration Points

Chinese-origin models or components enter Western AI infrastructure through:

  1. Direct model use — downloading and running Qwen, DeepSeek, etc.
  2. Embedding models — BAAI BGE embeddings in RAG pipelines
  3. Coding assistants — CodeGeeX, Qwen2.5-Coder, DeepSeek-Coder in IDEs
  4. Merged/fine-tuned models — derivatives that obscure base model origin
  5. Frameworks and libraries — Chinese AI frameworks (PaddlePaddle, MindSpore) included in ML pipelines
  6. Reward models — Skywork-Reward and similar used to train/align other models
  7. Training data — Chinese-curated datasets used to train non-Chinese models
  8. Benchmarks and evaluations — Chinese-operated benchmarks (BAAI FlagEval, etc.) influencing model development priorities
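Integration point 5 can be checked mechanically. A minimal sketch that scans a requirements.txt-style dependency list for PRC-origin ML frameworks; `paddlepaddle` and `mindspore` are the real PyPI package names, but the watchlist is illustrative and should be extended for your environment:

```python
# Illustrative watchlist of PRC-origin ML framework packages on PyPI.
WATCHLIST = {
    "paddlepaddle": "PaddlePaddle (Baidu)",
    "mindspore": "MindSpore (Huawei)",
}

def scan_requirements(text: str) -> list[str]:
    """Flag watchlisted packages in a requirements.txt-style listing."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # Drop environment markers and version specifiers to isolate the name.
        pkg = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            pkg = pkg.split(sep)[0]
        pkg = pkg.strip().lower()
        if pkg in WATCHLIST:
            findings.append(WATCHLIST[pkg])
    return findings

reqs = "numpy==1.26.4\npaddlepaddle==2.6.0\nrequests>=2.31\n"
print(scan_requirements(reqs))  # → ['PaddlePaddle (Baidu)']
```

In practice this belongs in CI, run against lockfiles rather than loose requirements files so that transitive dependencies are also covered.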

Risk Assessment Methodology

Risk Levels Defined

| Level | Definition |
| --- | --- |
| CRITICAL | Entity is on the US Entity List, directly state-owned, or directly sanctioned. Use likely creates legal/compliance violations. |
| HIGH | PRC/Russian entity subject to compelled-cooperation laws, with demonstrated government ties, and models distributed internationally. Significant national security risk. |
| MEDIUM-HIGH | PRC/Russian entity with less direct government ties but fully subject to an adversary-nation legal framework. Models have some international distribution. |
| MEDIUM | Mixed provenance, indirect PRC/Russian ties, or unclear funding. Warrants monitoring and due diligence. |
| LOW | Non-adversary origin but with some connection points (e.g., PRC fine-tunes of Western base models). |

Risk Factors Weighted

  1. Legal jurisdiction (30%) — Is the entity subject to PRC National Intelligence Law or Russian equivalent?
  2. Government relationship (25%) — State-owned, Entity-Listed, government-funded, party committee?
  3. Distribution reach (20%) — How widely are models distributed in Western ecosystems?
  4. Data exposure (15%) — Do models/APIs route data to adversary-nation servers?
  5. Opacity (10%) — Are weights open (auditable) or closed (unverifiable)?
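The weighting above reduces to a simple composite score. A minimal sketch, assuming each factor is scored on a normalized 0-to-1 scale; the example entity and its scores are hypothetical:

```python
# Weights mirror the percentages above (they sum to 1.0).
WEIGHTS = {
    "legal_jurisdiction": 0.30,
    "government_relationship": 0.25,
    "distribution_reach": 0.20,
    "data_exposure": 0.15,
    "opacity": 0.10,
}

def composite_risk(scores: dict[str, float]) -> float:
    """Weighted sum of the five factors; each score is in [0, 1],
    so the composite also lands in [0, 1]."""
    return sum(w * scores.get(factor, 0.0) for factor, w in WEIGHTS.items())

# Hypothetical open-weight PRC entity: full PRC jurisdiction, strong
# government ties, wide Western distribution, local inference possible
# (low data exposure), open weights (low opacity).
example = {
    "legal_jurisdiction": 1.0,
    "government_relationship": 0.8,
    "distribution_reach": 0.9,
    "data_exposure": 0.2,
    "opacity": 0.0,
}
print(round(composite_risk(example), 2))  # → 0.71
```

Thresholds mapping composite scores onto the CRITICAL/HIGH/MEDIUM/LOW levels would be set by policy; Entity List or sanctions status should override any numeric score.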

Consolidated Risk Matrix

| # | Model Family | Parent Entity | Country | Risk Level | Weights | Key Risk Factor |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | SenseTime/SenseNova | SenseTime | PRC | CRITICAL | Mixed | US Entity List |
| 2 | iFlytek/Spark | iFlytek | PRC | CRITICAL | Closed | US Entity List |
| 3 | Huawei/PanGu | Huawei | PRC | CRITICAL | Mixed | US Entity List + sanctions |
| 4 | Megvii/YOLOX | Megvii | PRC | CRITICAL | Open | US Entity List |
| 5 | TeleChat | China Telecom | PRC | CRITICAL | Open | State-owned enterprise |
| 6 | BAAI/Aquila/BGE | BAAI | PRC | CRITICAL | Open | State-funded lab; BGE embeddings ubiquitous |
| 7 | Sber/GigaChat | Sberbank | Russia | CRITICAL | Mixed | State-owned bank; sanctioned |
| 8 | InternLM/InternVL | Shanghai AI Lab | PRC | HIGH | Open | Government-established lab |
| 9 | Qwen | Alibaba | PRC | HIGH | Open | Most-adopted PRC model family in West |
| 10 | DeepSeek | DeepSeek/High-Flyer | PRC | HIGH | Open | Massive adoption post-R1; data to PRC servers |
| 11 | ERNIE | Baidu | PRC | HIGH | Closed | Deep government integration |
| 12 | GLM/ChatGLM | Zhipu AI | PRC | HIGH | Mixed | Tsinghua origins; CodeGeeX in IDEs |
| 13 | Yi | 01.AI | PRC | HIGH | Open | Wide Western distribution; PRC legal jurisdiction |
| 14 | Baichuan | Baichuan Inc. | PRC | HIGH | Mixed | PRC investor network |
| 15 | XVERSE | XVERSE Tech | PRC | HIGH | Open | PRC jurisdiction; Tencent alumni |
| 16 | Hunyuan | Tencent | PRC | HIGH | Mixed | WeChat operator; HunyuanVideo widely adopted |
| 17 | Doubao/Skylark | ByteDance | PRC | HIGH | Mixed | TikTok parent; CCP golden share |
| 18 | Step | StepFun | PRC | HIGH | Mixed | GOT-OCR widely adopted; PRC jurisdiction |
| 19 | KLING/Kolors | Kuaishou | PRC | HIGH | Mixed | Video generation widely adopted |
| 20 | MiniCPM | ModelBest/Tsinghua | PRC | HIGH | Open | On-device deployment focus; Tsinghua ties |
| 21 | Skywork | Kunlun Tech | PRC | HIGH | Open | Reward model used in RLHF pipelines |
| 22 | Orion | OrionStar | PRC | HIGH | Open | PRC jurisdiction |
| 23 | BlueLM | vivo | PRC | HIGH | Open | PRC jurisdiction |
| 24 | Fengshenbang/Ziya | IDEA Research | PRC | HIGH | Open | Government-supported institute |
| 25 | Moonshot/Kimi | Moonshot AI | PRC | MEDIUM-HIGH | Closed | PRC API; growing international use |
| 26 | MiniMax | MiniMax | PRC | MEDIUM-HIGH | Mixed | SenseTime alumni; Hailuo AI video |
| 27 | YaLM/YandexGPT | Yandex | Russia | HIGH | Mixed | Kremlin-aligned restructuring |
| 28 | MTS AI models | MTS | Russia | HIGH | Various | Russian telecom; Yarovaya Law |
| 29 | RWKV | RWKV Foundation | Mixed | MEDIUM | Open | PRC community involvement; non-PRC entity |
| 30 | Singapore-origin models | Various | Singapore | MEDIUM | Various | PRC researcher/funding connections |

Recommendations

Immediate Actions

  1. Inventory all AI models in use across the organization, including embedded components (embeddings, reward models, tokenizers)
  2. Check for BAAI BGE embeddings — these are the single most widely adopted PRC-origin AI component in Western RAG/search systems, often included as defaults in LangChain, LlamaIndex, and similar frameworks
  3. Audit coding assistants and IDE plugins — check for CodeGeeX, Qwen-Coder, or DeepSeek-Coder integrations
  4. Block API access to PRC-hosted model endpoints (DeepSeek API, Moonshot API, Zhipu API, etc.) from all networks handling sensitive information
  5. Review Ollama and LM Studio installations — users may have downloaded Chinese models for local inference without organizational awareness
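Action 5 can be partially automated. A minimal sketch that walks an Ollama manifests directory and flags locally pulled models whose names match known PRC-origin families; the default manifest location (`~/.ollama/models/manifests`) varies by platform and install, and the name patterns are illustrative, not exhaustive:

```python
import os
import re

# Illustrative name patterns for PRC-origin model families.
PRC_NAME_PATTERN = re.compile(r"qwen|deepseek|glm|internlm|minicpm|\byi\b", re.I)

def flag_local_ollama_models(manifests_dir: str) -> list[str]:
    """Walk an Ollama manifests tree and return flagged model names.
    Each tag file sits in a directory named after the model,
    e.g. .../registry.ollama.ai/library/qwen2.5/7b."""
    flagged = []
    for root, _dirs, files in os.walk(manifests_dir):
        model = os.path.basename(root)
        if files and PRC_NAME_PATTERN.search(model) and model not in flagged:
            flagged.append(model)
    return flagged
```

A fleet-wide version of this check, run via endpoint management, gives a quick inventory of ungoverned local model use; matches should then be verified against the architecture identifiers in the manifest rather than trusted on name alone.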

Policy Recommendations

  1. Establish a model provenance verification process — before any AI model is deployed, verify its origin by checking architecture identifiers, tokenizer patterns, and model card metadata
  2. Maintain an approved model list — whitelist specific model families and versions that have passed security review
  3. Treat open-weight and closed PRC models as posing different risks depending on the threat scenario — open weights permit auditing and fully local deployment with no PRC-side logging, which is preferable for some use cases, but they also give an adversary full insight into what the model “knows” and can do
  4. Monitor fine-tune ecosystems — track popular community fine-tunes that may be based on Chinese base models
  5. Engage with Hugging Face and other platforms on provenance labeling and transparency requirements
  6. Treat any model processing classified or sensitive information as requiring provenance verification — no exceptions for “convenience” or “it works well”

What “Open Weights” Does and Does Not Mitigate

| Risk | Open Weights Mitigates? | Notes |
| --- | --- | --- |
| Backdoors / trojans in weights | Partially | Weights can be scanned, but neural-network trojans are extremely difficult to detect with current techniques |
| Censorship / bias patterns | Yes | Can be tested and measured |
| Data exfiltration via API | Yes | Local inference eliminates network data flow to the PRC |
| Training data poisoning | No | Cannot determine what was in the training data from weights alone |
| Influence over model ecosystem | No | Widely adopted base models shape the entire derivative ecosystem |
| Supply chain dependency | No | Reliance on PRC base models creates strategic dependency regardless of weight openness |
| Steganographic communication | No | Models could theoretically embed information in outputs that is not detectable without the key |

This catalog should be updated quarterly; the Chinese and Russian AI model ecosystems are evolving rapidly, and new model families, variants, and distribution channels emerge continuously.

Document ends.