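DeepSeek (深度求索)

DeepSeek (深度求索) is a Chinese artificial intelligence research company that develops large language models (LLMs) and multimodal systems. It is best known for releasing open model weights and for its cost efficiency, which triggered price adjustments across the AI market in late 2024 and early 2025.^[1]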

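History

DeepSeek was founded by Liang Wenfeng, an entrepreneur and co-founder of the hedge fund High‑Flyer (幻方量化). In spring 2023, High‑Flyer spun off its AI research division, which was incorporated as "DeepSeek AI" in May of that year. By 2025 the headcount had grown to about 160.^[2] From the outset the company has declared an open approach: it releases model weights under permissive licenses ("open‑weight") and focuses on fundamental research toward AGI.

Unlike most startups, DeepSeek is funded from High‑Flyer's R&D budget, which, according to its founder, lets the company pursue long-term goals rather than immediate commercialization.^[3]

In January 2025 the company's release of DeepSeek‑R1 drew strong reactions in both the technology and financial sectors. Its claim to have trained a model comparable to GPT‑4 for under US$6 million (against an estimated US$100 million or more for GPT‑4) sent the share prices of major technology companies sharply lower and prompted the industry to rethink the "more compute = better models" paradigm.^[4]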

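Architecture

Mixture‑of‑Experts (MoE) (DeepSeekMoE): Most of DeepSeek's flagship models use a Mixture‑of‑Experts architecture. Unlike "dense" models, in which every parameter is active for each request, an MoE model activates only a small subset of specialized sub-networks ("experts") for each token. DeepSeek developed its own MoE implementation with "shared" experts, fine-grained expert segmentation, and auxiliary-loss-free load balancing, so a model activates only a fraction of its hundreds of billions of parameters, which sharply reduces compute cost.^[5]
Multi‑Head Latent Attention (MLA): A method that compresses the KV cache into latent vectors, saving up to 93% of its memory and supporting context windows of up to 128,000 tokens. The technique is key to processing long texts efficiently.^[6]
FP8 training and Multi‑Token Prediction (MTP): The V3 series uses FP8 mixed precision (8-bit floating point) and multi-token prediction, which speed up training and inference.^[7]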
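The routing idea behind the MoE entry above can be illustrated with a toy layer. This is a minimal sketch, not DeepSeek's implementation: the softmax-then-top-k gate, the renormalized weights, the scalar "experts", and all names (`route_token`, `moe_layer`, `make_expert`) are assumptions made for illustration, and load balancing is left out entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(scores, top_k):
    """Pick the top_k routed experts for one token.

    Returns (expert_index, weight) pairs. Weights are renormalized to
    sum to 1, which is an illustrative choice rather than a documented one.
    """
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: it scales its input."""
    return lambda x: [scale * v for v in x]


def moe_layer(x, shared_experts, routed_experts, router_weights, top_k):
    """Apply every shared expert plus the top_k routed experts to token x."""
    out = [0.0] * len(x)
    # Shared experts run for every token.
    for expert in shared_experts:
        out = [o + v for o, v in zip(out, expert(x))]
    # Router score per routed expert: dot product of its weight row with x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Only the selected routed experts are evaluated.
    for idx, weight in route_token(scores, top_k):
        out = [o + weight * v for o, v in zip(out, routed_experts[idx](x))]
    return out


token = [1.0, 2.0]
shared = [make_expert(1.0)]
routed = [make_expert(float(k)) for k in range(4)]
router = [[0.1, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
output = moe_layer(token, shared, routed, router, top_k=2)
print(output)
```

With `top_k=2`, only two of the four routed experts are evaluated for the token, which is where the compute saving of a sparse model over a dense one comes from.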

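Model family

DeepSeek LLM: Base models with 7 billion and 67 billion parameters (2023); the first bilingual (Chinese/English) release, which outperformed LLaMA‑2 70B on a number of tasks.^[8]
DeepSeek‑Coder (2023): A family of models built for programming (1.3 billion to 33 billion parameters), followed by Coder‑V2 (16B / 236B MoE, 128K context length, support for 338 programming languages).^[9]
DeepSeek‑V2 (May 2024): An MoE LLM with 236 billion parameters (21 billion active) that uses the MLA architecture; trained on 8.1 trillion tokens.^[10]
DeepSeek‑V3 (December 2024): 671 billion parameters (37 billion active); trained for about 2.8 million GPU hours on Nvidia H800s at a cost of about US$5.5 million.^[11]
DeepSeek‑R1 (January 2025): A family of models designed for logical reasoning; the R1‑0528 version approaches OpenAI o3 on AIME 2025 and LiveCodeBench.^[12]
DeepSeek‑VL / VL2: Multimodal vision-language (VL) models (up to 4.5 billion active parameters) that process images with 1024×1024 dynamic tiling.^[13]
DeepSeek‑Math 7B: A specialized model that reaches 51.7% accuracy on the MATH benchmark, approaching GPT‑4.^[14]
DeepSeek‑Prover‑V2: A 671-billion-parameter MoE model for theorem proving in Lean 4; reaches 63.5% accuracy on miniF2F.
R1 distilled models: Open-source versions based on Llama and Qwen, ranging from 1.5 billion to 70 billion parameters.^[15]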
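The figures quoted in the list above imply a few ratios that are easy to check. The short calculation below is only a back-of-the-envelope sketch using the rounded numbers from this article; the variable and function names are made up for illustration.

```python
# Rounded figures quoted in this article (billions of parameters).
MODELS = {
    "DeepSeek-V2": {"total_b": 236, "active_b": 21},
    "DeepSeek-V3": {"total_b": 671, "active_b": 37},
}


def active_fraction(total_b, active_b):
    """Share of parameters that is active for a single token."""
    return active_b / total_b


for name, spec in MODELS.items():
    share = active_fraction(spec["total_b"], spec["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")

# DeepSeek-V3 training figures quoted above (both rounded).
V3_GPU_HOURS = 2.8e6
V3_COST_USD = 5.5e6

cost_per_gpu_hour = V3_COST_USD / V3_GPU_HOURS
print(f"Implied rate: ${cost_per_gpu_hour:.2f} per H800 GPU hour")
```

On these rounded inputs, V2 activates about 8.9% of its parameters per token and V3 about 5.5%, and the V3 cost figure works out to roughly US$1.96 per GPU hour.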

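Key release timeline

Date	Release and key features
2023-11-02	DeepSeek‑Coder v1: the first open‑weight models for code generation.
2023-11-29	DeepSeek LLM 7B/67B: bilingual models trained on 2 trillion tokens.
2024-01-11	DeepSeek‑MoE 16B: debut of the MoE architecture.
2024-02-06	DeepSeek‑Math 7B: a model built for mathematics (51.7% accuracy on MATH).
2024-05-06	DeepSeek‑V2 236B: introduces the MLA and MoE architectures.
2024-06-17	DeepSeek‑Coder‑V2: 128K context length, support for 338 programming languages.
2024-12-13	DeepSeek‑VL2: MoE-based multimodal model.
2024-12-27	DeepSeek‑V3 671B: flagship model with a training cost below US$6 million.
2025-01-20	DeepSeek‑R1 / R1‑Zero: reasoning models trained with reinforcement learning (RL).
2025-01-27	Janus‑Pro: image generation model reported to outperform DALL‑E 3.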

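Performance and benchmarks

DeepSeek‑V3 outperforms Llama 3.1 and Qwen 2.5 on the MMLU and GPQA‑Diamond benchmarks and approaches GPT‑4.^[16]
DeepSeek‑Coder‑V2 scores 72.9% on Arena‑Hard, on par with GPT‑4o and ahead of every open-source model; of the models compared, only Claude‑3.5‑Sonnet does better.^[17]
DeepSeek‑Math 7B reaches 51.7% accuracy on the MATH benchmark, close to Gemini‑Ultra, at only one-tenth of the model size.^[18]
R1‑Zero, trained with reinforcement learning (RL) alone, raised its AIME 2024 pass@1 score from 15.6% to 71%.^[19]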
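The pass@1 metric in the last item is the k = 1 case of pass@k. The article does not say how DeepSeek computed it, so the sketch below assumes the unbiased pass@k estimator that is widely used in code-generation evaluation; `pass_at_k`, `benchmark_pass_at_k`, and the sample counts are illustrative only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n sampled answers of which c are correct.

    Uses 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n samples contains at least one correct answer.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a correct answer
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples correct).
results = [(16, 16), (16, 8), (16, 0), (16, 4)]
print(f"pass@1 = {benchmark_pass_at_k(results, 1):.3f}")
```

For k = 1 the estimator reduces to c / n, the fraction of sampled answers that are correct, averaged over all problems.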

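Economics and API

DeepSeek offers a public API for the V3 and R1 models. Input costs US$0.07 to US$0.14 per million tokens on a cache hit, and output costs US$1.10 to US$2.19 per million tokens, tens of times cheaper than GPT‑4o's rates.^[20]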
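To make the price range concrete, the cost of a workload can be estimated directly from the per-million-token rates quoted above. This is a sketch under stated assumptions: the workload size is invented, every input token is assumed to be a cache hit, and `estimate_cost_usd` is an illustrative helper, not part of any DeepSeek SDK.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD, given token counts and prices per million tokens."""
    per_million = 1_000_000
    input_cost = (input_tokens / per_million) * input_price
    output_cost = (output_tokens / per_million) * output_price
    return input_cost + output_cost


# Price range quoted in this article (USD per million tokens, cache-hit input).
LOW = {"input_price": 0.07, "output_price": 1.10}
HIGH = {"input_price": 0.14, "output_price": 2.19}

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
workload = {"input_tokens": 2_000_000, "output_tokens": 500_000}

low = estimate_cost_usd(**workload, **LOW)
high = estimate_cost_usd(**workload, **HIGH)
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

For this workload the estimate falls between roughly US$0.7 and US$1.4. Output tokens dominate the bill, since their quoted rate is more than ten times the cache-hit input rate.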

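Licensing and open source

Most models are distributed under the MIT or Apache 2.0 license, which permits commercial use. The company publishes model weights on Hugging Face and GitHub but keeps the full datasets and training pipeline private ("open weights, but not fully open source").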

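Industry impact

The release of R1 triggered a one-day drop in the share prices of NVIDIA, Microsoft, and other companies, against a backdrop of news coverage of a "GPT‑4-class model that cost US$6 million".^[21]
The demonstration of successful training on export-restricted Nvidia H800 chips sparked debate about the effectiveness of US sanctions and accelerated the development of Chinese domestic AI accelerators such as Huawei's Ascend 910B.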

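Criticism and limitations

Safety: In HarmBench testing, the R1 model failed to block any of the harmful ("jailbreak") requests, a 100% attack success rate.
Political censorship: The chat version filters out topics that are "sensitive" for the Chinese government, such as the 1989 Tiananmen Square events and the status of Taiwan.
Data storage: User data is stored on servers in China, which limits use of the API by Western companies subject to the GDPR and similar legal regimes.^[22]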

References

Dai, D. et al. (2024). DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture‑of‑Experts Language Models. arXiv:2401.06066.
Ding, Y. et al. (2024). LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. arXiv:2402.13753.
Fedus, W.; Zoph, B.; Shazeer, N. (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv:2101.03961.
He, L. et al. (2025). Scaling Instruction‑Tuned LLMs to Million‑Token Contexts via Hierarchical Synthetic Data Generation. arXiv:2504.12637.
Jegham, N. et al. (2025). Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT. arXiv:2502.16428.
Lepikhin, D. et al. (2020). GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv:2006.16668.
Peng, B. et al. (2023). YaRN: Efficient Context Window Extension of Large Language Models. arXiv:2309.00071.
Shen, Y. et al. (2025). Long‑VITA: Scaling Large Multi‑modal Models to 1 Million Tokens with Leading Short‑Context Accuracy. arXiv:2502.05177.
Su, J. et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
Zhong, M. et al. (2024). Understanding the RoPE Extensions of Long‑Context LLMs: An Attention Perspective. arXiv:2406.13282.

Notes

[1] DeepSeek's low-cost AI spotlights billions spent by US tech // Reuters. 2025-01-27.
[2] Who is Liang Wenfeng, the founder of DeepSeek? // Reuters. 2025-01-28.
[3] Who is Liang Wenfeng, the founder of DeepSeek? // Reuters. 2025-01-28.
[4] DeepSeek's low-cost AI spotlights billions spent by US tech // Reuters. 2025-01-27.
[5] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model // Hugging Face. 2024.
[6] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model // Hugging Face. 2024.
[7] DeepSeek-V3: A Parameter-Efficient MoE Large Language Model with Better Performance // arXiv. 2024.
[8] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism // arXiv. 2024.
[9] DeepSeek-Coder-V2: A More Powerful and Economical Coder // arXiv. 2024.
[10] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model // Hugging Face. 2024.
[11] DeepSeek-V3: A Parameter-Efficient MoE Large Language Model with Better Performance // arXiv. 2024.
[12] DeepSeek-R1: A 671B Parameter MoE LLM with Unprecedented Reasoning Capabilities // arXiv. 2025.
[13] GitHub - deepseek-ai/DeepSeek-VL: Towards Real-World Vision-Language Understanding // GitHub.
[14] DeepSeek-Math: Pushing the Limits of Mathematical Reasoning in Open-Source Models // arXiv. 2024.
[15] DeepSeek-R1: A 671B Parameter MoE LLM with Unprecedented Reasoning Capabilities // arXiv. 2025.
[16] DeepSeek-V3: A Parameter-Efficient MoE Large Language Model with Better Performance // arXiv. 2024.
[17] DeepSeek-Coder-V2: A More Powerful and Economical Coder // arXiv. 2024.
[18] DeepSeek-Math: Pushing the Limits of Mathematical Reasoning in Open-Source Models // arXiv. 2024.
[19] DeepSeek-R1: A 671B Parameter MoE LLM with Unprecedented Reasoning Capabilities // arXiv. 2025.
[20] DeepSeek Explained: Why This AI Model Is Gaining Popularity // DigitalOcean.
[21] DeepSeek's low-cost AI spotlights billions spent by US tech // Reuters. 2025-01-27.
[22] DeepSeek's low-cost AI spotlights billions spent by US tech // Reuters. 2025-01-27.

See also

OpenAI's large language models
Mixture-of-Experts (MoE) models


