DeepSeek V4
DeepSeek Version History | V1, V2, V3, R1, V4 Evolution Timeline
Follow the complete evolution of DeepSeek, from its first release in Jan 2024 to V4 in 2026
Since the first release in Jan 2024, each DeepSeek iteration has brought a major technical advance. From the initial 67B-parameter model to the upcoming V4, DeepSeek keeps pushing the boundaries of open-source AI.
DeepSeek Official Release
DeepSeek LLM
The first open-source version, offered at 7B and 67B scales. The 67B version surpasses LLaMA-2 70B on code, math, and reasoning tasks. It was trained on 2T tokens and demonstrated the strength of a Chinese team in large models.
Vision-Language Model Released
DeepSeek-VL
An open-source multimodal model that supports 1024×1024 high-resolution image understanding. It performs well across multiple vision-language tasks and adds multimodal capability to the DeepSeek ecosystem.
MoE Architecture Major Breakthrough
DeepSeek-V2
Adopts a Mixture-of-Experts (MoE) architecture with 236B total parameters, of which 21B are active per token, and supports a 128K context. Compared with DeepSeek 67B, training cost is reduced by 42.5%, KV cache by 93.3%, and maximum generation throughput is improved to 5.76x.
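To put the DeepSeek-V2 figures above in proportion, here is a minimal arithmetic sketch. It uses only the numbers quoted in this entry; the function names are hypothetical and chosen for illustration, and the results are simple ratios, not measured benchmarks.

```python
def active_share(active_params_b: float, total_params_b: float) -> float:
    """Fraction of the model's parameters used for each token in an MoE model."""
    return active_params_b / total_params_b


def shrink_factor(reduction: float) -> float:
    """Convert a fractional reduction (e.g. 0.933) into an 'N times smaller' factor."""
    return 1.0 / (1.0 - reduction)


# Figures quoted in the DeepSeek-V2 entry (parameters in billions).
v2_share = active_share(active_params_b=21.0, total_params_b=236.0)
kv_factor = shrink_factor(reduction=0.933)

print(f"Active share per token: {v2_share:.1%}")
print(f"KV cache is roughly {kv_factor:.1f}x smaller")
```

Read this way, only about one parameter in eleven is used for any given token, which is where most of the MoE cost saving comes from.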
Code Expert Model
DeepSeek-Coder-V2
A code-focused MoE model that supports 338 programming languages and a 128K context. Trained on an additional 6T tokens of code data, it scores 89.5% on HumanEval.
Flagship Model Performance Leap
DeepSeek-V3
DeepSeek's strongest model, with 671B total parameters and 37B active per token. It was trained on 14.8T tokens using only 2.788M H800 GPU hours, and training was stable with no rollbacks.
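The training budget above can be turned into two rough back-of-the-envelope numbers. This sketch uses the token and GPU-hour figures quoted in this entry. The hourly rental rate is an assumption made purely for illustration, and the function names are hypothetical. The throughput figure is a whole-run average, not a measured per-GPU speed.

```python
def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Average number of training tokens processed per GPU hour."""
    return tokens / gpu_hours


def rental_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Estimated rental cost for a given number of GPU hours."""
    return gpu_hours * usd_per_gpu_hour


V3_TOKENS = 14.8e12      # 14.8T training tokens (from the entry above)
V3_GPU_HOURS = 2.788e6   # 2.788M H800 GPU hours (from the entry above)
ASSUMED_RATE_USD = 2.0   # ASSUMPTION: illustrative rental price per GPU hour

avg_throughput = tokens_per_gpu_hour(V3_TOKENS, V3_GPU_HOURS)
est_cost = rental_cost_usd(V3_GPU_HOURS, ASSUMED_RATE_USD)

print(f"Average tokens per GPU hour: {avg_throughput:,.0f}")
print(f"Estimated cost at ${ASSUMED_RATE_USD:.2f}/GPU hour: ${est_cost:,.0f}")
```

Changing `ASSUMED_RATE_USD` scales the estimate linearly, so the cost figure is only as reliable as the rate plugged in.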
Reasoning Model Released
DeepSeek-R1
A model focused on complex reasoning that excels at math, programming, and logical reasoning tasks.
V4 Launching Soon (Expected)
DeepSeek-V4
A brand-new MODEL1 architecture, expected to support million-token-level context and FP8 mixed-precision inference. Performance is expected to leap again, with cost efficiency further optimized.
📊 Key Metrics Evolution
| Metric | V1 (2024.01) | V2 (2024.05) | V3 (2024.12) | V4 (2026.02) |
|---|---|---|---|---|
| Total Parameters | 67B | 236B | 671B | TBD |
| Active Parameters | 67B (dense) | 21B | 37B | Expected optimized |
| Context Length | 4K | 128K | 128K | Million-level (expected) |
| Training Data (tokens) | 2T | 8.1T | 14.8T | Expected more |
| Cost Efficiency | Baseline | Training cost ↓ 42.5% | Continuous optimization | ↓ 30%+ (expected) |