DeepSeek V4

DeepSeek Version History | V1, V2, V3, R1, V4 Evolution Timeline

DeepSeek release history: all versions from V1 (2024) to V4 (2026) | Complete changelog and updates

Since its first release in January 2024, each DeepSeek iteration has brought major technical breakthroughs. From the initial 67B-parameter model to the upcoming V4, DeepSeek has continuously pushed the boundaries of open-source AI.

2024.01

DeepSeek's First Official Release

DeepSeek LLM

The first open-source release, available in 7B and 67B sizes. The 67B model outperforms LLaMA-2 70B on code, math, and reasoning tasks. Trained on 2T tokens, it demonstrated the strength of a Chinese team in large-model development.

7B and 67B versions
Trained on 2T tokens
Surpasses LLaMA-2 70B
Fully open-source model weights
2024.03

Vision-Language Model Released

DeepSeek-VL

An open-source multimodal model that supports high-resolution (1024×1024) image understanding. It performs strongly across multiple vision-language tasks and adds multimodal capability to the DeepSeek ecosystem.

1024×1024 high-resolution input
Multimodal understanding
Open-source weights and training code
Strong visual question answering
2024.05

Major Breakthrough with MoE Architecture

DeepSeek-V2

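Adopts a Mixture-of-Experts (MoE) architecture with 236B total parameters, of which 21B are activated per token, and supports a 128K context window. Compared with DeepSeek 67B, it cuts training cost by 42.5%, shrinks the KV cache by 93.3% (thanks to its Multi-head Latent Attention, MLA), and raises maximum generation throughput to 5.76x.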

236B total, 21B active params
128K long context
Training cost reduced 42.5% (vs. DeepSeek 67B)
KV cache reduced 93.3%
Maximum generation throughput up to 5.76x
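The gap between total and active parameters comes from sparse expert routing: a small router scores every expert for each token, and only the top-scoring few are actually run. The sketch below is a generic top-k gate in plain Python, not DeepSeek's code. DeepSeek-V2's DeepSeekMoE design additionally uses shared experts and fine-grained expert segmentation, which are omitted here. The experts, router logits, and function names (`top_k_route`, `moe_layer`) are hypothetical, and renormalizing the selected weights is one common choice among several.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k experts with the highest router probability and renormalize their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the k routed experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy usage: 8 hypothetical experts, each scaling its input by (index + 1).
calls = []

def make_expert(i):
    def expert(x):
        calls.append(i)  # record which experts were actually evaluated
        return [v * (i + 1) for v in x]
    return expert

experts = [make_expert(i) for i in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up router scores
y = moe_layer([1.0, 2.0], experts, router_logits, k=2)
print(sorted(calls))  # [1, 4]: only 2 of the 8 experts ran
```

With k=2 of 8 experts, each token touches only a quarter of the expert weights. This is the same effect that lets DeepSeek-V2 activate 21B of its 236B parameters, although the real ratio also depends on always-active components such as attention.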
2024.06

Code Expert Model

DeepSeek-Coder-V2

A code-focused MoE model that supports 338 programming languages and a 128K context window. It was further pre-trained from a DeepSeek-V2 checkpoint on an additional 6T tokens, mostly code, and scores 89.5% on HumanEval.

Supports 338 programming languages
128K code context
Additional 6T tokens of training data
89.5% on HumanEval
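HumanEval scores such as the 89.5% above are normally reported as pass@k: the probability that at least one of k sampled completions passes a problem's unit tests, averaged over all problems. The sketch below uses the unbiased estimator from the original HumanEval paper (Chen et al., 2021). The per-problem results in it are made up for illustration, and the source does not say which sampling setup produced the 89.5% figure.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(results, k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results (not DeepSeek data): 3 problems, 10 samples each.
results = [(10, 10), (10, 3), (10, 0)]
print(benchmark_score(results, k=1))
```

With greedy decoding (one sample per problem), pass@1 reduces to the plain fraction of problems solved.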
2024.12

Flagship Model Performance Leap

DeepSeek-V3

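DeepSeek's strongest model at the time of release, with 671B total parameters and 37B activated per token. Trained on 14.8T tokens, its full training required only 2.788M H800 GPU hours, and training remained stable throughout, with no irrecoverable loss spikes or rollbacks.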

671B total, 37B active params
Trained on 14.8T tokens
Only 2.788M H800 GPU hours for full training
Stable training with no rollbacks
Performance comparable to leading closed-source models
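The 2.788M figure is the sum of three training stages reported in the DeepSeek-V3 technical report: pre-training, context extension, and post-training. The report converts it to dollars using an assumed rental price of $2 per H800 GPU hour and notes that the total covers only the final training run, not earlier research or ablation experiments. A quick arithmetic check using that published breakdown (the constant names are illustrative):

```python
# Stage figures from the DeepSeek-V3 technical report, in H800 GPU hours.
PRETRAINING_HOURS = 2_664_000
CONTEXT_EXTENSION_HOURS = 119_000
POST_TRAINING_HOURS = 5_000
ASSUMED_USD_PER_GPU_HOUR = 2.0  # rental price assumed by the report, not a measured bill
PRETRAINING_TOKENS_TRILLIONS = 14.8

total_hours = PRETRAINING_HOURS + CONTEXT_EXTENSION_HOURS + POST_TRAINING_HOURS
estimated_cost_usd = total_hours * ASSUMED_USD_PER_GPU_HOUR
hours_per_trillion_tokens = PRETRAINING_HOURS / PRETRAINING_TOKENS_TRILLIONS

print(f"total: {total_hours / 1e6:.3f}M GPU hours")               # total: 2.788M GPU hours
print(f"cost: ${estimated_cost_usd / 1e6:.3f}M at $2/GPU hour")   # cost: $5.576M at $2/GPU hour
print(f"pre-training: {hours_per_trillion_tokens:,.0f} GPU hours per trillion tokens")  # 180,000
```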
2025.01

Reasoning Model Released

DeepSeek-R1

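A model focused on complex reasoning and trained with large-scale reinforcement learning. It excels at math, programming, and logical reasoning tasks, with reported performance comparable to OpenAI o1.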

Enhanced reasoning capability
Improved math reasoning accuracy
Multi-step logical reasoning
Long-chain reasoning stability
2026.02 (launching soon)

V4 Expected Soon

DeepSeek-V4

Expected to introduce a brand-new architecture (MODEL1), support a context on the order of a million tokens, and use FP8 mixed-precision inference. Performance is expected to leap forward again, with further gains in cost efficiency.

New MODEL1 architecture
Million-level token context (expected)
FP8 mixed precision
Hybrid sparse + dense inference
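FP8 mixed precision keeps most tensors in an 8-bit floating-point format such as E4M3 (4 exponent bits, 3 mantissa bits, largest finite value 448) while precision-sensitive operations stay in higher precision; DeepSeek-V3 already used an FP8 mixed-precision framework for training. The Python sketch below simulates E4M3 rounding with a single per-tensor scale to show where the precision loss comes from. It is an illustration under those assumptions, not DeepSeek's kernels: production recipes typically use finer-grained (per-tile or per-block) scaling, and the helper names are hypothetical.

```python
import math

E4M3_MAX = 448.0                # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6     # smallest normal E4M3 value
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing of E4M3 subnormals

def round_to_e4m3(x: float) -> float:
    """Round a float to the nearest FP8 E4M3 value (saturating, round-half-even)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)  # saturate out-of-range values (including inf)
    if a < E4M3_MIN_NORMAL:
        # Subnormal range: evenly spaced steps of 2**-9
        return sign * round(a / E4M3_SUBNORMAL_STEP) * E4M3_SUBNORMAL_STEP
    m, e = math.frexp(a)  # a == m * 2**e with 0.5 <= m < 1
    # 1 implicit + 3 explicit mantissa bits: m * 16 lands on an integer in [8, 16]
    q = round(m * 16) / 16 * 2.0 ** e
    return sign * min(q, E4M3_MAX)

def quantize_fp8(values):
    """Per-tensor scaling: map the largest magnitude onto E4M3_MAX, then round."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale

def dequantize_fp8(codes, scale):
    """Undo the scaling; the FP8 rounding error remains."""
    return [c / scale for c in codes]

weights = [0.5, -1.25, 3.0, 0.01]
codes, scale = quantize_fp8(weights)
restored = dequantize_fp8(codes, scale)
print(restored)
```

For values that land in the normal range after scaling, each restored value stays within 6.25% (2^-4) of the original, the worst-case relative rounding error for a 3-bit mantissa.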

📊 Key Metrics Evolution

Metric | V1 (2024.01) | V2 (2024.05) | V3 (2024.12) | V4 (2026.03)
--- | --- | --- | --- | ---
Total parameters | 67B | 236B | 671B | TBD
Active parameters | 67B | 21B | 37B | TBD
Context length | 4K | 128K | 128K | Million-level (expected)
Training data (tokens) | 2T | 8.1T | 14.8T | More (expected)
Cost efficiency | Baseline | ↓ 42.5% training cost | Further optimized | ↓ 30%+ (expected)
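Dividing active by total parameters from the Key Metrics table makes the MoE trend concrete. A short Python check using only the table's own numbers (V4 is left out because its figures are TBD):

```python
# (total, active) parameters in billions, from the Key Metrics table.
params = {"V1": (67, 67), "V2": (236, 21), "V3": (671, 37)}

active_fraction = {name: active / total for name, (total, active) in params.items()}

for name, frac in active_fraction.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
# V1: 100.0% of parameters active per token
# V2: 8.9% of parameters active per token
# V3: 5.5% of parameters active per token
```

V3 has about 10x the total parameters of V1 yet activates roughly 55% as many per token (37B vs. 67B).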

🌟 Community Milestones

50,000+ GitHub stars: highly recognized by the open-source community
1,000,000+ model downloads on Hugging Face
100,000+ active developers
1,500+ citations: widely cited in academia

Join DeepSeek's Evolution Journey

Try the latest version on Atlas Cloud and witness the birth of next-generation AI.
