DeepSeek V4

DeepSeek Version History | V1, V2, V3, R1, V4 Evolution Timeline

Follow the complete evolution of DeepSeek, from its first release in January 2024 to V4 in 2026

Since its first release in January 2024, each DeepSeek iteration has brought a major technical advance. From the initial 67B-parameter model to V4, released in April 2026, DeepSeek has kept pushing the boundaries of open-source AI.

2024

DeepSeek Official Release

DeepSeek LLM

The first open-source release, available at 7B and 67B scales. The 67B version surpasses LLaMA-2 70B on code, math, and reasoning tasks. Both models were trained on 2T tokens, demonstrating the strength of a Chinese team in large models.

7B and 67B dual versions

Trained on 2T tokens

Surpasses LLaMA-2 70B

Fully open-source model weights

2024

Vision-Language Model Released

DeepSeek-VL

An open-source multimodal model that supports image understanding at 1024×1024 resolution. It performs well across multiple vision-language tasks and adds multimodal capability to the DeepSeek ecosystem.

1024×1024 high resolution

Multimodal understanding

Open-source weights and training code

Excellent vision Q&A capability

2024

MoE Architecture Major Breakthrough

DeepSeek-V2

Adopts a Mixture-of-Experts (MoE) architecture with 236B total parameters, of which 21B are active per token, and supports a 128K context. Compared with DeepSeek 67B, training cost is reduced by 42.5%, the KV cache by 93.3%, and maximum generation throughput is 5.76x higher.

236B total, 21B active params

128K ultra-long context

Training cost reduced 42.5%

KV cache reduced 93.3%

Throughput improved 5.76x
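
To make the V2 percentages above easier to read, here is a minimal sketch that converts each quoted reduction into the fraction of the baseline that remains. The helper names are hypothetical and the baseline is simply normalized to 1.0; nothing here comes from DeepSeek tooling.

```python
# Illustrative arithmetic only. Inputs are the V2 figures quoted above;
# the helper names are hypothetical and the baseline is normalized to 1.0.

def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def shrink_factor(reduction_pct: float) -> float:
    """How many times smaller the quantity is than the baseline."""
    return 1.0 / remaining_fraction(reduction_pct)


def time_per_token_fraction(throughput_gain: float) -> float:
    """Relative time per generated token after a throughput multiplier."""
    return 1.0 / throughput_gain


if __name__ == "__main__":
    print(f"Training cost remaining: {remaining_fraction(42.5):.1%}")
    print(f"KV cache remaining:      {remaining_fraction(93.3):.1%}")
    print(f"KV cache shrink factor:  {shrink_factor(93.3):.1f}x")
    print(f"Time per token vs. base: {time_per_token_fraction(5.76):.1%}")
```

Read this way, a 93.3% KV cache reduction leaves about 6.7% of the baseline, which is roughly a 15x smaller cache.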

2024

Code Expert Model

DeepSeek-Coder-V2

A code-focused MoE model that supports 338 programming languages and a 128K context. It was trained on an additional 6T tokens of code data and scores 89.5% on HumanEval.

Supports 338 programming languages

128K code context

Additional 6T tokens of training

HumanEval score of 89.5%

2024

Flagship Model Performance Leap

DeepSeek-V3

DeepSeek's strongest model at the time of release, with 671B total parameters and 37B active per token. It was trained on 14.8T tokens using only 2.788M H800 GPU hours. Training was stable, with no rollbacks.

671B total, 37B active params

Trained on 14.8T tokens

Only 2.788M H800 GPU hours

Stable training with no rollbacks

Performance approaches GPT-4
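
As a rough illustration of what the GPU-hour figure above implies, the sketch below turns GPU hours into an estimated rental cost. The hourly rate is an assumption supplied by the caller, not a figure from this page, and the function name is hypothetical.

```python
# Illustrative arithmetic only. The GPU-hour count is the V3 figure quoted
# above; the hourly rental rate is an ASSUMPTION chosen for illustration.

V3_H800_GPU_HOURS = 2.788e6  # from the text above


def estimated_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Estimated rental cost: GPU hours multiplied by an assumed hourly rate."""
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    assumed_rate = 2.0  # hypothetical USD per H800 GPU hour
    cost = estimated_cost_usd(V3_H800_GPU_HOURS, assumed_rate)
    print(f"Estimated cost at ${assumed_rate:.2f}/GPU-hour: ${cost / 1e6:.3f}M")
```

Changing `assumed_rate` scales the estimate linearly, so the result is only as good as the rate you plug in.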

2025

Reasoning Model Released

DeepSeek-R1

A model focused on complex reasoning that excels at math, programming, and logical reasoning tasks.

Enhanced reasoning capability

Improved math reasoning accuracy

Multi-step logical reasoning

Long-chain reasoning stability

2026

V4 Released & Open-Sourced (MIT)

DeepSeek-V4

Released on April 24, 2026 and open-sourced under MIT, with weights on Hugging Face. Two versions: V4-Pro (1.6T total / 49B active) and V4-Flash (284B / 13B). MoE with CSA+HCA hybrid attention delivers a 1M-token context at very low cost (~27% compute, ~10% KV cache per token vs V3.2). SWE-bench Verified 80.6%.

MoE + CSA+HCA hybrid attention

1M token context (both versions)

V4-Pro 1.6T/49B, V4-Flash 284B/13B

SWE-bench Verified 80.6%, open-source (MIT)

📊 Key Metrics Evolution

Metric	V1 (2024.01)	V2 (2024.05)	V3 (2024.12)	V4 (2026.04)
Total Parameters	67B	236B	671B	1.6T (Pro) / 284B (Flash)
Active Parameters	67B	21B	37B	49B (Pro) / 13B (Flash)
Context Length	4K	128K	128K	1M
Training Data (tokens)	2T	8.1T	14.8T	Large-scale
Cost Efficiency	Baseline	↓ 42.5%	Continuous optimization	~27% compute, ~10% KV cache @1M (vs V3.2)
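
One trend the table implies is that the share of parameters active per token has shrunk as the models moved to MoE. The sketch below computes that share from the table's own figures; the dictionary and function name are illustrative assumptions.

```python
# Figures copied from the table above, in billions of parameters.
# The data structure and function name are illustrative, not an official API.
MODELS = {
    "V1": (67, 67),        # dense model: every parameter is active
    "V2": (236, 21),
    "V3": (671, 37),
    "V4-Pro": (1600, 49),  # 1.6T total
    "V4-Flash": (284, 13),
}


def active_share(total_b: float, active_b: float) -> float:
    """Fraction of the total parameters that are active per token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    share = active_share(total_b, active_b)
    print(f"{name:<9} {total_b:>5}B total  {active_b:>3}B active  {share:6.1%}")
```

On these numbers, the active share falls from 100% for the dense V1 to under 10% for every MoE release.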

🌟 Community Milestones

50,000+

GitHub Stars

Widely recognized by the open-source community

1,000,000+

Model Downloads

Hugging Face downloads

100,000+

Developers

Active users

1,500+

Citations

Widely cited in academia

Join DeepSeek's Evolution Journey

Try the latest version on Atlas Cloud and witness the birth of next-generation AI