DeepSeek-V3
Revolutionary 671B MoE model rivaling GPT-4o at a fraction of the cost
Released on 2024.12.26
Overview
DeepSeek-V3 is a 671B-parameter Mixture-of-Experts (MoE) model that achieves performance comparable to leading closed-source models while being trained at a fraction of their cost. It introduces auxiliary-loss-free load balancing and multi-token prediction (MTP).
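Multi-token prediction trains the model to predict tokens beyond the immediate next one at each position, densifying the training signal. Below is a minimal sketch of one way such an objective can be combined with the usual next-token loss; the function name, tensor layout, and `mtp_weight` value are illustrative assumptions, not DeepSeek's released implementation.

```python
# Sketch of a multi-token prediction (MTP) style training loss.
# Assumptions: the model produces main next-token logits plus extra
# logits that predict tokens further ahead; all names are illustrative.
import torch
import torch.nn.functional as F

def mtp_loss(main_logits, mtp_logits, targets, mtp_weight=0.3):
    """main_logits: (batch, seq, vocab) next-token logits.
    mtp_logits: list of (batch, seq, vocab) tensors; the k-th entry
    predicts the token (k + 2) positions ahead of the input position.
    targets: (batch, seq) token ids."""
    vocab = main_logits.size(-1)
    # Standard next-token loss: position t predicts the target at t + 1.
    loss = F.cross_entropy(
        main_logits[:, :-1].reshape(-1, vocab), targets[:, 1:].reshape(-1)
    )
    # Extra predictions look further ahead; average their losses.
    extra = []
    for k, logits in enumerate(mtp_logits, start=2):
        extra.append(F.cross_entropy(
            logits[:, :-k].reshape(-1, vocab), targets[:, k:].reshape(-1)
        ))
    if extra:
        loss = loss + mtp_weight * torch.stack(extra).mean()
    return loss
```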
Key Features
- 671B total parameters (37B activated)
- Trained for only ~$5.58M in compute cost
- Auxiliary-loss-free load balancing (see the sketch after this list)
- Multi-token prediction (MTP)
- FP8 mixed-precision training
- Outperforms Claude 3.5 Sonnet on many benchmarks
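DeepSeek-V3 keeps experts evenly loaded without an auxiliary balancing loss: a per-expert bias is added to the routing scores only when selecting the top-k experts, and that bias is nudged down for overloaded experts and up for underloaded ones after each step. The sketch below illustrates this idea under those assumptions; the function names, the sign-based update rule, and the `update_speed` value are illustrative, not the released code.

```python
# Sketch of bias-based, auxiliary-loss-free MoE load balancing.
# The bias steers expert selection only; gating weights still come
# from the original, unbiased scores. All names are illustrative.
import torch

def route_tokens(scores, bias, top_k):
    """scores: (num_tokens, num_experts) routing affinity scores.
    bias: (num_experts,) load-balancing bias used only for selection."""
    # Select experts with the biased scores ...
    _, expert_idx = torch.topk(scores + bias, top_k, dim=-1)
    # ... but compute gating weights from the unbiased scores.
    gates = torch.gather(scores, -1, expert_idx).softmax(dim=-1)
    return expert_idx, gates

def update_bias(bias, expert_idx, num_experts, update_speed=1e-3):
    """After each step, lower the bias of overloaded experts and raise
    it for underloaded ones so future tokens spread out more evenly."""
    load = torch.bincount(expert_idx.flatten(), minlength=num_experts).float()
    return bias - update_speed * torch.sign(load - load.mean())
```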
Specifications
- Parameters: 671B total (37B activated)
- Architecture: MoE + Multi-head Latent Attention (MLA) + MTP
- Context Length: 128K tokens
- Training Tokens: 14.8T
- Benchmarks: MMLU 88.5%, HumanEval 82.6%