DeepSeek-V3

Revolutionary 671B MoE model rivaling GPT-4o at a fraction of the cost

Released on 2024.12.26

Overview

DeepSeek-V3 is a groundbreaking 671B-parameter Mixture-of-Experts (MoE) model that achieves performance comparable to leading closed-source models while being trained at a fraction of their cost. It introduces an auxiliary-loss-free load-balancing strategy and a multi-token prediction (MTP) training objective.
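
To make the auxiliary-loss-free idea concrete, the sketch below shows a top-k router in which a per-expert bias affects only which experts are selected (not the mixing weights) and is nudged after each step according to observed load, instead of adding a balancing term to the loss. This is a minimal sketch, not the paper's implementation; the function names, the step size `gamma`, and the toy shapes are assumptions.

```python
# Minimal sketch of bias-based, auxiliary-loss-free load balancing for MoE routing.
# Names and constants are illustrative, not the DeepSeek-V3 implementation.
import numpy as np

def route_tokens(affinity, bias, top_k):
    """affinity: (tokens, experts) gating scores; bias: (experts,) routing bias.
    The bias only influences which experts are selected, not the mixing weights."""
    biased = affinity + bias                                  # selection uses biased scores
    chosen = np.argsort(-biased, axis=1)[:, :top_k]
    weights = np.take_along_axis(affinity, chosen, axis=1)    # mixing uses raw scores
    weights /= weights.sum(axis=1, keepdims=True)             # normalize per token
    return chosen, weights

def update_bias(bias, chosen, num_experts, gamma=1e-3):
    """Nudge each expert's bias down if it was overloaded, up if underloaded,
    steering balance without an auxiliary loss term."""
    load = np.bincount(chosen.ravel(), minlength=num_experts)
    return bias - gamma * np.sign(load - load.mean())

# Toy usage: 8 tokens routed over 4 experts, top-2 per token.
rng = np.random.default_rng(0)
affinity = rng.random((8, 4))
bias = np.zeros(4)
chosen, weights = route_tokens(affinity, bias, top_k=2)
bias = update_bias(bias, chosen, num_experts=4)
```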

Key Features

  • 671B total parameters (37B activated)
  • Estimated full training cost of only ~$5.58M in GPU compute
  • Auxiliary-loss-free load balancing
  • Multi-token prediction (MTP); see the sketch after this list
  • FP8 mixed precision training
  • Outperforms Claude 3.5 Sonnet on many benchmarks
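
As a rough illustration of multi-token prediction, the sketch below adds a second head that predicts the token two positions ahead and folds it into the loss alongside the ordinary next-token objective. The module layout, head names, and the `mtp_weight` factor are assumptions for illustration, not DeepSeek-V3's exact MTP design.

```python
# Minimal sketch of a multi-token prediction (MTP) style objective: in addition
# to the usual next-token loss, an extra head predicts the token two positions
# ahead, densifying the training signal. Purely illustrative.
import torch
import torch.nn.functional as F

def mtp_loss(hidden, tokens, head_next, head_next2, mtp_weight=0.3):
    """hidden: (batch, seq, dim) final hidden states; tokens: (batch, seq) ids."""
    logits1 = head_next(hidden[:, :-1])             # predict token t+1 from position t
    loss1 = F.cross_entropy(logits1.flatten(0, 1), tokens[:, 1:].flatten())

    logits2 = head_next2(hidden[:, :-2])            # predict token t+2 from position t
    loss2 = F.cross_entropy(logits2.flatten(0, 1), tokens[:, 2:].flatten())

    return loss1 + mtp_weight * loss2               # weighted sum of both objectives

# Toy usage with random tensors standing in for a transformer's outputs.
vocab, dim = 100, 16
hidden = torch.randn(2, 10, dim)
tokens = torch.randint(0, vocab, (2, 10))
head_next = torch.nn.Linear(dim, vocab)
head_next2 = torch.nn.Linear(dim, vocab)
loss = mtp_loss(hidden, tokens, head_next, head_next2)
```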

Specifications

  • Parameters: 671B total (37B activated per token)
  • Architecture: MoE + MLA + MTP (MLA sketched below)
  • Context Length: 128K tokens
  • Training Tokens: 14.8T tokens
  • Benchmarks: MMLU 88.5%, HumanEval 82.6%
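
The MLA (Multi-head Latent Attention) entry in the architecture row can be illustrated by its core trick: per-head keys and values are reconstructed from a small shared latent, so only that latent needs to live in the KV cache. The sketch below shows just this compression step under made-up dimensions; the class and layer names are assumptions, and it omits RoPE handling and the rest of the attention block.

```python
# Minimal sketch of the Multi-head Latent Attention (MLA) idea: keys and values
# are reconstructed from a small shared latent, so the KV cache stores only the
# latent per token. Dimensions and names are illustrative placeholders, not
# DeepSeek-V3's actual configuration.
import torch
import torch.nn as nn

class LatentKV(nn.Module):
    def __init__(self, dim=512, latent_dim=64, n_heads=8):
        super().__init__()
        self.head_dim = dim // n_heads
        self.n_heads = n_heads
        self.down = nn.Linear(dim, latent_dim)      # compress hidden state -> latent
        self.up_k = nn.Linear(latent_dim, dim)      # expand latent -> per-head keys
        self.up_v = nn.Linear(latent_dim, dim)      # expand latent -> per-head values

    def forward(self, hidden):
        # hidden: (batch, seq, dim). Only `latent` needs to be cached at inference.
        latent = self.down(hidden)
        k = self.up_k(latent).view(*hidden.shape[:2], self.n_heads, self.head_dim)
        v = self.up_v(latent).view(*hidden.shape[:2], self.n_heads, self.head_dim)
        return latent, k, v

# Toy usage: the cached state per token shrinks from dim-sized K and V to latent_dim.
layer = LatentKV()
latent, k, v = layer(torch.randn(2, 16, 512))
```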
