DeepSeek-V2

A strong, economical Mixture-of-Experts model with 236B parameters and a Multi-head Latent Attention (MLA) architecture

Released on 2024.05.06

Overview

DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model with 236B total parameters, of which 21B are activated for each token. It introduces Multi-head Latent Attention (MLA), which compresses the key-value cache into a compact latent vector for efficient inference, and adopts the DeepSeekMoE architecture for economical training.
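
A minimal sketch can make the MLA idea concrete: instead of caching full per-head keys and values, the hidden state is down-projected into a small latent vector that is cached, and keys/values are re-materialized from it at attention time. The PyTorch module below is illustrative only; all dimensions are placeholders, and the decoupled rotary-embedding path used by the real model is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Toy illustration of latent KV compression (not the actual MLA layer)."""

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-projection: one small latent per token is all that gets cached.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-projections recover per-head keys/values from the cached latent.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, d = x.shape
        latent = self.kv_down(x)                       # (b, t, d_latent)
        if kv_cache is not None:                       # append to cached latents
            latent = torch.cat([kv_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = attn.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out), latent              # latent is the new KV cache

attn = LatentKVAttention()
y, cache = attn(torch.randn(1, 16, 1024))                    # prefill 16 tokens
y1, cache = attn(torch.randn(1, 1, 1024), kv_cache=cache)    # decode one token
```

In this toy configuration the cache stores 128 values per token instead of 2 × 1024, which is where the KV-cache savings come from.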

Key Features

  • 236B total parameters (21B activated)
  • Multi-head Latent Attention (MLA)
  • DeepSeekMoE architecture with sparse expert routing (see the sketch after this list)
  • 128K context length
  • Significantly reduced inference cost: roughly 93.3% smaller KV cache and up to 5.76× higher maximum generation throughput than DeepSeek 67B
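
The total-versus-activated parameter gap comes from sparse expert routing: a router picks a few experts per token, so only their weights participate in that token's forward pass. The toy PyTorch layer below shows plain top-k routing; the real DeepSeekMoE design (fine-grained and shared experts, load balancing) is more elaborate, and all sizes here are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k expert routing (not the actual DeepSeekMoE layer)."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(32, 512)).shape)  # torch.Size([32, 512]); 2 of 8 experts run per token
```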

Specifications

  • Parameters: 236B total (21B activated per token)
  • Architecture: MoE (DeepSeekMoE) + MLA
  • Context Length: 128K tokens
  • Training Tokens: 8.1T
  • License: DeepSeek License
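
A hedged loading sketch with Hugging Face transformers follows; the repository id "deepseek-ai/DeepSeek-V2", the need for trust_remote_code, and the hardware notes are assumptions about how the checkpoint is commonly distributed rather than statements from this page, and the full 236B model requires multiple high-memory GPUs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # keep the checkpoint's native precision
    device_map="auto",        # shard across available GPUs (requires accelerate)
    trust_remote_code=True,   # assumes the repo ships custom MLA/MoE modeling code
)

inputs = tokenizer("Multi-head Latent Attention is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```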

Resources