DeepSeek-V2
Revolutionary MoE model with 236B parameters and MLA architecture
Released on 2024.05.06
Overview
DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model with 236B total parameters, of which 21B are activated for each token. It introduces the innovative Multi-head Latent Attention (MLA) architecture, which compresses the key-value cache into a latent vector to enable efficient inference.
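MLA reduces inference cost mainly by shrinking the key-value cache: instead of caching full per-head keys and values, the model caches a small latent vector and reconstructs keys and values from it at attention time. The PyTorch sketch below illustrates only the core idea; the module names, dimensions, and omission of details such as decoupled rotary embeddings are simplifications, not the actual DeepSeek-V2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Low-rank key-value compression in the spirit of MLA (illustrative only)."""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a small latent; this latent is what
        # gets cached during generation instead of full per-head K and V.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-project the cached latent back to keys and values at attention time.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, d = x.shape
        q = self.q_proj(x)
        latent = self.kv_down(x)                       # (b, t, d_latent)
        if latent_cache is not None:                   # append to the compressed cache
            latent = torch.cat([latent_cache, latent], dim=1)
        k, v = self.k_up(latent), self.v_up(latent)

        def heads(z):                                  # (b, len, d_model) -> (b, n_heads, len, d_head)
            return z.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        attn = F.scaled_dot_product_attention(heads(q), heads(k), heads(v))
        attn = attn.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(attn), latent             # latent is the new, small KV cache
```

Because only the latent (here 64 dimensions per token, versus 512 for full keys plus values) is cached, memory per generated token drops sharply, which is where the reduced inference cost comes from.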
Key Features
- 236B total parameters (21B activated)
- Multi-head Latent Attention (MLA)
- DeepSeekMoE architecture with sparse expert routing (see the sketch after this list)
- 128K context length
- Significantly reduced inference cost
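The gap between 236B total and 21B activated parameters comes from sparse expert routing: a gating network sends each token to only a few experts, so most parameters sit idle for any given token. The sketch below shows generic top-k routing in PyTorch; the expert counts, shared experts, and load-balancing details of DeepSeekMoE are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustrative, not DeepSeekMoE)."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.gate(x)                   # (batch, seq, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the k chosen experts
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real implementations dispatch only
        # the routed tokens to each expert and add load-balancing objectives.
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                                   # where expert e was picked
            w = (weights * mask).sum(dim=-1, keepdim=True)      # gate weight for expert e
            out = out + w * expert(x)
        return out
```

Only the k selected experts receive a nonzero gate weight per token, which is how a model can hold 236B parameters while activating roughly 21B of them for each token.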
Specifications
- Parameters: 236B (21B activated)
- Architecture: MoE + MLA
- Context Length: 128K tokens
- Training Tokens: 8.1T tokens
- License: DeepSeek License
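For reference, a minimal generation example with Hugging Face transformers is sketched below. The hub identifier "deepseek-ai/DeepSeek-V2" and the use of trust_remote_code are assumptions about how the checkpoint is published, and the full 236B model needs a multi-GPU setup; adjust for your environment.

```python
# Minimal sketch: load the checkpoint and generate text with transformers.
# The model id and dtype/device settings are assumptions, not official guidance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"  # assumed Hugging Face identifier
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",            # spreads the 236B weights across available GPUs
    trust_remote_code=True,
)

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```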