DeepSeek-R1

OpenAI o1-level reasoning, trained primarily through large-scale reinforcement learning

Released on 2025.01.20

Overview

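DeepSeek-R1 is a reasoning model whose performance is comparable to OpenAI o1 on math, code, and reasoning benchmarks. Its precursor, DeepSeek-R1-Zero, was trained with pure reinforcement learning and no supervised fine-tuning on chain-of-thought data. DeepSeek-R1 adds a small cold-start supervised stage before reinforcement learning to improve readability and training stability.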

Key Features

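Comparable to OpenAI o1 on reasoning benchmarks
Trained mainly with reinforcement learning (the R1-Zero variant uses pure RL without SFT on CoT data)
Open-source under the MIT License
Distilled versions available (1.5B to 70B, based on Qwen and Llama)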
Emergent reasoning behaviors
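
The reasoning behaviors listed above appear in the model's output as a visible chain of thought. The following is a minimal sketch, assuming the raw completion wraps its reasoning in <think>...</think> tags ahead of the final answer; check this against the chat template you actually run. The helper name split_reasoning and the sample string are hypothetical, chosen only for illustration.

```python
import re
from typing import Tuple

# Assumption: reasoning is delimited by <think>...</think> in the raw completion.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion string."""
    match = _THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical sample completion, not real model output.
sample = "<think>2 + 2 is 4, doubled is 8.</think>The answer is 8."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

Keeping the two parts separate makes it possible to show only the final answer to users while logging the reasoning for inspection.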

Specifications

Parameters: 671B total, about 37B activated per token (based on DeepSeek-V3)
Architecture: Mixture-of-Experts (MoE) with RL-trained reasoning
Context Length: 128K tokens
Benchmarks (pass@1): AIME 2024 79.8%, MATH-500 97.3%
License: MIT License
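
The benchmark figures above are accuracy percentages. As a rough sketch of how such a score can be computed, assuming it is a pass@1 value averaged over several sampled responses per problem, the function below takes per-problem correctness flags and returns the mean fraction correct as a percentage. The function name pass_at_1 and the toy data are hypothetical and are not actual evaluation results.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Average, over problems, of the fraction of correct samples (in percent)."""
    per_problem = [sum(samples) / len(samples) for samples in results.values()]
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data for illustration only: 4 sampled responses per problem.
toy = {
    "problem_1": [True, True, False, True],     # 3 of 4 correct
    "problem_2": [False, False, False, False],  # 0 of 4 correct
    "problem_3": [True, True, True, True],      # 4 of 4 correct
}
print(round(pass_at_1(toy), 1))  # prints 58.3
```

Averaging over several samples per problem reduces the variance that a single sampled response would introduce on small test sets such as AIME.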

Resources

Research Paper
GitHub
Hugging Face
API