DeepSeek-OCR
Vision-language model for document understanding and OCR
Released on 2025.10.20
Overview
DeepSeek-OCR is a vision-language model designed for visual text compression and document understanding. It converts documents to Markdown, performs OCR, parses charts, and describes images with high accuracy and efficiency.
Key Features
- Document-to-Markdown conversion
- High-precision OCR with layout preservation
- Chart and figure parsing
- Multiple resolution modes (512-1280px)
- Grounding capability for text localization
- ~2500 tokens/s inference speed on A100
Specifications
- Architecture: Vision-Language Model
- Context Length: 8K tokens
- License: MIT License
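As a rough illustration of what the listed figures imply, the sketch below estimates how long it would take to process a given number of tokens at the quoted throughput. It is a back-of-the-envelope calculation, not a benchmark. It assumes that "~2500 tokens/s" is a sustained rate on a single A100 and that "8K" means 8192 tokens. The names `TOKENS_PER_SECOND`, `CONTEXT_TOKENS`, and `estimated_seconds` are hypothetical and are not part of any DeepSeek-OCR API.

```python
# Back-of-the-envelope latency estimate from the figures in this listing.
# Both constants are assumptions derived from approximate published numbers.
TOKENS_PER_SECOND = 2500.0   # "~2500 tokens/s" on an A100 (approximate)
CONTEXT_TOKENS = 8 * 1024    # "8K tokens", assumed here to mean 8192


def estimated_seconds(num_tokens: int,
                      tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the time needed to process num_tokens at a constant rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Time to fill one full context window at the quoted rate.
    full_window = estimated_seconds(CONTEXT_TOKENS)
    print(f"Full {CONTEXT_TOKENS}-token window: about {full_window:.2f} s")
```

Real throughput depends on batch size, resolution mode, and hardware, so the result should be treated as an order-of-magnitude guide only.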