DeepSeek-OCR


Vision-language model for document understanding and OCR

Released on 2025.10.20

Overview

DeepSeek-OCR is a vision-language model designed for visual text compression and document understanding. It converts documents to Markdown, performs OCR, parses charts, and describes images, with an emphasis on accuracy and efficiency.

Key Features

Document to Markdown conversion
High-precision OCR with layout preservation
Chart and figure parsing
Multiple resolution modes (512 to 1280 px)
Grounding capability for text localization
~2500 tokens/s inference speed on A100

Specifications

Architecture: Vision-Language Model
Context Length: 8K tokens
License: MIT License
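
As a rough illustration of what the quoted throughput and context length imply together, the sketch below turns them into a time budget. It assumes that "~2500 tokens/s" is a sustained rate on a single A100 and that "8K" means 8192 tokens; the function names and the 1000-tokens-per-page figure are hypothetical, chosen only for the example, and real throughput will vary with resolution mode and batch size.

```python
# Back-of-the-envelope estimate built only from the figures listed above.
# Assumptions: the ~2500 tokens/s rate is sustained, and "8K" means 8192 tokens.
TOKENS_PER_SECOND = 2500.0
CONTEXT_TOKENS = 8 * 1024


def seconds_for_tokens(num_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the time needed to process num_tokens at the given rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def pages_per_hour(tokens_per_page: int,
                   tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return how many pages fit in one hour for a given token cost per page."""
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    return 3600.0 * tokens_per_second / tokens_per_page


if __name__ == "__main__":
    # Time to fill the whole context window once.
    print(f"Full context: {seconds_for_tokens(CONTEXT_TOKENS):.2f} s")
    # Hypothetical page cost of 1000 tokens, for illustration only.
    print(f"Pages per hour at 1000 tokens/page: {pages_per_hour(1000):.0f}")
```

Under these assumptions a full 8K-token context takes a little over three seconds, so the context window, not the throughput, is the more likely constraint for long documents.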

Resources

GitHub
Hugging Face