DeepSeek-OCR

Vision-language model for document understanding and OCR

Released on 2025.10.20

Overview

DeepSeek-OCR is a vision-language model designed for visual text compression and document understanding. It targets document-to-Markdown conversion, OCR, chart parsing, and image description, with an emphasis on accuracy and efficiency.

Key Features

  • Document to Markdown conversion
  • High-precision OCR with layout preservation
  • Chart and figure parsing
  • Multiple resolution modes (512 to 1280 px)
  • Grounding capability for text localization
  • ~2500 tokens/s inference speed on A100
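
To put the throughput figure above in perspective, here is a minimal back-of-the-envelope sketch in Python. It assumes the ~2500 tokens/s number can be treated as a steady output rate, which may not hold for a single request if the figure was measured under batching. The function names and the 1000 tokens-per-page value are hypothetical and chosen only for illustration; they are not part of DeepSeek-OCR.

```python
import math

# Throughput quoted in the feature list (approximate, on an A100).
QUOTED_TOKENS_PER_SECOND = 2500.0


def estimate_seconds(output_tokens: int,
                     tokens_per_second: float = QUOTED_TOKENS_PER_SECOND) -> float:
    """Rough generation time for a given number of output tokens."""
    if output_tokens < 0 or tokens_per_second <= 0:
        raise ValueError("output_tokens must be >= 0 and tokens_per_second > 0")
    return output_tokens / tokens_per_second


def pages_per_hour(tokens_per_page: int,
                   tokens_per_second: float = QUOTED_TOKENS_PER_SECOND) -> int:
    """Rough page throughput, given an assumed output size per page."""
    if tokens_per_page <= 0 or tokens_per_second <= 0:
        raise ValueError("tokens_per_page and tokens_per_second must be > 0")
    return math.floor(3600 * tokens_per_second / tokens_per_page)


if __name__ == "__main__":
    # Hypothetical workload: 1000 output tokens per page (an assumption).
    print(estimate_seconds(1000))   # seconds per page at the quoted rate
    print(pages_per_hour(1000))     # pages per hour at the quoted rate
```

These estimates ignore image preprocessing, prompt processing, and I/O, so real end-to-end numbers will likely be lower.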

Specifications

  • Architecture: Vision-Language Model
  • Context Length: 8K tokens
  • License: MIT License

Resources