<p align="center">
<img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

[](https://arxiv.org/abs/2408.09869)
[](https://pypi.org/project/docling/)
[](https://github.com/astral-sh/uv)
[](https://github.com/astral-sh/ruff)
[](https://pydantic.dev)
[](https://github.com/pre-commit/pre-commit)
[](https://opensource.org/licenses/MIT)
[](https://pepy.tech/projects/docling)
[](https://apify.com/vancura/docling)
[](https://app.dosu.dev/097760a8-135e-4789-8234-90c8837d7f1c/ask?utm_source=github)
[](https://docling.ai/discord)
[](https://www.bestpractices.dev/projects/10101)
[](https://lfaidata.foundation/projects/)

Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.
## Getting started
🐣 Ready to kick off your Docling journey? Let's dive right into it!
<div class="grid">
<a href="../docling/getting_started/installation/" class="card"><b>⬇️ Installation</b><br />Quickly install Docling in your environment</a>
<a href="../docling/getting_started/quickstart/" class="card"><b>▶️ Quickstart</b><br />Get a jumpstart on basic Docling usage</a>
<a href="../docling/concepts/" class="card"><b>🧩 Concepts</b><br />Learn Docling fundamentals and get a glimpse under the hood</a>
<a href="../docling/examples/" class="card"><b>🧑🏽‍🍳 Examples</b><br />Try out recipes for various use cases, including conversion, RAG, and more</a>
<a href="../docling/integrations/" class="card"><b>🤖 Integrations</b><br />Check out integrations with popular AI tools and frameworks</a>
<a href="../docling/reference/document_converter/" class="card"><b>📖 Reference</b><br />See more API details</a>
</div>

## Features
* 🗂️ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, WebVTT, images (PNG, TIFF, JPEG, ...), LaTeX, plain text, and more
* 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, WebVTT, [DocTags](https://arxiv.org/abs/2503.11576) and lossless JSON
* 📜 Support for several application-specific XML schemas incl. [USPTO](https://www.uspto.gov/patents) patents, [JATS](https://jats.nlm.nih.gov/) articles, and [XBRL](https://www.xbrl.org/) financial reports
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 👓 Support for several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
* 🎙️ Audio support with Automatic Speech Recognition (ASR) models
* 🔌 Connect to any agent using the [MCP server](https://docling-project.github.io/docling/usage/mcp/)
* 💻 Simple and convenient CLI
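
As a rough illustration of the parsing and export features listed above, the sketch below converts a document and returns it as Markdown or JSON. It assumes Docling is installed (for example via `pip install docling`) and uses the `DocumentConverter` class covered in the reference pages. The helper name `export_document` and the constant `SUPPORTED_FORMATS` are hypothetical and exist only for this example.

```python
import json

# Hypothetical subset of export targets handled by this sketch.
SUPPORTED_FORMATS = ("markdown", "json")


def export_document(source: str, fmt: str = "markdown") -> str:
    """Convert `source` (a local path or URL) and return it as text in `fmt`."""
    # Validate the requested format first, so a typo fails fast
    # before any document is loaded or any model is initialized.
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}"
        )

    # Imported lazily so this module can be loaded even where
    # Docling is not installed (assumption: `pip install docling`).
    from docling.document_converter import DocumentConverter

    # Run the conversion pipeline; the result carries a DoclingDocument.
    result = DocumentConverter().convert(source)
    document = result.document

    if fmt == "markdown":
        return document.export_to_markdown()
    # Serialize the document's dictionary form as a JSON string.
    return json.dumps(document.export_to_dict())
```

With Docling available, a call such as `export_document("report.pdf")` would return the Markdown rendering of that file, where `report.pdf` is a placeholder path. The format check runs before the import so that an invalid `fmt` is reported without touching the converter.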
### What's new
* 📤 Structured [information extraction][extraction] \[🧪 beta\]
* 📑 New layout model (**Heron**) by default, for faster PDF parsing
* 🔌 [MCP server](https://docling-project.github.io/docling/usage/mcp/) for agentic applications
* 💼 Parsing of XBRL (eXtensible Business Reporting Language) documents for financial reports
* 💬 Parsing of WebVTT (Web Video Text Tracks) files
* 💬 Parsing of LaTeX files
* 📝 Parsing of plain-text files (`.txt`, `.text`) and Markdown supersets (`.qmd`, `.Rmd`)
### Coming soon
* 📝 Metadata extraction, including title, authors, references & language
* 📝 Chart understanding (bar charts, pie charts, line plots, etc.)
* 📝 Complex chemistry understanding (molecular structures)
## What's next
🚀 The journey has just begun! Join us and become a part of the growing Docling community.

- <a href="https://github.com/docling-project/docling">:fontawesome-brands-github: GitHub</a>
- <a href="https://docling.ai/discord">:fontawesome-brands-discord: Discord</a>
- <a href="https://linkedin.com/company/docling/">:fontawesome-brands-linkedin: LinkedIn</a>
## Live assistant
Do you want to leverage the power of AI and get live support on Docling?
Try out the [Chat with Dosu](https://app.dosu.dev/097760a8-135e-4789-8234-90c8837d7f1c/ask?utm_source=github) functionalities provided by our friends at [Dosu](https://dosu.dev/).
## LF AI & Data
Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).
### IBM ❤️ Open Source AI
The project was started by the AI for knowledge team at IBM Research Zurich.

[supported_formats]: ./usage/supported_formats.md
[docling_document]: ./concepts/docling_document.md
[integrations]: ./integrations/index.md