Commit Graph

  • ff351fd40c docs: Describe examples (#2262) Mingxuan Zhao 2025-09-16 10:00:38 -04:00
  • 0e95171dd6 feat(RapidOcr): Support generic extra arguments for RapidOcr (#2266) dmorady1 2025-09-16 07:26:10 +02:00
  • 43d3c74bb2 update docs and README Michele Dolfi 2025-09-15 15:44:42 +02:00
  • c5a59eb979 use granite-docling and add to the model downloader Michele Dolfi 2025-09-15 15:39:08 +02:00
  • 0f8728a8d4 typo Michele Dolfi 2025-09-15 15:28:04 +02:00
  • 6a2cfbdbb8 Merge remote-tracking branch 'origin/main' into dev/add-granite-docling-extension Michele Dolfi 2025-09-15 15:26:45 +02:00
  • ad2f738231 chore: update lock (#2265) Michele Dolfi 2025-09-15 11:19:15 +02:00
  • 609d902eef fix: handle empty result from RapidOCR to avoid crash (#2264) Yuie. 2025-09-15 17:04:33 +09:00
  • 10bb0aee2d chore: bump version to 2.52.0 [skip ci] v2.52.0 github-actions[bot] 2025-09-11 16:11:20 +00:00
  • 0700af212c fix: Add missing features in ThreadedStandardPdfPipeline (#2252) Christoph Auer 2025-09-11 16:26:02 +02:00
  • 2c9123419f feat: enrichment steps on all convert pipelines (incl docx, html, etc) (#2251) Michele Dolfi 2025-09-11 15:09:00 +02:00
  • c6965495a2 fix: address deprecation warnings of dependencies (#2237) Michele Dolfi 2025-09-10 14:38:34 +02:00
  • f8cc545bab docs: add an example of RAG with OpenSearch (#2238) Cesar Berrospi Ramis 2025-09-10 14:37:22 +02:00
  • e5cd7020bd docs: Add instructions for using Docling with MCP to README (#2219) Roy Derks 2025-09-10 01:02:28 -07:00
  • 1324eb75fc add modified test results dev-granite-docling-table Michele Dolfi 2025-09-10 08:43:29 +02:00
  • a4efd70410 dev: use granite-docling for table structure Michele Dolfi 2025-09-09 18:16:16 +02:00
  • 55f5f3752f docs: Document VLM support requirement in extraction example (#2231) Tamás Bitai 2025-09-09 13:45:55 +02:00
  • ae9ec37cf1 doing some experiments with granite-docling dev/analysis-for-granite-docling Peter Staar 2025-09-08 06:03:18 +02:00
  • 0e2f370f4f updated the model specs Peter Staar 2025-09-05 16:58:43 +02:00
  • df60673992 chore: bump version to 2.51.0 [skip ci] v2.51.0 github-actions[bot] 2025-09-05 13:01:33 +00:00
  • c1dcb0597d adding granite-docling preview Peter Staar 2025-09-05 15:00:05 +02:00
  • b49d1ad4f1 feat: updating default parameters to get better performance with docling-parse (#2208) Peter W. J. Staar 2025-09-05 14:06:21 +02:00
  • a9f41b088e docs: add information extraction example (#2199) Panos Vagenas 2025-09-05 11:27:09 +02:00
  • b3d7542061 feat: updated the backend for new docling-parse (#2187) Peter W. J. Staar 2025-09-05 10:42:31 +02:00
  • 2c3f6faf3d chore: update deprecation note for OcrEngine (#2200) Alina Ryan 2025-09-05 02:24:14 -04:00
  • effd9de250 updated the ground-truth output dev/update-to-latest-docling-parse-again Peter Staar 2025-09-04 05:22:54 +02:00
  • cffa6e05d0 reformatted code Peter Staar 2025-09-03 16:22:19 +02:00
  • 0ec99e0f37 updated docling to start running the tests ... Peter Staar 2025-09-03 16:09:51 +02:00
  • 3419c42f10 chore: bump version to 2.50.0 [skip ci] v2.50.0 github-actions[bot] 2025-09-03 11:39:08 +00:00
  • e38aa0f7f2 feat: Heron layout model as new default (#1971) Nikos Livathinos 2025-09-03 12:45:22 +02:00
  • 293e81bf9d fix(html): access to variable not yet declared (#2171) Cesar Berrospi Ramis 2025-09-02 07:59:55 +02:00
  • d68d8b678e chore: bump version to 2.49.0 [skip ci] v2.49.0 github-actions[bot] 2025-09-01 16:39:43 +00:00
  • 4d94e38223 fix(pypdfium2): Fix OCR bounding box misalignment caused by mismatched rotation metadata (#2039) AndrewTsai0406 2025-09-01 23:22:43 +08:00
  • 9f4bc5b2f1 feat: [Beta] Extraction with schema (#2138) Christoph Auer 2025-09-01 16:09:48 +02:00
  • a283ccff25 feat(msexcel): set ContentLayer.INVISIBLE for invisible sheet (#1876) Qiefan Jiang 2025-09-01 19:53:45 +08:00
  • be26044f14 chore: update docling-core lock (#2169) Panos Vagenas 2025-09-01 13:46:10 +02:00
  • 9f0286bcac fix: translation example (#2166) Shikhar Bhardwaj 2025-09-01 14:34:46 +05:30
  • 9904d14e6a fix: extend offline mode for rapidocr fonts (#2155) geoHeil 2025-09-01 09:15:47 +02:00
  • 96cab6b536 docs: enrich landing pages (#2165) Panos Vagenas 2025-08-29 17:19:05 +02:00
  • 946ea1c2cb chore: Replace the layout_predictor.predict_batch() with layout_predictor.predict() in a loop nli/layout_heron2 Nikos Livathinos 2025-08-28 15:14:51 +02:00
  • 36d44f1225 chore: Add more logs in LayoutModel Nikos Livathinos 2025-08-28 14:24:47 +02:00
  • baaf2698b4 chore: debug_heron.py: prepend the name in the saved files Nikos Livathinos 2025-08-28 13:47:50 +02:00
  • d8ca358ae8 chore: Add debugging logs in LayoutModel Nikos Livathinos 2025-08-28 13:45:33 +02:00
  • 78f81e2c59 chore: Print the PagElements input to the ReadingOrder model Nikos Livathinos 2025-08-28 10:15:27 +02:00
  • 7debe3d5ec chore: debug_heron.py: Save exported json with pretty format Nikos Livathinos 2025-08-27 18:19:52 +02:00
  • 32461ff258 chore: debug_heron.py: Update test file Nikos Livathinos 2025-08-27 18:04:22 +02:00
  • c54d511c20 chore: debug_heron.py: Disable OCR Nikos Livathinos 2025-08-27 17:31:33 +02:00
  • 6ce3cd5763 chore: debug_heron.py update the test file Nikos Livathinos 2025-08-27 16:38:40 +02:00
  • 784283a50a chore: Update test data for Heron in Linux Nikos Livathinos 2025-08-27 14:16:13 +00:00
  • 552a606b4e chore: TMP script to debug heron Nikos Livathinos 2025-08-27 16:07:49 +02:00
  • 13255ad718 Merge from main cau/multi-stage-vlm-pipeline Christoph Auer 2025-08-27 15:28:47 +02:00
  • a9dcd43a7c fix: Ensure that the visualisations happen on copies of the page image Nikos Livathinos 2025-08-27 14:16:56 +02:00
  • fb3b7b93ae chore: bump version to 2.48.0 [skip ci] v2.48.0 github-actions[bot] 2025-08-26 05:29:31 +00:00
  • fa3327e1a6 fix(html): preserve code blocks in list items (#2131) Cesar Berrospi Ramis 2025-08-26 06:43:48 +02:00
  • c0268416cf chore: add analytics (#2133) Michele Dolfi 2025-08-25 18:25:38 +02:00
  • 1435fc3b81 Update test GT Christoph Auer 2025-07-23 14:05:30 +02:00
  • 83c45b5648 Update docling-models tag for TableFormer Christoph Auer 2025-07-23 13:39:50 +02:00
  • 969115b1dd Use default layout model in model_downloader default args Christoph Auer 2025-07-23 08:51:07 +02:00
  • e0482723c4 Use default layout model in model_downloader default args Christoph Auer 2025-07-23 08:50:22 +02:00
  • a982995fb7 feat: Switch default layout model to DOCLING_LAYOUT_HERON. Update the unit test data. Nikos Livathinos 2025-07-22 17:30:16 +02:00
  • d32d2c97e1 chore: PR approval reminder (#2132) Michele Dolfi 2025-08-25 15:08:37 +02:00
  • 3f60a0fa78 feat: Upgrade to RapidOCR 3.x (#2088) geoHeil 2025-08-25 13:10:33 +03:00
  • 2aef5cf328 chore: bump version to 2.47.1 [skip ci] v2.47.1 github-actions[bot] 2025-08-23 14:11:33 +00:00
  • 488f6cdd2d fix: vllm extra only for linux x86_64 (#2126) Michele Dolfi 2025-08-23 13:33:15 +02:00
  • 6736e66bb4 style: show converted page count in PaginatedPipeline debug statement (#2124) Raphael Norman-Tenazas 2025-08-23 06:13:20 -04:00
  • b04e205d1e chore: bump version to 2.47.0 [skip ci] v2.47.0 github-actions[bot] 2025-08-22 14:15:39 +00:00
  • cdf079dd06 feat(CLI): Option to download arbitrary HuggingFace model (#2123) VIktor Kuropiantnyk 2025-08-22 15:23:29 +02:00
  • 449bde0a6c test: update docx reference results (#2122) Michele Dolfi 2025-08-22 14:26:36 +02:00
  • 3c660c0511 feat: batching support for VLMs in transformers backend, add initial VLLM backend (#2094) Christoph Auer 2025-08-22 13:17:33 +02:00
  • 3f03709885 fix: Improve numbered list detection for msword docs (#2100) Nikhil Verma 2025-08-22 14:08:34 +05:30
  • 94fcc46aa9 feat(html): Support formatting tags in HTML texts (#2111) krrome 2025-08-22 10:37:34 +02:00
  • e76298c40d docs: DPK pipeline example using docling library (#2112) Maroun Touma 2025-08-21 04:14:36 -04:00
  • cc66773890 draft for model and stages redesign adr-model-stages Michele Dolfi 2025-08-21 10:13:17 +02:00
  • 8996d612aa docs: add Getting Started page (#2113) Panos Vagenas 2025-08-21 08:44:53 +02:00
  • 555506d8e6 chore: bump version to 2.46.0 [skip ci] v2.46.0 github-actions[bot] 2025-08-20 15:25:07 +00:00
  • 76d2cb76b3 chore: update docling-core lock (#2110) Panos Vagenas 2025-08-20 16:41:48 +02:00
  • 684adc17df Add extra_processor_kwargs Christoph Auer 2025-08-20 14:19:50 +02:00
  • 5f57ff2a45 perf: Clean up resources with docling-parse v4, no parsed_page output by default (#2105) Christoph Auer 2025-08-20 10:46:31 +02:00
  • c5f2e2fdd6 fix(HTML): parse footer tag as a group in furniture content layer (#2106) Cesar Berrospi Ramis 2025-08-20 08:42:25 +02:00
  • 8820b5558b perf: speed up function _parse_orientation (#1934) mohammed ahmed 2025-08-19 11:55:18 +03:00
  • 956f82f115 chore: upgrade dependencies in lock file (#2093) Michele Dolfi 2025-08-19 10:11:44 +02:00
  • 6bbb8e6340 Add GoT OCR 2.0 Christoph Auer 2025-08-18 15:57:06 +02:00
  • b5b7e6dd5c Add GoT OCR 2.0 Christoph Auer 2025-08-18 15:57:06 +02:00
  • d2494da8b8 feat: new code formula model (#2042) Matteo 2025-08-18 16:01:46 +02:00
  • 4a107f4f57 Adjust example instatiation of multi-stage VLM pipeline Christoph Auer 2025-08-18 14:36:42 +02:00
  • 3d07f1c78e Cleanup hf_transformers_model batching impl Christoph Auer 2025-08-18 13:37:46 +02:00
  • c3a7d1d999 chore: bump version to 2.45.0 [skip ci] v2.45.0 github-actions[bot] 2025-08-18 10:25:51 +00:00
  • 31087f3fcc feat: add backend for METS with Google Books profile (#1989) Michele Dolfi 2025-08-18 11:43:20 +02:00
  • fead482e92 Merge from main, include decode_response Christoph Auer 2025-08-18 11:29:15 +02:00
  • e372cfe01a Small fixes Christoph Auer 2025-08-18 11:12:02 +02:00
  • 9687297262 feat(html): Support in-line anchor tags in HTML texts (#1659) krrome 2025-08-18 09:57:16 +02:00
  • 76c1fbd6e8 docs: Add docling Quarkus integration (#2083) Eric Deandrea 2025-08-18 00:55:51 -04:00
  • f42676aab9 Implement proper batch inference for HuggingFaceTransformersVlmModel Christoph Auer 2025-08-15 17:56:14 +02:00
  • 1aa522792a Tweak defaults Christoph Auer 2025-08-15 14:49:34 +02:00
  • 16fea9cd8b Add VLLM backend support, optimize process_images Christoph Auer 2025-08-15 13:18:02 +02:00
  • 18b1a43744 Fix KeyboardInterrupt behaviour Christoph Auer 2025-08-14 21:11:40 +02:00
  • 52b54b21c3 Remove prints Christoph Auer 2025-08-14 20:48:34 +02:00
  • c4de11bdb3 Add VLM task interpreters Christoph Auer 2025-08-14 20:48:10 +02:00
  • c8737f71da Add VLM task interpreters Christoph Auer 2025-08-14 20:44:23 +02:00
  • 78c13e1dad Add multithreaded VLM pipeline Christoph Auer 2025-08-13 14:54:23 +02:00