Commit Graph

  • f7f31137f1 fix: allow custom torch_dtype in vlm models (#1735) Michele Dolfi 2025-06-10 03:52:15 -05:00
  • 3a76433b83 Update test files dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:31 +02:00
  • 5fac357995 Merge branch 'main' of github.com:docling-project/docling into dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:15 +02:00
  • 49b10e7419 docs: add open webui (#1734) Michele Dolfi 2025-06-10 02:35:20 -05:00
  • 52b8b9163f Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-06 20:53:40 +02:00
  • 9dbcb3d7d4 fix: Improve extraction from textboxes in Word docs (#1701) AndrewTsai0406 2025-06-06 17:37:46 +08:00
  • 2bc564ccef Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-05 22:20:09 +02:00
  • a2b83fe4ae fix: Add WEBP to the list of image file extensions (#1711) Eugene 2025-06-05 11:09:27 +04:00
  • 40df0d74ad chore: bump version to 2.36.1 [skip ci] v2.36.1 github-actions[bot] 2025-06-04 11:43:13 +00:00
  • 8846f1a393 fix: remove typer and click constraints (#1707) Michele Dolfi 2025-06-04 13:06:23 +02:00
  • be42b03f9b docs: flash-attn usage and install (#1706) Michele Dolfi 2025-06-04 11:09:54 +02:00
  • 96c54dba91 chore: bump version to 2.36.0 [skip ci] v2.36.0 github-actions[bot] 2025-06-03 13:54:25 +00:00
  • cdd401847a feat: simplify dependencies, switch to uv (#1700) Michele Dolfi 2025-06-03 15:18:54 +02:00
  • 61d0d6c755 test: mark flaky test (#1698) Panos Vagenas 2025-06-03 13:13:44 +02:00
  • cfdf4cea25 feat: new vlm-models support (#1570) Peter W. J. Staar 2025-06-02 17:01:06 +02:00
  • 08dcacc5cb chore: bump version to 2.35.0 [skip ci] v2.35.0 github-actions[bot] 2025-06-02 12:30:26 +00:00
  • 11ca4f7a7b docs: fix typo in index.md (#1676) Edgar Hipp 2025-06-02 12:35:59 +02:00
  • 1c8a1283c4 test: ensure utf-8 in test data utils (#1691) Panos Vagenas 2025-06-02 12:13:19 +02:00
  • 984cb137f6 fix: guess HTML content starting with script tag (#1673) cp_main_20250602 Cesar Berrospi Ramis 2025-06-02 08:43:24 +02:00
  • fa561170f6 chore: Update lock with the dependencies for D-FINE nli/layout_dfine Nikos Livathinos 2025-05-31 16:57:09 +02:00
  • dcc63ae00b Merge branch 'main' into nli/layout_rtdetr_v2 nli/layout_rtdetr_v2 Nikos Livathinos 2025-05-31 16:55:06 +02:00
  • 7aa2be93d6 Merge branch 'main' into nli/layout_dfine Nikos Livathinos 2025-05-31 16:48:28 +02:00
  • 30dafd976d chore: Update dependencies to docling-ibm-models and transformers to support D-FINE layout model Nikos Livathinos 2025-05-31 16:39:25 +02:00
  • 93d98dfa63 test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-29 15:12:55 +02:00
  • 84dc120d39 Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-05-29 15:04:06 +02:00
  • 3942923125 chore: fix or ignore runtime and deprecation warnings (#1660) Cesar Berrospi Ramis 2025-05-28 17:55:31 +02:00
  • b3e0042813 chore: exclude data from GH Linguist (#1671) Panos Vagenas 2025-05-28 15:42:34 +02:00
  • 106951e71e test: add missing ground truth files (#1667) Cesar Berrospi Ramis 2025-05-28 13:26:49 +02:00
  • b356b33059 feat: Add visualization of bbox on page with html export. (#1663) Peter W. J. Staar 2025-05-28 13:10:38 +02:00
  • 51d3450915 fix: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte (#1665) DavidLee 2025-05-27 20:06:05 +08:00
  • 2579d89510 chore: bump version to 2.34.0 [skip ci] v2.34.0 github-actions[bot] 2025-05-22 18:44:45 +00:00
  • fffa865014 test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-22 19:02:59 +02:00
  • af4aaa28af fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Michael Krissgau 2025-05-22 17:45:15 +02:00
  • c2f595d283 fix: fix ZeroDivisionError for cell_bbox.area() (#1636) Said Gürbüz 2025-05-22 13:43:33 +02:00
  • 45265bf8b1 feat(ocr): auto-detect rotated pages in Tesseract (#1167) Clément Doumouro 2025-05-21 18:12:33 +02:00
  • 90875247e5 feat: Establish confidence estimation for document and pages (#1313) Christoph Auer 2025-05-21 12:32:49 +02:00
  • 14d4f5b109 fix(integration): update the Apify Actor integration (#1619) Václav Vančura 2025-05-21 02:47:55 +02:00
  • 84d0889829 chore: bump version to 2.33.0 [skip ci] v2.33.0 github-actions[bot] 2025-05-20 19:54:51 +00:00
  • f4d9d4111b fix: Fix issue with detecting docx files, and files with upper case extensions (#1609) MoheyElDin Badr 2025-05-20 20:42:37 +03:00
  • 0e00a263fa fix: load_from_doctags static usage (#1617) Said Gürbüz 2025-05-20 15:06:12 +02:00
  • f2e9c0784c fix: incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371) Krishnan 2025-05-20 13:29:38 +05:30
  • 98b5eeb844 fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) Pedro Ribeiro 2025-05-19 14:26:00 +01:00
  • 12a0e64892 feat: add textbox content extraction in msword_backend (#1538) AndrewTsai0406 2025-05-19 21:01:36 +08:00
  • 7c4c356e76 chore: fix chunking example data link (#1596) Panos Vagenas 2025-05-16 08:44:47 +02:00
  • aeb0716bbb chore: bump version to 2.32.0 [skip ci] v2.32.0 github-actions[bot] 2025-05-14 14:28:21 +00:00
  • 3a04f2a367 feat: Improve parallelization for remote services API calls (#1548) Vinay R Damodaran 2025-05-14 06:47:55 -07:00
  • 9f8b479f17 fix(ocr): orig field in TesseractOcrCliModel as str (#1553) jimkarag02 2025-05-14 16:05:52 +03:00
  • 9f28abf061 docs: add advanced chunking & serialization example (#1589) Panos Vagenas 2025-05-14 13:35:07 +01:00
  • 2efb7a7c06 fix(settings): fix nested settings load via environment variables (#1551) Alex Sokolov 2025-05-14 14:42:10 +03:00
  • 12dab0a1e8 feat: support image/webp file type (#1415) Elwin 2025-05-14 15:47:28 +08:00
  • 23238c241f chore: bump version to 2.31.2 [skip ci] v2.31.2 github-actions[bot] 2025-05-13 10:09:19 +00:00
  • 4046d0b2f3 fix: AsciiDoc header identification (#1562) (#1563) Marco Fargetta 2025-05-13 11:17:26 +02:00
  • 8baa85a49d fix: restrict click version and update lock file (#1582) Michele Dolfi 2025-05-13 10:40:08 +02:00
  • 0d0fa6cbe3 chore: bump version to 2.31.1 [skip ci] v2.31.1 github-actions[bot] 2025-05-12 09:44:26 +00:00
  • 127e38646f fix: add smoldocling in download utils (#1577) Michele Dolfi 2025-05-12 10:48:07 +02:00
  • 76501331d2 need to fix ruff linter dev/add-asr-pipeline Peter Staar 2025-05-12 07:34:24 +02:00
  • 32ad65cb9f work in progress: slowly adding ASR pipeline and its derivatives Peter Staar 2025-05-12 07:33:38 +02:00
  • 844babb390 docs: update links in data_prep_kit (#1559) Oleg Lavrovsky 2025-05-11 20:38:25 +02:00
  • 776e7ecf9a fix(HTML): handle row spans in header rows (#1536) Cesar Berrospi Ramis 2025-05-09 15:14:32 +02:00
  • 6e956dc551 Merge branch 'main' into nli/layoutmodel_improvements nli/layoutmodel_improvements Nikos Livathinos 2025-05-09 14:47:44 +02:00
  • 3220a592e7 docs: add serialization docs, update chunking docs (#1556) Panos Vagenas 2025-05-08 21:43:01 +02:00
  • f1658edbad fix: mime error in document streams (#1523) DavidLee 2025-05-06 15:30:46 +08:00
  • 7c705739f9 fix: usage of hashlib for FIPS (#1512) Michele Dolfi 2025-05-02 15:03:29 +02:00
  • 99d8572f6d chore: propagate docling-core fixes propagate-core-fixes-20250502 Panos Vagenas 2025-05-02 14:47:21 +02:00
  • de56523974 chore: format JSON test files to enable comparison (#1511) Panos Vagenas 2025-05-02 11:52:18 +03:00
  • b147331f2a chore: restore typing hint for self.script_readers (#1500) Ihar Hrachyshka 2025-04-30 14:33:27 -04:00
  • 4ab7e9ddfb fix: Guard against attribute errors in TesseractOcrModel __del__ (#1494) Ben Browning 2025-04-30 11:51:33 -04:00
  • cc453961a9 fix: enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) Zach Cox 2025-04-30 02:02:52 -04:00
  • 976e92e289 fix: updated the time-recorder label for reading order (#1490) Peter W. J. Staar 2025-04-29 13:02:53 +02:00
  • d8959c6b19 chore: update dependencies in lock file (#1458) Michele Dolfi 2025-04-28 08:52:46 +02:00
  • a097ccd8d5 chore: typo fix (#1465) nkh0472 2025-04-28 14:52:09 +08:00
  • 3afbe6c969 docs: update supported formats guide (#1463) Emmanuel Ferdman 2025-04-28 09:51:54 +03:00
  • 94d66a0765 fix: Incorrect scaling of TableModel bboxes when do_cell_matching is False (#1459) Maxim Lysak 2025-04-25 12:34:12 +02:00
  • c67133dde4 chore: bump version to 2.31.0 [skip ci] v2.31.0 github-actions[bot] 2025-04-25 08:28:25 +00:00
  • a2fbbba9f7 feat: add tutorial using Milvus and Docling for RAG pipeline (#1449) Ryan Lin 2025-04-25 03:12:35 -04:00
  • a553a1e5bf Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-24 10:03:05 +02:00
  • 976431ed7f chore: update locked deps (#1442) Michele Dolfi 2025-04-23 14:59:31 +02:00
  • ed20124544 fix(html): handle address, details, and summary tags (#1436) Cesar Berrospi Ramis 2025-04-23 09:30:59 +02:00
  • c2470ed216 docs: Fix wrong output format in example code (#1427) nkh0472 2025-04-22 18:32:55 +08:00
  • 64918a81ac docs: Add OpenSSF Best Practices badge (#1430) Michele Dolfi 2025-04-22 11:23:28 +02:00
  • 32710d5fac test: Allow pypdfium2 5.x versions cau/test-pypdfium2-beta Christoph Auer 2025-04-22 09:06:25 +02:00
  • 995b3b0ab1 docs: Typo fixes in docling_document.md (#1400) Ben Cox 2025-04-22 07:49:08 +01:00
  • 8012a3e4d6 fix: Treat overflowing -v flags as DEBUG (#1419) Eugene 2025-04-19 13:02:41 +04:00
  • 88948b0bba docs: Updated the [Usage] link in architecture.md (#1416) Leandro Rosas 2025-04-19 09:20:52 +01:00
  • 4ce338f455 fix: Adjust the LayoutModel default paths for the docling-layout-heron Nikos Livathinos 2025-04-15 23:29:01 +02:00
  • fa7fc9e63d fix(codecov): fix codecov argument and yaml file (#1399) Cesar Berrospi Ramis 2025-04-15 18:12:57 +02:00
  • e5f8bb086d Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-15 16:08:12 +02:00
  • 51463e3c1f feat: Refactor the LayoutModel to use docling-layout-heron. Pinpoint docling-ibm-models to the branch of new layout model Nikos Livathinos 2025-04-15 16:04:55 +02:00
  • 0782086009 Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-15 13:24:09 +02:00
  • 550b1ca2f8 chore: propagate docling-core fix (#1389) Panos Vagenas 2025-04-15 10:51:47 +02:00
  • a7dd59c5cb docs(ocr): Add docs entry for OnnxTR OCR plugin (#1382) Felix Dittrich 2025-04-15 09:46:59 +02:00
  • 06227e9970 ci: sign pypi packages (#1392) Michele Dolfi 2025-04-15 08:59:16 +02:00
  • 5458a88464 ci: add coverage and ruff (#1383) Michele Dolfi 2025-04-14 18:01:26 +02:00
  • 293c28ca7c docs(security): more statements about secure development (#1381) Michele Dolfi 2025-04-14 13:53:26 +02:00
  • 01fbfd5652 docs: Add testing in the docs (#1379) Michele Dolfi 2025-04-14 12:31:48 +02:00
  • d9c3999175 chore: update lock file (#1378) Michele Dolfi 2025-04-14 10:38:10 +02:00
  • a026b4e84b docs: Add Notes for Installing in Intel macOS (#1377) Juil Park 2025-04-14 17:21:13 +09:00
  • c391adb5f0 chore: bump version to 2.30.0 [skip ci] v2.30.0 github-actions[bot] 2025-04-14 08:20:31 +00:00
  • 7e40ad3261 fix(deps): widen typer upper bound (#1375) Michele Dolfi 2025-04-14 09:23:39 +02:00
  • c0ba88edf1 feat(cli): add option for html with split-page mode (#1355) Peter W. J. Staar 2025-04-14 08:41:50 +02:00