docling

mirror of https://github.com/docling-project/docling.git synced 2026-03-26 06:01:04 +00:00

Maxim Lysak 1c74a9b9c7 feat: Implementation of HTML backend with headless browser (#2969)

- Implementation of an HTML backend that optionally uses a headless browser (via Playwright) to render HTML pages into images and adds provenance with bounding boxes (bboxes) to all elements in the converted docling document.
- Conversion preserves the reading order given by the HTML DOM tree.
- Added support for HTML "input" fields: checkboxes, radio buttons, text inputs, etc.
- Added support for a key-value convention in HTML (i.e., elements with ids "key1" and "key1_value1" are paired as key-values; see the test cases for examples).
- Added a heuristic that merges independent inline HTML elements containing single-character text into larger text blocks.
- Support for inline styling (bold, italic, etc.)
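
The key-value id convention mentioned in the list above can be illustrated with a minimal sketch. This is not the backend's actual code: it uses only the Python standard library, the names `IdTextCollector` and `pair_key_values` are hypothetical, and the id pattern `<key id>_value<number>` is an assumption generalized from the single example ("key1" / "key1_value1") given in this commit message.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: a value element's id is "<key id>_value<number>".
VALUE_ID = re.compile(r"^(?P<key>.+)_value\d+$")

# Void elements never get a closing tag, so they must not be tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IdTextCollector(HTMLParser):
    """Collect the text content of every element that carries an id."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}
        self._open_ids = []  # one entry per open element: its id or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        elem_id = dict(attrs).get("id")
        self._open_ids.append(elem_id)
        if elem_id is not None:
            self.text_by_id.setdefault(elem_id, "")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._open_ids:
            self._open_ids.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that has an id.
        for elem_id in self._open_ids:
            if elem_id is not None:
                self.text_by_id[elem_id] += data


def pair_key_values(html):
    """Return (key text, value text) pairs based on the id convention."""
    collector = IdTextCollector()
    collector.feed(html)
    collector.close()
    pairs = []
    for elem_id, text in collector.text_by_id.items():
        match = VALUE_ID.match(elem_id)
        if match and match.group("key") in collector.text_by_id:
            key_text = collector.text_by_id[match.group("key")]
            pairs.append((key_text.strip(), text.strip()))
    return pairs


sample = (
    '<div><span id="key1">Name</span>'
    '<span id="key1_value1">Ada</span>'
    '<span id="other">ignored</span></div>'
)
print(pair_key_values(sample))
```

Elements whose id does not match the assumed pattern, or whose key element is missing, are simply left unpaired in this sketch; the real backend may treat such cases differently.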

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>

2026-03-24 14:28:57 +01:00

ISSUE_TEMPLATE

chore: add issue templates (#251)

2024-11-05 23:18:20 +01:00

scripts

feat: simplify dependencies, switch to uv (#1700)

2025-06-03 15:18:54 +02:00

workflows

feat: Implementation of HTML backend with headless browser (#2969)

2026-03-24 14:28:57 +01:00

codecov.yml

fix(codecov): fix codecov argument and yaml file (#1399)