Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
https://code.morphllm.com/PaddlePaddle/PaddleOCR.git
ai4science chineseocr document-parsing document-translation kie ocr paddleocr-vl pdf2markdown pdf-extractor-rag pdf-parser pp-ocr pp-structure rag