yichuan-w / LEANN

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.


fix: incremental build chunking inconsistency and shared metadata dict

Two bugs caused incremental builds to produce different (broken) results
compared to force builds:

1. The single-file loading path did not set `source` metadata, so
   `is_code_file` detection failed. Incremental chunks were then parsed
   with node_parser instead of code_parser, lost their line numbers, and
   came out with different chunk counts than a force build. Fixed by
   populating `source` from `file_path` when it is missing, and by falling
   back to `file_path` in `is_code_file` detection.

2. All chunks from the same document shared a single metadata dict
   reference, so `_assign_unique_chunk_ids` kept overwriting the same dict
   and every chunk ended up with the last UUID. Fixed by copying the
   metadata for each chunk.
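
For illustration, both bugs can be reproduced in a few lines. This is a
minimal sketch, not the code from this commit: `CODE_EXTENSIONS`,
`make_chunks`, the chunk record layout, and the simplified
`assign_unique_chunk_ids` are hypothetical stand-ins, and only the
`source` / `file_path` keys and the idea of a per-chunk copy come from the
description above.

```python
import copy
import uuid
from pathlib import Path

# Hypothetical extension list; the real project may use a different one.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".cpp"}


def is_code_file(metadata: dict) -> bool:
    """Detect code files from `source`, falling back to `file_path`."""
    path = metadata.get("source") or metadata.get("file_path") or ""
    return Path(path).suffix.lower() in CODE_EXTENSIONS


def make_chunks(texts: list, doc_metadata: dict, copy_metadata: bool) -> list:
    """Build chunk records, optionally giving each chunk its own metadata."""
    chunks = []
    for text in texts:
        # Without the copy, every chunk points at the same dict object.
        meta = copy.deepcopy(doc_metadata) if copy_metadata else doc_metadata
        chunks.append({"text": text, "metadata": meta})
    return chunks


def assign_unique_chunk_ids(chunks: list) -> None:
    """Write a fresh UUID into each chunk's metadata (mutates in place)."""
    for chunk in chunks:
        chunk["metadata"]["chunk_id"] = str(uuid.uuid4())


# Bug 1: the single-file path sets only `file_path`, so fill in `source`.
doc_metadata = {"file_path": "pkg/module.py"}
doc_metadata.setdefault("source", doc_metadata["file_path"])

# Bug 2: shared dict, so each assignment overwrites the previous UUID.
shared = make_chunks(["a", "b", "c"], dict(doc_metadata), copy_metadata=False)
assign_unique_chunk_ids(shared)
shared_ids = {c["metadata"]["chunk_id"] for c in shared}

# Fix: one metadata copy per chunk keeps the IDs distinct.
copied = make_chunks(["a", "b", "c"], dict(doc_metadata), copy_metadata=True)
assign_unique_chunk_ids(copied)
copied_ids = {c["metadata"]["chunk_id"] for c in copied}

print(len(shared_ids), len(copied_ids))
```

With the shared dict the three chunks collapse to one distinct ID; with a
copy per chunk there are three. A deep copy is used in the sketch so that
nested metadata values are not aliased either; a shallow copy would be
enough if the metadata is flat.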

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yichuan520030910320 committed
8a706d3c0cee8dbbe7dc80caa557f0e1e55df572
Parent: b30f382