feat: Introduce pluggable VLM runtime system with preset-based configuration (#2919)
* model runtime refactoring Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix code formula preset Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * batch prediction Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use presets and new vlm options in CLI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use new model settings by default Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * running Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fixes for running examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * keep old stage Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use granite 3.3 and set options Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * revisit init logic and propagate the proper options to the runtimes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update all stages with original setup Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * per stage registry Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use chat template Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove duplicated predict() and factor out some utils Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * working picture description examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add granite docling as code formula model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename code formula presets Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix running minimal_vlm example Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add all models to presets and run compare_vlm Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove unused repo_id Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update vlm api model example Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix legacy examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add another legacy example Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * avoid automatic fallback to mlx and fix end_of_utterance in codeformula Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * move vlm_convert_model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use new vlm runtime class Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * flasg for CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename runtimes to explicit vlm_runtimes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * renaming from runtime to inference engine and model families Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fixes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add docs with stages Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update docs catalog page Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename runtime to inference engine Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
M
Michele Dolfi committed
d4c87133f3f4dcfc8c7619d533bac31cc297350d
Parent: c28d0d6
Committed by GitHub <noreply@github.com>
on 2/4/2026, 4:29:17 PM