Large Language Model Text Generation Inference
flashinfer: switch to plan API (#2904)
This change doesn't switch `forward` to `run` yet, because doing so requires access to the softmax scale and the logit softcap outside the model.
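A toy sketch of why the plan/run split moves these parameters out of the model. It assumes, as the message above implies, that with a plan API the softmax scale and logit softcap are fixed when `plan` is called, while a `forward`-style call receives them from the model layer on every call. `ToyAttentionWrapper`, `_attend`, and the parameter names are hypothetical stand-ins written for illustration; they are not flashinfer's actual classes or signatures, and the softcap form `cap * tanh(s / cap)` is an assumption.

```python
import math
from typing import List, Optional

Vector = List[float]


def _attend(q: Vector, keys: List[Vector], values: List[Vector],
            sm_scale: float, logits_soft_cap: float) -> Vector:
    """Single-query softmax attention over plain Python lists (toy)."""
    scores = [sm_scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    if logits_soft_cap > 0.0:
        # Assumed soft-capping of the scaled logits: cap * tanh(s / cap).
        scores = [logits_soft_cap * math.tanh(s / logits_soft_cap) for s in scores]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class ToyAttentionWrapper:
    """Hypothetical wrapper illustrating a plan/run split (not flashinfer's API)."""

    def __init__(self) -> None:
        self._sm_scale: Optional[float] = None
        self._logits_soft_cap: float = 0.0

    def plan(self, sm_scale: float, logits_soft_cap: float = 0.0) -> None:
        # Plan time, outside the model: scale and softcap are fixed here.
        self._sm_scale = sm_scale
        self._logits_soft_cap = logits_soft_cap

    def run(self, q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
        # Run time: reuses the planned parameters; the caller cannot pass them.
        if self._sm_scale is None:
            raise RuntimeError("plan() must be called before run()")
        return _attend(q, keys, values, self._sm_scale, self._logits_soft_cap)

    def forward(self, q: Vector, keys: List[Vector], values: List[Vector],
                sm_scale: float, logits_soft_cap: float = 0.0) -> Vector:
        # Forward-style call: the model layer supplies scale and softcap per call.
        return _attend(q, keys, values, sm_scale, logits_soft_cap)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    wrapper = ToyAttentionWrapper()
    wrapper.plan(sm_scale=1 / math.sqrt(2), logits_soft_cap=30.0)
    # Once plan() has seen the same parameters, run() matches forward().
    print(wrapper.run([1.0, 0.0], keys, values))
```

In this toy, `run` only works if whoever calls `plan` already knows the scale and softcap, which is the constraint the message describes: those values would have to be available outside the model before `forward` can be replaced.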
Daniël de Kok committed
630f198624b6c405e5fcfb7f08f7f308026f68cd
Parent: 8f6146f
Committed by GitHub <noreply@github.com>
on 1/17/2025, 5:18:02 PM