KVarN: Native vLLM backend for KV-cache quantization by Huawei

(github.com)

108 points | by theanonymousone 8 hours ago

11 comments