Tokasaurus: An LLM inference engine for high-throughput workloads

(scalingintelligence.stanford.edu)

206 points | by rsehrlich 21 hours ago

23 comments