Tags
3 pages
Inference
The First Wave of Local Inference Tooling — gpum v1.1.0 and TokenSpeed
Pushing Qwen3.5-122B from 28.3 to 51 tok per second on a single DGX Spark
The LLMLingua Series — Microsoft's Underrated Prompt Compression Stack