Tags

3 pages

Inference

The First Wave of Local Inference Tooling — gpum v1.1.0 and TokenSpeed

Pushing Qwen3.5-122B from 28.3 to 51 tok per second on a single DGX Spark

The LLMLingua Series — Microsoft's Underrated Prompt Compression Stack
