vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching

by raullen