Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
by NicoConstant