LLM inference engine from scratch in C++ – why output tokens cost 5x

by ani17