DSpark: Speculative decoding accelerates LLM inference [pdf]

by aurenvale | View on Hacker News