Peking University, DeepSeek Open-Source DSpark To Boost LLM Efficienc…

On 27 June 2026, researchers from Peking University and DeepSeek jointly introduced and open-sourced DSpark, a speculative decoding framework designed to optimise Large Language Model (LLM) inference efficiency. It is released under an MIT license via the DeepSpec GitHub repository. DSpark delivers a throughput gain of up to 661% and reduces wasted compute in live production systems. It introduces two techniques: Semi-Autoregressive Generation, which resolves the “acceptance rate decay” traditionally seen over long text sequences, and Confidence-Scheduled Verification, which dynamically scales verification lengths to eliminate wasted compute.

DSpark is already deployed in DeepSeek-V4’s online production systems, where end-to-end user text generation speeds increased by 60% to 85% on DeepSeek-V4-Flash and by 57% to 78% on DeepSeek-V4-Pro. Under heavy server loads, V4-Flash achieved a 51% throughput gain at an 80 token/s SLA, rising to 661% at a 120 token/s SLA, while V4-Pro registered a 52% throughput gain at a 35 token/s SLA and 406% at a 50 token/s SLA.

The DeepSpec codebase features cross-model compatibility, with successful structural validation on Alibaba’s Qwen3 series and Google’s Gemma4-12B. Against baselines such as Eagle3, DSpark improved average effective token acceptance lengths by up to 30