Behind DeepSeek's 85% Speed Increase: Large Models Bid Farewell to Pa…

On June 27, 2026, a paper titled "DSpark: Confidence-Scheduled Speculative Decoding Based on Semi-Autoregressive Generation" was released, co-authored by DeepSeek founder Liang Wenfeng in collaboration with Peking University. When DSpark was deployed on the DeepSeek-V4 online service handling real user traffic, per-user generation speed rose by 60% to 85% (Flash version) and 57% to 78% (Pro version); in offline or high-concurrency scenarios, aggregate throughput rose by 51% to 400%. These gains are not the linear improvement that comes from simply stacking more hardware. They reflect a qualitative change in the underlying inference architecture: confidence scheduling eliminates the computation wasted on invalid verification checks. Competition among large models is shifting from a sprint on parameter scale to a systems-engineering contest over inference efficiency and computational cost.
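To make the idea of confidence scheduling concrete, here is a minimal Python sketch of generic confidence-gated speculative decoding. It is not the DSpark algorithm, whose details this article does not describe. The names `speculative_step`, `toy_draft`, `toy_target`, `max_draft`, and `min_conf` are all hypothetical, and the toy "models" are simple deterministic functions standing in for a small draft model and a large target model. The sketch assumes greedy decoding and verifies tokens one at a time for readability; a real system would typically verify all drafted tokens in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interfaces (assumptions, not a real library API):
# the draft model returns (next_token, confidence in [0, 1]);
# the target model returns its own greedy next token.
DraftFn = Callable[[List[Token]], Tuple[Token, float]]
TargetFn = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token],
    draft: DraftFn,
    target: TargetFn,
    max_draft: int = 4,
    min_conf: float = 0.5,
) -> Tuple[List[Token], int]:
    """Run one decode step; return (new_tokens, number_of_drafted_tokens)."""
    # Drafting phase: keep proposing tokens only while the draft is confident.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(max_draft):
        tok, conf = draft(ctx)
        if conf < min_conf:
            break  # low confidence: do not spend verification on this token
        drafted.append(tok)
        ctx.append(tok)

    # Verification phase: accept the longest prefix the target agrees with.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted, len(drafted)
        accepted.append(tok)
        ctx.append(tok)

    # Every drafted token matched (or none was drafted): the target
    # contributes one additional token of its own.
    accepted.append(target(ctx))
    return accepted, len(drafted)


def toy_target(ctx: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Tuple[Token, float]:
    """Toy draft model: agrees with the target except after token 3,
    where it guesses wrongly and reports low confidence."""
    if ctx[-1] == 3:
        return 0, 0.2
    return (ctx[-1] + 1) % 10, 0.9


if __name__ == "__main__":
    gated = speculative_step([0], toy_draft, toy_target)
    ungated = speculative_step([0], toy_draft, toy_target, min_conf=0.0)
    print("gated:", gated)
    print("ungated:", ungated)
```

In this toy run both settings emit the same tokens, but the gated version drafts three tokens instead of four: the low-confidence guess is never submitted for verification. That skipped check is the kind of saving the sketch is meant to illustrate. How DSpark actually estimates confidence and schedules verification is left to the paper.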