Bryan: Huawei Got Day-Zero DeepSeek V4 Access That NVIDIA Didn't — Bi…

According to an article from BigGo Finance, Bryan, an analyst at SemiAnalysis, said on a podcast that when DeepSeek released the weights for its V4 model in late April, open-source inference runtimes such as vLLM and SGLang had early access under NDA, while companies such as NVIDIA did not. This suggests a structural shift in which the software layer now determines who gets a head start.

DeepSeek V4 introduces two architectural changes. The first is a million-token context window enabled by Compressed Sparse Attention and Heavily Compressed Attention, which DeepSeek claims achieve roughly a 100x reduction in KV-cache size compared with a standard Multi-Query Attention model. The second is a fused "Mega MoE" kernel that merges computation and communication. Technical co-host Kimbo noted that the headline change in V4 relative to V3 and R1 is the one-million-token context length, which required aggressive innovation in the attention mechanism.
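
To give a sense of scale for the claimed 100x KV-cache reduction, here is a rough back-of-the-envelope sketch. The layer count, head dimension, and 2-byte element size are hypothetical placeholders chosen for illustration, not DeepSeek V4's actual configuration, and the 100x factor is simply applied as a divisor rather than derived from how Compressed Sparse Attention or Heavily Compressed Attention work.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical baseline model, NOT DeepSeek V4's real configuration.
SEQ_LEN = 1_000_000   # one-million-token context, as described in the article
N_LAYERS = 60         # assumed layer count
N_KV_HEADS = 1        # Multi-Query Attention shares a single KV head
HEAD_DIM = 128        # assumed head dimension
CLAIMED_REDUCTION = 100  # the "roughly 100x" figure attributed to DeepSeek

baseline = kv_cache_bytes(SEQ_LEN, N_LAYERS, N_KV_HEADS, HEAD_DIM)
compressed = baseline // CLAIMED_REDUCTION

print(f"MQA baseline: {baseline / 1e9:.2f} GB per sequence")
print(f"After 100x:   {compressed / 1e9:.2f} GB per sequence")
```

Under these assumed dimensions, a single million-token sequence would need about 30.72 GB of KV cache at the Multi-Query Attention baseline and about 0.31 GB after a 100x reduction. That gap is the difference between a context that dominates an accelerator's memory and one that leaves room for batching many requests.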