Alibaba's Qwen-AgentWorld improves agent performance across seven ben…

On Tuesday, Alibaba's Qwen team released Qwen-AgentWorld, a language world model trained to simulate what tools and environments return when an agent takes an action. The flagship variant, Qwen-AgentWorld-397B-A17B, outperformed both GPT-5.4 and Claude Opus 4.8 on AgentWorldBench, achieving the highest simulation quality across seven domains, all covered by a single architecture: MCP, Search, Terminal, Software Engineering, Android, Web, and OS.

Alibaba's Qwen3-Max, released in May, was built around a 35-hour autonomous execution capability and scores 69.6 on SWE-Bench Verified, a real-world coding benchmark. The Qwen 3 family includes open-weight models optimized for agentic workflows, all shipping under Apache 2.0 licensing. Qwen-Agent, the team's open-source framework for building agent applications, provides scaffolding for instruction following and tool usage, with AgentWorld plugging in as the simulation layer.