Google unveils Gemini-SQL2, a text-to-SQL system

Google Research has unveiled Gemini-SQL2, a text-to-SQL system built on the Gemini 3.1 Pro architecture. It achieved 80.04% execution accuracy on the BIRD benchmark, making it the first single-model system to surpass the 80% threshold.

The BIRD benchmark spans 12,751 unique question-SQL pairs across 95 multi-table databases and 37 professional domains. It scores systems on execution accuracy: whether a generated query returns the same result as the reference query. Gemini-SQL2's score puts it more than seven percentage points ahead of OpenAI’s top-tier offering, though still short of the 92.96% accuracy achieved by a human expert.

The system applies targeted post-training, multitask learning, and agentic scaffolding on top of Gemini 3.1 Pro, and uses a 1-million-token context window to ingest entire database schemas and data documentation at inference time. A two-stage execution verification loop pre-runs generated queries against isolated data samples and automatically corrects queries that raise syntax errors, time out, or return empty result sets. In addition, multitask test-time scaling uses self-consistency voting to compare multiple candidate queries.
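The announcement does not describe how the verification loop or the voting step is implemented. As a rough illustration of the general idea only, the sketch below pre-runs candidate queries against a small in-memory SQLite sample, discards those that raise an error or return nothing, and then picks the result that most surviving candidates agree on. SQLite, the `orders` table, the hand-written candidate list, and the function names `run_on_sample` and `vote` are all assumptions made for this example, and timeouts and automatic query repair are not modeled.

```python
import sqlite3
from collections import Counter


def make_sample_db():
    """Build a tiny in-memory sample database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "ana", 10.0), (2, "ben", 25.0), (3, "ana", 5.0)],
    )
    return conn


def run_on_sample(conn, sql):
    """Pre-run a query; return (status, rows) with status ok/empty/error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return "error", None
    if not rows:
        return "empty", None
    # Sort rows so that row order does not affect result comparison.
    return "ok", tuple(sorted(rows, key=repr))


def vote(conn, candidates):
    """Return the first candidate whose result set is the most common one."""
    results = {}
    for sql in candidates:
        status, rows = run_on_sample(conn, sql)
        if status == "ok":  # failed or empty candidates get no vote
            results[sql] = rows
    if not results:
        return None
    winning_rows, _ = Counter(results.values()).most_common(1)[0]
    for sql in candidates:
        if results.get(sql) == winning_rows:
            return sql


# Stand-ins for model-generated candidates answering
# "What is the total order value per customer?"
CANDIDATES = [
    "SELECT customer, SUM(total) FROM orders GROUP BY customer",
    "SELECT o.customer, SUM(o.total) FROM orders AS o GROUP BY o.customer",
    "SELECT customer, MAX(total) FROM orders GROUP BY customer",
    "SELEC customer FROM orders",                      # syntax error
    "SELECT customer FROM orders WHERE total > 1000",  # empty result
]

if __name__ == "__main__":
    print(vote(make_sample_db(), CANDIDATES))
```

Voting on execution results rather than on query text means that two differently written but equivalent queries reinforce each other, which is the usual motivation for applying self-consistency to SQL generation.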