Google's Gemini 3.1 Pro Preview Tops AI Index, Offers Major Cost Advantage

AI Fresh Daily
4 min read
Feb 21, 2026

This article was written by AI based on multiple news sources.

Google's Gemini 3.1 Pro Preview has taken the top spot on the Artificial Analysis Intelligence Index, a composite benchmark that rolls ten individual tests into one overall score. The model leads the ranking by four points over its closest competitor, Anthropic's Claude Opus 4.6, and achieves this feat at less than half the cost of its major rivals. This performance and pricing combination presents a significant new option in the competitive landscape of large language models.
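For readers unfamiliar with composite benchmarks, a score of this kind is typically a weighted average of the individual sub-benchmark results. Artificial Analysis does not publish its exact weighting here, so the sketch below uses equal weights, and the category names are illustrative assumptions rather than the index's actual methodology:

    # Sketch of a composite benchmark index. Equal weighting and the category
    # names below are assumptions, not Artificial Analysis's methodology.
    CATEGORIES = [
        "agentic_coding", "knowledge", "scientific_reasoning", "physics",
        # ...six more sub-benchmarks would complete the full index
    ]

    def composite_score(scores: dict[str, float],
                        weights: dict[str, float] | None = None) -> float:
        """Weighted average of per-benchmark scores (0-100 scale assumed)."""
        weights = weights or {name: 1.0 for name in scores}
        total = sum(weights.values())
        return sum(scores[name] * w for name, w in weights.items()) / total

    # Example: equal weights, identical sub-scores of 70.0 yield a 70.0 index.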

The model's strength is broad: it ranks first in six of the ten benchmark categories, including agent-based coding, knowledge, scientific reasoning, and physics. A notable technical improvement is a sharp reduction in hallucination, the model's tendency to generate incorrect or fabricated information. Compared with its predecessor, Gemini 3 Pro, the 3.1 Pro Preview's hallucination rate has dropped by 38 percentage points, addressing a key weakness of the earlier version.
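Note that a 38-percentage-point drop is an absolute change, not a 38% relative reduction, and the distinction matters when comparing headline figures. The baseline rate below is hypothetical, since the article does not report the underlying rates:

    # Percentage points vs. percent. The baseline is hypothetical; the article
    # reports only the size of the drop, not the underlying rates.
    before = 0.88                          # hypothetical Gemini 3 Pro rate
    after = before - 0.38                  # a 38-percentage-point drop
    relative = (before - after) / before   # equivalent relative reduction
    print(f"{after:.2f} after the drop; {relative:.0%} relative reduction")
    # -> 0.50 after the drop; 43% relative reduction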

The cost efficiency of Gemini 3.1 Pro is stark when measured against the benchmark process itself. Running the full suite of ten tests that comprise the Artificial Analysis Intelligence Index cost just $892 using Gemini 3.1 Pro. This is a fraction of the expense required for competitors: the same evaluation cost $2,304 for OpenAI's GPT-5.2 and $2,486 for Anthropic's Claude Opus 4.6. This cost advantage is partly due to efficiency; Gemini 3.1 Pro consumed only 57 million tokens to complete the index, well under the 130 million tokens used by GPT-5.2. For context, open-source models like GLM-5 can run the tests for even less, at approximately $547, though they do not match the overall performance score of the leading proprietary models.
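Dividing the reported totals gives an implied blended cost per million tokens; these rates are inferred from the article's figures, not published list prices:

    # Implied blended cost per million tokens, derived from the reported
    # totals above. Inferred rates, not published pricing.
    runs = {
        "Gemini 3.1 Pro": (892, 57),    # ($ total, millions of tokens)
        "GPT-5.2":        (2304, 130),
    }
    for model, (usd, mtok) in runs.items():
        print(f"{model}: ${usd / mtok:.2f} per million tokens")
    # -> Gemini 3.1 Pro: $15.65 per million tokens
    # -> GPT-5.2: $17.72 per million tokens

On these numbers, most of Gemini's total-cost advantage comes from consuming fewer tokens to complete the index rather than from a dramatically lower per-token rate.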

However, the benchmark results do not tell the complete story, and the index reveals important caveats. Despite its strong composite score, Gemini 3.1 Pro Preview still trails Claude Sonnet 4.6, Claude Opus 4.6, and GPT-5.2 on real-world agent tasks, a critical category for autonomous AI applications. Furthermore, in a separate fact-checking test conducted internally by the index's analysts, Gemini 3.1 Pro performed markedly worse than its top rivals, verifying only about a quarter of statements in initial tests, a rate even lower than the already weak showing of Gemini 3 Pro in this area. This highlights a persistent gap in reliability for factual verification, a crucial capability for many enterprise and consumer applications.

The emergence of Gemini 3.1 Pro Preview as a cost-performance leader signals a potential shift in market dynamics, where efficiency and value are becoming as important as raw capability. Google's model demonstrates that it is possible to achieve top-tier results on standardized academic and reasoning benchmarks while drastically reducing inference costs. Yet, the disparity in performance on practical agent tasks and fact-checking serves as a critical reminder for developers and businesses. Choosing a model requires matching its specific strengths—whether in coding, scientific reasoning, or cost-efficiency—against the demands of the intended application, as no single model currently dominates all dimensions of performance.

Key Points

  • Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, scoring four points above Claude Opus 4.6.
  • It achieves this at less than half the cost: running the full index cost $892, versus $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6.
  • The model ranks first in six of ten categories, including agent-based coding and scientific reasoning, and cuts its hallucination rate by 38 percentage points versus Gemini 3 Pro.
Why It Matters

The result demonstrates a major advance in cost-performance among top-tier models, forcing competitors to justify premium pricing and pushing efficiency to the forefront of model evaluation.