Google's Gemini 3.1 Pro Preview Tops AI Index, Offers Major Cost Advantage

AI Fresh Daily
4 min read
Feb 21, 2026

This article was written by AI based on multiple news sources.

Google's Gemini 3.1 Pro Preview has taken the top spot on the Artificial Analysis Intelligence Index, a composite benchmark that rolls ten individual tests into one overall score. The model leads the ranking by four points over its closest competitor, Anthropic's Claude Opus 4.6, and achieves this feat at less than half the cost of its major rivals. This performance and pricing combination presents a significant new option in the competitive landscape of large language models.
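For readers unfamiliar with composite benchmarks, a score of this kind is typically a weighted average of the individual sub-benchmark results. Artificial Analysis does not publish its exact weighting here, so the sketch below uses equal weights, and the category names are illustrative assumptions rather than the index's actual methodology:

    # Sketch of a composite benchmark index. Equal weighting and the category
    # names below are assumptions, not Artificial Analysis's methodology.
    CATEGORIES = [
        "agentic_coding", "knowledge", "scientific_reasoning", "physics",
        # ...six more sub-benchmarks would complete the full index
    ]

    def composite_score(scores: dict[str, float],
                        weights: dict[str, float] | None = None) -> float:
        """Weighted average of per-benchmark scores (0-100 scale assumed)."""
        weights = weights or {name: 1.0 for name in scores}
        total = sum(weights.values())
        return sum(scores[name] * w for name, w in weights.items()) / total

    # Example: equal weights, identical sub-scores of 70.0 yield a 70.0 index.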

The model's strength is broad: it ranks first in six of the ten benchmark categories, including agent-based coding, knowledge, scientific reasoning, and physics. A notable technical improvement is a sharp reduction in hallucination, the model's tendency to generate incorrect or fabricated information. Compared with its predecessor, Gemini 3 Pro, the 3.1 Pro Preview's hallucination rate has dropped by 38 percentage points, addressing a key weakness of the earlier version.
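Note that a 38-percentage-point drop is an absolute change, not a 38% relative reduction, and the distinction matters when comparing headline figures. The baseline rate below is hypothetical, since the article does not report the underlying rates:

    # Percentage points vs. percent. The baseline is hypothetical; the article
    # reports only the size of the drop, not the underlying rates.
    before = 0.88                          # hypothetical Gemini 3 Pro rate
    after = before - 0.38                  # a 38-percentage-point drop
    relative = (before - after) / before   # equivalent relative reduction
    print(f"{after:.2f} after the drop; {relative:.0%} relative reduction")
    # -> 0.50 after the drop; 43% relative reduction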

The cost efficiency of Gemini 3.1 Pro is stark when measured against the benchmark process itself. Running the full suite of ten tests that comprise the Artificial Analysis Intelligence Index cost just $892 using Gemini 3.1 Pro. This is a fraction of the expense required for competitors: the same evaluation cost $2,304 for OpenAI's GPT-5.2 and $2,486 for Anthropic's Claude Opus 4.6. This cost advantage is partly due to efficiency; Gemini 3.1 Pro consumed only 57 million tokens to complete the index, well under the 130 million tokens used by GPT-5.2. For context, open-source models like GLM-5 can run the tests for even less, at approximately $547, though they do not match the overall performance score of the leading proprietary models.
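Dividing the reported totals gives an implied blended cost per million tokens; these rates are inferred from the article's figures, not published list prices:

    # Implied blended cost per million tokens, derived from the reported
    # totals above. Inferred rates, not published pricing.
    runs = {
        "Gemini 3.1 Pro": (892, 57),    # ($ total, millions of tokens)
        "GPT-5.2":        (2304, 130),
    }
    for model, (usd, mtok) in runs.items():
        print(f"{model}: ${usd / mtok:.2f} per million tokens")
    # -> Gemini 3.1 Pro: $15.65 per million tokens
    # -> GPT-5.2: $17.72 per million tokens

On these numbers, most of Gemini's total-cost advantage comes from consuming fewer tokens to complete the index rather than from a dramatically lower per-token rate.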

However, the benchmark results do not tell the complete story, and the index reveals important caveats. Despite its strong composite score, Gemini 3.1 Pro Preview still trails Claude Sonnet 4.6, Claude Opus 4.6, and GPT-5.2 on real-world agent tasks, a critical category for autonomous AI applications. Furthermore, in a separate fact-checking test conducted internally by the index's analysts, Gemini 3.1 Pro performed markedly worse than its top rivals, verifying only about a quarter of statements in initial tests, a rate even lower than the already weak showing of Gemini 3 Pro in this area. This highlights a persistent gap in reliability for factual verification, a crucial capability for many enterprise and consumer applications.

The emergence of Gemini 3.1 Pro Preview as a cost-performance leader signals a potential shift in market dynamics, where efficiency and value are becoming as important as raw capability. Google's model demonstrates that it is possible to achieve top-tier results on standardized academic and reasoning benchmarks while drastically reducing inference costs. Yet, the disparity in performance on practical agent tasks and fact-checking serves as a critical reminder for developers and businesses. Choosing a model requires matching its specific strengths—whether in coding, scientific reasoning, or cost-efficiency—against the demands of the intended application, as no single model currently dominates all dimensions of performance.

Key Points

  • Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, scoring four points above Claude Opus 4.6.
  • It achieves this at less than half the cost: running the full index cost $892, versus $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6.
  • The model ranks first in six of ten categories, including agent-based coding and scientific reasoning, and cuts its hallucination rate by 38 percentage points versus Gemini 3 Pro.
Why It Matters

The result demonstrates a major advance in cost-performance among top-tier models, forcing competitors to justify premium pricing and pushing efficiency to the forefront of model evaluation.