Google has unveiled Gemini 3.1 Flash-Lite, the fastest and most cost-efficient model in its Gemini 3 series, designed to support high-volume developer workloads and real-time AI applications at scale.
The new model is rolling out in preview for developers through the Gemini API in Google AI Studio and for enterprise users via Vertex AI. According to Google, the model is optimised for speed, affordability and high-frequency workflows while maintaining strong performance across reasoning and multimodal tasks.
Priced at $0.25 per million input tokens and $1.50 per million output tokens, Gemini 3.1 Flash-Lite offers significantly lower costs than larger AI models. Benchmark results from Artificial Analysis show the model reaching its first answer token 2.5 times faster and generating output 45 per cent faster than Gemini 2.5 Flash, while maintaining similar or better response quality.
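At those published rates, per-request cost is straightforward to estimate. The sketch below is illustrative only; the token counts in the example are invented, and only the two prices come from the announcement.

```python
# Estimating request cost at Gemini 3.1 Flash-Lite's published rates.
# Prices from the announcement: $0.25 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_PRICE_PER_MILLION = 0.25   # USD per million input tokens
OUTPUT_PRICE_PER_MILLION = 1.50  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION

# Hypothetical batch: 10,000 requests averaging 2,000 input and 500 output tokens each.
batch_cost = sum(estimate_cost(2_000, 500) for _ in range(10_000))
print(f"${batch_cost:.2f}")  # → $12.50
```

At these prices a high-volume workload of ten thousand mid-sized requests costs on the order of a few dollars, which is the economics the "Flash-Lite" tier targets.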
The model has also demonstrated strong benchmark performance, achieving an Elo score of 1432 on the Arena.ai Leaderboard and scoring 86.9 per cent on GPQA Diamond and 76.8 per cent on MMMU Pro, outperforming several models in its category and even surpassing earlier Gemini models in some evaluations.
Gemini 3.1 Flash-Lite includes adjustable thinking levels, enabling developers to control how much computational reasoning the model applies to a task. This feature allows teams to balance speed, cost and reasoning depth depending on the complexity of workloads.
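In practice, a team might route each request to a thinking level based on how demanding the task is. The announcement does not specify the API surface for this feature, so the function name, the level strings, and the keyword heuristic below are all hypothetical, purely to illustrate the speed-versus-depth trade-off.

```python
# Hypothetical sketch: choosing a thinking level per task.
# "high"/"low" labels and choose_thinking_level are illustrative,
# not actual Gemini API parameter names or values.

def choose_thinking_level(task: str) -> str:
    """Pick a reasoning depth: minimal for simple tasks, deeper for complex ones."""
    complex_markers = ("analyse", "plan", "prove", "multi-step", "debug")
    if any(marker in task.lower() for marker in complex_markers):
        return "high"  # more computational reasoning: slower, costlier, deeper
    return "low"       # minimal reasoning: fastest and cheapest

print(choose_thinking_level("Translate this sentence into French"))  # → low
print(choose_thinking_level("Plan a multi-step data migration"))     # → high
```

The point of the dial is that simple, high-frequency tasks (translation, moderation) run at the fast, cheap end, while the occasional complex request can borrow more reasoning without switching models.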
The model is designed for a wide range of enterprise and developer applications, including high-volume translation, content moderation, UI and dashboard generation, simulations, and instruction-based workflows. Its low latency also makes it suitable for building responsive real-time applications and large-scale automated systems.
Early-access developers using Google AI Studio and Vertex AI—including companies such as Latitude, Cartwheel and Whering—have already begun deploying the model for large-scale AI solutions. Testers reported that Gemini 3.1 Flash-Lite handles complex inputs with the precision of larger models while maintaining strong instruction adherence and efficiency.
With the launch of Gemini 3.1 Flash-Lite, Google aims to provide developers with a scalable and affordable AI model capable of powering high-throughput applications and next-generation AI-driven services.


