
Gemini 3 Flash is now available in Gemini CLI

Another AI model has landed this year. Yesterday, less than a month after releasing Gemini 3 Pro and Gemini 3 Deep Think, Google announced Gemini 3 Flash, a new model in the Gemini 3 line for text, video, and images. Google positions it as a faster, more optimized version of Gemini 3 that combines pro-level reasoning with minimal latency and aggressive cost optimization. In terms of speed, Gemini 3 Flash is one of the fastest frontier contenders on the market: up to 3x faster than Gemini 2.5 Pro.

The model is focused on everyday tasks: it responds to queries quickly and excels at things like last-minute travel planning or breaking down complex educational concepts. Its multimodal reasoning lets users ask Gemini to watch videos, view images, listen to audio, or read text, and to offer visual suggestions in return. For example, a user can upload a video or a sketch, and the model will explain it or suggest improvements. Gemini 3 Flash can also build application prototypes from text prompts.

Benchmarks

Benchmarks show surprisingly strong results. According to Google, Gemini 3 Flash outperforms Gemini 2.5 Flash across all key metrics and significantly outperforms Gemini 2.5 Pro in a number of tests. Specifically, the model's accuracy on the GPQA Diamond benchmark, which assesses scientific knowledge, reached 90.4%. On the MMMU Pro multimodal benchmark, it scored 81.2%, almost on par with Gemini 3 Pro. On Humanity's Last Exam, which measures academic reasoning without external tools, it scored 33.7%.

The model confidently outperforms Gemini 2.5 Pro and competes with larger frontier models while remaining in a different price class. The key engineering feature is managed thinking: on complex tasks the model can "think longer," yet on average it uses 30% fewer tokens than Gemini 2.5 Pro while maintaining higher quality.

An important signal for developers is the SWE-bench Verified score: 78%, which beats not only the entire 2.5 series but also Gemini 3 Pro, and trails the recently launched GPT-5.2 only slightly. Flash is clearly aimed at agent-based scenarios, high-frequency workflows, and interactive systems, where latency matters more than squeezing out the absolute maximum in reasoning.


Pricing

Gemini 3 Flash costs $0.50 per million input tokens and $3 per million output tokens. For audio input, the price is set at $1 per million tokens. Combining reasoning, tool use, and multimodality, the model is suitable for video analysis, data mining, and visual Q&A. This is no longer "premium LLM" territory, but mass-market pricing.
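To get a feel for what this pricing means in practice, here is a small sketch of a cost estimator. It assumes the quoted prices are per million tokens (the usual convention for Gemini API pricing; the exact billing units are an assumption here, so check the official pricing page):

```python
# Rough cost estimator for Gemini 3 Flash API usage.
# Assumes the quoted prices are per 1M tokens (conventional for the Gemini API).
INPUT_PER_M = 0.50        # USD per 1M text input tokens
AUDIO_INPUT_PER_M = 1.00  # USD per 1M audio input tokens
OUTPUT_PER_M = 3.00       # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int,
                  audio_input_tokens: int = 0) -> float:
    """Return the estimated USD cost for a batch of requests."""
    cost = (
        input_tokens / 1_000_000 * INPUT_PER_M
        + audio_input_tokens / 1_000_000 * AUDIO_INPUT_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PER_M
    )
    return round(cost, 4)

# Example: 10,000 requests with ~2K input and ~500 output tokens each.
print(estimate_cost(input_tokens=20_000_000, output_tokens=5_000_000))  # → 25.0
```

At roughly $25 for ten thousand mid-sized requests, the "mass production" framing above is not an exaggeration.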


Availability

Gemini 3 Flash has already become the default model in the Gemini app, replacing Gemini 2.5 Flash. Users get two modes: "Quick" for instant answers and "Deep Thinking" for more complex tasks. For harder math and programming problems, Google continues to offer Gemini 3 Pro. Gemini 3 Flash is also becoming the default model for AI mode in Search, meaning it will reach everyday Google users around the world.

Gemini 3 Flash is available to developers in preview via the Gemini API in Google AI Studio, Google Antigravity, Gemini CLI, and Android Studio. Enterprise customers can use the model through Vertex AI and Gemini Enterprise. It is already being used by JetBrains, Figma, Cursor, and Latitude.
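For developers curious what calling the preview looks like, the sketch below builds the REST request for the Gemini API's generateContent endpoint. The model id `gemini-3-flash-preview` is an assumption (preview names vary); check the model list in Google AI Studio for the exact identifier:

```python
import json

# Minimal sketch of a Gemini API generateContent request.
# NOTE: the model id "gemini-3-flash-preview" is an assumption; verify the
# actual preview name in Google AI Studio before use.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-3-flash-preview"

def build_request(prompt: str) -> tuple[str, str]:
    """Return the endpoint URL and JSON body for a simple text prompt."""
    url = f"{API_BASE}/models/{MODEL}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body

url, body = build_request("Summarize this release in one sentence.")
print(url)
print(body)
```

The body would be POSTed to the URL with an `x-goog-api-key` header; the SDKs in AI Studio, Gemini CLI, and Vertex AI wrap this same request shape.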

Final Thoughts

Gemini 3 Flash is suited to automating large volumes of work, data analysis, and rapid, repetitive scenarios at very affordable pricing. The release aims to put AI-driven tools in more hands. Google is clearly betting not on headlines for the most powerful model, but on faster responses, smarter multimodal reasoning, and prices designed for everyday, high-frequency use. With models like Gemini 3 Flash that are fast, predictable, and cheap enough to run continuously, AI can shift from a premium tool to a default layer embedded in everyday workflows, agents, and products.

