Google has introduced a preview version of its new generative AI model, Gemini 2.5 Flash, which is now available for testing via the Gemini API, Google AI Studio, and Vertex AI. The model is aimed at developers and teams who need to process large volumes of chat requests or build real-time solutions. Gemini 2.5 Flash supports text, images, video, and audio, and offers a context window of up to one million tokens.
"Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it 'thinks' depending on your 💰 – making it ideal for tasks like building chat apps, extracting data and more. Try an early version in @Google AI Studio → https://t.co/iZJNqQmooH"
— Google DeepMind (@GoogleDeepMind), April 17, 2025
This model features a hybrid operating mode: developers can explicitly set the model's "thinking" level, that is, how much effort it spends analyzing a query. This allows for an optimized balance between speed, response quality, and usage cost. When advanced reasoning is enabled, the output token price rises from $0.60 to $3.50 per million tokens, and developers can cap the reasoning effort with a "thinking_budget" parameter ranging from 0 to 24,576 tokens.
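As a rough illustration, the thinking budget can be set per request. The sketch below assumes the google-genai Python SDK; the preview model identifier and the exact config field names are assumptions based on the preview release, not an official example.

```python
# Minimal sketch: requesting Gemini 2.5 Flash with a capped thinking budget.
# Assumptions: the google-genai SDK is installed and the preview model ID is
# "gemini-2.5-flash-preview-04-17"; adjust both to match current documentation.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Summarize this support ticket in two sentences: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            # 0 disables extra reasoning; larger values (up to 24,576 tokens)
            # allow deeper analysis at the higher output-token price.
            thinking_budget=1024
        )
    ),
)
print(response.text)
```

Setting the budget to 0 keeps responses at the cheaper, low-latency tier, while raising it trades cost and speed for more thorough reasoning on harder queries.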
Gemini 2.5 Flash demonstrates strong performance on complex tasks, trailing only Gemini 2.5 Pro on the Hard Prompts test. On the Humanity's Last Exam benchmark, it outperformed competitors such as Claude 3.7 Sonnet and DeepSeek R1, though it fell behind OpenAI's o4-mini. Notably, for simple queries the model decides on its own whether additional reasoning is needed, saving time and resources.
The new model is especially useful for building chatbots, automated data extraction tools, and other solutions where processing speed and cost control are crucial. Google emphasizes that Gemini 2.5 Flash is the most cost-effective in its lineup, and its functionality will be expanded during testing before general availability.