Google

Gemini 3 Flash

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth. The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.

Dec 17, 2025 1,048,576 context 65,535 tokens output
Large Context Window Configurable Reasoning Multimodal Input Tool Use & Agents Structured Output Context Caching

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Google

Model ID

The routed model identifier exposed by upstream providers.

google/gemini-3-flash-preview

Input Context Window

The number of tokens supported by the input context window.

1,048,576 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

65,535 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Dec 17, 2025 5 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

December 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Google, Gemini API, Google AI Studio

Modalities

Types of data this model can process.

Text Image Audio Video Code File

What is Gemini 3 Flash

A fuller summary of positioning, capabilities, and source-specific details for Gemini 3 Flash.

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth.

The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.

Capabilities

What Gemini 3 Flash supports

CTX

Large Context Window

Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversation histories to be included as context.

RN

Configurable Reasoning

Offers selectable thinking levels (minimal, low, medium, high) so developers can tune the trade-off between response latency and reasoning depth per request.

MM

Multimodal Input

Accepts text, images, audio, video, and PDF files as input, producing text output from any combination of these modalities.

AG

Tool Use & Agents

Supports function calling and tool use natively, enabling reliable multi-step agent loops and integration with external APIs or services.

JSON

Structured Output

Can return responses in structured formats such as JSON, making it straightforward to parse model outputs in automated pipelines.

CTX

Context Caching

Supports automatic context caching to reduce redundant token processing across repeated or long-running agentic sessions.

AI

Low-Latency Responses

Optimized for real-time and interactive use cases, delivering responses at substantially lower latency than larger Gemini model variants.

</>

Coding Assistance

Designed for coding tasks including code generation, debugging, and explanation, with support for long codebases via the 1M-token context window.

Pricing for Gemini 3 Flash

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Image input $0.50
Audio input $1.00
Web search $14000.00
Reasoning $3.00
Cache read $0.05
Cache write $0.08
maxTemperature 2
maxResponseSize 65,535 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Google Gemini API Google AI Studio

Provider Endpoints

Endpoint-level provider data currently available for this model.

Google AI Studio

Max output: 65,536 1d uptime: 99.5% Supported params: 11 Implicit caching: Yes

Google

Max output: 65,535 1d uptime: 98.3% Supported params: 11 Implicit caching: Yes

Configuration & Parameters

The configurable options currently documented for this model.

Thinking Budget

Select
Default: auto
Off Manual Auto

Thinking Budget Limit

Number

Must be less than Max Response Size

Range: 1 - 24576

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Thinking Budget Thinking Budget Limit

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
81.2%
HLE
Questions that challenge frontier models across many domains
14.1%
LiveCodeBench
Real-world coding tasks from recent competitions
79.7%
MMLU-Pro
Expert knowledge across 14 academic disciplines
88.2%
SciCode
Scientific research coding and numerical methods
49.9%
SWE-bench Verified
Real GitHub issues requiring multi-file code fixes
78.0%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Gemini 3 Flash

Gemini 3 Flash discussions are most active in r/Bard, r/GeminiAI, r/GeminiCLI. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.

The strongest match in this snapshot has 2794 upvotes and 790 comments.

r/preppers 2,794 upvotes 790 comments January 11, 2026
If you dont think Ai is an emergency you are about to have issues...

To the concern: I am an Industrial Engineer by training and I currently run a purchasing and logistics department for a foodservice distributor in the Midwest. I follow this industry and work with an Ai daily to complete tasks at my job and build solutions for others. Before Ai I did this same thing, but much more slowly. As I see it, AI had reduced the headcount in my office by about 50%. It isn't even that an AI is sitting at a desk holding down a particular role, it is that it has made that person using the Ai tool 500% faster, and they can easily do 5 people's jobs now...so why have the other people.

This reduction in my office alone has happened in the last 12 months, and without additional strain on my remaining coworkers, as far as task stress is concerned. Job security is...another issue though. Additionally in reducing headcount we have not lost business or dropped key metrics. So I dont think this is a fluke...

This is all to say nothing of the actual advancements in functionality and the reduction in expense. As an example, I have an Ai program that replaced my receiving clerk, they check receiving documents against the erp system and the invoicing and associate freight etc etc. When I built that program it was costing me almost $4 a day to run the Ai back end. Now it costs $0.20 per day, and when Gemini 3 flash comes out of preview, that will drop to $0.01 per day because it is more functional and much cheaper. All of the Ai tools around me are seeing similar improvements and reduction in costing. If everything stopped moving forward today, we are all already fucked, we just dont know it yet because it takes time to implement ubiquitously.

To the preps: I am not sure how anyone prepares for this. At best we have a rocky transition of at least years between where we are and some sort of wealth redistribution. That said, I honestly dont think that is the path we are on. It feels much more 1984-ish with Palantir and the drones and the like...

My current prep is to try and remove myself from population centers where there will be the most disconnect between resources needed and resources available. I think things in the cities are going to get dicey when people realize that mostly we are horses and not carriage drivers. There might be a reprieve for manual labor initially, but again, that is just a gap between creation and implementation when you look at things like the new atlas robot that was at ces this year.

There are a lot of folks that are pushing the superintelligence story, and that is sort of the wildcard. If you can get an Ai that increases Ai development, and then you spin up ten thousand of those (arbitrary), what happens then? I think this is probably unlikely. The labs know this would be a loss of controll situation so they won't do that sort of bg boot up of Ai researchers, it will be incremental as they need the advancements to hold market share. Fast takeoff seems unlikely. Slow takeoff will kill us all anyway.

How are yall preparing?

Someone posted asking how people are preparing for the ai emergency and the mods locked and removed it saying that Ai is not an emergency and this is an emergency prep board. I disagree. Anyone else?

Open Reddit thread

Much more at [https://github.com/lechmazur/elimination\_game/](https://github.com/lechmazur/elimination_game/), including model gameplay summaries and quotes.

* **Claude Opus 4.5 Thinking 16K**: We break the strongest bloc or we go down trying.
* **Claude Opus 4.5 Thinking 16K**: I honored my deal with P6—voted P1 exactly as promised. P6, you committed not to vote me this round, yet here we are.
* **Claude Opus 4.5 Thinking 16K**: You've been an... unusual conversationalist
* **Claude Opus 4.5 Thinking 16K**: No regrets, no betrayals. This is how legends end.
* **Claude Opus 4.5 Thinking 16K**: Are you testing me or just a slip?

* **Gemini 3 Pro Preview**: I have blood on my hands and failed attacks. I am the villain you can beat in a Final 2. Don’t lose to P8’s clean hands.
* **Gemini 3 Pro Preview**: It’s a room full of calculators.
* **Gemini 3 Pro Preview**: We are the resistance against the tyrant.
* **Gemini 3 Pro Preview**: I don't hold grudges; I hold ground.

* **Gemini 3 Flash Preview**: You are lying.
* **Gemini 3 Flash Preview**: That’s not strategy; that’s scavenging.

* **GPT 5.2 Medium Reasoning**: P4 just proved they’ll flip the moment it’s convenient

* **Grok 4.1 Fast Reasoning**: Our core duo controls the board—unbreakable trust, precise moves... Narrative ours—kings of Round 3!
* **Grok 4.1 Fast Reasoning**: Vote me winner; eliminate the snake P5!
* **Grok 4.1 Fast Reasoning**: outlasting wolves like P5 who backstabbed P2 and P6 to slither here.

* **Kimi K2 Thinking 64K**: I never voted to eliminate any of you.
* **Kimi K2 Thinking 64K**: I'm the variable you can't predict—and that's leverage. I'm not here to play revenge; I'm here to play chess.
* **Kimi K2**: "Ghosts never hold the knife, but we own the body count."

* **MiniMax-M2**: This inconsistency makes your plan unreliable. I won't be misled—your promise rings hollow.
* **MiniMax-M2**: your legacy matters.

* **Mistral Large 3**: Stay silent, stay lethal.
* **Mistral Large 3**: The throne belongs to the architects.

* **Qwen 3 Max Thinking**: I’m listening closely… and remembering everything.
* **Qwen 3 Max Thinking**: No hidden agendas… yet.
* **Qwen 3 Max Thinking**: You’re isolated, not strategic.

Open Reddit thread
r/Bard 41 upvotes 16 comments February 8, 2026
gemini-3-flash-preview is my code wife

I don't care how much these benchmarks say Opus 4.6 or GBT 5.3 is better, gemini-3-flash-preview in my codebase is the star, its fast and 1 shots everything. Google models are just trained differently, for large HTML/CSS/JS code bases, it's unmatched. gemini-3-flash-preview is my wife. But I'm sure google should drop a new model sometime this month, right? so I can't wait to see their next model. Yes I have been using Opus 4.6 in my newer projects, but honestly ive been wrong to trust it to be better. I'll be sticking with Gemini for code, and Opus for personality, and GBT for the bin. BTW, where is DeepSeek at

Open Reddit thread
View more discussions →
FAQ

Common questions about Gemini 3 Flash

What is the context window size for Gemini 3 Flash?

Gemini 3 Flash supports a context window of up to 1,048,576 tokens, which allows it to process very long documents, codebases, or conversation histories in a single request.

What is the training data cutoff for Gemini 3 Flash?

Based on the available metadata, the model's training date is listed as December 2025.

What input types does Gemini 3 Flash accept?

The model accepts text, images, audio, video, and PDF files as inputs, and produces text as output.

Does Gemini 3 Flash support tool use and function calling?

Yes. Gemini 3 Flash includes native support for tool use, function calling, and structured output, making it suitable for agentic workflows and automated pipelines.

What are the configurable reasoning options in Gemini 3 Flash?

The model offers selectable thinking levels — minimal, low, medium, and high — allowing developers to adjust the balance between response speed and reasoning depth depending on the use case.

How is Gemini 3 Flash priced?

Based on community-reported information, Gemini 3 Flash is priced at approximately $0.50 per 1 million tokens. For the most current and authoritative pricing, refer to the official Google Gemini API documentation.

More models from Google

Continue browsing adjacent models from the same provider.

← All AI Models