Google

Gemini 3 Flash

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth. The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.

Dec 17, 2025 1,048,576 context 65,535 tokens output

Large Context Window Configurable Reasoning Multimodal Input Tool Use & Agents Structured Output Context Caching

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Parameters ↓ Benchmarks ↓ Compare ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Google

Model ID

The routed model identifier exposed by upstream providers.

google/gemini-3-flash-preview

Input Context Window

The number of tokens supported by the input context window.

1,048,576 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

65,535 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Dec 17, 2025 6 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

December 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Google, Gemini API, Google AI Studio

Modalities

Types of data this model can process.

Text Image Audio Video Code File

What is Gemini 3 Flash

A fuller summary of positioning, capabilities, and source-specific details for Gemini 3 Flash.

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth.

The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.

Capabilities

What Gemini 3 Flash supports

CTX

Large Context Window

Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversation histories to be included as context.

Configurable Reasoning

Offers selectable thinking levels (minimal, low, medium, high) so developers can tune the trade-off between response latency and reasoning depth per request.

Multimodal Input

Accepts text, images, audio, video, and PDF files as input, producing text output from any combination of these modalities.

Tool Use & Agents

Supports function calling and tool use natively, enabling reliable multi-step agent loops and integration with external APIs or services.

JSON

Structured Output

Can return responses in structured formats such as JSON, making it straightforward to parse model outputs in automated pipelines.

CTX

Context Caching

Supports automatic context caching to reduce redundant token processing across repeated or long-running agentic sessions.

Low-Latency Responses

Optimized for real-time and interactive use cases, delivering responses at substantially lower latency than larger Gemini model variants.

</>

Coding Assistance

Designed for coding tasks including code generation, debugging, and explanation, with support for long codebases via the 1M-token context window.

Pricing for Gemini 3 Flash

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.50 Per million tokens

Output tokens $3.00 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Image input $0.50

Audio input $1.00

Web search $14000.00

Reasoning $3.00

Cache read $0.05

Cache write $0.08

maxTemperature 2

maxResponseSize 65,535 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Google Gemini API Google AI Studio

Provider Endpoints

Endpoint-level provider data currently available for this model.

Google

Max output: 65,535 1d uptime: 96.0% Supported params: 11 Implicit caching: Yes

Google AI Studio

Max output: 65,536 1d uptime: 99.1% Supported params: 10 Implicit caching: Yes

Configuration & Parameters

The configurable options currently documented for this model.

Thinking Budget

Select

Default: auto

Off Manual Auto

Thinking Budget Limit

Number

Must be less than Max Response Size

Range: 1 - 24576

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Thinking Budget Thinking Budget Limit

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	81.2%
HLE Questions that challenge frontier models across many domains	14.1%
LiveCodeBench Real-world coding tasks from recent competitions	79.7%
MMLU-Pro Expert knowledge across 14 academic disciplines	88.2%
SciCode Scientific research coding and numerical methods	49.9%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes	78.0%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

OpenRouter Model Page Other

→

Official Documentation Documentation

→

Release Notes Announcements

→

Gemini API Overview Documentation

→

Google AI Studio Playground Playground

→

OpenRouter Model Page OpenRouter

→