Google

Gemini 2.5 Flash Vision

Gemini 2.5 Flash Vision is a multimodal vision model developed by Google, designed to process and reason over visual inputs alongside text. It is part of the Gemini 2.5 Flash family, which is built around balancing cost efficiency with broad capability coverage. The model supports a context window of 1,048,576 tokens, making it suitable for tasks that require processing large amounts of information in a single request. It was trained with a knowledge cutoff of June 2025. This model is positioned for use cases where real-time or low-latency responses are important, such as visual question answering, document analysis with images, and applications that combine vision with extended context. The "thinking" architecture underlying the Gemini 2.5 Flash series enables the model to apply multi-step reasoning before producing a response. Developers looking for a vision-capable model that can handle long documents, images, and mixed-modality inputs without incurring the cost of larger models will find this a practical option.

Jun 17, 2025 1,048,576 context 65,535 tokens output

Large Context Window Real-Time Latency Visual Understanding Multimodal Reasoning Structured Output

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Parameters ↓ Benchmarks ↓ Compare ↓ Tools ↓ Daily ↓ Resources ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Google

Input Context Window

The number of tokens supported by the input context window.

1,048,576 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

65,535 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Jun 17, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

January 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Google, Vertex AI

Modalities

Types of data this model can process.

Text Image

What is Gemini 2.5 Flash Vision

A fuller summary of positioning, capabilities, and source-specific details for Gemini 2.5 Flash Vision.

Gemini 2.5 Flash Vision is a multimodal vision model developed by Google, designed to process and reason over visual inputs alongside text. It is part of the Gemini 2.5 Flash family, which is built around balancing cost efficiency with broad capability coverage. The model supports a context window of 1,048,576 tokens, making it suitable for tasks that require processing large amounts of information in a single request. It was trained with a knowledge cutoff of June 2025.

This model is positioned for use cases where real-time or low-latency responses are important, such as visual question answering, document analysis with images, and applications that combine vision with extended context. The "thinking" architecture underlying the Gemini 2.5 Flash series enables the model to apply multi-step reasoning before producing a response. Developers looking for a vision-capable model that can handle long documents, images, and mixed-modality inputs without incurring the cost of larger models will find this a practical option.

Capabilities

What Gemini 2.5 Flash Vision supports

CTX

Large Context Window

Supports up to 1,048,576 tokens in a single context, enabling processing of long documents, extended conversations, or large batches of visual and textual content.

Real-Time Latency

Optimized for low-latency responses, making it suitable for interactive applications and real-time visual analysis workflows.

Visual Understanding

Processes image inputs alongside text to answer questions, describe scenes, extract information, or reason over visual content.

Multimodal Reasoning

Applies multi-step thinking across both visual and textual inputs, supporting tasks like document comprehension that combine images and text.

JSON

Structured Output

Can return responses in structured formats, useful for extracting data from images or documents into machine-readable outputs.

Pricing for Gemini 2.5 Flash Vision

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.30 Per million tokens

Output tokens N/A Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 2

maxResponseSize 65,535 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Google Vertex AI

Configuration & Parameters

The configurable options currently documented for this model.

Temperature

Number

Default: 1 Range: 0 - 2 (step 0.1)

Max Response Tokens

Number

Default: 4096 Range: 1 - 65535 (step 1)

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Temperature Max Response Tokens

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	50.0%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	68.3%
HLE Questions that challenge frontier models across many domains	5.1%
LiveCodeBench Real-world coding tasks from recent competitions	49.5%
MATH-500 Undergraduate and competition-level math problems	93.2%
MMLU-Pro Expert knowledge across 14 academic disciplines	80.9%
SciCode Scientific research coding and numerical methods	29.1%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Website Other

→

Documentation Documentation

→

Google AI Studio Playground

→

Gemini API Reference Documentation

→

Gemini 2.5 Flash Announcement Announcements

→