Google

Gemini 2.5 Pro Vision

Gemini 2.5 Pro Vision is a multimodal AI model developed by Google DeepMind, designed to reason through complex problems by analyzing text, images, audio, video, and code. It operates as a "thinking model," meaning it works through logical steps before producing a response rather than generating output directly. The model supports a context window of 1,048,576 tokens, enabling it to process large documents, codebases, and extended conversations in a single request. The model is particularly suited for tasks that require combining visual understanding with structured reasoning, such as interpreting diagrams, analyzing image-based data, and generating code from visual inputs. It has demonstrated strong benchmark performance in math, science, and software engineering tasks, including a 63.8% score on the SWE-Bench Verified evaluation. Gemini 2.5 Pro Vision is available through Google AI Studio and via the Gemini API, making it accessible for developers building applications that require both vision and reasoning capabilities.

Jun 17, 2025 1,048,576 context 65,536 tokens output

Extended Context Window Multimodal Input Structured Reasoning Code Generation Math and Science Tasks Visual Understanding

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Parameters ↓ Benchmarks ↓ Compare ↓ Tools ↓ Daily ↓ Resources ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Google

Input Context Window

The number of tokens supported by the input context window.

1,048,576 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

65,536 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Jun 17, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

June 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Google, Vertex AI, Gemini API

Modalities

Types of data this model can process.

Text Image Video Audio Code

What is Gemini 2.5 Pro Vision

A fuller summary of positioning, capabilities, and source-specific details for Gemini 2.5 Pro Vision.

Gemini 2.5 Pro Vision is a multimodal AI model developed by Google DeepMind, designed to reason through complex problems by analyzing text, images, audio, video, and code. It operates as a "thinking model," meaning it works through logical steps before producing a response rather than generating output directly. The model supports a context window of 1,048,576 tokens, enabling it to process large documents, codebases, and extended conversations in a single request.

The model is particularly suited for tasks that require combining visual understanding with structured reasoning, such as interpreting diagrams, analyzing image-based data, and generating code from visual inputs. It has demonstrated strong benchmark performance in math, science, and software engineering tasks, including a 63.8% score on the SWE-Bench Verified evaluation. Gemini 2.5 Pro Vision is available through Google AI Studio and via the Gemini API, making it accessible for developers building applications that require both vision and reasoning capabilities.

Capabilities

What Gemini 2.5 Pro Vision supports

CTX

Extended Context Window

Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversations to be handled without truncation.

Multimodal Input

Accepts text, images, audio, video, and code as inputs within the same request, enabling cross-modal analysis and generation.

Structured Reasoning

Uses a chain-of-thought approach to work through multi-step problems before producing a final answer, improving accuracy on complex tasks.

</>

Code Generation

Generates and analyzes code across multiple languages, achieving 63.8% on the SWE-Bench Verified benchmark for software engineering tasks.

Math and Science Tasks

Applies logical and quantitative reasoning to solve problems in mathematics and science, with benchmark results reflecting strong performance in these domains.

Visual Understanding

Interprets and reasons over images and video frames, supporting use cases like diagram analysis, chart reading, and image-based question answering.

Pricing for Gemini 2.5 Pro Vision

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $1.25 Per million tokens

Output tokens N/A Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 2

maxResponseSize 65,536 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Google Vertex AI Gemini API

Configuration & Parameters

The configurable options currently documented for this model.

Temperature

Number

Default: 1 Range: 0 - 2 (step 0.1)

Max Response Tokens

Number

Default: 4096 Range: 1 - 65535 (step 1)

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Temperature Max Response Tokens

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	88.7%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	84.4%
HLE Questions that challenge frontier models across many domains	21.1%
LiveCodeBench Real-world coding tasks from recent competitions	80.1%
MATH-500 Undergraduate and competition-level math problems	96.7%
MMLU-Pro Expert knowledge across 14 academic disciplines	86.2%
SciCode Scientific research coding and numerical methods	42.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Product Announcement Announcements

→

Documentation Documentation

→

Google AI Studio Playground

→

Gemini API Reference Documentation

→

Gemini 2.5 Pro Model Card Documentation

→