Extended Context Window
Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversations to be handled without truncation.
Gemini 2.5 Pro Vision is a multimodal AI model developed by Google DeepMind, designed to reason through complex problems by analyzing text, images, audio, video, and code. It operates as a "thinking model," meaning it works through logical steps before producing a response rather than generating output directly. The model supports a context window of 1,048,576 tokens, enabling it to process large documents, codebases, and extended conversations in a single request. The model is particularly suited for tasks that require combining visual understanding with structured reasoning, such as interpreting diagrams, analyzing image-based data, and generating code from visual inputs. It has demonstrated strong benchmark performance in math, science, and software engineering tasks, including a 63.8% score on the SWE-Bench Verified evaluation. Gemini 2.5 Pro Vision is available through Google AI Studio and via the Gemini API, making it accessible for developers building applications that require both vision and reasoning capabilities.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Gemini 2.5 Pro Vision.
Gemini 2.5 Pro Vision is a multimodal AI model developed by Google DeepMind, designed to reason through complex problems by analyzing text, images, audio, video, and code. It operates as a "thinking model," meaning it works through logical steps before producing a response rather than generating output directly. The model supports a context window of 1,048,576 tokens, enabling it to process large documents, codebases, and extended conversations in a single request.
The model is particularly suited for tasks that require combining visual understanding with structured reasoning, such as interpreting diagrams, analyzing image-based data, and generating code from visual inputs. It has demonstrated strong benchmark performance in math, science, and software engineering tasks, including a 63.8% score on the SWE-Bench Verified evaluation. Gemini 2.5 Pro Vision is available through Google AI Studio and via the Gemini API, making it accessible for developers building applications that require both vision and reasoning capabilities.
Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversations to be handled without truncation.
Accepts text, images, audio, video, and code as inputs within the same request, enabling cross-modal analysis and generation.
Uses a chain-of-thought approach to work through multi-step problems before producing a final answer, improving accuracy on complex tasks.
Generates and analyzes code across multiple languages, achieving 63.8% on the SWE-Bench Verified benchmark for software engineering tasks.
Applies logical and quantitative reasoning to solve problems in mathematics and science, with benchmark results reflecting strong performance in these domains.
Interprets and reasons over images and video frames, supporting use cases like diagram analysis, chart reading, and image-based question answering.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
The configurable options currently documented for this model.
Parameters currently listed by OpenRouter or the local catalog for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2024
American math olympiad problems
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MATH-500
Undergraduate and competition-level math problems
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Jump straight into the most relevant side-by-side comparison pages for this model.
Compare Gemini 2.5 Pro Vision and Gemini 2.5 Pro across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus long-context workloads.
Compare Gemini 2.5 Pro Vision and Gemini 2.5 Flash Vision across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus long-context workloads.
Compare Gemini 2.5 Pro Vision and Gemini 2.5 Flash Lite across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus long-context workloads.
Compare Gemini 2.5 Pro Vision and Gemini 2.5 Flash Image across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus long-context workloads.
Compare Gemini 2.5 Pro Vision and Gemini 2.5 Flash across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus long-context workloads.
Compare Gemini 2.5 Pro Vision and Gemini 1.5 Pro Vision Deprecated across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus general-purpose AI workloads.
Gemini 2.5 Pro Vision supports a context window of 1,048,576 tokens, which allows it to process large volumes of text, images, and other inputs in a single request.
According to the model metadata, the training date is listed as June 2025.
The model supports multimodal inputs including text, images, audio, video, and code, making it suitable for tasks that combine visual and language understanding.
The model is available through Google AI Studio, the Gemini API, and Google Cloud Vertex AI, as well as through MindStudio without requiring separate API key management.
Yes. The model scored 63.8% on the SWE-Bench Verified evaluation, which measures performance on real-world software engineering tasks, and it supports code generation and analysis across multiple programming languages.
Continue browsing adjacent models from the same provider.