Google

Gemini 2.0 Flash-Lite Vision

Gemini 2.0 Flash-Lite Vision is a multimodal model developed by Google, designed to process both visual and textual inputs. It belongs to the Gemini 2.0 Flash family and is positioned as the fastest and most cost-efficient option within that lineup. The model supports a context window of over one million tokens, making it suitable for tasks that require processing large amounts of information in a single request. It was trained on data up to June 2024. This model is intended as an upgrade path for users of Gemini 1.5 Flash who want improved output quality without changes to cost or latency. Its vision capabilities allow it to handle image understanding tasks alongside text-based workflows. The combination of speed, large context support, and multimodal input handling makes it well-suited for applications such as document analysis, image captioning, and high-throughput pipelines where cost efficiency is a priority.

Feb 25, 2025 1,048,576 context 8,192 tokens output

Vision Understanding Large Context Window Multimodal Input High-Speed Inference Text Generation Document Analysis

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Parameters ↓ Benchmarks ↓ Compare ↓ Tools ↓ Daily ↓ Resources ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Google

Input Context Window

The number of tokens supported by the input context window.

1,048,576 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

8,192 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Feb 25, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

June 2024

API Providers

The providers that offer this model. This is not an exhaustive list.

Google, Vertex AI

Modalities

Types of data this model can process.

Text Image

What is Gemini 2.0 Flash-Lite Vision

A fuller summary of positioning, capabilities, and source-specific details for Gemini 2.0 Flash-Lite Vision.

Gemini 2.0 Flash-Lite Vision is a multimodal model developed by Google, designed to process both visual and textual inputs. It belongs to the Gemini 2.0 Flash family and is positioned as the fastest and most cost-efficient option within that lineup. The model supports a context window of over one million tokens, making it suitable for tasks that require processing large amounts of information in a single request. It was trained on data up to June 2024.

This model is intended as an upgrade path for users of Gemini 1.5 Flash who want improved output quality without changes to cost or latency. Its vision capabilities allow it to handle image understanding tasks alongside text-based workflows. The combination of speed, large context support, and multimodal input handling makes it well-suited for applications such as document analysis, image captioning, and high-throughput pipelines where cost efficiency is a priority.

Capabilities

What Gemini 2.0 Flash-Lite Vision supports

Vision Understanding

Processes and interprets image inputs alongside text, enabling tasks like image captioning, visual question answering, and scene description.

CTX

Large Context Window

Supports up to 1,048,576 tokens in a single context, allowing long documents, multi-image inputs, or extended conversations to be processed together.

Multimodal Input

Accepts combinations of text and image inputs in a single request, enabling workflows that mix visual and textual data.

High-Speed Inference

Optimized for low-latency responses, making it suitable for real-time or high-throughput production applications.

Text Generation

Generates coherent text responses based on visual and textual prompts, supporting summarization, Q&A, and content extraction tasks.

Document Analysis

Can process long-form documents or multi-page inputs within its million-token context window, extracting structured information or answering questions about content.

Pricing for Gemini 2.0 Flash-Lite Vision

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.08 Per million tokens

Output tokens N/A Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 2

maxResponseSize 8,192 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Google Vertex AI

Configuration & Parameters

The configurable options currently documented for this model.

Temperature

Number

Default: 1 Range: 0 - 2 (step 0.1)

Max Response Tokens

Number

Default: 4096 Range: 1 - 8192 (step 1)

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Temperature Max Response Tokens

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	27.7%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	53.5%
HLE Questions that challenge frontier models across many domains	3.6%
LiveCodeBench Real-world coding tasks from recent competitions	18.5%
MATH-500 Undergraduate and competition-level math problems	87.3%
MMLU-Pro Expert knowledge across 14 academic disciplines	72.4%
SciCode Scientific research coding and numerical methods	25.0%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Website Other

→

Documentation Documentation

→

Google AI Studio Playground

→

Gemini API Reference Documentation

→

Gemini 2.0 Flash Announcement Announcements

→