X.ai

Grok 2 Vision

Grok 2 Vision (grok-2-vision-1212) is a multimodal language model developed by xAI and released in December 2024. It accepts combined image and text inputs and is designed to understand, analyze, and respond to visual content alongside natural language. The model supports images up to 20MiB in JPG, JPEG, or PNG format and can process inputs in any order. It also includes multilingual support and improved instruction-following compared to earlier Grok vision releases. Grok 2 Vision is suited for production use cases that require visual comprehension, such as image captioning, visual question answering, chart and document analysis, and building AI assistants that respond to visual inputs. It supports tool calling and structured outputs, making it straightforward to integrate into developer workflows. With a 32,768-token context window, it can handle moderately long conversations that mix text and image content.

Unknown 32,768 context 1M output

Image Understanding Multimodal Input Multilingual Support Instruction Following Tool Calling Structured Outputs

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Benchmarks ↓ Tools ↓ Resources ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

X.ai

Input Context Window

The number of tokens supported by the input context window.

32,768 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

1M tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Unknown

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

X.ai

Modalities

Types of data this model can process.

Text Image

What is Grok 2 Vision

A fuller summary of positioning, capabilities, and source-specific details for Grok 2 Vision.

Grok 2 Vision (grok-2-vision-1212) is a multimodal language model developed by xAI and released in December 2024. It accepts combined image and text inputs and is designed to understand, analyze, and respond to visual content alongside natural language. The model supports images up to 20MiB in JPG, JPEG, or PNG format and can process inputs in any order. It also includes multilingual support and improved instruction-following compared to earlier Grok vision releases.

Grok 2 Vision is suited for production use cases that require visual comprehension, such as image captioning, visual question answering, chart and document analysis, and building AI assistants that respond to visual inputs. It supports tool calling and structured outputs, making it straightforward to integrate into developer workflows. With a 32,768-token context window, it can handle moderately long conversations that mix text and image content.

Capabilities

What Grok 2 Vision supports

IMG

Image Understanding

Analyzes image content including objects, styles, charts, and documents. Accepts JPG, JPEG, or PNG files up to 20MiB per image.

Multimodal Input

Accepts interleaved text and image inputs in any order within a single request, enabling flexible prompt construction.

Multilingual Support

Processes and generates responses in multiple languages, making it usable for internationally facing applications.

Instruction Following

Follows complex and nuanced prompts with improved steerability introduced in the December 2024 release.

Tool Calling

Supports function calling so developers can connect the model to external tools and APIs within their pipelines.

JSON

Structured Outputs

Returns structured data formats and supports temperature control for predictable, integration-ready responses.

Visual Question Answering

Answers natural language questions about image content, including charts, diagrams, and scanned documents.

CTX

Long Context Window

Supports up to 32,768 tokens per request, accommodating extended conversations that mix text and image inputs.

Pricing for Grok 2 Vision

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $2.00 Per million tokens

Output tokens $2.50 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

X.ai

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	13.3%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	51.0%
HLE Questions that challenge frontier models across many domains	3.8%
LiveCodeBench Real-world coding tasks from recent competitions	26.7%
MATH-500 Undergraduate and competition-level math problems	77.8%
MMLU-Pro Expert knowledge across 14 academic disciplines	70.9%
SciCode Scientific research coding and numerical methods	28.5%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Documentation Documentation

→

Announcement Blog Post Announcements

→

OpenRouter Model Page Other

→

xAI API Playground Playground

→

AI tools related to Grok 2 Vision

These tools are strongly connected to Grok 2 Vision through direct product references, provider mentions, or explicit model mappings.

AI Assistant

XX.AI

XX.AI is a desktop-based AI writing assistant designed to enhance your productivity and communication. Powered by advanced models including GPT-4o, Claude 3, and DALL-E 3, it offers a desktop-integrated alternative to web-based services. Access 15 leading AI models—such as Gemini, Claude, GPT, and Perplexity—within a single, free software application.

Free 34 visits 1 saves

AI Assistant

Grok

Grok is a free AI assistant developed by xAI, engineered to prioritize truth and objectivity. It provides features including real-time search, image generation, and trend analysis.

Free 279 visits 27 saves

AI Sales

Opnbx-ai

Opnbx-ai is a generative AI tool built by sales professionals to personalize cold emails and improve outreach effectiveness. It streamlines communication by creating prospect-centric emails and introductory lines designed to increase engagement.

Free 0 visits 3 saves

AI Assistant

HIX.AI

HIX.AI is an all-in-one AI writing assistant designed to generate high-quality copy for ads, emails, blogs, and other formats in seconds. It provides a suite of over 120 AI writing tools, including HIX AI Writer, HIX Chat, HIX Editor, a long-form article writer, an email generator, and a browser extension to support various content creation needs.

Free 1 visits 116 saves

FAQ

Common questions about Grok 2 Vision

What is the context window for Grok 2 Vision?

Grok 2 Vision supports a context window of 32,768 tokens per request.

What image formats does Grok 2 Vision accept?

The model accepts JPG, JPEG, and PNG image formats, with a maximum file size of 20MiB per image.

When was Grok 2 Vision released and what is its training cutoff?

Grok 2 Vision was released in December 2024, with a training date listed as December 2024.

Does Grok 2 Vision support tool calling?

Yes, Grok 2 Vision supports function calling and structured outputs, allowing integration with external tools and APIs.

Who publishes Grok 2 Vision and where can I access it via API?

Grok 2 Vision is published by xAI (the AI division of X). It is accessible through the xAI API and is also listed on OpenRouter under the model ID grok-2-vision-1212.

More models from X.ai

Continue browsing adjacent models from the same provider.

← All AI Models