Meta

Llama 4 Maverick

Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks. Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.

Apr 05, 2025 130,000 context 60,000 tokens output
Multimodal Input Long Context Window Multilingual Support Mixture-of-Experts Architecture Creative Writing Instruction Following

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Meta

Model ID

The routed model identifier exposed by upstream providers.

meta-llama/llama-4-maverick

Input Context Window

The number of tokens supported by the input context window.

130,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

60,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

Apr 05, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2024-08-31

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, Novita, Parasail, Google

Modalities

Types of data this model can process.

Text Image

What is Llama 4 Maverick

A fuller summary of positioning, capabilities, and source-specific details for Llama 4 Maverick.

Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks.

Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.

Capabilities

What Llama 4 Maverick supports

MM

Multimodal Input

Accepts both text and image inputs in a single prompt, enabling tasks like visual question answering and image-based reasoning.

CTX

Long Context Window

Supports up to 130,000 tokens of context, allowing processing of long documents, extended conversations, or large code files in a single request.

AI

Multilingual Support

Handles 12 languages natively, enabling chat and assistant tasks across a range of international languages without translation preprocessing.

AI

Mixture-of-Experts Architecture

Uses 128 experts with 17 billion active parameters per forward pass out of 400 billion total, enabling broad capability with selective parameter activation.

AI

Creative Writing

Generates structured and open-ended written content with attention to tone, with Meta noting response quality and tone as explicit design focuses.

AI

Instruction Following

Tuned as an instruct model with built-in refusal mechanisms, designed to follow user instructions accurately while maintaining safety guardrails.

Pricing for Llama 4 Maverick

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1
maxResponseSize 60,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra Novita Parasail Google

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 16,384 1d uptime: 99.8% Supported params: 13 Implicit caching: No

Novita

Max output: 8,192 1d uptime: 99.7% Supported params: 11 Implicit caching: No

Parasail

Max output: 32,768 1d uptime: 100.0% Supported params: 12 Implicit caching: No

Google

Max output: 8,192 1d uptime: 99.9% Supported params: 12 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
AIME 2024
American math olympiad problems
39.0%
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
67.1%
HLE
Questions that challenge frontier models across many domains
4.8%
LiveCodeBench
Real-world coding tasks from recent competitions
39.7%
MATH-500
Undergraduate and competition-level math problems
88.9%
MMLU-Pro
Expert knowledge across 14 academic disciplines
80.9%
SciCode
Scientific research coding and numerical methods
33.1%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Llama 4 Maverick

Llama 4 Maverick discussions are most active in r/LocalLLaMA, r/singularity, r/AIToolsPerformance.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 3387 upvotes and 351 comments.

Hey guys!

I just wrapped up a follow-up demo where I got 45+ tokens per second out of Meta’s massive 400 billion-parameter, 128-expert Llama 4 Maverick, and I wanted to share the full setup in case it helps anyone else pushing these models locally. Here’s what made it possible:
CPU: Intel Engineering Sample QYFS (similar to Xeon Platinum 8480+ with 56 cores / 112 threads) with AMX acceleration

GPU: Single NVIDIA RTX 4090 (no dual-GPU hack needed!)
RAM: 512 GB DDR5 ECC
OS: Ubuntu 22.04 LTS

Environment: K-Transformers support-llama4 branch

Below is the link to video :
https://youtu.be/YZqUfGQzOtk

If you're interested in the hardware build:
https://youtu.be/r7gVGIwkZDc

Open Reddit thread
View more discussions →
FAQ

Common questions about Llama 4 Maverick

What is the context window for Llama 4 Maverick?

Llama 4 Maverick supports a context window of 130,000 tokens, which allows it to process long documents, extended conversations, or large inputs in a single request.

How many parameters does Llama 4 Maverick have?

The model has 400 billion total parameters across 128 experts, but only 17 billion parameters are active during any single inference pass due to its mixture-of-experts architecture.

What languages does Llama 4 Maverick support?

Llama 4 Maverick supports 12 languages, making it suitable for multilingual assistant and chat applications.

What types of inputs does Llama 4 Maverick accept?

The model is multimodal and accepts both text and image inputs, enabling use cases such as visual question answering and image-based reasoning alongside standard text tasks.

When was Llama 4 Maverick trained?

According to the available metadata, Llama 4 Maverick has a training date of early 2025. A precise knowledge cutoff date has not been publicly specified in the available documentation.

More models from Meta

Continue browsing adjacent models from the same provider.

← All AI Models