Multimodal Input
Accepts both text and image inputs in a single prompt, enabling tasks like visual question answering and image-based reasoning.
Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks. Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Llama 4 Maverick.
Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks.
Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.
Accepts both text and image inputs in a single prompt, enabling tasks like visual question answering and image-based reasoning.
Supports up to 130,000 tokens of context, allowing processing of long documents, extended conversations, or large code files in a single request.
Handles 12 languages natively, enabling chat and assistant tasks across a range of international languages without translation preprocessing.
Uses 128 experts with 17 billion active parameters per forward pass out of 400 billion total, enabling broad capability with selective parameter activation.
Generates structured and open-ended written content with attention to tone, with Meta noting response quality and tone as explicit design focuses.
Tuned as an instruct model with built-in refusal mechanisms, designed to follow user instructions accurately while maintaining safety guardrails.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2024
American math olympiad problems
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MATH-500
Undergraduate and competition-level math problems
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Llama 4 Maverick discussions are most active in r/LocalLLaMA, r/singularity, r/AIToolsPerformance.
Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 3387 upvotes and 351 comments.
Hey guys!
I just wrapped up a follow-up demo where I got 45+ tokens per second out of Meta’s massive 400 billion-parameter, 128-expert Llama 4 Maverick, and I wanted to share the full setup in case it helps anyone else pushing these models locally. Here’s what made it possible:
CPU: Intel Engineering Sample QYFS (similar to Xeon Platinum 8480+ with 56 cores / 112 threads) with AMX acceleration
GPU: Single NVIDIA RTX 4090 (no dual-GPU hack needed!)
RAM: 512 GB DDR5 ECC
OS: Ubuntu 22.04 LTS
Environment: K-Transformers support-llama4 branch
Below is the link to video :
https://youtu.be/YZqUfGQzOtk
If you're interested in the hardware build:
https://youtu.be/r7gVGIwkZDc
Llama 4 Maverick supports a context window of 130,000 tokens, which allows it to process long documents, extended conversations, or large inputs in a single request.
The model has 400 billion total parameters across 128 experts, but only 17 billion parameters are active during any single inference pass due to its mixture-of-experts architecture.
Llama 4 Maverick supports 12 languages, making it suitable for multilingual assistant and chat applications.
The model is multimodal and accepts both text and image inputs, enabling use cases such as visual question answering and image-based reasoning alongside standard text tasks.
According to the available metadata, Llama 4 Maverick has a training date of early 2025. A precise knowledge cutoff date has not been publicly specified in the available documentation.
Continue browsing adjacent models from the same provider.