Llama 4 Maverick

Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks. Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.

Apr 05, 2025 130,000 context 60,000 tokens output

Multimodal Input Long Context Window Multilingual Support Mixture-of-Experts Architecture Creative Writing Instruction Following

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Benchmarks ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Meta

Model ID

The routed model identifier exposed by upstream providers.

meta-llama/llama-4-maverick

Input Context Window

The number of tokens supported by the input context window.

130,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

60,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

Apr 05, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2024-08-31

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, DigitalOcean, Novita, Parasail, Google

Modalities

Types of data this model can process.

Text Image

What is Llama 4 Maverick

A fuller summary of positioning, capabilities, and source-specific details for Llama 4 Maverick.

Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks.

Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.

Capabilities

What Llama 4 Maverick supports

Multimodal Input

Accepts both text and image inputs in a single prompt, enabling tasks like visual question answering and image-based reasoning.

CTX

Long Context Window

Supports up to 130,000 tokens of context, allowing processing of long documents, extended conversations, or large code files in a single request.

Multilingual Support

Handles 12 languages natively, enabling chat and assistant tasks across a range of international languages without translation preprocessing.

Mixture-of-Experts Architecture

Uses 128 experts with 17 billion active parameters per forward pass out of 400 billion total, enabling broad capability with selective parameter activation.

Creative Writing

Generates structured and open-ended written content with attention to tone, with Meta noting response quality and tone as explicit design focuses.

Instruction Following

Tuned as an instruct model with built-in refusal mechanisms, designed to follow user instructions accurately while maintaining safety guardrails.

Pricing for Llama 4 Maverick

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.20 Per million tokens

Output tokens $0.60 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1

maxResponseSize 60,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra DigitalOcean Novita Parasail Google

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 16,384 1d uptime: 99.8% Supported params: 13 Implicit caching: No

DigitalOcean

1d uptime: 99.6% Supported params: 9 Implicit caching: No

Novita

Max output: 8,192 1d uptime: 99.8% Supported params: 11 Implicit caching: No

Parasail

Max output: 32,768 1d uptime: 99.9% Supported params: 16 Implicit caching: No

Google

Max output: 8,192 1d uptime: 99.9% Supported params: 12 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	39.0%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	67.1%
HLE Questions that challenge frontier models across many domains	4.8%
LiveCodeBench Real-world coding tasks from recent competitions	39.7%
MATH-500 Undergraduate and competition-level math problems	88.9%
MMLU-Pro Expert knowledge across 14 academic disciplines	80.9%
SciCode Scientific research coding and numerical methods	33.1%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Product Announcement Announcements

→

Documentation Documentation

→

Meta Llama 4 Blog Post Announcements

→

Llama 4 Maverick on Hugging Face Open Source

→

Meta Llama GitHub Open Source

→

Official Website

→

Technical Specifications

→

Research Paper

→

Responsible Use Guide

→

Usage License

→

OpenRouter Model Page OpenRouter

→

AI tools related to Llama 4 Maverick

These tools are strongly connected to Llama 4 Maverick through direct product references, provider mentions, or explicit model mappings.

Large Language Models (LLMs)

O.Translator

O.Translator is an AI-powered online translation platform designed to translate documents while maintaining their original formatting. It supports a wide range of file types, including PDF, DOCX, XLSX, PPTX, and EPUB. The service provides high-accuracy AI translations, easy editing tools, free previews, cost-effective pricing, data privacy, and team-based translation features.

Free 0 visits 14 saves

AI Assistant

Viinyx AI

Viinyx AI is an all-in-one browser extension that provides access to multiple AI models, including ChatGPT, Claude, Meta AI, and Gemini, directly on any website. Key features include page and video summarization, multi-PDF chat, chat history, AI writing assistance, and image generation. The extension operates within your browser session and supports Bring Your Own Key (BYOK) functionality for upgraded accounts.

Free 0 visits

AI Marketing

Hashmeta AI

Hashmeta AI is a Singapore-based AI agency focused on AI transformation and marketing. By integrating marketing expertise with AI agents, they assist businesses in achieving significant growth. Their service offerings include AI-powered SEO writing, lead response, and customer engagement, designed to provide high-level agency results at a more accessible price point. The team plans, builds, and executes tailored AI-driven marketing campaigns to ensure quality, speed, and scalability.

Free 27 visits 9 saves

AI Image Generator

Imagine with Meta AI

Imagine with Meta AI is a standalone tool that enables creative hobbyists to generate images using Emu, Meta's image foundation model. Users provide text descriptions, and the AI generates corresponding images. Please note that these AI-generated images may occasionally be inaccurate or inappropriate.

Free 0 visits 3 saves

Related Daily Briefs

Recent daily stories tied to Llama 4 Maverick through direct model mentions or provider-level coverage.

Frontier Models

Hugging Face Open-Weight Push Lands as US Rules Loom and Kimi K3 Trails Cyber Tests

Hugging Face and Cognition move deeper into real workflows.

2026-07-24 AI Models AI API

Frontier Models

OpenAI launches Across ChatGPT; OpenAI launches GPT-5; OpenAI agent update lands

Anthropic and OpenAI move deeper into real workflows.

2026-07-09 AI Models AI API

Frontier Models

OpenAI, Meta, and MiniMax Signal a Broader Shift Around Meta Model API

Pika and OpenAI move deeper into real workflows.

2026-07-09 AI Models AI API

Frontier Models

OpenAI, Meta, and Google DeepMind Signal a Broader Shift Around Launches ChatGPT

OpenAI and Meta move deeper into real workflows.

2026-07-09 AI Models AI API

Community discussion

What people think about Llama 4 Maverick

Llama 4 Maverick discussions are most active in r/LocalLLaMA, r/singularity, r/AIToolsPerformance.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 3387 upvotes and 351 comments.

r/LocalLLaMA 445 upvotes 71 comments April 28, 2025

Qwen 3 MoE making Llama 4 Maverick obsolete... 😱

Open Reddit thread

r/LocalLLaMA 231 upvotes 111 comments April 6, 2025

Llama 4 Maverick surpassing Claude 3.7 Sonnet, under DeepSeek V3.1 according to Artificial Analysis

Open Reddit thread

r/LocalLLaMA 232 upvotes 105 comments April 23, 2025

Llama 4 Maverick Locally at 45 tk/s on a Single RTX 4090 - I finally got it working!

Hey guys!

I just wrapped up a follow-up demo where I got 45+ tokens per second out of Meta’s massive 400 billion-parameter, 128-expert Llama 4 Maverick, and I wanted to share the full setup in case it helps anyone else pushing these models locally. Here’s what made it possible:
CPU: Intel Engineering Sample QYFS (similar to Xeon Platinum 8480+ with 56 cores / 112 threads) with AMX acceleration

GPU: Single NVIDIA RTX 4090 (no dual-GPU hack needed!)
RAM: 512 GB DDR5 ECC
OS: Ubuntu 22.04 LTS

Environment: K-Transformers support-llama4 branch

Below is the link to video :
https://youtu.be/YZqUfGQzOtk

If you're interested in the hardware build:
https://youtu.be/r7gVGIwkZDc

Open Reddit thread

r/LocalLLaMA 311 upvotes 89 comments April 6, 2025

Llama 4 Maverick scored 16% on the aider polyglot coding benchmark.

Open Reddit thread

r/LocalLLaMA 361 upvotes 79 comments April 6, 2025

First results are in. Llama 4 Maverick 17B active / 400B total is blazing fast with MLX on an M3 Ultra - 4-bit model generating 1100 tokens at 50 tok/sec:

Open Reddit thread

View more discussions →

FAQ

Common questions about Llama 4 Maverick

What is the context window for Llama 4 Maverick?

Llama 4 Maverick supports a context window of 130,000 tokens, which allows it to process long documents, extended conversations, or large inputs in a single request.

How many parameters does Llama 4 Maverick have?

The model has 400 billion total parameters across 128 experts, but only 17 billion parameters are active during any single inference pass due to its mixture-of-experts architecture.

What languages does Llama 4 Maverick support?

Llama 4 Maverick supports 12 languages, making it suitable for multilingual assistant and chat applications.

What types of inputs does Llama 4 Maverick accept?

The model is multimodal and accepts both text and image inputs, enabling use cases such as visual question answering and image-based reasoning alongside standard text tasks.

When was Llama 4 Maverick trained?

According to the available metadata, Llama 4 Maverick has a training date of early 2025. A precise knowledge cutoff date has not been publicly specified in the available documentation.

More models from Meta

Continue browsing adjacent models from the same provider.

← All AI Models