LLM Model Directory

Explore frontier AI models by provider, pricing, and context

Browse the synced model catalog by provider, release, pricing, and core capabilities.

130 models 14 providers in view Current filter: All providers Type: Text
O

OpenAI

30 models

Text

GPT 5.5

Apr 24, 2026

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...

Text Image File
Context: 1050K Output: 128,000 tokens
Input: $5.00 Output: $30.00
View model →
Text

GPT 5.4

Mar 05, 2026

GPT-5.4 is a text generation model developed by OpenAI, released in March 2026 as their flagship model for professional and enterprise use. It is available in three variants — standard, Thinking, and Pro — and features a context window of 1 million tokens, the largest OpenAI has offered. The model is designed not only to plan complex tasks but to complete them reliably, with built-in computer use capabilities for orchestrating multi-step agentic workflows. GPT-5.4 is best suited for enterprise teams running AI in production environments, including customer support automation, document drafting, data analysis, and developer workflows. It recorded an 83% score on GDPval for knowledge work tasks and ranked second out of 116 models on the Artificial Analysis Intelligence Index. The Pro variant adds multi-path reasoning evaluation for scenarios where analytical depth is prioritized over speed, such as scientific research and complex decision-making.

Text Image File
Context: 1050K Output: 128,000 tokens
Input: $2.50 Output: $15.00
View model →
Text

GPT 5.4 Pro

Mar 05, 2026

GPT-5.4 Pro is a text generation model developed by OpenAI, released in March 2026 as part of the GPT-5.4 family. It is one of three variants in that family — alongside the standard GPT-5.4 and GPT-5.4 Thinking — and is specifically optimized for deep analytical work through multi-path reasoning evaluation. The model supports a context window of 1 million tokens, the largest OpenAI has offered, enabling it to process extensive documents, codebases, and multi-step workflows within a single session. GPT-5.4 Pro is designed for professional and enterprise use cases where thoroughness takes priority over speed, including scientific research, complex decision-making, legal analysis, and financial modeling. The broader GPT-5.4 family includes built-in computer use capabilities for agentic workflows, produces 33% fewer factual errors than GPT-5.2, and ranked #2 out of 116 models on the Artificial Analysis Intelligence Index. It also recorded benchmark scores on OSWorld-Verified, WebArena Verified, and an 83% score on GDPval for knowledge work tasks.

Text Image File
Context: 1050K Output: 128,000 tokens
Input: $30.00 Output: $180.00
View model →
Text

GPT‑5.2 Pro

Dec 10, 2025

GPT-5.2 Pro is a text generation model developed by OpenAI, added to MindStudio in December 2025. It supports a 400,000-token context window and is trained on data through December 2025, making it OpenAI's most recent flagship release. The model is tagged for reasoning, tool use, and MCP (Model Context Protocol) support, reflecting its design for complex, multi-step tasks. GPT-5.2 Pro is built for professional knowledge work across a wide range of domains. According to OpenAI, it was evaluated on GDPval, a benchmark spanning 44 occupations, where it performed at or above the level of industry professionals on well-specified tasks. It is best suited for workflows that require deep reasoning, tool integration, and handling large documents or long-context inputs.

Text Image File
Context: 400,000 Output: 256,000 tokens
Input: $21.00 Output: $168.00
View model →
Text

GPT-5.1

Nov 13, 2025

GPT-5.1 is a text generation model developed by OpenAI, positioned as the flagship option for coding and agentic workflows. It supports a 400,000-token context window and features configurable reasoning effort, allowing users to toggle between reasoning and non-reasoning modes depending on the task at hand. Its training data extends through November 2025. The model is designed with tool use and agent orchestration in mind, accepting inputs that include tool definitions and MCP server configurations alongside standard text prompts. This makes it well-suited for multi-step tasks, automated pipelines, and code generation scenarios where structured decision-making and external integrations are required.

Text Image File
Context: 400,000 Output: 128,000 tokens
Input: $1.25 Output: $10.00
View model →
Text

GPT-5

Aug 07, 2025

GPT-5 is OpenAI's flagship text generation model, designed with a focus on coding, reasoning, and agentic tasks across a wide range of domains. It supports a 400,000-token context window and has a training data cutoff of September 2024. The model is tagged for reasoning, tool use, and MCP (Model Context Protocol) support, reflecting its orientation toward complex, multi-step workflows. GPT-5 is best suited for developers and teams building agentic applications, automated pipelines, and code-heavy workflows. It accepts tool definitions and MCP server configurations as inputs, making it well-suited for orchestration scenarios where the model needs to call external functions or services. It is available via the OpenAI API and accessible on MindStudio without requiring separate API key management.

Text Image File
Context: 400,000 Output: 128,000 tokens
Input: $1.25 Output: $10.00
View model →
Text

GPT-5 Chat

Aug 07, 2025

GPT-5 Chat is a text generation model developed by OpenAI and serves as the snapshot of GPT-5 currently deployed in ChatGPT. It has a 400,000-token context window and a training data cutoff of September 2024. The model supports tool use and MCP (Model Context Protocol) servers as input types, making it suitable for agentic workflows and integrations. GPT-5 Chat is designed for high-intelligence tasks that benefit from a large context window, such as long-document analysis, multi-step reasoning, and complex instruction following. Its support for tools and MCP servers means it can be connected to external services and data sources within automated pipelines. Developers accessing it via the API receive the same model version that powers the ChatGPT interface, keeping behavior consistent across both surfaces.

Text Image File
Context: 400,000 Output: 16,384 tokens
Input: $1.25 Output: $10.00
View model →
Text

GPT-5 mini

Aug 07, 2025

GPT-5 mini is a text generation model developed by OpenAI, designed as a faster and more cost-efficient variant of GPT-5. It supports a 400,000-token context window and has a training data cutoff of May 2024. The model is tagged as a latest release and supports tool use and MCP (Model Context Protocol) server integrations. GPT-5 mini is best suited for well-defined tasks where precise prompting is used and response speed or cost efficiency is a priority. It accepts structured inputs including tool calls and MCP server configurations, making it a practical choice for agentic workflows and automation pipelines. Developers working on tasks with clear, bounded requirements are the primary intended audience for this model.

Text Image File
Context: 400,000 Output: 128,000 tokens
Input: $0.25 Output: $2.00
View model →
Text

GPT-5 nano

Aug 07, 2025

GPT-5 Nano is a text generation model developed by OpenAI and released as part of the GPT-5 model family. It is designed to be the fastest and most cost-efficient variant in that family, making it accessible for high-volume or latency-sensitive applications. The model supports a 400,000-token context window and has a training data cutoff of May 2024. It accepts structured inputs including tool calls and MCP server configurations. GPT-5 Nano is particularly well-suited for summarization and classification tasks, where speed and throughput matter more than extended reasoning depth. Its large context window allows it to process long documents in a single pass, which is useful for document triage, content labeling, and similar workflows. Developers can integrate it with external tools and MCP servers, extending its utility beyond pure text generation into agentic and multi-step task scenarios.

Text Image File
Context: 400,000 Output: 128,000 tokens
Input: $0.05 Output: $0.40
View model →
Text

GPT OSS 120B

Aug 05, 2025

GPT OSS 120B is OpenAI's largest open-weight model, released in August 2025 under the Apache 2.0 license. It has approximately 116.8 billion total parameters and uses a Mixture-of-Experts (MoE) architecture that activates only around 5.1 billion parameters per token, enabling efficient inference on a single H100 GPU. The model is part of the GPT OSS family and is designed for commercial and private deployments without licensing restrictions. The model is built for coding, mathematical reasoning, scientific analysis, and agentic workflows. It supports a 128,000-token context window, adjustable reasoning levels (low, medium, and high), and native tool use including web browsing, Python code execution, and custom developer-defined functions. Architecturally, it uses 36 transformer layers with 128 experts per MoE layer (top 4 active per token), Grouped Query Attention, Rotary Position Embeddings, and an alternating local/dense attention pattern, and it is available for local inference via Hugging Face Transformers, llama.cpp, and vLLM.

Text Tools Structured Output
Context: 131.1K Output: 32,768 tokens
Input: $0.15 Output: $0.00
View model →
Text

GPT OSS 20B

Aug 05, 2025

GPT OSS 20B is an open-weight text generation model released by OpenAI in August 2025, representing the company's first open-weight release since GPT-2 in 2019. It uses a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, activating approximately 3.6 billion parameters per token across 4 of 32 experts in 24 layers. Combined with MXFP4 4-bit quantization, the model runs within 16GB of memory, making it suitable for consumer hardware and on-device deployment. It is licensed under Apache 2.0, allowing local hosting, firewall-protected deployment, and fine-tuning for custom use cases. GPT OSS 20B supports a 128,000-token context window and includes adjustable reasoning levels — low, medium, and high — with chain-of-thought traces. Its documented strengths include coding, mathematical reasoning, and scientific analysis, along with tool use and agentic workflow support. The model also produces structured outputs for predictable, schema-conforming responses. It is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, and is well-suited for developers and organizations that require a self-hosted, customizable AI model without relying on cloud infrastructure.

Text Tools Structured Output
Context: 128,000 Output: 32,768 tokens
Input: $0.10 Output: $0.00
View model →
Text

o3-pro

Jun 10, 2025

o3-pro is a text generation model developed by OpenAI, released on June 10, 2025. It is built around a reasoning-first architecture that performs iterative self-reflection before producing a response, simulating multiple solution paths and evaluating potential flaws rather than generating a single-pass output. The model accepts both text and image inputs and supports a 200,000-token context window. It also includes autonomous tool use, allowing it to independently invoke capabilities like Python execution, file analysis, and web retrieval. o3-pro is designed for tasks that require sustained, multi-step reasoning — including mathematics, software engineering, scientific research, and legal analysis. It supports structured outputs and function calling, making it suitable for integration into developer pipelines and agentic workflows. Access to the model via API requires identity verification (KYC) from OpenAI. It is best suited for developers, researchers, and enterprises that need reliable, deeply reasoned outputs on complex problems.

Text Image File
Context: 200,000 Output: 100,000 tokens
Input: $2.00 Output: $80.00
View model →
Text

o3

Apr 16, 2025

OpenAI o3 is the flagship model in OpenAI's o-series of reasoning models, released in April 2025. It is designed to spend more time thinking through problems before responding, using large-scale reinforcement learning to work through complex, multi-step tasks. The model supports a 200,000-token context window and can process both text and images as inputs. According to OpenAI, o3 makes 20% fewer major errors than its predecessor on difficult real-world tasks, with particular strength in programming, business consulting, and creative ideation. A notable feature of o3 is its ability to integrate images directly into its reasoning process — not just interpreting them, but actively using them as part of problem-solving, including handling blurry, reversed, or low-quality visuals. The model can also autonomously combine tools such as web search, Python-based data analysis, and image generation to address multi-faceted questions. It is best suited for users who need rigorous analytical reasoning across domains like biology, mathematics, engineering, and software development, particularly when tasks require combining visual and textual information.

Text Image File
Context: 200K Output: 100,000 tokens
Input: $2.00 Output: $8.00
View model →
Text

o4-mini

Apr 16, 2025

o4-mini is a compact text generation model developed by OpenAI and released in April 2025 alongside the larger o3 model. It uses a chain-of-thought reasoning approach, thinking through problems step by step before producing a response, which makes it well-suited for structured problem-solving in math, coding, science, and visual tasks. The model supports a 200,000-token context window, allowing it to process and analyze lengthy documents in a single session. What distinguishes o4-mini from earlier reasoning models is its native ability to incorporate images directly into its reasoning process — not just interpreting them, but actively using them as part of its chain of thought, including handling low-quality or rotated images. It is also trained for agentic tool use, meaning it can decide when to invoke tools like web search, Python execution, or file analysis to complete multi-step tasks. Its design prioritizes high throughput, making it a practical choice for developers and applications that require large volumes of reasoning-intensive requests.

Text Image File
Context: 200,000 Output: 100,000 tokens
Input: $1.10 Output: $4.40
View model →
Text

GPT-4.1

Apr 14, 2025

GPT-4.1 is a text generation model developed by OpenAI and released in April 2025. It is positioned as OpenAI's flagship model for handling complex, multi-domain tasks and is available to developers via the OpenAI API. The model supports a 200,000-token context window, enabling it to process and reason over long documents, codebases, and extended conversations in a single request. Its training data has a knowledge cutoff of May 31, 2024. GPT-4.1 is designed for problem solving across a wide range of domains, including coding, analysis, instruction following, and structured output generation. It is an API-only model, meaning it is accessible through the OpenAI platform rather than through ChatGPT's consumer interface. Developers building agents, pipelines, or applications that require handling large amounts of context or complex multi-step instructions are the primary intended audience for this model.

Text Image File
Context: 200,000 Output: 32,768 tokens
Input: $2.00 Output: $8.00
View model →
Text

GPT-4.1 Mini

Apr 14, 2025

GPT-4.1 Mini is a text generation model developed by OpenAI, released as part of the GPT-4.1 model family in April 2025. It is designed to occupy a middle ground between the full GPT-4.1 model and lighter-weight options, offering a context window of over one million tokens — specifically 1,047,576 tokens. The model has a training data cutoff of May 31, 2024, and is accessible via the OpenAI API. GPT-4.1 Mini is positioned for use cases where developers need a capable text generation model without the latency or cost profile of larger models. Its large context window makes it suitable for tasks involving long documents, extended conversations, or multi-step instructions. It fits well into applications that require a balance of response quality, throughput, and cost efficiency.

Text Image File
Context: 1,047,576 Output: 32,768 tokens
Input: $0.40 Output: $1.60
View model →
Text

GPT-4.1 Nano

Apr 14, 2025

GPT-4.1 Nano is a text generation model developed by OpenAI and released in April 2025. It is the smallest and most cost-efficient model in the GPT-4.1 family, designed for latency-sensitive and high-throughput applications. It supports a context window of over one million tokens (1,047,576 tokens), making it capable of processing very long documents or conversation histories in a single request. Its training data has a knowledge cutoff of May 31, 2024. GPT-4.1 Nano is best suited for tasks where speed and cost efficiency are priorities, such as classification, summarization, autocomplete, and lightweight instruction-following. Because it sits at the smaller end of the GPT-4.1 family, it trades some capability headroom for significantly lower latency and cost per token. Developers building applications that require frequent, rapid model calls — such as real-time assistants, tagging pipelines, or high-volume data processing — are the primary target audience for this model.

Text Image File
Context: 1,047,576 Output: 32,768 tokens
Input: $0.10 Output: $0.40
View model →
Text

o1-pro

Mar 19, 2025

o1-pro is a text generation model developed by OpenAI and released in December 2024. It is built on the same foundation as the o1 model family but allocates significantly more compute and longer reflection time per query, which allows it to work through multi-step problems more carefully before producing a response. It supports a 200,000-token context window and can generate up to 100,000 tokens in a single output, and it accepts both text and image inputs. The model is designed for tasks where accuracy on difficult problems takes priority over response speed. It performs well on advanced mathematics, scientific reasoning, and complex coding challenges, with benchmark scores including 94.8% on MATH, 92.4% on HumanEval, and 77.3% on GPQA. o1-pro was initially available exclusively through the ChatGPT Pro subscription plan before becoming accessible via the OpenAI API in March 2025.

Text Image File
Context: 200,000 Output: 100,000 tokens
Input: $150.00 Output: $600.00
View model →
Text

o3-mini

Jan 31, 2025

o3-mini is a text generation model developed by OpenAI and released in January 2025. It belongs to OpenAI's o-series, a family of models trained to reason through problems step by step before producing a response. The model is designed to balance reasoning quality with speed and cost efficiency, making it practical for high-volume deployments where deliberate thinking is needed without long wait times. o3-mini is particularly well-suited for tasks involving mathematical reasoning, programming challenges, and scientific questions. It operates with a 200,000-token context window, allowing it to process long documents, extended codebases, or multi-turn conversations in a single session. The model generates output at approximately 137 tokens per second and uses an internal reasoning process rather than responding immediately, which contributes to its accuracy on structured, logic-intensive tasks.

Text File Tools
Context: 200K Output: 100,000 tokens
Input: $1.10 Output: $4.40
View model →
Text

o1

Dec 17, 2024

OpenAI o1 is a large language model developed by OpenAI and trained using reinforcement learning to perform complex, multi-step reasoning. Unlike standard language models that respond immediately, o1 generates an internal chain of thought before producing its final answer, allowing it to work through difficult problems more systematically. It supports a 200,000-token context window, tool use, and Structured Outputs via the API. The model is designed for tasks in coding, mathematics, and science where careful reasoning is more important than broad general knowledge. It has demonstrated notable benchmark results, including ranking in the 89th percentile on Codeforces competitive programming questions, placing among the top 500 students in the US on the AIME math qualifier, and exceeding human PhD-level accuracy on the GPQA benchmark covering physics, biology, and chemistry. It is well-suited for developers and researchers who need a model that can handle technically demanding problems within a large context.

Text Image File
Context: 200,000 Output: 100,000 tokens
Input: $15.00 Output: $60.00
View model →
Text

GPT-4o Mini

Jul 18, 2024

GPT-4o Mini is a text generation model developed by OpenAI and released in July 2024. It is designed to deliver low-cost, low-latency responses across a wide range of tasks, making it suitable for applications that require fast throughput or high request volumes. The model supports a 128,000-token context window and is compatible with the same range of languages as GPT-4o. GPT-4o Mini is positioned for use cases such as real-time customer interactions, processing large volumes of context, and multimodal reasoning tasks. It performs on academic benchmarks across both textual intelligence and multimodal reasoning, outscoring GPT-3.5 Turbo and other small models in those evaluations. Its combination of speed and affordability makes it a practical choice for developers building cost-sensitive production applications.

Text Image File
Context: 128,000 Output: 16,383 tokens
Input: $0.15 Output: $0.60
View model →
Text

GPT-4o

May 13, 2024

GPT-4o is a multimodal language model developed by OpenAI, released in May 2024. The "o" stands for "omni," reflecting its ability to accept any combination of text, audio, and image as input and generate any combination of those same modalities as output. It has a 128,000-token context window and a training data cutoff of October 2023. One of GPT-4o's defining characteristics is its audio response latency, which can be as low as 232 milliseconds and averages around 320 milliseconds — comparable to human conversational response times. It is well-suited for applications requiring fast, multimodal interaction, such as voice assistants, image analysis pipelines, and multilingual text processing. OpenAI has noted it offers improved performance on non-English text compared to GPT-4 Turbo, while also being available at a lower API cost.

Text Image File
Context: 128,000 Output: 16,384 tokens
Input: $2.50 Output: $10.00
View model →
Text

GPT-4 Turbo

Apr 09, 2024

GPT-4 Turbo is a variant of OpenAI's GPT-4 model, released to provide faster response times while retaining the language understanding and generation capabilities of the base GPT-4. It supports a 128,000-token context window, allowing it to process and reason over long documents, extended conversations, or large blocks of text in a single request. The model has a training data cutoff of December 2023 and is available through OpenAI's API. GPT-4 Turbo is designed for use cases where both response quality and speed matter, such as interactive chatbots, real-time content generation, and applications that need to handle lengthy inputs. Its large context window makes it well-suited for tasks like document summarization, multi-turn dialogue, and code generation across large codebases. Developers building latency-sensitive applications often choose this variant over the base GPT-4 for its improved throughput.

Text Image Tools
Context: 128,000 Output: 4,096 tokens
Input: $10.00 Output: $30.00
View model →
Instruct

GPT-3.5 Instruct Deprecated

Sep 28, 2023

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.

Text Structured Output
Context: 4.1K Output: 2,000 tokens
Input: $1.50 Output: $2.00
View model →
Text

GPT-3.5 Deprecated

May 28, 2023

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Text Tools Structured Output
Context: 16.4K Output: 4,000 tokens
Input: $0.50 Output: $1.50
View model →
Text

GPT-4

May 28, 2023

GPT-4 is a large language model developed by OpenAI and released in March 2023 as part of the Generative Pre-trained Transformer series. It accepts text input and produces text output, with a context window of 8,192 tokens, and its training data has a knowledge cutoff of April 2023. GPT-4 was designed to improve on earlier GPT models in areas such as instruction following, contextual understanding, and factual accuracy across a wide range of topics. GPT-4 is well suited for tasks that require sustained coherence over longer passages, such as drafting documents, answering detailed questions, summarizing content, and writing or reviewing code. It is available through the OpenAI API and has been integrated into products including ChatGPT and Microsoft Copilot. Developers and organizations commonly use it for applications that involve natural language understanding, content generation, and conversational interfaces.

Text Tools Structured Output
Context: 8,192 Output: 5,000 tokens
Input: $30.00 Output: $60.00
View model →
Instruct

GPT-3 Deprecated

Release date unavailable

Enhanced language understanding and generation for detailed, context-relevant responses.

Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
Text

GPT-4.5 Deprecated

Release date unavailable

Increased capacity and nuance compared to predecessors, offering more accurate text generation.

Text
Context: N/A Output: 8,000 tokens
Input: N/A Output: N/A
View model →
Text

o1-mini Deprecated

Release date unavailable

Faster, cheaper version of o1 adept at coding, math, and science tasks without extensive general knowledge.

Text
Context: 128,000 Output: 65,536 tokens
Input: $1.10 Output: $4.40
View model →
Text

o1-preview Deprecated

Release date unavailable

Early preview model using broad general knowledge to reason about hard problems.

Text
Context: 128,000 Output: 32,768 tokens
Input: $15.00 Output: $60.00
View model →
G

Google

15 models

Text

Gemini 1.0 Pro Deprecated

Apr 27, 2026

This model always redirects to the latest model in the Google Gemini Pro family.

Text Image File
Context: 1.0M Output: 2,048 tokens
Input: $2.00 Output: $12.00
View model →
Text

Gemini 3.1 Pro

Feb 19, 2026

Gemini 3.1 Pro is a frontier reasoning model developed by Google, released in February 2026 as a major upgrade to the Gemini 3 series. It supports multimodal inputs — including text, images, video, audio, and code — within a single model, and offers a context window of 1,048,576 tokens, equivalent to roughly 1,500 A4 pages. The model scores 77.1% on the ARC-AGI-2 benchmark and introduces a medium thinking level designed to balance cost, speed, and reasoning depth. Gemini 3.1 Pro is built for developers, enterprises, and researchers working on demanding, multi-step workflows. It is particularly suited to agentic coding, structured planning, financial modeling, multimodal analysis, and workflow automation. The model is accessible through the Gemini API, Google AI Studio, Vertex AI, Gemini CLI, Android Studio, and the Gemini app for Pro and Ultra subscribers.

Text Image File
Context: 1,048,576 Output: 65,536 tokens
Input: $2.00 Output: $12.00
View model →
Text

Gemini 3 Flash

Dec 17, 2025

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth. The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.

Text Image File
Context: 1,048,576 Output: 65,535 tokens
Input: $0.50 Output: $3.00
View model →
Text

Gemini 3 Deprecated

Nov 18, 2025

Gemini 3 Pro is a multimodal text generation model developed by Google, released in November 2025. It supports a context window of 1,048,576 tokens and is designed to handle complex reasoning tasks, nuanced instruction following, and agentic workflows. The model is available to developers through Google AI Studio and Vertex AI, and is also integrated into Google Search and the Gemini app. Gemini 3 Pro is built for tasks that require understanding context and intent with minimal prompting, including multi-step problem solving, code generation, and multimodal input processing. It is positioned as Google's primary model for agentic development, including use within the Google Antigravity platform. The model accepts tool inputs alongside text and numeric parameters, making it suited for applications that require dynamic tool use and structured interactions.

Text
Context: 1,048,576 Output: 65,536 tokens
Input: $2.00 Output: $12.00
View model →
Text

Gemini 2.5 Flash Lite

Jul 22, 2025

Gemini 2.5 Flash Lite is Google's most cost-efficient model in the Gemini 2.5 family, designed for high-volume, latency-sensitive workloads. It supports a 1 million-token context window and includes optional reasoning capabilities that can be toggled on or off via controllable thinking budgets, allowing developers to balance speed and depth depending on the task. The model also supports Grounding with Google Search, Code Execution, and URL Context as built-in features. Gemini 2.5 Flash Lite is well-suited for production applications that require processing large numbers of requests efficiently, such as document classification, real-time translation, content moderation, and coding assistance. Its multimodal input support and broad benchmark coverage across coding, math, science, and reasoning tasks make it a practical choice for developers building scalable AI pipelines where cost and throughput are primary constraints.

Text Image File
Context: 1.0M Output: 65,535 tokens
Input: $0.10 Output: $0.40
View model →
Text

Gemini 2.5 Flash

Jun 17, 2025

Gemini 2.5 Flash is a text generation model developed by Google, designed to balance performance and cost efficiency. It is a thinking model, meaning it applies internal reasoning steps before producing a response, which supports more deliberate outputs across a range of tasks. The model supports a context window of 1,048,576 tokens, making it suitable for processing long documents, extended conversations, and large codebases in a single request. Gemini 2.5 Flash is well-suited for tasks that require both speed and reasoning, such as summarization, question answering, tool use, and multi-step instruction following. It supports tool integrations, allowing it to be used in agentic workflows where external functions or APIs need to be called. The model reached general availability with a training data cutoff of June 2025, and is accessible through Google's Vertex AI platform.

Text Image File
Context: 1,048,576 Output: 65,535 tokens
Input: $0.30 Output: $2.50
View model →
Text

Gemini 2.5 Pro

Jun 17, 2025

Gemini 2.5 Pro is a thinking model developed by Google DeepMind, designed to reason through complex problems rather than simply predict outputs. It is built to analyze information, draw logical conclusions, and incorporate contextual nuance across tasks in code, mathematics, and STEM. The model supports native multimodality, meaning it can process text, images, audio, video, and code repositories within a single context. The model features a 1,048,576-token context window, making it suited for tasks that require processing large documents, entire codebases, or extended conversations. It scored 63.8% on the SWE-Bench Verified coding evaluation and is available through the Gemini API and Google AI Studio. It is best suited for developers and researchers working on complex reasoning tasks, long-document analysis, and advanced code generation.

Text Image File
Context: 1,048,576 Output: 65,536 tokens
Input: $1.25 Output: $10.00
View model →
Text

Gemma 3.2

Release date unavailable

Gemma 3 27B is an open-weight multimodal language model developed by Google DeepMind as the flagship model in the Gemma 3 family. It accepts both image and text inputs and generates text outputs, supporting over 140 languages and a context window of 128,000 tokens — sixteen times larger than the previous Gemma 2 generation. The model is built on the same research foundation as Google's Gemini models and was released in March 2025. Gemma 3 27B is designed to run in resource-constrained environments, including on a single consumer GPU with 24GB of VRAM, as well as on laptops, desktops, and cloud infrastructure. It is well-suited for tasks such as visual question answering, document analysis, multilingual text generation, summarization, coding assistance, and logical reasoning. Its combination of multimodal input support, large context handling, and open-weight availability makes it a practical choice for developers building applications that require flexible deployment options.

Text
Context: 128,000 Output: 8,000 tokens
Input: $0.10 Output: N/A
View model →
Text

Gemini 2.0 Flash Lite

Feb 25, 2025

Gemini 2.0 Flash Lite is a multimodal text generation model developed by Google, released in early 2025 as part of the Gemini 2.0 model family. It is designed specifically for high-volume, cost-sensitive applications, offering a balance between response speed and output quality. The model supports a context window of over one million tokens (1,048,576), making it suitable for processing long documents or extended conversations in a single request. Gemini 2.0 Flash Lite is best suited for developers and organizations that need to run large numbers of inference requests without incurring high costs. Its architecture prioritizes throughput and efficiency, making it a practical choice for tasks like summarization, classification, translation, and content generation at scale. The model's training data has a cutoff of June 2024, and it is accessible through Google's Vertex AI platform.

Text Image File
Context: 1,048,576 Output: 8,192 tokens
Input: $0.08 Output: $0.30
View model →
Text

Gemini 2.0 Flash

Feb 05, 2025

Gemini 2.0 Flash is a text generation model developed by Google, released as part of the Gemini 2.0 model family. It features a context window of 1,048,576 tokens and is designed to handle a broad range of everyday tasks with real-time response latency. The model's training data has a cutoff of June 2024. Gemini 2.0 Flash is positioned as an upgrade for users of the 1.5 Flash model who want meaningfully improved output quality, and for users of the 1.5 Pro model who want comparable or slightly improved quality at lower latency and cost. It is well-suited for applications that require processing long documents, maintaining extended conversations, or running high-throughput workloads where response speed matters.

Text Image File
Context: 1,048,576 Output: 8,192 tokens
Input: $0.15 Output: $0.40
View model →
Text

Gemini 1.5 Flash Deprecated

Release date unavailable

Speedy, cost-effective multimodal model for high-volume applications without compromising quality.

Text
Context: N/A Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

Gemini 1.5 Pro Deprecated

Release date unavailable

Proficient at multimodal tasks and content creation from image, audio, and video inputs.

Text
Context: N/A Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

Gemini 2.0 Flash Thinking Deprecated

Release date unavailable

Combining speed and performance, 2.0 Flash Thinking Experimental excels in science and math, showing its thinking to solve complex problems.

Text
Context: 128K Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

Gemini 2.0 Pro Deprecated

Release date unavailable

An experimental update Gemini 2.0 for coding and complex prompts.

Text
Context: 128K Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

PaLM 2 Deprecated

Release date unavailable

Advanced language model with high efficiency and accuracy for complex language tasks and creative content generation.

Text
Context: N/A Output: 1,024 tokens
Input: N/A Output: N/A
View model →
M

Mistral

17 models

Text

Mistral Large 3

Dec 02, 2025

Open source

Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting. Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.

Text
Context: 256,000 Output: 16,000 tokens
Input: $0.50 Output: $1.50
View model →
Text

Mistral Medium 3

May 07, 2025

Mistral Medium 3 is a text generation model released on May 7, 2025 by Mistral, a French AI company. It is designed to balance performance with cost efficiency, priced at $0.40 per million input tokens and $2.00 per million output tokens. The model supports a 128,000-token context window and was trained on data through early 2025. It is available through Mistral La Plateforme and Amazon SageMaker, with additional platform support planned. Mistral Medium 3 is built with enterprise deployment in mind, supporting self-hosted setups with a minimum of four GPUs as well as any cloud environment. It can be customized through continuous pre-training, fine-tuning, and integration with enterprise knowledge bases, making it applicable to domain-specific workflows in sectors such as financial services, energy, and healthcare. The model is noted for its strengths in coding tasks and multimodal understanding, and is suited for use cases including customer service automation, business process personalization, and complex dataset analysis.

Text Image File
Context: 128,000 Output: 16,000 tokens
Input: $0.40 Output: $2.00
View model →
Text

Mistral Nemo

Jul 19, 2024

Mistral NeMo is a text generation model developed by Mistral, a French AI company. It features a 128,000-token context window and is trained with function calling support, making it suitable for agentic and tool-use workflows. The model has particular strength across eleven languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. Mistral NeMo is a 12-billion parameter model built in collaboration with NVIDIA, which is reflected in the "NeMo" name referencing NVIDIA's NeMo framework. It is designed for developers and organizations building multilingual applications where broad language coverage and a large context window are priorities. The model's combination of function calling capability, multilingual training, and long-context handling makes it a practical choice for global deployment scenarios.

Text Tools Structured Output
Context: 128,000 Output: 64,000 tokens
Input: $0.15 Output: $0.04
View model →
Text

Mixtral 8x22B Instruct Deprecated

Apr 17, 2024

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Text File Tools
Context: 65.5K Output: 64,000 tokens
Input: $2.00 Output: $6.00
View model →
Text

Mistral 7B Instruct

Oct 10, 2023

Mistral 7B Instruct is a 7-billion-parameter language model developed by Mistral AI and released in September 2023. It is the instruction-tuned variant of the base Mistral 7B model, fine-tuned to follow user instructions and produce clear, direct responses. The model uses grouped-query attention (GQA) and sliding window attention (SWA) techniques, which allow it to handle sequences efficiently within its 4,096-token context window. This model is well-suited for instruction-following tasks such as conversational AI, content summarization, and task-oriented dialogue. Because it is optimized to adhere closely to user-provided instructions, it performs consistently in structured workflows where predictable output format matters. It is available through Amazon Bedrock and is also openly accessible on Hugging Face, making it usable in a range of deployment environments.

Text
Context: 4,096 Output: 2,500 tokens
Input: $0.15 Output: N/A
View model →
Text

Mistral 7B Instruct Deprecated

Oct 10, 2023

Focused on instruction-based tasks, providing clear, concise responses adhering to user instructions.

Text
Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
Text

Ministral 3 14B

Release date unavailable

Ministral 3 14B is the largest model in the Ministral 3 family, developed by Mistral AI. It is an open-source text generation model with a 256,000-token context window, designed to handle long-form inputs and extended conversations. The model is released under an open license, making it available for local deployment and self-hosted use cases. The model is optimized for running on diverse hardware configurations, including consumer-grade local setups, which makes it suitable for developers and researchers who prefer on-device inference. Its 14 billion parameter count positions it as the largest variant in the Ministral 3 series. Common use cases include text generation, summarization, instruction following, and tasks that benefit from a large context window without requiring cloud-based infrastructure.

Text
Context: 256,000 Output: 16,000 tokens
Input: $0.20 Output: N/A
View model →
Text

Ministral 3 3B

Release date unavailable

Ministral 3 3B is a 3-billion-parameter language model developed by Mistral AI as part of the Ministral 3 family. It is the smallest model in that family and is released as open-weight, meaning the model weights are publicly available for download and local use. The model supports a 256,000-token context window and includes both language and vision capabilities in a compact form factor. Ministral 3 3B is designed specifically for edge deployment, making it suitable for running on local hardware, embedded systems, and resource-constrained environments. Its small parameter count allows it to operate efficiently across a wide range of hardware configurations without requiring cloud infrastructure. It is well-suited for developers building on-device applications, offline workflows, or latency-sensitive pipelines where a smaller footprint is a requirement.

Text
Context: 256,000 Output: 16,000 tokens
Input: $0.10 Output: N/A
View model →
Text

Ministral 3 8B

Release date unavailable

Ministral 3 8B is a text generation model developed by Mistral AI, part of the Ministral 3 model family. It is open source and designed with edge deployment in mind, meaning it is optimized to run efficiently across a range of hardware configurations, including local setups without cloud infrastructure. The model supports a 256,000-token context window, enabling it to process and reason over long documents in a single pass. Ministral 3 8B is well-suited for developers and organizations that need a capable language model deployable on-device or in resource-constrained environments. Its 8-billion parameter size makes it practical for local inference while still handling a broad range of text generation tasks. The open-source availability means it can be downloaded, fine-tuned, and self-hosted without requiring API access.

Text
Context: 256,000 Output: 16,000 tokens
Input: $0.15 Output: N/A
View model →
Text

Mistral 8x7b Deprecated

Release date unavailable

Mixtral 8x7B is a high-performance mixture-of-experts language model from Mistral AI, offering a 32K token context window with efficient, fast inference.

Text
Context: N/A Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

Mistral Codestral

Release date unavailable

Mistral Codestral is an open-weight generative AI model built by Mistral and designed specifically for code generation tasks. It operates through a shared instruction and completion API endpoint, allowing developers to both write new code and interact with existing codebases. The model is trained on a dataset spanning more than 80 programming languages, including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran. Codestral is intended for developers building AI-assisted coding tools and applications, as it handles both code and English fluently. Its broad language coverage makes it applicable across a wide range of development environments and project types. Because it is open-weight, it can be deployed and integrated in ways that closed models typically do not permit.

Text
Context: 32,000 Output: 16,000 tokens
Input: $0.20 Output: N/A
View model →
Text

Mistral Large 24.02

Release date unavailable

Mistral Large 24.02 is a text generation model developed by Mistral, built around 123 billion parameters and designed to run on a single node for large-throughput inference. It features a 128,000-token context window, making it suited for long-document processing and extended conversational tasks. The model supports dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. Beyond natural language, Mistral Large 24.02 supports over 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash, making it applicable to code generation and analysis tasks. Its single-node inference design means it can deliver high throughput without requiring distributed infrastructure. This combination of broad language coverage, large context capacity, and coding support makes it well-suited for multilingual applications, long-context document workflows, and software development assistance.

Text
Context: 128,000 Output: 16,000 tokens
Input: $4.00 Output: N/A
View model →
Text

Mistral Large 24.07

Release date unavailable

Mistral Large 24.07 is a text generation model developed by Mistral, released in July 2024 as the second iteration of their Large series. It features 123 billion parameters and a 128,000-token context window, making it suitable for long-document processing and extended conversational tasks within a single inference node. The model supports dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. One of the model's defining characteristics is its design for single-node inference, meaning the full 123B parameter model can run at high throughput without requiring multi-node infrastructure. It also supports over 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash, making it applicable to software development workflows. On MindStudio, it is available through Amazon Bedrock under the identifier mistral-large-24.07-bedrock.

Text
Context: 128,000 Output: 16,000 tokens
Input: $2.00 Output: N/A
View model →
Text

Mistral Small 24.02

Release date unavailable

Mistral Small 24.02 is a text generation model developed by Mistral, designed to run on a single node while supporting a 128,000-token context window. It covers dozens of natural languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, as well as over 80 coding languages such as Python, Java, C, C++, JavaScript, and Bash. The model has 123 billion parameters, which enables high-throughput inference without requiring multi-node infrastructure. This model is well-suited for long-context applications where fitting large documents or extended conversations into a single prompt is necessary. Its broad language coverage makes it applicable to multilingual workflows, while its coding language support makes it useful for code generation and analysis tasks. The single-node inference design is a practical consideration for teams managing deployment costs and infrastructure complexity.

Text
Context: 128,000 Output: 16,000 tokens
Input: $1.00 Output: N/A
View model →
Text

Mistral Small 3.1 (25.03)

Release date unavailable

Mistral Small 3.1 (25.03) is a text generation model developed by Mistral, released in March 2025. It features a 128,000-token context window, multimodal understanding, and support for dozens of spoken languages alongside more than 80 coding languages. The model is designed to run on a single node, making it practical for deployment without distributed infrastructure. This version introduces improved text performance and expanded context handling compared to earlier Mistral Small releases. At an inference speed of approximately 150 tokens per second, it is suited for tasks that require both throughput and long-context processing, such as document analysis, multilingual applications, and code generation. Its combination of broad language coverage and single-node efficiency makes it a practical choice for developers building production applications with constrained compute budgets.

Text
Context: 128,000 Output: 16,000 tokens
Input: $0.10 Output: N/A
View model →
Text

Mixtral 8x7B Instruct

Release date unavailable

Mixtral 8x7B Instruct is a sparse mixture-of-experts (SMoE) language model developed by Mistral AI and released under the Apache 2.0 license. It uses a routing mechanism that activates only a subset of its expert networks per token, allowing it to draw on a large total parameter count while keeping active computation lower than a dense model of equivalent size. The instruct variant has been fine-tuned to follow instructions and engage in conversational tasks. The model has a context window of 4,096 tokens and was trained on data through September 2023. Its open-weight, permissive license makes it suitable for commercial and research use cases where model access and reproducibility matter. It is well-suited for tasks such as text generation, summarization, question answering, and general instruction following.

Text
Context: 4,096 Output: 2,500 tokens
Input: $0.45 Output: N/A
View model →
Text

Mixtral 8x7B Instruct Deprecated

Release date unavailable

High-quality, efficient sparse model outperforming larger models in speed and benchmarks.

Text
Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
X

X.ai

12 models

Text

Grok 4.3

Apr 30, 2026

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...

Text Image Tools
Context: 1M Output: 2,000,000 tokens
Input: $1.25 Output: $2.50
View model →
Text

Grok 4.20

Mar 31, 2026

Grok 4.20 is a text generation model developed by xAI, the AI division of X. This variant is specifically configured with reasoning disabled, meaning it skips the extended chain-of-thought process to deliver faster, lower-latency responses while still operating on the full Grok 4.20 architecture. It supports a context window of up to 2 million tokens, allowing it to ingest very long documents, large codebases, or extended conversation histories in a single pass. The model was made available via API in March 2026 as part of the Grok 4.20 Beta family, which also includes reasoning-enabled and multi-agent-tuned variants. This model is designed for agentic and tool-centric workflows where response speed is a priority over deep step-by-step reasoning. It is well-suited for automated pipelines, coding agents, data-processing tasks, and any application where the model needs to call external tools rapidly and reliably. Its instruction-following behavior is tuned for consistency, making outputs predictable across repeated or templated prompts. Developers building low-latency AI systems or integrating LLM capabilities into production pipelines are the primary intended audience.

Text Image File
Context: 2M Output: 2,000,000 tokens
Input: $2.00 Output: $2.50
View model →
Text

Grok 4.20 Reasoning

Release date unavailable

Grok 4.20 Reasoning is an experimental, reasoning-focused text generation model developed by xAI, the AI division of X. It is part of the Grok 4.20 beta series and is specifically designed to work through problems using deliberate, multi-step thinking before producing a response. This approach improves accuracy on tasks where a direct answer is likely to fall short, such as mathematical problem-solving, logical analysis, and scientific reasoning. The model supports a context window of 2,000,000 tokens, allowing it to process and reason over very long documents or extended conversation histories in a single pass. It is accessible through the xAI inference provider via the Inworld Router or Realtime API, making it straightforward to integrate into developer applications. Use cases where it is particularly well-suited include research assistance, code debugging, nuanced question answering, and any workflow that benefits from structured, step-by-step analysis.

Text
Context: 2,000,000 Output: 2,000,000 tokens
Input: $2.00 Output: N/A
View model →
Text

Grok 4.1 Fast

Release date unavailable

Grok 4.1 Fast is a speed-optimized text generation model developed by xAI, the AI division of X. It is the non-reasoning variant of Grok 4.1 Fast, meaning it skips the extended chain-of-thought processing used in its reasoning counterpart and instead delivers near-instant, pattern-matched responses. This design makes it well-suited for applications where low latency matters more than deliberative step-by-step analysis. The model supports a 2 million token context window, multimodal input (text and images), tool use, structured outputs, and implicit caching. Grok 4.1 Fast is built for real-time and high-throughput workloads such as customer support automation, finance workflows, and agentic pipelines that require rapid sequential tool calls. Its large context window allows it to process extensive documents, long conversation histories, or complex multi-step task instructions in a single pass. The model shares weights with the full Grok 4.1 Fast but trades deliberative reasoning for response speed, making it a practical choice when throughput and latency are the primary constraints.

Text
Context: N/A Output: 2,000,000 tokens
Input: $0.20 Output: $2.50
View model →
Text

Grok 4.1 Fast Reasoning

Release date unavailable

Grok 4.1 Fast Reasoning is a text generation model developed by xAI, the AI division of X. It is designed specifically for agentic and tool-calling workflows, trained through reinforcement learning in simulated environments across dozens of tool-use domains. The model supports a 2-million-token context window, accepts both text and image inputs, and produces text outputs with chain-of-thought reasoning enabled. The model is best suited for developers building autonomous agents, enterprise automation pipelines, and multi-step research or customer support applications. It supports structured outputs, function calling, and a range of tool integrations including web search, X search, code execution, file retrieval, and MCP tool integrations via the Agent Tools API. Its training cutoff is November 2025, and it is available through the xAI API as well as third-party cloud providers such as Oracle Cloud.

Text
Context: N/A Output: 2,000,000 tokens
Input: $0.20 Output: $2.50
View model →
Text

Grok 4 Fast

Release date unavailable

Grok 4 Fast is a text generation model developed by xAI, the AI division of X. It is built on learnings from Grok 4 and is designed to deliver high-quality reasoning at lower computational cost, using approximately 40% fewer thinking tokens on average compared to its full counterpart. The model features a 2 million token context window and supports both reasoning and non-reasoning modes within a single unified architecture. Grok 4 Fast is trained end-to-end with tool-use reinforcement learning, enabling it to handle agentic tasks such as web browsing, code execution, and real-time information synthesis. It accepts both text and image inputs and produces text output. The model is well-suited for developers and enterprises that need multi-step reasoning, long-context document processing, and real-time web research without the computational overhead of a full frontier model.

Text
Context: N/A Output: 2,000,000 tokens
Input: $0.20 Output: $2.50
View model →
Text

Grok 4 Fast Reasoning

Release date unavailable

Grok 4 Fast Reasoning is a text generation model developed by xAI, released in September 2025 as a cost-efficient counterpart to their flagship Grok 4 model. It is built using large-scale reinforcement learning and uses approximately 40% fewer thinking tokens on average compared to Grok 4, while achieving comparable benchmark results. The model supports a 2 million token context window, making it suitable for processing large documents, multi-file codebases, and extended conversations. The model accepts both text and image inputs and outputs text, with a unified architecture that blends chain-of-thought reasoning with faster response modes depending on task complexity. It is trained end-to-end with tool-use reinforcement learning, enabling agentic web search, browsing X (Twitter), and real-time information synthesis. Grok 4 Fast Reasoning is well-suited for developers and users working on research, coding assistance, agentic workflows, and complex question answering where efficiency and speed are priorities.

Text
Context: N/A Output: 2,000,000 tokens
Input: $0.20 Output: $2.50
View model →
Text

Grok 4

Jul 09, 2025

Grok 4 is a text generation model developed by xAI, released on July 9, 2025, and trained using reinforcement learning on xAI's 200,000-GPU Colossus cluster. It features a 256,000-token context window and was built with a 6x improvement in compute efficiency over its predecessor, with verifiable training data expanded well beyond mathematics and coding. The model is designed for tasks requiring deep reasoning, including expert-level problems in science, mathematics, and software development. What distinguishes Grok 4 is its native tool use — it was trained to autonomously operate a code interpreter and web browser, selecting its own search queries to produce thorough answers. It also integrates real-time web search and X (Twitter) search, including keyword, semantic, and media search. A variant called Grok 4 Heavy runs multiple reasoning agents in parallel at inference time to handle the most demanding problems, and it was the first model to score above 50% on the Humanity's Last Exam benchmark. Grok 4 is available to SuperGrok and Premium+ subscribers on grok.com and through the xAI API.

Text
Context: 256,000 Output: 256,000 tokens
Input: $3.00 Output: $15.00
View model →
Text

Grok 3 Mini Fast

Release date unavailable

Grok 3 Mini Fast Beta is a compact text generation model developed by xAI, the AI division of X. It belongs to the Grok 3 model family and is designed to deliver faster response times compared to the full Grok 3 models, making it suitable for latency-sensitive applications. The model supports extended thinking, function calling, and real-time web search, and operates with a 131,072-token context window. Grok 3 Mini Fast Beta is well-suited for developers and businesses building high-throughput applications that require reasoning capability without the overhead of a larger model. Practical use cases include question answering, document summarization, data extraction, and tool-augmented agentic workflows. Its combination of speed, extended context, and tool integration makes it a practical option for production environments where response time is a priority.

Text
Context: 131,072 Output: 8,192 tokens
Input: $0.60 Output: N/A
View model →
Text

Grok 3

Release date unavailable

Grok 3 is the flagship large language model from xAI, developed and released in February 2025. It was built from the ground up in approximately one year and is designed to handle demanding tasks including advanced reasoning, coding, and creative writing. The model is available via API under the identifier grok-3-latest and supports a context window of 131,072 tokens. It includes a dedicated Thinking mode that enables multi-step reasoning on complex problems. Grok 3 is well-suited for tasks that require structured, multi-step problem solving, such as scientific research, advanced mathematics, and complex software development. It scored 96% on AIME, a challenging mathematics competition benchmark, and 85% on GPQA, a graduate-level science reasoning benchmark. The model also supports image understanding, function calling, and structured output generation, making it usable across a range of developer and research workflows. It ranked first in creative writing evaluations at the time of its release.

Text
Context: 131,072 Output: 8,192 tokens
Input: $3.00 Output: $2.50
View model →
Text

Grok 3 Fast

Release date unavailable

Grok 3 Fast is a performance-optimized variant of xAI's Grok 3 model, released in April 2025 as part of the Grok 3 family. It is designed to deliver faster response times compared to the standard Grok 3 Beta while retaining the same core language understanding, function calling, and web search capabilities. The model supports a 131,072-token context window, making it capable of handling long documents and extended multi-turn conversations. Grok 3 Fast is best suited for applications where response latency matters, such as real-time chat interfaces, high-throughput processing pipelines, and interactive AI assistants. Its support for function calling allows developers to integrate external tools and APIs, enabling agentic workflows that can act on live information. The model exposes an OpenAI-compatible API, which simplifies adoption for developers already working within that ecosystem.

Text
Context: 131,072 Output: 8,192 tokens
Input: $5.00 Output: N/A
View model →
Text

Grok 3 Mini

Release date unavailable

Grok 3 Mini Beta is a compact text generation model developed by xAI, the AI division of X. It is designed as a thinking model, meaning it reasons through problems step by step before producing a final answer, and it exposes that reasoning trace so users can follow the model's logic in full. The model supports adjustable reasoning effort, defaulting to a lower setting for speed but allowing a high-effort mode for more demanding problems. It has a 131,072-token context window and was trained with data up to April 2025. Grok 3 Mini is best suited for tasks that rely heavily on structured reasoning rather than broad world knowledge — including math problems, logic puzzles, coding challenges, and quantitative analysis. According to xAI's published benchmarks, it scores 95.8% on AIME 2024 and 80.4% on LiveCodeBench. It also supports function calling and web search, making it usable in agentic workflows. Epoch AI has noted that with high reasoning effort, Grok 3 Mini outperforms the larger Grok 3 model on math benchmarks.

Text
Context: 131,072 Output: 8,192 tokens
Input: $0.30 Output: N/A
View model →
A

Anthropic

12 models

Text

Claude 4.7 Opus

Apr 16, 2026

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...

Text Image File
Context: 1M Output: 128,000 tokens
Input: $5.00 Output: $25.00
View model →
Text

Claude 4.6 Sonnet

Feb 17, 2026

Claude Sonnet 4.6 is a text generation model developed by Anthropic, released in February 2026 as an upgrade to the Sonnet line of mid-tier models. It features a 1 million token context window in beta, allowing it to process entire codebases, lengthy legal documents, or large collections of research papers within a single request. The model is designed for coding, agentic workflows, computer use, and professional knowledge work at scale. Sonnet 4.6 is particularly suited for developers and enterprises running high-volume workloads that require consistent instruction following, accurate tool selection, and reliable error correction across long sessions. It includes improved computer use capabilities, enabling it to navigate browsers, fill multi-step web forms, and automate desktop workflows. Anthropic's safety evaluations found it to be as safe as or safer than other recent Claude models, with noted resistance to prompt injection attacks.

Text Image File
Context: 1M Output: 128,000 tokens
Input: $3.00 Output: $15.00
View model →
Text

Claude 4.6 Opus

Feb 04, 2026

Claude Opus 4.6 is Anthropic's most capable text generation model, released on February 5, 2026. It is designed for long-horizon agentic tasks, complex reasoning, and professional knowledge work across domains such as software development, finance, and legal analysis. A defining feature of this release is its 1 million token context window, available in beta, which allows the model to process and reason over very large volumes of information within a single session. It also introduces adaptive thinking, which automatically calibrates the depth of reasoning applied based on the complexity of the task at hand. Opus 4.6 is built to handle demanding, real-world workloads with minimal human oversight. It can orchestrate teams of subagents, parallelize work across tools, and sustain long-running tasks across the full software development lifecycle from architecture through deployment. The model supports tool use and MCP server integration, making it suitable for enterprise workflows and autonomous agent pipelines. It is best suited for senior engineers, analysts, and organizations that need to delegate complex, multi-step challenges to an AI system.

Text Image File
Context: 1M Output: 128,000 tokens
Input: $5.00 Output: $25.00
View model →
Text

Claude 4.5 Opus

Nov 24, 2025

Claude 4.5 Opus is Anthropic's top-tier large language model, released on November 24, 2025. It is designed for demanding tasks including software engineering, long-horizon autonomous workflows, and complex reasoning, with a 200,000-token context window that supports multi-file operations and extended document analysis. The model includes an "effort" parameter that gives developers control over reasoning depth, allowing optimization for either speed or accuracy depending on the task at hand. Claude 4.5 Opus is particularly suited for enterprises and developers working on large-scale software engineering, autonomous agent orchestration, financial modeling, legal analysis, and deep research workflows. It features enhanced computer use capabilities, including a zoom tool for detailed screen inspection, enabling UI-based automation. Early users reported that the model handles ambiguous, multi-system problems with minimal guidance, and some reported token usage reductions of up to 65% compared to earlier models when solving equivalent problems.

Text Image File
Context: 200K Output: 64,000 tokens
Input: $5.00 Output: $25.00
View model →
Text

Claude 4.5 Haiku

Oct 15, 2025

Claude 4.5 Haiku is a lightweight text generation model developed by Anthropic, released in October 2025. It is designed to deliver high throughput and low latency while maintaining strong performance on coding and reasoning tasks. The model supports a 200,000-token context window and can generate up to 64,000 tokens in a single response, making it capable of handling long documents and complex multi-turn conversations. It accepts text, images, and PDFs as input and is available through Anthropic's API, AWS Bedrock, and Google Cloud Vertex AI. Claude 4.5 Haiku is built for production applications where speed and cost efficiency are priorities, such as customer support systems, real-time coding assistants, document processing pipelines, and autonomous AI agents. It supports tool calling, reasoning, and multi-step workflow automation, enabling agentic use cases without requiring a heavier model. Its knowledge cutoff is February 2025. Developers looking to build high-volume applications will find it suited to scenarios where response time and per-token cost are key constraints.

Text Image File
Context: 200,000 Output: 64,000 tokens
Input: $1.00 Output: $5.00
View model →
Text

Claude 4.5 Sonnet

Sep 29, 2025

Claude 4.5 Sonnet is a text generation model developed by Anthropic, released in September 2025. It is designed for software development, autonomous agent workflows, and direct computer interaction, supporting a 200,000-token context window. The model is trained with a knowledge cutoff of September 2025 and is available through Anthropic's API as well as Amazon Bedrock. The model is built to handle extended, multi-step tasks — including executing commands, editing files, and running tests — with sustained coherence over long sessions. It scores 61.4% on OSWorld, a benchmark for real-world computer task completion, and ranks at the top of the SWE-bench Verified leaderboard for software engineering tasks. Claude 4.5 Sonnet integrates with tools like Claude Code, the Claude Agent SDK, and MCP servers, making it well-suited for building production AI agents and developer tooling.

Text Image File
Context: 200,000 Output: 64,000 tokens
Input: $3.00 Output: $15.00
View model →
Text

Claude 4.1 Opus

Aug 05, 2025

Claude Opus 4.1 is Anthropic's flagship text generation model, released on August 5, 2025 as an upgrade to Claude Opus 4. It is designed for demanding workflows that require sustained reasoning across long, multi-step tasks, with particular strength in software development, autonomous research, and agentic problem solving. The model supports a 200,000-token context window, up to 32,000 output tokens, and accepts both text and image inputs. It is multilingual, with documented support for French, Arabic, Mandarin, Japanese, Korean, Spanish, and Hindi. On the SWE-bench Verified benchmark for real-world software bug fixing, Claude Opus 4.1 scores 74.5%, and it delivers a one standard deviation improvement over Opus 4 on Windsurf's junior developer benchmark for autonomous coding tasks. It supports extended thinking with up to 64,000 reasoning tokens, enabling deeper deliberation on complex problems. The model is available through the Anthropic API, Claude Code, Amazon Bedrock, and Google Cloud Vertex AI, making it suited for developers, researchers, and enterprises running complex multi-file code refactoring, long-horizon agent workflows, and in-depth research synthesis.

Text Image File
Context: 32,000 Output: 32,000 tokens
Input: $15.00 Output: $75.00
View model →
Text

Claude 4 Opus

May 22, 2025

Claude Opus 4 is a text generation model released by Anthropic on May 22, 2025. It is a hybrid model that supports both near-instant responses and extended thinking, allowing it to alternate between multi-step reasoning and tool use — such as web search — within a single workflow. The model carries a 200,000-token context window and supports vision, function calling, prompt caching, and structured outputs. On release, it scored 72.5% on SWE-bench Verified, 79.6% on GPQA Diamond, and 75.5% on AIME 2025. Claude Opus 4 is designed for tasks that require sustained, complex reasoning across long contexts, including refactoring large codebases, synthesizing research across many documents, and coordinating multi-step agentic workflows. Anthropic has classified it under ASL-3 safety measures — the first Claude model to receive that designation — which applies restrictions related to potential misuse in sensitive domains. It is well-suited for developer and enterprise applications that involve autonomous task execution, long-horizon planning, or processing large volumes of text and image data in a single session.

Text Image File
Context: 200,000 Output: 32,000 tokens
Input: $15.00 Output: $75.00
View model →
Text

Claude 4 Sonnet

May 22, 2025

Claude Sonnet 4 (claude-sonnet-4-20250514) is a text generation model developed by Anthropic and released on May 22, 2025. It sits in the mid-tier of Anthropic's Claude 4 model family, designed to balance capability with computational efficiency for production use. The model supports a 200,000-token context window and accepts text, images, and PDFs as input. It includes an optional extended thinking mode that allows the model to perform step-by-step reasoning when tasks require greater depth. Claude Sonnet 4 is built for high-volume workloads where consistent performance and reliability matter. It scores 72.7% on SWE-bench, reflecting strong performance on software engineering tasks such as code generation, debugging, and codebase navigation. The model also supports agentic tool use, making it suitable for multi-step workflows and integration with external APIs. Common use cases include code review, customer support automation, data analysis, and long-document processing.

Text Image File
Context: 200,000 Output: 64,000 tokens
Input: $3.00 Output: $15.00
View model →
Text

Claude 3 Haiku

Mar 13, 2024

Claude 3 Haiku is a text generation model developed by Anthropic, positioned as the fastest and most affordable model in the Claude 3 family. It features a 200,000-token context window and vision capabilities, making it suitable for tasks that require processing large documents or analyzing images alongside text. The model's training data has a cutoff of August 2023. Haiku is designed for enterprise use cases where throughput and cost efficiency matter, such as customer support, real-time chat, and batch processing of large datasets. It is capable of processing approximately 21,000 tokens — roughly 30 pages — per second for prompts under 32,000 tokens, which makes it well-suited for latency-sensitive applications and workloads that involve running many smaller tasks in parallel.

Text Image Tools
Context: 200,000 Output: 4,096 tokens
Input: $0.25 Output: $1.25
View model →
Text

Claude 3 Sonnet

Release date unavailable

Claude 3 Sonnet is a large language model developed by Anthropic, released as part of the Claude 3 model family in early 2024. It is designed to occupy a middle position within that family, offering a balance between response quality and processing speed suited to high-volume, enterprise-scale deployments. The model supports a 200,000-token context window, enabling it to process and reason over long documents, codebases, and extended conversations in a single pass. Claude 3 Sonnet is particularly well-suited for organizations running large-scale AI workloads where throughput and cost efficiency are priorities alongside output quality. Its training data has a cutoff of August 2023, and it is available through Anthropic's API as well as cloud providers including Amazon Web Services via Bedrock. The model handles tasks such as summarization, question answering, content drafting, and code assistance across a wide range of professional contexts.

Text
Context: 200,000 Output: 4,096 tokens
Input: $3.00 Output: N/A
View model →
Text

Claude Instant Deprecated

Release date unavailable

Structured model profile with pricing, context, and capability details.

Text
Context: N/A Output: 4,096 tokens
Input: N/A Output: N/A
View model →
D

DeepSeek

10 models

Text

DeepSeek V4 Flash

Apr 24, 2026

Open source

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

Text Tools Structured Output
Context: 1.0M Output: 384,000 tokens
Input: $0.14 Output: $0.00
View model →
Text

DeepSeek V4 Pro

Apr 24, 2026

Open source

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...

Text Tools Structured Output
Context: 1.0M Output: 384,000 tokens
Input: $1.74 Output: $0.87
View model →
Text

Kimi K2.6

Apr 21, 2026

Open source

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...

Text Image Tools
Context: 262.1K Output: 16,384 tokens
Input: $0.75 Output: $4.00
View model →
Text

Kimi K2.5

Jan 27, 2026

Kimi K2.5 is an open-source multimodal model developed by Moonshot AI and released in January 2026. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and approximately 32 billion active at inference time, trained on roughly 15 trillion mixed visual and text tokens. Unlike models that add vision as a secondary capability, Kimi K2.5 was trained natively on both image and text data, enabling integrated understanding of charts, documents, video, and code. The model supports two operating modes — Instant Mode for direct responses and Thinking Mode for step-by-step reasoning on complex problems — within a 256,000-token context window. It introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents, reducing execution time by 4.5x on parallelizable tasks. Kimi K2.5 is released under a modified MIT license, making it available for local deployment, fine-tuning, and commercial use, and is particularly suited for visual programming, document analysis, automated research, and multi-step agentic workflows.

Text Image Tools
Context: 262,144 Output: 16,384 tokens
Input: $0.45 Output: $1.90
View model →
Text

DeepSeek V3.2

Dec 01, 2025

DeepSeek-V3.2 is an open-weight large language model developed by DeepSeek and released on December 1, 2025. It uses a Mixture-of-Experts architecture combined with a novel sparse attention mechanism called DeepSeek Sparse Attention (DSA), which reduces computational complexity to near-linear scale (O(kL)) for long-context tasks. The model supports a 160,000-token context window and is available under the MIT License on Hugging Face. DeepSeek-V3.2 introduces three notable technical advances: a scalable reinforcement learning training framework, a large-scale agentic task synthesis pipeline covering over 1,800 environments and 85,000+ complex instructions, and native support for Thinking in Tool-Use — the ability to reason while invoking external tools in both thinking and non-thinking modes. It is best suited for complex multi-step reasoning, agentic workflows involving search and code execution, long-context document processing, and developers building AI applications that require integrated reasoning and tool use.

Text Tools Structured Output
Context: 160,000 Output: 8,000 tokens
Input: $0.26 Output: $0.38
View model →
Text

DeepSeek V3.1

Aug 21, 2025

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks. What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Text Tools Structured Output
Context: 128,000 Output: 8,000 tokens
Input: $0.27 Output: $0.79
View model →
Text

DeepSeek-R1

Jan 22, 2025

DeepSeek-R1 is a text generation model developed by DeepSeek, a Chinese AI company. It is a reasoning-focused model that generates a Chain of Thought (CoT) before producing a final answer, a technique designed to improve accuracy on multi-step problems. The model was trained through late 2024 and supports a context window of 64,000 tokens. DeepSeek released the model weights publicly, making it available for local deployment and research use. DeepSeek-R1 is well suited for tasks that benefit from structured reasoning, such as mathematics, logic puzzles, coding challenges, and scientific problem-solving. Because the model externalizes its reasoning steps before answering, users can inspect the thought process that led to a given response. DeepSeek also released a series of distilled versions of R1 based on smaller base models, broadening its accessibility across different hardware configurations.

Text
Context: 64,000 Output: 8,000 tokens
Input: $0.55 Output: N/A
View model →
Text

DeepSeek-V3

Dec 26, 2024

DeepSeek-V3 is a large language model developed by DeepSeek, a Chinese AI company. It is a general-purpose text generation model designed to handle a wide range of tasks including coding, reasoning, summarization, and open-ended conversation. The model supports a 128,000-token context window and was trained on data through late 2024. It is identified by the model ID deepseek-chat and is available via API. DeepSeek-V3 uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, activating 37 billion per forward pass, which allows it to maintain efficiency at scale. The model was trained using an optimized pipeline that includes multi-token prediction and FP8 mixed-precision training. It is well-suited for tasks that require long-context understanding, instruction following, and multi-step reasoning across technical and general domains.

Text Tools Structured Output
Context: 128,000 Output: 8,000 tokens
Input: $0.27 Output: $0.89
View model →
Text

DeepSeek R1 Turbo

Release date unavailable

DeepSeek R1 Turbo is a text generation model developed by DeepSeek, designed as an accelerated variant of the R1 reasoning model family. It retains the chain-of-thought reasoning capabilities of the base R1 model while incorporating architectural and inference optimizations aimed at reducing latency. The model supports a 128,000-token context window and was trained on data through late 2024. It accepts text input and produces text output across a wide range of analytical and generative tasks. DeepSeek R1 Turbo is particularly well-suited for applications where multi-step reasoning is required but response time is a practical constraint. Common use cases include coding assistance, mathematical problem-solving, logical deduction, and structured analytical workflows. Developers building interactive tools or real-time applications that depend on reasoning-intensive outputs are the primary intended audience for this model.

Text
Context: 128,000 Output: 8,000 tokens
Input: $1.00 Output: N/A
View model →
Text

DeepSeek-V3 Deprecated

Release date unavailable

General-purpose LLM from Chinese AI company DeepSeek.

Text
Context: N/A Output: 8,000 tokens
Input: N/A Output: N/A
View model →
P

Perplexity

9 models

Text

Sonar Deep Research

Mar 07, 2025

Sonar Deep Research is a text generation model developed by Perplexity AI, released in February 2025. It is designed specifically for complex, multi-step research tasks that require gathering and synthesizing information from a large number of web sources. Rather than returning a single retrieved answer, it autonomously plans a research strategy, conducts dozens of iterative web searches, evaluates the results, and refines its approach before producing a detailed, citation-backed report. It operates with a 128,000-token context window, allowing it to handle substantial volumes of text and references within a single session. Sonar Deep Research is best suited for tasks where thoroughness and accuracy take priority over response speed, such as academic research, market analysis, competitive intelligence, and due diligence investigations. It includes a dedicated reasoning phase in which the model thinks through gathered material before generating its final output, which helps produce more nuanced and accurate responses. The model does not use customer queries or outputs for training purposes. It is well-suited for professionals, researchers, and developers working in domains like finance, technology, healthcare, and current events who need reliable, well-sourced reports.

Text Reasoning
Context: 128,000 Output: 8,000 tokens
Input: $2.00 Output: $8.00
View model →
Text

Sonar Pro

Mar 07, 2025

Sonar Pro is a search-augmented text generation model developed by Perplexity, designed to handle complex research queries that require thorough source attribution and multi-step reasoning. It operates with a 200,000-token context window, allowing it to process large volumes of information within a single session. The model supports both text and image inputs and can produce up to 8,192 output tokens per response. It also includes function calling, structured output generation, and a reasoning mode for analytical tasks. Sonar Pro is Perplexity's premium tier offering within the Sonar model family, delivering approximately twice the citations and search results compared to the standard Sonar model. This makes it particularly well-suited for enterprise applications, professional research workflows, and use cases that demand comprehensive source coverage and reliable multi-step query handling. The model's training data extends through March 2025, and its live web search integration means responses can draw on current information beyond that date. It is available via API for developers building research-intensive or knowledge-heavy applications.

Text Image
Context: 200K Output: 8,000 tokens
Input: $3.00 Output: $15.00
View model →
Text

Sonar Reasoning Pro

Mar 07, 2025

Sonar Reasoning Pro is a text generation model developed by Perplexity AI, built on top of DeepSeek R1 and augmented with Perplexity's proprietary real-time web search capabilities. It uses Chain-of-Thought reasoning to work through problems step by step before producing a final answer, making it distinct from models that rely solely on static training data. The model supports a 128,000-token context window and multiple languages, and was made available in February 2025. Sonar Reasoning Pro is designed for tasks where accuracy, source transparency, and up-to-date information are important. Because it actively queries the web during inference, it can surface current information and provide citations alongside its responses. It is best suited for in-depth research, complex multi-step analytical questions, and scenarios where users need a well-reasoned explanation grounded in verifiable, recent sources.

Text Image Reasoning
Context: 128,000 Output: 8,000 tokens
Input: $2.00 Output: $8.00
View model →
Text

Sonar

Jan 27, 2025

Sonar is Perplexity AI's in-house text generation model, built on Meta's Llama 3.3 70B and optimized for web-grounded question answering. Released in January 2025, it retrieves live internet data at query time rather than relying solely on static training knowledge, and every response includes inline source citations for transparency. It supports a 128,000-token context window and runs at approximately 121 tokens per second using Cerebras wafer-scale inference. Sonar is designed for developers and businesses that need to embed fast, factual, and source-backed search capabilities into their own applications. It offers three search depth modes — High, Medium, and Low — allowing teams to balance thoroughness against response speed depending on their use case. On the SimpleQA benchmark, Sonar achieved an F-score of 0.773, reflecting its focus on factual accuracy. It is particularly well-suited for high-volume applications such as sales research tools, medical information platforms, and real-time in-meeting search features.

Text Image
Context: 128,000 Output: 32,768 tokens
Input: $1.00 Output: $1.00
View model →
Text

Sonar Large Chat Deprecated

Release date unavailable

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text
Context: N/A Output: 32,768 tokens
Input: N/A Output: N/A
View model →
Text

Sonar Large Online Deprecated

Release date unavailable

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text
Context: N/A Output: 28,000 tokens
Input: N/A Output: N/A
View model →
Text

Sonar Reasoning Deprecated

Release date unavailable

Lightweight reasoning offering powered by reasoning models trained with DeepSeek R1.

Text
Context: N/A Output: 32,768 tokens
Input: N/A Output: N/A
View model →
Text

Sonar Small Chat Deprecated

Release date unavailable

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text
Context: N/A Output: 32,768 tokens
Input: N/A Output: N/A
View model →
Text

Sonar Small Online Deprecated

Release date unavailable

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text
Context: N/A Output: 28,000 tokens
Input: N/A Output: N/A
View model →
M

Meta

8 models

Text

Llama 4 Maverick

Apr 05, 2025

Open source

Llama 4 Maverick is a multimodal mixture-of-experts model developed by Meta, released in early 2025. It has 17 billion active parameters drawn from a pool of 400 billion total parameters across 128 experts, and supports both text and image inputs. The model handles 12 languages and offers a 130,000-token context window, making it suited for long-document and multilingual tasks. Maverick is designed for general assistant and chat use cases, with particular strengths in image understanding and creative writing. It uses a sparse MoE architecture, meaning only a subset of parameters are activated per inference pass, which allows the model to deliver broad capability at a more efficient compute cost. Developers building applications that require cross-language support, visual reasoning, or extended context handling are the primary target audience for this model.

Text Image Structured Output
Context: 130,000 Output: 60,000 tokens
Input: $0.20 Output: $0.60
View model →
Text

Llama 4 Scout

Apr 05, 2025

Llama 4 Scout is a multimodal AI model developed by Meta, released in early 2025 as part of the Llama 4 model family. It uses a Mixture of Experts (MoE) architecture with 17 billion active parameters, 16 experts, and 109 billion total parameters, processing both text and image inputs through a unified model backbone. The model supports a 130,000-token context window and is available under Meta's Llama 4 Community License. Llama 4 Scout is designed for developers and enterprises building applications that require multimodal understanding across text and vision. Its MoE design activates only a subset of parameters per token, making inference more compute-efficient relative to dense models of comparable total parameter count. It is well-suited for tasks such as document analysis, image-grounded question answering, and long-context text generation.

Text Image Tools
Context: 130,000 Output: 60,000 tokens
Input: $0.10 Output: $0.30
View model →
Text

Llama-2 13B Chat Deprecated

Jul 18, 2023

Balanced model for detailed language processing, offering advanced understanding and generation.

Text
Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
Text

Llama-2 70B Chat Deprecated

Jul 18, 2023

Provides depth and complexity in language understanding for sophisticated content creation.

Text
Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
Text

Llama 4 Scout

Apr 11, 2022

Llama 4 Scout is a multimodal AI model developed by Meta, released in early 2025 as part of the Llama 4 model family. It uses a Mixture of Experts (MoE) architecture with 17 billion active parameters, 16 experts, and 109 billion total parameters, meaning only a subset of parameters is activated per token during inference. The model processes both text and image inputs within a unified backbone and supports a 130,000-token context window. Llama 4 Scout is designed for developers and enterprises building applications that require combined text and vision understanding. Its MoE design makes it more compute-efficient during training and inference compared to dense models of similar total parameter counts. On MindStudio, it is served via Groq, which provides low-latency inference for the instruct-tuned variant.

Text
Context: 130,000 Output: 8,192 tokens
Input: $0.11 Output: N/A
View model →
Text

Code Llama Deprecated

Release date unavailable

Tailored for code comprehension, generation, and debugging with an instructive design.

Text
Context: N/A Output: 2,500 tokens
Input: N/A Output: N/A
View model →
Text

Llama 3 70B Deprecated

Release date unavailable

Structured model profile with pricing, context, and capability details.

Text
Context: N/A Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Text

Llama 3 8B Deprecated

Release date unavailable

Structured model profile with pricing, context, and capability details.

Text
Context: N/A Output: 8,192 tokens
Input: N/A Output: N/A
View model →
Q

Qwen

2 models

Z

Z.ai

5 models

Text

GLM 5.1

Apr 07, 2026

Open source

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Text Tools Structured Output
Context: 202.8K Output: 16,384 tokens
Input: $1.40 Output: $3.08
View model →
Text

GLM 5

Feb 11, 2026

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance. GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Text Tools Structured Output
Context: 202.8K Output: 16,384 tokens
Input: $0.80 Output: $1.92
View model →
Text

GLM 4.7

Dec 22, 2025

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions. What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Text Tools Structured Output
Context: 131,072 Output: 16,384 tokens
Input: $0.40 Output: $1.75
View model →
Text

GLM 4.6V

Dec 08, 2025

GLM-4.6V is a large-scale multimodal foundation model developed by Z.ai, available in two variants: the full 106B parameter version designed for cloud and high-performance cluster deployments, and a lightweight 9B Flash version optimized for local and low-latency use. The model supports a 128K token context window, allowing it to process long documents, multi-page files, and complex mixed-media inputs natively without converting content to plain text first. It was trained with a data cutoff of December 2025. What distinguishes GLM-4.6V is its native integration of tool-use capabilities within a visual model — it can accept images, screenshots, and document pages directly as inputs to function calls, connecting visual perception to executable actions in agent workflows. The model also supports interleaved image-text generation, frontend replication from UI screenshots, and joint understanding of text, layout, charts, tables, and figures. It is best suited for enterprise and agent-based applications such as document analysis pipelines, multimodal AI assistants, UI automation, and content generation workflows.

Text Image Video
Context: 131,072 Output: 16,384 tokens
Input: $0.30 Output: $0.90
View model →
Text

GLM 4.6

Sep 30, 2025

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series. GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Text Tools Structured Output
Context: 200,000 Output: 16,384 tokens
Input: $0.43 Output: $1.74
View model →
A

Amazon

3 models

Text

Amazon Nova Lite

Dec 05, 2024

Amazon Nova Lite is a multimodal foundation model developed by Amazon and made available through Amazon Bedrock. It accepts image, video, and text inputs and is designed to process them at low latency and low cost. The model was released in December 2024 as part of the Amazon Nova family, which includes three understanding models — Nova Micro, Nova Lite, and Nova Pro — and two creative content generation models. Nova Lite occupies the middle tier of the Nova understanding lineup, sitting between the text-only Nova Micro and the more capable Nova Pro. It supports a 300,000-token context window, making it suitable for tasks that involve long documents or extended conversations. The model also supports fine-tuning on Amazon Bedrock, allowing developers to adapt it for specific use cases. It is well-suited for applications that require multimodal input processing at scale where cost efficiency and speed are priorities.

Text Image Tools
Context: 300,000 Output: 5,000 tokens
Input: $0.06 Output: $0.24
View model →
Text

Amazon Nova Micro

Dec 05, 2024

Amazon Nova Micro is a text-only foundation model developed by Amazon and made available through Amazon Bedrock. It is part of the Amazon Nova family, which includes understanding models (Nova Pro, Nova Lite, and Nova Micro) as well as creative content generation models. Nova Micro is specifically designed to deliver the lowest latency responses within the Nova lineup at very low cost, making it a practical choice for applications where speed and cost efficiency are priorities. Because Nova Micro handles text input and output exclusively, it is well suited for tasks such as summarization, classification, question answering, and other text-based workflows where multimodal capabilities are not required. The model supports a 128,000-token context window, allowing it to process long documents or extended conversations in a single request. It can also be fine-tuned on Amazon Bedrock, enabling developers to adapt it to specific domains or use cases.

Text Tools
Context: 128,000 Output: 5,000 tokens
Input: $0.04 Output: $0.14
View model →
Text

Amazon Nova Pro

Dec 05, 2024

Amazon Nova Pro is a multimodal foundation model developed by Amazon and made available through Amazon Bedrock. It accepts text and vision inputs and is designed to handle a wide range of tasks where accuracy, response speed, and cost-efficiency all need to be balanced together. It is part of the Amazon Nova family, which also includes Nova Lite and Nova Micro, each targeting different points on the capability-cost spectrum. Nova Pro was released in December 2024 and supports a 300,000-token context window. Nova Pro is particularly suited for agentic workflows and UI actuation, meaning it can be used to build systems that take sequences of actions or interact with interfaces. It supports fine-tuning on Amazon Bedrock, allowing developers to customize the model for specific domains or cost targets. Within the Nova family, Pro occupies the highest capability tier among the understanding models, making it the appropriate choice when tasks require processing both text and images at scale.

Text Image Tools
Context: 300,000 Output: 5,000 tokens
Input: $0.80 Output: $3.20
View model →
R

Reka

3 models

C

Cohere

2 models

Text

Command R

Aug 30, 2024

Command R is an instruction-following conversational model developed by Cohere, designed for enterprise language tasks with a focus on reliability and scalability. It is available through Amazon Bedrock and carries a knowledge cutoff of March 2024. The model is purpose-built for retrieval-augmented generation (RAG) and tool use, making it well-suited for workflows that require grounding responses in external data sources or integrating with external APIs and functions. One of Command R's defining characteristics is its 128,000-token context window, which allows it to process long documents, extended multi-turn conversations, and complex inputs in a single pass. It also supports multilingual tasks and is tagged for low-latency performance, making it a practical choice for organizations building scalable AI applications where response speed and contextual accuracy matter. It is best suited for enterprise use cases such as document analysis, agentic pipelines, and knowledge-grounded question answering.

Text Tools Structured Output
Context: 128,000 Output: 4,000 tokens
Input: $0.50 Output: $0.60
View model →
Text

Command R+

Aug 30, 2024

Command R+ is a large language model developed by Cohere, positioned as the company's flagship text generation model for enterprise use. It is available through Amazon Bedrock, allowing organizations to deploy it within AWS's managed cloud infrastructure. The model supports a 128,000-token context window and was trained on data up to January 2023. It is designed specifically for demanding enterprise workloads that require high accuracy and reliability. What distinguishes Command R+ is its purpose-built support for retrieval-augmented generation, enabling it to ground responses in external knowledge sources rather than relying solely on parametric memory. It also supports multi-step tool use and agentic workflows, allowing it to interact with APIs, databases, and other external systems. The model handles multiple languages, making it applicable for global deployments. It is best suited for production applications such as intelligent search, document summarization, customer support automation, and complex data analysis pipelines.

Text Tools Structured Output
Context: 128,000 Output: 4,000 tokens
Input: $3.00 Output: $10.00
View model →
N

Nvidia

2 models

Text

Nemotron 3 Super 120B

Mar 11, 2026

Open source

Nemotron 3 Super 120B is an open-weight large language model released by NVIDIA in March 2026. It uses a hybrid LatentMoE architecture that combines Mamba-2, Mixture-of-Experts, and Attention layers, activating only 12 billion of its 120 billion total parameters per token. This design allows the model to handle demanding tasks while using significantly less compute than a dense model of comparable parameter count. The model is built for agentic workflows, long-context reasoning, and high-throughput deployments. It supports a context window of up to 1 million tokens and achieves a RULER-100 retrieval score of 91.75 at that length. Nemotron 3 Super 120B also includes a configurable thinking mode for step-by-step reasoning, supports seven languages including English, French, German, Italian, Japanese, Spanish, and Chinese, and is available as an open-weight model suitable for both cloud API and self-hosted use.

Text Tools Structured Output
Context: 1M Output: 16,384 tokens
Input: $0.10 Output: $0.00
View model →
Text

Nemotron 3 Nano 30B

Dec 14, 2025

Nemotron 3 Nano 30B is an open-weight text generation model released by NVIDIA in December 2025 as part of the Nemotron 3 family. It uses a hybrid architecture combining 23 Mamba-2 layers, 23 Mixture-of-Experts (MoE) layers, and 6 Attention layers, with 30B total parameters but only 3.5B active per token. This design allows the model to handle complex tasks while using significantly less compute than a comparable dense model. It supports six languages: English, German, Spanish, French, Italian, and Japanese. The model supports a context window of up to 1 million tokens, making it well-suited for long-document processing, retrieval-augmented generation (RAG), and agentic workflows. On math benchmarks it scores 89.1% on AIME25 without tools and 99.2% with tools, and it achieves 68.3% on LiveCodeBench and 38.8% on SWE-Bench for coding tasks. Its combination of low active-parameter count and long-context capability makes it a practical choice for high-volume or cost-sensitive deployments, edge agents, and instruction-following applications where compute efficiency matters.

Text Tools Structured Output
Context: 262.1K Output: 16,384 tokens
Input: $0.05 Output: $0.20
View model →