Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting. Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.

Text

Context: 256,000 Output: 16,000 tokens

Input: $0.50 Output: $1.50


Text

Mistral Medium 3

May 07, 2025

Mistral Medium 3 is a text generation model released on May 7, 2025 by Mistral, a French AI company. It is designed to balance performance with cost efficiency, priced at $0.40 per million input tokens and $2.00 per million output tokens. The model supports a 128,000-token context window and was trained on data through early 2025. It is available through Mistral La Plateforme and Amazon SageMaker, with additional platform support planned. Mistral Medium 3 is built with enterprise deployment in mind, supporting self-hosted setups with a minimum of four GPUs as well as any cloud environment. It can be customized through continuous pre-training, fine-tuning, and integration with enterprise knowledge bases, making it applicable to domain-specific workflows in sectors such as financial services, energy, and healthcare. The model is noted for its strengths in coding tasks and multimodal understanding, and is suited for use cases including customer service automation, business process personalization, and complex dataset analysis.

Text Image File

Context: 128,000 Output: 16,000 tokens

Input: $0.40 Output: $2.00
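As a rough illustration of how per-million-token pricing translates into a per-request cost, the sketch below applies the rates listed above ($0.40 input, $2.00 output per million tokens). The function name `estimate_cost` and the token counts are hypothetical, chosen only for illustration; actual billing may differ by platform.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the estimated cost in USD for one request.

    Prices are expressed in USD per one million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 10,000 prompt tokens and 1,000 generated tokens
# at the rates listed for Mistral Medium 3.
cost = estimate_cost(10_000, 1_000, input_price_per_m=0.40, output_price_per_m=2.00)
print(f"Estimated cost: ${cost:.4f}")
```

The same function can be reused for any card on this page that lists both an input and an output price.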


Text

Mistral NeMo

Jul 19, 2024

Mistral NeMo is a text generation model developed by Mistral, a French AI company. It features a 128,000-token context window and is trained with function calling support, making it suitable for agentic and tool-use workflows. The model has particular strength across eleven languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. Mistral NeMo is a 12-billion parameter model built in collaboration with NVIDIA, which is reflected in the "NeMo" name referencing NVIDIA's NeMo framework. It is designed for developers and organizations building multilingual applications where broad language coverage and a large context window are priorities. The model's combination of function calling capability, multilingual training, and long-context handling makes it a practical choice for global deployment scenarios.

Text Tools Structured Output

Context: 128,000 Output: 64,000 tokens

Input: $0.15 Output: $0.04


Text

Mixtral 8x22B Instruct Deprecated

Apr 17, 2024

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Text File Tools

Context: 65.5K Output: 64,000 tokens

Input: $2.00 Output: $6.00


Text

Mistral 7B Instruct

Oct 10, 2023

Mistral 7B Instruct is a 7-billion-parameter language model developed by Mistral AI and released in September 2023. It is the instruction-tuned variant of the base Mistral 7B model, fine-tuned to follow user instructions and produce clear, direct responses. The model uses grouped-query attention (GQA) and sliding window attention (SWA) techniques, which allow it to handle sequences efficiently within its 4,096-token context window. This model is well-suited for instruction-following tasks such as conversational AI, content summarization, and task-oriented dialogue. Because it is optimized to adhere closely to user-provided instructions, it performs consistently in structured workflows where predictable output format matters. It is available through Amazon Bedrock and is also openly accessible on Hugging Face, making it usable in a range of deployment environments.

Text

Context: 4,096 Output: 2,500 tokens

Input: $0.15 Output: N/A
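The sliding window attention mentioned above can be pictured as a mask in which each position attends only to itself and a fixed number of preceding positions. The following is a minimal sketch of that general idea, not Mistral's implementation; the function name and the toy sizes are assumptions made for readability.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and lies within the last `window` positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy sizes for readability; real models use far larger values.
mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row contains at most `window` allowed positions, so the work per token stays bounded as the sequence grows.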


Text

Mistral 7B Instruct Deprecated

Oct 10, 2023

Focused on instruction-based tasks, providing clear, concise responses adhering to user instructions.

Text

Context: N/A Output: 2,500 tokens

Input: N/A Output: N/A


Text

Ministral 3 14B

Release date unavailable

Ministral 3 14B is a 14-billion-parameter text generation model developed by Mistral AI and the largest model in the Ministral 3 family. It is released under an open license, making it available for local deployment and self-hosted use, and supports a 256,000-token context window for long-form inputs and extended conversations. The model is optimized to run on diverse hardware configurations, including consumer-grade local setups, which makes it suitable for developers and researchers who prefer on-device inference. Common use cases include text generation, summarization, instruction following, and tasks that benefit from a large context window without requiring cloud-based infrastructure.

Text

Context: 256,000 Output: 16,000 tokens

Input: $0.20 Output: N/A


Text

Ministral 3 3B

Release date unavailable

Ministral 3 3B is a 3-billion-parameter language model developed by Mistral AI as part of the Ministral 3 family. It is the smallest model in that family and is released as open-weight, meaning the model weights are publicly available for download and local use. The model supports a 256,000-token context window and includes both language and vision capabilities in a compact form factor. Ministral 3 3B is designed specifically for edge deployment, making it suitable for running on local hardware, embedded systems, and resource-constrained environments. Its small parameter count allows it to operate efficiently across a wide range of hardware configurations without requiring cloud infrastructure. It is well-suited for developers building on-device applications, offline workflows, or latency-sensitive pipelines where a smaller footprint is a requirement.

Text

Context: 256,000 Output: 16,000 tokens

Input: $0.10 Output: N/A


Text

Ministral 3 8B

Release date unavailable

Ministral 3 8B is a text generation model developed by Mistral AI, part of the Ministral 3 model family. It is open source and designed with edge deployment in mind, meaning it is optimized to run efficiently across a range of hardware configurations, including local setups without cloud infrastructure. The model supports a 256,000-token context window, enabling it to process and reason over long documents in a single pass. Ministral 3 8B is well-suited for developers and organizations that need a capable language model deployable on-device or in resource-constrained environments. Its 8-billion parameter size makes it practical for local inference while still handling a broad range of text generation tasks. The open-source availability means it can be downloaded, fine-tuned, and self-hosted without requiring API access.

Text

Context: 256,000 Output: 16,000 tokens

Input: $0.15 Output: N/A


Text

Mixtral 8x7B Deprecated

Release date unavailable

Mixtral 8x7B is a high-performance mixture-of-experts language model from Mistral AI, offering a 32K token context window with efficient, fast inference.

Text

Context: N/A Output: 8,192 tokens

Input: N/A Output: N/A


Text

Mistral Codestral

Release date unavailable

Mistral Codestral is an open-weight generative AI model built by Mistral and designed specifically for code generation tasks. It operates through a shared instruction and completion API endpoint, allowing developers to both write new code and interact with existing codebases. The model is trained on a dataset spanning more than 80 programming languages, including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran. Codestral is intended for developers building AI-assisted coding tools and applications, as it handles both code and English fluently. Its broad language coverage makes it applicable across a wide range of development environments and project types. Because it is open-weight, it can be deployed and integrated in ways that closed models typically do not permit.

Text

Context: 32,000 Output: 16,000 tokens

Input: $0.20 Output: N/A


Text

Mistral Large 24.02

Release date unavailable

Mistral Large 24.02 is a text generation model developed by Mistral, built around 123 billion parameters and designed to run on a single node for large-throughput inference. It features a 128,000-token context window, making it suited for long-document processing and extended conversational tasks. The model supports dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. Beyond natural language, Mistral Large 24.02 supports over 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash, making it applicable to code generation and analysis tasks. Its single-node inference design means it can deliver high throughput without requiring distributed infrastructure. This combination of broad language coverage, large context capacity, and coding support makes it well-suited for multilingual applications, long-context document workflows, and software development assistance.

Text

Context: 128,000 Output: 16,000 tokens

Input: $4.00 Output: N/A


Text

Mistral Large 24.07

Release date unavailable

Mistral Large 24.07 is a text generation model developed by Mistral, released in July 2024 as the second iteration of their Large series. It features 123 billion parameters and a 128,000-token context window, making it suitable for long-document processing and extended conversational tasks within a single inference node. The model supports dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. One of the model's defining characteristics is its design for single-node inference, meaning the full 123B parameter model can run at high throughput without requiring multi-node infrastructure. It also supports over 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash, making it applicable to software development workflows. On MindStudio, it is available through Amazon Bedrock under the identifier mistral-large-24.07-bedrock.

Text

Context: 128,000 Output: 16,000 tokens

Input: $2.00 Output: N/A


Text

Mistral Small 24.02

Release date unavailable

Mistral Small 24.02 is a text generation model developed by Mistral, designed to run on a single node while supporting a 128,000-token context window. It covers dozens of natural languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, as well as over 80 coding languages such as Python, Java, C, C++, JavaScript, and Bash. Its single-node design enables high-throughput inference without requiring multi-node infrastructure. This model is well-suited for long-context applications where fitting large documents or extended conversations into a single prompt is necessary. Its broad language coverage makes it applicable to multilingual workflows, while its coding language support makes it useful for code generation and analysis tasks. The single-node inference design is a practical consideration for teams managing deployment costs and infrastructure complexity.

Text

Context: 128,000 Output: 16,000 tokens

Input: $1.00 Output: N/A


Text

Mistral Small 3.1 (25.03)

Release date unavailable

Mistral Small 3.1 (25.03) is a text generation model developed by Mistral, released in March 2025. It features a 128,000-token context window, multimodal understanding, and support for dozens of spoken languages alongside more than 80 coding languages. The model is designed to run on a single node, making it practical for deployment without distributed infrastructure. This version introduces improved text performance and expanded context handling compared to earlier Mistral Small releases. At an inference speed of approximately 150 tokens per second, it is suited for tasks that require both throughput and long-context processing, such as document analysis, multilingual applications, and code generation. Its combination of broad language coverage and single-node efficiency makes it a practical choice for developers building production applications with constrained compute budgets.

Text

Context: 128,000 Output: 16,000 tokens

Input: $0.10 Output: N/A


Text

Mixtral 8x7B Instruct

Release date unavailable

Mixtral 8x7B Instruct is a sparse mixture-of-experts (SMoE) language model developed by Mistral AI and released under the Apache 2.0 license. It uses a routing mechanism that activates only a subset of its expert networks per token, allowing it to draw on a large total parameter count while keeping active computation lower than a dense model of equivalent size. The instruct variant has been fine-tuned to follow instructions and engage in conversational tasks. The model has a context window of 4,096 tokens and was trained on data through September 2023. Its open-weight, permissive license makes it suitable for commercial and research use cases where model access and reproducibility matter. It is well-suited for tasks such as text generation, summarization, question answering, and general instruction following.

Text

Context: 4,096 Output: 2,500 tokens

Input: $0.45 Output: N/A
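The routing mechanism described above can be sketched in a few lines. This is a toy illustration of top-k expert routing in general, not Mistral's code: the function names, the scalar "experts", and the router logits are all hypothetical, and the softmax is taken over the selected logits only.

```python
import math


def route_top_k(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Select the k highest-scoring experts and return normalized weights.

    The softmax is computed over the selected logits only, so the returned
    weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # for numerical stability
    exps = {i: math.exp(router_logits[i] - max_logit) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Combine the outputs of the selected experts, weighted by the router."""
    weights = route_top_k(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy setup: 8 "experts" that each scale the input differently.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]
weights = route_top_k(logits, k=2)
print(weights)  # only two experts receive a weight
print(moe_forward(1.0, experts, logits))
```

Only the selected experts are evaluated for a given token, which is why active computation stays well below what the total parameter count would suggest.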


Text

Mixtral 8x7B Instruct Deprecated

Release date unavailable

High-quality, efficient sparse model outperforming larger models in speed and benchmarks.

Grok 3 Mini Beta

Release date unavailable

Grok 3 Mini Beta is a compact text generation model developed by xAI. It is designed as a thinking model, meaning it reasons through problems step by step before producing a final answer, and it exposes that reasoning trace so users can follow the model's logic in full. The model supports adjustable reasoning effort, defaulting to a lower setting for speed but allowing a high-effort mode for more demanding problems. It has a 131,072-token context window and was trained with data up to April 2025. Grok 3 Mini is best suited for tasks that rely heavily on structured reasoning rather than broad world knowledge, including math problems, logic puzzles, coding challenges, and quantitative analysis. According to xAI's published benchmarks, it scores 95.8% on AIME 2024 and 80.4% on LiveCodeBench. It also supports function calling and web search, making it usable in agentic workflows. Epoch AI has noted that with high reasoning effort, Grok 3 Mini outperforms the larger Grok 3 model on math benchmarks.

Release date unavailable

Kimi K2.5

Jan 27, 2026

Kimi K2.5 is an open-source multimodal model developed by Moonshot AI and released in January 2026. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and approximately 32 billion active at inference time, trained on roughly 15 trillion mixed visual and text tokens. Unlike models that add vision as a secondary capability, Kimi K2.5 was trained natively on both image and text data, enabling integrated understanding of charts, documents, video, and code. The model supports two operating modes — Instant Mode for direct responses and Thinking Mode for step-by-step reasoning on complex problems — within a 256,000-token context window. It introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents, reducing execution time by 4.5x on parallelizable tasks. Kimi K2.5 is released under a modified MIT license, making it available for local deployment, fine-tuning, and commercial use, and is particularly suited for visual programming, document analysis, automated research, and multi-step agentic workflows.

Text Image Tools

Context: 262,144 Output: 16,384 tokens

Input: $0.45 Output: $1.90


Text

DeepSeek V3.2

Dec 01, 2025

DeepSeek-V3.2 is an open-weight large language model developed by DeepSeek and released on December 1, 2025. It uses a Mixture-of-Experts architecture combined with a novel sparse attention mechanism called DeepSeek Sparse Attention (DSA), which reduces computational complexity to near-linear scale (O(kL)) for long-context tasks. The model supports a 160,000-token context window and is available under the MIT License on Hugging Face. DeepSeek-V3.2 introduces three notable technical advances: a scalable reinforcement learning training framework, a large-scale agentic task synthesis pipeline covering over 1,800 environments and 85,000+ complex instructions, and native support for Thinking in Tool-Use — the ability to reason while invoking external tools in both thinking and non-thinking modes. It is best suited for complex multi-step reasoning, agentic workflows involving search and code execution, long-context document processing, and developers building AI applications that require integrated reasoning and tool use.

Text Tools Structured Output

Context: 160,000 Output: 8,000 tokens

Input: $0.26 Output: $0.38
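To make the O(kL) claim above concrete, the sketch below compares how many query-key pairs are scored by dense attention with how many are scored when each query is limited to at most k keys. It is back-of-the-envelope arithmetic only: the value of k is an assumption for illustration, the function names are hypothetical, and the count ignores the cost of choosing which keys to keep.

```python
def dense_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by full (non-causal) attention: L * L."""
    return seq_len * seq_len


def sparse_attention_pairs(seq_len: int, k: int) -> int:
    """Query-key pairs scored when each query attends to at most k keys."""
    return seq_len * min(k, seq_len)


L = 160_000  # context length listed for this model
k = 2_048    # assumed per-query selection budget, for illustration only
ratio = dense_attention_pairs(L) / sparse_attention_pairs(L, k)
print(f"Sparse attention scores about {ratio:.0f}x fewer pairs")
```

Because k is fixed, the sparse count grows linearly with L while the dense count grows quadratically, so the gap widens as the context gets longer.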


Text

DeepSeek V3.1

Aug 21, 2025

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks. What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Text Tools Structured Output

Context: 128,000 Output: 8,000 tokens

Input: $0.27 Output: $0.79
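The relationship between total and active parameters in the description above comes down to a simple ratio. The short sketch below computes it for the 671-billion-total, 37-billion-active figures quoted for this model; the helper name `active_fraction` is hypothetical.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Return the share of parameters used per token (both in billions)."""
    return active_params_b / total_params_b


# Figures quoted in the description: 37B active out of 671B total.
frac = active_fraction(37, 671)
print(f"{frac:.1%} of parameters are active per token")
```

The same helper applies to the other mixture-of-experts models on this page that report both totals.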


Text

DeepSeek-R1

Jan 22, 2025

DeepSeek-R1 is a text generation model developed by DeepSeek, a Chinese AI company. It is a reasoning-focused model that generates a Chain of Thought (CoT) before producing a final answer, a technique designed to improve accuracy on multi-step problems. The model was trained through late 2024 and supports a context window of 64,000 tokens. DeepSeek released the model weights publicly, making it available for local deployment and research use. DeepSeek-R1 is well suited for tasks that benefit from structured reasoning, such as mathematics, logic puzzles, coding challenges, and scientific problem-solving. Because the model externalizes its reasoning steps before answering, users can inspect the thought process that led to a given response. DeepSeek also released a series of distilled versions of R1 based on smaller base models, broadening its accessibility across different hardware configurations.

Text

Context: 64,000 Output: 8,000 tokens

Input: $0.55 Output: N/A


Text

DeepSeek-V3

Dec 26, 2024

DeepSeek-V3 is a large language model developed by DeepSeek, a Chinese AI company. It is a general-purpose text generation model designed to handle a wide range of tasks including coding, reasoning, summarization, and open-ended conversation. The model supports a 128,000-token context window and was trained on data through late 2024. It is identified by the model ID deepseek-chat and is available via API. DeepSeek-V3 uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, activating 37 billion per forward pass, which allows it to maintain efficiency at scale. The model was trained using an optimized pipeline that includes multi-token prediction and FP8 mixed-precision training. It is well-suited for tasks that require long-context understanding, instruction following, and multi-step reasoning across technical and general domains.

Text Tools Structured Output

Context: 128,000 Output: 8,000 tokens

Input: $0.27 Output: $0.89


Text

DeepSeek R1 Turbo

Release date unavailable

DeepSeek R1 Turbo is a text generation model developed by DeepSeek, designed as an accelerated variant of the R1 reasoning model family. It retains the chain-of-thought reasoning capabilities of the base R1 model while incorporating architectural and inference optimizations aimed at reducing latency. The model supports a 128,000-token context window and was trained on data through late 2024. It accepts text input and produces text output across a wide range of analytical and generative tasks. DeepSeek R1 Turbo is particularly well-suited for applications where multi-step reasoning is required but response time is a practical constraint. Common use cases include coding assistance, mathematical problem-solving, logical deduction, and structured analytical workflows. Developers building interactive tools or real-time applications that depend on reasoning-intensive outputs are the primary intended audience for this model.

Text

Context: 128,000 Output: 8,000 tokens

Input: $1.00 Output: N/A


Text

DeepSeek-V3 Deprecated

Release date unavailable

General-purpose LLM from Chinese AI company DeepSeek.

Release date unavailable

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text

Context: N/A Output: 28,000 tokens

Input: N/A Output: N/A


Qwen

2 models


Text

Qwen3.6-35B-A3B

Apr 27, 2026

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...

Text Image Video

Context: 262.1K Output: 262,144 tokens

Input: $0.20 Output: $1.00


Text

Qwen3 235B

Apr 28, 2025

Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use. This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.

Text Tools Structured Output

Context: 262,144 Output: 262,144 tokens

Input: $0.15 Output: $1.82


Z.ai

5 models


Text

GLM 5.1

Apr 07, 2026

Open source

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Text Tools Structured Output

Context: 202.8K Output: 16,384 tokens

Input: $1.40 Output: $3.08


Text

GLM 5

Feb 11, 2026

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance. GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Text Tools Structured Output

Context: 202.8K Output: 16,384 tokens

Input: $0.80 Output: $1.92


Text

GLM 4.7

Dec 22, 2025

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions. What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Text Tools Structured Output

Context: 131,072 Output: 16,384 tokens

Input: $0.40 Output: $1.75


Text

GLM 4.6V

Dec 08, 2025

GLM-4.6V is a large-scale multimodal foundation model developed by Z.ai, available in two variants: the full 106B parameter version designed for cloud and high-performance cluster deployments, and a lightweight 9B Flash version optimized for local and low-latency use. The model supports a 128K token context window, allowing it to process long documents, multi-page files, and complex mixed-media inputs natively without converting content to plain text first. It was trained with a data cutoff of December 2025. What distinguishes GLM-4.6V is its native integration of tool-use capabilities within a visual model — it can accept images, screenshots, and document pages directly as inputs to function calls, connecting visual perception to executable actions in agent workflows. The model also supports interleaved image-text generation, frontend replication from UI screenshots, and joint understanding of text, layout, charts, tables, and figures. It is best suited for enterprise and agent-based applications such as document analysis pipelines, multimodal AI assistants, UI automation, and content generation workflows.

Text Image Video

Context: 131,072 Output: 16,384 tokens

Input: $0.30 Output: $0.90


Text

GLM 4.6

Sep 30, 2025

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for both commercial and personal use. The model was released in September 2025 and represents Zhipu AI's flagship offering in the GLM series. GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself, not only after completing a chain of thought, which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Amazon Nova Pro

Dec 05, 2024

Amazon Nova Pro is a multimodal foundation model developed by Amazon and made available through Amazon Bedrock. It accepts text and vision inputs and is designed to handle a wide range of tasks where accuracy, response speed, and cost-efficiency all need to be balanced together. It is part of the Amazon Nova family, which also includes Nova Lite and Nova Micro, each targeting different points on the capability-cost spectrum. Nova Pro was released in December 2024 and supports a 300,000-token context window. Nova Pro is particularly suited for agentic workflows and UI actuation, meaning it can be used to build systems that take sequences of actions or interact with interfaces. It supports fine-tuning on Amazon Bedrock, allowing developers to customize the model for specific domains or cost targets. Within the Nova family, Pro occupies the highest capability tier among the understanding models, making it the appropriate choice when tasks require processing both text and images at scale.

Release date unavailable

Fast and capable 21B model outperforming larger models while delivering outsized value.

Text

Context: N/A Output: 128,000 tokens

Input: N/A Output: N/A


Cohere

2 models


Text

Command R

Aug 30, 2024

Command R is an instruction-following conversational model developed by Cohere, designed for enterprise language tasks with a focus on reliability and scalability. It is available through Amazon Bedrock and carries a knowledge cutoff of March 2024. The model is purpose-built for retrieval-augmented generation (RAG) and tool use, making it well-suited for workflows that require grounding responses in external data sources or integrating with external APIs and functions. One of Command R's defining characteristics is its 128,000-token context window, which allows it to process long documents, extended multi-turn conversations, and complex inputs in a single pass. It also supports multilingual tasks and is tagged for low-latency performance, making it a practical choice for organizations building scalable AI applications where response speed and contextual accuracy matter. It is best suited for enterprise use cases such as document analysis, agentic pipelines, and knowledge-grounded question answering.

Text Tools Structured Output

Context: 128,000 Output: 4,000 tokens

Input: $0.50 Output: $0.60


Text

Command R+

Aug 30, 2024

Command R+ is a large language model developed by Cohere, positioned as the company's flagship text generation model for enterprise use. It is available through Amazon Bedrock, allowing organizations to deploy it within AWS's managed cloud infrastructure. The model supports a 128,000-token context window and was trained on data up to January 2023. It is designed specifically for demanding enterprise workloads that require high accuracy and reliability. What distinguishes Command R+ is its purpose-built support for retrieval-augmented generation, enabling it to ground responses in external knowledge sources rather than relying solely on parametric memory. It also supports multi-step tool use and agentic workflows, allowing it to interact with APIs, databases, and other external systems. The model handles multiple languages, making it applicable for global deployments. It is best suited for production applications such as intelligent search, document summarization, customer support automation, and complex data analysis pipelines.

Text Tools Structured Output

Context: 128,000 Output: 4,000 tokens

Input: $3.00 Output: $10.00


Nvidia

2 models


Text

Nemotron 3 Super 120B

Mar 11, 2026

Open source

Nemotron 3 Super 120B is an open-weight large language model released by NVIDIA in March 2026. It uses a hybrid LatentMoE architecture that combines Mamba-2, Mixture-of-Experts, and Attention layers, activating only 12 billion of its 120 billion total parameters per token. This design allows the model to handle demanding tasks while using significantly less compute than a dense model of comparable parameter count. The model is built for agentic workflows, long-context reasoning, and high-throughput deployments. It supports a context window of up to 1 million tokens and achieves a RULER-100 retrieval score of 91.75 at that length. Nemotron 3 Super 120B also includes a configurable thinking mode for step-by-step reasoning, supports seven languages (English, French, German, Italian, Japanese, Spanish, and Chinese), and is available as an open-weight model suitable for both cloud API and self-hosted use.

Text Tools Structured Output

Context: 1M Output: 16,384 tokens

Input: $0.10 Output: $0.00


Text

Nemotron 3 Nano 30B

Dec 14, 2025

Nemotron 3 Nano 30B is an open-weight text generation model released by NVIDIA in December 2025 as part of the Nemotron 3 family. It uses a hybrid architecture combining 23 Mamba-2 layers, 23 Mixture-of-Experts (MoE) layers, and 6 Attention layers, with 30B total parameters but only 3.5B active per token. This design allows the model to handle complex tasks while using significantly less compute than a comparable dense model. It supports six languages: English, German, Spanish, French, Italian, and Japanese. The model supports a context window of up to 1 million tokens, making it well-suited for long-document processing, retrieval-augmented generation (RAG), and agentic workflows. On math benchmarks it scores 89.1% on AIME25 without tools and 99.2% with tools, and it achieves 68.3% on LiveCodeBench and 38.8% on SWE-Bench for coding tasks. Its combination of low active-parameter count and long-context capability makes it a practical choice for high-volume or cost-sensitive deployments, edge agents, and instruction-following applications where compute efficiency matters.

Text Tools Structured Output

Context: 262.1K Output: 16,384 tokens

Input: $0.05 Output: $0.20
