OpenAI

GPT-5 mini

GPT-5 mini is a text generation model developed by OpenAI, designed as a faster and more cost-efficient variant of GPT-5. It supports a 400,000-token context window and has a training data cutoff of May 2024. The model is tagged as a latest release and supports tool use and MCP (Model Context Protocol) server integrations. GPT-5 mini is best suited for well-defined tasks where precise prompting is used and response speed or cost efficiency is a priority. It accepts structured inputs including tool calls and MCP server configurations, making it a practical choice for agentic workflows and automation pipelines. Developers working on tasks with clear, bounded requirements are the primary intended audience for this model.

Aug 07, 2025 400,000 context 128,000 tokens output

Large Context Window Tool Use MCP Server Support Text Generation Fast Inference

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Parameters ↓ Benchmarks ↓ Compare ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

OpenAI

Model ID

The routed model identifier exposed by upstream providers.

openai/gpt-5-mini

Input Context Window

The number of tokens supported by the input context window.

400,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

128,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Aug 07, 2025 11 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2024-05-31

API Providers

The providers that offer this model. This is not an exhaustive list.

OpenAI, Azure

Modalities

Types of data this model can process.

Text Image File

What is GPT-5 mini

A fuller summary of positioning, capabilities, and source-specific details for GPT-5 mini.

GPT-5 mini is a text generation model developed by OpenAI, designed as a faster and more cost-efficient variant of GPT-5. It supports a 400,000-token context window and has a training data cutoff of May 2024. The model is tagged as a latest release and supports tool use and MCP (Model Context Protocol) server integrations.

GPT-5 mini is best suited for well-defined tasks where precise prompting is used and response speed or cost efficiency is a priority. It accepts structured inputs including tool calls and MCP server configurations, making it a practical choice for agentic workflows and automation pipelines. Developers working on tasks with clear, bounded requirements are the primary intended audience for this model.

Capabilities

What GPT-5 mini supports

CTX

Large Context Window

Processes up to 400,000 tokens in a single context, enabling long documents, extended conversations, or large codebases to be handled in one request.

Tool Use

Supports function calling and tool integrations, allowing the model to invoke external tools or APIs as part of a response.

MCP

MCP Server Support

Accepts MCP (Model Context Protocol) server configurations as inputs, enabling standardized integration with external context and data sources.

Text Generation

Generates natural language text across a wide range of formats including summaries, instructions, and structured responses.

Fast Inference

Optimized for lower latency compared to full GPT-5, making it suitable for applications where response speed is a priority.

Pricing for GPT-5 mini

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.25 Per million tokens

Output tokens $2.00 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Web search $10000.00

Cache read $0.02

maxTemperature 1

maxResponseSize 128,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

OpenAI Azure

Provider Endpoints

Endpoint-level provider data currently available for this model.

OpenAI

Max prompt: 272,000 Max output: 128,000 1d uptime: 95.1% Supported params: 8 Implicit caching: Yes

Azure

Supported params: 8 Implicit caching: No

Azure

1d uptime: 96.4% Supported params: 8 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Select

Used to give the model guidance on how many reasoning tokens it should generate before creating a response to the prompt. Low will favor speed and economical token usage, and high will favor more complete reasoning at the cost of more tokens generated and slower responses. The default value is medium, which is a balance between speed and reasoning accuracy.

Default: medium

Low Medium High

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	82.8%
HLE Questions that challenge frontier models across many domains	19.7%
LiveCodeBench Real-world coding tasks from recent competitions	83.8%
MMLU-Pro Expert knowledge across 14 academic disciplines	83.7%
SciCode Scientific research coding and numerical methods	39.2%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

OpenAI GPT-5 Announcements

→

Documentation Documentation

→

OpenAI Platform Playground Playground

→

OpenAI API Reference Documentation

→

Official Website

→

Usage Policies

→

Enterprise privacy at OpenAI

→

OpenAI Status Page

→

OpenRouter Model Page OpenRouter

→