Large Context Window
Processes up to 400,000 tokens in a single context, enabling long documents, extended conversations, or large codebases to be handled in one request.
GPT-5 mini is a text generation model developed by OpenAI, designed as a faster and more cost-efficient variant of GPT-5. It supports a 400,000-token context window and has a training data cutoff of May 2024. The model is tagged as a latest release and supports tool use and MCP (Model Context Protocol) server integrations. GPT-5 mini is best suited for well-defined tasks where precise prompting is used and response speed or cost efficiency is a priority. It accepts structured inputs including tool calls and MCP server configurations, making it a practical choice for agentic workflows and automation pipelines. Developers working on tasks with clear, bounded requirements are the primary intended audience for this model.
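As a rough illustration of what the 400,000-token window means in practice, the sketch below checks whether a set of documents is likely to fit in one request. The 4-characters-per-token ratio is only a heuristic assumption and not the model's tokenizer. The helper names `estimate_tokens` and `fits_in_context` are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 400_000  # context window stated on this page
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate using ceiling division on characters."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 0) -> bool:
    """Check whether the documents plus an output reserve fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

For real budgeting, a tokenizer-based count would be more reliable than a character ratio, since token density varies by language and content type.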
Supports function calling and tool integrations, allowing the model to invoke external tools or APIs as part of a response.
Accepts MCP (Model Context Protocol) server configurations as inputs, enabling standardized integration with external context and data sources.
Generates natural language text across a wide range of formats including summaries, instructions, and structured responses.
Optimized for lower latency compared to full GPT-5, making it suitable for applications where response speed is a priority.
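The function-calling capability listed above can be pictured as a two-part loop: the application describes a tool to the model, and when the model asks for that tool, the application runs it and returns the result. The sketch below covers only the local side of that loop. The tool `get_weather`, its stubbed return value, and the shape of the `WEATHER_TOOL` dictionary are illustrative assumptions, not a confirmed request format for any provider.

```python
import json

# Hypothetical tool description; field names follow a common JSON Schema style
# and are an assumption, not a confirmed request format.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would call an external API."""
    return {"city": city, "temperature_c": 21}


# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments and return JSON output."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    result = TOOL_REGISTRY[name](**arguments)
    return json.dumps(result)
```

Keeping tools in a registry keyed by name means an unknown tool name fails loudly instead of being silently ignored, which is usually the safer default in automation pipelines.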
The configurable options currently documented for this model.
Reasoning effort: guides how many reasoning tokens the model generates before it responds to the prompt. Low favors speed and economical token usage; high favors more complete reasoning at the cost of more generated tokens and slower responses. The default is medium, which balances speed and reasoning accuracy.
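A minimal sketch of how an application might validate this setting before sending a request. The three levels and the medium default come from the description above. The payload key names (`model`, `input`, `reasoning`), the identifier `gpt-5-mini`, and the helper `build_request` are assumptions for illustration, so check your provider's documentation for the exact request format.

```python
ALLOWED_EFFORTS = ("low", "medium", "high")  # levels named in the text above
DEFAULT_EFFORT = "medium"  # default stated in the text above


def build_request(prompt: str, effort: str = DEFAULT_EFFORT) -> dict:
    """Build a request payload dict; key names are illustrative assumptions."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {
        "model": "gpt-5-mini",  # assumed identifier; check your provider's catalog
        "input": prompt,
        "reasoning": {"effort": effort},
    }
```

Calling `build_request("Classify this ticket", effort="low")` would favor speed and lower token usage, while omitting `effort` falls back to medium.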
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description |
|---|---|
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) |
| HLE | Questions that challenge frontier models across many domains |
| LiveCodeBench | Real-world coding tasks from recent competitions |
| MMLU-Pro | Expert knowledge across 14 academic disciplines |
| SciCode | Scientific research coding and numerical methods |
GPT-5 mini discussions are most active in r/GithubCopilot, r/ChatGPT, and r/OpenAI. Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflow discussions.
The strongest match in this snapshot has 340 upvotes and 29 comments.
I’m on ChatGPT Plus. Since May 8/9, ChatGPT web appears to route me to GPT-5 mini even when I manually select “Latest • 5.5 → Thinking”.
The UI still shows Thinking selected and does not show any usage-limit warning, but the assistant replies:
“I’m currently running on GPT-5 mini, so I cannot use GPT-5.5 Thinking.”
What makes it stranger:
- Web keeps giving me GPT-5 mini.
- Android mobile currently responds as GPT-5.5 Thinking.
- Yesterday mobile worked briefly too, then later started responding as mini.
- I tried Edge, other browsers, incognito, Windows app, Android app, sign out/in, “Log out all”, waiting, and brand-new chats.
- It has persisted across overnight gaps, so a simple 3-hour cap fallback doesn’t seem to explain it.
- As far as I understand, GPT-5.5 Instant should be unlimited or almost unlimited for my subscription tier, so being silently routed to mini even outside Thinking is especially confusing.
- OpenAI support escalated it, but I’m still waiting for a specialist response.
Has anyone else seen web and mobile route to different models despite the same manually selected model?
I'm too lazy to update the skills, so I asked it. It doesn't use the `findskills` skill. It checks out the skill repo and merges it with my repo.
there are only two free options in github copilot cli so i have been using GPT-5 mini for some tasks because i don't want to burn out my PR too quickly and to my surprise it is very capable with reasoning set to "high". since its free option, i always run plan mode first and after the task is done i run review command.
https://preview.redd.it/zb1gzzm9ahlg1.png?width=3000&format=png&auto=webp&s=2fe11dfb13a252dacd0ae8c250f4ec17d1a51d93
Qwen3.5-122B-A10B generally comes out ahead of gpt-5-mini and gpt-oss-120b across most benchmarks.
**vs GPT-5-mini:** Qwen3.5 wins on knowledge (MMLU-Pro 86.7 vs 83.7), STEM reasoning (GPQA Diamond 86.6 vs 82.8), agentic tasks (BFCL-V4 72.2 vs 55.5), and vision tasks (MathVision 86.2 vs 71.9). GPT-5-mini is only competitive in a few coding benchmarks and translation.
**vs GPT-OSS-120B:** Qwen3.5 wins more decisively. GPT-OSS-120B holds its own in competitive coding (LiveCodeBench 82.7 vs 78.9) but falls behind significantly on knowledge, agents, vision, and multilingual tasks.
**TL;DR:** Qwen3.5-122B-A10B is the strongest of the three overall. GPT-5-mini is its closest rival in coding/translation. GPT-OSS-120B trails outside of coding.
Let's see if the quants hold up to the benchmarks.
Hi LocalLlama.
Here are the results from the March run of the GACL. A few observations from my side:
* **GPT-5.4** clearly leads among the major models at the moment.
* **Qwen3.5-27B** performed better than every other Qwen model except **397B**, trailing it by only **0.04 points**. In my opinion, it’s an outstanding model.
* **Kimi2.5** is currently the top **open-weight** model, ranking **#6 globally**, while **GLM-5** comes next at **#7 globally**.
* Significant difference between Opus and Sonnet, more than I expected.
* **GPT models dominate the Battleship game.** However, **Tic-Tac-Toe** didn’t work well as a benchmark since nearly all models performed similarly. I’m planning to replace it with another game next month. Suggestions are welcome.
For context, **GACL** is a league where models generate **agent code** to play **seven different games**. Each model produces **two agents**, and each agent competes against every other agent except its paired “friendly” agent from the same model. In other words, the models themselves don’t play the games but they generate the agents that do. Only the top-performing agent from each model is considered when creating the leaderboards.
All **game logs, scoreboards, and generated agent codes** are available on the league page.
[Github Link](https://github.com/summersonnn/Game-Agent-Coding-Benchmark)
[League Link](https://gameagentcodingleague.com/leaderboard.html)
GPT-5 mini supports a context window of 400,000 tokens, allowing large volumes of text, documents, or conversation history to be included in a single request.
GPT-5 mini has a training data cutoff of May 2024, meaning it does not have knowledge of events or information published after that date.
GPT-5 mini is described by OpenAI as a faster and more cost-efficient version of GPT-5, optimized for well-defined tasks and precise prompts rather than open-ended or highly complex reasoning.
Yes. GPT-5 mini supports tool use and MCP (Model Context Protocol) server inputs, making it compatible with agentic workflows and external integrations on platforms like MindStudio.
According to OpenAI's overview, GPT-5 mini is best suited for well-defined tasks where precise prompts are used, such as structured data extraction, classification, summarization, and automation pipelines.