A fuller summary of positioning, capabilities, and source-specific details for GPT 5.4.
GPT-5.4 is a text generation model developed by OpenAI, released in March 2026 as their flagship model for professional and enterprise use. It is available in three variants — standard, Thinking, and Pro — and features a context window of 1 million tokens, the largest OpenAI has offered. The model is designed not only to plan complex tasks but to complete them reliably, with built-in computer use capabilities for orchestrating multi-step agentic workflows.
GPT-5.4 is best suited for enterprise teams running AI in production environments, including customer support automation, document drafting, data analysis, and developer workflows. It recorded an 83% score on GDPval for knowledge work tasks and ranked second out of 116 models on the Artificial Analysis Intelligence Index. The Pro variant adds multi-path reasoning evaluation for scenarios where analytical depth is prioritized over speed, such as scientific research and complex decision-making.
Executes multi-step tasks autonomously using built-in computer use capabilities, including tool orchestration, file access, and data extraction with minimal human oversight.
Supports a context window of up to 1 million tokens, enabling processing of extensive documents, large codebases, and long multi-turn sessions in a single request.
The Thinking variant applies enhanced logical follow-through across long, complex interactions, maintaining consistency over extended reasoning chains.
Produces structured professional outputs including documents, spreadsheets, slide decks, financial models, and legal analyses in a single session.
Delivers 33% fewer factual errors in individual claims compared to GPT-5.2, according to OpenAI's internal benchmarks.
Solves problems using fewer tokens than its predecessor, reducing latency and cost for high-volume production workloads.
Generates, reviews, and debugs code across common programming languages, with support for developer workflows within the full 1M token context.
The Pro variant uses multi-path reasoning evaluation to provide greater analytical depth for research, legal analysis, and complex decision-making tasks.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | What it measures | Score |
|---|---|---|
| ARC-AGI-2 | Novel abstract reasoning and pattern recognition | |
| BrowseComp | Complex web browsing and information retrieval | |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE | Questions that challenge frontier models across many domains | |
| OSWorld-Verified | Autonomous computer use and desktop tasks | |
| SciCode | Scientific research coding and numerical methods | |
| SWE-bench Pro | Challenging real-world software engineering tasks | |
| Terminal-Bench 2.0 | Agentic coding and terminal command tasks | |
Jump straight into the most relevant side-by-side comparison pages for this model.
Compare GPT 5.4 and GPT 5.4 Pro across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads.
Compare GPT 5.5 and GPT 5.4 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads.
GPT 5.4 discussions are most active in r/singularity, r/codex, and r/OpenAI. Top Reddit threads cluster around benchmarks, model comparisons, and coding workflows.
The strongest match in this snapshot has 13,105 upvotes and 912 comments.
I've been using GPT 5.4 high (extra high on a few occasions) for planning and reviewing code. (I use GPT 5.4-mini for implementing the plans from 5.4.) It's been great. Last week, I tried to resolve an issue with a home screen widget not displaying correctly on iOS. I tried twice with GPT 5.4 high. It couldn't fix the issue. I decided to give GPT 5.5 a try for the first time. It resolved the issue in one shot; it was pretty incredible.
However, in the past couple of days, I've noticed GPT 5.4 makes silly mistakes. For example, it doesn't include tests for critical functions, it doesn't mock correctly in unit tests, some of the changes it proposes lead to build failures, etc. It didn't make mistakes like this before. This has caused me to start using 5.5 more often than I would like because of how expensive it is.
Am I the only one experiencing this?
I'm on the $20 plan and was really struggling with the regular GPT-5.4 model. I exhausted my 5h limit within 1h and my weekly limit within 2-3 days.
But with mini I have yet to hit my 5h limit before it runs out! I'm currently mostly adding new features and debugging and not creating a code base from scratch. But even then it might be good enough if you work in small increments.
I’ve been using GPT-5.4 Nano and I’m honestly blown away by how well it performs for being a smaller model. The speed feels great, and the output quality has been consistently strong for tasks I normally use larger models for.
What I’m curious about:
* What kinds of prompts/workflows are you getting the best results with?
* How does it compare to models you were using before (quality, latency, reliability)?
* Any “best practices” you’ve found, prompt style, system instructions, or tool usage, that really improve results?
Would love to hear your experience and any tips.
The price-to-performance ratio is actually insane. It’s a total powerhouse for next to nothing, yet everyone is still busy glazing Claude??
Make it make sense.
I use gpt 5.3 codex for the research/plan phase and use 5.4 mini to execute. it will use like .5% max even for huge refactors/changes
in terms of planning it is kinda dumb even on high reasoning so use a different model for it. but with a detailed plan, it is REALLY good for execution. quite fast as well
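The plan-with-one-model, execute-with-another split described in the threads above can be sketched roughly as follows. This is only an illustration under stated assumptions: `plan_then_execute`, `fake_planner`, and `fake_executor` are hypothetical names, and the two callables stand in for whatever client code actually calls a larger planning model and a smaller execution model. No real API is shown.

```python
from typing import Callable, List

# A "model" here is any function that maps a prompt string to a response string.
# Assumption: real code would wrap an API client; these stubs do not call one.
ModelFn = Callable[[str], str]


def plan_then_execute(task: str, planner: ModelFn, executor: ModelFn) -> List[str]:
    """Ask the planner for a plan (one step per line), then run each step
    through the executor and collect the results in order."""
    plan = planner(task)
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(step) for step in steps]


def fake_planner(task: str) -> str:
    # Hypothetical stand-in for a larger, more expensive planning model.
    return f"1. outline {task}\n2. implement {task}"


def fake_executor(step: str) -> str:
    # Hypothetical stand-in for a smaller, cheaper execution model.
    return f"done: {step}"


if __name__ == "__main__":
    for result in plan_then_execute("widget fix", fake_planner, fake_executor):
        print(result)
```

Keeping the planner and executor as plain callables makes it easy to swap either model without touching the loop, which is the point the posters make about pairing a detailed plan with a cheaper executor.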
GPT-5.4 supports a context window of up to 1 million tokens, which allows it to process large documents, codebases, and extended multi-step workflows within a single session.
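As a rough illustration of what a 1 million token window means in practice, the sketch below estimates whether a set of documents would fit in a single request. It assumes about 4 characters per token, which is a common rule of thumb for English text and not the model's actual tokenizer; `estimate_tokens`, `fits_in_context`, and both constants are hypothetical names introduced for this example.

```python
from typing import Iterable

CONTEXT_WINDOW_TOKENS = 1_000_000  # context window size quoted on this page
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text`, rounding up."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(
    texts: Iterable[str],
    reserved_output_tokens: int = 0,
    window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """Return True if the estimated input tokens, plus any tokens reserved
    for the model's reply, stay within the context window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserved_output_tokens <= window


if __name__ == "__main__":
    docs = ["a" * 2_000_000, "b" * 1_000_000]  # about 750,000 estimated tokens
    print(fits_in_context(docs, reserved_output_tokens=100_000))
```

For real workloads, an exact count from the provider's tokenizer would replace the character heuristic, since token density varies by language and content type.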
The standard GPT-5.4 is designed for general professional and enterprise use. GPT-5.4 Thinking is optimized for tasks requiring enhanced logical reasoning across long interactions. GPT-5.4 Pro adds multi-path reasoning evaluation and greater analytical depth, making it suited for scientific research and complex decision-making where thoroughness is prioritized over speed.
According to the available metadata, GPT-5.4 has a training date of March 2026. A more specific knowledge cutoff date has not been confirmed in the provided metadata.
GPT-5.4 has been evaluated on OSWorld-Verified and WebArena Verified for computer use tasks, GDPval where it scored 83% for knowledge work, and Mercor's APEX-Agents benchmark for professional skills in law and finance. It ranks second out of 116 models on the Artificial Analysis Intelligence Index.
GPT-5.4 is designed for enterprise production environments and is well-suited for customer support automation, document drafting, data analysis, developer workflows, agentic task execution, and extended reasoning tasks. The Pro variant is additionally suited for scientific research and scenarios requiring deep analytical work.