Multi-Step Reasoning
Uses large-scale reinforcement learning to work through complex, multi-step problems, with performance improving the more reasoning time it is given.
OpenAI o3 is the flagship model in OpenAI's o-series of reasoning models, released in April 2025. It is designed to spend more time thinking through problems before responding, using large-scale reinforcement learning to work through complex, multi-step tasks. The model supports a 200,000-token context window and can process both text and images as inputs. According to OpenAI, o3 makes 20% fewer major errors than its predecessor on difficult real-world tasks, with particular strength in programming, business consulting, and creative ideation. A notable feature of o3 is its ability to integrate images directly into its reasoning process — not just interpreting them, but actively using them as part of problem-solving, including handling blurry, reversed, or low-quality visuals. The model can also autonomously combine tools such as web search, Python-based data analysis, and image generation to address multi-faceted questions. It is best suited for users who need rigorous analytical reasoning across domains like biology, mathematics, engineering, and software development, particularly when tasks require combining visual and textual information.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for o3.
OpenAI o3 is the flagship model in OpenAI's o-series of reasoning models, released in April 2025. It is designed to spend more time thinking through problems before responding, using large-scale reinforcement learning to work through complex, multi-step tasks. The model supports a 200,000-token context window and can process both text and images as inputs. According to OpenAI, o3 makes 20% fewer major errors than its predecessor on difficult real-world tasks, with particular strength in programming, business consulting, and creative ideation.
A notable feature of o3 is its ability to integrate images directly into its reasoning process — not just interpreting them, but actively using them as part of problem-solving, including handling blurry, reversed, or low-quality visuals. The model can also autonomously combine tools such as web search, Python-based data analysis, and image generation to address multi-faceted questions. It is best suited for users who need rigorous analytical reasoning across domains like biology, mathematics, engineering, and software development, particularly when tasks require combining visual and textual information.
Uses large-scale reinforcement learning to work through complex, multi-step problems, with performance improving the more reasoning time it is given.
Integrates images directly into its reasoning chain, including the ability to interpret blurry, reversed, or low-quality images and manipulate visuals as part of problem-solving.
Autonomously combines tools such as web search, Python-based data analysis, and image generation to tackle multi-faceted questions, typically completing tasks in under a minute.
Supports a 200,000-token context window, enabling processing of very long documents and complex workflows requiring large amounts of context.
Achieves 98.4% pass@1 on AIME 2025 math competition problems with tool access and scored 87.7% on the GPQA Diamond expert-level science benchmark.
Sets benchmark results on SWE-bench for software engineering and Codeforces for competitive programming, supporting complex code generation and debugging tasks.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
The configurable options currently documented for this model.
Used to give the model guidance on how many reasoning tokens it should generate before creating a response to the prompt. Low will favor speed and economical token usage, and high will favor more complete reasoning at the cost of more tokens generated and slower responses. The default value is medium, which is a balance between speed and reasoning accuracy.
Parameters currently listed by OpenRouter or the local catalog for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2024
American math olympiad problems
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MATH-500
Undergraduate and competition-level math problems
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
o3 discussions are most active in r/OpenAI, r/singularity, r/ChatGPT. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.
The strongest match in this snapshot has 10833 upvotes and 3795 comments.
I know everyone is talking about all the GPT-4o going but I don’t actually understand why they’re getting rid of those but keeping o3. Have OAI actually said why?
It is known that using -O3 globally is a bad idea. Not only dos it lead to longer and more ram intensive compiling, many things just break because some code relies on technically undefined behavior. Trying to compile everything individually as -03 and seeing if it breaks seems like a big hassle. Is there some database of packages that have been tested with -03 optimizations? After a brief search I don't find any. Or do you have some personal scheme to figure out which options work best for each package?
I dont have time to play 8+ hours a day to grind 8/8s
Not getting the chance to practice O3, losing multiple characters trying to practice, its so time consuming and painful. They really need to bring back some practice servers for the new comers to try at 03.
Edit: Gotta love the sweats down voting because 03 is so easy to them since they enjoyed their free 03 practice.
If you have a plus subscription, under settings if you toggle on “show additional models,” it’s there.
The GCC documentation makes it pretty clear that higher -O options may make things harder to debug and also increase compile time. I don't really mind increased compile times, but how practical is the impact on debugging? And are these really the only drawbacks?
[https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Optimize-Options.html](https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Optimize-Options.html)
Seems odd that there are so many options for this feature when the impact seems quite limited, this makes me feel like I don't really have the full picture here.
Edit: thanks for the feedback everyone, I opted to only use -O2, but I curated a couple extra optimization flags just to experiment using y'all's suggestions
o3 supports a 200,000-token context window, which allows it to process very long documents and handle complex workflows that require large amounts of context in a single request.
Based on the available metadata, o3's training date is listed as April 2025. For the most precise knowledge cutoff date, refer to OpenAI's official model release notes.
Yes. o3 can accept images as inputs and incorporates them directly into its reasoning process. It can interpret blurry, reversed, or low-quality images and use visual manipulation as part of solving a problem.
o3 is designed for tasks requiring deep analytical reasoning, including complex coding, mathematics, scientific hypothesis evaluation, and problems that combine visual and textual information. It is particularly noted for performance in programming, business consulting, and creative ideation.
Yes. o3 supports agentic tool use, meaning it can autonomously invoke tools such as web search, Python-based data analysis, and image generation to address multi-step questions, typically within under a minute.
Continue browsing adjacent models from the same provider.