Mathematical Reasoning
Applies step-by-step internal reasoning to solve math problems, from algebra to competition-level challenges. Performs well on formal mathematical benchmarks.
o3-mini is a text generation model developed by OpenAI and released in January 2025. It belongs to OpenAI's o-series, a family of models trained to reason through problems step by step before producing a response. The model is designed to balance reasoning quality with speed and cost efficiency, making it practical for high-volume deployments where deliberate thinking is needed without long wait times. o3-mini is particularly well-suited for tasks involving mathematical reasoning, programming challenges, and scientific questions. It operates with a 200,000-token context window, allowing it to process long documents, extended codebases, or multi-turn conversations in a single session. The model generates output at approximately 137 tokens per second and uses an internal reasoning process rather than responding immediately, which contributes to its accuracy on structured, logic-intensive tasks.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for o3-mini.
o3-mini is a text generation model developed by OpenAI and released in January 2025. It belongs to OpenAI's o-series, a family of models trained to reason through problems step by step before producing a response. The model is designed to balance reasoning quality with speed and cost efficiency, making it practical for high-volume deployments where deliberate thinking is needed without long wait times.
o3-mini is particularly well-suited for tasks involving mathematical reasoning, programming challenges, and scientific questions. It operates with a 200,000-token context window, allowing it to process long documents, extended codebases, or multi-turn conversations in a single session. The model generates output at approximately 137 tokens per second and uses an internal reasoning process rather than responding immediately, which contributes to its accuracy on structured, logic-intensive tasks.
Applies step-by-step internal reasoning to solve math problems, from algebra to competition-level challenges. Performs well on formal mathematical benchmarks.
Generates, debugs, and explains code across common programming languages. Well-suited for technical problem-solving tasks that require logical precision.
Handles graduate-level scientific questions, including those tested by benchmarks like GPQA Diamond covering biology, chemistry, and physics.
Supports a 200,000-token context window, equivalent to roughly 300 pages of text, enabling processing of long documents or extended conversations.
Produces output at approximately 137 tokens per second, enabling responsive interactions even on queries that require internal reasoning steps.
Uses an internal reasoning process before generating a final answer, improving accuracy on structured and multi-step problems.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
The configurable options currently documented for this model.
Used to give the model guidance on how many reasoning tokens it should generate before creating a response to the prompt. Low will favor speed and economical token usage, and high will favor more complete reasoning at the cost of more tokens generated and slower responses. The default value is medium, which is a balance between speed and reasoning accuracy.
Parameters currently listed by OpenRouter or the local catalog for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2024
American math olympiad problems
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MATH-500
Undergraduate and competition-level math problems
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
o3-mini discussions are most active in r/OpenAI, r/singularity, r/LocalLLaMA. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.
The strongest match in this snapshot has 2351 upvotes and 225 comments.
Here was my prompt:
"Could you please help me create a tetris game in python but instead of a human playing it, an AI plays it above human level with same rules as a human would."
It created that autonomous tetris game code in easily less than 20 seconds. Probably 10 even.
If you're afraid for your jobs now because of AI, you'll be more afraid within days.
I posted a lot here yesterday to vote for the o3-mini. Thank you all!
**You can copy this strategy for yourself in a single click**
[Pic: The OpenAI o3-mini model backtest from 12/31/2022 to 12/31/2023](https://miro.medium.com/v2/resize:fit:1400/1*3FP_VKbIpyXh7OzvhguEBg.png)
When I first tried the new o3‑mini model, I was beyond impressed. Unlike other reasoning models, like DeepSeek R1 or OpenAI’s o1, o3‑mini was reliable, lightning fast, and most importantly extremely accurate.
And it cost less than GPT‑4o.
So, like with other models, I sought to see how I could showcase it within my algorithmic trading platform, NexusTrade.
And accidentally created a strategy that beat the market. In Every. Single. Metric.
# A Recap: How I created an algorithmic trading strategy using an LLM
For those who are new to my page, you may be wondering how LLMs can create algorithmic trading strategies.
The answer isn’t simple – it’s a complex multi‑step process.
[Pic: The “Create Portfolio” prompt chain](https://miro.medium.com/v2/resize:fit:1174/0*rqK5GB9XMEbaXzo-.png)
This starts with:
1. Creating an outline of the strategy. This includes a strategy name, an action (“buy” or “sell”), the asset we want to buy, an amount (for example 10% of your buying power or 100 shares), and a description of when we want to perform the action.
2. Creating a “condition” from the description of when we want to perform the action.
3. Creating “indicators” which are compared to each other and determine whether a condition is satisfied.
After this long process, we create the portfolio of trading strategies.
Thanks to the power of LLMs, we can be as vague or as specific as we want. For this test, I want to see if I can use o3 to create a trading strategy that can beat the market.
Spoiler alert: I can.
# My previous attempt at creating a market‑beating trading strategy
In a previous article, I described how O1 was capable of creating a market‑beating trading strategy.
[ I used OpenAI’s o1 model to develop a trading strategy. It is DESTROYING the market](https://medium.datadriveninvestor.com/i-used-openais-o1-model-to-develop-a-trading-strategy-it-is-destroying-the-market-576a6039e8fa)
However, from the discussion in the comments, I noticed that the methodology had several flaws:
1. **Lack of transparency:** Users who came across the article were unable to track the real‑time trading progress of the portfolio across time. Thus, they were unable to determine if the strategies *really* beat the market.
2. **Didn’t outperform the underlying:** While the strategy outperformed SPY, it did NOT beat simply buying and holding the underlying ETF.
Thus, my goal was to see if O3 was any better. We know that O3 is faster and cheaper, but can it be used to create fully autonomous trading rules?
Let’s find out.
# The key differences in this article
There are several key differences with this article since the original. For one is the ability to track the progress of any of these portfolios.
For one, I’ve publicly shared the portfolios from the original article. While they’ve been deployed for a while, now anybody can track their progress in‑real‑time regardless of how long ago this article was posted.
- [**GPT‑o1‑mini TQQQ**](https://nexustrade.io/shared-portfolio/678069c3cf790a73f63af24d)
- [**GPT‑4 TQQQ**](https://nexustrade.io/shared-portfolio/678069e84e8241cf2b08cdb6)
With this new interface, anybody can take the strategies I’ve created and clone them for themselves.
[Pic: The new shared portfolio UI allows anybody to clone these strategies](https://miro.medium.com/v2/resize:fit:1400/1*Yn55WXwqQhTYblHcEeKXMw.png)
You can also look at an audit of the portfolio’s events. This audit allows you to understand what trading decisions were made at every timestep and why.
[Pic: The portfolio’s audit history](https://miro.medium.com/v2/resize:fit:1400/1*_lLeKYdDfHYP1xiIxH-Skg.png)
Moreover, you can also clone and audit the portfolio that I will create in this article.
Finally, the testing in this article will be much more robust. We’re not going to just try to beat the market, but we’re also going to try to outperform the underlying that the strategy is based on.
This is *way* harder, and doing so can suggest that O3 is genuinely very useful for helping traders create their own investing strategy.
[For full transparency, you can read the EXACT conversation I had with the AI here.](https://nexustrade.io/share/679d935a4c92c39decfc33af)
[Link: SMA Crossover Strategy for TQQQ: Portfolio Creation and Backtesting](https://nexustrade.io/share/679d935a4c92c39decfc33af)
This allows you to re‑create these strategies, make your own changes, and further promote trust and transparency with the process.
Without further ado, let’s get started!
# Creating a Portfolio with OpenAI o3‑mini
Just like in the previous article, we’re going to say the following to create our trading strategy.
> I want a SMA crossover strategy on TQQQ. I want a take profit strategy, but no stop losses — I’m bullish on tech long‑term and don’t want to be stop lossed out. I also want to space out my buys and not go all‑in at once.
After just a couple of minutes, the model responds with an **amazing** trading strategy on its very first try!
[Pic: The trading strategy generated from the model](https://miro.medium.com/v2/resize:fit:1400/1*OL3Z2n4yx36xm7OT8fIklA.png)
If we zoom in on this strategy, we see that:
[Pic: Zooming in on the strategy we created](https://miro.medium.com/v2/resize:fit:1400/1*Z5PhM59WpqjgBlGCJrMPsA.png)
- The strategy outperforms buying and holding the S&P 500 by 500%!
- The sharpe ratio is 1.38 vs the sharpe ratio of 1.17 for the baseline.
- Similarly, the sortino ratio is 1.96 vs the sortino ratio of 1.76 for the baseline.
- Finally, the maximum drawdown and average drawdown was nearly 3x that of holding the baseline!
So, while the portfolio is clearly better, with higher risk‑adjusted returns, the baseline is less volatile, with a much lower drawdown.
Finally, we can see the exact rules for this strategy by scrolling down.
- Buy 20 percent of buying power in TQQQ Stock when (20 Day TQQQ SMA > 50 Day TQQQ SMA) and (# of Days Since the Last Filled Buy Order of TQQQ ≥ 1)
- Sell 50 percent of current positions in TQQQ Stock when (TQQQ Price > 1.1 * 20 Day TQQQ SMA) and (# of Days Since the Last Filled Sell Order of TQQQ ≥ 3)
At first glance, this is impressive. But does it stand the test of time and outperform the other strategies?
Let’s see.
# Recreating the GPT‑o1‑mini strategy
[Pic: The Upload Attachment option](https://miro.medium.com/v2/resize:fit:1012/1*YY0vniETL75RpLm4lxl4_g.png)
By creating an “attachment”, I can re‑create the old GPT‑o1 strategy easily with the click of a button.
[Pic: Re-creating the portfolio from the original article](https://miro.medium.com/v2/resize:fit:1400/1*VtXZz_2mm4FdG7h5eCSOdg.png)
We see that this portfolio still outperforms the market, but by a much lower degree than our new strategy. In fact, if we zoom in, we see that it only has 2x the return at a lower sharpe and sortino ratio. This means that the original portfolio is MUCH more risky than just buying and holding SPY.
[Pic: Zooming in on the original o1 strategy](https://miro.medium.com/v2/resize:fit:1400/1*hW28HwX1u7gAMq9cfn6UrA.png)
Now comes the real test. If we test these strategies for the past year, do they outperform the underlying asset?
Let’s find out.
To do this, I simply typed the following:
> Backtest both these portfolios for the past year. Compare them to TQQQ as the baseline
Here was the result.
[Pic: Looking at the backtest result of these portfolios](https://miro.medium.com/v2/resize:fit:1400/1*RqW-zjhaw7rAZ69pMwcxqw.png)
If we zoom in, we see the following:
[Pic: Zooming in on the backtests](https://miro.medium.com/v2/resize:fit:1400/1*wvR5JQRk1zh4pY0q3DM7Sw.png)
- The old GPT‑o1‑mini strategy underperformed buying and holding the underlying TQQQ baseline asset.
- The new GPT o3‑mini model outperforms the baseline, with a higher sharpe ratio, higher sortino ratio, AND a lower drawdown.
These results suggest that the new o3‑mini model is genuinely better at creating more profitable, less risky algorithmic trading strategies.
I’m shocked.
And, as promised, I’m going to deploy this portfolio to the market.
First, I’m going to create a new paper‑trading portfolio.
[Pic: Creating a new paper‑trading portfolio](https://miro.medium.com/v2/resize:fit:1400/1*MRVuFClJE8Y_rd5jXV_-2A.png)
Then, I’m going to deploy it, and share it publicly to the rest of the world.
[Pic: Sharing the portfolio with the entire world](https://miro.medium.com/v2/resize:fit:1400/1*8XH2Yi41N5nUNf92eukR0Q.png)
[You can follow along with this portfolio’s progress by clicking this link.](https://nexustrade.io/shared-portfolio/679d95b94c92c39decfc3b94)
Now anybody can look at the strategies, see how they perform in 2025 and beyond, copy them, modify them, audit them, and deploy their own versions easily within the NexusTrade platform.
# Concluding Thoughts
Each generation of language models get 10x better than the previous.
O3‑mini is the leap that has impressed me the most. For the cost of (the already inexpensive) GPT‑4o, o3‑mini outperforms significantly. It’s faster, cheaper, more reliable, and more accurate than any language model I’ve ever used.
And now, I’ve shown it can be used for algorithmic trading. In this article, I asked o3 to create an algorithmic trading strategy. I’ve shown that it not only outperforms SPY in metrics like percent change and risk‑adjusted returns, but it also outperforms the underlying, achieving greater returns with less risk for the past year.
I’ve also deployed this portfolio for real‑time trading. Anybody can copy it, make their own changes, and deploy their version of this strategy easily using the NexusTrade platform.
This includes both “paper‑trading” (trading with monopoly money) or “real‑trading” through Alpaca.
This isn’t just a minor change – it’s a seismic shift. The AI race is on, and its impact on many fields, like finance, is yet to be seen.
But we’ve at least seen a glimpse — OpenAI developed a model that has the potential to beat the stock market. How cool is that?
Thank you for reading! By using NexusTrade, you can create your own algorithmic trading strategies using natural language. Want to try it out for yourself? Create a free account on NexusTrade today.
[NexusTrade - No-Code Automated Trading and Research](https://nexustrade.io/)
J
o3-mini supports a 200,000-token context window, which is roughly equivalent to 300 pages of text. This allows it to handle long documents, large codebases, and extended multi-turn conversations in a single session.
o3-mini was released in January 2025. The training date listed in the metadata is January 2025. For specific knowledge cutoff details, refer to OpenAI's official model release notes.
o3-mini is designed for tasks that benefit from deliberate, step-by-step reasoning, including mathematical problem-solving, code generation and debugging, and scientific reasoning. It is particularly effective where logical accuracy matters more than conversational fluency.
o3-mini is a proprietary reasoning model that thinks through problems internally before producing a final response. This internal reasoning step is not visible to the user but contributes to improved accuracy on structured and logic-intensive tasks.
o3-mini has been succeeded by o4-mini but remains available as a capable option for users who need reliable reasoning at scale. It can be accessed through MindStudio without requiring separate API key management.
Continue browsing adjacent models from the same provider.