OpenAI

GPT OSS 120B

GPT OSS 120B is OpenAI's largest open-weight model, released in August 2025 under the Apache 2.0 license. It has approximately 116.8 billion total parameters and uses a Mixture-of-Experts (MoE) architecture that activates only around 5.1 billion parameters per token, enabling efficient inference on a single H100 GPU. The model is part of the GPT OSS family and is designed for commercial and private deployments without licensing restrictions. The model is built for coding, mathematical reasoning, scientific analysis, and agentic workflows. It supports a 128,000-token context window, adjustable reasoning levels (low, medium, and high), and native tool use including web browsing, Python code execution, and custom developer-defined functions. Architecturally, it uses 36 transformer layers with 128 experts per MoE layer (top 4 active per token), Grouped Query Attention, Rotary Position Embeddings, and an alternating local/dense attention pattern, and it is available for local inference via Hugging Face Transformers, llama.cpp, and vLLM.

Aug 05, 2025 131.1K context 32,768 tokens output

Mixture-of-Experts Architecture Adjustable Reasoning Long Context Window Coding and Math Tool Use Agentic Workflows

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Benchmarks ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

OpenAI

Model ID

The routed model identifier exposed by upstream providers.

openai/gpt-oss-120b:free

Input Context Window

The number of tokens supported by the input context window.

131.1K tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

32,768 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Aug 05, 2025 11 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

August 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

OpenAI

Modalities

Types of data this model can process.

Text Code

What is GPT OSS 120B

A fuller summary of positioning, capabilities, and source-specific details for GPT OSS 120B.

GPT OSS 120B is OpenAI's largest open-weight model, released in August 2025 under the Apache 2.0 license. It has approximately 116.8 billion total parameters and uses a Mixture-of-Experts (MoE) architecture that activates only around 5.1 billion parameters per token, enabling efficient inference on a single H100 GPU. The model is part of the GPT OSS family and is designed for commercial and private deployments without licensing restrictions.

The model is built for coding, mathematical reasoning, scientific analysis, and agentic workflows. It supports a 128,000-token context window, adjustable reasoning levels (low, medium, and high), and native tool use including web browsing, Python code execution, and custom developer-defined functions. Architecturally, it uses 36 transformer layers with 128 experts per MoE layer (top 4 active per token), Grouped Query Attention, Rotary Position Embeddings, and an alternating local/dense attention pattern, and it is available for local inference via Hugging Face Transformers, llama.cpp, and vLLM.

Capabilities

What GPT OSS 120B supports

Mixture-of-Experts Architecture

Uses a MoE design with 128 experts per layer, activating only ~5.1 billion of 116.8 billion total parameters per token for efficient inference.

Adjustable Reasoning

Supports low, medium, and high reasoning levels, allowing developers to tune the trade-off between response speed and reasoning depth.

CTX

Long Context Window

Handles up to 128,000 tokens per request, equivalent to roughly 100,000 words of text in a single prompt.

</>

Coding and Math

Designed for software development, mathematical reasoning, and scientific analysis tasks requiring multi-step problem solving.

Tool Use

Natively supports web browsing, Python code execution, and custom developer-defined functions as callable tools.

Agentic Workflows

Built for multi-step agentic tasks and integrates with agent frameworks, supporting complex sequences of tool calls and decisions.

Open Source License

Released under the Apache 2.0 license, permitting commercial use, fine-tuning, and private deployment without royalty obligations.

Fast Inference

Tagged as very fast; the MoE architecture keeps active parameter count low, and the model fits on a single H100 GPU for local deployment.

Fine-Tuning Support

Supports fine-tuning workflows, allowing developers to adapt the base model to domain-specific tasks using standard training pipelines.

Pricing for GPT OSS 120B

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.15 Per million tokens

Output tokens $0.00 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 2

maxResponseSize 32,768 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

OpenAI

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	78.2%
HLE Questions that challenge frontier models across many domains	18.5%
LiveCodeBench Real-world coding tasks from recent competitions	87.8%
MMLU-Pro Expert knowledge across 14 academic disciplines	80.8%
SciCode Scientific research coding and numerical methods	38.9%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Hugging Face Announcement Blog Post Announcements

→

Official GitHub Repository Open Source

→

AWS Availability Announcement Announcements

→

NVIDIA NIM Model Card Documentation

→

GPT OSS 120B on Hugging Face Documentation

→

Official Website

→

Usage Policies

→

Enterprise privacy at OpenAI

→

OpenAI Status Page

→

OpenRouter Model Page OpenRouter

→

AI tools related to GPT OSS 120B

These tools are strongly connected to GPT OSS 120B through direct product references, provider mentions, or explicit model mappings.

AI Assistant

ChatGptDemo

ChatGptDemo is a free online platform modeled after ChatGPT-4 that provides an accessible way to interact with artificial intelligence. Featuring advanced machine learning algorithms and a flexible interface, it allows users to engage with the AI for free without requiring a login.

Free 0 visits 25 saves

AI Assistant

Chatgptfree.ink

Chatgptfree.ink offers free online access to ChatGPT, enabling users to receive instant answers, spark creative inspiration, and explore new topics. The platform allows for web and mobile usage without the need for an account login. It highlights its free accessibility while providing direct links to the official OpenAI website for those who prefer using the primary platform.

Free 0 visits 6 saves

AI Assistant

MaxAI.me

MaxAI.me is a Chrome and Edge extension designed to boost productivity by offering one-click AI tools for summarizing, searching, explaining, analyzing, translating, and writing content across any website. It supports major AI providers, including ChatGPT, Google Bard, Bing Chat AI, and Claude, and integrates with ChatGPT Plus features like GPT-4, Web Browsing, Code Interpreter, and Plugins. Users can also utilize their own OpenAI API key to access models such as GPT-4, GPT-3.5-turbo-16k, and GPT-4-32k. Additionally, the extension provides one-click ChatGPT prompts tailored for marketing, sales, copywriting, operations, productivity, and customer support.

Free 0 visits 5 saves

AI Assistant

ChatGPT for Shop

ChatGPT for Shop is a browser extension powered by ChatGPT technology. It summarizes and analyzes customer reviews on major e-commerce platforms, enabling e-commerce professionals to better understand user profiles and support data-driven product selection.

Free 0 visits 4 saves

Related Daily Briefs

Recent daily stories tied to GPT OSS 120B through direct model mentions or provider-level coverage.

Frontier Models

Anthropic Opus 5 Nears Fable 5 as Midjourney V8.2 Lands and OpenAI Agents Gain Web Access

NVIDIA and Hugging Face move deeper into real workflows.

2026-07-24 AI Models Security

Agents Workflows

OpenAI launches Building AI; OpenAI launches Enterprise AI Agents; Cohere launches Synthetic media labels

OpenAI and Hugging Face move deeper into real workflows.

2026-07-22 AI API AI Agent

Frontier Models

Anthropic, Alibaba, and OpenAI Signal a Broader Shift Around Economic Index

Anthropic and Qwen move deeper into real workflows.

2026-07-22 AI Models AI API

Frontier Models

OpenAI and Moonshot AI Signal a Broader Shift Around Codex

Hugging Face and OpenAI move deeper into real workflows.

2026-07-21 AI Models Partnership

Community discussion

What people think about GPT OSS 120B

GPT OSS 120B discussions are most active in r/LocalLLaMA, r/LLMDevs, r/AIToolsPerformance.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 1501 upvotes and 260 comments.

r/AIToolsPerformance 26 upvotes 12 comments April 5, 2026

With Qwen3 Coder 480B free and OpenAI gpt-oss-120b at $0.04/M, is local inference only for privacy now?

Looking at current pricing, the economics of local inference are getting harder to justify for pure capability:

- **Qwen: Qwen3 Coder 480B A35B** - free with 262,000 context
- **OpenAI: gpt-oss-120b** - $0.04/M with 131,072 context
- **Z.ai: GLM 4 32B** - $0.10/M with 128,000 context
- **Qwen: Qwen3 235B A22B Thinking 2507** - $0.15/M with 131,072 context

Even **Arcee AI: Maestro Reasoning** at $0.90/M for a dedicated reasoning model with 131K context is competitive against the electricity cost of running a 48GB+ VRAM rig at full load.

The local inference crowd has historically argued three pillars: cost, privacy, and latency. But when a 480B-parameter coder model is free with 262K context, the cost argument weakens significantly. Apple's work on self-distillation for code generation suggests models will keep getting more efficient on the API side too.

That said, the DGX Spark situation - NVFP4 support still missing after 6 months - shows the hardware side moves slower. And the "Signals" paper on trajectory sampling for agentic interactions hints that complex agent workflows may still benefit from local control.

So honest question: for those of you still running local inference in April 2026, is it purely privacy/compliance driving that choice, or are there workloads where local still beats these API prices on quality?

Open Reddit thread

r/openrouter 10 comments April 18, 2026

openrouter not work on gpt-oss-120b (free)?

i am on opencode and i wanted to try the gpt-oss-120b (free), but i get as error:

\[OpenInference\] no healthy upstream

is that normal?

Open Reddit thread

r/selfhosted 1,501 upvotes 260 comments August 6, 2025

You can now run OpenAI's gpt-oss model on your local device! (14GB RAM)

Hello everyone! OpenAI just released their first open-source models in 5 years, and now, you can have your own GPT-4o and o3 model at home! They're called 'gpt-oss'.

There's two models, a smaller 20B parameter model and a 120B one that rivals o4-mini. **Both** models outperform GPT-4o in various tasks, including reasoning, coding, math, health and agentic tasks.

To run the models locally (laptop, Mac, desktop etc), we at [**Unsloth**](https://docs.unsloth.ai/) converted these models and also **fixed bugs** to increase the model's output quality. Our GitHub repo: [https://github.com/unslothai/unsloth](https://github.com/unslothai/unsloth)

Optimal setup:

* The 20B model runs at >10 tokens/s in **full precision**, with **14GB RAM**/unified memory. Smaller versions use 12GB RAM.
* The 120B model runs in full precision at >40 token/s with \~64GB RAM/unified mem.

There is no minimum requirement to run the models as they run even if you only have a 6GB CPU, but it will be slower inference.

Thus, **no is GPU required**, especially for the 20B model, but having one significantly boosts inference speeds (\~80 tokens/s). With something like an H100 you can get 140 tokens/s throughput which is way faster than the ChatGPT app.

You can run our uploads with bug fixes via llama.cpp or Unsloth Studio for the best performance. If the 120B model is too slow, try the smaller 20B version - it’s super fast and performs as well as o3-mini.

* Links to the model GGUFs to run: [gpt-oss-20B-GGUF](https://huggingface.co/unsloth/gpt-oss-20b-GGUF) and [gpt-oss-120B-GGUF](https://huggingface.co/unsloth/gpt-oss-120b-GGUF)
* Our **step-by-step guide** which we'd recommend you guys to read as it pretty much covers everything: [https://docs.unsloth.ai/basics/gpt-oss](https://docs.unsloth.ai/basics/gpt-oss)

Thanks so much once again for reading! I'll be replying to **every person** btw so feel free to ask any questions!

Open Reddit thread

r/n8n_on_server 60 upvotes 33 comments August 6, 2025

Setup GPT-OSS-120B in Kilo Code [ COMPLETELY FREE]

https://preview.redd.it/2us0qrfxqehf1.png?width=630&format=png&auto=webp&s=1bfeee4f5c507cb78b493d80d227de8f1ce1c402

https://preview.redd.it/aatui1dxqehf1.png?width=635&format=png&auto=webp&s=0a1e46362a0db0d5c301c19814e317defc5c60af

kilo code: [Signup](https://kilocode.ai/users/sign_up?referral-code=36b1ea02-7746-4fa9-a660-e199cefdbe29)

**1. Get Your API Key:** Visit [https://build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys) to generate your free NVIDIA API key.

**2. Configure Kilo Code**

* Open Kilo Code Settings → Providers
* Set **API Provider**: "OpenAI Compatible"
* **Base URL**: [`https://integrate.api.nvidia.com/v1`](https://integrate.api.nvidia.com/v1)
* **API Key**: Paste your NVIDIA API key
* **Model**: `openai/gpt-oss-120b`

**3. Enable Key Features**

* ✅ **Image Support** \- Model handles visual inputs
* ✅ **Prompt Caching** \- Faster responses for repeated prompts
* ✅ **Enable R1 model parameters** \- Optimized reasoning
* Set **Context Window**: 128000 tokens
* **Model Reasoning Effort**: High

**4. Save & Start Coding** Click "Save" and you're ready to use this powerful 120B parameter model for free coding assistance with image understanding capabilities!

The model offers enterprise-grade performance with multimodal support, perfect for complex coding tasks that require both text and visual understanding.

Open Reddit thread

r/vibecoding 3 upvotes 2 comments February 25, 2026

I made a TUI that makes vibe coding basically free (thxs to NVIDIA NIM + other free tiers) works with OpenCode & OpenClaw. Deepseek, GPT OSS 120B, Kimi 2.5, GLM 5... & more

I was tired of hopping between NVIDIA NIM endpoints trying to find one that actually responds (and doing that while wasting my paid Claude/Codex/Gemini quotas).

So I built free-coding-models: a TUI that pings coding-focused LLMs in parallel, ranks them by latency + uptime, and then lets you launch OpenCode / configure OpenClaw with the best one in a keypress.

`npm i -g free-coding-models`

**What it does**

* Monitors **134 coding models** across **17 providers** (NVIDIA NIM, Groq, Cerebras, SambaNova, OpenRouter, HuggingFace, Replicate, DeepInfra, Fireworks, Codestral, Hyperbolic, Scaleway, Google AI, Together, Cloudflare Workers AI, Perplexity…)
* **Parallel pings + continuous monitoring** (latency updates live + rolling averages + uptime %)
* Built-in **provider key management** (press P) + optional --no-telemetry
* For OpenClaw: it can also **patch the allowlist** so you can use *all* NVIDIA models without “model not allowed” errors

**If you don’t know what NVIDIA NIM is:**

NVIDIA NIM is capped at 40 RPM which is honestly huge for a free tier, and plenty for day-to-day vibe coding ! You just have to make an account and set the API Key.

NIM = **NVIDIA Inference Microservices** (hosted APIs / containers for running foundation models on NVIDIA infra). NVIDIA advertises **free access for NVIDIA Developer Program members** (intended for dev/testing/prototyping).

Repo: [https://github.com/vava-nessa/free-coding-models](https://github.com/vava-nessa/free-coding-models) Please star it ;)

**Feedback wanted:** which tool should I support next after OpenCode/OpenClaw ?

(Cursor? Claude Code via proxy? KiloCode?)

Open Reddit thread

View more discussions →

FAQ

Common questions about GPT OSS 120B

What is the context window for GPT OSS 120B?

GPT OSS 120B supports a 128,000-token context window, which is roughly equivalent to 100,000 words of text in a single request.

What license does GPT OSS 120B use?

The model is released under the Apache 2.0 license, which permits commercial use, modification, fine-tuning, and private deployment.

What is the training data cutoff for GPT OSS 120B?

Based on the available metadata, the model was released in August 2025. A specific training data cutoff date is not stated in the provided metadata.

How many parameters does GPT OSS 120B have, and how does the MoE architecture affect inference?

The model has approximately 116.8 billion total parameters, but its Mixture-of-Experts architecture activates only around 5.1 billion parameters per token during inference, reducing compute requirements compared to a dense model of the same total size.

Where can GPT OSS 120B be deployed?

The model is available on AWS via Amazon Bedrock and SageMaker JumpStart, on NVIDIA NIM, and locally through Hugging Face Transformers, llama.cpp, and vLLM. It fits on a single H100 GPU for local inference.

Does GPT OSS 120B support tool use and agentic tasks?

Yes. The model natively supports web browsing, Python code execution, and custom developer-defined functions, and it is designed for multi-step agentic workflows and integration with agent frameworks.

More models from OpenAI

Continue browsing adjacent models from the same provider.

← All AI Models