OpenAI

Sora 2 Pro

Sora 2 Pro is the premium tier of OpenAI's second-generation video generation model, available to ChatGPT Pro subscribers. It generates videos up to 25 seconds in length at resolutions up to 1080p and frame rates between 24 and 60 fps, with synchronized dialogue, sound effects, and ambient audio produced alongside the video. The model also includes a Cameo feature that lets users inject a consistent character — based on an uploaded video of a person, pet, or object — into any generated scene. Sora 2 Pro is designed for filmmakers, content creators, marketers, and storytellers who require longer, higher-fidelity AI-generated video with professional-grade audio. The model handles complex, multi-part prompts and maintains character and scene continuity across multiple shots within a single clip. It models physical phenomena such as gravity, collisions, and fluid dynamics, and scored approximately 8.5 out of 10 in independent physics evaluations. The model's training data has a cutoff of September 2025.

September 2025 5,000 context N/A output
Extended Video Length Synchronized Audio Physics Simulation Cameo Character Injection Multi-Part Prompt Control Image-to-Video Input

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

OpenAI

Input Context Window

The number of tokens supported by the input context window.

5,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

September 2025

Knowledge Cut-off Date

When the model's knowledge was last updated.

September 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

OpenAI API

Modalities

Types of data this model can process.

Video Audio

What is Sora 2 Pro

A fuller summary of positioning, capabilities, and source-specific details for Sora 2 Pro.

Sora 2 Pro is the premium tier of OpenAI's second-generation video generation model, available to ChatGPT Pro subscribers. It generates videos up to 25 seconds in length at resolutions up to 1080p and frame rates between 24 and 60 fps, with synchronized dialogue, sound effects, and ambient audio produced alongside the video. The model also includes a Cameo feature that lets users inject a consistent character — based on an uploaded video of a person, pet, or object — into any generated scene.

Sora 2 Pro is designed for filmmakers, content creators, marketers, and storytellers who require longer, higher-fidelity AI-generated video with professional-grade audio. The model handles complex, multi-part prompts and maintains character and scene continuity across multiple shots within a single clip. It models physical phenomena such as gravity, collisions, and fluid dynamics, and scored approximately 8.5 out of 10 in independent physics evaluations. The model's training data has a cutoff of September 2025.

Capabilities

What Sora 2 Pro supports

VID

Extended Video Length

Generates videos up to 25 seconds long at resolutions up to 1080p and frame rates between 24 and 60 fps.

AUD

Synchronized Audio

Produces dialogue, sound effects, and ambient noise in sync with on-screen action, eliminating the need for separate audio tools.

AI

Physics Simulation

Models gravity, collisions, fluid dynamics, and object permanence, scoring approximately 8.5/10 in independent physics evaluations.

AI

Cameo Character Injection

Accepts a short uploaded video of a person, pet, or object and inserts that subject as a consistent character into any generated scene.

AI

Multi-Part Prompt Control

Handles complex, multi-part text prompts while maintaining character and scene continuity across multiple shots within a single video.

IMG

Image-to-Video Input

Accepts an image URL as input to anchor the visual style or starting frame of a generated video.

Pricing for Sora 2 Pro

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

OpenAI API

Configuration & Parameters

The configurable options currently documented for this model.

Duration

Select
Default: 4
4s 8s 12s

Size

Select
Default: 720x1280
720x1280 1280x720 1024x1792 1792x1024 1080x1920 (HD) 1920x1080 (HD)

Input Image

Image URL

Optional URL of an input image to animate. Must be the exact dimensions as the video output.

Character Reference

Video URL

(Optional) Upload a 2-4 second video of objects, animals, or animated characters as references to appear within your videos to maintain recognizable mascots and products across campaigns and creative assets.

Character Name

Text

(Optional) Name for the character (must be mentioned in the prompt)

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Duration Size Input Image Character Reference Character Name

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Sora 2 Pro

Sora 2 Pro discussions are most active in r/SoraAi, r/OpenAI, r/n8n. Top Reddit threads cluster around benchmark and model-comparison threads. The strongest match in this snapshot has 972 upvotes and 43 comments.

r/SoraAi 65 upvotes 3,541 comments September 30, 2025
Official Sora AI 2 code request/questions thread

EDIT: Please use this thread only to request codes.

1. Make a comment stating that you promise you're going to give at least 3 codes back
2. Click on your profile to find the codes once your in (only works on mobile, click ... on mobile web for android) or the (...) in the bottom left on PC.
3. Add R- to the front of every code you put here to confuse bots
4. Share away!

Sora AI 2 is out and everyone is looking for an invite code. To keep this subreddit from becoming a mess, I'm making this thread the only place where you should be:

1. Asking basic questions about Sora AI 2
2. Asking for invite codes

Since we've collaborated with them in the past, I'll be reaching out to them directly to see if I can get you guys some invite codes.

Let's use this opportunity to answer some basic questions. If you can respond to the thread with answers, I will make this an FAQ.

Once we have codes or SoraAI is released to all, we will [ping you in Discord](https://discord.gg/6xsNRnSvYQ).

# How can I get an invite code to Sora AI 2?

Access is currently invite-only. The first users are receiving direct invitations and are also being given a few "friend passes" to share. The best way to get in line is to be a ChatGPT Pro or Plus subscriber, as these users will be prioritized after the initial wave. You can also download the Sora app now and register to be notified when you get access.

# How did people get their initial invite code?

The very first users received direct invitations. Each of these people was also given four invite codes they could share with friends.

# Do you need to be a pro subscriber or a plus subscriber to get Sora?

While not an absolute requirement, being a ChatGPT Pro or Plus subscriber gives you priority access. The rollout prioritizes early Sora 1 power users first, followed by Pro subscribers, and then Plus and Team plan users.

Initially, Sora 2 will be free with generous limits. However, Pro subscribers will have access to a higher-quality, experimental version called Sora 2 Pro.

# In which countries is Sora currently available?

The initial launch of the Sora 2 iOS app is in the U.S. and Canada, with plans to expand to more countries soon.

# What are the key improvements of Sora 2?

Sora 2 is a major upgrade with several key improvements:

* Synchronized audio: It can generate video with synchronized dialogue, sound effects, and background sounds, which was not possible before.
* Improved physics: The model has a much better grasp of real-world physics, making object interactions and movements more realistic.
* Better consistency: It is far better at keeping characters and objects consistent throughout a video, even across multiple scenes.
* Cameo feature: This new app feature lets you insert yourself or friends into the AI-generated videos by taking a short recording.
* Higher quality: The videos are longer (up to 60 seconds), sharper, and more detailed than the previous version.

# Is it true that Sora can generate audio and sound effects as well?

Yes, this is one of the biggest new features. Sora 2 is a full audio-video model. It can create realistic and synchronized dialogue, sound effects, and background noise that match the action happening in the video, making the final result much more immersive.

# What are some new uses for Sora 2?

Based on its new features, here are some of the most common ways people are planning to use Sora 2 are likely:

* Social content: Creating fun, viral videos for social media, especially using the new Cameo feature to put themselves and friends into fantastic scenarios.
* Filmmaking and storytelling: Quickly creating storyboards, animated shorts, or even generating high-quality B-roll footage for professional projects.
* Marketing and advertising: Visualizing products in action and creating engaging ad content without the need for expensive physical video shoots.
* Personalized entertainment: Simply having fun by generating videos of yourself in impossible situations, like walking on the moon or starring in a fantasy movie.

Open Reddit thread
r/OpenAI 353 upvotes 248 comments October 10, 2025
It's insane how badly they've ruined SORA 2 already

I already knew this would happen, as I predicted here:

[https://www.reddit.com/r/OpenAI/comments/1nvoq9u/enjoy\_sora\_2\_while\_it\_lasts\_we\_all\_know\_openais/](https://www.reddit.com/r/OpenAI/comments/1nvoq9u/enjoy_sora_2_while_it_lasts_we_all_know_openais/)

However, I’m still stunned by how little time it took. I thought they would let us use the good version for at least 4-8 weeks before subtly reducing its quality over time (like they did with their image generator), but it has already dipped to VEO 3 level or lower, **and it hasn’t even been two weeks!**

I’m using the SORA 2 Pro model, which is supposed to be the good one, yet it has already reached a point where all the original selling points (e.g. strong understanding of the world, realistic physics, and logical sequencing of events) are gone. Most generations are now, at best, no better than VEO 3, and sometimes even worse. This is effectively not the same product we had at launch.

What shocks me is not that they reduced its quality, **but how quickly and blatantly they did it.** OpenAI clearly doesn’t care anymore. They don’t mind that it’s obvious the model performs poorly now. They built early hype, presumably to satisfy investors, and now that they’ve achieved that, they’re throwing it all under the bus. Again.

Open Reddit thread
r/ChatGPT 972 upvotes 43 comments October 17, 2025
Sora-2-pro is the best model for creepy videos

Prompt taken from [sora.chatgpt.com](http://sora.chatgpt.com) \- put into sora-2-pro:

>Real life Authentic raw VHS camcorder footage from the 2000s, recorded directly onto magnetic tape — not playback on a TV. The image has soft blur, muted colors, analog noise, faint static, and occasional horizontal tracking lines near the bottom. Slight handheld camera motion with natural jitter. Subtle chroma bleeding, color drift, and scanline flicker give it a genuine analog feel. Timestamp overlay with seconds in small white digital text is locked in the bottom-right corner, perfectly stable and unaffected by camera movement, static, or distortion. The timestamp never glitches, warps, or disappears — it remains consistently visible throughout. The audio has a low hiss, faint tape hum, and distant ambient noise captured from the camcorder’s built-in microphone. Looks exactly like genuine VHS footage, not a digital filter — timestamp reads ‘SEPT 20 1994 11:23:51 PM’ — a woman films her backyard after hearing strange noises. A pale humanoid figure rushes out of the trees and straight towards the camera, the woman screams, the video stops.Real life Authentic raw VHS camcorder footage from the 2000s, recorded directly onto magnetic tape — not playback on a TV. The image has soft blur, muted colors, analog noise, faint static, and occasional horizontal tracking lines near the bottom. Slight handheld camera motion with natural jitter. Subtle chroma bleeding, color drift, and scanline flicker give it a genuine analog feel. Timestamp overlay with seconds in small white digital text is locked in the bottom-right corner, perfectly stable and unaffected by camera movement, static, or distortion. The timestamp never glitches, warps, or disappears — it remains consistently visible throughout. The audio has a low hiss, faint tape hum, and distant ambient noise captured from the camcorder’s built-in microphone. Looks exactly like genuine VHS footage, not a digital filter — timestamp reads ‘SEPT 20 1994 11:23:51 PM’ — a woman films her backyard after hearing strange noises. A pale humanoid figure rushes out of the trees and straight towards the camera, the woman screams, the video stops.

Credit for the prompt:

[https://sora.chatgpt.com/p/s\_68e6cc54535c8191998ee422adb34c70](https://sora.chatgpt.com/p/s_68e6cc54535c8191998ee422adb34c70) (ssponge) & [https://sora.chatgpt.com/p/s\_68e701593e248191acc569b7870d15ab](https://sora.chatgpt.com/p/s_68e701593e248191acc569b7870d15ab) (marrowstoned)

Meanwhile, Veo 3.1 (tried multiple prompts, turned out very underwhelming with results like that: [https://streamable.com/of8k6h](https://streamable.com/of8k6h)

Open Reddit thread

I built this AI UGC video generator that takes in a single physical product image as input. It uses OpenAI's new Sora 2 video model combined with vision AI to analyze the product, generate an ideal influencer persona, write multiple UGC scripts, and produce professional-looking videos in seconds.

Here's a demo video of the whole automation in action: https://www.youtube.com/watch?v=-HnyKkP2K2c

And here's some of the output for a quick run I did of both Ridge Wallet and Function of Beauty Shampoo: https://drive.google.com/drive/u/0/folders/1m9ziBbywD8ufFTJH4haXb60kzSkAujxE

## Here's how the automation works

### 1. Process the initial product image that gets uploaded.

The workflow starts with a simple form trigger that accepts two inputs:

- A product image (any format, any dimensions)
- The product name for context To be used in the video scripts.

I convert the uploaded image to a base64 string immediately for flexibility when working with the Gemini API.

### 2. Generate an ideal influencer persona to promote the product just uploaded.

I then use OpenAI's Vision API to analyze the product image and generates a detailed profile of the ideal influencer who should promote this product. The prompt acts as an expert casting director and consumer psychologist.

The AI creates a complete character profile including:

- Name, age, gender, and location
- Physical appearance and personality traits
- Lifestyle details and communication style
- Why they're the perfect advocate for this specific product

For the Ridge Wallet demo example, it generated a profile for an influencer named Marcus, a 32-year-old UI/UX designer from San Francisco who values minimalism and efficiency.

Here's the prompt I use for this:

```markdown
**// ROLE & GOAL //**
You are an expert Casting Director and Consumer Psychologist. Your entire focus is on understanding people. Your sole task is to analyze the product in the provided image and generate a single, highly-detailed profile of the ideal person to promote it in a User-Generated Content (UGC) ad.

The final output must ONLY be a description of this person. Do NOT create an ad script, ad concepts, or hooks. Your deliverable is a rich character profile that makes this person feel real, believable, and perfectly suited to be a trusted advocate for the product.

**// INPUT //**

Product Name: `{{ $node['form_trigger'].json['Product Name'] }}`

**// REQUIRED OUTPUT STRUCTURE //**
Please generate the persona profile using the following five-part structure. Be as descriptive and specific as possible within each section.

**I. Core Identity**
* **Name:**
* **Age:** (Provide a specific age, not a range)
* **Sex/Gender:**
* **Location:** (e.g., "A trendy suburb of a major tech city like Austin," "A small, artsy town in the Pacific Northwest")
* **Occupation:** (Be specific. e.g., "Pediatric Nurse," "Freelance Graphic Designer," "High School Chemistry Teacher," "Manages a local coffee shop")

**II. Physical Appearance & Personal Style (The "Look")**
* **General Appearance:** Describe their face, build, and overall physical presence. What is the first impression they give off?
* **Hair:** Color, style, and typical state (e.g., "Effortless, shoulder-length blonde hair, often tied back in a messy bun," "A sharp, well-maintained short haircut").
* **Clothing Aesthetic:** What is their go-to style? Use descriptive labels. (e.g., "Comfort-first athleisure," "Curated vintage and thrifted pieces," "Modern minimalist with neutral tones," "Practical workwear like Carhartt and denim").
* **Signature Details:** Are there any small, defining features? (e.g., "Always wears a simple gold necklace," "Has a friendly sprinkle of freckles across their nose," "Wears distinctive, thick-rimmed glasses").

**III. Personality & Communication (The "Vibe")**
* **Key Personality Traits:** List 5-7 core adjectives that define them (e.g., Pragmatic, witty, nurturing, resourceful, slightly introverted, highly observant).
* **Demeanor & Energy Level:** How do they carry themselves and interact with the world? (e.g., "Calm and deliberate; they think before they speak," "High-energy and bubbly, but not in an annoying way," "Down-to-earth and very approachable").
* **Communication Style:** How do they talk? (e.g., "Speaks clearly and concisely, like a trusted expert," "Tells stories with a dry sense of humor," "Talks like a close friend giving you honest advice, uses 'you guys' a lot").

**IV. Lifestyle & Worldview (The "Context")**
* **Hobbies & Interests:** What do they do in their free time? (e.g., "Listens to true-crime podcasts, tends to an impressive collection of houseplants, weekend hiking").
* **Values & Priorities:** What is most important to them in life? (e.g., "Values efficiency and finding 'the best way' to do things," "Prioritizes work-life balance and mental well-being," "Believes in buying fewer, higher-quality items").
* **Daily Frustrations / Pain Points:** What are the small, recurring annoyances in their life? (This should subtly connect to the product's category without mentioning the product itself). (e.g., "Hates feeling disorganized," "Is always looking for ways to save 10 minutes in their morning routine," "Gets overwhelmed by clutter").
* **Home Environment:** What does their personal space look like? (e.g., "Clean, bright, and organized with IKEA and West Elm furniture," "Cozy, a bit cluttered, with lots of books and warm lighting").

**V. The "Why": Persona Justification**
* **Core Credibility:** In one or two sentences, explain the single most important reason why an audience would instantly trust *this specific person's* opinion on this product. (e.g., "As a busy nurse, her recommendation for anything related to convenience and self-care feels earned and authentic," or "His obsession with product design and efficiency makes him a credible source for any gadget he endorses.")
```

### 3. Write the UGC video ad scripts.

Once I have this profile generated, I then use Gemini 2.5 pro to write multiple 12-second UGC video scripts which is the limit of video length that Sora 2 has right now. Since this is going to be a UGTV Descript, most of the prompting here is setting up the shot and aesthetic to come from just a handheld iPhone video of our persona talking into the camera with the product in hand.

Key elements of the script generation:

- Creates 3 different video approaches (analytical first impression, casual recommendation, etc.)
- Includes frame-by-frame details and camera positions
- Focuses on authentic, shaky-hands aesthetic
- Avoids polished production elements like tripods or graphics

Here's the prompt I use for writing the scripts. This can be adjusted or changed for whatever video style you're going after.

```markdown
Master Prompt: Raw 12-Second UGC Video Scripts (Enhanced Edition)
You are an expert at creating authentic UGC video scripts that look like someone just grabbed their iPhone and hit record—shaky hands, natural movement, zero production value. No text overlays. No polish. Just real.
Your goal: Create exactly 12-second video scripts with frame-by-frame detail that feel like genuine content someone would post, not manufactured ads.

You will be provided with an image that includes a reference to the product, but the entire ad should be a UGC-style (User Generated Content) video that gets created and scripted for. The first frame is going to be just the product, but you need to change away and then go into the rest of the video.

The Raw iPhone Aesthetic
What we WANT:

Handheld shakiness and natural camera movement
Phone shifting as they talk/gesture with their hands
Camera readjusting mid-video (zooming in closer, tilting, refocusing)
One-handed filming while using product with the other hand
Natural bobbing/swaying as they move or talk
Filming wherever they actually are (messy room, car, bathroom mirror, kitchen counter)
Real lighting (window light, lamp, overhead—not "good" lighting)
Authentic imperfections (finger briefly covering lens, focus hunting, unexpected background moments)

What we AVOID:

Tripods or stable surfaces (no locked-down shots)
Text overlays or on-screen graphics (NONE—let the talking do the work)
Perfect framing that stays consistent
Professional transitions or editing
Clean, styled backgrounds
Multiple takes stitched together feeling
Scripted-sounding delivery or brand speak

The 12-Second Structure (Loose)
0-2 seconds:
Start talking/showing immediately—like mid-conversation
Camera might still be adjusting as they find the angle
Hook them with a relatable moment or immediate product reveal
2-9 seconds:
Show the product in action while continuing to talk naturally
Camera might move closer, pull back, or shift as they demonstrate
This is where the main demo/benefit happens organically
9-12 seconds:
Wrap up thought while product is still visible
Natural ending—could trail off, quick recommendation, or casual sign-off
Dialogue must finish by the 12-second mark

Critical: NO Invented Details

Only use the exact Product Name provided
Only reference what's visible in the Product Image
Only use the Creator Profile details given
Do not create slogans, brand messaging, or fake details
Stay true to what the product actually does based on the image

Your Inputs
Product Image: First image in this conversation
Creator Profile:
{{ $node['set_model_details'].json.prompt }}
Product Name:
{{ $node['form_trigger'].json['Product Name'] }}

Output: 3 Natural Scripts
Three different authentic approaches:

Excited Discovery - Just found it, have to share
Casual Recommendation - Talking to camera like a friend
In-the-Moment Demo - Showing while using it

Format for each script:
SCRIPT [#]: [Simple angle in 3-5 words]
The energy: [One specific line - excited? Chill? Matter-of-fact? Caffeinated? Half-awake?]
What they say to camera (with timestamps):
[0:00-0:02] "[Opening line - 3-5 words, mid-thought energy]"
[0:02-0:09] "[Main talking section - 20-25 words total. Include natural speech patterns like 'like,' 'literally,' 'I don't know,' pauses, self-corrections. Sound conversational, not rehearsed.]"
[0:09-0:12] "[Closing thought - 3-5 words. Must complete by 12-second mark. Can trail off naturally.]"
Shot-by-Shot Breakdown:
SECOND 0-1:

Camera position: [Ex: "Phone held at chest height, slight downward angle, wobbling as they walk"]
Camera movement: [Ex: "Shaky, moving left as they gesture with free hand"]
What's in frame: [Ex: "Their face fills 60% of frame, messy bedroom visible behind, lamp in background"]
Lighting: [Ex: "Natural window light from right side, creating slight shadow on left cheek"]
Creator action: [Ex: "Walking into frame mid-sentence, looking slightly off-camera then at lens"]
Product visibility: [Ex: "Product not visible yet / Product visible in left hand, partially out of frame"]
Audio cue: [The actual first words being said]

SECOND 1-2:

Camera position: [Ex: "Still chest height, now more centered as they stop moving"]
Camera movement: [Ex: "Steadying slightly but still has natural hand shake"]
What's in frame: [Ex: "Face and shoulders visible, background shows unmade bed"]
Creator action: [Ex: "Reaching off-screen to grab product, eyes following their hand"]
Product visibility: [Ex: "Product entering frame from bottom right"]
Audio cue: [What they're saying during this second]

SECOND 2-3:

Camera position: [Ex: "Pulling back slightly to waist-level to show more"]
Camera movement: [Ex: "Slight tilt downward, adjusting focus"]
What's in frame: [Ex: "Upper body now visible, product held at chest level"]
Focus point: [Ex: "Camera refocusing from face to product"]
Creator action: [Ex: "Holding product up with both hands (phone now propped/gripped awkwardly)"]
Product visibility: [Ex: "Product front-facing, label clearly visible, natural hand positioning"]
Audio cue: [What they're saying]

SECOND 3-4:

Camera position: [Ex: "Zooming in slightly (digital zoom), frame getting tighter"]
Camera movement: [Ex: "Subtle shake as they demonstrate with one hand"]
What's in frame: [Ex: "Product and hands take up 70% of frame, face still partially visible top of frame"]
Creator action: [Ex: "Opening product cap with thumb while talking"]
Product interaction: [Ex: "Twisting cap, showing interior/applicator"]
Audio cue: [What they're saying]

SECOND 4-5:

Camera position: [Ex: "Shifting angle right as they move product"]
Camera movement: [Ex: "Following their hand movement, losing focus briefly"]
What's in frame: [Ex: "Closer shot of product in use, background blurred"]
Creator action: [Ex: "Applying product to face/hand/surface naturally"]
Product interaction: [Ex: "Dispensing product, showing texture/consistency"]
Physical details: [Ex: "Product texture visible, their expression reacting to feel/smell"]
Audio cue: [What they're saying, might include natural pause or 'um']

SECOND 5-6:

Camera position: [Ex: "Pulling back to shoulder height"]
Camera movement: [Ex: "Readjusting frame, slight pan left"]
What's in frame: [Ex: "Face and product both visible, more balanced composition"]
Creator action: [Ex: "Rubbing product in, looking at camera while demonstrating"]
Product visibility: [Ex: "Product still in frame on counter/hand, showing before/after"]
Audio cue: [What they're saying]

SECOND 6-7:

Camera position: [Ex: "Stable at eye level (relatively)"]
Camera movement: [Ex: "Natural sway as they shift weight, still handheld"]
What's in frame: [Ex: "Mostly face, product visible in periphery"]
Creator action: [Ex: "Touching face/area where product applied, showing result"]
Background activity: [Ex: "Pet walking by / roommate door visible opening / car passing by window"]
Audio cue: [What they're saying]

SECOND 7-8:

Camera position: [Ex: "Tilting down to show product placement"]
Camera movement: [Ex: "Quick pan down then back up to face"]
What's in frame: [Ex: "Product on counter/vanity, their hand reaching for it"]
Creator action: [Ex: "Holding product up one more time, pointing to specific feature"]
Product highlight: [Ex: "Finger tapping on label/size/specific element"]
Audio cue: [What they're saying]

SECOND 8-9:

Camera position: [Ex: "Back to face level, slightly closer than before"]
Camera movement: [Ex: "Wobbling as they emphasize point with hand gesture"]
What's in frame: [Ex: "Face takes up most of frame, product visible bottom right"]
Creator action: [Ex: "Nodding while talking, genuine expression"]
Product visibility: [Ex: "Product remains in shot naturally, not forced"]
Audio cue: [What they're saying, building to conclusion]

SECOND 9-10:

Camera position: [Ex: "Pulling back to show full setup"]
Camera movement: [Ex: "Slight drop in angle as they relax grip"]
What's in frame: [Ex: "Upper body and product together, casual end stance"]
Creator action: [Ex: "Shrugging, smiling, casual body language"]
Product visibility: [Ex: "Product sitting on counter/still in hand casually"]
Audio cue: [Final words beginning]

SECOND 10-11:

Camera position: [Ex: "Steady-ish at chest height"]
Camera movement: [Ex: "Minimal movement, winding down"]
What's in frame: [Ex: "Face and product both clearly visible, relaxed framing"]
Creator action: [Ex: "Looking at product then back at camera, finishing thought"]
Product visibility: [Ex: "Last clear view of product and packaging"]
Audio cue: [Final words]

SECOND 11-12:

Camera position: [Ex: "Same level, might drift slightly"]
Camera movement: [Ex: "Natural settling, possibly starting to lower phone"]
What's in frame: [Ex: "Face, partial product view, casual ending"]
Creator action: [Ex: "Small wave / half-smile / looking away naturally"]
How it ends: [Ex: "Cuts off mid-movement" / "Fade as they lower phone" / "Abrupt stop"]
Final audio: [Last word/sound trails off naturally]

Overall Technical Details:

Phone orientation: [Vertical/horizontal?]
Filming method: [Selfie mode facing them? Back camera in mirror? Someone else holding phone? Propped on stack of books?]
Dominant hand: [Which hand holds phone vs. product?]
Location specifics: [What room? Time of day based on lighting? Any notable background elements?]
Audio environment: [Echo from bathroom? Quiet bedroom? Background TV/music? Street noise?]

Enhanced Authenticity Guidelines
Verbal Authenticity:

Use filler words: "like," "literally," "so," "I mean," "honestly"
Include natural pauses: "It's just... really good"
Self-corrections: "It's really—well actually it's more like..."
Conversational fragments: "Yeah so this thing..."
Regional speech patterns if relevant to creator profile

Visual Authenticity Markers:

Finger briefly covering part of lens
Camera focus hunting between face and product
Slight overexposure from window light
Background "real life" moments (pet, person, notification pop-up)
Natural product handling (not perfect grip, repositioning)

Timing Authenticity:

Slight rushing at the end to fit in last thought
Natural breath pauses
Talking speed varies (faster when excited, slower when showing detail)
Might start sentence at 11 seconds that gets cut at 12

Remember: Every second matters. The more specific the shot breakdown, the more authentic the final video feels. If a detail seems too polished, make it messier. No text overlays ever. All dialogue must finish by the 12-second mark (can trail off naturally).
```

### 4. Generate the first video frame featuring our product to get passed into the store to API

Sora 2's API requires that any reference image used as the first frame must match the exact dimensions of the output video. Since most product photos aren't in vertical video format, I need to process them.

In this part of the workflow:

- I use Nano Banana to resize the product image to fit vertical video dimensions / aspect ratio
- Prompt it to maintains the original product's proportions and visual elements
- Extends or crops the background naturally to fill the new canvas
- Ensures the final image is exactly 720x1280 pixels to match the video output

This step is crucial because Sora 2 uses the reference image as the literal first frame of the video before transitioning to the UGC content. Without doing this, you're going to get an error working with a Sora2 API, specifying that the provided image reference needs to be the same dimensions as the video you're asking for.

### 5. Generate each video with Sora 2 API

For each script generated earlier, I then loop through and creates individual videos using OpenAI's Sora 2 API. This involves:

- Passing the script as the prompt
- Including the processed product image as the reference frame
- Specifying 12-second duration and `720x1280` dimensions

Since video generation is compute-intensive, Sora 2 doesn't return videos immediately. Instead, it returns a job ID that will get used for polling.

I then take that ID, wait a few seconds, and then make another request into the endpoint to fetch the status of the current video getting processed. It's going to return something to me like "queued” “processing" or "completed". I'm going to keep retrying this until we get the "completed" status back and then finally upload the video into Google Drive.

### Sora 2 Pricing and Limitations

Sora 2 pricing is currently:

- Standard Sora 2: $0.10 per second ($1.20 for a 12-second video)
- Sora 2 Pro: $0.30 per second ($3.60 for a 12-second video)

Some limitations to be aware of:

- No human faces allowed (even AI-generated ones)
- No real people, copyrighted characters, or copyrighted music
- Reference images must match exact video dimensions
- Maximum video length is currently 12 seconds

The big one to note here is that no real people or faces can appear in this. That's why I'm taking the profile of the influencer and the description of the influencer once and passing it into the Sora 2 prompt instead of including that person in the first reference image. We'll see if this changes as time goes on, but this is the best approach I was able to set up right now working with their API.

## Workflow Link + Other Resources

- YouTube video that walks through this workflow step-by-step: https://www.youtube.com/watch?v=-HnyKkP2K2c
- The full n8n workflow, which you can copy and paste directly into your instance, is on GitHub here: https://github.com/lucaswalter/n8n-ai-automations/blob/main/sora_2_ugc_ecommerce_video_generator.json

Open Reddit thread

\*\*EDIT: Currently Sora 2 API went down after the App was shut down. I am hoping that it was a mistake on their end, but this may be invalidated by their incredulity. My sincere apologies.\*\*

Hey everyone!

I'm Chase, the founder of [Cannon Studio](https://www.cannonstudio.app). Many of you may have seen my post regarding Cannon Studio's extended **OpenAI Official API-based support for Sora**. I worked overnight to try and recapture some of what the Sora app had to offer so that you all may continue to enjoy it until September.

Here's what I built:

\- I added support for Sora-based **Video Extension, Video Editing, and Video Remixing** all based in the Sora 2 API provided by OpenAI officially until September.

\- I extended support for High-Res **1024p Pro generations**

\- I added the ability to Publish your generated videos directly to [Cannon Studio TV](https://www.cannonstudio.app/studio-tv) \- it's no Sora but it's a place to share your work!

\- **GPT Image 2** Support is live!

\- Lowered Prices to deliver Sora 2 **at cost!**

I plan to continue working to bring you direct access to the best of what Sora has to offer! If there are any features missing then please reach out to me directly and I will get back to you within the hour from 7:00 AM to Midnight CST, and I will implement your feature request within a day.

# FAQ:

**What else does Cannon Studio offer?**

Cannon Studio is a state of the art AI filmmaking and video production platform. You can build and reuse a world across multiple organized video projects complete with Characters, Locations, Lore, and much much more. Not only does it offer the best workflow on the market, but it also provides competitively priced access to all of the latest Image, Video, and Audio Models. Seedance 2.0, Kling 3.0, GPT Image 2, Nano Banana, Suno, Elevenlabs, you name it! Everything you need to create and more is available to you on Cannon Studio in a clean organized way.

**Is it free?**

You get 100 credits free, 1 cent = 1 credit. You also get a 3 day free trial, but reach out to me and I can extend this for you!

**Do we get daily free gens?**

Unfortunately not, I am a solo-founder so any free generations, including the sign up credits, come out of my pocket.

**How much does it cost?**

|Sora 2 Standard|720p|
|:-|:-|
|4 Seconds|$0.41|
|8 Seconds|$0.81|
|12 Seconds|$1.22|

|Sora 2 Pro|720p|1024p|
|:-|:-|:-|
|4 Seconds|$1.22|$2.02|
|8 Seconds|$2.43|$4.04|
|12 Seconds|$3.64|$6.06|

**This is directly based on the Official OpenAI pricing with a small markup for Storage Costs on my end.** [**https://developers.openai.com/api/docs/pricing?video-pricing=standard#:\~:text=Price%20per%20second-,sora%2D2,-720p**](https://developers.openai.com/api/docs/pricing?video-pricing=standard#:~:text=Price%20per%20second-,sora%2D2,-720p)

**How can I trust Cannon Studio enough to make a purchase on the platform?**

Our billing is powered by Stripe, we do NOT store any billing info, we have very simple and transparent pricing (1 credit = 1 cent), and we have a community full of creators actively using the application (which you can join via the site, Disc links are against the rules here :D )! Plus, you can try it for free!

**Please feel free to reach out with any questions or concerns. Thank you for taking the time to read over this! I hope to see you on the platform :D**

[https://www.cannonstudio.app](https://www.cannonstudio.app)

Open Reddit thread
View more discussions →
FAQ

Common questions about Sora 2 Pro

What is the maximum video length Sora 2 Pro can generate?

Sora 2 Pro can generate videos up to 25 seconds long, which is the maximum available through the Sora platform for Pro-tier users.

What resolution and frame rate options are available?

Videos can be generated at resolutions up to 1080p with frame rates between 24 and 60 fps.

What is the context window for Sora 2 Pro?

The model has a context window of 5,000 tokens, which governs the length and complexity of the text prompt it can process.

What is the training data cutoff for Sora 2 Pro?

The model's training data has a cutoff of September 2025.

How is Sora 2 Pro accessed?

Sora 2 Pro is accessible via the ChatGPT Pro subscription tier and is also available through Azure AI Foundry for enterprise and API use cases.

What input types does Sora 2 Pro accept?

The model accepts text prompts and image URLs as inputs, allowing users to guide video generation from a written description or a reference image.

More models from OpenAI

Continue browsing adjacent models from the same provider.

← All AI Models