ByteDance

Omni Human 1.5

OmniHuman 1.5 is an avatar animation model developed by ByteDance that converts still images into fully animated digital humans using audio input. It generates synchronized lip movements, facial expressions, and body language by combining audio signals with semantic understanding from Multimodal Large Language Models. The model is built on a dual-system cognitive architecture inspired by System 1 and System 2 theory, enabling both fast reactive animations and deliberate, context-aware responses. It supports a context window of 50,000 tokens and was trained through September 2025.

The model works across a wide range of visual styles, including realistic photographs, anime characters, illustrated portraits, and stylized artwork, as well as non-human subjects like animals and anthropomorphic figures. It can produce videos exceeding one minute in length with dynamic motion, camera movement, and multi-character interactions. OmniHuman 1.5 is suited for use cases such as virtual persona creation, NPC animation in games, AI spokesperson production, virtual instructor development, and video content creation without large production teams. It accepts image URLs and audio URLs as inputs.

Model Overview

Provider: ByteDance (the entity that provides this model)
Input Context Window: 50,000 tokens (tokens supported by the input context window)
Maximum Output Tokens: N/A (tokens the model can generate in a single request)
Open Source: No (whether the model's code is available for public use)
Release Date: Unknown (when the model was first released)
Knowledge Cut-off Date: Unknown (when the model's knowledge was last updated)
API Providers: ByteDance (providers that offer this model; not an exhaustive list)
Modalities: Text, Image, Video, Audio (types of data this model can process)

Capabilities

What Omni Human 1.5 supports

Lip Sync Animation

Generates frame-accurate lip movements synchronized to an audio input URL, aligning phoneme timing with spoken content.

Facial Expression Generation

Produces micro-expressions and eye movements that reflect the emotional and semantic content of the speech, derived from Multimodal LLM understanding.

Image-to-Video Conversion

Animates a static image URL into a video, supporting realistic photos, anime, illustrated portraits, and stylized artwork as input.

Extended Video Output

Generates videos longer than one minute with dynamic motion, camera movement, and support for multi-character interactions.

Cross-Domain Avatar Support

Handles humans, animals, anthropomorphic figures, and cartoon characters, making it usable across diverse visual styles and subject types.

Cognitive Dual-System Architecture

Uses an architecture inspired by System 1 and System 2 theory to simulate both fast, intuitive reactions and deliberate, context-aware body language.

API Access & Providers

ByteDance

Configuration & Parameters

The configurable options currently documented for this model.

Image (Image URL): the source image to be animated and lip-synced.

Audio (Audio URL): the audio track that drives the lip sync.

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Image, Audio
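
As a rough illustration of the two documented inputs, the sketch below assembles a request body from an image URL and an audio URL and rejects anything that is not an absolute http(s) URL. The function name `build_request` and the keys `image_url` and `audio_url` are assumptions made for this example; this page does not list the provider's actual field names or endpoint, so check the provider's API reference before relying on them.

```python
from urllib.parse import urlparse


def build_request(image_url: str, audio_url: str) -> dict:
    """Assemble a request body from the two documented inputs.

    The keys "image_url" and "audio_url" are hypothetical placeholders,
    not confirmed field names from the provider's API.
    """
    for label, url in (("image", image_url), ("audio", audio_url)):
        parsed = urlparse(url)
        # Both inputs are described as URLs, so require an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{label} URL must be an absolute http(s) URL: {url!r}")
    return {"image_url": image_url, "audio_url": audio_url}


if __name__ == "__main__":
    payload = build_request(
        "https://example.com/portrait.png",  # placeholder image location
        "https://example.com/speech.wav",    # placeholder audio location
    )
    print(payload)
```

Validating on the client side keeps malformed inputs, such as a local file path, from reaching the service at all; how the provider itself reports bad URLs is not documented here.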

Community discussion

What people think about Omni Human 1.5

Omni Human 1.5 discussions are most active in r/Freepik_AI. The strongest match in this snapshot has 1 upvote and 3 comments.

I have a complaint about Freepik’s mobile app, and I’m wondering if anyone has info or if Freepik has shared anything official.
On desktop, models like Kling 3.0 Motion Control and Omni Human 1.5 are great—they let us upload videos (and audio for Omni Human) for Motion Control or Lip Sync tasks. But on the mobile app, we can’t upload videos for these models at all. It’s limiting when working on the go.
Has Freepik said when they’ll add video/audio upload support for these models on the Android app? Any timeline or roadmap? Would love to hear if anyone knows more!

r/Freepik_AI, 1 upvote, 1 comment, February 2, 2026
Omni Human 1.5 Settings?

Where do i set the length for Omni Human 1.5 Outputs? My audio is 15 seconds long but it only generates 10 seconds, no matter what. pls fix!

FAQ

Common questions about Omni Human 1.5

What input types does OmniHuman 1.5 accept?

OmniHuman 1.5 accepts two input types: an image URL (the source portrait or character image) and an audio URL (the speech or sound that drives the animation).

What is the context window for OmniHuman 1.5?

OmniHuman 1.5 has a context window of 50,000 tokens.

What visual styles and subject types does the model support?

The model supports realistic photographs, anime characters, illustrated portraits, stylized artwork, animals, anthropomorphic figures, and cartoons, not just human faces.

How long can the generated videos be?

OmniHuman 1.5 can produce videos over one minute in length, with dynamic motion, camera movement, and multi-character interactions.

When was OmniHuman 1.5 trained and who developed it?

OmniHuman 1.5 was developed by ByteDance and was trained through September 2025.
