Amazon Nova Micro

Released Dec 05, 2024; 128,000-token context; 5,000-token maximum output
Low Latency Responses, Long Context Window, Text Generation, Fine-Tuning Support, Agentic Task Support, Cost-Efficient Inference

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Amazon

Model ID

The routed model identifier exposed by upstream providers.

amazon/nova-micro-v1

Input Context Window

The number of tokens supported by the input context window.

128,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

5,000 tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Dec 05, 2024

Knowledge Cut-off Date

When the model's knowledge was last updated.

December 2024

API Providers

The providers that offer this model. This is not an exhaustive list.

Amazon Bedrock

Modalities

Types of data this model can process.

Text

What is Amazon Nova Micro

A fuller summary of positioning, capabilities, and source-specific details for Amazon Nova Micro.

Amazon Nova Micro is a text-only foundation model developed by Amazon and made available through Amazon Bedrock. It is part of the Amazon Nova family, which includes understanding models (Nova Pro, Nova Lite, and Nova Micro) as well as creative content generation models. Nova Micro is specifically designed to deliver the lowest latency responses within the Nova lineup at very low cost, making it a practical choice for applications where speed and cost efficiency are priorities.

Because Nova Micro handles text input and output exclusively, it is well suited for tasks such as summarization, classification, question answering, and other text-based workflows where multimodal capabilities are not required. The model supports a 128,000-token context window, allowing it to process long documents or extended conversations in a single request. It can also be fine-tuned on Amazon Bedrock, enabling developers to adapt it to specific domains or use cases.
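
The context-window figure can be turned into a quick pre-flight check before sending a long document. The following is a minimal sketch, not an official method: the 4-characters-per-token ratio is a rough assumption (it is not Nova's tokenizer), the helper names `estimate_tokens` and `fits_in_context` are hypothetical, and reserving the full output budget inside the 128,000-token window is a deliberately conservative assumption.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # context window listed on this page
MAX_OUTPUT_TOKENS = 5_000        # maximum output listed on this page
CHARS_PER_TOKEN = 4              # ASSUMPTION: rough heuristic, not Nova's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if the estimated prompt plus a reserved output budget fits."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 50_000  # 250,000 characters of placeholder text
    print(estimate_tokens(document), fits_in_context(document))
```

For real workloads, the token counts reported in the service's own responses would be a more reliable guide than a character-based estimate.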

Capabilities

What Amazon Nova Micro supports

Low Latency Responses

Designed to return text completions faster than other models in the Nova family, making it suitable for real-time or high-throughput applications.

Long Context Window

Supports up to 128,000 tokens per request, enabling processing of long documents, transcripts, or multi-turn conversations in a single call.

Text Generation

Generates coherent text output for tasks such as summarization, classification, question answering, and instruction following using text-only input.

Fine-Tuning Support

Can be fine-tuned on Amazon Bedrock to customize model behavior for specific domains or tasks, adjusting the balance of accuracy and cost.

Agentic Task Support

Supports agentic workflows, allowing the model to be used in multi-step task pipelines and tool-use scenarios within Amazon Bedrock.

Cost-Efficient Inference

Positioned as the lowest-cost option in the Nova understanding model tier, making it practical for large-scale or budget-sensitive deployments.

Pricing for Amazon Nova Micro

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature: 1
maxResponseSize: 5,000 tokens
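
These two limits map naturally onto request settings. Below is a minimal sketch, assuming the Bedrock Converse request shape (`messages` plus an `inferenceConfig` carrying `maxTokens` and `temperature`). The helper name `build_converse_body` and the constants are hypothetical, and the limits are the values listed on this page rather than figures checked against official documentation.

```python
import json

# Limits as listed on this page (ASSUMPTION: not verified against official docs).
MAX_TEMPERATURE = 1.0
MAX_RESPONSE_TOKENS = 5_000


def build_converse_body(prompt: str, max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Build a Converse-style request body, clamping values to the listed limits."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    return {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max(1, min(max_tokens, MAX_RESPONSE_TOKENS)),
            "temperature": max(0.0, min(temperature, MAX_TEMPERATURE)),
        },
    }


# Out-of-range values are clamped rather than rejected.
body = build_converse_body("Summarize this paragraph.", max_tokens=8_000, temperature=1.5)
print(json.dumps(body["inferenceConfig"]))
```

Clamping keeps a caller from sending values above the listed limits; raising an error instead would be an equally reasonable design if silent adjustment is undesirable.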

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Amazon Bedrock

Provider Endpoints

Endpoint-level provider data currently available for this model.

Amazon Bedrock

Max output: 5,120 tokens; 1-day uptime: 91.9%; supported parameters: 6; implicit caching: No

Amazon Bedrock

Max output: 5,120 tokens; 1-day uptime: 100.0%; supported parameters: 6; implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark (description): Score
AIME 2024 (American math olympiad problems): 8.0%
GPQA Diamond (PhD-level science questions in biology, physics, and chemistry): 35.8%
HLE (questions that challenge frontier models across many domains): 4.7%
LiveCodeBench (real-world coding tasks from recent competitions): 14.0%
MATH-500 (undergraduate and competition-level math problems): 70.3%
MMLU-Pro (expert knowledge across 14 academic disciplines): 53.1%
SciCode (scientific research coding and numerical methods): 9.4%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Amazon Nova Micro

Amazon Nova Micro discussions are most active in r/SillyTavernAI, r/aws, and r/ChatGPT. Top Reddit threads cluster around benchmarks and model comparisons.

The strongest match in this snapshot has 3 upvotes and 1 comment.

r/SillyTavernAI, 5 comments, November 7, 2025
Need some help with Amazon Bedrock and OpenRouter Integration

I have created an API key on Amazon Bedrock, with the policies AmazonBedrockFullAccess, and AmazonBedrockLimitedAccess attached. Yet when I try to add the key to OpenRouter's BYOK, it says "Key validation failed: Operation not allowed (Tested with: Amazon Bedrock | amazon/nova-micro-v1)"

I have even tested the access on the IAM Policy Simulator provided by AWS, and both InvokeModel and InvokeModelWithResponseStream returned "allowed" for all resources (*).

Would greatly appreciate any help... ><

https://preview.redd.it/c089vxbcbrzf1.png?width=382&format=png&auto=webp&s=6e15e50038330a9e881f12de9ebb5b7c605c0afd

https://preview.redd.it/8kal1yvcbrzf1.png?width=1311&format=png&auto=webp&s=96333dc295dc61d8e52bc287c34f6417fa120263

Open Reddit thread

Hey everyone,

I just ran into an issue trying to call **Claude Sonnet 4.5** via the **AWS Bedrock Runtime API**, and I’m hoping someone here might have insights or has faced the same thing.

**Setup:**

* **Account type:** Channel program account (via AWS Partner / Distributor)
* **Region:** `us-east-1`
* **API key:** Valid — works fine for `amazon.nova-micro-v1:0`
* **Model I’m calling:** `anthropic.claude-sonnet-4-5-20250929-v1:0`

Here’s the cURL command I used:

    curl -X POST "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-sonnet-4-5-20250929-v1:0/converse" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <valid-token>" \
      -d '{
        "messages": [
          {
            "role": "user",
            "content": [{"text": "Hello"}]
          }
        ]
      }'

And here’s the **error response** I got back:

    {
      "message": "Invocation of model ID anthropic.claude-sonnet-4-5-20250929-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model."
    }

After reaching out to AWS Support, I also got this message:

>

Has anyone here successfully accessed Claude Sonnet 4.5 under a channel program account, or know how to obtain the required inference profile ARN?

It seems I can't use any Claude model variant, though I can use the AWS Nova variants.

Any clarification or workaround would be super appreciated 🙏
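
The error message quoted in this post asks for the ID or ARN of an inference profile in place of the bare model ID. As a hedged illustration only, the sketch below builds the same Converse URL with a profile-style ID. The `us.` prefixed ID is an assumption about the naming pattern and must be confirmed in the Bedrock console or documentation, the helper name `converse_url` is hypothetical, and nothing here establishes whether a channel program account is entitled to use such a profile.

```python
import json
from urllib.parse import quote


def converse_url(region: str, model_or_profile_id: str) -> str:
    """Build the Converse endpoint URL used in the post's cURL command."""
    # Percent-encode the ID so ":" (and any "/" in an ARN) is safe in the path.
    encoded = quote(model_or_profile_id, safe="")
    return f"https://bedrock-runtime.{region}.amazonaws.com/model/{encoded}/converse"


# ASSUMPTION: geography-prefixed inference profile ID; confirm the real ID or ARN.
PROFILE_ID = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

url = converse_url("us-east-1", PROFILE_ID)
payload = json.dumps({"messages": [{"role": "user", "content": [{"text": "Hello"}]}]})

print(url)
print(payload)
```

The request body is unchanged from the original command; only the path segment that identifies the model differs.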

Open Reddit thread

Today, the AI model that hallucinates the least is Google Gemini 2.0 Flash 001, with a factual consistency rate of 99.3%. This score is encouraging because it means that we're relatively close to solving the hallucination problem.

https://github.com/vectara/hallucination-leaderboard

What would happen if we built an AI agent that would first query Google Gemini 2.5 Pro about something, (because it is currently the most powerful model, completely dominating the Chatbot Arena Leaderboard by almost 40 points) and then ran the answer it generated by other models to catch any inaccuracies it may have generated?

https://lmarena.ai/?leaderboard

We presume that the different AI developers use different data sets to build their models, so while one may hallucinate about a certain query, it's possible that another would not. What would happen if we instructed our AI agent to run the content Gemini 2.5 generated through the next ten models by other developers, asking them each to analyze the answer for factual consistency?

Could this be a way to arrive at a factual consistency for answers that is perhaps 99.9% or higher? Could this be done relatively inexpensively and completely automatically?

Below are ten top models that our AI agent would run Gemini 2.5's answer through, ranked according to their factual consistency rate score. I asked 2.5 to comment on the usefulness of this approach, and point out some possible flaws and limitations, and end this post with its answer. While the technique may be relatively expensive, in some cases the usefulness of the accuracy may justify the extra expense.

OpenAI-o3-mini-high-reasoning - 99.2%
Zhipu AI GLM-4-9B-Chat - 98.7%
Amazon Nova-Micro-V1 - 98.4%
Qwen2-72B-Instruct - 95.3%
Mixtral-8x22B-Instruct-v0.1 - 95.3%
Anthropic Claude-3-5-haiku - 95.1%
01-AI Yi-1.5-9B-Chat - 95.1%
Cohere Command-R - 95.1%
IBM Granite-3.0-8B-Instruct - 93.5%
Microsoft phi-2 - 93.3%

Gemini 2.5 Pro:

"Okay, here's a concise summary of the analysis:
Your proposal involves using Gemini 2.5 Pro to generate an answer and then having 10 other models (ranked high for factual consistency) verify it. The goal is automated, inexpensive, near-perfect (99.9%+) factual accuracy.
Potential & Feasibility:
* Improved Accuracy: Yes, this ensemble approach could significantly increase factual accuracy compared to using a single model by catching diverse errors. Reaching 99.9%+ is ambitious but directionally plausible.
* Automation: The process is fully automatable using APIs.
Key Challenges & Limitations:
* Cost & Speed: Running 11 models per query will be substantially more expensive and slower than using one. "Relatively inexpensive" is unlikely.
* Consensus Logic: Deciding how to handle disagreements among the 10 verifiers is complex. A simple majority vote might miss valid corrections or reinforce errors.
* Verifier Imperfection: The verification models themselves aren't perfect and can make mistakes (hallucinate) or fail to spot errors.
* Shared Errors: Models might share common misconceptions from similar training data, leading to false consensus.
* Complexity: Designing effective verification prompts and consensus mechanisms is non-trivial.
In Short:
The idea leverages model diversity for better accuracy and is automatable. However, it faces significant hurdles in cost, speed, and the complexity of managing verification results. While promising, it's not a simple or cheap solution for achieving near-perfect factual consistency."
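
The "consensus logic" concern raised in this reply can be made concrete with a small sketch. It is one possible policy under stated assumptions, not the method the post or the reply proposes: each verifier is assumed to return a simple boolean verdict, the 0.8 agreement threshold is arbitrary, and the function name `consensus` and the verifier labels are hypothetical.

```python
def consensus(verdicts: dict[str, bool], min_agreement: float = 0.8) -> str:
    """Combine verifier verdicts (True = judged factually consistent).

    Returns "accept" or "reject" when a large enough share of verifiers agree,
    and "escalate" when they are split, so that a bare majority is never
    treated as settled.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    total = len(verdicts)
    consistent = sum(verdicts.values())  # True counts as 1
    needed = min_agreement * total
    if consistent >= needed:
        return "accept"
    if total - consistent >= needed:
        return "reject"
    return "escalate"


if __name__ == "__main__":
    # Hypothetical verdicts from ten verifier models.
    sample = {f"verifier_{i}": i < 8 for i in range(10)}  # 8 True, 2 False
    print(consensus(sample))
```

An explicit "escalate" outcome addresses the point that a simple majority vote might reinforce errors: split decisions are routed to further review instead of being resolved by a narrow margin. It does not address shared errors, where all verifiers agree on the same mistake.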

Open Reddit thread

FAQ

Common questions about Amazon Nova Micro

What is the context window size for Amazon Nova Micro?

Amazon Nova Micro supports a context window of 128,000 tokens, allowing long documents or extended conversations to be processed in a single request.

Does Amazon Nova Micro support image or video inputs?

No. Amazon Nova Micro is a text-only model and does not accept image, video, or audio inputs. For multimodal input support, Amazon Nova Lite or Nova Pro would be required.

Where is Amazon Nova Micro available?

Amazon Nova Micro is available through Amazon Bedrock, Amazon's managed service for foundation models.

Can Amazon Nova Micro be fine-tuned?

Yes. Amazon Nova Micro supports fine-tuning on Amazon Bedrock, which allows developers to customize the model for specific domains or use cases.

What was the training data cutoff for Amazon Nova Micro?

The overview on this page lists a knowledge cut-off of December 2024, the same month the model was released. Amazon has not publicly specified an exact training data cutoff date beyond this release timeframe.
