| Field | Value |
|---|---|
| Provider | Amazon |
| Model identifier | `amazon.nova-micro-v1:0` (Amazon Bedrock); `amazon/nova-micro-v1` (OpenRouter) |
| Context window (input) | 128,000 tokens |
| Max output tokens per request | Not listed |
| Open weights | No |
| Release date | December 2024 |
| Knowledge cutoff | Not publicly specified |
| Providers (not exhaustive) | Amazon Bedrock |
| Input modalities | Text |
A fuller summary of positioning, capabilities, and source-specific details for Amazon Nova Micro.
Amazon Nova Micro is a text-only foundation model developed by Amazon and made available through Amazon Bedrock. It is part of the Amazon Nova family, which includes understanding models (Nova Pro, Nova Lite, and Nova Micro) as well as creative content generation models. Nova Micro is specifically designed to deliver the lowest-latency responses in the Nova lineup at very low cost, making it a practical choice for applications where speed and cost efficiency are priorities.
Because Nova Micro handles text input and output exclusively, it is well suited for tasks such as summarization, classification, question answering, and other text-based workflows where multimodal capabilities are not required. The model supports a 128,000-token context window, allowing it to process long documents or extended conversations in a single request. It can also be fine-tuned on Amazon Bedrock, enabling developers to adapt it to specific domains or use cases.
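The following is a minimal sketch of one text-only request to Nova Micro through the Amazon Bedrock Converse API. It assumes Python 3 with `boto3` installed, AWS credentials already configured, and on-demand access to the model ID `amazon.nova-micro-v1:0` in `us-east-1`. In other regions an inference profile ID may be required. The helper names (`build_converse_request`, `extract_text`, `summarize`) and the inference settings are illustrative choices, not part of any SDK.

```python
from typing import Any, Dict, Optional

# Model ID reported to work on demand in us-east-1; other regions may need an
# inference profile ID instead (assumption: check what your account exposes).
MODEL_ID = "amazon.nova-micro-v1:0"


def build_converse_request(prompt: str, system: Optional[str] = None,
                           max_tokens: int = 512,
                           temperature: float = 0.3) -> Dict[str, Any]:
    """Build keyword arguments for bedrock-runtime `converse` using text blocks only."""
    request: Dict[str, Any] = {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    if system:
        request["system"] = [{"text": system}]
    return request


def extract_text(response: Dict[str, Any]) -> str:
    """Join the text blocks of the assistant message in a Converse response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


def summarize(document: str, region: str = "us-east-1") -> str:
    """Send one summarization request to Nova Micro (requires AWS access)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    request = build_converse_request(
        f"Summarize the following text in three sentences:\n\n{document}",
        system="You are a concise technical summarizer.",
    )
    return extract_text(client.converse(**request))
```

Calling `summarize(long_text)` sends one request and returns the model's text. Request construction is kept separate from the network call, so you can inspect or unit-test the request shape without AWS access.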
Designed to return text completions faster than other models in the Nova family, making it suitable for real-time or high-throughput applications.
Supports up to 128,000 tokens per request, enabling processing of long documents, transcripts, or multi-turn conversations in a single call.
Generates coherent text output for tasks such as summarization, classification, question answering, and instruction following using text-only input.
Can be fine-tuned on Amazon Bedrock to customize model behavior for specific domains or tasks, adjusting the balance of accuracy and cost.
Supports agentic workflows, allowing the model to be used in multi-step task pipelines and tool-use scenarios within Amazon Bedrock.
Positioned as the lowest-cost option in the Nova understanding model tier, making it practical for large-scale or budget-sensitive deployments.
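The agentic-workflow point above can be illustrated with one round of Converse tool use. This sketch rests on some assumptions. The `get_order_status` tool and its handler are hypothetical, and the tests use a simulated response rather than one captured from Bedrock. The message shapes follow the Converse API format:

* `toolConfig` contains `toolSpec` entries.
* The assistant reply contains `toolUse` blocks when `stopReason` is `"tool_use"`.
* Results go back as `toolResult` blocks in a user message.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical tool definition in the Converse `toolConfig` shape.
TOOL_CONFIG: Dict[str, Any] = {
    "tools": [{
        "toolSpec": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order by its ID.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            }},
        }
    }]
}


def get_order_status(order_id: str) -> Dict[str, Any]:
    """Hypothetical handler; a real one would query an order system."""
    return {"order_id": order_id, "status": "shipped"}


HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_order_status": get_order_status,
}


def run_tool_calls(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Run every toolUse block in a Converse response.

    Returns the user message carrying the toolResult blocks, or None when the
    model did not stop to request a tool.
    """
    if response.get("stopReason") != "tool_use":
        return None
    results = []
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use is None:
            continue  # skip text blocks that accompany the tool request
        handler = HANDLERS.get(tool_use["name"])
        if handler is None:
            content = [{"text": f"Unknown tool: {tool_use['name']}"}]
        else:
            content = [{"json": handler(**tool_use["input"])}]
        results.append({"toolResult": {"toolUseId": tool_use["toolUseId"],
                                       "content": content}})
    return {"role": "user", "content": results}
```

In a real loop you would repeat these steps until `stopReason` is no longer `"tool_use"`:

1. Pass `toolConfig=TOOL_CONFIG` to `client.converse`.
2. Append the assistant message and the message returned by `run_tool_calls` to the conversation.
3. Call `converse` again.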
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| AIME 2024 | American Invitational Mathematics Examination problems | |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE (Humanity's Last Exam) | Questions that challenge frontier models across many domains | |
| LiveCodeBench | Real-world coding tasks from recent competitions | |
| MATH-500 | Undergraduate and competition-level math problems | |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | |
| SciCode | Scientific research coding and numerical methods | |
Amazon Nova Micro discussions are most active in r/SillyTavernAI, r/aws, and r/ChatGPT. The top Reddit threads are mostly benchmark and model-comparison discussions.
The strongest match in this snapshot has 3 upvotes and 1 comment.
I have created an API key on Amazon Bedrock with the AmazonBedrockFullAccess and AmazonBedrockLimitedAccess policies attached. Yet when I try to add the key to OpenRouter's BYOK, it says "Key validation failed: Operation not allowed (Tested with: Amazon Bedrock | amazon/nova-micro-v1)"
I have even tested the access with the IAM Policy Simulator provided by AWS, and both InvokeModel and InvokeModelWithResponseStream returned "allowed" for all resources (\*).
Would greatly appreciate any help... ><
Hey everyone,
I just ran into an issue trying to call **Claude Sonnet 4.5** via the **AWS Bedrock Runtime API**, and I’m hoping someone here might have insights or has faced the same thing.
**Setup:**
* **Account type:** Channel program account (via AWS Partner / Distributor)
* **Region:** `us-east-1`
* **API key:** Valid — works fine for `amazon.nova-micro-v1:0`
* **Model I’m calling:** `anthropic.claude-sonnet-4-5-20250929-v1:0`
Here’s the cURL command I used:
```bash
curl -X POST "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-sonnet-4-5-20250929-v1:0/converse" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <valid-token>" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello"}]
      }
    ]
  }'
```
And here’s the **error response** I got back:
```json
{
  "message": "Invocation of model ID anthropic.claude-sonnet-4-5-20250929-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model."
}
```
After reaching out to AWS Support, I also got this message:
>
Has anyone here successfully accessed Claude Sonnet 4.5 under a channel program account, or know how to obtain the required inference profile ARN?
It seems I can't use any Claude variant, but I can use AWS Nova models just fine.
Any clarification or workaround would be super appreciated 🙏
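The error message says on-demand invocation needs the ID or ARN of an inference profile instead of the base model ID. Below is a minimal sketch of that substitution. It assumes a system-defined cross-region profile exists whose ID is the model ID with a geography prefix such as `us.`. The prefix list and helper names are assumptions; `aws bedrock list-inference-profiles` can confirm which profile IDs an account actually sees. The sketch addresses only the on-demand throughput error. It cannot show whether a channel program account is entitled to use Claude models at all.

```python
from urllib.parse import quote

# Geography prefixes assumed for system-defined cross-region inference profiles.
PROFILE_PREFIXES = ("us", "eu", "apac", "global")


def to_inference_profile_id(model_id: str, geo: str = "us") -> str:
    """Turn a base model ID into the presumed cross-region inference profile ID."""
    if geo not in PROFILE_PREFIXES:
        raise ValueError(f"unknown geography prefix: {geo}")
    if model_id.split(".", 1)[0] in PROFILE_PREFIXES:
        return model_id  # already a profile ID; leave it unchanged
    return f"{geo}.{model_id}"


def converse_url(region: str, model_or_profile_id: str) -> str:
    """Build the Converse endpoint URL, percent-encoding the ID (':' becomes %3A)."""
    return (f"https://bedrock-runtime.{region}.amazonaws.com/model/"
            f"{quote(model_or_profile_id, safe='')}/converse")
```

With these helpers, the cURL request above would target `converse_url("us-east-1", to_inference_profile_id("anthropic.claude-sonnet-4-5-20250929-v1:0"))` with the same JSON body. Percent-encoding also keeps full profile ARNs, which contain `/`, valid in the URL path.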
According to the Vectara hallucination leaderboard, the AI model that hallucinates the least today is Google Gemini 2.0 Flash 001, with a factual consistency rate of 99.3%. This score is encouraging because it suggests that we're relatively close to solving the hallucination problem.
https://github.com/vectara/hallucination-leaderboard
What would happen if we built an AI agent that first queried Google Gemini 2.5 Pro about something (because it is currently the most powerful model, completely dominating the Chatbot Arena Leaderboard by almost 40 points) and then ran the answer it generated through other models to catch any inaccuracies?
https://lmarena.ai/?leaderboard
We presume that different AI developers use different datasets to build their models, so while one model may hallucinate on a certain query, another might not. What would happen if we instructed our AI agent to run the content Gemini 2.5 Pro generated through the next ten models from other developers, asking each of them to analyze the answer for factual consistency?
Could this be a way to reach a factual consistency rate of 99.9% or higher? Could it be done relatively inexpensively and fully automatically?
Below are the ten top models that our AI agent would run Gemini 2.5 Pro's answer through, ranked by their factual consistency rate. I asked Gemini 2.5 Pro to comment on the usefulness of this approach and to point out possible flaws and limitations; its answer ends this post. While the technique may be relatively expensive, in some cases the gain in accuracy may justify the extra expense.
* OpenAI-o3-mini-high-reasoning - 99.2%
* Zhipu AI GLM-4-9B-Chat - 98.7%
* Amazon Nova-Micro-V1 - 98.4%
* Qwen2-72B-Instruct - 95.3%
* Mixtral-8x22B-Instruct-v0.1 - 95.3%
* Anthropic Claude-3-5-haiku - 95.1%
* 01-AI Yi-1.5-9B-Chat - 95.1%
* Cohere Command-R - 95.1%
* IBM Granite-3.0-8B-Instruct - 93.5%
* Microsoft phi-2 - 93.3%
Gemini 2.5 Pro:
"Okay, here's a concise summary of the analysis:
Your proposal involves using Gemini 2.5 Pro to generate an answer and then having 10 other models (ranked high for factual consistency) verify it. The goal is automated, inexpensive, near-perfect (99.9%+) factual accuracy.
Potential & Feasibility:
* Improved Accuracy: Yes, this ensemble approach could significantly increase factual accuracy compared to using a single model by catching diverse errors. Reaching 99.9%+ is ambitious but directionally plausible.
* Automation: The process is fully automatable using APIs.
Key Challenges & Limitations:
* Cost & Speed: Running 11 models per query will be substantially more expensive and slower than using one. "Relatively inexpensive" is unlikely.
* Consensus Logic: Deciding how to handle disagreements among the 10 verifiers is complex. A simple majority vote might miss valid corrections or reinforce errors.
* Verifier Imperfection: The verification models themselves aren't perfect and can make mistakes (hallucinate) or fail to spot errors.
* Shared Errors: Models might share common misconceptions from similar training data, leading to false consensus.
* Complexity: Designing effective verification prompts and consensus mechanisms is non-trivial.
In Short:
The idea leverages model diversity for better accuracy and is automatable. However, it faces significant hurdles in cost, speed, and the complexity of managing verification results. While promising, it's not a simple or cheap solution for achieving near-perfect factual consistency."
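The verification step described in the post can be sketched as a thresholded vote. This is an illustration, not a tested pipeline. Each `Verifier` is a hypothetical callable that would wrap one of the listed models, and the 0.8 agreement threshold is an arbitrary assumption. Requiring a supermajority rather than a simple majority is one way to handle the consensus-logic concern raised above. It does not solve the shared-errors problem: verifiers that share a misconception will approve the same wrong answer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A verifier receives (question, answer) and returns True if it judges the answer
# factually consistent. In practice each would wrap a call to a different model.
Verifier = Callable[[str, str], bool]


@dataclass(frozen=True)
class EnsembleVerdict:
    approvals: int
    total: int
    accepted: bool


def verify_with_ensemble(question: str, answer: str,
                         verifiers: Sequence[Verifier],
                         min_agreement: float = 0.8) -> EnsembleVerdict:
    """Accept the answer only if at least `min_agreement` of the verifiers approve it."""
    if not verifiers:
        raise ValueError("need at least one verifier")
    if not 0.0 < min_agreement <= 1.0:
        raise ValueError("min_agreement must be in (0, 1]")
    approvals = sum(1 for verify in verifiers if verify(question, answer))
    accepted = approvals >= min_agreement * len(verifiers)
    return EnsembleVerdict(approvals, len(verifiers), accepted)
```

With ten verifiers and the default threshold, eight approvals accept the answer and seven reject it. Each extra verifier adds one model call per query, which reflects the cost and speed concern in the answer above.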
Amazon Nova Micro supports a context window of 128,000 tokens, allowing long documents or extended conversations to be processed in a single request.
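A rough pre-flight check for the 128,000-token window might look like the sketch below, which rests on several assumptions:

* About four characters per token for English prose. This is only a heuristic; actual counts depend on Nova Micro's tokenizer, which this code does not use.
* Arbitrary values for the output reservation and per-request overhead.
* Input and output sharing the same window.

```python
from typing import List

CONTEXT_WINDOW_TOKENS = 128_000
# Heuristic assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count via ceiling division by CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(prompt: str, reserved_output_tokens: int = 4_000) -> bool:
    """True if the estimated prompt plus reserved output stays within the window."""
    return estimate_tokens(prompt) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


def split_for_context(text: str, reserved_output_tokens: int = 4_000,
                      overhead_tokens: int = 500) -> List[str]:
    """Cut text into fixed-size character chunks that each fit the estimated budget.

    overhead_tokens leaves room for instructions wrapped around each chunk.
    """
    budget_tokens = CONTEXT_WINDOW_TOKENS - reserved_output_tokens - overhead_tokens
    if budget_tokens <= 0:
        raise ValueError("reservations exceed the context window")
    chunk_chars = budget_tokens * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)] or [""]
```

Splitting on raw character offsets can cut sentences in half. Splitting on paragraph boundaries would be a natural refinement when chunks are summarized separately.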
Amazon Nova Micro cannot process images: it is a text-only model and does not accept image, video, or audio inputs. For multimodal input, use Amazon Nova Lite or Nova Pro instead.
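A client that routes requests across several Nova models could reject multimodal content before it reaches Nova Micro. Below is a minimal sketch, assuming requests use Converse-style message dictionaries. The deny-list mirrors the modalities named above; it is an assumption, not an official list of supported or unsupported block types.

```python
from typing import Any, Dict, List

# Block keys matching the modalities Nova Micro does not accept (assumed deny-list).
UNSUPPORTED_BLOCK_KEYS = {"image", "video", "audio"}


def find_unsupported_blocks(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the keys of content blocks a text-only model cannot process."""
    found: List[str] = []
    for message in messages:
        for block in message.get("content", []):
            found.extend(key for key in block if key in UNSUPPORTED_BLOCK_KEYS)
    return found
```

A non-empty result, such as `["image"]` for a message carrying an image block, signals that the request should go to Nova Lite or Nova Pro instead.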
Amazon Nova Micro is available through Amazon Bedrock, Amazon's managed service for foundation models.
Amazon Nova Micro can be fine-tuned on Amazon Bedrock, which allows developers to customize the model for specific domains or use cases.
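Fine-tuning data for Bedrock customization jobs is supplied as JSON Lines. The sketch below writes one conversation per line, assuming the `bedrock-conversation-2024` record layout. Treat the schema name and field names as assumptions to check against the current Amazon Bedrock model-customization documentation. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

# Assumed schema identifier for Nova fine-tuning records.
SCHEMA_VERSION = "bedrock-conversation-2024"


def make_training_record(user_text: str, assistant_text: str,
                         system_text: Optional[str] = None) -> str:
    """Serialize one single-turn training example as a JSON line."""
    record: Dict[str, Any] = {
        "schemaVersion": SCHEMA_VERSION,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
            {"role": "assistant", "content": [{"text": assistant_text}]},
        ],
    }
    if system_text is not None:
        record["system"] = [{"text": system_text}]
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, ensure_ascii=False)


def to_jsonl(examples: Iterable[Tuple[str, str]],
             system_text: Optional[str] = None) -> str:
    """Join (user, assistant) pairs into a JSON Lines document."""
    return "\n".join(make_training_record(u, a, system_text) for u, a in examples) + "\n"
```

The resulting text would be saved as a `.jsonl` file in Amazon S3 and referenced when creating the customization job.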
Based on available metadata, Amazon Nova Micro was released in December 2024. Amazon has not publicly specified an exact training data cutoff date beyond this release timeframe.