OpenAI

GPT-4o Mini Vision

GPT-4o Mini Vision is a multimodal language model developed by OpenAI, released in mid-2024. It is a smaller, more cost-efficient variant of the GPT-4o family, designed to process both text and images within a single context window of 128,000 tokens. The model supports the same range of languages as GPT-4o and is optimized for low latency, making it suitable for high-throughput or real-time applications. The model is well-suited for tasks that require fast responses at scale, such as customer-facing chat interfaces, document analysis with visual content, and pipelines where cost per token is a primary constraint. Its multimodal reasoning capability allows it to interpret images alongside text in the same request. Developers working with large volumes of context or needing to process mixed text-and-image inputs at reduced cost are the primary intended audience.

Unknown 128,000 context 16,383 tokens output

Image Understanding Large Context Window Low Latency Responses Cost-Efficient Inference Multilingual Text Processing Structured Output

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Parameters ↓ Benchmarks ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

OpenAI

Input Context Window

The number of tokens supported by the input context window.

128,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,383 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Unknown

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

OpenAI API

Modalities

Types of data this model can process.

Text Image

What is GPT-4o Mini Vision

A fuller summary of positioning, capabilities, and source-specific details for GPT-4o Mini Vision.

GPT-4o Mini Vision is a multimodal language model developed by OpenAI, released in mid-2024. It is a smaller, more cost-efficient variant of the GPT-4o family, designed to process both text and images within a single context window of 128,000 tokens. The model supports the same range of languages as GPT-4o and is optimized for low latency, making it suitable for high-throughput or real-time applications.

The model is well-suited for tasks that require fast responses at scale, such as customer-facing chat interfaces, document analysis with visual content, and pipelines where cost per token is a primary constraint. Its multimodal reasoning capability allows it to interpret images alongside text in the same request. Developers working with large volumes of context or needing to process mixed text-and-image inputs at reduced cost are the primary intended audience.

Capabilities

What GPT-4o Mini Vision supports

IMG

Image Understanding

Accepts image inputs alongside text in a single request, enabling the model to describe, analyze, or answer questions about visual content.

CTX

Large Context Window

Supports up to 128,000 tokens of context per request, allowing long documents, conversation histories, or multiple images to be passed in one call.

Low Latency Responses

Optimized for fast inference, making it suitable for real-time applications such as customer chat interfaces or interactive tools.

Cost-Efficient Inference

Priced significantly lower per token than larger GPT-4o variants, enabling high-volume deployments without proportional cost increases.

Multilingual Text Processing

Supports the same broad set of languages as GPT-4o, covering text generation, comprehension, and reasoning across multiple languages.

JSON

Structured Output

Can return responses in structured formats such as JSON, useful for downstream data processing or API integrations.

Pricing for GPT-4o Mini Vision

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.15 Per million tokens

Output tokens N/A Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 2

maxResponseSize 16,383 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

OpenAI API

Configuration & Parameters

The configurable options currently documented for this model.

Temperature

Number

Default: 1 Range: 0 - 2 (step 0.1)

Max Response Tokens

Number

Default: 8191 Range: 1 - 16383 (step 1)

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Temperature Max Response Tokens

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	15.0%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	54.3%
HLE Questions that challenge frontier models across many domains	3.3%
LiveCodeBench Real-world coding tasks from recent competitions	30.9%
MATH-500 Undergraduate and competition-level math problems	75.9%
MMLU-Pro Expert knowledge across 14 academic disciplines	74.8%
SciCode Scientific research coding and numerical methods	33.3%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Product Page Other

→

GPT-4o mini: advancing cost-efficient intelligence Announcements

→

OpenAI Models Documentation Documentation

→

OpenAI API Pricing Documentation

→

OpenAI Playground Playground

→

Official Website

→

Usage Policies

→

Enterprise privacy at OpenAI

→

OpenAI Status Page

→

AI tools related to GPT-4o Mini Vision

These tools are strongly connected to GPT-4o Mini Vision through direct product references, provider mentions, or explicit model mappings.

AI Assistant

MaxAI.me

MaxAI.me is a Chrome and Edge extension designed to boost productivity by offering one-click AI tools for summarizing, searching, explaining, analyzing, translating, and writing content across any website. It supports major AI providers, including ChatGPT, Google Bard, Bing Chat AI, and Claude, and integrates with ChatGPT Plus features like GPT-4, Web Browsing, Code Interpreter, and Plugins. Users can also utilize their own OpenAI API key to access models such as GPT-4, GPT-3.5-turbo-16k, and GPT-4-32k. Additionally, the extension provides one-click ChatGPT prompts tailored for marketing, sales, copywriting, operations, productivity, and customer support.

Free 0 visits 5 saves

AI Chatbot

ChatGPT Phantom: Lofi Tutor

ChatGPT Phantom: Lofi Tutor is a Chrome extension that integrates AI models, including ChatGPT, Bing Chat, and Google Bard, to support writing and coding tasks. By leveraging real-time data—specifically from YouTube—it provides an advanced search experience for generating customized news articles and video scripts, serving as an alternative to traditional search engines.

Free 0 visits 4 saves

AI Assistant

Powerly.ai

Powerly.ai is a no-code platform designed for building custom ChatGPT-powered chatbots. It provides white-label solutions that allow users to create branded AI assistants for customer support, sales, and content generation. Users can integrate their own OpenAI API keys, train bots on custom data, utilize interactive video guides, and embed unlimited chatbots into websites and mobile applications.

Free 0 visits 1 saves

AI Assistant

GPT Omni

GPT Omni (gptomni.ai) offers a free, accessible web interface for interacting with the GPT-4o model. Designed for ease of use, it allows users to engage in AI conversations without technical requirements. By leveraging OpenAI's GPT-4o, the platform supports text, audio, and visual inputs, providing real-time audio responses, improved multilingual capabilities, and advanced vision features to make AI technology widely available.

Free 0 visits 7 saves

Related Daily Briefs

Recent daily stories tied to GPT-4o Mini Vision through direct model mentions or provider-level coverage.

Frontier Models

Mistral and OpenAI Signal a Broader Shift Around Costs Using PNGs

Claude and Mistral are becoming more practical to evaluate and deploy.

2026-07-04 AI Models AI API

Frontier Models

Hugging Face, xAI, and Anthropic Signal a Broader Shift Around DojoZero

Hugging Face and xAI move deeper into real workflows.

2026-07-01 AI Models Benchmark

Capital Industry

OpenAI and Nvidia Signal a Broader Shift Around Design-Dependent Observation-Window Sufficiency

OpenAI and NVIDIA are raising the stakes for enterprise adoption.

2026-06-30 Funding

Agents Workflows

Amazon, Runway, and Pika Signal a Broader Shift Around FDE

Pika and OpenAI move deeper into real workflows.

2026-06-30 AI Agent AI API

Community discussion

What people think about GPT-4o Mini Vision

GPT-4o Mini Vision discussions are most active in r/OpenAI, r/OpenAIDev, r/arduino. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.

The strongest match in this snapshot has 69 upvotes and 34 comments.

r/OpenAI 69 upvotes 34 comments July 19, 2024

GPT-4o mini vision pricing is odd

Sorry if someone's posted this before but I couldn't see anything.

I find it a bit strange that OpenAI have made their GPT-4o mini functionally the same as the non-mini model for vision, by making each "image tile" more tokens in the mini vs the original 4o model.

[https://openai.com/api/pricing/](https://openai.com/api/pricing/)

GPT-4o:
150 x 150px image = 255 tokens (155 + 85 base tokens)
255 tokens = US$0.001275

GPT-4o mini:
150 x 150px image = 8500 tokens (5667 + 2833 base tokens)
8500 tokens = US$0.001275

I had a bit of a fun project in mind which would compare images, so I was super excited about a really cheap model (especially with their batch 50% discount) but it's a bit dissapointing that the discount doesn't carry over to images.

In contrast, Anthropic just use the formula \`tokens = (width px \* height px)/750\` and charge you the corresponding model's rate for the tokens, and for now Haiku is nearly 10x cheaper per image than 4o mini.

Note:
I did test that this isn't an error on their page, I compared two small images and got the following response. `CompletionUsage(completion_tokens=13, prompt_tokens=17128, total_tokens=17141)`

Edit:
Seems like it's official, there's a tweet from OpenAI acknowledging it
[https://x.com/romainhuet/status/1814054938986885550?t=AMFK4svMvCluYqAXUqRDMQ&s=19](https://x.com/romainhuet/status/1814054938986885550?t=AMFK4svMvCluYqAXUqRDMQ&s=19)

Open Reddit thread

r/OpenAI 7 upvotes 6 comments September 9, 2024

Will GPT-4o Mini Vision be available to free users in the near future?

Open Reddit thread

r/OpenAIDev 3 upvotes 1 comments December 26, 2024

Sudden 88% drop in GPT-4o mini vision API token usage - what's going on?

Hey everyone! I'm seeing some strange behavior with my GPT-4o mini vision API usage and hoping someone can shed some light on this.

My Setup:
- I have an app that uses GPT-4o mini vision to extract data from images
- Images are sent as base64 directly in the prompt
- No recent changes made to the application

What Changed:
- Average token usage dropped from 137k to 16k tokens (88% decrease)
- Error rate increased from 1.9% to 2.9%

This happened suddenly without any changes on my end. Has anyone else experienced something similar? Were there any recent pricing changes or updates to the API that might explain this?
Any insights would be greatly appreciated!

Open Reddit thread

r/arduino 1 comments October 19, 2024

Chat gpt vision ai with gpt 4o mini

I am making a project using chat gpt's vision api with an esp32cam. Works for first loop (first picture it takes and sends to chat gpt), but the esp32 has "connection error" with chat gpt when i try to take another picture. Need help. Here is my code so far: (I have used chat gpt to try and fix the code but didn't work)

#include "esp_camera.h"
#include "FS.h"
#include "SD.h"
#include "SPI.h"
#include "mbedtls/base64.h" // For Base64 encoding
#include "WiFi.h" // Include Wi-Fi library
#include "wifi_credentials.h" // Include the file with Wi-Fi credentials

#define CAMERA_MODEL_XIAO_ESP32S3 // Has PSRAM

#include "camera_pins.h"

int imageCount = 1; // File Counter
bool camera_sign = false; // Check camera status
bool sd_sign = false; // Check sd status
int button = 0;
const int buttonPin = 3; // Pin where the button is connected

// Function to delete all files in the root directory
void deleteAllFiles(fs::FS &fs) {
File root = fs.open("/");
File file = root.openNextFile();
while (file) {
fs.remove(file.name()); // Delete each file
file = root.openNextFile();
}
Serial.println("All files deleted from SD card.");
}

// Function to create necessary folders
void createFolders(fs::FS &fs) {
if (!fs.exists("/pictures")) {
fs.mkdir("/pictures");
Serial.println("Created folder: /pictures");
}
if (!fs.exists("/encoded")) {
fs.mkdir("/encoded");
Serial.println("Created folder: /encoded");
}
}

// Save pictures to SD card in /pictures folder
void photo_save(const char * fileName) {
// Take a photo
camera_fb_t *fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Failed to get camera frame buffer");
return;
}
// Save photo to file in the /pictures directory
writeFile(SD, fileName, fb->buf, fb->len);

// Base64 encode and save the image
encodeBase64AndSave(fb->buf, fb->len);

// Release image buffer
esp_camera_fb_return(fb);

Serial.println("Photo saved to file and encoded.");
}

// SD card write file
void writeFile(fs::FS &fs, const char * path, uint8_t * data, size_t len){
Serial.printf("Writing file: %s\r\n", path);

File file = fs.open(path, FILE_WRITE);
if(!file){
Serial.println("Failed to open file for writing");
return;
}
if(file.write(data, len) == len){
Serial.println("File written");
} else {
Serial.println("Write failed");
}
file.close();
}

// Function to Base64 encode the image and save it to the encoded folder
void encodeBase64AndSave(uint8_t *imageData, size_t len) {
// Calculate the output buffer size for Base64 encoded data
size_t encodedLen = (len * 4 / 3) + 4; // Base64 increases size by ~33%
char *encodedData = (char*) malloc(encodedLen); // Allocate memory for encoded data

if (encodedData == NULL) {
Serial.println("Failed to allocate memory for Base64 encoding");
return;
}

// Perform Base64 encoding
size_t outputLen;
int ret = mbedtls_base64_encode((unsigned char*)encodedData, encodedLen, &outputLen, imageData, len);

if (ret != 0) {
Serial.println("Failed to encode image to Base64");
free(encodedData);
return;
}

// Create the filename for the encoded file in the /encoded folder
char encodedFileName[64];
sprintf(encodedFileName, "/encoded/image%d.txt", imageCount); // Save Base64 data as a .txt file

// Save the encoded data to the SD card
writeFile(SD, encodedFileName, (uint8_t*)encodedData, outputLen);

free(encodedData); // Free allocated memory after encoding
}

// Function to connect to Wi-Fi
void connectToWiFi() {
WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
Serial.print("Connecting to Wi-Fi");

// Wait until the ESP32 connects to the Wi-Fi
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}

Serial.println("");
Serial.println("Wi-Fi connected.");
Serial.print("IP address: ");
Serial.println(WiFi.localIP());
}

void setup() {
Serial.begin(115200);
while(!Serial); // When the serial monitor is turned on, the program starts to execute

// Connect to Wi-Fi
connectToWiFi();

camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.frame_size = FRAMESIZE_UXGA;
config.pixel_format = PIXFORMAT_JPEG; // for streaming
config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.jpeg_quality = 12;
config.fb_count = 1;

// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
if(config.pixel_format == PIXFORMAT_JPEG){
if(psramFound()){
config.jpeg_quality = 10;
config.fb_count = 2;
config.grab_mode = CAMERA_GRAB_LATEST;
} else {
// Limit the frame size when PSRAM is not available
config.frame_size = FRAMESIZE_SVGA;
config.fb_location = CAMERA_FB_IN_DRAM;
}
} else {
// Best option for face detection/recognition
config.frame_size = FRAMESIZE_240X240;
#if CONFIG_IDF_TARGET_ESP32S3
config.fb_count = 2;
#endif
}

// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x", err);
return;
}

camera_sign = true; // Camera initialization check passes

// Initialize SD card
if(!SD.begin(21)){
Serial.println("Card Mount Failed");
return;
}
uint8_t cardType = SD.cardType();

// Determine if the type of SD card is available
if(cardType == CARD_NONE){
Serial.println("No SD card attached");
return;
}

Serial.print("SD Card Type: ");
if(cardType == CARD_MMC){
Serial.println("MMC");
} else if(cardType == CARD_SD){
Serial.println("SDSC");
} else if(cardType == CARD_SDHC){
Serial.println("SDHC");
} else {
Serial.println("UNKNOWN");
}

sd_sign = true; // SD initialization check passes

// Delete all files and create folders
deleteAllFiles(SD); // Delete all files on boot
createFolders(SD); // Create "pictures" and "encoded" folders

Serial.println("Photos will begin in one minute, please be ready.");
}

void loop() {
if (touchRead(4) <= 25000) {
button = 0;
}

if (touchRead(4) >= 25000 && button == 0) {
delay(500);
if (touchRead(4) >= 25000 && button == 0) {
char filename[64];
sprintf(filename, "/pictures/image%d.jpg", imageCount); // Save to the pictures folder
photo_save(filename);
Serial.printf("Saved picture: %s\r\n", filename);
imageCount++;
button = 1;
}
}
delay(50);
}

#include "esp_camera.h"
#include "FS.h"
#include "SD.h"
#include "SPI.h"
#include "WiFi.h"
#include <WiFiClientSecure.h>
#include <ArduinoJson.h>
#include "Base64.h"
#include "ChatGPT.hpp"
#include "credentials.h" // WiFi credentials and OpenAI API key

#define CAMERA_MODEL_XIAO_ESP32S3 // Has PSRAM

#include "camera_pins.h"

int imageCount = 1; // File Counter
bool camera_sign = false; // Check camera status
bool sd_sign = false; // Check sd status
int button = 0;
const int buttonPin = 3; // Pin where the button is connected

WiFiClientSecure client; // WiFiClientSecure for HTTPS connection
ChatGPT<WiFiClientSecure> chatGPT_Client(&client, "v1", openai_api_key, 60000); // Use WiFiClientSecure for HTTPS

void connectToWiFi() {
WiFi.begin(ssid, password);
Serial.println("Connecting to WiFi...");

// Wait until the device is connected to WiFi
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println();
Serial.print("Connected! IP address: ");
Serial.println(WiFi.localIP());
}

// Function to delete all files in the root directory
void deleteAllFiles(fs::FS &fs) {
File root = fs.open("/");
File file = root.openNextFile();
while (file) {
fs.remove(file.name()); // Delete each file
file = root.openNextFile();
}
Serial.println("All files deleted from SD card.");
}

// Function to create necessary folders
void createFolders(fs::FS &fs) {
if (!fs.exists("/pictures")) {
fs.mkdir("/pictures");
Serial.println("Created folder: /pictures");
}
if (!fs.exists("/encoded")) {
fs.mkdir("/encoded");
Serial.println("Created folder: /encoded");
}
}

// SD card write file
void writeFile(fs::FS &fs, const char * path, uint8_t * data, size_t len){
Serial.printf("Writing file: %s\r\n", path);

File file = fs.open(path, FILE_WRITE);
if(!file){
Serial.println("Failed to open file for writing");
return;
}
if(file.write(data, len) == len){
Serial.println("File written");
} else {
Serial.println("Write failed");
}
file.close();
}

// Save pictures to SD card and send to GPT-4o Mini Vision API
void photo_save_and_analyze(const char * fileName) {
// Take a photo
camera_fb_t *fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Failed to get camera frame buffer");
return;
}

// Encode image to Base64
String encodedImage = base64::encode(fb->buf, fb->len);

// Print the Base64-encoded image (optional, can comment this line to reduce log size)
Serial.println("Base64 Encoded Image:");
Serial.println(encodedImage);

// Save photo to file in the /pictures directory
writeFile(SD, fileName, fb->buf, fb->len);

// Release image buffer
esp_camera_fb_return(fb);

Serial.println("Photo saved to file");

// Prepare the data URL for the API request
if (encodedImage.length() > 0) {
String base64Image = "data:image/jpeg;base64," + encodedImage;
String result;
Serial.println("\n\n[ChatGPT] - Asking a Vision Question");

// Send to the API
if (chatGPT_Client.vision_question("gpt-4o", "user", "text", "What’s in this image?", "image_url", base64Image.c_str(), "auto", 5000, true, result)) {
Serial.print("[ChatGPT] Response: ");
Serial.println(result);
encodedImage = "";
} else {
Serial.print("[ChatGPT] Error: ");
Serial.println(result);
}

// Clear the Base64 encoded image
encodedImage = ""; // Clear the base64 string after the API request
} else {
Serial.println("Encoded image is empty!");
}
}

void setup() {
Serial.begin(115200);
while(!Serial); // When the serial monitor is turned on, the program starts to execute

camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.frame_size = FRAMESIZE_UXGA;
config.pixel_format = PIXFORMAT_JPEG; // for streaming
config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.jpeg_quality = 12;
config.fb_count = 1;

// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
if(config.pixel_format == PIXFORMAT_JPEG){
if(psramFound()){
config.jpeg_quality = 10;
config.fb_count = 2;
config.grab_mode = CAMERA_GRAB_LATEST;
} else {
// Limit the frame size when PSRAM is not available
config.frame_size = FRAMESIZE_SVGA;
config.fb_location = CAMERA_FB_IN_DRAM;
}
} else {
// Best option for face detection/recognition
config.frame_size = FRAMESIZE_240X240;
#if CONFIG_IDF_TARGET_ESP32S3
config.fb_count = 2;
#endif
}

// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x", err);
return;
}

camera_sign = true; // Camera initialization check passes

// Initialize SD card
if(!SD.begin(21)){
Serial.println("Card Mount Failed");
return;
}
uint8_t cardType = SD.cardType();

// Determine if the type of SD card is available
if(cardType == CARD_NONE){
Serial.println("No SD card attached");
return;
}

Serial.print("SD Card Type: ");
if(cardType == CARD_MMC){
Serial.println("MMC");
} else if(cardType == CARD_SD){
Serial.println("SDSC");
} else if(cardType == CARD_SDHC){
Serial.println("SDHC");
} else {
Serial.println("UNKNOWN");
}

sd_sign = true; // SD initialization check passes

// Delete all files and create folders
deleteAllFiles(SD); // Delete all files on boot
createFolders(SD); // Create "pictures" and "encoded" folders

Serial.println("Photos will begin in one minute, please be ready.");

// Connect to WiFi
connectToWiFi();
}

void loop() {
if (touchRead(4) <= 25000) {
button = 0;
}

// If it has been more than 1 minute since the last shot, take a picture, save it to the SD card, and analyze it with GPT-4o Mini Vision API
if (touchRead(4) >= 25000 && button == 0) {
delay(500);
if (touchRead(4) >= 25000 && button == 0) {
char filename[64];
sprintf(filename, "/pictures/image%d.jpg", imageCount); // Save to the pictures folder only
photo_save_and_analyze(filename);
Serial.printf("Saved and analyzed picture: %s\r\n", filename);
imageCount++;
button = 1;
}
}
delay(50);
}

Open Reddit thread

View more discussions →

FAQ

Common questions about GPT-4o Mini Vision

What is the context window size for GPT-4o Mini Vision?

GPT-4o Mini Vision supports a context window of 128,000 tokens, allowing large amounts of text and image content to be included in a single request.

What is the knowledge cutoff date for this model?

The training data cutoff for GPT-4o Mini Vision is October 2024, meaning it does not have knowledge of events that occurred after that date.

Does this model support image inputs?

Yes, GPT-4o Mini Vision is a multimodal model that accepts both text and image inputs within the same request, enabling visual question answering and image-based reasoning.

How does the pricing of GPT-4o Mini compare to other OpenAI models?

GPT-4o Mini is positioned as a low-cost model in OpenAI's lineup. For exact current pricing, refer to the OpenAI pricing page at platform.openai.com/docs/models.

What languages does GPT-4o Mini Vision support?

GPT-4o Mini Vision supports the same range of languages as GPT-4o, making it suitable for multilingual applications.

More models from OpenAI

Continue browsing adjacent models from the same provider.

← All AI Models