GPT-4o released with improved text, audio and vision capabilities
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal large language model (LLM). It brings major advancements in text, voice, and image generation, offering more natural interaction between users and the assistant.
OpenAI claims its new AI model can respond to audio inputs in as little as 232 milliseconds, and it is significantly faster at generating text responses to non-English prompts, with support for more than 50 languages. You can also interrupt the model with new questions or clarifications while it is talking.
GPT-4o also features a more capable, human-sounding voice assistant that responds in real time and can observe your surroundings through your device’s camera. You can even tell the assistant to sound more cheerful or switch back to a more robotic voice. It also offers real-time translation across more than 50 languages and can act as an accessibility assistant for the visually impaired.
OpenAI demoed a long list of GPT-4o’s capabilities in its live stream. You can catch all of the new GPT-4o feature demos on OpenAI’s YouTube channel.
GPT-4o will be available to free-tier ChatGPT users, while ChatGPT Plus subscribers get 5x higher message limits. GPT-4o’s text and image features are already available in the ChatGPT app and on the web. The new voice mode will roll out in alpha to ChatGPT Plus subscribers in the coming weeks.
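GPT-4o is also exposed to developers through OpenAI’s API. As a minimal sketch of a combined text-and-image request using the official openai Python package (the image URL is a placeholder, and the API key is assumed to be set in your environment):

```python
from openai import OpenAI

# Assumes the openai package is installed and OPENAI_API_KEY is set
# in the environment; the image URL below is a placeholder.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```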
In related news, OpenAI announced a ChatGPT desktop app for macOS, with a Windows version coming later this year. OpenAI also highlighted its GPT Store, which hosts millions of custom chatbots that users can access for free.