GPT-4o released with improved text, audio and vision capabilities
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal large language model (LLM). It brings major advancements in text, voice, and image generation, offering more natural interaction between users and the assistant.
OpenAI claims its new AI model can respond to audio inputs in as little as 232 milliseconds, and it is significantly faster at generating text responses to non-English prompts, with support for more than 50 languages. You can also interrupt the model with new questions or clarifications while it is talking.
GPT-4o also features a more capable, human-sounding voice assistant that responds in real time and can observe your surroundings through your device’s camera. You can even tell the assistant to sound more cheerful or switch back to a more robotic voice. It also offers real-time translation across more than 50 languages and can act as an accessibility assistant for the visually impaired.
OpenAI demoed a long list of GPT-4o’s capabilities in its live stream. You can catch all of the new GPT-4o feature demos on OpenAI’s YouTube channel.
GPT-4o will be available to free-tier ChatGPT users, while ChatGPT Plus subscribers get 5x higher message limits. GPT-4o’s text and image features are already available in the ChatGPT app and on the web. The new voice mode will roll out in alpha to ChatGPT Plus subscribers in the coming weeks.
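GPT-4o is also exposed to developers through OpenAI’s API. As a minimal sketch of a combined text-and-image request using the official openai Python package (the image URL is a placeholder, and the API key is assumed to be set in your environment):

```python
from openai import OpenAI

# Assumes the openai package is installed and OPENAI_API_KEY is set
# in the environment; the image URL below is a placeholder.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```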
In related news, OpenAI announced a ChatGPT desktop app for macOS, with a Windows version coming later this year. OpenAI also highlighted its GPT Store, which hosts millions of custom chatbots that users can access for free.