GPT-4o delivers human-like AI interaction with text, audio, and vision integration
OpenAI has launched its new flagship model, GPT-4o, which seamlessly integrates text, audio, and visual inputs and outputs, promising to enhance the naturalness of machine interactions.

OpenAI has launched its new flagship model, GPT-4o, which integrates text, audio, and visual inputs and outputs, promising to make interactions with machines feel more natural. GPT-4o is designed to handle a broader spectrum of input and output modalities and can respond to audio inputs in as little as 232 milliseconds, similar to human response time in conversation.

The model marks a leap from its predecessors by processing all inputs and outputs through a single neural network, retaining critical information and context that were previously lost in the separate model pipeline used in earlier versions. GPT-4o matches GPT-4 Turbo on English text and coding tasks but performs significantly better in non-English languages, making it a more inclusive and versatile model. OpenAI has built safety measures into GPT-4o by design, with comprehensive scrutiny of the risks introduced by the new modalities.

GPT-4o's text and image capabilities are already available in ChatGPT, and a new Voice Mode powered by GPT-4o will enter alpha testing within ChatGPT Plus in the coming weeks. Developers can access GPT-4o through the API for text and vision tasks. OpenAI plans to extend GPT-4o's audio and video functionalities to a select group of trusted partners via the API, with a broader rollout expected in the near future.

OpenAI invites community feedback to continuously refine GPT-4o, emphasizing the importance of user input in identifying and closing the gaps where GPT-4 Turbo might still outperform it.
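For developers, text and vision are the modalities exposed through the API today. As a rough illustration only, the sketch below shows what a combined text-and-image request to GPT-4o could look like using the official openai Python SDK; the prompt, image URL, and API-key handling are assumptions for the example rather than details from the announcement.

```python
# Minimal sketch of a text + vision request to GPT-4o via the openai Python SDK (v1+).
# Assumes OPENAI_API_KEY is set in the environment; the prompt and image URL are illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The example sticks to text and vision because, per the announcement, audio and video over the API are initially limited to a small group of trusted partners.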
