OpenAI has launched GPT-4o, a major (and free) ChatGPT upgrade that makes the model significantly faster and lets it process images, audio, and text in real time.

Announced by CTO Mira Murati during an OpenAI live stream on Monday, GPT-4o is a “much faster” model that enhances ChatGPT’s “capabilities across text, vision and audio”.

OpenAI wrote in an accompanying blog post:

“GPT-4o (‘o’ for ‘omni’) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.”

OpenAI highlights GPT-4o’s audio response times, which average 320 milliseconds and drop as low as 232 milliseconds, figures it says are comparable to human conversational response times. The model matches OpenAI’s GPT-4 Turbo on English text and code, with notable gains in handling non-English languages.

Additionally, GPT-4o is significantly faster and 50 percent cheaper to use via the API. It excels in vision and audio comprehension, surpassing the capabilities of existing models.
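For developers, GPT-4o is served through OpenAI’s existing Chat Completions API. As a minimal sketch (assuming the official openai Python SDK and the gpt-4o model identifier; check OpenAI’s documentation for current pricing and parameters), a text-only request looks like this:

```python
# Minimal sketch: calling GPT-4o via OpenAI's Chat Completions API.
# Assumes the official `openai` Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable; pricing and limits may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise GPT-4o's launch in one sentence."},
    ],
)

print(response.choices[0].message.content)
```

The same endpoint also accepts image inputs alongside text; audio input and output are arriving later, per the rollout described below.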

On X, OpenAI CEO Sam Altman described GPT-4o as OpenAI’s “best model ever” and highlighted that it is “natively multimodal”.

Altman also published a separate blog post following the live stream, in which he enthused about GPT-4o’s potential.

“Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world,” he wrote. “Instead, it now looks like we’ll create AI, and then other people will use it to create all sorts of amazing things that we all benefit from(…) The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.”

GPT-4o’s capabilities are being introduced gradually, beginning with extended red-team access. Its text and image functionalities are available in ChatGPT from today for anyone on the free tier to test out, as well as for Plus users, who benefit from up to five times higher message limits.

More Information About GPT-4o

Additionally, in the coming weeks, a new version of Voice Mode featuring GPT-4o will be rolled out in alpha for ChatGPT Plus users.

Previously, Voice Mode allowed users to talk to ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). This involved a three-model pipeline: one for audio-to-text, GPT-3.5 or GPT-4 for text processing, and another for text-to-audio. This setup limited GPT-4’s ability to interpret tone, multiple speakers, or background noises, and it couldn’t express laughter, singing, or emotions.
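To make that architecture concrete, here is an illustrative sketch of the old three-stage hand-off. The stage functions below are hypothetical stubs, not real OpenAI APIs; they exist only to show where latency accumulated and information was lost:

```python
# Illustrative sketch of the pre-GPT-4o Voice Mode pipeline described above.
# transcribe(), complete(), and synthesise() are hypothetical stand-ins for
# the three separate models, not real OpenAI API calls.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text. Tone, multiple speakers, and background
    noise are discarded here, so the language model never sees them."""
    return "<transcript of the user's speech>"

def complete(prompt: str) -> str:
    """Stage 2: GPT-3.5 or GPT-4 reasons over plain text only."""
    return "<the model's reply as text>"

def synthesise(text: str) -> bytes:
    """Stage 3: text-to-speech. Working from bare text, the voice
    cannot laugh, sing, or convey emotion."""
    return b"<spoken reply>"

def legacy_voice_mode(audio_in: bytes) -> bytes:
    # Three sequential model calls: average end-to-end latency was
    # 2.8 s with GPT-3.5 and 5.4 s with GPT-4.
    return synthesise(complete(transcribe(audio_in)))
```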

With GPT-4o, OpenAI emphasises that a single model now processes text, vision, and audio inputs and outputs. This end-to-end integration allows GPT-4o to retain more information and explore new capabilities, although OpenAI says it is still uncovering its full potential and limitations.

In its blog post, OpenAI cites several compelling use cases illustrating GPT-4o’s advanced capabilities. In one, an audio recording of a meeting is shared with ChatGPT, and the model identifies the speakers and their job titles before transcribing the audio precisely. In another, a recorded lecture is shared with ChatGPT, which then summarises the video’s content with striking detail and accuracy.

OpenAI stresses that GPT-4o has safety built in across all modalities, including filtering of training data and post-training refinement of the model’s behaviour. Evaluated under the company’s Preparedness Framework, the model scores no higher than Medium risk in cybersecurity, CBRN (chemical, biological, radiological, and nuclear), persuasion, and model autonomy, and extensive external red teaming informed its safety interventions. Initially, OpenAI is releasing text and image inputs with text outputs, while audio outputs are limited to a selection of preset voices under existing safety policies.

What Other OpenAI News Has Happened Recently?

Earlier this month, emails published in the US Justice Department’s antitrust case against Google suggested that Microsoft’s investment in OpenAI was prompted by concerns about Google’s lead in AI.

As reported by Business Insider, the Justice Department’s investigation triggered the release of an internal email between Microsoft Co-Founder Bill Gates, CEO Satya Nadella, and CTO Kevin Scott. The June 2019 email, titled “Thoughts on OpenAI”, outlined the investment opportunity in the AI organisation while highlighting the areas in which Google was significantly ahead of Microsoft in its AI research and models.

In other OpenAI news, Microsoft and OpenAI’s relationship faces a potential antitrust probe by the European Union’s regulatory body.

As first reported by Reuters, the European Commission might launch an antitrust investigation of Microsoft’s $13 billion investment in OpenAI. According to the publication’s sources, regulators are building the case for such a probe.

According to a Reuters source, the European Commission is considering investigating whether Microsoft is using its market power to distort competition through specific practices.


