OpenAI Launches New Voice Intelligence Features in Its API

On May 7, 2026, OpenAI announced three new real-time audio models, available through its developer API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. The company says the trio is designed to "unlock a new class of voice apps for developers," moving beyond simple transcription and turn-based responses toward voice interfaces that can reason, translate, and act during live conversations. Early customers testing the models include Zillow, Priceline, and Deutsche Telekom.

"Voice is becoming one of the most natural ways for people to use software," OpenAI said in its official announcement. "Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work."

What the Three New Models Do

GPT-Realtime-2: GPT-5-Class Reasoning in a Voice Model

GPT-Realtime-2 is OpenAI's first voice model built with GPT-5-class reasoning capabilities. According to OpenAI, the model is designed to handle complex requests, maintain conversational flow, and integrate with external tools during live spoken conversations — meaning it can call APIs, query databases, or trigger actions mid-conversation without the user having to pause and wait.
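The tool-integration idea can be illustrated with a minimal sketch of the application side of a tool call. Everything here is an assumption made for illustration: the event shape (a dict with `name` and a JSON `arguments` string), the tool names `find_homes` and `schedule_tour`, and the sample listings are hypothetical and are not taken from OpenAI's Realtime API schema or from Zillow's system.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local tools; names, arguments, and data are illustrative only.
def find_homes(max_price: int, city: str) -> Dict[str, Any]:
    listings = [
        {"id": "h1", "city": "Austin", "price": 450_000},
        {"id": "h2", "city": "Austin", "price": 720_000},
    ]
    matches = [h for h in listings if h["city"] == city and h["price"] <= max_price]
    return {"matches": matches}


def schedule_tour(home_id: str, day: str) -> Dict[str, Any]:
    return {"home_id": home_id, "day": day, "status": "requested"}


# Registry mapping the tool name the model asks for to a local function.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "find_homes": find_homes,
    "schedule_tour": schedule_tour,
}


def handle_tool_call(event: Dict[str, Any]) -> str:
    """Run the tool named in a (hypothetical) tool-call event; return JSON text."""
    name = event["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(event["arguments"])
        result = TOOLS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names: report back instead of dropping the call.
        result = {"error": str(exc)}
    return json.dumps(result)


example_event = {
    "name": "find_homes",
    "arguments": '{"max_price": 500000, "city": "Austin"}',
}
print(handle_tool_call(example_event))
```

In a real integration, the returned JSON string would be sent back to the model so that it can continue speaking with the result in hand; how that hand-off is expressed depends on the actual API and is not shown here.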

OpenAI describes three emerging voice AI patterns the model supports: voice-to-action (a user describes a task and the AI completes it), systems-to-voice (software proactively delivers spoken guidance), and voice-to-voice (real-time translation between speakers of different languages). The model is priced at $32 per million audio input tokens and $64 per million audio output tokens, with cached input tokens available at $0.40 per million.
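As a rough illustration of how the token prices combine, the sketch below estimates the cost of one session from the three quoted rates. It assumes that cached input tokens are a subset of the total input tokens and are billed at the cached rate instead of the full rate; the article does not spell this out, and the token counts in the example are hypothetical.

```python
from decimal import Decimal

# Prices per one million tokens, as quoted in the article (USD).
PRICE_INPUT = Decimal("32")
PRICE_CACHED_INPUT = Decimal("0.40")
PRICE_OUTPUT = Decimal("64")
MILLION = Decimal(1_000_000)


def session_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one session.

    Assumption: cached_input_tokens is the part of input_tokens that is
    billed at the cached rate rather than the full input rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must not be negative")
    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_input_tokens * PRICE_INPUT
        + cached_input_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / MILLION


# Hypothetical session: 10,000 input tokens (4,000 of them cached), 5,000 output tokens.
print(session_cost(10_000, 4_000, 5_000))
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.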

Zillow is among the first companies to deploy GPT-Realtime-2, building a voice assistant that can listen to a buyer's request, reason about it, and act — for example, finding homes within a specified budget, filtering results by factors such as street noise, and scheduling tours, all through voice. On Zillow's hardest adversarial benchmark, GPT-Realtime-2 achieved a 95% call success rate after prompt optimization, compared to 69% with the previous model — a 26-point lift. A Zillow representative said in OpenAI's official release: "The combination of agentic competence and guardrail strength is what makes it viable for production voice at Zillow."

GPT-Realtime-Translate: Live Speech Translation Across 70+ Languages

GPT-Realtime-Translate is purpose-built for real-time spoken language translation. According to OpenAI, it supports more than 70 input languages translated into 13 output languages, and is aimed at use cases in customer support, cross-border sales, education, events, media, and creator platforms.

Deutsche Telekom is piloting the model to allow its customers to speak in their preferred language while the system translates the conversation in real time on the other end. The model is priced at $0.034 per minute of audio processed.

GPT-Realtime-Whisper: Streaming Live Transcription

GPT-Realtime-Whisper is a streaming speech-to-text model that transcribes speech live as the speaker talks — rather than processing audio in chunks after a pause. OpenAI says this enables use cases including live captions, meeting notes, and workflow documentation. It is priced at $0.017 per minute of audio processed.
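The difference between the two styles can be shown with a toy sketch. It is not a speech recognizer and does not call any OpenAI API: `fake_recognize` is a hypothetical stand-in in which each audio "frame" is already a word, so the only thing illustrated is when text becomes available to the caller.

```python
from typing import Iterable, Iterator, List


def fake_recognize(frame: str) -> str:
    """Stand-in for a recognizer: in this toy, each 'frame' is already a word."""
    return frame


def transcribe_after_pause(frames: Iterable[str]) -> str:
    """Chunked style: nothing is returned until the whole utterance has arrived."""
    return " ".join(fake_recognize(f) for f in frames)


def transcribe_streaming(frames: Iterable[str]) -> Iterator[str]:
    """Streaming style: yield the transcript so far after every frame."""
    words: List[str] = []
    for frame in frames:
        words.append(fake_recognize(frame))
        yield " ".join(words)


utterance = ["schedule", "a", "tour", "on", "friday"]
for partial in transcribe_streaming(utterance):
    print(partial)
```

Both functions end with the same text; the streaming version simply exposes each partial result as it goes, which is what live captions and running meeting notes need.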

All three models are available for developers to test in the OpenAI Playground.

Early Adopters Span Real Estate, Travel, and Telecom

Beyond Zillow and Deutsche Telekom, Priceline is working toward a voice-driven travel management experience where travelers can search for flights and hotels, handle booking changes, and receive real-time updates entirely through spoken conversation. According to reporting from Inc., companies including Vimeo and Glean are also among those building with the new models.

The breadth of early adopters — spanning real estate, travel logistics, multilingual telecommunications, video hosting, and enterprise search — reflects OpenAI's stated position that these voice models have applications well beyond customer service call centers.

Safety, Privacy, and Infrastructure

OpenAI says the Realtime API includes active content classifiers that monitor sessions in real time and can halt conversations that they flag as violating its harmful-content guidelines. The API also supports EU Data Residency and enterprise privacy commitments, a notable inclusion given the regulated environments in which many of the early enterprise customers operate.

On the infrastructure side, OpenAI's engineering team published a post on May 4, 2026, describing a rearchitected WebRTC stack designed to deliver voice AI at the scale of more than 900 million weekly active users. OpenAI has cited that figure as ChatGPT's current weekly active user count, up from 400 million in February 2025. The engineering work focused on global reach, fast connection setup, and low media round-trip time, so that real-time voice interactions feel natural rather than delayed.

Why This Matters for Developers and Businesses

OpenAI's Realtime API has been generally available since August 2025, following a preview period that began in late 2024. The new models represent a meaningful capability jump: GPT-Realtime-2 in particular moves voice AI from a largely reactive, turn-based interface toward one that can maintain context, handle interruptions, and take actions across longer sessions.

For businesses evaluating voice AI for production deployment, the pricing is concrete and worth examining. GPT-Realtime-2 is billed by token, at $32 per million audio input tokens and $64 per million audio output tokens, while GPT-Realtime-Translate and GPT-Realtime-Whisper are billed per minute of audio, at $0.034 and $0.017 respectively. For the per-minute models, cost scales directly with expected audio volume; for GPT-Realtime-2, an estimate also requires expected token counts per session.
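For the two per-minute models, a monthly budget is a single multiplication. The sketch below uses the per-minute prices quoted in the article; the daily volume of 600 minutes and the 30-day month are hypothetical inputs chosen only to show the arithmetic.

```python
from decimal import Decimal

# Per-minute prices quoted in the article (USD per minute of audio processed).
PER_MINUTE = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def monthly_cost(model: str, minutes_per_day: int, days: int = 30) -> Decimal:
    """Estimated USD cost for a month of audio at a steady daily volume."""
    if minutes_per_day < 0 or days < 0:
        raise ValueError("minutes_per_day and days must not be negative")
    return PER_MINUTE[model] * minutes_per_day * days


# Hypothetical workload: 600 minutes of audio per day for 30 days.
for model in PER_MINUTE:
    print(model, monthly_cost(model, minutes_per_day=600))
```

A comparable estimate for the token-priced GPT-Realtime-2 would need measured token counts per session, which the article does not provide.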

The Zillow benchmark data, while drawn from a single company's internal testing rather than an independent evaluation, offers one early signal on performance improvement. A 26-percentage-point increase in call success rate on adversarial test cases — scenarios specifically designed to challenge the model — is a meaningful result for teams weighing whether to move from prototype to production deployment.
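Because "percentage points" and "percent" are easy to confuse, the sketch below works through the reported figures (69% before, 95% after). The relative lift and the drop in failed calls are derived here from those two numbers; they are not figures reported by Zillow or OpenAI.

```python
from typing import Dict


def lift_summary(before_pct: float, after_pct: float) -> Dict[str, float]:
    """Summarize a change in success rate, with inputs given in percent (0-100)."""
    if not (0 < before_pct < 100 and 0 <= after_pct <= 100):
        raise ValueError("before_pct must be in (0, 100) and after_pct in [0, 100]")
    point_lift = after_pct - before_pct                # percentage points
    relative_lift = point_lift / before_pct * 100      # percent of the old success rate
    failure_before = 100 - before_pct
    failure_after = 100 - after_pct
    # Share of previously failing calls that no longer fail.
    failure_reduction = (failure_before - failure_after) / failure_before * 100
    return {
        "point_lift": round(point_lift, 1),
        "relative_lift_pct": round(relative_lift, 1),
        "failure_reduction_pct": round(failure_reduction, 1),
    }


summary = lift_summary(69, 95)
print(summary)
```

Read this way, the same result is a 26-point gain in success rate and, equivalently, a failure rate that falls from 31% to 5% on that benchmark.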

The multilingual translation capability also matters for applications with globally distributed user bases. Real-time translation from more than 70 input languages into 13 output languages is practical for businesses operating across linguistic markets, whether in customer support, live events, or educational platforms serving international audiences.

What's Next

OpenAI has made all three models available for developer testing in the OpenAI Playground as of May 7, 2026. The company has not announced a specific timeline for additional language support, expanded output language coverage for GPT-Realtime-Translate, or further capability updates to the models.

What is clear is that OpenAI is building toward a voice AI infrastructure capable of serving its existing user base at scale. Whether the new API models translate into widespread enterprise adoption will depend on how developers integrate them into production systems — and how the models perform outside of the early-adopter environments OpenAI has highlighted at launch.

Voice AI, Productivity, and What It Means for You

The shift toward voice-native software interfaces has direct implications for how people work, learn, and manage daily tasks. Real-time transcription tools like GPT-Realtime-Whisper can reduce the friction of capturing meeting notes and documentation. Voice-to-action models like GPT-Realtime-2 point toward a future where completing multi-step tasks (booking a trip, searching for a home, resolving a support issue) requires only a spoken conversation rather than a sequence of screens and menus. For anyone tracking how AI is reshaping personal productivity and health management workflows, these developments are worth watching closely.
