Use the GPT Realtime API via WebRTC
This article refers to the Microsoft Foundry (new) portal.
WebRTC is the recommended choice for real-time audio streaming from a client such as a web browser, for the following reasons:
- Lower latency: WebRTC is designed to minimize delay, making it more suitable for audio and video communication where low latency is critical for maintaining quality and synchronization.
- Media handling: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
- Error correction: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining the quality of audio streams over unpredictable networks.
- Peer-to-peer communication: WebRTC allows direct communication between clients, reducing the need for a central server to relay audio data, which can further reduce latency.
In contrast, consider the Realtime API over WebSockets if you need to:
- Stream audio data from a server to a client.
- Send and receive data in real time between a client and server.
Supported models
You can access the GPT real-time models for global deployments in the East US 2 and Sweden Central regions:
- gpt-4o-mini-realtime-preview (version 2024-12-17)
- gpt-4o-realtime-preview (version 2024-12-17)
- gpt-realtime (version 2025-08-28)
- gpt-realtime-mini (version 2025-10-06)
- gpt-realtime-mini-2025-12-15 (version 2025-12-15)
Use API version 2025-08-28 in the URL for the Realtime API. The API version is included in the sessions URL.
For more information about supported models, see the models and versions documentation.
Use the GA protocol for WebRTC. You can still use the beta protocol, but we recommend that you start with the GA protocol. If you're a current customer, plan to migrate to the GA protocol. This article describes how to use WebRTC with the GA protocol. We preserve the legacy protocol documentation here.
Prerequisites
Before you can use GPT real-time audio, you need:
- An Azure subscription - Create one for free.
- A Microsoft Foundry resource - Create a Microsoft Foundry resource in one of the supported regions.
- A deployment of the gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, gpt-realtime-mini, or gpt-realtime-mini-2025-12-15 model in a supported region, as described in the supported models section in this article.
  - In the Foundry portal, load your project. Select Build in the upper-right menu, then select the Models tab on the left pane, and select Deploy a base model. Search for the model you want, and select Deploy on the model page.
Set up WebRTC
To use WebRTC, you need two pieces of code:
- A web browser application.
- A service where your web browser can retrieve an ephemeral token.
Optionally, the service can also:
- Proxy the web browser's session negotiation via Session Description Protocol through the same service retrieving the ephemeral token. This scenario is more secure because the web browser doesn't have access to the ephemeral token.
- Filter the messages going to the web browser by using a query parameter.
- Create an observer WebSocket connection to listen to or record the session.
Steps
Step 1: Set up a service to procure an ephemeral token
The key to generating an ephemeral token is the sessions REST API. Replace placeholder values in the code samples (see the sketch after the table below):
- <your azure resource> or <YOUR AZURE RESOURCE> - Your Azure OpenAI resource name
- <your model deployment name> or <YOUR MODEL DEPLOYMENT NAME> - Your realtime model deployment name
| Field | Required | Description |
|---|---|---|
| session.type | Yes | Must be realtime |
| session.model | Yes | Your model deployment name |
| session.instructions | No | System prompt for the assistant |
| session.audio.output.voice | No | Voice for audio output: alloy, ash, ballad, coral, echo, sage, shimmer, or verse |
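The following is a minimal sketch of such a token service in TypeScript (Node 18+ with Express). The sessions URL path, the nesting of the session.* fields under a top-level session object, and the response shape are assumptions for illustration; confirm them against the Realtime API reference before relying on them.

```typescript
// token_service.ts — minimal sketch of an ephemeral-token service (Node 18+, Express).
// The sessions URL path below is an assumption for illustration; verify it against the
// Realtime API reference. The API version (2025-08-28) comes from this article.
import express from "express";

const app = express();

// Replace with your values (see the placeholder list above).
const AZURE_RESOURCE = process.env.AZURE_RESOURCE ?? "<YOUR AZURE RESOURCE>";
const DEPLOYMENT = process.env.MODEL_DEPLOYMENT ?? "<YOUR MODEL DEPLOYMENT NAME>";
const API_KEY = process.env.AZURE_OPENAI_API_KEY ?? "";

// Assumed sessions URL for the resource; adjust the path to match the API reference.
const SESSIONS_URL =
  `https://${AZURE_RESOURCE}.openai.azure.com/openai/realtimeapi/sessions?api-version=2025-08-28`;

app.get("/token", async (_req, res) => {
  try {
    // Request an ephemeral token, passing the session fields described in the table above.
    // Assumption: the session.* fields nest under a top-level "session" object.
    const response = await fetch(SESSIONS_URL, {
      method: "POST",
      headers: { "api-key": API_KEY, "Content-Type": "application/json" },
      body: JSON.stringify({
        session: {
          type: "realtime",
          model: DEPLOYMENT,
          instructions: "You are a helpful assistant.",
          audio: { output: { voice: "verse" } },
        },
      }),
    });
    if (!response.ok) {
      res.status(response.status).send(await response.text());
      return;
    }
    // Return the session JSON (which contains the ephemeral token) to the browser.
    res.json(await response.json());
  } catch (err) {
    res.status(500).send(String(err));
  }
});

app.listen(8080, () => console.log("Token service listening on http://localhost:8080"));
```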
Step 2: Set up your browser application
Your browser application calls your token service to get the token and then initiates a WebRTC connection with the Realtime API. To initiate the WebRTC connection, use the WebRTC endpoint URL for your region with the ephemeral token for authentication. Once connected, the data channel delivers Realtime API events such as the following (see the sketch after this list):
- input_audio_buffer.speech_started
- input_audio_buffer.speech_stopped
- output_audio_buffer.started
- output_audio_buffer.stopped
- conversation.item.input_audio_transcription.completed
- conversation.item.added
- conversation.item.created
- response.output_text.delta
- response.output_text.done
- response.output_audio_transcript.delta
- response.output_audio_transcript.done
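Here's a minimal sketch of the browser side in TypeScript. It assumes the token service from step 1 is reachable at /token and returns the ephemeral token under client_secret.value (adjust to whatever your service actually returns); WEBRTC_URL is a placeholder for your region's WebRTC endpoint URL, and the Authorization header and application/sdp content type are assumptions to verify against the Realtime API reference.

```typescript
// webrtc_client.ts — minimal sketch of the browser side.
// WEBRTC_URL is a placeholder; substitute the WebRTC endpoint URL for your region.
// The token-response shape (client_secret.value) is an assumption — adjust as needed.
const WEBRTC_URL = "<YOUR WEBRTC ENDPOINT URL>";

async function startSession(): Promise<void> {
  // 1. Get an ephemeral token from your own token service (never expose the API key).
  const tokenResponse = await fetch("/token");
  const session = await tokenResponse.json();
  const ephemeralToken: string = session.client_secret?.value ?? session.value;

  // 2. Create the peer connection and play the model's audio when it arrives.
  const pc = new RTCPeerConnection();
  console.log("✅ RTCPeerConnection created");

  const audioElement = document.createElement("audio");
  audioElement.autoplay = true;
  document.body.appendChild(audioElement);
  pc.ontrack = (event) => {
    audioElement.srcObject = event.streams[0];
    console.log("🎵 Audio playback started");
  };

  // 3. Capture the microphone and send it to the model.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getTracks()[0], mic);
  console.log("✅ Microphone access granted");

  // 4. Open a data channel for Realtime API events (see the event list above).
  const dataChannel = pc.createDataChannel("realtime-channel");
  dataChannel.onopen = () => console.log("✅ Data channel is open");
  dataChannel.onmessage = (event) => console.log("event:", JSON.parse(event.data));

  // 5. Exchange SDP with the Realtime API, authenticating with the ephemeral token.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const sdpResponse = await fetch(WEBRTC_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${ephemeralToken}`, "Content-Type": "application/sdp" },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await sdpResponse.text() });
}

startSession().catch(console.error);
```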
When the connection succeeds, the browser console shows status messages like the following:
- ✅ RTCPeerConnection created
- ✅ Microphone access granted
- ✅ Data channel is open
- 🎵 Audio playback started
The data channel also delivers response.output_audio_transcript.done events with the transcribed response.
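For example, you might handle the data channel messages like this (a sketch that assumes the dataChannel from the previous example; the payload field names delta and transcript should be verified against the Realtime API reference):

```typescript
// Sketch: react to Realtime API events arriving on the data channel (from the previous sketch).
// Payload field names (delta, transcript) are assumptions to verify against the API reference.
function handleRealtimeEvents(dataChannel: RTCDataChannel): void {
  dataChannel.onmessage = (event: MessageEvent) => {
    const message = JSON.parse(event.data);
    switch (message.type) {
      case "input_audio_buffer.speech_started":
        console.log("🎤 Speech started");
        break;
      case "response.output_audio_transcript.delta":
        console.log("partial transcript:", message.delta); // incremental text of the spoken reply
        break;
      case "response.output_audio_transcript.done":
        console.log("Assistant said:", message.transcript); // full transcript of the reply
        break;
    }
  };
}
```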
Reference: RTCPeerConnection, Realtime API events
Step 3 (optional): Create a WebSocket observer/controller
If you proxy the session negotiation through your service application, you can parse the Location header that's returned and use it to create a WebSocket connection to the WebRTC call. This connection can record the WebRTC call and even control it by issuing session.update events and other commands directly. Here's an updated version of the token_service shown earlier, now with a /connect endpoint that you can use to both get the ephemeral token and negotiate the session initiation. It also includes a WebSocket connection that listens to the WebRTC session.
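The original token_service code isn't reproduced here, but the following TypeScript sketch illustrates the shape of such a /connect endpoint: it forwards the browser's SDP offer with the ephemeral token, reads the Location header from the answer, and opens an observer WebSocket to it. The endpoint URL, header names, and Location-header handling are assumptions to verify against the Realtime API reference.

```typescript
// connect_proxy.ts — sketch of proxied session negotiation plus an observer WebSocket (Node 18+).
// The endpoint URL, header names, and Location-header handling are illustrative assumptions.
import express from "express";
import WebSocket from "ws";

const app = express();
app.use(express.text({ type: "application/sdp" }));

const WEBRTC_URL = "<YOUR WEBRTC ENDPOINT URL>"; // placeholder, same as in step 2

// Hypothetical helper: reuse the sessions request from step 1 to get an ephemeral token.
async function getEphemeralToken(): Promise<string> {
  throw new Error("Implement using the sessions request from step 1");
}

app.post("/connect", async (req, res) => {
  // 1. Get the ephemeral token server-side so the browser never sees it.
  const ephemeralToken = await getEphemeralToken();

  // 2. Forward the browser's SDP offer to the Realtime API.
  const sdpResponse = await fetch(WEBRTC_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${ephemeralToken}`, "Content-Type": "application/sdp" },
    body: req.body,
  });

  // 3. The returned Location header identifies the call; use it to attach an observer WebSocket.
  const location = sdpResponse.headers.get("location");
  if (location) {
    // Assumption: the Location value is an absolute URL that can be reused over WebSocket.
    const observer = new WebSocket(location.replace(/^http/, "ws"), {
      headers: { Authorization: `Bearer ${ephemeralToken}` },
    });
    observer.on("message", (data) => console.log("observed event:", data.toString()));
    // The observer can also control the call, for example by sending session.update events:
    // observer.send(JSON.stringify({ type: "session.update", session: { instructions: "..." } }));
  }

  // 4. Return the SDP answer to the browser to complete negotiation.
  res.type("application/sdp").send(await sdpResponse.text());
});

app.listen(8080, () => console.log("Connect proxy listening on http://localhost:8080"));
```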
Troubleshooting
Authentication errors
- 401 Unauthorized: Verify your API key or Microsoft Entra ID token is valid. Ensure the identity has the Cognitive Services User role assigned on the Azure OpenAI resource.
- 403 Forbidden: Check that your resource is deployed in a supported region (East US 2 or Sweden Central).
Connection errors
- WebRTC connection failed: Ensure your browser supports WebRTC and allows microphone access. Check that you're using HTTPS (required for getUserMedia).
- Data channel not opening: Check the browser console for ICE connection state errors (see the snippet after this list). Verify the ephemeral token hasn't expired.
- SDP exchange failed: Verify the WebRTC endpoint URL is correct and the ephemeral token is valid.
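To make ICE connection state errors easier to spot, you can log state transitions on the peer connection from step 2 (a small, standard WebRTC snippet, not specific to this API; pc is the RTCPeerConnection created earlier):

```typescript
// Log connection-state transitions to help diagnose failed or stalled sessions.
pc.oniceconnectionstatechange = () =>
  console.log("ICE connection state:", pc.iceConnectionState);
pc.onconnectionstatechange = () =>
  console.log("Peer connection state:", pc.connectionState);
```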
Model errors
- Model not found: Verify your deployment name matches exactly (case-sensitive). Ensure you've deployed a realtime model (gpt-4o-realtime-preview, gpt-realtime, etc.).
- Quota exceeded: Check your Azure OpenAI quota in the Azure portal. The Realtime API has separate quota from chat completions.
Audio issues
- No audio output: Check that audioElement.autoplay = true is set and browser autoplay policies aren't blocking playback. Try clicking the page first to enable audio (see the snippet after this list).
- Poor audio quality: WebRTC automatically adjusts for network conditions. Check your network connection and try reducing other network traffic.
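If autoplay is blocked, one common pattern (assuming the audioElement from step 2) is to retry playback on the first user gesture:

```typescript
// Browsers may block autoplay until the user interacts with the page;
// retry playback on the first click as a fallback.
audioElement.autoplay = true;
document.addEventListener(
  "click",
  () => audioElement.play().catch((err) => console.warn("Playback still blocked:", err)),
  { once: true },
);
```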
Related content
- Try the real-time audio quickstart
- See the Realtime API reference
- Learn more about Azure OpenAI quotas and limits