GPT Realtime API for speech and audio
Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, “speech in, speech out” conversational interactions.
You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.
Follow the instructions in this article to get started with the Realtime API via WebSockets. Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn’t a requirement.
In most cases, use the Realtime API via WebRTC for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most scenarios.
Supported models
The GPT real-time models are available for global deployments.
- gpt-4o-realtime-preview (version 2024-12-17)
- gpt-4o-mini-realtime-preview (version 2024-12-17)
- gpt-realtime (version 2025-08-28)
- gpt-realtime-mini (version 2025-10-06)
- gpt-realtime-mini-2025-12-15 (version 2025-12-15)
For more information, see the models and versions documentation.
API support
Support for the Realtime API was first added in API version 2024-10-01-preview (retired). Use version 2025-08-28 to access the latest Realtime API features. We recommend that you select the generally available (GA) API version (without the -preview suffix) when possible.
You need to use different endpoint formats for Preview and generally available (GA) models. All samples in this article use GA models and the GA endpoint format, and they don't use the api-version parameter, which is required only for the Preview endpoint format. See detailed information on the endpoint format in this article.
The Realtime API has specific rate limits for audio tokens and concurrent sessions. Before deploying to production, review Azure OpenAI quotas and limits for your deployment type.
Prerequisites
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Deploy a model for real-time audio
To deploy the gpt-realtime model in the Microsoft Foundry portal:
- Go to the Foundry portal and create or select your project.
- Select your model deployments:
- For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
- For a Foundry resource, select Models + endpoints under My assets in the left pane.
- Select + Deploy model > Deploy base model to open the deployment window.
- Search for and select the gpt-realtime model and then select Confirm.
- Review the deployment details and select Deploy.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or by using the Realtime API.
Set up
- Create a new folder realtime-audio-quickstart-js and go to the quickstart folder with the following command:
mkdir realtime-audio-quickstart-js && cd realtime-audio-quickstart-js
- Create the package.json with the following command:
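For example, using npm defaults:
npm init -y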
- Update the type to module in package.json with the following command:
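For example, using npm:
npm pkg set type=module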
- Install the OpenAI client library for JavaScript with:
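For example:
npm install openai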
- Install the dependent packages used by the OpenAI client library for JavaScript with:
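For example, the ws WebSocket package used by the realtime WebSocket client:
npm install ws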
- For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
npm install @azure/identity
You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:
Microsoft Entra ID
API key
| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about keyless authentication and setting environment variables.

| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_API_KEY | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about finding API keys and setting environment variables.
To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn’t set.
Send text, receive audio response
Microsoft Entra ID
API key
- Create the index.js file with the following code:
import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
let isCreated = false;
let isConfigured = false;
let responseDone = false;
// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;
async function main() {
// The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
// environment variable or replace the default value below.
// You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
// Example: https://{your-resource}.openai.azure.com
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
// The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
// Example: gpt-realtime
const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
// Keyless authentication
const credential = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
const token = await azureADTokenProvider();
// The APIs are compatible with the OpenAI client library.
// You can use the OpenAI client library to access the Azure OpenAI APIs.
// Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
const openAIClient = new OpenAI({
baseURL: baseUrl,
apiKey: token,
});
const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, {
model: deploymentName
});
realtimeClient.on('error', (receivedError) => receiveError(receivedError));
realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
console.log('Waiting for events...');
while (!isCreated) {
console.log('Waiting for session.created event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is created, configure it to enable audio input and output.
const sessionConfig = {
'type': 'realtime',
'instructions': 'You are a helpful assistant. You respond by voice and text.',
'output_modalities': ['audio'],
'audio': {
'input': {
'transcription': {
'model': 'whisper-1'
},
'format': {
'type': 'audio/pcm',
'rate': 24000,
},
'turn_detection': {
'type': 'server_vad',
'threshold': 0.5,
'prefix_padding_ms': 300,
'silence_duration_ms': 200,
'create_response': true
}
},
'output': {
'voice': 'alloy',
'format': {
'type': 'audio/pcm',
'rate': 24000,
}
}
}
};
realtimeClient.send({
'type': 'session.update',
'session': sessionConfig
});
while (!isConfigured) {
console.log('Waiting for session.updated event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is configured, data can be sent to the session.
realtimeClient.send({
'type': 'conversation.item.create',
'item': {
'type': 'message',
'role': 'user',
'content': [{
type: 'input_text',
text: 'Please assist the user.'
}
]
}
});
realtimeClient.send({
type: 'response.create'
});
// While waiting for the session to finish, the events can be handled in the event handlers.
// In this example, we just wait for the first response.done event.
while (!responseDone) {
console.log('Waiting for response.done event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
console.log('The sample completed successfully.');
realtimeClient.close();
}
function receiveError(err) {
if (err instanceof OpenAIRealtimeError) {
console.error('Received an error event.');
console.error(`Message: ${err.cause.message}`);
console.error(`Stack: ${err.cause.stack}`);
}
if (throwOnError) {
throw err;
}
}
function receiveEvent(event) {
console.log(`Received an event: ${event.type}`);
switch (event.type) {
case 'session.created':
console.log(`Session ID: ${event.session.id}`);
isCreated = true;
break;
case 'session.updated':
console.log(`Session ID: ${event.session.id}`);
isConfigured = true;
break;
case 'response.output_audio_transcript.delta':
console.log(`Transcript delta: ${event.delta}`);
break;
case 'response.output_audio.delta':
let audioBuffer = Buffer.from(event.delta, 'base64');
console.log(`Audio delta length: ${audioBuffer.length} bytes`);
break;
case 'response.done':
console.log(`Response ID: ${event.response.id}`);
console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
responseDone = true;
break;
default:
console.warn(`Unhandled event type: ${event.type}`);
}
}
main().catch((err) => {
console.error('The sample encountered an error:', err);
});
export {
main
};
- Sign in to Azure with the following command:
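For example, using the Azure CLI:
az login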
- Run the JavaScript file.
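For example:
node index.js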
- Create the index.js file with the following code:
import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
let isCreated = false;
let isConfigured = false;
let responseDone = false;
// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;
async function main() {
// The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
// Example: https://{your-resource}.openai.azure.com
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
// The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
// Example: gpt-realtime
const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
// API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
const token = process.env.AZURE_OPENAI_API_KEY || '<Your API Key>';
// The APIs are compatible with the OpenAI client library.
// You can use the OpenAI client library to access the Azure OpenAI APIs.
// Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
const openAIClient = new OpenAI({
baseURL: baseUrl,
apiKey: token,
});
// Due to the current SDK limitation we need to explicitly
// pass API key as Header
const realtimeClient = await OpenAIRealtimeWS.create(
openAIClient, {
model: deploymentName,
options: {
headers: {
"api-key": token
}
}
});
realtimeClient.on('error', (receivedError) => receiveError(receivedError));
realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
console.log('Waiting for events...');
while (!isCreated) {
console.log('Waiting for session.created event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is created, configure it to enable audio input and output.
const sessionConfig = {
'type': 'realtime',
'instructions': 'You are a helpful assistant. You respond by voice and text.',
'output_modalities': ['audio'],
'audio': {
'input': {
'transcription': {
'model': 'whisper-1'
},
'format': {
'type': 'audio/pcm',
'rate': 24000,
},
'turn_detection': {
'type': 'server_vad',
'threshold': 0.5,
'prefix_padding_ms': 300,
'silence_duration_ms': 200,
'create_response': true
}
},
'output': {
'voice': 'alloy',
'format': {
'type': 'audio/pcm',
'rate': 24000,
}
}
}
};
realtimeClient.send({
'type': 'session.update',
'session': sessionConfig
});
while (!isConfigured) {
console.log('Waiting for session.updated event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is configured, data can be sent to the session.
realtimeClient.send({
'type': 'conversation.item.create',
'item': {
'type': 'message',
'role': 'user',
'content': [{
type: 'input_text',
text: 'Please assist the user.'
}
]
}
});
realtimeClient.send({
type: 'response.create'
});
// While waiting for the session to finish, the events can be handled in the event handlers.
// In this example, we just wait for the first response.done event.
while (!responseDone) {
console.log('Waiting for response.done event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
console.log('The sample completed successfully.');
realtimeClient.close();
}
function receiveError(err) {
if (err instanceof OpenAIRealtimeError) {
console.error('Received an error event.');
console.error(`Message: ${err.cause.message}`);
console.error(`Stack: ${err.cause.stack}`);
}
if (throwOnError) {
throw err;
}
}
function receiveEvent(event) {
console.log(`Received an event: ${event.type}`);
switch (event.type) {
case 'session.created':
console.log(`Session ID: ${event.session.id}`);
isCreated = true;
break;
case 'session.updated':
console.log(`Session ID: ${event.session.id}`);
isConfigured = true;
break;
case 'response.output_audio_transcript.delta':
console.log(`Transcript delta: ${event.delta}`);
break;
case 'response.output_audio.delta':
let audioBuffer = Buffer.from(event.delta, 'base64');
console.log(`Audio delta length: ${audioBuffer.length} bytes`);
break;
case 'response.done':
console.log(`Response ID: ${event.response.id}`);
console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
responseDone = true;
break;
default:
console.warn(`Unhandled event type: ${event.type}`);
}
}
main().catch((err) => {
console.error('The sample encountered an error:', err);
});
export {
main
};
- Run the JavaScript file.
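For example:
node index.js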
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output will look similar to the following:
Waiting for events...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta: I
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: 'm
Received an event: response.output_audio_transcript.delta
Transcript delta: here
Received an event: response.output_audio_transcript.delta
Transcript delta: to
Received an event: response.output_audio_transcript.delta
Transcript delta: help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: What
Received an event: response.output_audio_transcript.delta
Transcript delta: do
Received an event: response.output_audio_transcript.delta
Transcript delta: you
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: need
Received an event: response.output_audio_transcript.delta
Transcript delta: assistance
Received an event: response.output_audio_transcript.delta
Transcript delta: with
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 28800 bytes
Received an event: response.done
Response ID: resp_CQx8YwQCszDqSUXRutxP9
The final response is: Sure, I'm here to help. What do you need assistance with?
The sample completed successfully.
Prerequisites
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Deploy a model for real-time audio
To deploy the gpt-realtime model in the Microsoft Foundry portal:
- Go to the Foundry portal and create or select your project.
- Select your model deployments:
- For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
- For a Foundry resource, select Models + endpoints under My assets in the left pane.
- Select + Deploy model > Deploy base model to open the deployment window.
- Search for and select the gpt-realtime model and then select Confirm.
- Review the deployment details and select Deploy.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or by using the Realtime API.
Set up
- Create a new folder realtime-audio-quickstart-py and go to the quickstart folder with the following command:
mkdir realtime-audio-quickstart-py && cd realtime-audio-quickstart-py
- Create a virtual environment. If you already have Python 3.10 or higher installed, you can create a virtual environment using the following commands:
Windows:
py -3 -m venv .venv
.venv\scripts\activate
macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and you can reactivate it later when needed.
We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when installing Python packages; otherwise you can break your global installation of Python.
- Install the OpenAI Python client library with:
pip install openai[realtime]
This library is maintained by OpenAI. Refer to the release history to track the latest updates to the library.
- For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:
pip install azure-identity
You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:
Microsoft Entra ID
API key
| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about keyless authentication and setting environment variables.

| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_API_KEY | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about finding API keys and setting environment variables.
To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn’t set.
Send text, receive audio response
Microsoft Entra ID
API key
- Create the text-in-audio-out.py file with the following code:
import os
import base64
import asyncio
from openai import AsyncOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
async def main() -> None:
"""
When prompted for user input, type a message and hit enter to send it to the model.
Enter "q" to quit the conversation.
"""
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
token = token_provider()
# The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
# environment variable.
# You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
# Example: https://{your-resource}.openai.azure.com
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
# The deployment name of the model you want to use is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
# environment variable.
# You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
# Example: gpt-realtime
deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
base_url = endpoint.replace("https://", "wss://").rstrip("/") + "/openai/v1"
# The APIs are compatible with the OpenAI client library.
# You can use the OpenAI client library to access the Azure OpenAI APIs.
# Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
client = AsyncOpenAI(
websocket_base_url=base_url,
api_key=token
)
async with client.realtime.connect(
model=deployment_name,
) as connection:
# after the connection is created, configure the session.
await connection.session.update(session={
"type": "realtime",
"instructions": "You are a helpful assistant. You respond by voice and text.",
"output_modalities": ["audio"],
"audio": {
"input": {
"transcription": {
"model": "whisper-1",
},
"format": {
"type": "audio/pcm",
"rate": 24000,
},
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 200,
"create_response": True,
}
},
"output": {
"voice": "alloy",
"format": {
"type": "audio/pcm",
"rate": 24000,
}
}
}
})
# After the session is configured, data can be sent to the session.
while True:
user_input = input("Enter a message: ")
if user_input == "q":
print("Stopping the conversation.")
break
await connection.conversation.item.create(
item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": user_input}],
}
)
await connection.response.create()
async for event in connection:
if event.type == "response.output_text.delta":
print(event.delta, flush=True, end="")
elif event.type == "session.created":
print(f"Session ID: {event.session.id}")
elif event.type == "response.output_audio.delta":
audio_data = base64.b64decode(event.delta)
print(f"Received {len(audio_data)} bytes of audio data.")
elif event.type == "response.output_audio_transcript.delta":
print(f"Received text delta: {event.delta}")
elif event.type == "response.output_text.done":
print()
elif event.type == "error":
print("Received an error event.")
print(f"Error code: {event.error.code}")
print(f"Error Event ID: {event.error.event_id}")
print(f"Error message: {event.error.message}")
elif event.type == "response.done":
break
print("Conversation ended.")
credential.close()
asyncio.run(main())
- Sign in to Azure with the following command:
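For example, using the Azure CLI:
az login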
- Run the Python file.
python text-in-audio-out.py
- When prompted for user input, type a message and hit enter to send it to the model. Enter "q" to quit the conversation.
- Create the text-in-audio-out.py file with the following code:
import os
import base64
import asyncio
from openai import AsyncOpenAI
async def main() -> None:
"""
When prompted for user input, type a message and hit enter to send it to the model.
Enter "q" to quit the conversation.
"""
# The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
# environment variable.
# You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
# Example: https://{your-resource}.openai.azure.com
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
base_url = endpoint.replace("https://", "wss://").rstrip("/") + "/openai/v1"
# The deployment name of the model you want to use is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
# environment variable.
# You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
# Example: gpt-realtime
deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
# API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
# environment variable.
# You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
token = os.environ["AZURE_OPENAI_API_KEY"]
# The APIs are compatible with the OpenAI client library.
# You can use the OpenAI client library to access the Azure OpenAI APIs.
# Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
client = AsyncOpenAI(
websocket_base_url=base_url,
api_key=token
)
async with client.realtime.connect(
model=deployment_name,
) as connection:
# after the connection is created, configure the session.
await connection.session.update(session={
"type": "realtime",
"instructions": "You are a helpful assistant. You respond by voice and text.",
"output_modalities": ["audio"],
"audio": {
"input": {
"transcription": {
"model": "whisper-1",
},
"format": {
"type": "audio/pcm",
"rate": 24000,
},
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 200,
"create_response": True,
}
},
"output": {
"voice": "alloy",
"format": {
"type": "audio/pcm",
"rate": 24000,
}
}
}
})
# After the session is configured, data can be sent to the session.
while True:
user_input = input("Enter a message: ")
if user_input == "q":
print("Stopping the conversation.")
break
await connection.conversation.item.create(
item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": user_input}],
}
)
await connection.response.create()
async for event in connection:
if event.type == "response.output_text.delta":
print(event.delta, flush=True, end="")
elif event.type == "session.created":
print(f"Session ID: {event.session.id}")
elif event.type == "response.output_audio.delta":
audio_data = base64.b64decode(event.delta)
print(f"Received {len(audio_data)} bytes of audio data.")
elif event.type == "response.output_audio_transcript.delta":
print(f"Received text delta: {event.delta}")
elif event.type == "response.output_text.done":
print()
elif event.type == "error":
print("Received an error event.")
print(f"Error code: {event.error.code}")
print(f"Error Event ID: {event.error.event_id}")
print(f"Error message: {event.error.message}")
elif event.type == "response.done":
break
print("Conversation ended.")
asyncio.run(main())
- Run the Python file.
python text-in-audio-out.py
- When prompted for user input, type a message and hit enter to send it to the model. Enter "q" to quit the conversation.
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output looks similar to the following:
Enter a message: How are you today?
Session ID: sess_CgAuonaqdlSNNDTdqBagI
Received text delta: I'm
Received text delta: doing
Received text delta: well
Received text delta: ,
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta: thank
Received text delta: you
Received text delta: for
Received text delta: asking
Received text delta: !
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta: How
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta: about
Received text delta: you
Received text delta: —
Received text delta: how
Received text delta: are
Received text delta: you
Received text delta: feeling
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
Enter a message: q
Stopping the conversation.
Conversation ended.
Prerequisites
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Deploy a model for real-time audio
To deploy the gpt-realtime model in the Microsoft Foundry portal:
- Go to the Foundry portal and create or select your project.
- Select your model deployments:
- For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
- For a Foundry resource, select Models + endpoints under My assets in the left pane.
- Select + Deploy model > Deploy base model to open the deployment window.
- Search for and select the gpt-realtime model and then select Confirm.
- Review the deployment details and select Deploy.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or by using the Realtime API.
Set up
- Create a new folder realtime-audio-quickstart-ts and go to the quickstart folder with the following command:
mkdir realtime-audio-quickstart-ts && cd realtime-audio-quickstart-ts
- Create the package.json with the following command:
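For example, using npm defaults:
npm init -y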
- Update the package.json to use ECMAScript modules with the following command:
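For example, using npm:
npm pkg set type=module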
- Install the OpenAI client library for JavaScript with:
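For example:
npm install openai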
- Install the dependent packages used by the OpenAI client library for JavaScript with:
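For example, the ws WebSocket package used by the realtime WebSocket client:
npm install ws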
- For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
npm install @azure/identity
You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:
Microsoft Entra ID
API key
| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about keyless authentication and setting environment variables.

| Variable name | Value |
|---|---|
| AZURE_OPENAI_ENDPOINT | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| AZURE_OPENAI_API_KEY | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2. |
| AZURE_OPENAI_DEPLOYMENT_NAME | This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |

Learn more about finding API keys and setting environment variables.
To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn’t set.
Send text, receive audio response
Microsoft Entra ID
API key
- Create the index.ts file with the following code:
import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import { RealtimeSessionCreateRequest } from 'openai/resources/realtime/realtime';
let isCreated = false;
let isConfigured = false;
let responseDone = false;
// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;
async function main(): Promise<void> {
// The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
// environment variable or replace the default value below.
// You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
// Example: https://{your-resource}.openai.azure.com
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
// The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
// Example: gpt-realtime
const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
// Keyless authentication
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);
const token = await azureADTokenProvider();
// The APIs are compatible with the OpenAI client library.
// You can use the OpenAI client library to access the Azure OpenAI APIs.
// Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
const openAIClient = new OpenAI({
baseURL: baseUrl,
apiKey: token,
});
const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, { model: deploymentName });
realtimeClient.on('error', (receivedError) => receiveError(receivedError));
realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
console.log('Waiting for events...');
while (!isCreated) {
console.log('Waiting for session.created event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is created, configure it to enable audio input and output.
const sessionConfig: RealtimeSessionCreateRequest = {
'type': 'realtime',
'instructions': 'You are a helpful assistant. You respond by voice and text.',
'output_modalities': ['audio'],
'audio': {
'input': {
'transcription': {
'model': 'whisper-1'
},
'format': {
'type': 'audio/pcm',
'rate': 24000,
},
'turn_detection': {
'type': 'server_vad',
'threshold': 0.5,
'prefix_padding_ms': 300,
'silence_duration_ms': 200,
'create_response': true
}
},
'output': {
'voice': 'alloy',
'format': {
'type': 'audio/pcm',
'rate': 24000,
}
}
}
};
realtimeClient.send({ 'type': 'session.update', 'session': sessionConfig });
while (!isConfigured) {
console.log('Waiting for session.updated event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is configured, data can be sent to the session.
realtimeClient.send({
'type': 'conversation.item.create',
'item': {
'type': 'message',
'role': 'user',
'content': [{ type: 'input_text', text: 'Please assist the user.' }]
}
});
realtimeClient.send({ type: 'response.create' });
// While waiting for the session to finish, the events can be handled in the event handlers.
// In this example, we just wait for the first response.done event.
while (!responseDone) {
console.log('Waiting for response.done event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
console.log('The sample completed successfully.');
realtimeClient.close();
}
function receiveError(errorEvent: OpenAIRealtimeError): void {
if (errorEvent instanceof OpenAIRealtimeError) {
console.error('Received an error event.');
console.error(`Message: ${errorEvent.message}`);
console.error(`Stack: ${errorEvent.stack}`);
}
if (throwOnError) {
throw errorEvent;
}
}
function receiveEvent(event: any): void {
console.log(`Received an event: ${event.type}`);
switch (event.type) {
case 'session.created':
console.log(`Session ID: ${event.session.id}`);
isCreated = true;
break;
case 'session.updated':
console.log(`Session ID: ${event.session.id}`);
isConfigured = true;
break;
case 'response.output_audio_transcript.delta':
console.log(`Transcript delta: ${event.delta}`);
break;
case 'response.output_audio.delta':
let audioBuffer = Buffer.from(event.delta, 'base64');
console.log(`Audio delta length: ${audioBuffer.length} bytes`);
break;
case 'response.done':
console.log(`Response ID: ${event.response.id}`);
console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
responseDone = true;
break;
default:
console.warn(`Unhandled event type: ${event.type}`);
}
}
main().catch((err) => {
console.error("The sample encountered an error:", err);
});
export { main };
- Create the tsconfig.json file to transpile the TypeScript code, and copy the following configuration for ECMAScript.
{
"compilerOptions": {
"module": "NodeNext",
"target": "ES2022", // Supports top-level await
"moduleResolution": "NodeNext",
"skipLibCheck": true, // Avoid type errors from node_modules
"strict": true // Enable strict type-checking options
},
"include": ["*.ts"]
}
- Install type definitions for Node:
npm i --save-dev @types/node
- Transpile from TypeScript to JavaScript.
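For example, using the TypeScript compiler:
npx tsc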
- Sign in to Azure with the following command:
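For example, using the Azure CLI:
az login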
- Run the code with the following command:
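For example:
node index.js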
- Create the index.ts file with the following code:
import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
import { RealtimeSessionCreateRequest } from 'openai/resources/realtime/realtime';
let isCreated = false;
let isConfigured = false;
let responseDone = false;
// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;
async function main(): Promise<void> {
// The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
// Example: https://{your-resource}.openai.azure.com
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';
// The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
// Example: gpt-realtime
const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';
// API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
// environment variable or replace the default value below.
// You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
const token = process.env.AZURE_OPENAI_API_KEY || '<Your API Key>';
// The APIs are compatible with the OpenAI client library.
// You can use the OpenAI client library to access the Azure OpenAI APIs.
// Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
const openAIClient = new OpenAI({
baseURL: baseUrl,
apiKey: token,
});
// Due to the current SDK limitation we need to explicitly
// pass API key as Header
const realtimeClient = await OpenAIRealtimeWS.create(
openAIClient, {
model: deploymentName,
options: {
headers: {
"api-key": token
}
}
});
realtimeClient.on('error', (receivedError) => receiveError(receivedError));
realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));
console.log('Waiting for events...');
while (!isCreated) {
console.log('Waiting for session.created event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is created, configure it to enable audio input and output.
const sessionConfig: RealtimeSessionCreateRequest = {
'type': 'realtime',
'instructions': 'You are a helpful assistant. You respond by voice and text.',
'output_modalities': ['audio'],
'audio': {
'input': {
'transcription': {
'model': 'whisper-1'
},
'format': {
'type': 'audio/pcm',
'rate': 24000,
},
'turn_detection': {
'type': 'server_vad',
'threshold': 0.5,
'prefix_padding_ms': 300,
'silence_duration_ms': 200,
'create_response': true
}
},
'output': {
'voice': 'alloy',
'format': {
'type': 'audio/pcm',
'rate': 24000,
}
}
}
};
realtimeClient.send({
'type': 'session.update',
'session': sessionConfig
});
while (!isConfigured) {
console.log('Waiting for session.updated event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
// After the session is configured, data can be sent to the session.
realtimeClient.send({
'type': 'conversation.item.create',
'item': {
'type': 'message',
'role': 'user',
'content': [{
type: 'input_text',
text: 'Please assist the user.'
}
]
}
});
realtimeClient.send({
type: 'response.create'
});
// While waiting for the session to finish, the events can be handled in the event handlers.
// In this example, we just wait for the first response.done event.
while (!responseDone) {
console.log('Waiting for response.done event...');
await new Promise((resolve) => setTimeout(resolve, 100));
}
console.log('The sample completed successfully.');
realtimeClient.close();
}
function receiveError(errorEvent: OpenAIRealtimeError): void {
if (errorEvent instanceof OpenAIRealtimeError) {
console.error('Received an error event.');
console.error(`Message: ${errorEvent.message}`);
console.error(`Stack: ${errorEvent.stack}`);
}
if (throwOnError) {
throw errorEvent;
}
}
function receiveEvent(event: any): void {
console.log(`Received an event: ${event.type}`);
switch (event.type) {
case 'session.created':
console.log(`Session ID: ${event.session.id}`);
isCreated = true;
break;
case 'session.updated':
console.log(`Session ID: ${event.session.id}`);
isConfigured = true;
break;
case 'response.output_audio_transcript.delta':
console.log(`Transcript delta: ${event.delta}`);
break;
case 'response.output_audio.delta':
let audioBuffer = Buffer.from(event.delta, 'base64');
console.log(`Audio delta length: ${audioBuffer.length} bytes`);
break;
case 'response.done':
console.log(`Response ID: ${event.response.id}`);
console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
responseDone = true;
break;
default:
console.warn(`Unhandled event type: ${event.type}`);
}
}
main().catch((err) => {
console.error("The sample encountered an error:", err);
});
export {
main
};
- Create the tsconfig.json file to transpile the TypeScript code, and copy the following configuration for ECMAScript.
{
"compilerOptions": {
"module": "NodeNext",
"target": "ES2022", // Supports top-level await
"moduleResolution": "NodeNext",
"skipLibCheck": true, // Avoid type errors from node_modules
"strict": true // Enable strict type-checking options
},
"include": ["*.ts"]
}
- Install type definitions for Node:
npm i --save-dev @types/node
- Transpile from TypeScript to JavaScript.
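For example, using the TypeScript compiler:
npx tsc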
- Run the code with the following command:
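For example:
node index.js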
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output will look similar to the following:
Waiting for events...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for session.updated event...
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta: I'm
Received an event: response.output_audio_transcript.delta
Transcript delta: here
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: to
Received an event: response.output_audio_transcript.delta
Transcript delta: help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: What
Received an event: response.output_audio_transcript.delta
Transcript delta: would
Received an event: response.output_audio_transcript.delta
Transcript delta: you
Received an event: response.output_audio_transcript.delta
Transcript delta: like
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: to
Received an event: response.output_audio_transcript.delta
Transcript delta: do
Received an event: response.output_audio_transcript.delta
Transcript delta: or
Received an event: response.output_audio_transcript.delta
Transcript delta: know
Received an event: response.output_audio_transcript.delta
Transcript delta: about
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 24000 bytes
Received an event: response.done
Response ID: resp_CWQkRBrCcCjtHgIEapA92
The final response is: Sure, I'm here to help. What would you like to do or know about?
The sample completed successfully.
Deploy a model for real-time audio
To deploy the gpt-realtime model in the Microsoft Foundry portal:
- Go to the Foundry portal and create or select your project.
- Select your model deployments:
- For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
- For a Foundry resource, select Models + endpoints under My assets in the left pane.
- Select + Deploy model > Deploy base model to open the deployment window.
- Search for and select the gpt-realtime model and then select Confirm.
- Review the deployment details and select Deploy.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the gpt-realtime model, you can interact with it in the Audio playground in the Foundry portal or by using the Realtime API.
Use the GPT real-time audio model
To chat with your deployed gpt-realtime model in the Microsoft Foundry Real-time audio playground, follow these steps:
- Go to the Foundry portal and select your project that has your deployed gpt-realtime model.
- Select Playgrounds from the left pane.
- Select Audio playground > Try the Audio playground.
The Chat playground doesn’t support the gpt-realtime model. Use the Audio playground as described in this section.
- Select your deployed gpt-realtime model from the Deployment dropdown.
- Optionally, you can edit contents in the Give the model instructions and context text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
- Optionally, change settings such as threshold, prefix padding, and silence duration.
- Select Start listening to start the session. You can speak into the microphone to start a chat.
- You can interrupt the chat at any time by speaking. You can end the chat by selecting the Stop listening button.
Troubleshooting
Authentication errors
If you’re using keyless authentication (Microsoft Entra ID) and receive authentication errors:
- Verify that the AZURE_OPENAI_API_KEY environment variable is not set. Keyless authentication fails if this variable exists.
- Confirm that you've run az login to authenticate with the Azure CLI.
- Check that your account has the Cognitive Services OpenAI User role assigned to the Azure OpenAI resource.
WebSocket connection failures
If the WebSocket connection fails to establish:
- Verify your endpoint URL format matches the GA format: {endpoint}/openai/v1 (without the api-version parameter).
- Check that your Azure OpenAI resource has a deployed gpt-realtime model.
- Ensure your network allows WebSocket connections on port 443.
Rate limit exceeded
If you receive rate limit errors:
- The Realtime API has specific quotas separate from chat completions.
- Check your current usage in the Azure portal under your Azure OpenAI resource.
- Implement exponential backoff for retry logic in your application; a minimal sketch follows at the end of this section.
For more information about quotas, see Azure OpenAI quotas and limits.
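As a reference point, retry logic with exponential backoff can look like the following minimal JavaScript sketch. It assumes a hypothetical operation callback that wraps your Realtime API call; adapt the delays and retry count to your own policy.
async function withExponentialBackoff(operation, maxRetries = 5) {
    // Retry the operation with exponentially increasing delays (1s, 2s, 4s, ...).
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return await operation();
        } catch (err) {
            if (attempt === maxRetries - 1) {
                throw err;
            }
            const delayMs = Math.pow(2, attempt) * 1000;
            console.warn(`Attempt ${attempt + 1} failed, retrying in ${delayMs} ms...`);
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
}
// Example usage with a hypothetical helper that opens the session and sends the request:
// await withExponentialBackoff(() => connectAndRun());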
Related content