Speech to text with Whisper - Microsoft Foundry Docs

In this quickstart, you transcribe speech to text using the Azure OpenAI Whisper model. The Whisper model can transcribe human speech in numerous languages and can translate speech in other languages into English text.

This quickstart takes approximately 10-15 minutes to complete.

For information about other audio models that you can use with Azure OpenAI, see Audio models.

The file size limit for the Whisper model is 25 MB. If you need to transcribe a file larger than 25 MB, you can use the Azure Speech in Foundry Tools batch transcription API.

Troubleshooting

Authentication errors

If you receive a 401 Unauthorized error, verify the following:

Your API key is correctly set in environment variables.
Your Azure OpenAI resource is active.
Your account has the Cognitive Services Contributor role.
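
The first check can be automated before any request is sent. The following is a minimal sketch that reports which expected environment variables are unset or blank. The variable names AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are assumptions; substitute the names your own code reads.

```python
import os

# Assumed variable names; replace them with the names your code actually reads.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or contain only whitespace."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

This confirms only that the values are present. It can't tell whether the key is valid for the resource.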

File format errors

The Whisper model supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats. Other formats return an error.
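
To catch an unsupported format before uploading, you can compare the file extension against the list above. This is a rough sketch: it inspects only the file name, not the audio data, so a mislabeled file still passes. The helper name has_supported_extension is hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def has_supported_extension(filename):
    """Return True if the file name ends with a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("interview.MP3", "interview.flac"):
        status = "supported" if has_supported_extension(name) else "not supported"
        print(f"{name}: {status}")
```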

File size limit

Audio files must be 25 MB or smaller. For larger files, use the Azure Speech batch transcription API.
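
A local size check avoids uploading a file that will be rejected. The sketch below assumes "25 MB" means 25 * 1024 * 1024 bytes. The service might count the limit differently, so treat files close to the limit with caution. The function names are hypothetical.

```python
import os

# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def size_ok(num_bytes, max_bytes=MAX_BYTES):
    """Return True if a size in bytes is within the limit."""
    return num_bytes <= max_bytes


def within_size_limit(path, max_bytes=MAX_BYTES):
    """Return True if the file at `path` is within the limit."""
    return size_ok(os.path.getsize(path), max_bytes)
```

For example, calling within_size_limit on the path of your audio file before the transcription request tells you whether to use batch transcription instead.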

Deployment not found

Verify your deployment name matches exactly what you created in Azure OpenAI Studio. Deployment names are case-sensitive.
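
Because matching is case-sensitive, a name that differs only in letter case or stray whitespace fails. The following sketch compares the name your code uses against a list of deployment names that you copy by hand from your resource. Both the helper and the example names are hypothetical.

```python
def diagnose_deployment_name(requested, existing):
    """Describe how `requested` relates to the deployment names in `existing`."""
    if requested in existing:
        return "exact match"
    normalized = requested.strip().lower()
    for name in existing:
        # A match here means only letter case or surrounding whitespace differs.
        if name.lower() == normalized:
            return f"near match: did you mean {name!r}?"
    return "no match"


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(diagnose_deployment_name("Whisper", ["whisper"]))
```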

Clean up resources

To clean up, you can delete the Azure OpenAI resource. Before you delete the resource, you must first delete any deployed models.

Next steps

To learn how to convert audio data to text in batches, see Create a batch transcription.
For more examples, check out the Azure OpenAI Samples GitHub repository.
