In this quickstart, you transcribe speech to text using the Azure OpenAI Whisper model. The Whisper model can transcribe human speech in numerous languages and translate other languages into English.
This quickstart takes approximately 10-15 minutes to complete.
For information about other audio models that you can use with Azure OpenAI, see Audio models.
The file size limit for the Whisper model is 25 MB. If you need to transcribe a file larger than 25 MB, you can use the Azure Speech in Foundry Tools batch transcription API.

Troubleshooting

Authentication errors

If you receive a 401 Unauthorized error, verify that:
  • Your API key is set correctly in your environment variables
  • Your Azure OpenAI resource is active
  • Your account has the Cognitive Services Contributor role
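
The first check in the list can be scripted. The following is a minimal sketch that reports which expected environment variables are unset or blank. The variable names AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are assumptions, not names confirmed by this article; substitute whatever names your own code reads.

```python
import os

# Assumed variable names (hypothetical); replace with the names your code reads.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or contain only whitespace."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```

This only confirms that the variables exist in the current process. It does not confirm that the key is valid for the resource, so a 401 error can still occur if the key was regenerated or belongs to a different resource.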

File format errors

The Whisper model supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats. Other formats return an error.
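
To catch an unsupported format before sending a request, you can compare the file extension against the list above. This is a minimal sketch; the function name has_supported_extension is hypothetical, and the check looks only at the extension, not at the actual audio encoding inside the file.

```python
from pathlib import Path

# Formats listed in this article as supported by the Whisper model.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def has_supported_extension(path):
    """Return True if the file extension (case-insensitive) is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("interview.WAV", "meeting.flac"):
        status = "supported" if has_supported_extension(name) else "not supported"
        print(f"{name}: {status}")
```

A file with a supported extension can still be rejected if its contents don't match the extension, so treat this as a quick first filter only.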

File size limit

Audio files must be 25 MB or smaller. For larger files, use the Azure Speech batch transcription API.
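
A local size check can avoid a failed upload. The sketch below assumes that 25 MB means 25 × 1024 × 1024 bytes; this article doesn't state the exact byte count, so adjust MAX_BYTES if the service behaves differently. The function names are hypothetical.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes. The article states only "25 MB".
MAX_BYTES = 25 * 1024 * 1024


def size_ok(num_bytes, max_bytes=MAX_BYTES):
    """Return True if a byte count is at or under the limit."""
    return num_bytes <= max_bytes


def file_within_limit(path, max_bytes=MAX_BYTES):
    """Return True if the file at `path` is at or under the limit."""
    return size_ok(os.path.getsize(path), max_bytes)
```

For example, file_within_limit("recording.mp3") returns False for a 30 MB file, which indicates that the batch transcription route is the better fit.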

Deployment not found

Verify your deployment name matches exactly what you created in Azure OpenAI Studio. Deployment names are case-sensitive.
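
Because the comparison is case-sensitive, a name that differs only in letter case or surrounding whitespace is a common cause of this error. The following sketch compares the name your code uses against a list of names that you copy by hand from your resource. Both the helper diagnose_deployment_name and the known_names list are hypothetical; the sketch doesn't call any Azure API.

```python
def diagnose_deployment_name(requested, known_names):
    """Describe how `requested` relates to the names in `known_names`."""
    if requested in known_names:
        return "exact match"
    # Look for names that differ only by letter case or surrounding whitespace.
    folded = requested.strip().casefold()
    near = [name for name in known_names if name.strip().casefold() == folded]
    if near:
        return f"no exact match; did you mean {near[0]!r}?"
    return "no match"


if __name__ == "__main__":
    # Hypothetical deployment names, copied manually from your resource.
    known = ["whisper", "my-Whisper-Deployment"]
    print(diagnose_deployment_name("my-whisper-deployment", known))
```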

Clean up resources

If you no longer need the Azure OpenAI resource, you can delete it. Before you delete the resource, you must first delete any deployed models.

Next steps