
What is Azure Translator document translation?

Document translation is a cloud-based machine translation feature of Azure Translator in Foundry Tools. Translate multiple and complex documents across all supported languages and dialects while preserving original document structure and data format. The Document translation API supports two translation processes:
  • Asynchronous batch translation supports the processing of multiple documents and large files. The batch translation process requires an Azure Blob storage account with storage containers for your source and translated documents.
  • Synchronous single-file translation processes one file per request. It doesn’t require an Azure Blob storage account. The translated document is returned directly to the calling client in the response.

Prerequisites

Asynchronous batch translation prerequisites

Before you start, you need:

Synchronous translation prerequisites

Before you start, you need:
Store subscription keys in a secure location such as Azure Key Vault, and avoid putting keys in source control.
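As a minimal sketch of keeping the key out of source code, the helper below reads it from an environment variable at run time. The variable name `TRANSLATOR_KEY` and the function name are hypothetical; in a deployed application, the variable could be populated from a secret store such as Azure Key Vault.

```python
import os


def load_translator_key(env_var: str = "TRANSLATOR_KEY") -> str:
    """Return the key held in an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable before calling the service.")
    return key
```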

Key features

Feature | Description
Translate large files | Translate whole documents asynchronously.
Translate numerous files | Translate multiple files across all supported languages and dialects while preserving document structure and data format.
Translate image file formats 🆕 | Translate text within an image while maintaining the original design and layout. Supported formats: .jpeg, .png, .bmp, .webp. Pricing: Calculated on a per-image basis. For more information, see Pricing.
Translate image text in Word documents (.docx) and PowerPoint files (.pptx) 🆕 | This feature is available with the batch document translation API for .docx and .pptx file formats.
Preserve source file presentation | Translate files while preserving the original layout and format.
Apply custom translation | Translate documents using general and custom translation models.
Apply custom glossaries | Translate documents using custom glossaries.
Automatically detect document language | Let the Document translation service determine the language of the document.
Translate documents with content in multiple languages | Use the autodetect feature to translate documents with content in multiple languages into your target language.

How document translation works

Document translation supports two workflows. Choose the approach that matches your scenario.

Asynchronous (batch)

  1. Upload source documents to your source container.
  2. Submit a batch translation request.
  3. Monitor job and document status.
  4. Download translated documents from your target container.
For detailed request/response flows, see the Document translation REST API reference guide.
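Step 2 can be sketched as follows. This is an illustration under stated assumptions, not the definitive request shape: the `/translator/document/batches` path, the `2024-05-01` API version, and the `inputs`/`source`/`targets` body layout reflect the REST API as commonly documented and should be confirmed against the reference guide. The function name and parameters are hypothetical. The function only builds the request; it doesn't send it.

```python
import json
import urllib.request

API_VERSION = "2024-05-01"  # assumption: confirm the current version in the REST reference


def build_batch_request(endpoint, key, source_url, target_url,
                        target_language, source_language=None):
    """Build (but don't send) a batch translation request."""
    source = {"sourceUrl": source_url}
    if source_language:
        # Leaving the source language out lets the service autodetect it.
        source["language"] = source_language
    body = {
        "inputs": [
            {
                "source": source,
                "targets": [{"targetUrl": target_url, "language": target_language}],
            }
        ]
    }
    url = f"{endpoint.rstrip('/')}/translator/document/batches?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` submits the job. The container URLs must be reachable by the service, for example through SAS tokens or a managed identity.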

Synchronous

  1. Send a request that includes one document (and an optional glossary).
  2. Receive the translated document in the response.
For request details and examples, see Synchronous document translation.
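The synchronous flow can be sketched as a single multipart form upload. The `/translator/document:translate` path, the `targetLanguage` query parameter, the `2024-05-01` API version, and the `document` form field name are assumptions to verify against the synchronous translation article. The helper name and its parameters are hypothetical. The helper builds the URL, headers, and body without sending anything.

```python
import urllib.parse
import uuid


def build_sync_request_parts(endpoint, key, target_language, filename,
                             content, content_type, api_version="2024-05-01"):
    """Return (url, headers, body) for a single-document upload."""
    boundary = uuid.uuid4().hex
    query = urllib.parse.urlencode(
        {"targetLanguage": target_language, "api-version": api_version}
    )
    url = f"{endpoint.rstrip('/')}/translator/document:translate?{query}"
    # One multipart part that carries the document bytes.
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return url, headers, body
```

An optional glossary would travel as a second part in the same body. The response body is the translated file itself, so the caller writes the returned bytes straight to disk.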

Development options

Add document translation to your projects and applications using the following development options.
Foundry portal currently supports synchronous (single-file) document translation only. Use the REST API or client libraries for asynchronous batch document translation.
Use asynchronous workflows to translate multiple documents and large files.
Development option | Description
REST API | The REST API is a language-agnostic interface that enables you to create HTTP requests and authorization headers to translate documents.
Client libraries (SDKs) | The client libraries (SDKs) are language-specific classes, objects, methods, and code that you can quickly use by adding a reference in your project. Currently, Document translation has programming language support for C#/.NET and Python.

Supported document and glossary formats

The following tables list the document and glossary file formats supported by each translation method.

Batch document supported formats

The Get supported document formats method returns a list of document formats supported by the Document translation service. The list includes common file extensions and content types.
File type | File extension | Description
Adobe PDF | pdf | Portable document file format. Document translation uses optical character recognition (OCR) technology to extract and translate text in scanned PDF documents while retaining the original layout.
Comma-Separated Values | csv | A comma-delimited raw-data file used by spreadsheet programs.
DITA | dita | An XML-based open standard for authoring and publishing.
HTML | html, htm | Hyper Text Markup Language.
Image (2025-12-01-preview) | .jpeg, .png, .bmp, .webp | Files that store digital image data.
Markdown | markdown, mdown, mkdn, md, mkd, mdwn, mdtxt, mdtext, rmd | A lightweight markup language for creating formatted text.
MHTML | mhtml, mht | A web page archive format used to combine HTML code and its companion resources.
Microsoft Excel | xls, xlsx | A spreadsheet file for data analysis and documentation.
Microsoft Outlook | msg | An email message created or saved within Microsoft Outlook.
Microsoft PowerPoint | ppt, pptx | A presentation file used to display content in a slideshow format.
Microsoft Word | doc, docx | A text document file.
OpenDocument Presentation | odp | An open-source presentation file.
OpenDocument Spreadsheet | ods | An open-source spreadsheet file.
OpenDocument Text | odt | An open-source text document file.
Rich text format | rtf | A text document containing formatting.
Tab-Separated Values/TAB | tsv, tab | A tab-delimited raw-data file used by spreadsheet programs.
Text | txt | An unformatted text document.
XLIFF | xlf, xliff 2.0 | A parallel document format used in translation and localization.
XML | xml | A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
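A client-side pre-check against the table above might look like the following sketch. The set is copied by hand from the table and the function name is hypothetical, so treat the Get supported document formats method as the authoritative list. The image extensions apply only to the preview API version noted in the table.

```python
from pathlib import Path

# Extensions transcribed from the table above (lowercase, no leading dot).
BATCH_DOCUMENT_EXTENSIONS = {
    "pdf", "csv", "dita", "html", "htm",
    "jpeg", "png", "bmp", "webp",  # preview API version only
    "markdown", "mdown", "mkdn", "md", "mkd", "mdwn", "mdtxt", "mdtext", "rmd",
    "mhtml", "mht", "xls", "xlsx", "msg", "ppt", "pptx", "doc", "docx",
    "odp", "ods", "odt", "rtf", "tsv", "tab", "txt", "xlf", "xml",
}


def is_supported_batch_document(filename: str) -> bool:
    """Return True when the file extension appears in the table above."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in BATCH_DOCUMENT_EXTENSIONS
```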

Batch legacy file types

Source file types are preserved during document translation, with the following exceptions:
Source file extension | Translated file extension
.doc, .odt, .rtf | .docx
.xls, .ods | .xlsx
.ppt, .odp | .pptx
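When you look for results in the target container, it can help to predict the translated file name. The sketch below encodes the exceptions from the table; the function and dictionary names are hypothetical.

```python
from pathlib import PurePosixPath

# Legacy source extensions and the extension of the translated output.
LEGACY_TO_MODERN = {
    ".doc": ".docx", ".odt": ".docx", ".rtf": ".docx",
    ".xls": ".xlsx", ".ods": ".xlsx",
    ".ppt": ".pptx", ".odp": ".pptx",
}


def translated_filename(source_name: str) -> str:
    """Return the expected output name; other file types keep their extension."""
    path = PurePosixPath(source_name)
    new_suffix = LEGACY_TO_MODERN.get(path.suffix.lower())
    return str(path.with_suffix(new_suffix)) if new_suffix else source_name
```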

Batch glossary supported formats

Document translation supports the following glossary file types:
File type | File extension | Description
Comma-Separated Values | csv | A comma-delimited raw-data file used by spreadsheet programs.
XLIFF | xlf | A parallel document format used in translation and localization.
Tab-Separated Values/TAB | tsv, tab | A tab-delimited raw-data file used by spreadsheet programs.
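As an illustration, a small tab-separated glossary could be generated like this. The two-column layout (source term, then target term, one pair per line) is an assumption not stated in this article, so confirm it against the glossary documentation. The function name is hypothetical.

```python
import csv
import io


def glossary_to_tsv(terms: dict[str, str]) -> str:
    """Serialize source/target term pairs as tab-separated lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for source_term, target_term in terms.items():
        # Assumed layout: source term in column 1, target term in column 2.
        writer.writerow([source_term, target_term])
    return buffer.getvalue()
```

Save the returned text with a `.tsv` extension, and for batch translation, upload it to a location the service can read.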

Document translation request limits

For detailed information about Translator request limits, see Document translation request limits.

Document translation data residency

Document translation data residency depends on the Azure region where your Translator resource was created:
✔️ Feature: Document translation
✔️ Service endpoint: Custom domain: https://<your-resource-name>.cognitiveservices.azure.com
Resource created region | Request processing data center
Global | Closest available data center.
Americas | East US 2 • West US 2
Asia Pacific | Japan East • Southeast Asia
Europe (except Switzerland) | France Central • West Europe
Switzerland | Switzerland North • Switzerland West
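The endpoint pattern and the table above can be captured in a small lookup, sketched below. The function and dictionary names are hypothetical, and the region keys are the table's labels rather than Azure region identifiers.

```python
# Region labels and data centers transcribed from the table above.
PROCESSING_DATA_CENTERS = {
    "Global": ["Closest available data center"],
    "Americas": ["East US 2", "West US 2"],
    "Asia Pacific": ["Japan East", "Southeast Asia"],
    "Europe (except Switzerland)": ["France Central", "West Europe"],
    "Switzerland": ["Switzerland North", "Switzerland West"],
}


def custom_domain_endpoint(resource_name: str) -> str:
    """Build the custom domain endpoint for a Translator resource name."""
    name = resource_name.strip()
    if not name:
        raise ValueError("resource_name must not be empty")
    return f"https://{name}.cognitiveservices.azure.com"
```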

Troubleshooting

Use the following checks to diagnose common issues.

Batch translation

Synchronous translation