Chat Completions API
Generate conversational responses using large language models. The Chat Completions API enables you to build chatbots, virtual assistants, and other conversational AI applications.
Base URL
The base URL depends on your resource and deployment.
Authentication
All requests must include authentication via one of these methods:
API Key (Header)
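With key-based authentication, the key is sent in a request header. A minimal sketch with a placeholder value; depending on the deployment, the key may be accepted as an api-key header or as a bearer token in the Authorization header listed below:

```
api-key: <YOUR_API_KEY>
```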
Microsoft Entra ID Token
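With Microsoft Entra ID, acquire an access token for the service and pass it as a bearer token (placeholder value):

```
Authorization: Bearer <ENTRA_ID_ACCESS_TOKEN>
```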
Request Format
HTTP Method
All chat completion requests use the HTTP POST method.
Headers
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| Authorization | Yes | Bearer token for authentication |
| User-Agent | No | Client identification string |
Request Body Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | Array | Yes | - | Array of message objects representing the conversation |
| model | String | No | - | Model deployment name (if multiple models are available) |
| max_tokens | Integer | No | 4096 | Maximum number of tokens to generate in the response |
| temperature | Float | No | 1.0 | Controls randomness (0.0 to 2.0) |
| top_p | Float | No | 1.0 | Nucleus sampling parameter (0.0 to 1.0) |
| frequency_penalty | Float | No | 0.0 | Penalize frequent tokens (-2.0 to 2.0) |
| presence_penalty | Float | No | 0.0 | Penalize tokens that have already appeared (-2.0 to 2.0) |
| stop | String/Array | No | null | Sequences that stop generation |
| stream | Boolean | No | false | Enable streaming responses |
| seed | Integer | No | null | Deterministic sampling seed |
| tools | Array | No | null | Available function tools |
| tool_choice | String/Object | No | "auto" | Tool selection strategy |
Message Object Structure
Message Roles
| Role | Description | Required Fields |
|---|---|---|
| system | System instructions and behavior | role, content |
| user | User messages and queries | role, content |
| assistant | Model responses | role, content or tool_calls |
| tool | Function call results | role, content, tool_call_id |
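A sketch of the message shapes described above, assuming the OpenAI-compatible tool_calls format for the assistant and tool roles (all values are illustrative):

```json
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "What's the weather in Seattle?" },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": { "name": "get_current_weather", "arguments": "{\"city\": \"Seattle\"}" }
      }
    ]
  },
  { "role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature_c\": 18, \"condition\": \"cloudy\"}" }
]
```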
Request Examples
Basic Chat Request
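A minimal request body sketch; only messages is required, and the other values are illustrative:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain what a REST API is in one sentence." }
  ],
  "max_tokens": 256,
  "temperature": 0.7
}
```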
Multi-turn Conversation
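A multi-turn conversation simply includes the prior user and assistant turns in messages (illustrative values):

```json
{
  "messages": [
    { "role": "system", "content": "You are a concise travel assistant." },
    { "role": "user", "content": "I want to visit Japan in spring." },
    { "role": "assistant", "content": "Spring is ideal for cherry blossoms; late March to early April is peak season in most regions." },
    { "role": "user", "content": "Which cities should I prioritize?" }
  ],
  "max_tokens": 512
}
```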
Function Calling
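A sketch of a request that offers the model a tool, assuming the OpenAI-compatible tool definition shape implied by the tools and tool_choice parameters; the function name and schema are illustrative:

```json
{
  "messages": [
    { "role": "user", "content": "What's the weather in Seattle right now?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "City name" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```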
Response Format
Standard Response
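A sketch of a non-streaming response, using only the fields documented in the tables below (identifiers, model name, and token counts are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1728000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A REST API is an interface that lets clients interact with a service over HTTP using standard methods and resource URLs."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 24,
    "total_tokens": 52
  },
  "system_fingerprint": "fp_abc123"
}
```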
Response with Function Call
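When the model decides to call a tool, the assistant message carries tool_calls instead of content and finish_reason is tool_calls. A sketch, assuming the OpenAI-compatible tool_calls shape (values are illustrative):

```json
{
  "id": "chatcmpl-def456",
  "object": "chat.completion",
  "created": 1728000001,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"Seattle\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 64,
    "completion_tokens": 18,
    "total_tokens": 82
  }
}
```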
Streaming Response
When `stream` is set to `true`, responses are sent as Server-Sent Events:
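A sketch of the event stream, assuming the widely used chat.completion.chunk/delta convention: each event carries an incremental delta for a choice, and the stream ends with a [DONE] sentinel (values are illustrative):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1728000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1728000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1728000000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```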
Response Fields
Root Level Fields
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the completion |
| object | String | Object type: chat.completion |
| created | Integer | Unix timestamp of creation |
| model | String | Model used for completion |
| choices | Array | Array of completion choices |
| usage | Object | Token usage information |
| system_fingerprint | String | System configuration identifier |
Choice Object Fields
| Field | Type | Description |
|---|---|---|
| index | Integer | Choice index in the array |
| message | Object | The completion message |
| finish_reason | String | Reason completion stopped |
Finish Reasons
| Reason | Description |
|---|---|
| stop | Natural stopping point or stop sequence reached |
| length | Maximum token limit reached |
| tool_calls | Model called a function |
| content_filter | Content filtered by safety systems |
Usage Object Fields
| Field | Type | Description |
|---|---|---|
| prompt_tokens | Integer | Tokens in the input prompt |
| completion_tokens | Integer | Tokens in the generated completion |
| total_tokens | Integer | Total tokens used (prompt + completion) |
Error Responses
Error Format
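Errors are returned as a JSON body. A sketch based on the error types listed below; the exact code and field set may vary by service and error:

```json
{
  "error": {
    "code": "429",
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after 20 seconds."
  }
}
```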
Common Error Types
| Status Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request |
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | Insufficient permissions |
| 404 | not_found_error | Endpoint or resource not found |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | Internal server error |
| 503 | service_unavailable | Service temporarily unavailable |
Rate Limits
Rate limits vary by deployment and pricing tier:

| Metric | Limit |
|---|---|
| Requests per minute | Varies by tier |
| Tokens per minute | Varies by tier |
| Concurrent requests | 100 (typical) |
Rate limit status is reported in the response headers:
- `x-ratelimit-limit-requests`
- `x-ratelimit-remaining-requests`
- `x-ratelimit-reset-requests`
Content Filtering
Azure AI Foundry includes built-in content filtering for safety.
Filter Categories
- Hate: Discriminatory content
- Violence: Violent or harmful content
- Sexual: Sexual content
- Self-harm: Content promoting self-harm
Filter Levels
- Safe: Content passes all filters
- Low: Low-risk content allowed
- Medium: Moderate-risk content blocked
- High: High-risk content blocked
Filter Response
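When content is filtered, the response includes a finish_reason of content_filter for the affected choice, along with details about which categories triggered the filter. A minimal sketch; the content_filter_results field and its shape are an assumption based on the categories and severity levels above, and only finish_reason is documented here:

```json
{
  "id": "chatcmpl-ghi789",
  "object": "chat.completion",
  "created": 1728000002,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": null },
      "finish_reason": "content_filter",
      "content_filter_results": {
        "hate": { "filtered": true, "severity": "medium" },
        "violence": { "filtered": false, "severity": "safe" },
        "sexual": { "filtered": false, "severity": "safe" },
        "self_harm": { "filtered": false, "severity": "safe" }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 0,
    "total_tokens": 20
  }
}
```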
Best Practices
Performance Optimization
- Use an appropriate `max_tokens` value to avoid unnecessary generation
- Implement caching for repeated queries
- Use streaming for long responses to improve perceived latency
- Batch multiple independent requests when possible
Cost Management
- Monitor token usage with the `usage` field
- Set reasonable `max_tokens` limits
- Use shorter prompts when possible
- Implement request deduplication
Security
- Never expose API keys in client-side code
- Use Microsoft Entra ID for production applications
- Implement rate limiting on your application side
- Validate and sanitize user inputs
Error Handling
- Implement exponential backoff for rate limit errors
- Handle content filter responses gracefully
- Log errors for debugging and monitoring
- Provide meaningful error messages to users

