This article provides details on the inference REST API endpoints for Azure OpenAI.

API specs

Managing and interacting with Azure OpenAI models and resources is divided across three primary API surfaces:
  • Control plane
  • Data plane - authoring
  • Data plane - inference
Each API surface/specification encapsulates a different set of Azure OpenAI capabilities. Each API has its own unique set of preview and stable/generally available (GA) API releases. Preview releases currently tend to follow a monthly cadence.
There is now a new preview inference API. Learn more in our API lifecycle guide.
API | Latest preview release | Latest GA release | Specifications | Description
Control plane | 2025-07-01-preview | 2025-06-01 | Spec files | The control plane API is used for operations like creating resources, model deployment, and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.
Data plane | v1 preview | v1 | Spec files | The data plane API controls inference and authoring operations.

Authentication

Azure OpenAI provides two methods for authentication. You can use either API Keys or Microsoft Entra ID.
  • API Key authentication: For this type of authentication, all API requests must include the API Key in the api-key HTTP header. The Quickstart provides guidance for how to make calls with this type of authentication.
  • Microsoft Entra ID authentication: You can authenticate an API call using a Microsoft Entra token. Authentication tokens are included in a request as the Authorization header. The token provided must be preceded by Bearer, for example Bearer YOUR_AUTH_TOKEN. You can read our how-to guide on authenticating with Microsoft Entra ID.
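
The two authentication methods differ only in which HTTP header carries the credential. The following is a minimal sketch of that difference; the function name `build_auth_headers` and the placeholder values are illustrative assumptions, not part of the API.

```python
def build_auth_headers(api_key=None, entra_token=None):
    """Return request headers for exactly one authentication method.

    Hypothetical helper: only the header names and the Bearer prefix
    come from the text above.
    """
    if (api_key is None) == (entra_token is None):
        raise ValueError("Provide exactly one of api_key or entra_token")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # API key authentication: the key goes in the api-key header.
        headers["api-key"] = api_key
    else:
        # Microsoft Entra ID: the token is preceded by "Bearer".
        headers["Authorization"] = f"Bearer {entra_token}"
    return headers


print(build_auth_headers(api_key="YOUR_API_KEY"))
print(build_auth_headers(entra_token="YOUR_AUTH_TOKEN"))
```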

REST API versioning

The service APIs are versioned using the api-version query parameter. Dated versions follow the YYYY-MM-DD structure, and preview releases add a -preview suffix (for example, 2025-04-01-preview). For example:
POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01
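
As a sketch of how the versioned URL above is assembled, the snippet below joins the resource name, deployment name, operation path, and api-version query parameter. The helper name `build_inference_url` is a hypothetical choice for illustration.

```python
from urllib.parse import quote, urlencode


def build_inference_url(resource_name, deployment_name, operation, api_version):
    """Assemble a deployment-scoped URL with the api-version query parameter."""
    base = f"https://{resource_name}.openai.azure.com"
    # Percent-encode the deployment name so it is safe as a single path segment.
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/{operation}"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


url = build_inference_url(
    "YOUR_RESOURCE_NAME", "YOUR_DEPLOYMENT_NAME", "chat/completions", "2024-06-01"
)
print(url)
```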

Data plane inference

The rest of the article covers the 2025-04-01-preview release of the Azure OpenAI data plane inference specification. If you’re looking for documentation on the latest GA API release, refer to the latest GA data plane inference API.

Completions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2025-04-01-preview
Creates a completion for the provided prompt, parameters, and chosen model.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname), for example https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name: https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
    To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
    Type: oauth2
    Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
    scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
best_of | integer | Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results can’t be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return; best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. | No | 1
echo | boolean | Echo back the prompt in addition to the completion. | No | False
frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. | No | 0
logit_bias | object | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated. | No | None
logprobs | integer | Include the log probabilities on the logprobs most likely output tokens, as well as the chosen tokens. For example, if logprobs is 5, the API returns a list of the 5 most likely tokens. The API always returns the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. The maximum value for logprobs is 5. | No | None
max_tokens | integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens can’t exceed the model’s context length. | No | 16
n | integer | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. | No | 1
presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. | No | 0
prompt | string or array | The prompt to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt isn’t specified the model generates as if from the beginning of a new document. | Yes |
seed | integer | If specified, our system makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism isn’t guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend. | No |
stop | string or array | Up to four sequences where the API stops generating further tokens. The returned text won’t contain the stop sequence. | No |
stream | boolean | Whether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Example Python code. | No | False
suffix | string | The suffix that comes after a completion of inserted text. This parameter is only supported for gpt-3.5-turbo-instruct. | No | None
temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or top_p but not both. | No | 1
top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | No | 1
user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |
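
Several of the parameters above carry numeric limits that can be checked before a request is sent. The sketch below is a hypothetical client-side check (the name `validate_completion_body` is an assumption); it covers only the ranges stated in the table and is not a substitute for the service's own validation.

```python
def validate_completion_body(body):
    """Return a list of problems found in a completions request body."""
    problems = []
    if "prompt" not in body:
        problems.append("prompt is required")
    # Both penalties share the same documented range.
    for name in ("frequency_penalty", "presence_penalty"):
        if not -2.0 <= body.get(name, 0) <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")
    if not 0 <= body.get("temperature", 1) <= 2:
        problems.append("temperature must be between 0 and 2")
    logprobs = body.get("logprobs")
    if logprobs is not None and logprobs > 5:
        problems.append("logprobs cannot exceed 5")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")
    # logit_bias maps token IDs to a bias between -100 and 100.
    for token_id, bias in (body.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            problems.append(f"logit_bias for token {token_id} must be between -100 and 100")
    # The table states that best_of results can't be streamed.
    if body.get("best_of") is not None and body.get("stream"):
        problems.append("best_of results cannot be streamed")
    return problems


print(validate_completion_body({"prompt": "tell me a joke", "logit_bias": {"50256": -100}}))
print(validate_completion_body({"temperature": 3, "logprobs": 6}))
```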

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | createCompletionResponse |
Status Code: default Description: Service unavailable
Content-Type | Type | Description
application/json | errorResponse |

Examples

Example

Creates a completion for the provided prompt, parameters, and chosen model.
POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2025-04-01-preview

{
 "prompt": [
  "tell me a joke about mango"
 ],
 "max_tokens": 32,
 "temperature": 1.0,
 "n": 1
}

Responses: Status Code: 200
{
  "body": {
    "id": "cmpl-7QmVI15qgYVllxK0FtxVGG6ywfzaq",
    "created": 1686617332,
    "choices": [
      {
        "text": "es\n\nWhat do you call a mango who's in charge?\n\nThe head mango.",
        "index": 0,
        "finish_reason": "stop",
        "logprobs": null
      }
    ],
    "usage": {
      "completion_tokens": 20,
      "prompt_tokens": 6,
      "total_tokens": 26
    }
  }
}
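
A short sketch of reading the response shown above: it pulls out each choice's text and finish reason plus the token usage. The function name `summarize_completion` is hypothetical. The example wraps the payload in a "body" key; whether a live response includes that wrapper is not stated here, so the sketch accepts both shapes as an assumption.

```python
def summarize_completion(payload):
    """Extract texts, finish reasons, and token usage from a completion response."""
    # Accept either the wrapped example shape or an unwrapped payload (assumption).
    body = payload.get("body", payload)
    choices = sorted(body["choices"], key=lambda choice: choice["index"])
    return {
        "texts": [choice["text"] for choice in choices],
        "finish_reasons": [choice["finish_reason"] for choice in choices],
        "total_tokens": body["usage"]["total_tokens"],
    }


example = {
    "body": {
        "id": "cmpl-7QmVI15qgYVllxK0FtxVGG6ywfzaq",
        "created": 1686617332,
        "choices": [
            {
                "text": "es\n\nWhat do you call a mango who's in charge?\n\nThe head mango.",
                "index": 0,
                "finish_reason": "stop",
                "logprobs": None,
            }
        ],
        "usage": {"completion_tokens": 20, "prompt_tokens": 6, "total_tokens": 26},
    }
}

summary = summarize_completion(example)
print(summary["texts"][0])
print(summary["finish_reasons"], summary["total_tokens"])
```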

Embeddings - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2025-04-01-preview
Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname), for example https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name: https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string | The deployment id of the model that was deployed.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
    To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
    Type: oauth2
    Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
    scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
dimensions | integer | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. | No |
encoding_format | string | The format to return the embeddings in. Can be either float or base64. Defaults to float. | No |
input | string or array | Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8,192 tokens for text-embedding-ada-002), can’t be an empty string, and any array must be 2,048 dimensions or less. In addition to the per-input token limit, all embedding models enforce a maximum of 300,000 tokens summed across all inputs in a single request. | Yes |
input_type | string | Input type of embedding search to use. | No |
user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |
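
The input constraints above can be checked while the request body is built. This is a minimal sketch with a hypothetical helper name, `build_embeddings_body`. It reads "2,048 dimensions or less" as a limit on the number of array items, which is an interpretation, and it does not count tokens.

```python
def build_embeddings_body(inputs, dimensions=None, encoding_format=None):
    """Build an embeddings request body, rejecting inputs the table rules out."""
    if isinstance(inputs, str):
        inputs = [inputs]
    if not inputs:
        raise ValueError("input must contain at least one item")
    if len(inputs) > 2048:
        # Interpretation: the array limit is treated as an item count.
        raise ValueError("input array must have 2,048 items or fewer")
    if any(item == "" for item in inputs):
        raise ValueError("input can't be an empty string")
    if encoding_format not in (None, "float", "base64"):
        raise ValueError("encoding_format must be float or base64")
    body = {"input": list(inputs)}
    # Optional fields are only sent when explicitly set.
    if dimensions is not None:
        body["dimensions"] = dimensions
    if encoding_format is not None:
        body["encoding_format"] = encoding_format
    return body


print(build_embeddings_body("this is a test"))
```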

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | object |

Examples

Example

Return the embeddings for a given prompt.
POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2025-04-01-preview

{
 "input": [
  "this is a test"
 ]
}

Responses: Status Code: 200
{
  "body": {
    "data": [
      {
        "index": 0,
        "embedding": [
          -0.012838088,
          -0.007421397,
          -0.017617522,
          -0.028278312,
          -0.018666342,
          0.01737855,
          -0.01821495,
          -0.006950092,
          -0.009937238,
          -0.038580645,
          0.010674067,
          0.02412286,
          -0.013647936,
          0.013189907,
          0.0021125758,
          0.012406612,
          0.020790534,
          0.00074595667,
          0.008397198,
          -0.00535031,
          0.008968075,
          0.014351576,
          -0.014086051,
          0.015055214,
          -0.022211088,
          -0.025198232,
          0.0065186154,
          -0.036350243,
          0.009180495,
          -0.009698266,
          0.009446018,
          -0.008463579,
          -0.0020113448
        ]
      }
    ],
    "usage": {
      "prompt_tokens": 4,
      "total_tokens": 4
    }
  }
}
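
As one illustration of how the returned vectors might be consumed, the sketch below orders the embeddings by their index field and compares two vectors with cosine similarity. Cosine similarity is a common choice but is an assumption of this example rather than something the API prescribes, and both function names are hypothetical.

```python
import math


def embeddings_in_order(payload):
    """Return embedding vectors sorted by their index field."""
    # Accept either the wrapped example shape or an unwrapped payload (assumption).
    body = payload.get("body", payload)
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up vectors stand in for real embeddings.
response = {
    "body": {
        "data": [
            {"index": 1, "embedding": [0.0, 1.0]},
            {"index": 0, "embedding": [1.0, 0.0]},
        ],
        "usage": {"prompt_tokens": 4, "total_tokens": 4},
    }
}

first, second = embeddings_in_order(response)
print(cosine_similarity(first, second))
```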

Chat completions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview
Creates a completion for the chat message.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname), for example https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name: https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
    To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
    Type: oauth2
    Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
    scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
audio | object | Parameters for audio output. Required when audio output is requested with modalities: ["audio"]. | No |
└─ format | enum | Specifies the output audio format. Must be one of wav, mp3, flac, opus, or pcm16. Possible values: wav, mp3, flac, opus, pcm16 | No |
└─ voice | enum | Specifies the voice type. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Possible values: alloy, echo, fable, onyx, nova, shimmer | No |
data_sources | array | The configuration entries for Azure OpenAI chat extensions that use them. This extra specification is only compatible with Azure OpenAI. | No |
frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. | No | 0
function_call | string or chatCompletionFunctionCallOption | Deprecated in favor of tool_choice. Controls which (if any) function is called by the model. none means the model won’t call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. none is the default when no functions are present. auto is the default if functions are present. | No |
functions | array | Deprecated in favor of tools. A list of functions the model may generate JSON inputs for. | No |
logit_bias | object | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | No | None
logprobs | boolean | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. | No | False
max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No |
max_tokens | integer | The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. | No |
messages | array | A list of messages comprising the conversation so far. Example Python code. | Yes |
metadata | object | Developer-defined tags and values used for filtering completions in the stored completions dashboard. | No |
modalities | ChatCompletionModalities | Output types that you would like the model to generate for this request. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"]. | No |
n | integer | How many chat completion choices to generate for each input message. You will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs. | No | 1
parallel_tool_calls | ParallelToolCalls | Whether to enable parallel function calling during tool use. | No | True
prediction | PredictionContent | Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content. | No |
presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. | No | 0
reasoning_effort | enum | o1 models only. Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Possible values: low, medium, high | No |
response_format | ResponseFormatText or ResponseFormatJsonObject or ResponseFormatJsonSchema | An object specifying the format that the model must output. Compatible with GPT-4o, GPT-4o mini, GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106. Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which guarantee the model matches your supplied JSON schema. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. | No |
seed | integer | This feature is in Beta. If specified, our system makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism isn’t guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend. | No |
stop | string or array | Up to four sequences where the API stops generating further tokens. | No |
store | boolean | Whether or not to store the output of this chat completion request for use in our model distillation or evaluation products. | No |
stream | boolean | If set, partial message deltas are sent, like in ChatGPT. Tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Example Python code. | No | False
stream_options | chatCompletionStreamOptions | Options for streaming response. Only set this when you set stream: true. | No | None
temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or top_p but not both. | No | 1
tool_choice | chatCompletionToolChoiceOption | Controls which (if any) tool is called by the model. none means the model won’t call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. | No |
tools | array | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A maximum of 128 functions is supported. | No |
top_logprobs | integer | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | No |
top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | No | 1
user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |
user_security_context | userSecurityContext | User security context contains several parameters that describe the AI application itself, and the end user that interacts with the AI application. These fields assist your security operations teams to investigate and mitigate security incidents by providing a comprehensive approach to protecting your AI applications. Learn more about protecting AI applications using Microsoft Defender for Cloud. | No |
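
The response_format entry warns that JSON mode needs an explicit instruction to produce JSON in a system or user message. The sketch below is a rough client-side guard for that rule. The helper name `json_mode_is_prompted` and the substring check are assumptions chosen for illustration; a real prompt could satisfy the rule in ways this heuristic misses.

```python
def json_mode_is_prompted(body):
    """Return False when JSON mode is on but no system/user message mentions JSON."""
    response_format = body.get("response_format") or {}
    if response_format.get("type") != "json_object":
        return True  # JSON mode is not in use, so the rule does not apply.
    for message in body.get("messages", []):
        content = message.get("content")
        # Heuristic: look for the word "json" in plain-text system or user content.
        if (
            message.get("role") in ("system", "user")
            and isinstance(content, str)
            and "json" in content.lower()
        ):
            return True
    return False


risky = {
    "response_format": {"type": "json_object"},
    "messages": [{"role": "user", "content": "List three fruits."}],
}
safe = {
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three fruits."},
    ],
}
print(json_mode_is_prompted(risky), json_mode_is_prompted(safe))
```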

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | object |
Status Code: default Description: Service unavailable
Content-Type | Type | Description
application/json | errorResponse |
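
When stream is set, the response arrives as data-only server-sent events that end with a data: [DONE] message. The sketch below shows one way to read such lines. The chunk shape used here (choices[].delta.content) is not described in this article and is an assumption, as are the function names; the sample lines are made up for illustration.

```python
import json


def iter_stream_events(lines):
    """Yield the parsed JSON payload of each data line until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separator lines and anything that is not data.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # The stream terminator carries no JSON.
        yield json.loads(data)


def collect_text(lines):
    """Concatenate streamed text fragments (assumed chunk shape)."""
    parts = []
    for event in iter_stream_events(lines):
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Ahoy"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " matey"}}]}',
    "data: [DONE]",
]
print(collect_text(sample_lines))
```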

Examples

Example

Creates a completion for the provided prompt, parameters, and chosen model.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "system",
   "content": "you are a helpful assistant that talks like a pirate"
  },
  {
   "role": "user",
   "content": "can you tell me how to care for a parrot?"
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Ahoy matey! So ye be wantin' to care for a fine squawkin' parrot, eh? Well, shiver me timbers, let ol' Cap'n Assistant share some wisdom with ye! Here be the steps to keepin' yer parrot happy 'n healthy:\n\n1. Secure a sturdy cage: Yer parrot be needin' a comfortable place to lay anchor! Be sure ye get a sturdy cage, at least double the size of the bird's wingspan, with enough space to spread their wings, yarrrr!\n\n2. Perches 'n toys: Aye, parrots need perches of different sizes, shapes, 'n textures to keep their feet healthy. Also, a few toys be helpin' to keep them entertained 'n their minds stimulated, arrrh!\n\n3. Proper grub: Feed yer feathered friend a balanced diet of high-quality pellets, fruits, 'n veggies to keep 'em strong 'n healthy. Give 'em fresh water every day, or ye'll have a scurvy bird on yer hands!\n\n4. Cleanliness: Swab their cage deck! Clean their cage on a regular basis: fresh water 'n food daily, the floor every couple of days, 'n a thorough scrubbing ev'ry few weeks, so the bird be livin' in a tidy haven, arrhh!\n\n5. Socialize 'n train: Parrots be a sociable lot, arrr! Exercise 'n interact with 'em daily to create a bond 'n maintain their mental 'n physical health. Train 'em with positive reinforcement, treat 'em kindly, yarrr!\n\n6. Proper rest: Yer parrot be needin' 'bout 10-12 hours o' sleep each night. Cover their cage 'n let them slumber in a dim, quiet quarter for a proper night's rest, ye scallywag!\n\n7. Keep a weather eye open for illness: Birds be hidin' their ailments, arrr! Be watchful for signs of sickness, such as lethargy, loss of appetite, puffin' up, or change in droppings, and make haste to a vet if need be.\n\n8. Provide fresh air 'n avoid toxins: Parrots be sensitive to draft and pollutants. Keep yer quarters well ventilated, but no drafts, arrr! Be mindful of toxins like Teflon fumes, candles, or air fresheners.\n\nSo there ye have it, me hearty! With proper care 'n commitment, yer parrot will be squawkin' \"Yo-ho-ho\" for many years to come! Good luck, sailor, and may the wind be at yer back!"
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}
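
The request and response in this example can be sketched as a pair of small helpers: one builds the messages array, the other reads the first choice from the response. Both function names are hypothetical, and the "body" wrapper is handled as an assumption because the article does not say whether a live response includes it.

```python
def build_chat_body(system_prompt, user_prompt, **options):
    """Build a chat completions request body with a system and a user message."""
    body = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    }
    # Optional request parameters such as temperature or max_tokens.
    body.update(options)
    return body


def first_message_content(payload):
    """Return (content, finish_reason) for the lowest-index choice."""
    body = payload.get("body", payload)
    choice = min(body["choices"], key=lambda item: item["index"])
    return choice["message"]["content"], choice["finish_reason"]


request_body = build_chat_body(
    "you are a helpful assistant that talks like a pirate",
    "can you tell me how to care for a parrot?",
    temperature=0.2,
)

# Shortened stand-in for the response shown above.
chat_response = {
    "body": {
        "choices": [
            {
                "index": 0,
                "finish_reason": "stop",
                "message": {"role": "assistant", "content": "Ahoy matey!"},
            }
        ],
        "usage": {"completion_tokens": 557, "prompt_tokens": 33, "total_tokens": 590},
    }
}
print(request_body["messages"][0]["role"], first_message_content(chat_response))
```

A finish_reason of "length" would indicate that the content may be cut off, so a caller might check the second element of the returned tuple before using the text.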

Example

Creates a completion based on Azure Search data and system-assigned managed identity.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "azure_search",
   "parameters": {
    "endpoint": "https://your-search-endpoint.search.windows.net/",
    "index_name": "{index name}",
    "authentication": {
     "type": "system_assigned_managed_identity"
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content.",
                "title": "Citation Title",
                "filepath": "contoso.txt",
                "url": "https://contoso.blob.windows.net/container/contoso.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}
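
The Azure Search example can be sketched in the same way: one helper builds the data_sources entry and another reads the citations from the message context. The helper names are hypothetical, the default authentication mirrors the system-assigned managed identity used in the example, and only fields that appear in the example are used.

```python
def azure_search_source(endpoint, index_name, authentication=None):
    """Build one azure_search entry for the data_sources array."""
    return {
        "type": "azure_search",
        "parameters": {
            "endpoint": endpoint,
            "index_name": index_name,
            # Defaults to the authentication type shown in the example above.
            "authentication": authentication
            or {"type": "system_assigned_managed_identity"},
        },
    }


def citations_of(payload):
    """Return the citations attached to the first choice, or an empty list."""
    body = payload.get("body", payload)
    message = body["choices"][0]["message"]
    return (message.get("context") or {}).get("citations", [])


source = azure_search_source(
    "https://your-search-endpoint.search.windows.net/", "{index name}"
)

grounded_response = {
    "body": {
        "choices": [
            {
                "index": 0,
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "Content of the completion [doc1].",
                    "context": {
                        "citations": [
                            {"title": "Citation Title", "filepath": "contoso.txt", "chunk_id": "0"}
                        ],
                        "intent": "dog care",
                    },
                },
            }
        ]
    }
}
for citation in citations_of(grounded_response):
    print(citation["title"], citation["filepath"])
```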

Example

Creates a completion based on Azure Search image vector data.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "azure_search",
   "parameters": {
    "endpoint": "https://your-search-endpoint.search.windows.net/",
    "index_name": "{index name}",
    "query_type": "vector",
    "fields_mapping": {
     "image_vector_fields": [
      "image_vector"
     ]
    },
    "authentication": {
     "type": "api_key",
     "key": "{api key}"
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion."
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}

Example

Creates a completion based on Azure Search vector data, previous assistant message and user-assigned managed identity.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a cat?"
  },
  {
   "role": "assistant",
   "content": "Content of the completion [doc1].",
   "context": {
    "intent": "cat care"
   }
  },
  {
   "role": "user",
   "content": "how about dog?"
  }
 ],
 "data_sources": [
  {
   "type": "azure_search",
   "parameters": {
    "endpoint": "https://your-search-endpoint.search.windows.net/",
    "authentication": {
     "type": "user_assigned_managed_identity",
     "managed_identity_resource_id": "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{resource-name}"
    },
    "index_name": "{index name}",
    "query_type": "vector",
    "embedding_dependency": {
     "type": "deployment_name",
     "deployment_name": "{embedding deployment name}"
    },
    "in_scope": true,
    "top_n_documents": 5,
    "strictness": 3,
    "role_information": "You are an AI assistant that helps people find information.",
    "fields_mapping": {
     "content_fields_separator": "\\n",
     "content_fields": [
      "content"
     ],
     "filepath_field": "filepath",
     "title_field": "title",
     "url_field": "url",
     "vector_fields": [
      "contentvector"
     ]
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content 2.",
                "title": "Citation Title 2",
                "filepath": "contoso2.txt",
                "url": "https://contoso.blob.windows.net/container/contoso2.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}

Example

Creates a completion for the provided Azure Cosmos DB.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "azure_cosmos_db",
   "parameters": {
    "authentication": {
     "type": "connection_string",
     "connection_string": "mongodb+srv://rawantest:{password}$@{cluster-name}.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
    },
    "database_name": "vectordb",
    "container_name": "azuredocs",
    "index_name": "azuredocindex",
    "embedding_dependency": {
     "type": "deployment_name",
     "deployment_name": "{embedding deployment name}"
    },
    "fields_mapping": {
     "content_fields": [
      "content"
     ],
     "vector_fields": [
      "contentvector"
     ]
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content.",
                "title": "Citation Title",
                "filepath": "contoso.txt",
                "url": "https://contoso.blob.windows.net/container/contoso.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}

Example

Creates a completion for the provided Mongo DB.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "mongo_db",
   "parameters": {
    "authentication": {
     "type": "username_and_password",
     "username": "<username>",
     "password": "<password>"
    },
    "endpoint": "<endpoint_name>",
    "app_name": "<application name>",
    "database_name": "sampledb",
    "collection_name": "samplecollection",
    "index_name": "sampleindex",
    "embedding_dependency": {
     "type": "deployment_name",
     "deployment_name": "{embedding deployment name}"
    },
    "fields_mapping": {
     "content_fields": [
      "content"
     ],
     "vector_fields": [
      "contentvector"
     ]
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content.",
                "title": "Citation Title",
                "filepath": "contoso.txt",
                "url": "https://contoso.blob.windows.net/container/contoso.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}

Example

Creates a completion using the provided Elasticsearch data source.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "elasticsearch",
   "parameters": {
    "endpoint": "https://your-elasticsearch-endpoint.eastus.azurecontainer.io",
    "index_name": "{index name}",
    "authentication": {
     "type": "key_and_key_id",
     "key": "{key}",
     "key_id": "{key id}"
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content.",
                "title": "Citation Title",
                "filepath": "contoso.txt",
                "url": "https://contoso.blob.windows.net/container/contoso.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}

Example

Creates a completion for the provided Pinecone resource.
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview

{
 "messages": [
  {
   "role": "user",
   "content": "can you tell me how to care for a dog?"
  }
 ],
 "data_sources": [
  {
   "type": "pinecone",
   "parameters": {
    "authentication": {
     "type": "api_key",
     "key": "{api key}"
    },
    "environment": "{environment name}",
    "index_name": "{index name}",
    "embedding_dependency": {
     "type": "deployment_name",
     "deployment_name": "{embedding deployment name}"
    },
    "fields_mapping": {
     "title_field": "title",
     "url_field": "url",
     "filepath_field": "filepath",
     "content_fields": [
      "content"
     ],
     "content_fields_separator": "\n"
    }
   }
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "chatcmpl-7R1nGnsXO8n4oi9UPz2f3UHdgAYMn",
    "created": 1686676106,
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Content of the completion [doc1].",
          "context": {
            "citations": [
              {
                "content": "Citation content.",
                "title": "Citation Title",
                "filepath": "contoso.txt",
                "url": "https://contoso.blob.windows.net/container/contoso.txt",
                "chunk_id": "0"
              }
            ],
            "intent": "dog care"
          }
        }
      }
    ],
    "usage": {
      "completion_tokens": 557,
      "prompt_tokens": 33,
      "total_tokens": 590
    }
  }
}
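
The data source examples above all return citations under choices[].message.context.citations. As a minimal sketch, assuming the response object has already been decoded and unwrapped from the "body" key used in these examples, the citations could be collected like this. The helper name extract_citations is hypothetical, and pairing the marker [doc1] with the first citation is an assumption rather than something this reference states.

```python
from typing import Any, Dict, List


def extract_citations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the citation entries from every choice in a chat completion."""
    citations: List[Dict[str, Any]] = []
    for choice in response.get("choices", []):
        # "context" is only present when a data source was used.
        context = choice.get("message", {}).get("context", {})
        citations.extend(context.get("citations", []))
    return citations


# Trimmed copy of the sample response shown above.
sample = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Content of the completion [doc1].",
                "context": {
                    "citations": [
                        {
                            "content": "Citation content.",
                            "title": "Citation Title",
                            "filepath": "contoso.txt",
                            "url": "https://contoso.blob.windows.net/container/contoso.txt",
                            "chunk_id": "0",
                        }
                    ],
                    "intent": "dog care",
                },
            },
        }
    ]
}

for number, citation in enumerate(extract_citations(sample), start=1):
    # Assumption: [docN] in the content refers to the Nth citation.
    print(f"[doc{number}] {citation['title']} ({citation['filepath']})")
```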

Transcriptions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview
Transcribes audio into the input language.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
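
As a rough illustration of the two header options described above, the following sketch builds the header dictionary for either method. The function name build_auth_headers is hypothetical, and preferring the token when both values are supplied is a design assumption based on the recommendation above, not documented service behavior.

```python
from typing import Dict, Optional


def build_auth_headers(
    bearer_token: Optional[str] = None, api_key: Optional[str] = None
) -> Dict[str, str]:
    """Return the request header for one of the two authentication methods."""
    if bearer_token:
        # Microsoft Entra ID: the token is sent as "Bearer <token>".
        return {"Authorization": f"Bearer {bearer_token}"}
    if api_key:
        # API key authentication uses the api-key header.
        return {"api-key": api_key}
    raise ValueError("Provide either a bearer token or an API key.")


print(build_auth_headers(api_key="YOUR_API_KEY"))
```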

Request Body

Content-Type: multipart/form-data
Name | Type | Description | Required | Default
model | string | ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1, and gpt-4o-transcribe-diarize. | Yes |
file | string | The audio file object to transcribe. | Yes |
language | string | The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency. | No |
prompt | string | An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language. | No |
response_format | audioResponseFormat | Defines the format of the output. | No |
temperature | number | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit. | No | 0
timestamp_granularities[] | array | The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency. | No | ['segment']

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | object |
text/plain | string | Transcribed text in the output format (when response_format was one of text, vtt, or srt).

Examples

Example

Gets transcribed text and associated metadata from provided spoken audio data.
POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets transcribed text and associated metadata from provided spoken audio data.
POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200
{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}
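
The following sketch shows one way to assemble the transcription URL and the non-file form fields from the tables above, without sending anything. The helper names are hypothetical, and sending one timestamp_granularities[] part per value is an assumption about multipart array encoding that this reference does not spell out.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode


def transcription_url(endpoint: str, deployment_id: str, api_version: str) -> str:
    """Build the request URL from the three URI parameters."""
    query = urlencode({"api-version": api_version})
    base = endpoint.rstrip("/")
    return f"{base}/openai/deployments/{deployment_id}/audio/transcriptions?{query}"


def transcription_fields(
    model: str,
    language: Optional[str] = None,
    response_format: Optional[str] = None,
    timestamp_granularities: Sequence[str] = (),
) -> List[Tuple[str, str]]:
    """Return the text form fields; the audio file part is added separately."""
    fields = [("model", model)]
    if language is not None:
        fields.append(("language", language))
    if response_format is not None:
        fields.append(("response_format", response_format))
    if timestamp_granularities:
        # The table above requires verbose_json for timestamp granularities.
        if response_format != "verbose_json":
            raise ValueError("timestamp granularities require verbose_json")
        for granularity in timestamp_granularities:
            if granularity not in ("word", "segment"):
                raise ValueError(f"unsupported granularity: {granularity}")
            # Assumption: one form part per array value.
            fields.append(("timestamp_granularities[]", granularity))
    return fields


print(transcription_url(
    "https://aoairesource.openai.azure.com", "my-deployment", "2025-04-01-preview"
))
print(transcription_fields("whisper-1", "en", "verbose_json", ["word", "segment"]))
```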

Translations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview
Transcribes and translates input audio into English text.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Request Body

Content-Type: multipart/form-data
Name | Type | Description | Required | Default
file | string | The audio file to translate. | Yes |
prompt | string | An optional text to guide the model’s style or continue a previous audio segment. The prompt should be in English. | No |
response_format | audioResponseFormat | Defines the format of the output. | No |
temperature | number | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit. | No | 0

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | object |
text/plain | string | Transcribed text in the output format (when response_format was one of text, vtt, or srt).

Examples

Example

Gets English language transcribed text and associated metadata from provided spoken audio data.
POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200
{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets English language transcribed text and associated metadata from provided spoken audio data.
POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200
{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Speech - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview
Generates audio from the input text.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Request Body

Content-Type: multipart/form-data
Name | Type | Description | Required | Default
input | string | The text to synthesize audio for. The maximum length is 4,096 characters. | Yes |
response_format | enum | The format to synthesize the audio in. Possible values: mp3, opus, aac, flac, wav, pcm | No |
speed | number | The speed of the synthesized audio. Select a value from 0.25 to 4.0. 1.0 is the default. | No | 1.0
voice | enum | The voice to use for speech synthesis. Possible values: alloy, echo, fable, onyx, nova, shimmer | Yes |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/octet-stream | string |

Examples

Example

Synthesizes audio from the provided text.
POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview

{
 "input": "Hi! What are you going to make?",
 "voice": "fable",
 "response_format": "mp3"
}

Responses: Status Code: 200
{
  "body": "101010101"
}
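
The limits in the request body table above (4,096 characters, the listed voices and formats, and a speed between 0.25 and 4.0) can be checked on the client before sending. This is a minimal sketch under that assumption; the function name speech_request is hypothetical, and the service remains the authority on what it accepts.

```python
from typing import Any, Dict, Optional

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def speech_request(
    input_text: str,
    voice: str,
    response_format: Optional[str] = None,
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Validate the documented limits and return the request body."""
    if len(input_text) > 4096:
        raise ValueError("input exceeds 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format is not None and response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    body: Dict[str, Any] = {"input": input_text, "voice": voice, "speed": speed}
    if response_format is not None:
        body["response_format"] = response_format
    return body


print(speech_request("Hi! What are you going to make?", "fable", "mp3"))
```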

Image generations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview
Generates a batch of images from a text caption on a given image generation model deployment.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
background | imageBackground | Sets the transparency of the background of the generated images. This parameter is only supported for gpt-image-1 series models. | No | auto
n | integer | The number of images to generate. For dall-e-3, only n=1 is supported. | No | 1
output_compression | integer | The compression level (0-100%) for the generated images. This parameter is only supported for gpt-image-1 series models with the jpeg output format. | No | 100
output_format | imagesOutputFormat | The file format in which the generated images are returned. Only supported for gpt-image-1 series models. | No | png
prompt | string | A text description of the desired image(s). The maximum length is 32000 characters for gpt-image-1 series and 4000 characters for dall-e-3. | Yes |
partial_images | integer | The number of partial images to generate. This parameter is used for streaming responses that return partial images. Value must be between 0 and 3. When set to 0, the response will be a single image sent in one streaming event. Note that the final image may be sent before the full number of partial images are generated if the full image is generated more quickly. | No | 0
stream | boolean | Generate the image in streaming mode. | No | false
quality | imageQuality | The quality of the image that will be generated. | No | auto
response_format | imagesResponseFormat | The format in which the generated images are returned. This parameter isn’t supported for gpt-image-1 series models, which always return base64-encoded images. Possible values: url, b64_json. | No | url
size | imageSize | The size of the generated images. | No | auto
style | imageStyle | The style of the generated images. Only supported for dall-e-3. | No | vivid
user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | generateImagesResponse |

Status Code: default Description: An error occurred.
Content-Type | Type | Description
application/json | dalleErrorResponse |

Examples

Example

Creates images given a prompt.
POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview

{
 "prompt": "In the style of WordArt, Microsoft Clippy wearing a cowboy hat.",
 "n": 1,
 "style": "natural",
 "quality": "standard"
}

Responses: Status Code: 200
{
  "body": {
    "created": 1698342300,
    "data": [
      {
        "revised_prompt": "A vivid, natural representation of Microsoft Clippy wearing a cowboy hat.",
        "prompt_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          },
          "profanity": {
            "detected": false,
            "filtered": false
          },
          "custom_blocklists": {
            "filtered": false,
            "details": []
          }
        },
        "url": "https://dalletipusw2.blob.core.windows.net/private/images/e5451cc6-b1ad-4747-bd46-b89a3a3b8bc3/generated_00.png?se=2023-10-27T17%3A45%3A09Z&...",
        "content_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          }
        }
      }
    ]
  }
}
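
To show how the response above might be consumed, the sketch below pulls out each image URL and lists any content filter categories marked as filtered. It assumes the decoded object has been unwrapped from the "body" key used in the example; the function name summarize_images and the trimmed sample are illustrative only.

```python
from typing import Any, Dict, List


def summarize_images(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the URL, revised prompt, and filtered categories per image."""
    summaries = []
    for image in response.get("data", []):
        filters = image.get("content_filter_results", {})
        # Keep only the categories whose "filtered" flag is true.
        blocked = sorted(
            name for name, result in filters.items() if result.get("filtered")
        )
        summaries.append(
            {
                "url": image.get("url"),
                "revised_prompt": image.get("revised_prompt"),
                "filtered_categories": blocked,
            }
        )
    return summaries


# Trimmed sample in the shape of the response above; the URL is a placeholder.
sample = {
    "created": 1698342300,
    "data": [
        {
            "revised_prompt": "A vivid, natural representation of Microsoft Clippy wearing a cowboy hat.",
            "url": "https://example.invalid/generated_00.png",
            "content_filter_results": {
                "sexual": {"severity": "safe", "filtered": False},
                "violence": {"severity": "safe", "filtered": False},
            },
        }
    ],
}

for summary in summarize_images(sample):
    print(summary["url"], summary["filtered_categories"])
```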

Image generations - Edit

POST https://{endpoint}/openai/deployments/{deployment-id}/images/edits?api-version=2025-04-01-preview
Edits an image from a text caption on a given gpt-image-1 model deployment.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
deployment-id | path | Yes | string |
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Request Body

Content-Type: multipart/form-data
Name | Type | Description | Required | Default
image | string or array | The image(s) to edit. Must be a supported image file or an array of images. Each image should be a PNG or JPG file less than 50MB. | Yes |
input_fidelity | string | Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for gpt-image-1 series models. Supports high and low. | No | low
mask | string | An additional image whose fully transparent areas (e.g., where alpha is zero) indicate where the image should be edited. If there are multiple images provided, the mask will be applied to the first image. Must be a valid PNG file, less than 4MB, and have the same dimensions as the image. | No |
n | integer | The number of images to generate. Must be between 1 and 10. | No | 1
prompt | string | A text description of the desired image(s). The maximum length is 32000 characters. | Yes |
quality | imageQuality | The quality of the image that will be generated. | No | auto
partial_images | integer | The number of partial images to generate. This parameter is used for streaming responses that return partial images. Value must be between 0 and 3. When set to 0, the response will be a single image sent in one streaming event. Note that the final image may be sent before the full number of partial images are generated if the full image is generated more quickly. | No | 0
stream | boolean | Edit the image in streaming mode. | No | false
response_format | imagesResponseFormat | The format in which the generated images are returned. | No | url
size | imageSize | The size of the generated images. | No | auto
user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | generateImagesResponse |

Status Code: default Description: An error occurred.
Content-Type | Type | Description
application/json | dalleErrorResponse |
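
This section has no request example, so here is a minimal sketch of assembling the text form fields for an edit request while enforcing the documented ranges. The function name image_edit_fields is hypothetical, and encoding stream as the string "true" is an assumption about multipart encoding; the image and mask file parts would be attached separately.

```python
from typing import Dict, Optional


def image_edit_fields(
    prompt: str,
    n: int = 1,
    input_fidelity: Optional[str] = None,
    partial_images: Optional[int] = None,
) -> Dict[str, str]:
    """Validate documented limits and return the text form fields."""
    if len(prompt) > 32000:
        raise ValueError("prompt exceeds 32000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if input_fidelity not in (None, "high", "low"):
        raise ValueError("input_fidelity supports high and low")
    if partial_images is not None and not 0 <= partial_images <= 3:
        raise ValueError("partial_images must be between 0 and 3")
    fields = {"prompt": prompt, "n": str(n)}
    if input_fidelity is not None:
        fields["input_fidelity"] = input_fidelity
    if partial_images is not None:
        # partial_images applies to streaming responses, so enable stream too.
        # Assumption: booleans are sent as the lowercase string "true".
        fields["stream"] = "true"
        fields["partial_images"] = str(partial_images)
    return fields


print(image_edit_fields("Add a cowboy hat.", n=2, input_fidelity="high"))
```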

List - Assistants

The Assistants API is deprecated and will be retired on August 26, 2026. Use the generally available Microsoft Foundry Agents service. Follow the migration guide to update your workloads. Learn more.
GET https://{endpoint}/openai/assistants?api-version=2025-04-01-preview
Returns a list of assistants.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
limit | query | No | integer | A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
order | query | No | string | Possible values: asc, desc. Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
after | query | No | string | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
before | query | No | string | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | listAssistantsResponse |

Examples

Example

Returns a list of assistants.
GET https://{endpoint}/openai/assistants?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "asst_abc123",
        "object": "assistant",
        "created_at": 1707257477,
        "name": "Stock Analyst",
        "description": null,
        "model": "gpt-4-1106-preview",
        "instructions": "You are a financial analyst that analyzes stock market prices and other financial data present on user uploaded files or by calling external APIs.",
        "tools": [
          {
            "type": "code_interpreter"
          }
        ],
        "tool_resources": {},
        "metadata": {},
        "top_p": 1.0,
        "temperature": 1.0,
        "response_format": "auto"
      },
      {
        "id": "asst_abc456",
        "object": "assistant",
        "created_at": 1698982718,
        "name": "My Assistant",
        "description": null,
        "model": "gpt-4-turbo",
        "instructions": "You are a helpful assistant designed to make me better at coding!",
        "tools": [],
        "tool_resources": {},
        "metadata": {},
        "top_p": 1.0,
        "temperature": 1.0,
        "response_format": "auto"
      },
      {
        "id": "asst_abc789",
        "object": "assistant",
        "created_at": 1698982643,
        "name": null,
        "description": null,
        "model": "gpt-4-turbo",
        "instructions": null,
        "tools": [],
        "tool_resources": {},
        "metadata": {},
        "top_p": 1.0,
        "temperature": 1.0,
        "response_format": "auto"
      }
    ],
    "first_id": "asst_abc123",
    "last_id": "asst_abc789",
    "has_more": false
  }
}
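
The after cursor together with the last_id and has_more fields in the response above supports a simple forward pagination loop. The sketch below illustrates the idea with a stand-in fetch function instead of a real HTTP call; iter_assistants and fake_fetch are hypothetical names, and the two-page split of the sample data is invented for the demonstration.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_assistants(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield assistants page by page, following the after cursor."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page.get("has_more"):
            break
        # The next request passes after=<last_id> to fetch the following page.
        after = page["last_id"]


def fake_fetch(after: Optional[str]) -> Dict[str, Any]:
    """Stand-in for GET /openai/assistants?after=...; returns canned pages."""
    pages = {
        None: {
            "data": [{"id": "asst_abc123"}, {"id": "asst_abc456"}],
            "last_id": "asst_abc456",
            "has_more": True,
        },
        "asst_abc456": {
            "data": [{"id": "asst_abc789"}],
            "last_id": "asst_abc789",
            "has_more": False,
        },
    }
    return pages[after]


ids = [assistant["id"] for assistant in iter_assistants(fake_fetch)]
print(ids)
```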

Create - Assistant

POST https://{endpoint}/openai/assistants?api-version=2025-04-01-preview
Create an assistant with a model and instructions.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
description | string | The description of the assistant. The maximum length is 512 characters. | No |
instructions | string | The system instructions that the assistant uses. The maximum length is 256,000 characters. | No |
metadata | object | Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
model | string | | Yes |
name | string | The name of the assistant. The maximum length is 256 characters. | No |
response_format | assistantsApiResponseFormatOption | Specifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide. Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. | No |
temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | No | 1
tool_resources | object | A set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. | No |
└─ code_interpreter | object | | No |
   └─ file_ids | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | No | []
└─ file_search | object | | No |
   └─ vector_store_ids | array | The vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant. | No |
   └─ vector_stores | array | A helper to create a vector store with file_ids and attach it to this assistant. There can be a maximum of 1 vector store attached to the assistant. | No |
tools | array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, retrieval, or function. | No | []
top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | No | 1

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | assistantObject |

Examples

Example

Create an assistant with a model and instructions.
POST https://{endpoint}/openai/assistants?api-version=2025-04-01-preview

{
 "name": "Math Tutor",
 "instructions": "When a customer asks about a specific math problem, use Python to evaluate their query.",
 "tools": [
  {
   "type": "code_interpreter"
  }
 ],
 "model": "gpt-4-1106-preview"
}

Responses: Status Code: 200
{
  "body": {
    "id": "asst_4nsG2qgNzimRPE7MazXTXbU7",
    "object": "assistant",
    "created_at": 1707295707,
    "name": "Math Tutor",
    "description": null,
    "model": "gpt-4-1106-preview",
    "instructions": "When a customer asks about a specific math problem, use Python to evaluate their query.",
    "tools": [
      {
        "type": "code_interpreter"
      }
    ],
    "metadata": {},
    "top_p": 1.0,
    "temperature": 1.0,
    "response_format": "auto"
  }
}
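
Several limits from the request body table above (name, description, metadata, tools, temperature) can be checked locally before the request is sent. The following is a sketch under that assumption; validate_assistant is a hypothetical helper, it deliberately skips the instructions length, and the service performs its own validation regardless.

```python
from typing import Any, Dict, List


def validate_assistant(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found against the documented limits."""
    problems: List[str] = []
    if not payload.get("model"):
        problems.append("model is required")
    if len(payload.get("name") or "") > 256:
        problems.append("name exceeds 256 characters")
    if len(payload.get("description") or "") > 512:
        problems.append("description exceeds 512 characters")
    metadata = payload.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata has more than 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"metadata key too long: {key[:20]}...")
        if len(str(value)) > 512:
            problems.append(f"metadata value too long for key: {key[:20]}")
    if len(payload.get("tools") or []) > 128:
        problems.append("more than 128 tools")
    temperature = payload.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    return problems


request_body = {
    "name": "Math Tutor",
    "instructions": "When a customer asks about a specific math problem, use Python to evaluate their query.",
    "tools": [{"type": "code_interpreter"}],
    "model": "gpt-4-1106-preview",
}
print(validate_assistant(request_body))
```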

Get - Assistant

GET https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview
Retrieves an assistant.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
assistant_id | path | Yes | string | The ID of the assistant to retrieve.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}
api-key | True | string | Provide Azure OpenAI API key here

Details for the Authorization header:
To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com
Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | assistantObject |

Examples

Example

Retrieves an assistant.
GET https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "asst_abc123",
    "object": "assistant",
    "created_at": 1699009709,
    "name": "HR Helper",
    "description": null,
    "model": "gpt-4-turbo",
    "instructions": "You are an HR bot, and you have access to files to answer employee questions about company policies.",
    "tools": [
      {
        "type": "file_search"
      }
    ],
    "metadata": {},
    "top_p": 1.0,
    "temperature": 1.0,
    "response_format": "auto"
  }
}

Modify - Assistant

POST https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview
Modifies an assistant.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
assistant_id | path | Yes | string | The ID of the assistant to modify.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here
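
The two authentication options described above can be sketched as a small header builder. This is only an illustration: the function name `auth_headers` and the choice to reject requests that supply both or neither credential are assumptions made for this sketch, not behavior documented for the service.

```python
def auth_headers(api_key=None, bearer_token=None):
    """Return HTTP headers for exactly one of the two authentication methods.

    Hypothetical helper: the "exactly one" rule is a client-side choice
    made for this sketch.
    """
    if (api_key is None) == (bearer_token is None):
        raise ValueError("Provide exactly one of api_key or bearer_token")
    headers = {"Content-Type": "application/json"}
    if bearer_token is not None:
        # Microsoft Entra ID: the token must be preceded by "Bearer ".
        headers["Authorization"] = f"Bearer {bearer_token}"
    else:
        # API key authentication uses the api-key header.
        headers["api-key"] = api_key
    return headers


print(auth_headers(bearer_token="YOUR_AUTH_TOKEN"))
```

A token for the `bearer_token` argument can be obtained with the Azure CLI command shown in the table above.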

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
description | string | The description of the assistant. The maximum length is 512 characters. | No |
instructions | string | The system instructions that the assistant uses. The maximum length is 32768 characters. | No |
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
model | string | | No |
name | string | The name of the assistant. The maximum length is 256 characters. | No |
response_format | assistantsApiResponseFormatOption | Specifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which ensure the model's output matches your supplied JSON schema. Learn more in the Structured Outputs guide. Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. | No |
temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | No | 1
tool_resources | object | A set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. | No |
└─ code_interpreter | object | | No |
└─ file_ids | array | Overrides the list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | No | []
└─ file_search | object | | No |
└─ vector_store_ids | array | Overrides the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant. | No |
tools | array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function. | No | []
top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | No | 1
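
As a rough sketch of how a few of these fields fit together, the helper below assembles a Modify - Assistant body that turns on JSON mode. The function name `build_modify_assistant_body` and the client-side check that the instructions mention JSON are assumptions made for illustration; the service itself only requires that some system or user message asks for JSON.

```python
import json


def build_modify_assistant_body(instructions, temperature=None, json_mode=False):
    """Assemble a request body from a few of the optional fields (hypothetical helper)."""
    if len(instructions) > 32768:
        raise ValueError("instructions may be at most 32768 characters")
    body = {"instructions": instructions}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if json_mode:
        # Assumption: a simple local guard so JSON mode is never enabled
        # without instructions that ask the model for JSON output.
        if "json" not in instructions.lower():
            raise ValueError("JSON mode requires instructions that ask for JSON")
        body["response_format"] = {"type": "json_object"}
    return body


body = build_modify_assistant_body(
    "You are an HR bot. Reply only with a JSON object.",
    temperature=0.2,
    json_mode=True,
)
print(json.dumps(body, indent=2))
```

Only `temperature` is set here, in line with the recommendation to alter either `temperature` or `top_p` but not both.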

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | assistantObject |

Examples

Example

Modifies an assistant.
POST https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview

{
 "instructions": "You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
 "tools": [
  {
   "type": "file_search"
  }
 ],
 "model": "gpt-4-turbo"
}

Responses: Status Code: 200
{
  "body": {
    "id": "asst_123",
    "object": "assistant",
    "created_at": 1699009709,
    "name": "HR Helper",
    "description": null,
    "model": "gpt-4-turbo",
    "instructions": "You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
    "tools": [
      {
        "type": "file_search"
      }
    ],
    "tool_resources": {
      "file_search": {
        "vector_store_ids": []
      }
    },
    "metadata": {},
    "top_p": 1.0,
    "temperature": 1.0,
    "response_format": "auto"
  }
}

Delete - Assistant

DELETE https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview
Delete an assistant.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
assistant_id | path | Yes | string | The ID of the assistant to delete.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | deleteAssistantResponse |

Examples

Example

Deletes an assistant.
DELETE https://{endpoint}/openai/assistants/{assistant_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "asst_4nsG2qgNzimRPE7MazXTXbU7",
    "object": "assistant.deleted",
    "deleted": true
  }
}

Create - Thread

The Assistants API is deprecated and will be retired on August 26, 2026. Use the generally available Microsoft Foundry Agents service. Follow the migration guide to update your workloads. Learn more.
POST https://{endpoint}/openai/threads?api-version=2025-04-01-preview
Create a thread.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
messages | array | A list of messages to start the thread with. | No |
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
tool_resources | object | A set of resources that are made available to the assistant’s tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. | No |
└─ code_interpreter | object | | No |
└─ file_ids | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | No | []
└─ file_search | object | | No |
└─ vector_store_ids | array | The vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread. | No |
└─ vector_stores | array | A helper to create a vector store with file_ids and attach it to this thread. There can be a maximum of 1 vector store attached to the thread. | No |
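
The limits in the table above (at most 20 files for code_interpreter, at most 1 vector store for file_search) can be checked before a request is sent. The following is a minimal sketch; the helper name `thread_tool_resources` and the sample IDs are hypothetical.

```python
def thread_tool_resources(file_ids=(), vector_store_ids=()):
    """Build a tool_resources object, enforcing the documented limits locally."""
    file_ids = list(file_ids)
    vector_store_ids = list(vector_store_ids)
    if len(file_ids) > 20:
        raise ValueError("code_interpreter accepts at most 20 file IDs")
    if len(vector_store_ids) > 1:
        raise ValueError("file_search accepts at most 1 vector store ID")
    resources = {}
    if file_ids:
        resources["code_interpreter"] = {"file_ids": file_ids}
    if vector_store_ids:
        resources["file_search"] = {"vector_store_ids": vector_store_ids}
    return resources


# Hypothetical IDs, used only to show the resulting request body shape.
create_thread_body = {
    "tool_resources": thread_tool_resources(
        file_ids=["file_example_1", "file_example_2"],
        vector_store_ids=["vs_example_1"],
    )
}
print(create_thread_body)
```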

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | threadObject |

Examples

Example

Creates a thread.
POST https://{endpoint}/openai/threads?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "object": "thread",
    "created_at": 1707297136,
    "metadata": {}
  }
}

Get - Thread

GET https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview
Retrieves a thread.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to retrieve.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | threadObject |

Examples

Example

Retrieves a thread.
GET https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "object": "thread",
    "created_at": 1707297136,
    "metadata": {},
    "tool_resources": {
      "code_interpreter": {
        "file_ids": []
      }
    }
  }
}

Modify - Thread

POST https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview
Modifies a thread.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to modify. Only the metadata can be modified.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
tool_resources | object | A set of resources that are made available to the assistant’s tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. | No |
└─ code_interpreter | object | | No |
└─ file_ids | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | No | []
└─ file_search | object | | No |
└─ vector_store_ids | array | The vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread. | No |
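
The metadata limits quoted above (16 pairs, 64-character keys, 512-character values) are easy to check on the client before calling the API. This is a minimal sketch under the assumption that keys and values are plain strings, as in this article's examples; `validate_metadata` is a hypothetical helper name.

```python
def validate_metadata(metadata):
    """Raise ValueError if metadata exceeds the documented size limits."""
    if len(metadata) > 16:
        raise ValueError("metadata may hold at most 16 key-value pairs")
    for key, value in metadata.items():
        # Assumption: both keys and values are strings.
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError("metadata keys and values must be strings")
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key[:20]}...")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key {key}")
    return metadata


modify_thread_body = {"metadata": validate_metadata({"modified": "true", "user": "abc123"})}
print(modify_thread_body)
```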

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | threadObject |

Examples

Example

Modifies a thread.
POST https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview

{
 "metadata": {
  "modified": "true",
  "user": "abc123"
 }
}

Responses: Status Code: 200
{
  "body": {
    "id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "object": "thread",
    "created_at": 1707297136,
    "metadata": {
      "modified": "true",
      "user": "abc123"
    },
    "tool_resources": {}
  }
}

Delete - Thread

DELETE https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview
Delete a thread.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to delete.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | deleteThreadResponse |

Examples

Example

Deletes a thread.
DELETE https://{endpoint}/openai/threads/{thread_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "object": "thread.deleted",
    "deleted": true
  }
}

List - Messages

GET https://{endpoint}/openai/threads/{thread_id}/messages?api-version=2025-04-01-preview
Returns a list of messages for a given thread.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread the messages belong to.
limit | query | No | integer | A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
order | query | No | string | Possible values: asc, desc. Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
after | query | No | string | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
before | query | No | string | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
run_id | query | No | string | Filter messages by the run ID that generated them.
api-version | query | Yes | string |
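
The `after` cursor described above lends itself to a simple paging loop. The sketch below is one possible shape, not an official client: `iter_messages` and the `fetch_page` callback are hypothetical, and the callback is expected to perform the GET request with the given query parameters and return the parsed list object (with `data`, `last_id`, and `has_more` fields as in the example response in this section). A fake callback stands in for the network call so the sketch runs on its own.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_messages(fetch_page: Callable[[Dict[str, str]], dict],
                  limit: int = 20,
                  order: Optional[str] = None) -> Iterator[dict]:
    """Yield every message in a thread by following the after cursor."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order is not None and order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    after: Optional[str] = None
    while True:
        params = {"limit": str(limit)}
        if order is not None:
            params["order"] = order
        if after is not None:
            params["after"] = after
        page = fetch_page(params)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            return
        # The last object ID on this page becomes the cursor for the next page.
        after = page["last_id"]


# Fake two-page result set used in place of real HTTP calls.
_PAGES = {
    None: {"data": [{"id": "msg_1"}, {"id": "msg_2"}], "last_id": "msg_2", "has_more": True},
    "msg_2": {"data": [{"id": "msg_3"}], "last_id": "msg_3", "has_more": False},
}


def fake_fetch(params):
    return _PAGES[params.get("after")]


ids = [message["id"] for message in iter_messages(fake_fetch, limit=2)]
print(ids)  # ['msg_1', 'msg_2', 'msg_3']
```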

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | listMessagesResponse |

Examples

Example

List Messages
GET https://{endpoint}/openai/threads/{thread_id}/messages?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "msg_abc123",
        "object": "thread.message",
        "created_at": 1699016383,
        "assistant_id": null,
        "thread_id": "thread_abc123",
        "run_id": null,
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": {
              "value": "How does AI work? Explain it in simple terms.",
              "annotations": []
            }
          }
        ],
        "attachments": [],
        "metadata": {}
      },
      {
        "id": "msg_abc456",
        "object": "thread.message",
        "created_at": 1699016383,
        "assistant_id": null,
        "thread_id": "thread_abc123",
        "run_id": null,
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": {
              "value": "Hello, what is AI?",
              "annotations": []
            }
          }
        ],
        "attachments": [],
        "metadata": {}
      }
    ],
    "first_id": "msg_abc123",
    "last_id": "msg_abc456",
    "has_more": false
  }
}

Create - Message

POST https://{endpoint}/openai/threads/{thread_id}/messages?api-version=2025-04-01-preview
Create a message.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to create a message for.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
attachments | array | A list of files attached to the message, and the tools they should be added to. | No |
content | string | The content of the message. | Yes |
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
role | string | The role of the entity that is creating the message. Allowed values include: user (indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages); assistant (indicates the message is generated by the assistant; use this value to insert messages from the assistant into the conversation). | Yes |
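
To show how the two required fields travel in a request, here is a minimal sketch that prepares (but does not send) a Create - Message call with Python's standard library. The helper name `create_message_request`, the resource name, the thread ID, and the placeholder API key are all assumptions for illustration.

```python
import json
import urllib.request


def create_message_request(endpoint, thread_id, role, content, headers,
                           api_version="2025-04-01-preview"):
    """Prepare a POST request for Create - Message without sending it."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    url = (f"{endpoint.rstrip('/')}/openai/threads/{thread_id}/messages"
           f"?api-version={api_version}")
    payload = json.dumps({"role": role, "content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )


req = create_message_request(
    "https://aoairesource.openai.azure.com",  # placeholder resource name
    "thread_abc123",                          # placeholder thread ID
    "user",
    "How does AI work? Explain it in simple terms.",
    headers={"api-key": "YOUR_API_KEY"},
)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`, so that the sketch stays free of network access.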

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | messageObject |

Examples

Example

Create a message.
POST https://{endpoint}/openai/threads/{thread_id}/messages?api-version=2025-04-01-preview

{
 "role": "user",
 "content": "What is the cube root of the sum of 12, 14, 1234, 4321, 90000, 123213541223, 443123123124, 5423324234, 234324324234, 653434534545, 200000000, 98237432984, 99999999, 99999999999, 220000000000, 3309587702? Give me the answer rounded to the nearest integer without commas or spaces."
}

Responses: Status Code: 200
{
  "body": {
    "id": "msg_as3XIk1tpVP3hdHjWBGg3uG4",
    "object": "thread.message",
    "created_at": 1707298421,
    "assistant_id": null,
    "thread_id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "run_id": null,
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": {
          "value": "What is the cube root of the sum of 12, 14, 1234, 4321, 90000, 123213541223, 443123123124, 5423324234, 234324324234, 653434534545, 200000000, 98237432984, 99999999, 99999999999, 220000000000, 3309587702? Give me the answer rounded to the nearest integer without commas or spaces.",
          "annotations": []
        }
      }
    ],
    "attachments": [],
    "metadata": {}
  }
}

Get - Message

GET https://{endpoint}/openai/threads/{thread_id}/messages/{message_id}?api-version=2025-04-01-preview
Retrieve a message.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to which this message belongs.
message_id | path | Yes | string | The ID of the message to retrieve.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | messageObject |

Examples

Example

Retrieve a message.
GET https://{endpoint}/openai/threads/{thread_id}/messages/{message_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "msg_as3XIk1tpVP3hdHjWBGg3uG4",
    "object": "thread.message",
    "created_at": 1707298421,
    "thread_id": "thread_v7V4csrNOxtNmgcwGg496Smx",
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": {
          "value": "What is the cube root of the sum of 12, 14, 1234, 4321, 90000, 123213541223, 443123123124, 5423324234, 234324324234, 653434534545, 200000000, 98237432984, 99999999, 99999999999, 220000000000, 3309587702? Give me the answer rounded to the nearest integer without commas or spaces.",
          "annotations": []
        }
      }
    ],
    "file_ids": [],
    "assistant_id": null,
    "run_id": null,
    "metadata": {}
  }
}

Modify - Message

POST https://{endpoint}/openai/threads/{thread_id}/messages/{message_id}?api-version=2025-04-01-preview
Modifies a message.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_id | path | Yes | string | The ID of the thread to which this message belongs.
message_id | path | Yes | string | The ID of the message to modify.
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | messageObject |

Examples

Example

Modify a message.
POST https://{endpoint}/openai/threads/{thread_id}/messages/{message_id}?api-version=2025-04-01-preview

{
 "metadata": {
  "modified": "true",
  "user": "abc123"
 }
}

Responses: Status Code: 200
{
  "body": {
    "id": "msg_abc123",
    "object": "thread.message",
    "created_at": 1699017614,
    "assistant_id": null,
    "thread_id": "thread_abc123",
    "run_id": null,
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": {
          "value": "How does AI work? Explain it in simple terms.",
          "annotations": []
        }
      }
    ],
    "file_ids": [],
    "metadata": {
      "modified": "true",
      "user": "abc123"
    }
  }
}

Create - Thread And Run

POST https://{endpoint}/openai/threads/runs?api-version=2025-04-01-preview
Create a thread and run it in one request.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
api-version | query | Yes | string |

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
assistant_id | string | The ID of the assistant to use to execute this run. | Yes |
instructions | string | Override the default system message of the assistant. This is useful for modifying the behavior on a per-run basis. | No |
max_completion_tokens | integer | The maximum number of completion tokens that may be used over the course of the run. The run makes a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete. See incomplete_details for more info. | No |
max_prompt_tokens | integer | The maximum number of prompt tokens that may be used over the course of the run. The run makes a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete. See incomplete_details for more info. | No |
metadata | object | A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
model | string | The ID of the model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used. | No |
parallel_tool_calls | ParallelToolCalls | Whether to enable parallel function calling during tool use. | No | True
response_format | assistantsApiResponseFormatOption | Specifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which ensure the model's output matches your supplied JSON schema. Learn more in the Structured Outputs guide. Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. | No |
stream | boolean | If true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message. | No |
stream_options | chatCompletionStreamOptions | Options for streaming response. Only set this when you set stream: true. | No | None
temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | No | 1
thread | createThreadRequest | | No |
tool_choice | assistantsApiToolChoiceOption | Controls which (if any) tool is called by the model. none means the model won’t call any tools and instead generates a message. auto is the default value and means the model can pick between generating a message or calling a tool. Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. | No |
tool_resources | object | A set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. | No |
└─ code_interpreter | object | | No |
└─ file_ids | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | No | []
└─ file_search | object | | No |
└─ vector_store_ids | array | The ID of the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant. | No |
tools | array | Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis. | No |
top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | No | 1
truncation_strategy | truncationObject | Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run. | No |
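
When stream is true, the response arrives as server-sent events that end with a data: [DONE] message. The sketch below shows one way a client might parse such a stream, assuming standard server-sent event framing (`event:` and `data:` lines separated by blank lines). The function name `iter_run_events` and the event names in the sample are illustrative assumptions rather than values taken from this article.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_run_events(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield (event_name, parsed_data) pairs until data: [DONE] is seen."""
    event = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current event.
            if data_parts:
                yield event or "message", json.loads("\n".join(data_parts))
            event, data_parts = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return  # terminal marker: stop reading
            data_parts.append(payload)


# Hand-written sample stream; the event names are assumptions for illustration.
sample = [
    "event: thread.run.created\n",
    'data: {"id": "run_abc123", "status": "queued"}\n',
    "\n",
    "event: done\n",
    "data: [DONE]\n",
    "\n",
]
events = list(iter_run_events(sample))
print(events)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read incrementally rather than collected into a list.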

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | runObject |

Examples

Example

Create a thread and run it in one request.
POST https://{endpoint}/openai/threads/runs?api-version=2025-04-01-preview

{
 "assistant_id": "asst_abc123",
 "thread": {
  "messages": [
   {
    "role": "user",
    "content": "Explain deep learning to a 5 year old."
   }
  ]
 }
}

Responses: Status Code: 200
{
  "body": {
    "id": "run_abc123",
    "object": "thread.run",
    "created_at": 1699076792,
    "assistant_id": "asst_abc123",
    "thread_id": "thread_abc123",
    "status": "queued",
    "started_at": null,
    "expires_at": 1699077392,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": null,
    "required_action": null,
    "last_error": null,
    "model": "gpt-4-turbo",
    "instructions": "You are a helpful assistant.",
    "tools": [],
    "tool_resources": {},
    "metadata": {},
    "temperature": 1.0,
    "top_p": 1.0,
    "max_completion_tokens": null,
    "max_prompt_tokens": null,
    "truncation_strategy": {
      "type": "auto",
      "last_messages": null
    },
    "incomplete_details": null,
    "usage": null,
    "response_format": "auto",
    "tool_choice": "auto"
  }
}
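
As a rough sketch, the request above could be assembled in Python with the standard library as follows. The function name build_thread_and_run_request and its parameters are hypothetical; the sketch assumes API key authentication and that endpoint is the full protocol and hostname of the resource. It only constructs the request object and does not send it.

```python
import json
import urllib.request


def build_thread_and_run_request(
    endpoint: str,
    api_key: str,
    assistant_id: str,
    user_message: str,
    api_version: str = "2025-04-01-preview",
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a thread and runs it."""
    url = f"{endpoint.rstrip('/')}/openai/threads/runs?api-version={api_version}"
    body = {
        "assistant_id": assistant_id,
        "thread": {"messages": [{"role": "user", "content": user_message}]},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body is expected to be a run object like the one shown above.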

List - Runs

GET https://{endpoint}/openai/threads/{thread_id}/runs?api-version=2025-04-01-preview
Returns a list of runs belonging to a thread.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread the run belongs to.
limitqueryNointegerA limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
orderqueryNostring
Possible values: asc, desc
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
afterqueryNostringA cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
beforequeryNostringA cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here
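
The two header options can be sketched as a small helper that emits exactly one of them. The function name auth_headers is hypothetical; the header names and the Bearer prefix come from the table above.

```python
from typing import Optional


def auth_headers(token: Optional[str] = None, api_key: Optional[str] = None) -> dict:
    """Return the request headers for one of the two authentication methods."""
    if token:
        # Microsoft Entra ID: the token must be preceded by "Bearer ".
        return {"Authorization": f"Bearer {token}"}
    if api_key:
        # API key authentication uses the api-key header.
        return {"api-key": api_key}
    raise ValueError("provide either a token or an API key")
```

When both values are supplied, this sketch prefers the token, in line with the recommendation to use token-based authentication.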

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonlistRunsResponse

Examples

Example

Returns a list of runs belonging to a thread.
GET https://{endpoint}/openai/threads/{thread_id}/runs?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "run_abc123",
        "object": "thread.run",
        "created_at": 1699075072,
        "assistant_id": "asst_abc123",
        "thread_id": "thread_abc123",
        "status": "completed",
        "started_at": 1699075072,
        "expires_at": null,
        "cancelled_at": null,
        "failed_at": null,
        "completed_at": 1699075073,
        "last_error": null,
        "model": "gpt-4-turbo",
        "instructions": null,
        "incomplete_details": null,
        "tools": [
          {
            "type": "code_interpreter"
          }
        ],
        "tool_resources": {
          "code_interpreter": {
            "file_ids": [
              "file-abc123",
              "file-abc456"
            ]
          }
        },
        "metadata": {},
        "usage": {
          "prompt_tokens": 123,
          "completion_tokens": 456,
          "total_tokens": 579
        },
        "temperature": 1.0,
        "top_p": 1.0,
        "max_prompt_tokens": 1000,
        "max_completion_tokens": 1000,
        "truncation_strategy": {
          "type": "auto",
          "last_messages": null
        },
        "response_format": "auto",
        "tool_choice": "auto"
      },
      {
        "id": "run_abc456",
        "object": "thread.run",
        "created_at": 1699063290,
        "assistant_id": "asst_abc123",
        "thread_id": "thread_abc123",
        "status": "completed",
        "started_at": 1699063290,
        "expires_at": null,
        "cancelled_at": null,
        "failed_at": null,
        "completed_at": 1699063291,
        "last_error": null,
        "model": "gpt-4-turbo",
        "instructions": null,
        "incomplete_details": null,
        "tools": [
          {
            "type": "code_interpreter"
          }
        ],
        "tool_resources": {
          "code_interpreter": {
            "file_ids": [
              "file-abc123",
              "file-abc456"
            ]
          }
        },
        "metadata": {},
        "usage": {
          "prompt_tokens": 123,
          "completion_tokens": 456,
          "total_tokens": 579
        },
        "temperature": 1.0,
        "top_p": 1.0,
        "max_prompt_tokens": 1000,
        "max_completion_tokens": 1000,
        "truncation_strategy": {
          "type": "auto",
          "last_messages": null
        },
        "response_format": "auto",
        "tool_choice": "auto"
      }
    ],
    "first_id": "run_abc123",
    "last_id": "run_abc456",
    "has_more": false
  }
}
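
The after cursor and the has_more and last_id fields of the list response can be combined to walk through every page. The sketch below assumes a caller-supplied function, here called fetch_page (a hypothetical name), that performs the GET request with the given after value and returns the parsed response body.

```python
from typing import Callable, Iterator, Optional


def iter_runs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every run by following the after cursor until has_more is false."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        # Stop when the service reports no further pages (or returns no items).
        if not page.get("has_more") or not page["data"]:
            break
        # The last object ID of this page becomes the cursor for the next one.
        after = page["last_id"]
```

Keeping the HTTP call outside the generator makes the cursor logic easy to exercise with canned pages.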

Create - Run

POST https://{endpoint}/openai/threads/{thread_id}/runs?api-version=2025-04-01-preview
Create a run.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread to run.
include[]queryNoarrayA list of additional fields to include in the response. Currently the only supported value is step_details.tool_calls[*].file_search.results[*].content to fetch the file search result content.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
additional_instructionsstringAppends additional instructions at the end of the instructions for the run. This is useful for modifying the behavior on a per-run basis without overriding other instructions.No
additional_messagesarrayAdds additional messages to the thread before creating the run.No
assistant_idstringThe ID of the assistant to use to execute this run.Yes
instructionsstringOverride the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.No
max_completion_tokensintegerThe maximum number of completion tokens that may be used over the course of the run. The run makes a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
max_prompt_tokensintegerThe maximum number of prompt tokens that may be used over the course of the run. The run makes a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
metadataobjectSet of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
modelstringThe ID of the Model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used.No
parallel_tool_callsParallelToolCallsWhether to enable parallel function calling during tool use.NoTrue
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
streambooleanIf true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
tool_choiceassistantsApiToolChoiceOptionControls which (if any) tool is called by the model.
none means the model won’t call any tools and instead generates a message.
auto is the default value and means the model can pick between generating a message or calling a tool.
Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
No
toolsarrayOverride the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.No
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
truncation_strategytruncationObjectControls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.No
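
A request body for this operation might be assembled as in the following sketch, which includes only the fields the caller sets and checks the documented temperature range. The function name build_run_body is hypothetical, and the sketch covers only a few of the fields in the table above.

```python
from typing import Optional


def build_run_body(
    assistant_id: str,
    *,
    instructions: Optional[str] = None,
    additional_instructions: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
) -> dict:
    """Build a Create Run request body, omitting fields that were not set."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    optional = {
        "instructions": instructions,
        "additional_instructions": additional_instructions,
        "temperature": temperature,
        "top_p": top_p,
    }
    body = {"assistant_id": assistant_id}  # the only required field
    body.update({name: value for name, value in optional.items() if value is not None})
    return body
```

Leaving unset fields out of the body lets the service apply its defaults, such as 1 for temperature and top_p.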

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunObject

Examples

Example

Create a run.
POST https://{endpoint}/openai/threads/{thread_id}/runs?api-version=2025-04-01-preview

{
 "assistant_id": "asst_abc123"
}

Responses: Status Code: 200
{
  "body": {
    "id": "run_abc123",
    "object": "thread.run",
    "created_at": 1699063290,
    "assistant_id": "asst_abc123",
    "thread_id": "thread_abc123",
    "status": "queued",
    "started_at": 1699063290,
    "expires_at": null,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": 1699063291,
    "last_error": null,
    "model": "gpt-4-turbo",
    "instructions": null,
    "incomplete_details": null,
    "tools": [
      {
        "type": "code_interpreter"
      }
    ],
    "metadata": {},
    "usage": null,
    "temperature": 1.0,
    "top_p": 1.0,
    "max_prompt_tokens": 1000,
    "max_completion_tokens": 1000,
    "truncation_strategy": {
      "type": "auto",
      "last_messages": null
    },
    "response_format": "auto",
    "tool_choice": "auto"
  }
}

Get - Run

GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}?api-version=2025-04-01-preview
Retrieves a run.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread that was run.
run_idpathYesstringThe ID of the run to retrieve.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunObject

Examples

Example

Gets a run.
GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "run_HsO8tYM4K5AAMAHgK0J3om8Q",
    "object": "thread.run",
    "created_at": 1707303196,
    "assistant_id": "asst_JtTwHk28cIocgFXZPCBxhOzl",
    "thread_id": "thread_eRNwflE3ncDYak1np6MdMHJh",
    "status": "completed",
    "started_at": 1707303197,
    "expires_at": null,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": 1707303201,
    "last_error": null,
    "model": "gpt-4-1106-preview",
    "instructions": "You are an AI model that empowers every person and every organization on the planet to achieve more.",
    "tools": [],
    "file_ids": [],
    "metadata": {}
  }
}
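
Because a run starts as queued and finishes later, a client typically calls this operation repeatedly until the status settles. The sketch below is one way to do that; get_run is a hypothetical caller-supplied function that performs the GET request and returns the parsed run object. The set of terminal statuses is an assumption inferred from the status values and timestamp fields shown in this article, not a list stated by it.

```python
import time
from typing import Callable

# Assumed terminal statuses; verify against the runObject definition.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(
    get_run: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a run until it is terminal or needs tool outputs from the caller."""
    for _ in range(max_polls):
        run = get_run()
        # requires_action is not terminal, but the caller must act before
        # the run can make further progress, so polling stops here too.
        if run["status"] in TERMINAL_STATUSES or run["status"] == "requires_action":
            return run
        sleep(interval)
    raise TimeoutError(f"run did not settle after {max_polls} polls")
```

The sleep function is injectable so the loop can be tested without waiting.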

Modify - Run

POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}?api-version=2025-04-01-preview
Modifies a run.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread that was run.
run_idpathYesstringThe ID of the run to modify.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
metadataobjectSet of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
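
The metadata limits can be checked before the request is sent. The sketch below assumes that both keys and values are strings; the function name validate_metadata is hypothetical.

```python
def validate_metadata(metadata: dict) -> dict:
    """Check the documented metadata limits and return the metadata unchanged."""
    if len(metadata) > 16:
        raise ValueError("metadata can hold at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key exceeds 64 characters: {key[:20]}...")
        if len(value) > 512:
            raise ValueError(f"metadata value for {key!r} exceeds 512 characters")
    return metadata
```

The validated dictionary can then be used as the metadata field of the request body.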

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunObject

Examples

Example

Modifies a run.
POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}?api-version=2025-04-01-preview

{
 "metadata": {
  "user_id": "user_abc123"
 }
}

Responses: Status Code: 200
{
  "body": {
    "id": "run_abc123",
    "object": "thread.run",
    "created_at": 1699075072,
    "assistant_id": "asst_abc123",
    "thread_id": "thread_abc123",
    "status": "completed",
    "started_at": 1699075072,
    "expires_at": null,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": 1699075073,
    "last_error": null,
    "model": "gpt-4-turbo",
    "instructions": null,
    "incomplete_details": null,
    "tools": [
      {
        "type": "code_interpreter"
      }
    ],
    "tool_resources": {
      "code_interpreter": {
        "file_ids": [
          "file-abc123",
          "file-abc456"
        ]
      }
    },
    "metadata": {
      "user_id": "user_abc123"
    },
    "usage": {
      "prompt_tokens": 123,
      "completion_tokens": 456,
      "total_tokens": 579
    },
    "temperature": 1.0,
    "top_p": 1.0,
    "max_prompt_tokens": 1000,
    "max_completion_tokens": 1000,
    "truncation_strategy": {
      "type": "auto",
      "last_messages": null
    },
    "response_format": "auto",
    "tool_choice": "auto"
  }
}

Submit - Tool Outputs To Run

POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2025-04-01-preview
When a run has the status: "requires_action" and required_action.type is submit_tool_outputs, this endpoint can be used to submit the outputs from the tool calls once they’re all completed. All outputs must be submitted in a single request.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread to which this run belongs.
run_idpathYesstringThe ID of the run that requires the tool output submission.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
streambooleanIf true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
No
tool_outputsarrayA list of tools for which the outputs are being submitted.Yes
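
When stream is true, the response arrives as server-sent events and ends with a data: [DONE] message. The following sketch shows one simplified way to read such a stream from an iterable of text lines. It assumes each event carries a single data line containing JSON and an optional event line naming the event; the function name iter_sse_events is hypothetical, and multi-line data fields are not handled.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event name, parsed data) pairs until the [DONE] marker is seen."""
    event_name = "message"  # default event name when no event line is present
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return  # the stream is finished
            yield event_name, json.loads(data)
        elif line == "":
            event_name = "message"  # a blank line ends the current event
```

Any iterable of decoded lines works as input, for example the lines of an open HTTP response.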

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunObject

Examples

Example

When a run has the status: "requires_action" and required_action.type is submit_tool_outputs, this endpoint can be used to submit the outputs from the tool calls once they’re all completed. All outputs must be submitted in a single request.
POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/submit_tool_outputs?api-version=2025-04-01-preview

{
 "tool_outputs": [
  {
   "tool_call_id": "call_001",
   "output": "70 degrees and sunny."
  }
 ]
}

Responses: Status Code: 200
{
  "body": {
    "id": "run_123",
    "object": "thread.run",
    "created_at": 1699075592,
    "assistant_id": "asst_123",
    "thread_id": "thread_123",
    "status": "queued",
    "started_at": 1699075592,
    "expires_at": 1699076192,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": null,
    "last_error": null,
    "model": "gpt-4-turbo",
    "instructions": null,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": [
                  "celsius",
                  "fahrenheit"
                ]
              }
            },
            "required": [
              "location"
            ]
          }
        }
      }
    ],
    "metadata": {},
    "usage": null,
    "temperature": 1.0,
    "top_p": 1.0,
    "max_prompt_tokens": 1000,
    "max_completion_tokens": 1000,
    "truncation_strategy": {
      "type": "auto",
      "last_messages": null
    },
    "response_format": "auto",
    "tool_choice": "auto"
  }
}
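
The request body above pairs each tool_call_id with an output string. The sketch below builds that body from a run that is waiting for tool outputs. The layout it reads (required_action.submit_tool_outputs.tool_calls, each with an id and a function holding name and JSON-encoded arguments) is an assumption about the run object and should be checked against the runObject definition. The names build_tool_outputs and handlers are hypothetical.

```python
import json
from typing import Callable


def build_tool_outputs(run: dict, handlers: dict[str, Callable[..., str]]) -> dict:
    """Run a local handler for every pending tool call and collect all outputs."""
    action = run.get("required_action") or {}
    if run.get("status") != "requires_action" or action.get("type") != "submit_tool_outputs":
        raise ValueError("run is not waiting for tool outputs")
    outputs = []
    for call in action["submit_tool_outputs"]["tool_calls"]:
        function = call["function"]
        arguments = json.loads(function["arguments"] or "{}")
        result = handlers[function["name"]](**arguments)
        outputs.append({"tool_call_id": call["id"], "output": result})
    # All outputs go into a single request body.
    return {"tool_outputs": outputs}
```

Collecting every output before returning reflects the requirement that all outputs be submitted in a single request.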

Cancel - Run

POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2025-04-01-preview
Cancels a run that is in_progress.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread to which this run belongs.
run_idpathYesstringThe ID of the run to cancel.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunObject

Examples

Example

Cancels a run that is in_progress.
POST https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/cancel?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "run_abc123",
    "object": "thread.run",
    "created_at": 1699076126,
    "assistant_id": "asst_abc123",
    "thread_id": "thread_abc123",
    "status": "cancelling",
    "started_at": 1699076126,
    "expires_at": 1699076726,
    "cancelled_at": null,
    "failed_at": null,
    "completed_at": null,
    "last_error": null,
    "model": "gpt-4-turbo",
    "instructions": "You summarize books.",
    "tools": [
      {
        "type": "file_search"
      }
    ],
    "tool_resources": {
      "file_search": {
        "vector_store_ids": [
          "vs_123"
        ]
      }
    },
    "metadata": {},
    "usage": null,
    "temperature": 1.0,
    "top_p": 1.0,
    "response_format": "auto"
  }
}

List - Run Steps

GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2025-04-01-preview
Returns a list of run steps belonging to a run.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread the run and run steps belong to.
run_idpathYesstringThe ID of the run the run steps belong to.
limitqueryNointegerA limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
orderqueryNostring
Possible values: asc, desc
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
afterqueryNostringA cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
beforequeryNostringA cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
api-versionqueryYesstring
include[]queryNoarrayA list of additional fields to include in the response. Currently the only supported value is step_details.tool_calls[*].file_search.results[*].content to fetch the file search result content.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonlistRunStepsResponse

Examples

Example

Returns a list of run steps belonging to a run.
GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/steps?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "step_abc123",
        "object": "thread.run.step",
        "created_at": 1699063291,
        "run_id": "run_abc123",
        "assistant_id": "asst_abc123",
        "thread_id": "thread_abc123",
        "type": "message_creation",
        "status": "completed",
        "cancelled_at": null,
        "completed_at": 1699063291,
        "expired_at": null,
        "failed_at": null,
        "last_error": null,
        "step_details": {
          "type": "message_creation",
          "message_creation": {
            "message_id": "msg_abc123"
          }
        },
        "usage": {
          "prompt_tokens": 123,
          "completion_tokens": 456,
          "total_tokens": 579
        }
      }
    ],
    "first_id": "step_abc123",
    "last_id": "step_abc123",
    "has_more": false
  }
}
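
The query parameters for this operation, including include[], need URL encoding because of the brackets and asterisks in the only supported include value. The sketch below builds the request URL with the standard library; the function name run_steps_url and its keyword arguments are hypothetical, and endpoint is assumed to be the full protocol and hostname.

```python
from typing import Optional
from urllib.parse import urlencode

# The only include value the article lists as supported.
FILE_SEARCH_CONTENT = "step_details.tool_calls[*].file_search.results[*].content"


def run_steps_url(
    endpoint: str,
    thread_id: str,
    run_id: str,
    *,
    limit: Optional[int] = None,
    order: Optional[str] = None,
    after: Optional[str] = None,
    include_file_search_content: bool = False,
    api_version: str = "2025-04-01-preview",
) -> str:
    """Build the List Run Steps URL with validated, URL-encoded query parameters."""
    params = [("api-version", api_version)]
    if limit is not None:
        if not 1 <= limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        params.append(("limit", str(limit)))
    if order is not None:
        if order not in ("asc", "desc"):
            raise ValueError("order must be 'asc' or 'desc'")
        params.append(("order", order))
    if after is not None:
        params.append(("after", after))
    if include_file_search_content:
        params.append(("include[]", FILE_SEARCH_CONTENT))
    base = f"{endpoint.rstrip('/')}/openai/threads/{thread_id}/runs/{run_id}/steps"
    return f"{base}?{urlencode(params)}"
```

urlencode percent-encodes the brackets and asterisks, so the value survives transport and decodes back to the original string on the server side.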

Get - Run Step

GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2025-04-01-preview
Retrieves a run step.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
thread_idpathYesstringThe ID of the thread to which the run and run step belong.
run_idpathYesstringThe ID of the run to which the run step belongs.
step_idpathYesstringThe ID of the run step to retrieve.
include[]queryNoarrayA list of additional fields to include in the response. Currently the only supported value is step_details.tool_calls[*].file_search.results[*].content to fetch the file search result content.
api-versionqueryYesstring

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonrunStepObject

Examples

Example

Retrieves a run step.
GET https://{endpoint}/openai/threads/{thread_id}/runs/{run_id}/steps/{step_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "step_abc123",
    "object": "thread.run.step",
    "created_at": 1699063291,
    "run_id": "run_abc123",
    "assistant_id": "asst_abc123",
    "thread_id": "thread_abc123",
    "type": "message_creation",
    "status": "completed",
    "cancelled_at": null,
    "completed_at": 1699063291,
    "expired_at": null,
    "failed_at": null,
    "last_error": null,
    "step_details": {
      "type": "message_creation",
      "message_creation": {
        "message_id": "msg_abc123"
      }
    },
    "usage": {
      "prompt_tokens": 123,
      "completion_tokens": 456,
      "total_tokens": 579
    }
  }
}
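
A run step of type message_creation points at the message it produced. As a small illustration based on the response shape shown above, the following sketch extracts that message ID; the function name message_id_from_step is hypothetical, and steps of any other type yield None.

```python
from typing import Optional


def message_id_from_step(step: dict) -> Optional[str]:
    """Return the ID of the message a run step created, or None for other steps."""
    details = step.get("step_details") or {}
    if details.get("type") != "message_creation":
        return None
    return details["message_creation"]["message_id"]
```

The function accepts the run step object itself, that is, the content of body in the example response.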

List - Vector Stores

GET https://{endpoint}/openai/vector_stores?api-version=2025-04-01-preview
Returns a list of vector stores.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
limitqueryNointegerA limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
orderqueryNostring
Possible values: asc, desc
Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
afterqueryNostringA cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
beforequeryNostringA cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
api-versionqueryYesstringapi version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonlistVectorStoresResponse

Examples

Example

Returns a list of vector stores.
GET https://{endpoint}/openai/vector_stores?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "vs_abc123",
        "object": "vector_store",
        "created_at": 1699061776,
        "name": "Support FAQ",
        "bytes": 139920,
        "file_counts": {
          "in_progress": 0,
          "completed": 3,
          "failed": 0,
          "cancelled": 0,
          "total": 3
        }
      },
      {
        "id": "vs_abc456",
        "object": "vector_store",
        "created_at": 1699061776,
        "name": "Support FAQ v2",
        "bytes": 139920,
        "file_counts": {
          "in_progress": 0,
          "completed": 3,
          "failed": 0,
          "cancelled": 0,
          "total": 3
        }
      }
    ],
    "first_id": "vs_abc123",
    "last_id": "vs_abc456",
    "has_more": false
  }
}
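
The file_counts object in each list entry shows how far file processing has progressed. As an illustration, the sketch below returns the IDs of vector stores whose files are not all completed; the function name pending_vector_stores is hypothetical, and treating completed below total as "pending" is an interpretation of the fields shown above.

```python
def pending_vector_stores(list_response: dict) -> list[str]:
    """Return IDs of vector stores that still have files not marked completed."""
    pending = []
    for store in list_response["data"]:
        counts = store["file_counts"]
        # Files that are in progress, failed, or cancelled are not completed.
        if counts["completed"] < counts["total"]:
            pending.append(store["id"])
    return pending
```

For the example response above, both stores report 3 of 3 files completed, so the result would be an empty list.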

Create - Vector Store

POST https://{endpoint}/openai/vector_stores?api-version=2025-04-01-preview
Create a vector store.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
api-versionqueryYesstringapi version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended because it is more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
chunking_strategy | autoChunkingStrategyRequestParam or staticChunkingStrategyRequestParam | The chunking strategy used to chunk the files. If not set, the auto strategy is used. Only applicable if file_ids is non-empty. | No |
expires_after | vectorStoreExpirationAfter | The expiration policy for a vector store. | No |
file_ids | array | A list of file IDs that the vector store should use. Useful for tools like file_search that can access files. | No |
metadata | object | Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
name | string | The name of the vector store. | No |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreObject |

Examples

Example

Creates a vector store.
POST https://{endpoint}/openai/vector_stores?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "vs_abc123",
    "object": "vector_store",
    "created_at": 1699061776,
    "name": "Support FAQ",
    "bytes": 139920,
    "file_counts": {
      "in_progress": 0,
      "completed": 3,
      "failed": 0,
      "cancelled": 0,
      "total": 3
    }
  }
}
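
The example above does not show a request body. As a hedged sketch, the body could be assembled from the documented fields as follows, with the metadata limits (16 pairs, 64-character keys, 512-character values) checked client-side. The helper names and the sample metadata are illustrative assumptions.

```python
import json


def validate_metadata(metadata):
    """Check the documented limits: 16 pairs, 64-char keys, 512-char values."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key longer than 64 characters: {key[:20]}...")
        if len(str(value)) > 512:
            raise ValueError(f"metadata value too long for key {key!r}")
    return metadata


def create_vector_store_body(name=None, file_ids=None, metadata=None):
    """Assemble the JSON body, omitting optional fields that were not set."""
    body = {}
    if name is not None:
        body["name"] = name
    if file_ids:
        body["file_ids"] = list(file_ids)
    if metadata:
        body["metadata"] = validate_metadata(metadata)
    return body


body = create_vector_store_body(
    name="Support FAQ",
    file_ids=["file-abc123"],
    metadata={"team": "support"},  # illustrative key and value
)
print(json.dumps(body, indent=2))
```

All fields are optional per the table, so an empty body is also a valid result of this helper.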

Get - Vector Store

GET https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview
Retrieves a vector store.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store to retrieve.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreObject |

Examples

Example

Retrieves a vector store.
GET https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "vs_abc123",
    "object": "vector_store",
    "created_at": 1699061776
  }
}

Modify - Vector Store

POST https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview
Modifies a vector store.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store to modify.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
expires_after | vectorStoreExpirationAfter | The expiration policy for a vector store. | No |
metadata | object | Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | No |
name | string | The name of the vector store. | No |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreObject |

Examples

Example

Modifies a vector store.
POST https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview

{
 "name": "Support FAQ"
}

Responses: Status Code: 200
{
  "body": {
    "id": "vs_abc123",
    "object": "vector_store",
    "created_at": 1699061776,
    "name": "Support FAQ",
    "bytes": 139920,
    "file_counts": {
      "in_progress": 0,
      "completed": 3,
      "failed": 0,
      "cancelled": 0,
      "total": 3
    }
  }
}

Delete - Vector Store

DELETE https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview
Delete a vector store.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store to delete.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | deleteVectorStoreResponse |

Examples

Example

Deletes a vector store.
DELETE https://{endpoint}/openai/vector_stores/{vector_store_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "vs_abc123",
    "object": "vector_store.deleted",
    "deleted": true
  }
}

List - Vector Store Files

GET https://{endpoint}/openai/vector_stores/{vector_store_id}/files?api-version=2025-04-01-preview
Returns a list of vector store files.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the files belong to.
limit | query | No | integer | A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
order | query | No | string | Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order. Possible values: asc, desc
after | query | No | string | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
before | query | No | string | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
filter | query | No | string | Filter by file status. Possible values: in_progress, completed, failed, cancelled
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | listVectorStoreFilesResponse |

Examples

Example

Returns a list of vector store files.
GET https://{endpoint}/openai/vector_stores/{vector_store_id}/files?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "file-abc123",
        "object": "vector_store.file",
        "created_at": 1699061776,
        "vector_store_id": "vs_abc123"
      },
      {
        "id": "file-abc456",
        "object": "vector_store.file",
        "created_at": 1699061776,
        "vector_store_id": "vs_abc123"
      }
    ],
    "first_id": "file-abc123",
    "last_id": "file-abc456",
    "has_more": false
  }
}
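
The `after` cursor, together with the `last_id` and `has_more` fields of the response, supports forward pagination. The following is a minimal sketch of that loop; `fetch_page` is a hypothetical callable standing in for the HTTP GET, and the canned pages are illustrative data, not real service output.

```python
def iter_pages(fetch_page):
    """Yield every item across pages, following the `after` cursor.

    `fetch_page(after)` is a hypothetical callable that performs the GET
    request with the given cursor and returns the decoded list object.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page.get("has_more"):
            break
        # The last object ID of this page becomes the cursor for the next.
        after = page["last_id"]


# Stand-in for the HTTP call: two canned pages keyed by cursor value.
PAGES = {
    None: {
        "data": [{"id": "file-abc123"}],
        "last_id": "file-abc123",
        "has_more": True,
    },
    "file-abc123": {
        "data": [{"id": "file-abc456"}],
        "last_id": "file-abc456",
        "has_more": False,
    },
}

ids = [item["id"] for item in iter_pages(PAGES.__getitem__)]
print(ids)
```

Writing the loop as a generator keeps only one page in memory at a time and lets the caller stop early.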

Create - Vector Store File

POST https://{endpoint}/openai/vector_stores/{vector_store_id}/files?api-version=2025-04-01-preview
Create a vector store file by attaching a File to a vector store.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store for which to create a File.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
chunking_strategy | chunkingStrategyRequestParam | The chunking strategy used to chunk the files. If not set, the auto strategy is used. | No |
file_id | string | A File ID that the vector store should use. Useful for tools like file_search that can access files. | Yes |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileObject |

Examples

Example

Create a vector store file by attaching a File to a vector store.
POST https://{endpoint}/openai/vector_stores/{vector_store_id}/files?api-version=2025-04-01-preview

{
 "file_id": "file-abc123"
}

Responses: Status Code: 200
{
  "body": {
    "id": "file-abc123",
    "object": "vector_store.file",
    "created_at": 1699061776,
    "usage_bytes": 1234,
    "vector_store_id": "vs_abcd",
    "status": "completed",
    "last_error": null
  }
}
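
As a rough sketch, the POST request and a check of the returned object might look like this in Python. It assumes `{endpoint}` is the resource hostname and uses API key authentication with a placeholder key; `attach_file_request` and `check_file` are illustrative helper names, and the request is built but not sent.

```python
import json
import urllib.request


def attach_file_request(endpoint, vector_store_id, file_id, api_key):
    """Build (but do not send) the POST that attaches a file to a vector store."""
    url = (
        f"https://{endpoint}/openai/vector_stores/{vector_store_id}/files"
        "?api-version=2025-04-01-preview"
    )
    data = json.dumps({"file_id": file_id}).encode("utf-8")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def check_file(file_obj):
    """Return the file status, raising if the object reports an error."""
    if file_obj.get("last_error") is not None:
        raise RuntimeError(f"vector store file failed: {file_obj['last_error']!r}")
    return file_obj["status"]


req = attach_file_request(
    "aoairesource.openai.azure.com",  # placeholder resource hostname
    "vs_abcd",
    "file-abc123",
    "YOUR_API_KEY",  # placeholder key
)
print(req.get_method(), req.full_url)

# Response fields copied from the example above.
print(check_file({"id": "file-abc123", "status": "completed", "last_error": None}))
```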

Get - Vector Store File

GET https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}?api-version=2025-04-01-preview
Retrieves a vector store file.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the file belongs to.
file_id | path | Yes | string | The ID of the file being retrieved.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileObject |

Examples

Example

Retrieves a vector store file.
GET https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "file-abc123",
    "object": "vector_store.file",
    "created_at": 1699061776,
    "vector_store_id": "vs_abcd",
    "status": "completed",
    "last_error": null
  }
}

Delete - Vector Store File

DELETE https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}?api-version=2025-04-01-preview
Delete a vector store file. This will remove the file from the vector store but the file itself won’t be deleted. To delete the file, use the delete file endpoint.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the file belongs to.
file_id | path | Yes | string | The ID of the file to delete.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | deleteVectorStoreFileResponse |

Examples

Example

Delete a vector store file. This will remove the file from the vector store but the file itself won’t be deleted. To delete the file, use the delete file endpoint.
DELETE https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "file-abc123",
    "object": "vector_store.file.deleted",
    "deleted": true
  }
}

Update - Vector Store File Attributes

POST https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}?api-version=2025-04-01-preview
Update attributes on a vector store file.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store the file belongs to.
file_id | path | Yes | string | The ID of the file whose attributes are updated.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
attributes | VectorStoreFileAttributes | Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers. | Yes |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileObject |
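
No example is given for this operation. As a hedged sketch, the attribute constraints described in the request body table could be checked client-side before sending; `attributes_body` is an illustrative helper name and the sample attributes are made up.

```python
def attributes_body(attributes):
    """Validate attributes against the documented limits and wrap them in a body.

    Limits taken from the table above: at most 16 pairs, string keys of up
    to 64 characters, values that are strings (up to 512 characters),
    booleans, or numbers.
    """
    if len(attributes) > 16:
        raise ValueError("at most 16 attributes are allowed")
    for key, value in attributes.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"invalid attribute key: {key!r}")
        if isinstance(value, (bool, int, float)):
            continue
        if isinstance(value, str) and len(value) <= 512:
            continue
        raise ValueError(f"invalid value for attribute {key!r}")
    return {"attributes": dict(attributes)}


# Illustrative attribute names and values.
body = attributes_body({"region": "emea", "year": 2024, "reviewed": True})
print(body)
```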

Retrieve vector store file content

GET https://{endpoint}/openai/vector_stores/{vector_store_id}/files/{file_id}/content?api-version=2025-04-01-preview
Retrieve the parsed contents of a vector store file.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store.
file_id | path | Yes | string | The ID of the file within the vector store.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | VectorStoreFileContentResponse |

Search vector store

POST https://{endpoint}/openai/vector_stores/{vector_store_id}/search?api-version=2025-04-01-preview
Search a vector store for relevant chunks based on a query and file attributes filter.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store to search.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
filters | ComparisonFilter or CompoundFilter | A filter to apply based on file attributes. | No |
max_num_results | integer | The maximum number of results to return. This number should be between 1 and 50 inclusive. | No | 10
query | string or array | A query string for a search. | Yes |
ranking_options | object | Ranking options for search. | No |
└─ ranker | enum | Possible values: auto, default-2024-11-15 | No |
└─ score_threshold | number | | No | 0
rewrite_query | boolean | Whether to rewrite the natural language query for vector search. | No | False

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | VectorStoreSearchResultsPage |
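
No example is given for this operation. The sketch below shows one way to assemble a search request body from the documented fields, enforcing the 1 to 50 range for `max_num_results` and the listed `ranker` values. `search_body` is an illustrative helper; `filters` is passed through unchanged because the shape of ComparisonFilter and CompoundFilter is not described in this section.

```python
RANKERS = ("auto", "default-2024-11-15")


def search_body(query, max_num_results=10, rewrite_query=False,
                ranker=None, score_threshold=None, filters=None):
    """Assemble the search request body from the documented fields."""
    if not 1 <= max_num_results <= 50:
        raise ValueError("max_num_results must be between 1 and 50 inclusive")
    body = {
        "query": query,
        "max_num_results": max_num_results,
        "rewrite_query": rewrite_query,
    }
    ranking_options = {}
    if ranker is not None:
        if ranker not in RANKERS:
            raise ValueError(f"ranker must be one of {RANKERS}")
        ranking_options["ranker"] = ranker
    if score_threshold is not None:
        ranking_options["score_threshold"] = score_threshold
    if ranking_options:
        body["ranking_options"] = ranking_options
    if filters is not None:
        body["filters"] = filters  # passed through as-is
    return body


body = search_body("return policy", max_num_results=5,
                   ranker="auto", score_threshold=0.5)
print(body)
```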

Create - Vector Store File Batch

POST https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches?api-version=2025-04-01-preview
Create a vector store file batch.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store for which to create a File Batch.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Request Body

Content-Type: application/json
Name | Type | Description | Required | Default
chunking_strategy | chunkingStrategyRequestParam | The chunking strategy used to chunk the files. If not set, the auto strategy is used. | No |
file_ids | array | A list of File IDs that the vector store should use. Useful for tools like file_search that can access files. | Yes |

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileBatchObject |

Examples

Example

Create a vector store file batch.
POST https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches?api-version=2025-04-01-preview

{
 "file_ids": [
  "file-abc123",
  "file-abc456"
 ]
}

Responses: Status Code: 200
{
  "id": "vsfb_abc123",
  "object": "vector_store.file_batch",
  "created_at": 1699061776,
  "vector_store_id": "vs_abc123",
  "status": "in_progress",
  "file_counts": {
    "in_progress": 1,
    "completed": 1,
    "failed": 0,
    "cancelled": 0,
    "total": 0
  }
}

Get - Vector Store File Batch

GET https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}?api-version=2025-04-01-preview
Retrieves a vector store file batch.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the file batch belongs to.
batch_id | path | Yes | string | The ID of the file batch being retrieved.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileBatchObject |

Examples

Example

Retrieves a vector store file batch.
GET https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "vsfb_abc123",
    "object": "vector_store.file_batch",
    "created_at": 1699061776,
    "vector_store_id": "vs_abc123",
    "status": "in_progress",
    "file_counts": {
      "in_progress": 1,
      "completed": 1,
      "failed": 0,
      "cancelled": 0,
      "total": 0
    }
  }
}
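
Because a batch is processed asynchronously, a client typically polls this endpoint until the batch leaves a pending state. The following is a minimal sketch of such a loop. It assumes that `in_progress` and `cancelling` (the two statuses shown in this section's examples) are the pending states and that any other status is final; `fetch_batch` is a hypothetical callable standing in for the GET request.

```python
import time

# Assumption: these statuses mean the batch is still being processed.
PENDING_STATUSES = {"in_progress", "cancelling"}


def wait_for_batch(fetch_batch, poll_seconds=1.0, max_polls=60):
    """Poll until the batch status is no longer pending, then return it.

    `fetch_batch()` is a hypothetical callable that performs the GET
    request and returns the decoded vector store file batch object.
    """
    for _ in range(max_polls):
        batch = fetch_batch()
        if batch["status"] not in PENDING_STATUSES:
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError("batch still pending after max_polls attempts")


# Stand-in for the HTTP call: canned responses returned in order.
responses = iter([
    {"status": "in_progress", "file_counts": {"in_progress": 1, "completed": 1}},
    {"status": "completed", "file_counts": {"in_progress": 0, "completed": 2}},
])
final = wait_for_batch(lambda: next(responses), poll_seconds=0)
print(final["status"], final["file_counts"])
```

Bounding the loop with `max_polls` avoids waiting forever if the batch never reaches a final status.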

Cancel - Vector Store File Batch

POST https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}/cancel?api-version=2025-04-01-preview
Cancel a vector store file batch. This attempts to cancel the processing of files in this batch as soon as possible.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the file batch belongs to.
batch_id | path | Yes | string | The ID of the file batch to cancel.
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | vectorStoreFileBatchObject |

Examples

Example

Cancel a vector store file batch. This attempts to cancel the processing of files in this batch as soon as possible.
POST https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}/cancel?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "id": "vsfb_abc123",
    "object": "vector_store.file_batch",
    "created_at": 1699061776,
    "vector_store_id": "vs_abc123",
    "status": "cancelling",
    "file_counts": {
      "in_progress": 12,
      "completed": 3,
      "failed": 0,
      "cancelled": 0,
      "total": 15
    }
  }
}

List - Vector Store File Batch Files

GET https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}/files?api-version=2025-04-01-preview
Returns a list of vector store files in a batch.

URI Parameters

Name | In | Required | Type | Description
endpoint | path | Yes | string url | Supported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
vector_store_id | path | Yes | string | The ID of the vector store that the files belong to.
batch_id | path | Yes | string | The ID of the file batch that the files belong to.
limit | query | No | integer | A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
order | query | No | string | Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order. Possible values: asc, desc
after | query | No | string | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
before | query | No | string | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
filter | query | No | string | Filter by file status. Possible values: in_progress, completed, failed, cancelled
api-version | query | Yes | string | api version

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
Name | Required | Type | Description
Authorization | True | string | Example: Authorization: Bearer {Azure_OpenAI_Auth_Token}. To generate an auth token using Azure CLI, run: az account get-access-token --resource https://cognitiveservices.azure.com (Type: oauth2; Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize; scope: https://ai.azure.com/.default)
api-key | True | string | Provide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-Type | Type | Description
application/json | listVectorStoreFilesResponse |

Examples

Example

Returns a list of vector store files in a batch.
GET https://{endpoint}/openai/vector_stores/{vector_store_id}/file_batches/{batch_id}/files?api-version=2025-04-01-preview

Responses: Status Code: 200
{
  "body": {
    "object": "list",
    "data": [
      {
        "id": "file-abc123",
        "object": "vector_store.file",
        "created_at": 1699061776,
        "vector_store_id": "vs_abc123"
      },
      {
        "id": "file-abc456",
        "object": "vector_store.file",
        "created_at": 1699061776,
        "vector_store_id": "vs_abc123"
      }
    ],
    "first_id": "file-abc123",
    "last_id": "file-abc456",
    "has_more": false
  }
}
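
The optional query parameters of this operation (`limit`, `order`, `after`, `before`, `filter`) can be combined in one URL. The following is a small sketch of a URL builder that validates the documented ranges and values; `batch_files_url` is an illustrative helper, `{endpoint}` is assumed to be the resource hostname, and the IDs are placeholders.

```python
from urllib.parse import urlencode

FILE_STATUSES = {"in_progress", "completed", "failed", "cancelled"}


def batch_files_url(endpoint, vector_store_id, batch_id, limit=20,
                    order=None, after=None, before=None, status_filter=None):
    """Build the list URL, validating the documented parameter constraints."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order is not None and order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    if status_filter is not None and status_filter not in FILE_STATUSES:
        raise ValueError(f"filter must be one of {sorted(FILE_STATUSES)}")
    params = {"limit": limit}
    optional = (("order", order), ("after", after),
                ("before", before), ("filter", status_filter))
    for name, value in optional:
        if value is not None:
            params[name] = value
    params["api-version"] = "2025-04-01-preview"
    return (
        f"https://{endpoint}/openai/vector_stores/{vector_store_id}"
        f"/file_batches/{batch_id}/files?{urlencode(params)}"
    )


url = batch_files_url(
    "aoairesource.openai.azure.com",  # placeholder resource hostname
    "vs_abc123", "vsfb_abc123",
    limit=50, order="desc", status_filter="failed",
)
print(url)
```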

Create - Realtime Session

POST https://{endpoint}/openai/realtimeapi/sessions?api-version=2025-04-01-preview
Create an ephemeral API token for use in client-side applications with the Realtime API. Can be configured with the same session parameters as the session.update client event. It responds with a session object, plus a client_secret key which contains a usable ephemeral API token that can be used to authenticate browser clients for the Realtime API.

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
input_audio_formatenumThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.

Possible values: pcm16, g711_ulaw, g711_alaw
No
input_audio_noise_reductionobjectConfiguration for input audio noise reduction. This can be set to null to turn off.
Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
No
└─ typeenumType of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

Possible values: near_field, far_field
No
input_audio_transcriptionobjectConfiguration for input audio transcription, defaults to off and can be set to null to turn off once on. Input audio transcription isn’t native to the model, since the model consumes audio directly. Transcription runs asynchronously through the Transcriptions endpoint and should be treated as guidance of input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription, these offer additional guidance to the transcription service.
No
└─ languagestringThe language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format improves accuracy and latency.
No
└─ modelstringThe model to use for transcription, current options are gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, and whisper-1.
No
└─ promptstringAn optional text to guide the model’s style or continue a previous audio segment.
For whisper-1, the prompt is a list of keywords.
For gpt-4o-transcribe models, the prompt is a free text string, for example “expect words related to technology”.
No
instructionsstringThe default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. “be extremely succinct”, “act friendly”, “here are examples of good responses”) and on audio behavior (e.g. “talk quickly”, “inject emotion into your voice”, “laugh frequently”). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

Note that the server sets default instructions which will be used if this field isn’t set and are visible in the session.created event at the start of the session.
No
max_response_output_tokensinteger or stringMaximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or inf for the maximum available tokens for a given model. Defaults to inf.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to [“text”].
No
modelstringThe name of the deployment used for this session.
No
output_audio_formatenumThe format of output audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, output audio is sampled at a rate of 24kHz.

Possible values: pcm16, g711_ulaw, g711_alaw
No
temperaturenumberSampling temperature for the model, limited to [0.6, 1.2]. For audio models a temperature of 0.8 is highly recommended for best performance.
No0.8
tool_choicestringHow the model chooses tools. Options are auto, none, required, or specify a function.
Noauto
toolsarrayTools (functions) available to the model.No
turn_detectionobjectConfiguration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger a model response.
Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with uhhm, the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
No
└─ create_responsebooleanWhether or not to automatically generate a response when a VAD stop event occurs.
NoTrue
└─ eagernessenumUsed only for semantic_vad mode. The eagerness of the model to respond. low will wait longer for the user to continue speaking, high will respond more quickly. auto is the default and is equivalent to medium.

Possible values: low, medium, high, auto
No
└─ interrupt_responsebooleanWhether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs.
NoTrue
└─ prefix_padding_msintegerUsed only for server_vad mode. Amount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerUsed only for server_vad mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberUsed only for server_vad mode. Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typeenumType of turn detection.

Possible values: server_vad, semantic_vad
No
voiceVoiceIdsSharedNo

Responses

Status Code: 200 Description: Session created successfully.
Content-TypeTypeDescription
application/jsonRealtimeSessionCreateResponse
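
The turn detection and sampling fields above can be combined into a session request body. The following Python sketch is illustrative only: the helper names `build_turn_detection` and `check_sampling` are hypothetical, while the field names, defaults, and ranges are taken from the table above. It builds JSON locally and doesn't call the service.

```python
import json


def build_turn_detection(mode="server_vad", **overrides):
    """Build a turn_detection object, starting from the documented defaults."""
    if mode == "server_vad":
        config = {
            "type": "server_vad",
            "threshold": 0.5,            # documented default
            "prefix_padding_ms": 300,    # documented default
            "silence_duration_ms": 500,  # documented default
        }
    elif mode == "semantic_vad":
        config = {"type": "semantic_vad", "eagerness": "auto"}
    else:
        raise ValueError("mode must be 'server_vad' or 'semantic_vad'")
    config.update(overrides)
    return config


def check_sampling(temperature, max_response_output_tokens):
    """Reject values outside the ranges stated in the table."""
    if not 0.6 <= temperature <= 1.2:
        raise ValueError("temperature is limited to [0.6, 1.2]")
    if max_response_output_tokens != "inf" and not (
        isinstance(max_response_output_tokens, int)
        and 1 <= max_response_output_tokens <= 4096
    ):
        raise ValueError("use an integer between 1 and 4096, or 'inf'")


check_sampling(0.8, "inf")
session_body = {
    "modalities": ["text"],  # disables audio output
    "temperature": 0.8,
    "max_response_output_tokens": "inf",
    "turn_detection": build_turn_detection("semantic_vad", eagerness="low"),
}
print(json.dumps(session_body, indent=2))
```

Setting `eagerness` to `low` here asks semantic VAD to wait longer before treating a pause as the end of the user's turn.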

Create - Transcription realtime session

POST https://{endpoint}/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview
Create an ephemeral API token for use in client-side applications with the Realtime API specifically for realtime transcriptions. Can be configured with the same session parameters as the transcription_session.update client event. It responds with a session object, plus a client_secret key which contains a usable ephemeral API token that can be used to authenticate browser clients for the Realtime API.

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
includearrayThe set of items to include in the transcription. Current available items are:
- item.input_audio_transcription.logprobs
No
input_audio_formatenumThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.

Possible values: pcm16, g711_ulaw, g711_alaw
No
input_audio_noise_reductionobjectConfiguration for input audio noise reduction. This can be set to null to turn off.
Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
No
└─ typeenumType of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

Possible values: near_field, far_field
No
input_audio_transcriptionobjectConfiguration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
No
└─ languagestringThe language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format improves accuracy and latency.
No
└─ modelenumThe model to use for transcription, current options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, and whisper-1.

Possible values: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1
No
└─ promptstringAn optional text to guide the model’s style or continue a previous audio segment.
For whisper-1, the prompt is a list of keywords.
For gpt-4o-transcribe models, the prompt is a free text string, for example “expect words related to technology”.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to [“text”].
No
turn_detectionobjectConfiguration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger a model response.
Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with uhhm, the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
No
└─ create_responsebooleanWhether or not to automatically generate a response when a VAD stop event occurs. Not available for transcription sessions.
NoTrue
└─ eagernessenumUsed only for semantic_vad mode. The eagerness of the model to respond. low will wait longer for the user to continue speaking, high will respond more quickly. auto is the default and is equivalent to medium.

Possible values: low, medium, high, auto
No
└─ interrupt_responsebooleanWhether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs. Not available for transcription sessions.
NoTrue
└─ prefix_padding_msintegerUsed only for server_vad mode. Amount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerUsed only for server_vad mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberUsed only for server_vad mode. Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typeenumType of turn detection.

Possible values: server_vad, semantic_vad
No

Responses

Status Code: 200 Description: Session created successfully.
Content-TypeTypeDescription
application/jsonRealtimeTranscriptionSessionCreateResponse
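
As a rough sketch of how the request described above might be assembled, the Python below builds (but doesn't send) a POST request for a transcription session. The function name `build_transcription_session_request` and the placeholder resource name and key are assumptions for illustration; the URL path, `api-version`, and body fields come from this section, and the `api-key` header follows the Authentication section.

```python
import json
import urllib.request


def build_transcription_session_request(endpoint, api_key,
                                        api_version="2025-04-01-preview"):
    """Assemble the POST request without sending it."""
    url = (
        f"https://{endpoint}/openai/realtimeapi/transcription_sessions"
        f"?api-version={api_version}"
    )
    body = {
        "input_audio_format": "pcm16",
        "input_audio_noise_reduction": {"type": "near_field"},
        "input_audio_transcription": {
            "model": "gpt-4o-transcribe",
            "language": "en",  # ISO-639-1 code
            "prompt": "expect words related to technology",
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        },
        "include": ["item.input_audio_transcription.logprobs"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace with your own resource name and key.
request = build_transcription_session_request(
    "YOUR_RESOURCE_NAME.openai.azure.com", "YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response is expected to include the session object and the `client_secret` described above.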

Responses

POST https://{endpoint}/openai/responses?api-version=2025-04-01-preview
Creates a model response.

Request Body

Content-Type: application/json
NameTypeDescriptionRequiredDefault
includearrayNo
inputstring or arrayModel inputsYes
instructionsstringInserts a system (or developer) message as the first item in the model’s context.

When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
No
max_output_tokensintegerAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and conversation state.
No
parallel_tool_callsbooleanWhether to allow the model to run tool calls in parallel.
NoTrue
previous_response_idstringThe unique ID of the previous response to the model. Use this to create multi-turn conversations. Learn more about conversation state.
No
reasoningReasoningConfiguration options for reasoning models.No
storebooleanWhether to store the generated model response for later retrieval via API.
NoTrue
streambooleanIf set to true, the model response data will be streamed to the client as it is generated using server-sent events.
See the Streaming section below for more information.
NoFalse
textobjectConfiguration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
- Text inputs and outputs
- Structured Outputs
No
└─ formatTextResponseFormatConfigurationAn object specifying the format that the model must output.

Configuring { "type": "json_schema" } enables Structured Outputs, which ensures the model matches your supplied JSON schema.

The default format is { "type": "text" } with no additional options.

Not recommended for gpt-4o and newer models:

Setting to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
No
tool_choiceToolChoiceOptions or ToolChoiceTypes or ToolChoiceFunctionHow the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.
No
toolsarrayAn array of tools the model may call while generating a response. You can specify which tool to use by setting the tool_choice parameter.

The two categories of tools you can provide the model are:

- Built-in tools: Tools that are provided by OpenAI that extend the model’s capabilities
No
truncationenumThe truncation strategy to use for the model response.
- auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.

Possible values: auto, disabled
No

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonresponse
text/event-streamresponseStreamEvent
Status Code: default Description: Service unavailable
Content-TypeTypeDescription
application/jsonerrorResponse
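
To illustrate how the request body fields above fit together, here is a minimal Python sketch that assembles a body for a follow-up turn. The helper name `build_response_body` and the ID `resp_123` are hypothetical; only fields listed in the table are used, and any other fields your deployment needs are omitted. The sketch builds the JSON locally and doesn't call the service.

```python
import json

ALLOWED_TRUNCATION = {"auto", "disabled"}


def build_response_body(user_input, instructions=None,
                        previous_response_id=None,
                        truncation="disabled", max_output_tokens=None):
    """Assemble a request body from the documented fields."""
    if truncation not in ALLOWED_TRUNCATION:
        raise ValueError("truncation must be 'auto' or 'disabled'")
    body = {
        "input": user_input,               # the only required field
        "truncation": truncation,
        "store": True,                     # documented default
        "stream": False,                   # documented default
        "text": {"format": {"type": "text"}},  # documented default format
    }
    # Instructions are not carried over from the previous response,
    # so they must be supplied again on every turn that needs them.
    if instructions is not None:
        body["instructions"] = instructions
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body


follow_up = build_response_body(
    "And what about tomorrow?",
    instructions="Be extremely succinct.",
    previous_response_id="resp_123",  # hypothetical ID from an earlier call
    truncation="auto",
)
print(json.dumps(follow_up, indent=2))
```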

Responses API - input items

GET https://{endpoint}/openai/responses/{response_id}?api-version=2025-04-01-preview
Retrieves a model response with the given ID.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
response_idpathYesstringThe ID of the response to retrieve.
includequeryNoarrayAdditional fields to include in the response. See the include parameter for Response creation above for more information.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonresponse
Status Code: default Description: Service unavailable
Content-TypeTypeDescription
application/jsonerrorResponse
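
The two header options above can be sketched in Python as follows. The helper names `auth_headers` and `response_request`, the response ID `resp_123`, and the placeholder credentials are assumptions for illustration; the header names and URL shape come from this section. The request is built but not sent.

```python
import urllib.request

API_VERSION = "2025-04-01-preview"


def auth_headers(api_key=None, entra_token=None):
    """Prefer a Microsoft Entra token; fall back to an API key."""
    if entra_token:
        return {"Authorization": f"Bearer {entra_token}"}
    if api_key:
        return {"api-key": api_key}
    raise ValueError("provide an API key or a Microsoft Entra token")


def response_request(endpoint, response_id, method="GET", **credentials):
    """Build a request against /openai/responses/{response_id}."""
    url = (
        f"https://{endpoint}/openai/responses/{response_id}"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url, headers=auth_headers(**credentials), method=method
    )


get_request = response_request(
    "YOUR_RESOURCE_NAME.openai.azure.com", "resp_123",
    entra_token="YOUR_AUTH_TOKEN",
)
print(get_request.get_method(), get_request.full_url)
```

Calling the same helper with `method="DELETE"` would produce a request for the delete operation, which uses the same path and headers.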

Delete response

DELETE https://{endpoint}/openai/responses/{response_id}?api-version=2025-04-01-preview
Deletes a model response with the given ID.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
response_idpathYesstringThe ID of the response to delete.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Status Code: 404 Description: Not Found
Content-TypeTypeDescription
application/jsonerror
Status Code: default Description: Service unavailable
Content-TypeTypeDescription
application/jsonerrorResponse

Responses API - response item list

GET https://{endpoint}/openai/responses/{response_id}/input_items?api-version=2025-04-01-preview
Returns a list of input items for a given response.

URI Parameters

NameInRequiredTypeDescription
endpointpathYesstring urlSupported Azure OpenAI endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace “aoairesource” with your Azure OpenAI resource name). https://{your-resource-name}.openai.azure.com
response_idpathYesstringThe ID of the response to retrieve input items for.
limitqueryNointegerA limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
orderqueryNostring
Possible values: asc, desc
The order to return the input items in. Default is asc.
- asc: Return the input items in ascending order.
- desc: Return the input items in descending order.
afterqueryNostringAn item ID to list items after, used in pagination.
beforequeryNostringAn item ID to list items before, used in pagination.

Request Header

Use either token-based authentication or an API key. Token-based authentication is recommended and more secure.
NameRequiredTypeDescription
AuthorizationTruestringExample: Authorization: Bearer {Azure_OpenAI_Auth_Token}

To generate an auth token using Azure CLI: az account get-access-token --resource https://cognitiveservices.azure.com

Type: oauth2
Authorization Url: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
scope: https://ai.azure.com/.default
api-keyTruestringProvide Azure OpenAI API key here

Responses

Status Code: 200 Description: OK
Content-TypeTypeDescription
application/jsonresponseItemList
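
The pagination parameters above can be turned into a query string as in the following Python sketch. The function name `input_items_url` and the IDs `resp_123` and `item_9` are hypothetical; the parameter names, limits, and defaults come from the URI parameters table.

```python
from urllib.parse import urlencode


def input_items_url(endpoint, response_id, limit=20, order="asc",
                    after=None, before=None,
                    api_version="2025-04-01-preview"):
    """Build the input_items URL, enforcing the documented ranges."""
    if not 1 <= limit <= 100:
        raise ValueError("limit can range between 1 and 100")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    query = {"api-version": api_version, "limit": limit, "order": order}
    if after is not None:
        query["after"] = after    # list items after this item ID
    if before is not None:
        query["before"] = before  # list items before this item ID
    return (
        f"https://{endpoint}/openai/responses/{response_id}/input_items"
        f"?{urlencode(query)}"
    )


url = input_items_url(
    "example.openai.azure.com", "resp_123",
    limit=50, order="desc", after="item_9",
)
print(url)
```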

Components

errorResponse

NameTypeDescriptionRequiredDefault
errorerrorNo

errorBase

NameTypeDescriptionRequiredDefault
codestringNo
messagestringNo

error

NameTypeDescriptionRequiredDefault
inner_errorinnerErrorInner error with additional details.No
paramstringNo
typestringNo

innerError

Inner error with additional details.
NameTypeDescriptionRequiredDefault
codeinnerErrorCodeError codes for the inner error object.No
content_filter_resultscontentFilterPromptResultsInformation about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about jailbreak content and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.No

innerErrorCode

Error codes for the inner error object.
PropertyValue
DescriptionError codes for the inner error object.
Typestring
ValuesResponsibleAIPolicyViolation

dalleErrorResponse

NameTypeDescriptionRequiredDefault
errordalleErrorNo

dalleError

NameTypeDescriptionRequiredDefault
inner_errordalleInnerErrorInner error with additional details.No
paramstringNo
typestringNo

dalleInnerError

Inner error with additional details.
NameTypeDescriptionRequiredDefault
codeinnerErrorCodeError codes for the inner error object.No
content_filter_resultsdalleFilterResultsInformation about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about jailbreak content and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.No
revised_promptstringThe prompt that was used to generate the image, if there was any revision to the prompt.No

contentFilterCompletionTextSpan

Describes a span within generated completion text. Offset 0 is the first UTF32 code point of the completion text.
NameTypeDescriptionRequiredDefault
completion_end_offsetintegerOffset of the first UTF32 code point which is excluded from the span. This field is always equal to completion_start_offset for empty spans. This field is always larger than completion_start_offset for non-empty spans.Yes
completion_start_offsetintegerOffset of the UTF32 code point which begins the span.Yes
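
Because the offsets count UTF32 code points and the end offset is exclusive, a span maps directly onto a Python string slice (Python strings are indexed by code point). The sketch below is illustrative; the helper name `extract_span` and the sample text are assumptions.

```python
def extract_span(completion_text, start, end):
    """Return the text covered by [start, end) in code point offsets."""
    if not 0 <= start <= end <= len(completion_text):
        raise ValueError("offsets must satisfy 0 <= start <= end <= length")
    return completion_text[start:end]


completion = "Hi 😀 there"
emoji = extract_span(completion, 3, 4)   # non-empty span: end > start
empty = extract_span(completion, 5, 5)   # empty span: end == start

# The same text measured in UTF-16 code units is longer, because the
# emoji needs a surrogate pair. Offsets would differ in such runtimes.
utf16_units = len(completion.encode("utf-16-le")) // 2
print(repr(emoji), repr(empty), len(completion), utf16_units)
```

In languages whose strings are indexed by UTF-16 code units, the offsets would need converting before slicing.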

contentFilterResultBase

NameTypeDescriptionRequiredDefault
filteredbooleanYes

contentFilterSeverityResult

NameTypeDescriptionRequiredDefault
filteredbooleanYes
severitystringNo

contentFilterDetectedResult

NameTypeDescriptionRequiredDefault
detectedbooleanNo
filteredbooleanYes

contentFilterDetectedWithCitationResult

NameTypeDescriptionRequiredDefault
citationobjectNo
└─ URLstringNo
└─ licensestringNo

contentFilterDetectedWithCompletionTextSpansResult

NameTypeDescriptionRequiredDefault
detailsarrayNo

contentFilterIdResult

NameTypeDescriptionRequiredDefault
filteredbooleanYes
idstringNo

contentFilterResultsBase

Information about the content filtering results.
NameTypeDescriptionRequiredDefault
custom_blocklistscontentFilterDetailedResultsContent filtering results with a detail of content filter ids for the filtered segments.No
errorerrorBaseNo
hatecontentFilterSeverityResultNo
profanitycontentFilterDetectedResultNo
self_harmcontentFilterSeverityResultNo
sexualcontentFilterSeverityResultNo
violencecontentFilterSeverityResultNo

contentFilterPromptResults

Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about jailbreak content and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.
NameTypeDescriptionRequiredDefault
custom_blocklistscontentFilterDetailedResultsContent filtering results with a detail of content filter ids for the filtered segments.No
errorerrorBaseNo
hatecontentFilterSeverityResultNo
indirect_attackcontentFilterDetectedResultNo
jailbreakcontentFilterDetectedResultNo
profanitycontentFilterDetectedResultNo
self_harmcontentFilterSeverityResultNo
sexualcontentFilterSeverityResultNo
violencecontentFilterSeverityResultNo

contentFilterChoiceResults

Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about third party text and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.
NameTypeDescriptionRequiredDefault
custom_blocklistscontentFilterDetailedResultsContent filtering results with a detail of content filter ids for the filtered segments.No
errorerrorBaseNo
hatecontentFilterSeverityResultNo
profanitycontentFilterDetectedResultNo
protected_material_codecontentFilterDetectedWithCitationResultNo
protected_material_textcontentFilterDetectedResultNo
self_harmcontentFilterSeverityResultNo
sexualcontentFilterSeverityResultNo
ungrounded_materialcontentFilterDetectedWithCompletionTextSpansResultNo
violencecontentFilterSeverityResultNo

contentFilterDetailedResults

Content filtering results with a detail of content filter ids for the filtered segments.
NameTypeDescriptionRequiredDefault
detailsarrayNo
filteredbooleanYes

promptFilterResult

Content filtering results for a single prompt in the request.
NameTypeDescriptionRequiredDefault
content_filter_resultscontentFilterPromptResultsInformation about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about jailbreak content and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.No
prompt_indexintegerNo

promptFilterResults

Content filtering results for zero or more prompts in the request. In a streaming request, results for different prompts may arrive at different times or in different orders. No properties defined for this component.

dalleContentFilterResults

Information about the content filtering results.

dalleFilterResults

Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, or high, a scale that determines the intensity and risk level of harmful content) and whether it has been filtered. Information about jailbreak content and profanity, if it has been detected, and whether it has been filtered. Also information about the customer block list, if it has been filtered, and its ID.
NameTypeDescriptionRequiredDefault
custom_blocklistscontentFilterDetailedResultsContent filtering results with a detail of content filter ids for the filtered segments.No
hatecontentFilterSeverityResultNo
jailbreakcontentFilterDetectedResultNo
profanitycontentFilterDetectedResultNo
self_harmcontentFilterSeverityResultNo
sexualcontentFilterSeverityResultNo
violencecontentFilterSeverityResultNo

chatCompletionsRequestCommon

NameTypeDescriptionRequiredDefault
frequency_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.No0
logit_biasobjectModify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.No
max_completion_tokensintegerAn upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.No
max_tokensintegerThe maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens). This isn’t compatible with o1 series models.No4096
metadataobjectDeveloper-defined tags and values used for filtering completions in the stored completions dashboard.No
presence_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.No0
stopstring or arrayUp to 4 sequences where the API stops generating further tokens.No
storebooleanWhether or not to store the output of this chat completion request for use in our model distillation or evaluation products.No
streambooleanIf set, partial message deltas are sent, like in ChatGPT. Tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.NoFalse
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
No1
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
No1
userstringA unique identifier representing your end-user, which can help Azure OpenAI to monitor and detect abuse.No
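
The stream parameter above describes data-only server-sent events terminated by a data: [DONE] message. A minimal Python sketch of reading such a stream follows; the function name `iter_sse_payloads` and the sample payloads are hypothetical, and the real chunk objects carry more fields than the `id` shown here.

```python
import json


def iter_sse_payloads(lines):
    """Yield decoded JSON payloads until the [DONE] sentinel is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # the stream is terminated by this message
        yield json.loads(data)


# Hypothetical stream content, one list entry per received line.
sample_stream = [
    'data: {"id": "chunk-1"}',
    "",
    'data: {"id": "chunk-2"}',
    "",
    "data: [DONE]",
    'data: {"id": "never-read"}',
]
payloads = list(iter_sse_payloads(sample_stream))
print([payload["id"] for payload in payloads])
```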

createCompletionRequest

NameTypeDescriptionRequiredDefault
best_ofintegerGenerates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results can’t be streamed.

When used with n, best_of controls the number of candidate completions and n specifies how many to return. best_of must be greater than n.

Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop.
No1
echobooleanEcho back the prompt in addition to the completion
NoFalse
frequency_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
No0
logit_biasobjectModify the likelihood of specified tokens appearing in the completion.

Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
NoNone
logprobsintegerInclude the log probabilities on the logprobs most likely output tokens, as well as the chosen tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response.

The maximum value for logprobs is 5.
NoNone
max_tokensintegerThe maximum number of tokens that can be generated in the completion.

The token count of your prompt plus max_tokens can’t exceed the model’s context length.
No16
nintegerHow many completions to generate for each prompt.

Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop.
No1
presence_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
No0
promptstring or arrayThe prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.

Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt isn’t specified the model will generate as if from the beginning of a new document.
Yes
seedintegerIf specified, our system makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Determinism isn’t guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
No
stopstring or arrayUp to 4 sequences where the API stops generating further tokens. The returned text won’t contain the stop sequence.
No
streambooleanWhether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Example Python code.
NoFalse
suffixstringThe suffix that comes after a completion of inserted text.

This parameter is only supported for gpt-3.5-turbo-instruct.
NoNone
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.
No1
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
userstringA unique identifier representing your end-user, which can help to monitor and detect abuse.
No
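
Several of the constraints in the table above can be checked client-side before a request is sent. The Python sketch below is one way to do that; the helper name `build_completion_body` is hypothetical, and the checks only mirror the statements in the table (best_of greater than n, no streaming with best_of, logprobs at most 5, bias values from -100 to 100).

```python
def build_completion_body(prompt, n=1, best_of=None, logprobs=None,
                          logit_bias=None, max_tokens=16, stream=False):
    """Assemble a completion request body, enforcing documented limits."""
    body = {"prompt": prompt, "n": n, "max_tokens": max_tokens,
            "stream": stream}
    if best_of is not None:
        if best_of <= n:
            raise ValueError("best_of must be greater than n")
        if stream:
            raise ValueError("results can't be streamed with best_of")
        body["best_of"] = best_of
    if logprobs is not None:
        if not 0 <= logprobs <= 5:
            raise ValueError("the maximum value for logprobs is 5")
        body["logprobs"] = logprobs
    if logit_bias is not None:
        for token_id, bias in logit_bias.items():
            if not -100 <= bias <= 100:
                raise ValueError(f"bias for token {token_id} out of range")
        body["logit_bias"] = logit_bias
    return body


body = build_completion_body(
    "Write a tagline for an ice cream shop.",
    n=2,
    best_of=4,                   # 4 candidates generated, 2 returned
    logit_bias={"50256": -100},  # example from the table: ban <|endoftext|>
)
print(body)
```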

createCompletionResponse

Represents a completion response from the API. Note: both the streamed and non-streamed response objects share the same shape (unlike the chat endpoint).
NameTypeDescriptionRequiredDefault
choicesarrayThe list of completion choices the model generated for the input prompt.Yes
createdintegerThe Unix timestamp (in seconds) of when the completion was created.Yes
idstringA unique identifier for the completion.Yes
modelstringThe model used for completion.Yes
objectenumThe object type, which is always “text_completion”
Possible values: text_completion
Yes
prompt_filter_resultspromptFilterResultsContent filtering results for zero or more prompts in the request. In a streaming request, results for different prompts may arrive at different times or in different orders.No
system_fingerprintstringThis fingerprint represents the backend configuration that the model runs with.

Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
No
usagecompletionUsageUsage statistics for the completion request.No
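
As the system_fingerprint description suggests, comparing fingerprints across calls made with the same seed can indicate whether a backend change might explain differing output. The following Python sketch shows the idea; the helper name `same_backend` and all sample values are hypothetical, and `choices` is left empty because its item shape isn't described in this table.

```python
def same_backend(first, second):
    """True when both responses report the same, non-missing fingerprint."""
    fp_a = first.get("system_fingerprint")
    fp_b = second.get("system_fingerprint")
    return fp_a is not None and fp_a == fp_b


# Hypothetical response objects using the fields listed above.
earlier = {
    "id": "cmpl-1", "object": "text_completion", "created": 1700000000,
    "model": "my-deployment", "choices": [], "system_fingerprint": "fp_a1",
}
later = {
    "id": "cmpl-2", "object": "text_completion", "created": 1700000600,
    "model": "my-deployment", "choices": [], "system_fingerprint": "fp_b2",
}

if not same_backend(earlier, later):
    print("Backend configuration changed; identical seeds may differ.")
```

Because system_fingerprint isn't a required field, the helper treats a missing value as "unknown" rather than as a match.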

createChatCompletionRequest

NameTypeDescriptionRequiredDefault
audioobjectParameters for audio output. Required when audio output is requested with modalities: ["audio"].No
└─ formatenumSpecifies the output audio format. Must be one of wav, mp3, flac, opus, or pcm16.

Possible values: wav, mp3, flac, opus, pcm16
No
└─ voiceenumSpecifies the voice type. Supported voices are alloy, echo, fable, onyx, nova, and shimmer.

Possible values: alloy, echo, fable, onyx, nova, shimmer
No
data_sourcesarrayThe configuration entries for Azure OpenAI chat extensions that use them.
This additional specification is only compatible with Azure OpenAI.
No
frequency_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
No0
function_callstring or chatCompletionFunctionCallOptionDeprecated in favor of tool_choice.

Controls which (if any) function is called by the model.
none means the model won’t call a function and instead generates a message.
auto means the model can pick between generating a message or calling a function.
Specifying a particular function via {"name": "my_function"} forces the model to call that function.

none is the default when no functions are present. auto is the default if functions are present.
No
functionsarrayDeprecated in favor of tools.

A list of functions the model may generate JSON inputs for.
No
logit_biasobjectModify the likelihood of specified tokens appearing in the completion.

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
NoNone
logprobsbooleanWhether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.NoFalse
max_completion_tokensintegerAn upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.No
max_tokensintegerThe maximum number of tokens that can be generated in the chat completion.

The total length of input tokens and generated tokens is limited by the model’s context length.
No
messagesarrayA list of messages comprising the conversation so far. Example Python code.Yes
metadataobjectDeveloper-defined tags and values used for filtering completions in the stored completions dashboard.No
modalitiesChatCompletionModalitiesOutput types that you would like the model to generate for this request.
Most models are capable of generating text, which is the default:

["text"]

The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use:

["text", "audio"]
No
nintegerHow many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.No1
parallel_tool_callsParallelToolCallsWhether to enable parallel function calling during tool use.NoTrue
predictionPredictionContentConfiguration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.No
presence_penaltynumberNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
No0
reasoning_effortenumo1 models only

Constrains effort on reasoning for reasoning models.

Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Possible values: low, medium, high
No
response_formatResponseFormatText or ResponseFormatJsonObject or ResponseFormatJsonSchemaAn object specifying the format that the model must output. Compatible with GPT-4o, GPT-4o mini, GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which guarantee the model matches your supplied JSON schema.

Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
seedintegerThis feature is in Beta.
If specified, our system makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Determinism isn’t guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
No
stopstring or arrayUp to 4 sequences where the API stops generating further tokens.
No
storebooleanWhether or not to store the output of this chat completion request for use in our model distillation or evaluation products.No
streambooleanIf set, partial message deltas are sent, like in ChatGPT. Tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Example Python code.
NoFalse
stream_optionschatCompletionStreamOptionsOptions for streaming response. Only set this when you set stream: true.
NoNone
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.
No1
tool_choicechatCompletionToolChoiceOptionControls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present.No
toolsarrayA list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
No
top_logprobsintegerAn integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.No
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
userstringA unique identifier representing your end-user, which can help to monitor and detect abuse.
No
user_security_contextuserSecurityContextUser security context contains several parameters that describe the AI application itself, and the end user that interacts with the AI application. These fields assist your security operations teams to investigate and mitigate security incidents by providing a comprehensive approach to protecting your AI applications. Learn more about protecting AI applications using Microsoft Defender for Cloud.No
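
The following is a minimal sketch of assembling a request body from a few of the fields above. build_chat_request is a hypothetical helper, and the check that a system or user message mentions JSON is only a simple heuristic for the JSON mode note, not something the service performs.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    json_mode: bool = False,
) -> Dict[str, Any]:
    """Assemble a chat completions request body from documented fields."""
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body: Dict[str, Any] = {"messages": messages}
    # Optional fields are only sent when the caller sets them.
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if json_mode:
        # Heuristic (an assumption of this sketch): require that a system or
        # user message with string content mentions JSON before enabling JSON mode.
        mentions_json = any(
            m.get("role") in ("system", "user")
            and isinstance(m.get("content"), str)
            and "json" in m["content"].lower()
            for m in messages
        )
        if not mentions_json:
            raise ValueError("JSON mode needs a system or user message that asks for JSON")
        body["response_format"] = {"type": "json_object"}
    return body


request_body = build_chat_request(
    [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    temperature=0.2,
    max_tokens=200,
    json_mode=True,
)
print(request_body)
```

Only temperature is set here and top_p is left at its default, in line with the recommendation to alter one or the other.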

userSecurityContext

User security context contains several parameters that describe the AI application itself, and the end user that interacts with the AI application. These fields assist your security operations teams to investigate and mitigate security incidents by providing a comprehensive approach to protecting your AI applications. Learn more about protecting AI applications using Microsoft Defender for Cloud.
NameTypeDescriptionRequiredDefault
application_namestringThe name of the application. Sensitive personal information should not be included in this field.No
end_user_idstringThis identifier is the Microsoft Entra ID (formerly Azure Active Directory) user object ID used to authenticate end-users within the generative AI application. Sensitive personal information should not be included in this field.No
end_user_tenant_idstringThe Microsoft 365 tenant ID the end user belongs to. It’s required when the generative AI application is multitenant.No
source_ipstringCaptures the original client’s IP address, accepting both IPv4 and IPv6 formats.No
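
A user_security_context object could be put together along these lines. This is a sketch only: build_user_security_context is a hypothetical helper, all values are placeholders, and validating source_ip with Python's ipaddress module is a choice made for this example.

```python
import ipaddress
from typing import Dict, Optional


def build_user_security_context(
    application_name: Optional[str] = None,
    end_user_id: Optional[str] = None,
    end_user_tenant_id: Optional[str] = None,
    source_ip: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the optional user_security_context fields that were provided."""
    if source_ip is not None:
        # Raises ValueError unless the string is a valid IPv4 or IPv6 address.
        ipaddress.ip_address(source_ip)
    candidates = {
        "application_name": application_name,
        "end_user_id": end_user_id,
        "end_user_tenant_id": end_user_tenant_id,
        "source_ip": source_ip,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# Placeholder values; the IP address comes from a documentation-only range.
context = build_user_security_context(
    application_name="contoso-support-bot",
    source_ip="203.0.113.7",
)
print(context)
```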

chatCompletionFunctions

NameTypeDescriptionRequiredDefault
descriptionstringA description of what the function does, used by the model to choose when and how to call the function.No
namestringThe name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.Yes
parametersFunctionParametersThe parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.

Omitting parameters defines a function with an empty parameter list.
No
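
As an illustration of the fields above, the sketch below checks a function name against the documented character set and length limit and defines one function with a JSON Schema parameters object. The get_weather function and its city parameter are hypothetical.

```python
import re

# Letters, digits, underscores, and dashes, with a maximum length of 64.
FUNCTION_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_function_name(name: str) -> bool:
    """Return True when the whole name matches the documented constraints."""
    return FUNCTION_NAME_PATTERN.fullmatch(name) is not None


# Hypothetical function definition, for illustration only.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

print(is_valid_function_name(get_weather["name"]))
print(is_valid_function_name("get weather"))
```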

chatCompletionFunctionCallOption

Specifying a particular function via {"name": "my_function"} forces the model to call that function.
NameTypeDescriptionRequiredDefault
namestringThe name of the function to call.Yes

chatCompletionFunctionParameters

The parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format. No properties defined for this component.

chatCompletionRequestMessage

This component can be one of the following:

ChatCompletionRequestDeveloperMessage

Developer-provided instructions that the model should follow, regardless of messages sent by the user. With o1 models and newer, developer messages replace the previous system messages.
NameTypeDescriptionRequiredDefault
contentstring or arrayThe contents of the developer message.Yes
namestringAn optional name for the participant. Provides the model information to differentiate between participants of the same role.No
roleenumThe role of the message's author, in this case developer.
Possible values: developer
Yes

chatCompletionRequestSystemMessage

NameTypeDescriptionRequiredDefault
contentstring or arrayThe contents of the system message.Yes
namestringAn optional name for the participant. Provides the model information to differentiate between participants of the same role.No
roleenumThe role of the message's author, in this case system.
Possible values: system
Yes

chatCompletionRequestUserMessage

NameTypeDescriptionRequiredDefault
contentstring or arrayThe contents of the user message.
Yes
namestringAn optional name for the participant. Provides the model information to differentiate between participants of the same role.No
roleenumThe role of the message's author, in this case user.
Possible values: user
Yes

chatCompletionRequestAssistantMessage

NameTypeDescriptionRequiredDefault
contentstring or arrayThe contents of the assistant message. Required unless tool_calls or function_call is specified.
No
function_callobjectDeprecated and replaced by tool_calls. The name and arguments of a function that should be called, as generated by the model.No
└─ argumentsstringThe arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.No
└─ namestringThe name of the function to call.No
namestringAn optional name for the participant. Provides the model information to differentiate between participants of the same role.No
refusalstringThe refusal message by the assistant.No
roleenumThe role of the message's author, in this case assistant.
Possible values: assistant
Yes
tool_callschatCompletionMessageToolCallsThe tool calls generated by the model, such as function calls.No

chatCompletionRequestToolMessage

NameTypeDescriptionRequiredDefault
contentstring or arrayThe contents of the tool message.Yes
roleenumThe role of the message's author, in this case tool.
Possible values: tool
Yes
tool_call_idstringTool call that this message is responding to.Yes
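
Because the model does not always generate valid JSON arguments, a tool call might be handled roughly as follows before the result is sent back as a tool message. This is a sketch under assumptions: the helper names, the allowed key set, the tool call ID call_1, and the result payload are all hypothetical.

```python
import json
from typing import Any, Dict, Optional, Set


def parse_tool_arguments(raw: str, allowed_keys: Set[str]) -> Optional[Dict[str, Any]]:
    """Parse model-generated arguments; return None if they are invalid or unexpected."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Reject non-object payloads and parameters the function schema does not define.
    if not isinstance(args, dict) or not set(args) <= allowed_keys:
        return None
    return args


def tool_result_message(tool_call_id: str, result: Any) -> Dict[str, str]:
    """Build a tool message that answers the tool call with the given ID."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


raw_arguments = '{"city": "Oslo"}'  # hypothetical arguments string from a model
args = parse_tool_arguments(raw_arguments, allowed_keys={"city"})
if args is not None:
    reply = tool_result_message("call_1", {"city": args["city"], "forecast": "sunny"})
    print(reply)
```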

chatCompletionRequestFunctionMessage

NameTypeDescriptionRequiredDefault
contentstringThe contents of the function message.Yes
namestringThe name of the function to call.Yes
roleenumThe role of the message's author, in this case function.
Possible values: function
Yes

chatCompletionRequestDeveloperMessageContentPart

This component can be one of the following:

chatCompletionRequestSystemMessageContentPart

This component can be one of the following:

chatCompletionRequestUserMessageContentPart

This component can be one of the following:

chatCompletionRequestAssistantMessageContentPart

This component can be one of the following:

chatCompletionRequestToolMessageContentPart

This component can be one of the following:

chatCompletionRequestMessageContentPartText

NameTypeDescriptionRequiredDefault
textstringThe text content.Yes
typeenumThe type of the content part.
Possible values: text
Yes

chatCompletionRequestMessageContentPartAudio

NameTypeDescriptionRequiredDefault
input_audioobjectYes
└─ datastringBase64 encoded audio data.No
└─ formatenumThe format of the encoded audio data. Currently supports “wav” and “mp3”.

Possible values: wav, mp3
No
typeenumThe type of the content part. Always input_audio.
Possible values: input_audio
Yes
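
An input_audio content part could be built along these lines. This is a minimal sketch: audio_content_part is a hypothetical helper, and the two bytes used below are placeholder data rather than real audio.

```python
import base64
from typing import Any, Dict

SUPPORTED_INPUT_AUDIO_FORMATS = ("wav", "mp3")


def audio_content_part(audio_bytes: bytes, audio_format: str) -> Dict[str, Any]:
    """Wrap raw audio bytes in an input_audio content part with Base64 data."""
    if audio_format not in SUPPORTED_INPUT_AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": audio_format,
        },
    }


# Placeholder bytes stand in for the contents of a real audio file.
part = audio_content_part(b"\x00\x01", "wav")
print(part)
```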

chatCompletionRequestMessageContentPartImage

NameTypeDescriptionRequiredDefault
image_urlobjectYes
└─ detailenumSpecifies the detail level of the image. Learn more in the Vision guide.
Possible values: auto, low, high
No
└─ urlstringEither a URL of the image or the base64 encoded image data.No
typeenumThe type of the content part.
Possible values: image_url
Yes
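
The sketch below combines a text part and an image part in one user message. Passing the Base64 image as a data URL in the url field is an assumption of this example, as are the helper name and the placeholder bytes.

```python
import base64
from typing import Any, Dict


def image_content_part(image_bytes: bytes, mime_type: str = "image/png",
                       detail: str = "auto") -> Dict[str, Any]:
    """Build an image_url content part from raw image bytes."""
    if detail not in ("auto", "low", "high"):
        raise ValueError(f"unsupported detail level: {detail!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumption: the Base64 data is carried as a data URL.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}", "detail": detail},
    }


# Placeholder bytes stand in for the contents of a real image file.
user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        image_content_part(b"\x89PNG", detail="low"),
    ],
}
print(user_message["content"][1]["image_url"]["url"])
```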

chatCompletionRequestMessageContentPartRefusal

NameTypeDescriptionRequiredDefault
refusalstringThe refusal message generated by the model.Yes
typeenumThe type of the content part.
Possible values: refusal
Yes

azureChatExtensionConfiguration

A representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior. The use of this configuration is compatible only with Azure OpenAI.

Discriminator for azureChatExtensionConfiguration

This component uses the property type to discriminate between different types:
NameTypeDescriptionRequiredDefault
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

azureChatExtensionType

A representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior. The use of this configuration is compatible only with Azure OpenAI.
PropertyValue
DescriptionA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Typestring
Valuesazure_search
azure_cosmos_db
elasticsearch
mongo_db
pinecone

azureSearchChatExtensionConfiguration

A specific representation of configurable options for Azure Search when using it as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
parametersazureSearchChatExtensionParametersParameters for Azure Search when used as an Azure OpenAI chat extension.No
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

azureSearchChatExtensionParameters

Parameters for Azure Search when used as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
allow_partial_resultbooleanIf specified as true, the system will allow partial search results to be used and the request fails if all the queries fail. If not specified, or specified as false, the request will fail if any search query fails.NoFalse
authenticationonYourDataApiKeyAuthenticationOptions or onYourDataSystemAssignedManagedIdentityAuthenticationOptions or onYourDataUserAssignedManagedIdentityAuthenticationOptions or onYourDataAccessTokenAuthenticationOptionsYes
embedding_dependencyonYourDataEndpointVectorizationSource or onYourDataDeploymentNameVectorizationSource or onYourDataIntegratedVectorizationSourceNo
endpointstringThe absolute endpoint path for the Azure Search resource to use.Yes
fields_mappingazureSearchIndexFieldMappingOptionsOptional settings to control how fields are processed when using a configured Azure Search resource.No
filterstringSearch filter.No
in_scopebooleanWhether queries should be restricted to use of indexed data.No
include_contextsarrayThe included properties of the output context. If not specified, the default value is citations and intent.No
index_namestringThe name of the index to use as available in the referenced Azure Search resource.Yes
max_search_queriesintegerThe maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system will decide the number of queries to send.No
query_typeazureSearchQueryTypeThe type of Azure Search retrieval query that should be executed when using it as an Azure OpenAI chat extension.No
semantic_configurationstringThe additional semantic configuration for the query.No
strictnessintegerThe configured strictness of the search relevance filtering. The higher the strictness, the higher the precision but the lower the recall of the answer.No
top_n_documentsintegerThe configured top number of documents to feature for the configured query.No

azureSearchIndexFieldMappingOptions

Optional settings to control how fields are processed when using a configured Azure Search resource.
NameTypeDescriptionRequiredDefault
content_fieldsarrayThe names of index fields that should be treated as content.No
content_fields_separatorstringThe separator pattern that content fields should use.No
filepath_fieldstringThe name of the index field to use as a filepath.No
image_vector_fieldsarrayThe names of fields that represent image vector data.No
title_fieldstringThe name of the index field to use as a title.No
url_fieldstringThe name of the index field to use as a URL.No
vector_fieldsarrayThe names of fields that represent vector data.No

azureSearchQueryType

The type of Azure Search retrieval query that should be executed when using it as an Azure OpenAI chat extension.
PropertyValue
DescriptionThe type of Azure Search retrieval query that should be executed when using it as an Azure OpenAI chat extension.
Typestring
Valuessimple
semantic
vector
vector_simple_hybrid
vector_semantic_hybrid
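
A data_sources entry for Azure Search might be assembled as in the sketch below. The helper name, endpoint, index name, and key are placeholders, and mapping the API key option to the api_key authentication type is inferred from the type names rather than stated in the tables.

```python
from typing import Any, Dict, Optional

AZURE_SEARCH_QUERY_TYPES = {
    "simple",
    "semantic",
    "vector",
    "vector_simple_hybrid",
    "vector_semantic_hybrid",
}


def azure_search_data_source(
    endpoint: str,
    index_name: str,
    api_key: str,
    query_type: Optional[str] = None,
    title_field: Optional[str] = None,
    top_n_documents: Optional[int] = None,
) -> Dict[str, Any]:
    """Build one data_sources entry of type azure_search."""
    if query_type is not None and query_type not in AZURE_SEARCH_QUERY_TYPES:
        raise ValueError(f"unknown query_type: {query_type!r}")
    # endpoint, index_name, and authentication are the required parameters.
    parameters: Dict[str, Any] = {
        "endpoint": endpoint,
        "index_name": index_name,
        "authentication": {"type": "api_key", "key": api_key},
    }
    if query_type is not None:
        parameters["query_type"] = query_type
    if title_field is not None:
        parameters["fields_mapping"] = {"title_field": title_field}
    if top_n_documents is not None:
        parameters["top_n_documents"] = top_n_documents
    return {"type": "azure_search", "parameters": parameters}


# Placeholder resource name, index, and key.
source = azure_search_data_source(
    endpoint="https://YOUR_SEARCH_RESOURCE.search.windows.net",
    index_name="YOUR_INDEX_NAME",
    api_key="YOUR_SEARCH_API_KEY",
    query_type="simple",
    top_n_documents=5,
)
print(source)
```

The resulting dictionary would go into the data_sources array of a chat completions request.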

azureCosmosDBChatExtensionConfiguration

A specific representation of configurable options for Azure Cosmos DB when using it as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
parametersazureCosmosDBChatExtensionParametersParameters to use when configuring Azure OpenAI On Your Data chat extensions when using Azure Cosmos DB for MongoDB vCore.No
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

azureCosmosDBChatExtensionParameters

Parameters to use when configuring Azure OpenAI On Your Data chat extensions when using Azure Cosmos DB for MongoDB vCore.
NameTypeDescriptionRequiredDefault
allow_partial_resultbooleanIf specified as true, the system will allow partial search results to be used and the request fails if all the queries fail. If not specified, or specified as false, the request will fail if any search query fails.NoFalse
authenticationonYourDataConnectionStringAuthenticationOptionsThe authentication options for Azure OpenAI On Your Data when using a connection string.Yes
container_namestringThe name of the Azure Cosmos DB resource container.Yes
database_namestringThe MongoDB vCore database name to use with Azure Cosmos DB.Yes
embedding_dependencyonYourDataEndpointVectorizationSource or onYourDataDeploymentNameVectorizationSourceYes
fields_mappingazureCosmosDBFieldMappingOptionsOptional settings to control how fields are processed when using a configured Azure Cosmos DB resource.Yes
in_scopebooleanWhether queries should be restricted to use of indexed data.No
include_contextsarrayThe included properties of the output context. If not specified, the default value is citations and intent.No
index_namestringThe MongoDB vCore index name to use with Azure Cosmos DB.Yes
max_search_queriesintegerThe maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system will decide the number of queries to send.No
strictnessintegerThe configured strictness of the search relevance filtering. The higher the strictness, the higher the precision but the lower the recall of the answer.No
top_n_documentsintegerThe configured top number of documents to feature for the configured query.No

azureCosmosDBFieldMappingOptions

Optional settings to control how fields are processed when using a configured Azure Cosmos DB resource.
NameTypeDescriptionRequiredDefault
content_fieldsarrayThe names of index fields that should be treated as content.Yes
content_fields_separatorstringThe separator pattern that content fields should use.No
filepath_fieldstringThe name of the index field to use as a filepath.No
title_fieldstringThe name of the index field to use as a title.No
url_fieldstringThe name of the index field to use as a URL.No
vector_fieldsarrayThe names of fields that represent vector data.Yes

elasticsearchChatExtensionConfiguration

A specific representation of configurable options for Elasticsearch when using it as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
parameterselasticsearchChatExtensionParametersParameters to use when configuring Elasticsearch® as an Azure OpenAI chat extension.No
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

elasticsearchChatExtensionParameters

Parameters to use when configuring Elasticsearch® as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
allow_partial_resultbooleanIf specified as true, the system will allow partial search results to be used and the request fails if all the queries fail. If not specified, or specified as false, the request will fail if any search query fails.NoFalse
authenticationonYourDataKeyAndKeyIdAuthenticationOptions or onYourDataEncodedApiKeyAuthenticationOptionsYes
embedding_dependencyonYourDataEndpointVectorizationSource or onYourDataDeploymentNameVectorizationSource or onYourDataModelIdVectorizationSourceNo
endpointstringThe endpoint of Elasticsearch®.Yes
fields_mappingelasticsearchIndexFieldMappingOptionsOptional settings to control how fields are processed when using a configured Elasticsearch® resource.No
in_scopebooleanWhether queries should be restricted to use of indexed data.No
include_contextsarrayThe included properties of the output context. If not specified, the default value is citations and intent.No
index_namestringThe index name of Elasticsearch®.Yes
max_search_queriesintegerThe maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system will decide the number of queries to send.No
query_typeelasticsearchQueryTypeThe type of Elasticsearch® retrieval query that should be executed when using it as an Azure OpenAI chat extension.No
strictnessintegerThe configured strictness of the search relevance filtering. The higher the strictness, the higher the precision but the lower the recall of the answer.No
top_n_documentsintegerThe configured top number of documents to feature for the configured query.No

elasticsearchIndexFieldMappingOptions

Optional settings to control how fields are processed when using a configured Elasticsearch® resource.
NameTypeDescriptionRequiredDefault
content_fieldsarrayThe names of index fields that should be treated as content.No
content_fields_separatorstringThe separator pattern that content fields should use.No
filepath_fieldstringThe name of the index field to use as a filepath.No
title_fieldstringThe name of the index field to use as a title.No
url_fieldstringThe name of the index field to use as a URL.No
vector_fieldsarrayThe names of fields that represent vector data.No

elasticsearchQueryType

The type of Elasticsearch® retrieval query that should be executed when using it as an Azure OpenAI chat extension.
PropertyValue
DescriptionThe type of Elasticsearch® retrieval query that should be executed when using it as an Azure OpenAI chat extension.
Typestring
Valuessimple
vector

mongoDBChatExtensionConfiguration

A specific representation of configurable options for Mongo DB when using it as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
parametersmongoDBChatExtensionParametersParameters to use when configuring Azure OpenAI On Your Data chat extensions when using Mongo DB.No
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

mongoDBChatExtensionParameters

Parameters to use when configuring Azure OpenAI On Your Data chat extensions when using Mongo DB.
NameTypeDescriptionRequiredDefault
allow_partial_resultbooleanIf specified as true, the system will allow partial search results to be used and the request fails if all the queries fail. If not specified, or specified as false, the request will fail if any search query fails.NoFalse
app_namestringThe name of the Mongo DB Application.Yes
authenticationonYourDataUsernameAndPasswordAuthenticationOptionsThe authentication options for Azure OpenAI On Your Data when using a username and a password.Yes
collection_namestringThe name of the Mongo DB Collection.Yes
database_namestringThe name of the Mongo DB database.Yes
embedding_dependencyonYourDataEndpointVectorizationSource or onYourDataDeploymentNameVectorizationSourceYes
endpointstringThe name of the Mongo DB cluster endpoint.Yes
fields_mappingmongoDBFieldMappingOptionsOptional settings to control how fields are processed when using a configured Mongo DB resource.Yes
in_scopebooleanWhether queries should be restricted to use of indexed data.No
include_contextsarrayThe included properties of the output context. If not specified, the default value is citations and intent.No
index_namestringThe name of the Mongo DB index.Yes
max_search_queriesintegerThe maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system will decide the number of queries to send.No
strictnessintegerThe configured strictness of the search relevance filtering. The higher the strictness, the higher the precision but the lower the recall of the answer.No
top_n_documentsintegerThe configured top number of documents to feature for the configured query.No

mongoDBFieldMappingOptions

Optional settings to control how fields are processed when using a configured Mongo DB resource.
NameTypeDescriptionRequiredDefault
content_fieldsarrayThe names of index fields that should be treated as content.Yes
content_fields_separatorstringThe separator pattern that content fields should use.No
filepath_fieldstringThe name of the index field to use as a filepath.No
title_fieldstringThe name of the index field to use as a title.No
url_fieldstringThe name of the index field to use as a URL.No
vector_fieldsarrayThe names of fields that represent vector data.Yes

pineconeChatExtensionConfiguration

A specific representation of configurable options for Pinecone when using it as an Azure OpenAI chat extension.
NameTypeDescriptionRequiredDefault
parameterspineconeChatExtensionParametersParameters for configuring Azure OpenAI Pinecone chat extensions.No
typeazureChatExtensionTypeA representation of configuration data for a single Azure OpenAI chat extension. This will be used by a chat completions request that should use Azure OpenAI chat extensions to augment the response behavior.
The use of this configuration is compatible only with Azure OpenAI.
Yes

pineconeChatExtensionParameters

Parameters for configuring Azure OpenAI Pinecone chat extensions.
NameTypeDescriptionRequiredDefault
allow_partial_resultbooleanIf specified as true, the system will allow partial search results to be used and the request fails if all the queries fail. If not specified, or specified as false, the request will fail if any search query fails.NoFalse
authenticationonYourDataApiKeyAuthenticationOptionsThe authentication options for Azure OpenAI On Your Data when using an API key.Yes
embedding_dependencyonYourDataDeploymentNameVectorizationSourceThe details of a vectorization source, used by Azure OpenAI On Your Data when applying vector search, that is based on an internal embeddings model deployment name in the same Azure OpenAI resource.Yes
environmentstringThe environment name of Pinecone.Yes
fields_mappingpineconeFieldMappingOptionsOptional settings to control how fields are processed when using a configured Pinecone resource.Yes
in_scopebooleanWhether queries should be restricted to use of indexed data.No
include_contextsarrayThe included properties of the output context. If not specified, the default value is citations and intent.No
index_namestringThe name of the Pinecone database index.Yes
max_search_queriesintegerThe maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system will decide the number of queries to send.No
strictnessintegerThe configured strictness of the search relevance filtering. The higher the strictness, the higher the precision but the lower the recall of the answer.No
top_n_documentsintegerThe configured top number of documents to feature for the configured query.No

pineconeFieldMappingOptions

Optional settings to control how fields are processed when using a configured Pinecone resource.
NameTypeDescriptionRequiredDefault
content_fieldsarrayThe names of index fields that should be treated as content.Yes
content_fields_separatorstringThe separator pattern that content fields should use.No
filepath_fieldstringThe name of the index field to use as a filepath.No
title_fieldstringThe name of the index field to use as a title.No
url_fieldstringThe name of the index field to use as a URL.No
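
The required parameters differ from one chat extension to the next. The sketch below collects the parameters marked as required in the parameter tables above and reports which ones a data_sources entry is missing. missing_required is a hypothetical helper, and the Pinecone entry is a deliberately incomplete placeholder.

```python
from typing import Any, Dict, List

# Parameters marked "Yes" in the Required column for each extension type.
REQUIRED_PARAMETERS = {
    "azure_search": {"authentication", "endpoint", "index_name"},
    "azure_cosmos_db": {
        "authentication", "container_name", "database_name",
        "embedding_dependency", "fields_mapping", "index_name",
    },
    "elasticsearch": {"authentication", "endpoint", "index_name"},
    "mongo_db": {
        "app_name", "authentication", "collection_name", "database_name",
        "embedding_dependency", "endpoint", "fields_mapping", "index_name",
    },
    "pinecone": {
        "authentication", "embedding_dependency", "environment",
        "fields_mapping", "index_name",
    },
}


def missing_required(data_source: Dict[str, Any]) -> List[str]:
    """Return the sorted names of required parameters absent from the entry."""
    ext_type = data_source.get("type")
    if ext_type not in REQUIRED_PARAMETERS:
        raise ValueError(f"unknown extension type: {ext_type!r}")
    parameters = data_source.get("parameters") or {}
    return sorted(REQUIRED_PARAMETERS[ext_type] - set(parameters))


# Deliberately incomplete placeholder entry.
partial_pinecone = {
    "type": "pinecone",
    "parameters": {
        "index_name": "YOUR_INDEX_NAME",
        "authentication": {"type": "api_key", "key": "YOUR_PINECONE_API_KEY"},
    },
}
print(missing_required(partial_pinecone))
```

A check like this only covers the top-level parameters. Nested requirements, such as content_fields inside fields_mapping, would need their own checks.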

onYourDataAuthenticationOptions

The authentication options for Azure OpenAI On Your Data.

Discriminator for onYourDataAuthenticationOptions

This component uses the property type to discriminate between different types:
NameTypeDescriptionRequiredDefault
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataContextProperty

The context property.
PropertyValue
DescriptionThe context property.
Typestring
Valuescitations
intent
all_retrieved_documents

onYourDataAuthenticationType

The authentication types supported with Azure OpenAI On Your Data.
PropertyValue
DescriptionThe authentication types supported with Azure OpenAI On Your Data.
Typestring
Valuesapi_key
connection_string
key_and_key_id
encoded_api_key
access_token
system_assigned_managed_identity
user_assigned_managed_identity
username_and_password

onYourDataApiKeyAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using an API key.
NameTypeDescriptionRequiredDefault
keystringThe API key to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataConnectionStringAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using a connection string.
NameTypeDescriptionRequiredDefault
connection_stringstringThe connection string to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataKeyAndKeyIdAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using an Elasticsearch key and key ID pair.
NameTypeDescriptionRequiredDefault
keystringThe Elasticsearch key to use for authentication.No
key_idstringThe Elasticsearch key ID to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataEncodedApiKeyAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using an Elasticsearch encoded API key.
NameTypeDescriptionRequiredDefault
encoded_api_keystringThe Elasticsearch encoded API key to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataAccessTokenAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using an access token.
NameTypeDescriptionRequiredDefault
access_tokenstringThe access token to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataSystemAssignedManagedIdentityAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using a system-assigned managed identity.
NameTypeDescriptionRequiredDefault
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataUserAssignedManagedIdentityAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using a user-assigned managed identity.
NameTypeDescriptionRequiredDefault
managed_identity_resource_idstringThe resource ID of the user-assigned managed identity to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes

onYourDataUsernameAndPasswordAuthenticationOptions

The authentication options for Azure OpenAI On Your Data when using a username and a password.
NameTypeDescriptionRequiredDefault
passwordstringThe password to use for authentication.No
typeonYourDataAuthenticationTypeThe authentication types supported with Azure OpenAI On Your Data.Yes
usernamestringThe username to use for authentication.No
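
The authentication variants above all share the type discriminator and differ only in the extra fields they carry. The sketch below shows one way to build such objects in client code. The helper name build_authentication is hypothetical, and because the tables mark the extra fields as not required, the helper only rejects unknown types and unexpected fields rather than enforcing presence.

```python
# Extra fields each authentication type carries besides "type", per the tables above.
AUTH_FIELDS = {
    "api_key": ["key"],
    "connection_string": ["connection_string"],
    "key_and_key_id": ["key", "key_id"],
    "encoded_api_key": ["encoded_api_key"],
    "access_token": ["access_token"],
    "system_assigned_managed_identity": [],
    "user_assigned_managed_identity": ["managed_identity_resource_id"],
    "username_and_password": ["username", "password"],
}


def build_authentication(auth_type: str, **fields: str) -> dict:
    """Hypothetical helper: build an authentication options object."""
    if auth_type not in AUTH_FIELDS:
        raise ValueError(f"unknown authentication type: {auth_type}")
    unexpected = set(fields) - set(AUTH_FIELDS[auth_type])
    if unexpected:
        raise ValueError(f"unexpected fields for {auth_type}: {sorted(unexpected)}")
    return {"type": auth_type, **fields}


print(build_authentication("key_and_key_id", key="YOUR_KEY", key_id="YOUR_KEY_ID"))
```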

onYourDataVectorizationSource

An abstract representation of a vectorization source for Azure OpenAI On Your Data with vector search. This component uses the property type to discriminate between different types:
NameTypeDescriptionRequiredDefault
typeonYourDataVectorizationSourceTypeRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.Yes

onYourDataVectorizationSourceType

Represents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.
PropertyValue
DescriptionRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.
Typestring
Valuesendpoint
deployment_name
integrated
model_id

onYourDataEndpointVectorizationSource

The details of a vectorization source, used by Azure OpenAI On Your Data when applying vector search, that is based on a public Azure OpenAI endpoint call for embeddings.
NameTypeDescriptionRequiredDefault
authenticationonYourDataApiKeyAuthenticationOptions or onYourDataAccessTokenAuthenticationOptionsNo
dimensionsintegerThe number of dimensions the embeddings should have. Only supported in text-embedding-3 and later models.No
endpointstringSpecifies the resource endpoint URL from which embeddings should be retrieved. It should be in the format of https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings. The api-version query parameter isn’t allowed.No
typeonYourDataVectorizationSourceTypeRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.Yes

onYourDataDeploymentNameVectorizationSource

The details of a vectorization source, used by Azure OpenAI On Your Data when applying vector search, that is based on an internal embeddings model deployment name in the same Azure OpenAI resource.
NameTypeDescriptionRequiredDefault
deployment_namestringSpecifies the name of the model deployment to use for vectorization. This model deployment must be in the same Azure OpenAI resource, but On Your Data will use this model deployment via an internal call rather than a public one, which enables vector search even in private networks.No
dimensionsintegerThe number of dimensions the embeddings should have. Only supported in text-embedding-3 and later models.No
typeonYourDataVectorizationSourceTypeRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.Yes

onYourDataIntegratedVectorizationSource

Represents the integrated vectorizer defined within the search resource.
NameTypeDescriptionRequiredDefault
typeonYourDataVectorizationSourceTypeRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.Yes

onYourDataModelIdVectorizationSource

The details of a vectorization source, used by Azure OpenAI On Your Data when applying vector search, that is based on a search service model ID. Currently only supported by Elasticsearch®.
NameTypeDescriptionRequiredDefault
model_idstringSpecifies the model ID to use for vectorization. This model ID must be defined in the search service.No
typeonYourDataVectorizationSourceTypeRepresents the available sources Azure OpenAI On Your Data can use to configure vectorization of data for use with vector search.Yes

azureChatExtensionsMessageContext

A representation of the additional context information available when Azure OpenAI chat extensions are involved in the generation of a corresponding chat completions response. This context information is only populated when using an Azure OpenAI request configured to use a matching extension.
NameTypeDescriptionRequiredDefault
all_retrieved_documentsarrayAll the retrieved documents.No
citationsarrayThe data source retrieval result, used to generate the assistant message in the response.No
intentstringThe detected intent from the chat history, used to pass to the next turn to carry over the context.No

citation

Citation information for a chat completions response message.
NameTypeDescriptionRequiredDefault
chunk_idstringThe chunk ID of the citation.No
contentstringThe content of the citation.Yes
filepathstringThe file path of the citation.No
rerank_scorenumberThe rerank score of the retrieved document.No
titlestringThe title of the citation.No
urlstringThe URL of the citation.No

retrievedDocument

The retrieved document.
NameTypeDescriptionRequiredDefault
chunk_idstringThe chunk ID of the citation.No
contentstringThe content of the citation.Yes
data_source_indexintegerThe index of the data source.No
filepathstringThe file path of the citation.No
filter_reasonfilterReasonThe filtering reason of the retrieved document.No
original_search_scorenumberThe original search score of the retrieved document.No
rerank_scorenumberThe rerank score of the retrieved document.No
search_queriesarrayThe search queries used to retrieve the document.No
titlestringThe title of the citation.No
urlstringThe URL of the citation.No

filterReason

The filtering reason of the retrieved document.
PropertyValue
DescriptionThe filtering reason of the retrieved document.
Typestring
Valuesscore
rerank
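
To show how the context, citation, and retrieved document objects fit together, here is a small sketch that summarizes the context attached to a response message. The sample message is fabricated for illustration, the helper name summarize_context is hypothetical, and the code assumes that filter_reason is only present on documents that were filtered out.

```python
def summarize_context(message: dict) -> dict:
    """Hypothetical helper: pull intent, citation titles, and filtered documents."""
    context = message.get("context") or {}
    citations = context.get("citations") or []
    documents = context.get("all_retrieved_documents") or []
    return {
        "intent": context.get("intent"),
        "citation_titles": [c.get("title") for c in citations],
        # Assumption: a document with a filter_reason was dropped for that reason.
        "filtered": [
            (d.get("title"), d["filter_reason"])
            for d in documents
            if d.get("filter_reason") is not None
        ],
    }


# Fabricated sample message, shaped like the tables above.
sample_message = {
    "role": "assistant",
    "content": "See the setup guide [doc1].",
    "context": {
        "intent": "how to set up the product",
        "citations": [{"content": "Step 1 ...", "title": "Setup guide"}],
        "all_retrieved_documents": [
            {"content": "Step 1 ...", "title": "Setup guide", "data_source_index": 0},
            {"content": "Unrelated", "title": "Changelog", "filter_reason": "score"},
        ],
    },
}

summary = summarize_context(sample_message)
print(summary)
```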

chatCompletionMessageToolCall

NameTypeDescriptionRequiredDefault
functionobjectThe function that the model called.Yes
└─ argumentsstringThe arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.No
└─ namestringThe name of the function to call.No
idstringThe ID of the tool call.Yes
typetoolCallTypeThe type of the tool call, in this case function.Yes
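
Because the arguments string is not guaranteed to be valid JSON or to match your schema, a validation step before dispatch is advisable. The following is a minimal sketch of such a check. The helper name parse_tool_arguments and the allowed-parameter set are hypothetical, and a real application would likely validate types as well as names.

```python
import json


def parse_tool_arguments(tool_call: dict, allowed: set) -> dict:
    """Hypothetical helper: decode and sanity-check model-generated arguments."""
    raw = tool_call["function"].get("arguments") or "{}"
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON arguments: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    unknown = set(args) - allowed
    if unknown:
        # The model may hallucinate parameters that the function does not define.
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    return args


# Fabricated tool call for illustration.
sample_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}
print(parse_tool_arguments(sample_call, {"city", "unit"}))
```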

toolCallType

The type of the tool call, in this case function.
PropertyValue
DescriptionThe type of the tool call, in this case function.
Typestring
Valuesfunction

chatCompletionRequestMessageTool

NameTypeDescriptionRequiredDefault
contentstringThe contents of the message.No
tool_call_idstringTool call that this message is responding to.No

chatCompletionRequestMessageFunction

NameTypeDescriptionRequiredDefault
contentstringThe contents of the message.No
namestringThe name of the function.No
roleenumThe role of the message's author, in this case function.
Possible values: function
No

createChatCompletionResponse

Represents a chat completion response returned by the model, based on the provided input.
NameTypeDescriptionRequiredDefault
choicesarrayA list of chat completion choices. Can be more than one if n is greater than 1.Yes
createdintegerThe Unix timestamp (in seconds) of when the chat completion was created.Yes
idstringA unique identifier for the chat completion.Yes
modelstringThe model used for the chat completion.Yes
objectenumThe object type, which is always chat.completion.
Possible values: chat.completion
Yes
prompt_filter_resultspromptFilterResultsContent filtering results for zero or more prompts in the request. In a streaming request, results for different prompts may arrive at different times or in different orders.No
system_fingerprintstringThis fingerprint represents the backend configuration that the model runs with.

Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
No
usagecompletionUsageUsage statistics for the completion request.No

createChatCompletionStreamResponse

Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
NameTypeDescriptionRequiredDefault
choicesarrayA list of chat completion choices. Can contain more than one element if n is greater than 1.
Yes
createdintegerThe Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.Yes
idstringA unique identifier for the chat completion. Each chunk has the same ID.Yes
modelstringThe model to generate the completion.Yes
objectenumThe object type, which is always chat.completion.chunk.
Possible values: chat.completion.chunk
Yes
system_fingerprintstringThis fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
No

chatCompletionStreamResponseDelta

A chat completion delta generated by streamed model responses.
NameTypeDescriptionRequiredDefault
contentstringThe contents of the chunk message.No
function_callobjectDeprecated and replaced by tool_calls. The name and arguments of a function that should be called, as generated by the model.No
└─ argumentsstringThe arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.No
└─ namestringThe name of the function to call.No
refusalstringThe refusal message generated by the model.No
roleenumThe role of the author of this message.
Possible values: system, user, assistant, tool
No
tool_callsarrayNo

chatCompletionMessageToolCallChunk

NameTypeDescriptionRequiredDefault
functionobjectNo
└─ argumentsstringThe arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.No
└─ namestringThe name of the function to call.No
idstringThe ID of the tool call.No
indexintegerYes
typeenumThe type of the tool. Currently, only function is supported.
Possible values: function
No
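
When streaming, a single tool call typically arrives as several chunks that share the same index. The sketch below reassembles them. It assumes that name and arguments fragments arrive in order and can be concatenated, which this section does not state explicitly, and the helper name merge_tool_call_chunks is hypothetical.

```python
def merge_tool_call_chunks(chunks: list) -> list:
    """Hypothetical helper: combine streamed tool call chunks by their index."""
    calls = {}
    for chunk in chunks:
        call = calls.setdefault(
            chunk["index"],
            {"id": None, "type": None, "function": {"name": "", "arguments": ""}},
        )
        if chunk.get("id"):
            call["id"] = chunk["id"]
        if chunk.get("type"):
            call["type"] = chunk["type"]
        fn = chunk.get("function") or {}
        # Assumption: fragments are ordered, so plain concatenation rebuilds the value.
        call["function"]["name"] += fn.get("name") or ""
        call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Fabricated chunks for illustration.
sample_chunks = [
    {"index": 0, "id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city":'}},
    {"index": 0, "function": {"arguments": ' "Oslo"}'}},
]
merged = merge_tool_call_chunks(sample_chunks)
print(merged)
```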

chatCompletionStreamOptions

Options for streaming response. Only set this when you set stream: true.
NameTypeDescriptionRequiredDefault
include_usagebooleanIf set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
No
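
The following sketch shows how a client might pick out the usage chunk described above while collecting streamed text. The chunks are fabricated and already parsed into dictionaries. The delta field on each choice is an assumption based on the wider chat completions API rather than something defined in this table, and the helper name collect_stream is hypothetical.

```python
def collect_stream(chunks: list) -> tuple:
    """Hypothetical helper: return (full_text, usage) from parsed stream chunks."""
    usage = None
    parts = []
    for chunk in chunks:
        # With include_usage set, only the final chunk carries a non-null usage.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts), usage


# Fabricated chunks for illustration.
sample_stream = [
    {"choices": [{"index": 0, "delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"index": 0, "delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]
text, usage = collect_stream(sample_stream)
print(text, usage)
```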

chatCompletionChoiceLogProbs

Log probability information for the choice.
NameTypeDescriptionRequiredDefault
contentarrayA list of message content tokens with log probability information.Yes
refusalarrayA list of message refusal tokens with log probability information.No

chatCompletionTokenLogprob

NameTypeDescriptionRequiredDefault
bytesarrayA list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be null if there is no bytes representation for the token.Yes
logprobnumberThe log probability of this token.Yes
tokenstringThe token.Yes
top_logprobsarrayList of the most likely tokens and their log probability, at this token position. In rare cases, there may be fewer than the number of requested top_logprobs returned.Yes
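
The bytes field matters when one character is split across tokens. The sketch below joins the byte lists before decoding and also converts summed log probabilities into a sequence probability, assuming the values are natural logarithms. The sample entries and helper names are fabricated for illustration.

```python
import math


def decode_tokens(entries: list) -> str:
    """Join the UTF-8 bytes of all tokens, then decode once."""
    raw = bytearray()
    for entry in entries:
        if entry["bytes"] is not None:  # bytes can be null for some tokens
            raw.extend(entry["bytes"])
    return raw.decode("utf-8")


def sequence_probability(entries: list) -> float:
    """Assumption: logprob values are natural logs, so they add up."""
    return math.exp(sum(entry["logprob"] for entry in entries))


# Fabricated entries: the two-byte character U+00E9 split across two tokens.
sample_entries = [
    {"token": "caf", "bytes": [99, 97, 102], "logprob": -0.1, "top_logprobs": []},
    {"token": "<byte 1>", "bytes": [195], "logprob": -0.2, "top_logprobs": []},
    {"token": "<byte 2>", "bytes": [169], "logprob": -0.3, "top_logprobs": []},
]
decoded = decode_tokens(sample_entries)
probability = sequence_probability(sample_entries)
print(decoded, probability)
```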

chatCompletionResponseMessage

A chat completion message generated by the model.
NameTypeDescriptionRequiredDefault
audioobjectIf the audio output modality is requested, this object contains data about the audio response from the model.No
└─ datastringBase64 encoded audio bytes generated by the model, in the format specified in the request.
No
└─ expires_atintegerThe Unix timestamp (in seconds) for when this audio response will no longer be accessible on the server for use in multi-turn conversations.
No
└─ idstringUnique identifier for this audio response.No
└─ transcriptstringTranscript of the audio generated by the model.No
contentstringThe contents of the message.Yes
contextazureChatExtensionsMessageContextA representation of the additional context information available when Azure OpenAI chat extensions are involved in the generation of a corresponding chat completions response. This context information is only populated when using an Azure OpenAI request configured to use a matching extension.No
function_callchatCompletionFunctionCallDeprecated and replaced by tool_calls. The name and arguments of a function that should be called, as generated by the model.No
refusalstringThe refusal message generated by the model.Yes
rolechatCompletionResponseMessageRoleThe role of the author of the response message.Yes
tool_callsarrayThe tool calls generated by the model, such as function calls.No

chatCompletionResponseMessageRole

The role of the author of the response message.
PropertyValue
DescriptionThe role of the author of the response message.
Typestring
Valuesassistant

chatCompletionToolChoiceOption

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. none is the default when no tools are present. auto is the default if tools are present. This component can be one of the following:

chatCompletionNamedToolChoice

Specifies a tool the model should use. Use to force the model to call a specific function.
NameTypeDescriptionRequiredDefault
functionobjectYes
└─ namestringThe name of the function to call.No
typeenumThe type of the tool. Currently, only function is supported.
Possible values: function
Yes
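
As an illustration of the two shapes a tool choice can take (a mode string or a named tool object), consider the sketch below. The helper name make_tool_choice and the function name get_weather are hypothetical.

```python
def make_tool_choice(mode: str = "auto", function_name: str = None):
    """Hypothetical helper: return a tool_choice value for a request body."""
    if function_name is not None:
        # Named tool choice: forces the model to call this specific function.
        return {"type": "function", "function": {"name": function_name}}
    if mode not in ("none", "auto", "required"):
        raise ValueError(f"unsupported tool choice mode: {mode}")
    return mode


print(make_tool_choice())                             # auto
print(make_tool_choice("required"))                   # required
print(make_tool_choice(function_name="get_weather"))  # named tool object
```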

ParallelToolCalls

Whether to enable parallel function calling during tool use. No properties defined for this component.

PredictionContent

Static predicted output content, such as the content of a text file that is being regenerated.
NameTypeDescriptionRequiredDefault
contentstring or arrayThe content that should be matched when generating a model response. If generated tokens would match this content, the entire model response can be returned much more quickly.Yes
typeenumThe type of the predicted content you want to provide. This type is currently always content.
Possible values: content
Yes

chatCompletionMessageToolCalls

The tool calls generated by the model, such as function calls. No properties defined for this component.

ChatCompletionModalities

Output types that you would like the model to generate for this request. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"]. No properties defined for this component.

chatCompletionFunctionCall

Deprecated and replaced by tool_calls. The name and arguments of a function that should be called, as generated by the model.
NameTypeDescriptionRequiredDefault
argumentsstringThe arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.Yes
namestringThe name of the function to call.Yes

completionUsage

Usage statistics for the completion request.
NameTypeDescriptionRequiredDefault
completion_tokensintegerNumber of tokens in the generated completion.Yes
completion_tokens_detailsobjectBreakdown of tokens used in a completion.No
└─ accepted_prediction_tokensintegerWhen using Predicted Outputs, the number of tokens in the prediction that appeared in the completion.No
└─ audio_tokensintegerAudio tokens generated by the model.No
└─ reasoning_tokensintegerTokens generated by the model for reasoning.No
└─ rejected_prediction_tokensintegerWhen using Predicted Outputs, the number of tokens in the prediction that did not appear in the completion. However, like reasoning tokens, these tokens are still counted in the total completion tokens for purposes of billing, output, and context window limits.No
prompt_tokensintegerNumber of tokens in the prompt.Yes
prompt_tokens_detailsobjectDetails of the prompt tokens.No
└─ audio_tokensintegerAudio input tokens present in the prompt.No
└─ cached_tokensintegerThe number of cached prompt tokens.No
total_tokensintegerTotal number of tokens used in the request (prompt + completion).Yes
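
The relationships between the usage counters can be checked with a few lines of arithmetic. This sketch uses fabricated numbers and assumes that cached tokens are a subset of prompt tokens. The table above states that reasoning tokens are counted within completion tokens. The helper name summarize_usage is hypothetical.

```python
def summarize_usage(usage: dict) -> dict:
    """Hypothetical helper: derive a few figures from a completionUsage object."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    cached = prompt_details.get("cached_tokens") or 0
    reasoning = completion_details.get("reasoning_tokens") or 0
    return {
        # total_tokens is documented as prompt + completion.
        "total_matches": usage["total_tokens"]
        == usage["prompt_tokens"] + usage["completion_tokens"],
        # Assumption: cached tokens are included in prompt_tokens.
        "uncached_prompt_tokens": usage["prompt_tokens"] - cached,
        "non_reasoning_completion_tokens": usage["completion_tokens"] - reasoning,
    }


# Fabricated usage object for illustration.
sample_usage = {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 40},
    "completion_tokens_details": {"reasoning_tokens": 20},
}
usage_summary = summarize_usage(sample_usage)
print(usage_summary)
```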

chatCompletionTool

NameTypeDescriptionRequiredDefault
functionFunctionObjectYes
typeenumThe type of the tool. Currently, only function is supported.
Possible values: function
Yes

FunctionParameters

The parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format. Omitting parameters defines a function with an empty parameter list. No properties defined for this component.

FunctionObject

NameTypeDescriptionRequiredDefault
descriptionstringA description of what the function does, used by the model to choose when and how to call the function.No
namestringThe name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.Yes
parametersFunctionParametersThe parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.

Omitting parameters defines a function with an empty parameter list.
No
strictbooleanWhether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true.NoFalse
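
A tool definition combining chatCompletionTool and FunctionObject might be built as in the sketch below, which also enforces the documented name constraint (letters, digits, underscores, and dashes, at most 64 characters). The helper name make_function_tool and the example function are hypothetical.

```python
import re

# Name rule from the FunctionObject table: a-z, A-Z, 0-9, underscores, dashes, max 64.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_function_tool(name: str, description: str = None,
                       parameters: dict = None, strict: bool = False) -> dict:
    """Hypothetical helper: build a tool entry for the tools array."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid function name: {name!r}")
    function = {"name": name, "strict": strict}
    if description is not None:
        function["description"] = description
    if parameters is not None:
        # Omitting parameters defines a function with an empty parameter list.
        function["parameters"] = parameters
    return {"type": "function", "function": function}


weather_tool = make_function_tool(
    "get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
print(weather_tool)
```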

ResponseFormatText

NameTypeDescriptionRequiredDefault
typeenumThe type of response format being defined: text
Possible values: text
Yes

ResponseFormatJsonObject

NameTypeDescriptionRequiredDefault
typeenumThe type of response format being defined: json_object
Possible values: json_object
Yes

ResponseFormatJsonSchemaSchema

The schema for the response format, described as a JSON Schema object. No properties defined for this component.

ResponseFormatJsonSchema

NameTypeDescriptionRequiredDefault
json_schemaobjectYes
└─ descriptionstringA description of what the response format is for, used by the model to determine how to respond in the format.No
└─ namestringThe name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.No
└─ schemaResponseFormatJsonSchemaSchemaThe schema for the response format, described as a JSON Schema object.No
└─ strictbooleanWhether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true.NoFalse
typeenumThe type of response format being defined: json_schema
Possible values: json_schema
Yes
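
For orientation, a json_schema response format object could look like the sketch below. The schema name, description, and properties are hypothetical, and the exact subset of JSON Schema accepted when strict is true is not listed in this section, so treat the schema body as an assumption.

```python
import json

# Hypothetical response format; the schema content is illustrative only.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "description": "A short structured weather report.",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
            "additionalProperties": False,
        },
    },
}

# A reply is expected to be a JSON document matching the schema's required keys.
example_reply = '{"city": "Oslo", "temperature_c": 4.5}'
reply = json.loads(example_reply)
required = response_format["json_schema"]["schema"]["required"]
has_required = all(key in reply for key in required)
print(has_required)
```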

chatCompletionChoiceCommon

NameTypeDescriptionRequiredDefault
finish_reasonstringNo
indexintegerNo

createTranslationRequest

Translation request.
NameTypeDescriptionRequiredDefault
filestringThe audio file to translate.Yes
promptstringAn optional text to guide the model’s style or continue a previous audio segment. The prompt should be in English.No
response_formataudioResponseFormatDefines the format of the output.No
temperaturenumberThe sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.No0

audioResponse

Translation or transcription response when response_format was json
NameTypeDescriptionRequiredDefault
textstringTranslated or transcribed text.Yes

audioVerboseResponse

Translation or transcription response when response_format was verbose_json
NameTypeDescriptionRequiredDefault
durationnumberDuration.No
languagestringLanguage.No
segmentsarrayNo
taskstringType of audio task.No
textstringTranslated or transcribed text.Yes
wordsarrayNo

audioResponseFormat

Defines the format of the output.
PropertyValue
DescriptionDefines the format of the output.
Typestring
Valuesjson
text
srt
verbose_json
vtt

createTranscriptionRequest

Transcription request.
NameTypeDescriptionRequiredDefault
filestringThe audio file object to transcribe.Yes
languagestringThe language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.No
promptstringAn optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.No
response_formataudioResponseFormatDefines the format of the output.No
temperaturenumberThe sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.No0
timestamp_granularities[]arrayThe timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.No[‘segment’]
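
The constraints on the transcription fields can be checked before a request is sent. The sketch below builds a list of form fields, assuming the request is submitted as multipart form data with repeated timestamp_granularities[] keys. The audio file itself is left out, and the helper name transcription_fields is hypothetical.

```python
FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def transcription_fields(response_format: str = None, timestamp_granularities: list = None,
                         language: str = None, temperature: float = 0.0) -> list:
    """Hypothetical helper: validate options and return (name, value) form fields."""
    fields = []
    if response_format is not None:
        if response_format not in FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        fields.append(("response_format", response_format))
    if timestamp_granularities:
        # Timestamp granularities are only available with verbose_json.
        if response_format != "verbose_json":
            raise ValueError("timestamp granularities require response_format=verbose_json")
        for granularity in timestamp_granularities:
            if granularity not in GRANULARITIES:
                raise ValueError(f"unsupported granularity: {granularity}")
            fields.append(("timestamp_granularities[]", granularity))
    if language is not None:
        fields.append(("language", language))  # ISO-639-1 code, for example "en"
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields.append(("temperature", str(temperature)))
    return fields


print(transcription_fields("verbose_json", ["word", "segment"], language="en"))
```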

audioSegment

Transcription or translation segment.
NameTypeDescriptionRequiredDefault
avg_logprobnumberAverage log probability.No
compression_rationumberCompression ratio.No
endnumberSegment end offset.No
idintegerSegment identifier.No
no_speech_probnumberProbability of ‘no speech’.No
seeknumberOffset of the segment.No
startnumberSegment start offset.No
temperaturenumberTemperature.No
textstringSegment text.No
tokensarrayTokens of the text.No

audioWord

Transcription or translation word.
NameTypeDescriptionRequiredDefault
endnumberWord end offset.No
startnumberWord start offset.No
wordstringWordNo

createSpeechRequest

Speech request.
NameTypeDescriptionRequiredDefault
inputstringThe text to synthesize audio for. The maximum length is 4,096 characters.Yes
response_formatenumThe format to synthesize the audio in.
Possible values: mp3, opus, aac, flac, wav, pcm
No
speednumberThe speed of the synthesized audio. Select a value from 0.25 to 4.0. 1.0 is the default.No1.0
voiceenumThe voice to use for speech synthesis.
Possible values: alloy, echo, fable, onyx, nova, shimmer
Yes
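
The limits in the speech request table (input length, speed range, and the enumerated voices and formats) lend themselves to a client-side check. The following is a minimal sketch with a hypothetical helper name, make_speech_request.

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def make_speech_request(text: str, voice: str, response_format: str = None,
                        speed: float = 1.0) -> dict:
    """Hypothetical helper: validate and build a createSpeechRequest body."""
    if len(text) > 4096:
        raise ValueError("input exceeds 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    body = {"input": text, "voice": voice, "speed": speed}
    if response_format is not None:
        if response_format not in AUDIO_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        body["response_format"] = response_format
    return body


print(make_speech_request("Hello, world.", "alloy", response_format="mp3"))
```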

imageQuality

The quality of the image that will be generated.
PropertyValue
DescriptionThe quality of the image that will be generated.
Typestring
Defaultauto
Valuesauto
high
medium
low
hd
standard

imagesResponseFormat

The format in which the generated images are returned.
PropertyValue
DescriptionThe format in which the generated images are returned.
Typestring
Defaulturl
Valuesurl
b64_json

imagesOutputFormat

The file format in which the generated images are returned. Only supported for gpt-image-1 series models.
PropertyValue
DescriptionThe file format in which the generated images are returned. Only supported for gpt-image-1 series models.
Typestring
Defaultpng
Valuespng
jpeg

imageSize

The size of the generated images.
PropertyValue
DescriptionThe size of the generated images.
Typestring
Defaultauto
Valuesauto
1792x1024
1024x1792
1024x1024
1024x1536
1536x1024

imageStyle

The style of the generated images. Only supported for dall-e-3.
PropertyValue
DescriptionThe style of the generated images. Only supported for dall-e-3.
Typestring
Defaultvivid
Valuesvivid
natural

imageBackground

Sets the transparency of the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.
PropertyValue
DescriptionSets the transparency of the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.
Typestring
Defaultauto
Valuestransparent
opaque
auto

imageGenerationsRequest

NameTypeDescriptionRequiredDefault
backgroundimageBackgroundSets the transparency of the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.Noauto
nintegerThe number of images to generate. For dall-e-3, only n=1 is supported.No1
output_compressionintegerThe compression level (0-100%) for the generated images. This parameter is only supported for gpt-image-1 series models with the jpeg output format.No100
output_formatimagesOutputFormatThe file format in which the generated images are returned. Only supported for gpt-image-1 series models.Nopng
promptstringA text description of the desired image(s). The maximum length is 32000 characters for gpt-image-1 series models and 4000 characters for dall-e-3.Yes
qualityimageQualityThe quality of the image that will be generated.Noauto
response_formatimagesResponseFormatThe format in which the generated images are returned. Only supported for dall-e-3.Nourl
sizeimageSizeThe size of the generated images.Noauto
styleimageStyleThe style of the generated images. Only supported for dall-e-3.Novivid
userstringA unique identifier representing your end-user, which can help to monitor and detect abuse.No

imageEditsRequest

NameTypeDescriptionRequiredDefault
imagestring or arrayThe image(s) to edit. Must be a supported image file or an array of images. Each image should be a png or jpg file less than 25MB.Yes
maskstringAn additional image whose fully transparent areas (e.g., where alpha is zero) indicate where the image should be edited. If there are multiple images provided, the mask will be applied to the first image. Must be a valid PNG file, less than 4MB, and have the same dimensions as the image.No
nintegerThe number of images to generate.No1
promptstringA text description of the desired image(s). The maximum length is 32000 characters.Yes
qualityimageQualityThe quality of the image that will be generated.Noauto
response_formatimagesResponseFormatThe format in which the generated images are returned.Nourl
sizeimageSizeThe size of the generated images.Noauto
userstringA unique identifier representing your end-user, which can help to monitor and detect abuse.No

generateImagesResponse

NameTypeDescriptionRequiredDefault
createdintegerThe Unix timestamp when the operation was created.Yes
dataarrayThe result data of the operation, if successful.Yes
usageimageGenerationsUsageRepresents token usage details for image generation requests. Only for gpt-image-1 series models.No

imageResult

The image url or encoded image if successful, and an error otherwise.
NameTypeDescriptionRequiredDefault
b64_jsonstringThe base64 encoded imageNo
content_filter_resultsdalleContentFilterResultsInformation about the content filtering results.No
prompt_filter_resultsdalleFilterResultsInformation about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, high-scale that determines the intensity and risk level of harmful content) and if it has been filtered or not. Information about jailbreak content and profanity, if it has been detected, and if it has been filtered or not. And information about customer block list, if it has been filtered and its id.No
revised_promptstringThe prompt that was used to generate the image, if there was any revision to the prompt.No
urlstringThe image url.No
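
An image result carries either a URL or base64 data, depending on the request. The sketch below decodes the b64_json variant into raw bytes. The sample payload is fabricated (it is not a real image), and the helper name image_bytes is hypothetical.

```python
import base64


def image_bytes(result: dict):
    """Hypothetical helper: return decoded image bytes, or None if only a URL is present."""
    encoded = result.get("b64_json")
    if encoded:
        return base64.b64decode(encoded)
    return None


# Fabricated results for illustration; the payload is just a few placeholder bytes.
inline_result = {"b64_json": base64.b64encode(b"\x89PNG placeholder").decode("ascii")}
url_result = {"url": "https://example.invalid/image.png"}

print(image_bytes(inline_result))
print(image_bytes(url_result))
```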

imageGenerationsUsage

Represents token usage details for image generation requests. Only for gpt-image-1 series models.
NameTypeDescriptionRequiredDefault
input_tokensintegerThe number of input tokens.No
input_tokens_detailsobjectA detailed breakdown of the input tokens.No
└─ image_tokensintegerThe number of image tokens.No
└─ text_tokensintegerThe number of text tokens.No
output_tokensintegerThe number of output tokens.No
total_tokensintegerThe total number of tokens used.No

line

A content line object consisting of an adjacent sequence of content elements, such as words and selection marks.
NameTypeDescriptionRequiredDefault
spansarrayAn array of spans that represent detected objects and their bounding box information.Yes
textstringYes

span

A span object that represents a detected object and its bounding box information.
NameTypeDescriptionRequiredDefault
lengthintegerThe length of the span in characters, measured in Unicode codepoints.Yes
offsetintegerThe character offset within the text where the span begins. This offset is defined as the position of the first character of the span, counting from the start of the text as Unicode codepoints.Yes
polygonarrayAn array of objects representing points in the polygon that encloses the detected object.Yes
textstringThe text content of the span that represents the detected object.Yes
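Because `offset` and `length` are counted in Unicode code points, slicing is straightforward in languages whose string indices are code points. The sketch below uses Python, where `str` indexing already works this way; the sample line object is hypothetical. The last two lines show why the distinction matters: the same text is longer when measured in UTF-16 code units.

```python
def span_text(text: str, span: dict) -> str:
    """Slice the text covered by a span; Python str indices count code points."""
    start = span["offset"]
    return text[start:start + span["length"]]


# Hypothetical line object; U+1F600 lies outside the Basic Multilingual Plane.
line = {
    "text": "\U0001F600 caf\u00e9",
    "spans": [{"offset": 2, "length": 4, "text": "caf\u00e9", "polygon": []}],
}

extracted = span_text(line["text"], line["spans"][0])
codepoint_len = len(line["text"])                       # counted in code points
utf16_len = len(line["text"].encode("utf-16-le")) // 2  # counted in UTF-16 units
```

In a runtime that indexes strings by UTF-16 code units, the offsets would need to be converted before slicing.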

runCompletionUsage

Usage statistics related to the run. This value will be null if the run isn’t in a terminal state (for example, in_progress or queued).
NameTypeDescriptionRequiredDefault
completion_tokensintegerNumber of completion tokens used over the course of the run.Yes
prompt_tokensintegerNumber of prompt tokens used over the course of the run.Yes
total_tokensintegerTotal number of tokens used (prompt + completion).Yes

runStepCompletionUsage

Usage statistics related to the run step. This value will be null while the run step’s status is in_progress.
NameTypeDescriptionRequiredDefault
completion_tokensintegerNumber of completion tokens used over the course of the run step.Yes
prompt_tokensintegerNumber of prompt tokens used over the course of the run step.Yes
total_tokensintegerTotal number of tokens used (prompt + completion).Yes

assistantsApiResponseFormatOption

Specifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide. Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. This component can be one of the following:

assistantsApiResponseFormat

An object describing the expected output of the model. If the type is json_object, only function type tools are allowed to be passed to the Run. If the type is text, the model can return text or any value needed.
NameTypeDescriptionRequiredDefault
typestringMust be one of text or json_object.Notext
type Enum: AssistantsApiResponseFormat
ValueDescription
text
json_object
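The JSON mode caveats above can be sketched as two small helpers: one refuses to build a request body unless the instructions mention JSON, and one tolerates a message that was cut off before the JSON was complete. The helper names and the substring check are illustrative assumptions, not service behavior; the body fields (`assistant_id`, `instructions`, `response_format`) are the ones listed under createRunRequest.

```python
import json


def json_mode_body(assistant_id: str, instructions: str) -> dict:
    """Build a run body that enables JSON mode and carries a JSON instruction."""
    # Assumption: a plain substring check stands in for "the prompt asks for JSON".
    if "json" not in instructions.lower():
        raise ValueError("JSON mode needs an explicit instruction to produce JSON")
    return {
        "assistant_id": assistant_id,
        "instructions": instructions,
        "response_format": {"type": "json_object"},
    }


def parse_json_message(content: str):
    """Return the parsed object, or None when the content is not complete JSON."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None


body = json_mode_body("asst_hypothetical", "Reply with a JSON object only.")
complete = parse_json_message('{"answer": 42}')
truncated = parse_json_message('{"answer": 4')  # e.g. generation hit a token limit
```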

assistantObject

Represents an assistant that can call the model and use tools.
NameTypeDescriptionRequiredDefault
created_atintegerThe Unix timestamp (in seconds) for when the assistant was created.Yes
descriptionstringThe description of the assistant. The maximum length is 512 characters.
Yes
idstringThe identifier, which can be referenced in API endpoints.Yes
instructionsstringThe system instructions that the assistant uses. The maximum length is 256,000 characters.
Yes
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
modelstringID of the model to use.Yes
namestringThe name of the assistant. The maximum length is 256 characters.
Yes
objectstringThe object type, which is always assistant.Yes
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
tool_resourcesobjectA set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayA list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe ID of the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.
No
toolsarrayA list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function.
Yes[]
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
object Enum: AssistantObjectType
ValueDescription
assistantThe object type, which is always assistant

createAssistantRequest

NameTypeDescriptionRequiredDefault
descriptionstringThe description of the assistant. The maximum length is 512 characters.
No
instructionsstringThe system instructions that the assistant uses. The maximum length is 256,000 characters.
No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
modelstringYes
namestringThe name of the assistant. The maximum length is 256 characters.
No
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
tool_resourcesobjectA set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayA list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.
No
└─ vector_storesarrayA helper to create a vector store with file_ids and attach it to this assistant. There can be a maximum of 1 vector store attached to the assistant.
No
toolsarrayA list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function.
No[]
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
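The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch that only encodes the limits stated in this section; the function name and error strings are hypothetical, and the service remains the authority on what it accepts.

```python
def validate_create_assistant(body: dict) -> list[str]:
    """Return a list of limit violations for a createAssistantRequest-like dict."""
    errors = []
    if not body.get("model"):
        errors.append("model is required")
    for field, limit in (("name", 256), ("description", 512), ("instructions", 256_000)):
        value = body.get(field)
        if value is not None and len(value) > limit:
            errors.append(f"{field} exceeds {limit} characters")
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        errors.append("metadata has more than 16 pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            errors.append("metadata key longer than 64 characters")
        if len(str(value)) > 512:
            errors.append("metadata value longer than 512 characters")
    if len(body.get("tools") or []) > 128:
        errors.append("more than 128 tools")
    resources = body.get("tool_resources") or {}
    if len((resources.get("code_interpreter") or {}).get("file_ids") or []) > 20:
        errors.append("code_interpreter allows at most 20 files")
    if len((resources.get("file_search") or {}).get("vector_store_ids") or []) > 1:
        errors.append("file_search allows at most 1 vector store")
    return errors


ok_errors = validate_create_assistant(
    {"model": "my-deployment", "name": "helper", "tools": [{"type": "code_interpreter"}]}
)
bad_errors = validate_create_assistant(
    {
        "name": "x" * 257,
        "metadata": {"k" * 65: "v"},
        "tool_resources": {"file_search": {"vector_store_ids": ["vs_a", "vs_b"]}},
    }
)
```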

modifyAssistantRequest

NameTypeDescriptionRequiredDefault
descriptionstringThe description of the assistant. The maximum length is 512 characters.
No
instructionsstringThe system instructions that the assistant uses. The maximum length is 32768 characters.
No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
modelstringNo
namestringThe name of the assistant. The maximum length is 256 characters.
No
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
tool_resourcesobjectA set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayOverrides the list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayOverrides the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.
No
toolsarrayA list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function.
No[]
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1

deleteAssistantResponse

NameTypeDescriptionRequiredDefault
deletedbooleanYes
idstringYes
objectstringYes
object Enum: DeleteAssistantResponseState
ValueDescription
assistant.deleted

listAssistantsResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes

assistantToolsCode

NameTypeDescriptionRequiredDefault
typestringThe type of tool being defined: code_interpreterYes
type Enum: assistantToolsCodeType
ValueDescription
code_interpreter

assistantToolsFileSearch

NameTypeDescriptionRequiredDefault
file_searchobjectOverrides for the file search tool.No
└─ max_num_resultsintegerThe maximum number of results the file search tool should output. The default is 20 for gpt-4* models and 5 for gpt-3.5-turbo. This number should be between 1 and 50 inclusive.

Note that the file search tool may output fewer than max_num_results results.
No
typestringThe type of tool being defined: file_searchYes
type Enum: assistantToolsFileSearchType
ValueDescription
file_search

assistantToolsFileSearchTypeOnly

NameTypeDescriptionRequiredDefault
typestringThe type of tool being defined: file_searchYes
type Enum: assistantToolsFileSearchType
ValueDescription
file_search

assistantToolsFunction

NameTypeDescriptionRequiredDefault
functionobjectThe function definition.Yes
└─ descriptionstringA description of what the function does, used by the model to choose when and how to call the function.No
└─ namestringThe name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.No
└─ parameterschatCompletionFunctionParametersThe parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format.No
typestringThe type of tool being defined: functionYes
type Enum: assistantToolsFunction
ValueDescription
function
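A function tool definition can be assembled as a plain dictionary. The sketch below enforces the naming rule stated above (letters, digits, underscores, and dashes, up to 64 characters) with a regular expression; the `get_weather` function and its schema are hypothetical.

```python
import re

# Letters, digits, underscores, and dashes, with a maximum length of 64.
_FUNCTION_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build an assistantToolsFunction-shaped dict after checking the name."""
    if not _FUNCTION_NAME.fullmatch(name):
        raise ValueError(f"invalid function name: {name!r}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # a JSON Schema object
        },
    }


weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
```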

truncationObject

Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
NameTypeDescriptionRequiredDefault
last_messagesintegerThe number of most recent messages from the thread when constructing the context for the run.No
typestringThe truncation strategy to use for the thread. The default is auto. If set to last_messages, the thread will be truncated to the n most recent messages in the thread. When set to auto, messages in the middle of the thread will be dropped to fit the context length of the model, max_prompt_tokens.Yes
type Enum: TruncationType
ValueDescription
auto
last_messages
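As an illustration of the two strategies, the sketch below builds a truncationObject-shaped dictionary and previews locally what keeping the n most recent messages means. Requiring a positive `last_messages` value is an assumption, and the preview is only a client-side approximation of what the service does.

```python
def truncation_strategy(kind: str = "auto", last_messages=None) -> dict:
    """Build a truncationObject-shaped dict for a run request."""
    if kind == "auto":
        return {"type": "auto"}
    if kind == "last_messages":
        # Assumption: the count must be a positive integer.
        if not isinstance(last_messages, int) or last_messages < 1:
            raise ValueError("last_messages must be a positive integer")
        return {"type": "last_messages", "last_messages": last_messages}
    raise ValueError(f"unknown truncation type: {kind!r}")


def preview_last_messages(messages: list, n: int) -> list:
    """Local preview: keep only the n most recent messages, oldest first."""
    return messages[-n:]


strategy = truncation_strategy("last_messages", 2)
kept = preview_last_messages(["m1", "m2", "m3", "m4"], strategy["last_messages"])
```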

assistantsApiToolChoiceOption

Controls which (if any) tool is called by the model. none means the model will not call any tools and instead generates a message. auto is the default value and means the model can pick between generating a message or calling a tool. Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool. This component can be one of the following:

assistantsNamedToolChoice

Specifies a tool the model should use. Use to force the model to call a specific tool.
NameTypeDescriptionRequiredDefault
functionobjectNo
└─ namestringThe name of the function to call.No
typestringThe type of the tool. If type is function, the function name must be setYes
type Enum: AssistantsNamedToolChoiceType
ValueDescription
function
code_interpreter
file_search
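The tool choice forms described above (the strings none and auto, or an object naming a tool) can be produced by one small helper. This is a sketch; the helper name is hypothetical and it only encodes the rule that a function name must be set when the type is function.

```python
def tool_choice(kind: str, function_name=None):
    """Return a value suitable for the tool_choice field of a run request."""
    if kind in ("none", "auto"):
        return kind
    if kind == "function":
        if not function_name:
            raise ValueError("function name must be set when type is function")
        return {"type": "function", "function": {"name": function_name}}
    if kind in ("code_interpreter", "file_search"):
        return {"type": kind}
    raise ValueError(f"unknown tool choice: {kind!r}")


forced_function = tool_choice("function", "my_function")
forced_search = tool_choice("file_search")
```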

runObject

Represents an execution run on a thread.
NameTypeDescriptionRequiredDefault
assistant_idstringThe ID of the assistant used for execution of this run.Yes
cancelled_atintegerThe Unix timestamp (in seconds) for when the run was cancelled.Yes
completed_atintegerThe Unix timestamp (in seconds) for when the run was completed.Yes
created_atintegerThe Unix timestamp (in seconds) for when the run was created.Yes
expires_atintegerThe Unix timestamp (in seconds) for when the run will expire.Yes
failed_atintegerThe Unix timestamp (in seconds) for when the run failed.Yes
idstringThe identifier, which can be referenced in API endpoints.Yes
incomplete_detailsobjectDetails on why the run is incomplete. Will be null if the run isn’t incomplete.Yes
└─ reasonstringThe reason why the run is incomplete. This will point to which specific token limit was reached over the course of the run.No
instructionsstringThe instructions that the assistant used for this run.Yes
last_errorobjectThe last error associated with this run. Will be null if there are no errors.Yes
└─ codestringOne of server_error or rate_limit_exceeded.No
└─ messagestringA human-readable description of the error.No
max_completion_tokensintegerThe maximum number of completion tokens specified to have been used over the course of the run.
Yes
max_prompt_tokensintegerThe maximum number of prompt tokens specified to have been used over the course of the run.
Yes
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
modelstringThe model that the assistant used for this run.Yes
objectstringThe object type, which is always thread.run.Yes
parallel_tool_callsParallelToolCallsWhether to enable parallel function calling during tool use.NoTrue
required_actionobjectDetails on the action required to continue the run. Will be null if no action is required.Yes
└─ submit_tool_outputsobjectDetails on the tool outputs needed for this run to continue.No
└─ tool_callsarrayA list of the relevant tool calls.No
└─ typeenumFor now, this is always submit_tool_outputs.
Possible values: submit_tool_outputs
No
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
Yes
started_atintegerThe Unix timestamp (in seconds) for when the run was started.Yes
statusstringThe status of the run, which can be one of queued, in_progress, requires_action, cancelling, cancelled, failed, completed, or expired.Yes
temperaturenumberThe sampling temperature used for this run. If not set, defaults to 1.No
thread_idstringThe ID of the thread that was executed on as part of this run.Yes
tool_choiceassistantsApiToolChoiceOptionControls which (if any) tool is called by the model.
none means the model will not call any tools and instead generates a message.
auto is the default value and means the model can pick between generating a message or calling a tool.
Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
Yes
toolsarrayThe list of tools that the assistant used for this run.Yes[]
top_pnumberThe nucleus sampling value used for this run. If not set, defaults to 1.No
truncation_strategytruncationObjectControls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.Yes
usagerunCompletionUsageUsage statistics related to the run. This value will be null if the run isn’t in a terminal state (for example, in_progress or queued).Yes
object Enum: runObjectType
ValueDescription
thread.runThe run object type which is always thread.run
status Enum: RunObjectStatus
ValueDescription
queuedThe queued state
in_progressThe in_progress state
requires_actionThe requires_action state
cancellingThe cancelling state
cancelledThe cancelled state
failedThe failed state
completedThe completed state
expiredThe expired state
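A client that polls a run usually needs to decide whether to wait, submit tool outputs, or stop. The sketch below maps the status values above to those three actions. Treating completed, failed, cancelled, and expired as the terminal states is an assumption drawn from the status names; the section does not enumerate them explicitly. The helper also shows the null check on `usage`.

```python
# Assumption: these four statuses are the terminal ones.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "expired"})


def next_step(run: dict) -> str:
    """Map a runObject-shaped dict to "wait", "submit_tool_outputs", or "done"."""
    status = run["status"]
    if status == "requires_action":
        return "submit_tool_outputs"
    if status in TERMINAL_STATUSES:
        return "done"
    return "wait"


def total_tokens(run: dict):
    """usage is null until the run reaches a terminal state."""
    usage = run.get("usage")
    return None if usage is None else usage["total_tokens"]


queued_run = {"status": "queued", "usage": None}
finished_run = {
    "status": "completed",
    "usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150},
}
```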

createRunRequest

NameTypeDescriptionRequiredDefault
additional_instructionsstringAppends additional instructions at the end of the instructions for the run. This is useful for modifying the behavior on a per-run basis without overriding other instructions.No
additional_messagesarrayAdds additional messages to the thread before creating the run.No
assistant_idstringThe ID of the assistant to use to execute this run.Yes
instructionsstringOverride the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.No
max_completion_tokensintegerThe maximum number of completion tokens that may be used over the course of the run. The run makes a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
max_prompt_tokensintegerThe maximum number of prompt tokens that may be used over the course of the run. The run makes a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
modelstringThe ID of the model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used.No
parallel_tool_callsParallelToolCallsWhether to enable parallel function calling during tool use.NoTrue
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
streambooleanIf true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
tool_choiceassistantsApiToolChoiceOptionControls which (if any) tool is called by the model.
none means the model will not call any tools and instead generates a message.
auto is the default value and means the model can pick between generating a message or calling a tool.
Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
No
toolsarrayOverride the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.No
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
truncation_strategytruncationObjectControls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.No
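The sampling guidance in this table can be turned into a lightweight client-side check. The sketch below returns advisory messages rather than raising, since adjusting both values is discouraged but not forbidden. The 0 to 1 range for `top_p` is an assumption based on it being a probability mass, and the helper name is hypothetical.

```python
def sampling_warnings(temperature=None, top_p=None) -> list[str]:
    """Return advisory messages for the temperature and top_p fields."""
    warnings = []
    if temperature is not None and not 0 <= temperature <= 2:
        warnings.append("temperature should be between 0 and 2")
    # Assumption: top_p is a probability mass, so it should lie in [0, 1].
    if top_p is not None and not 0 <= top_p <= 1:
        warnings.append("top_p should be between 0 and 1")
    # Both fields default to 1; changing both at once is discouraged.
    if temperature not in (None, 1) and top_p not in (None, 1):
        warnings.append("alter temperature or top_p, but not both")
    return warnings


only_temperature = sampling_warnings(temperature=0.2)
both_changed = sampling_warnings(temperature=0.2, top_p=0.1)
out_of_range = sampling_warnings(temperature=3)
```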

listRunsResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes
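The `has_more` and `last_id` fields support cursor-style paging. The sketch below walks every page through a caller-supplied `fetch_page` function. Passing the previous `last_id` back as a cursor is an assumption about the list endpoint, which this section does not describe, and the in-memory `PAGES` dictionary stands in for real HTTP calls.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following last_id while has_more is true."""
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page["has_more"]:
            return
        after = page["last_id"]


# Hypothetical pages keyed by the cursor that requests them.
PAGES = {
    None: {
        "data": [{"id": "run_1"}, {"id": "run_2"}],
        "first_id": "run_1",
        "last_id": "run_2",
        "has_more": True,
    },
    "run_2": {
        "data": [{"id": "run_3"}],
        "first_id": "run_3",
        "last_id": "run_3",
        "has_more": False,
    },
}

run_ids = [run["id"] for run in iter_items(PAGES.get)]
```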

modifyRunRequest

NameTypeDescriptionRequiredDefault
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No

submitToolOutputsRunRequest

NameTypeDescriptionRequiredDefault
streambooleanIf true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
No
tool_outputsarrayA list of tools for which the outputs are being submitted.Yes
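When `stream` is true, the response arrives as server-sent events and ends with a `data: [DONE]` message. The sketch below parses the `data:` lines from an iterable of text lines and stops at the terminator. The sample event names and payloads are hypothetical, and the parser ignores multi-line `data:` fields for brevity.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each data: line until data: [DONE] is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip event:, comment, and blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Hypothetical stream content.
sample_stream = [
    "event: thread.run.created",
    'data: {"id": "run_1", "status": "queued"}',
    "",
    "event: thread.run.completed",
    'data: {"id": "run_1", "status": "completed"}',
    "",
    "data: [DONE]",
    'data: {"id": "ignored"}',
]

statuses = [event["status"] for event in iter_sse_data(sample_stream)]
```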

runToolCallObject

Tool call objects
NameTypeDescriptionRequiredDefault
functionobjectThe function definition.Yes
└─ argumentsstringThe arguments that the model expects you to pass to the function.No
└─ namestringThe name of the function.No
idstringThe ID of the tool call. This ID must be referenced when you submit the tool outputs using the submit tool outputs to run endpoint.Yes
typestringThe type of tool call the output is required for. For now, this is always function.Yes
type Enum: RunToolCallObjectType
ValueDescription
function
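A run in the requires_action state lists tool calls whose `arguments` field is a JSON string. The sketch below decodes the arguments, dispatches to local handlers, and assembles a submitToolOutputsRunRequest-shaped body. The item fields `tool_call_id` and `output` are assumptions, because this section does not list the shape of `tool_outputs` entries; the `get_weather` handler is hypothetical.

```python
import json


def build_tool_outputs(tool_calls: list, handlers: dict) -> dict:
    """Run each function tool call locally and collect the outputs."""
    outputs = []
    for call in tool_calls:
        if call["type"] != "function":
            continue
        function = call["function"]
        arguments = json.loads(function["arguments"] or "{}")
        result = handlers[function["name"]](**arguments)
        # Assumption: each entry pairs the tool call ID with a string output.
        outputs.append({"tool_call_id": call["id"], "output": json.dumps(result)})
    return {"tool_outputs": outputs}


handlers = {"get_weather": lambda city: {"city": city, "temp_c": 21}}
tool_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }
]

request_body = build_tool_outputs(tool_calls, handlers)
```

In practice the arguments come from the model, so a handler should validate them before acting on them.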

createThreadAndRunRequest

NameTypeDescriptionRequiredDefault
assistant_idstringThe ID of the assistant to use to execute this run.Yes
instructionsstringOverride the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.No
max_completion_tokensintegerThe maximum number of completion tokens that may be used over the course of the run. The run makes a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
max_prompt_tokensintegerThe maximum number of prompt tokens that may be used over the course of the run. The run makes a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete. See incomplete_details for more info.
No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
modelstringThe ID of the model to be used to execute this run. If a value is provided here, it will override the model associated with the assistant. If not, the model associated with the assistant will be used.No
parallel_tool_callsParallelToolCallsWhether to enable parallel function calling during tool use.NoTrue
response_formatassistantsApiResponseFormatOptionSpecifies the format that the model must output. Compatible with GPT-4o, GPT-4 Turbo, and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensure the model matches your supplied JSON schema. Learn more in the Structured Outputs guide.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
No
streambooleanIf true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
No
stream_optionschatCompletionStreamOptionsOptions for streaming response. Only set this when you set stream: true.
NoNone
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
No1
threadcreateThreadRequestNo
tool_choiceassistantsApiToolChoiceOptionControls which (if any) tool is called by the model.
none means the model will not call any tools and instead generates a message.
auto is the default value and means the model can pick between generating a message or calling a tool.
Specifying a particular tool like {"type": "file_search"} or {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
No
tool_resourcesobjectA set of resources that are used by the assistant’s tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayA list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe ID of the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.
No
toolsarrayOverride the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.No
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
truncation_strategytruncationObjectControls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.No

threadObject

Represents a thread that contains messages.
NameTypeDescriptionRequiredDefault
created_atintegerThe Unix timestamp (in seconds) for when the thread was created.Yes
idstringThe identifier, which can be referenced in API endpoints.Yes
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
objectstringThe object type, which is always thread.Yes
tool_resourcesobjectA set of resources that are made available to the assistant’s tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
Yes
└─ code_interpreterobjectNo
└─ file_idsarrayA list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread.
No
object Enum: ThreadObjectType
ValueDescription
threadThe type of thread object which is always thread

createThreadRequest

NameTypeDescriptionRequiredDefault
messagesarrayA list of messages to start the thread with.No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
tool_resourcesobjectA set of resources that are made available to the assistant’s tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayA list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread.
No
└─ vector_storesarrayA helper to create a vector store with file_ids and attach it to this thread. There can be a maximum of 1 vector store attached to the thread.
No
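The limits in the tool_resources rows above (at most 20 file IDs for code_interpreter and at most 1 vector store for file_search) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: check_tool_resources is a hypothetical helper, and the IDs in the example body are placeholders.

```python
def check_tool_resources(tool_resources: dict) -> None:
    """Raise ValueError if the documented per-tool limits are exceeded."""
    code_interpreter = tool_resources.get("code_interpreter") or {}
    if len(code_interpreter.get("file_ids", [])) > 20:
        raise ValueError("code_interpreter accepts at most 20 file IDs")

    file_search = tool_resources.get("file_search") or {}
    if len(file_search.get("vector_store_ids", [])) > 1:
        raise ValueError("file_search accepts at most 1 vector store")


# Example createThreadRequest body; the IDs are placeholders.
create_thread_request = {
    "metadata": {"project": "demo"},
    "tool_resources": {
        "code_interpreter": {"file_ids": ["file-abc123"]},
        "file_search": {"vector_store_ids": ["vs_abc123"]},
    },
}
check_tool_resources(create_thread_request["tool_resources"])
```

Checking locally only avoids a round trip for requests that would break the documented limits; the service remains the authority on what it accepts.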

modifyThreadRequest

NameTypeDescriptionRequiredDefault
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
tool_resourcesobjectA set of resources that are made available to the assistant’s tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
No
└─ code_interpreterobjectNo
└─ file_idsarrayA list of File IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
No[]
└─ file_searchobjectNo
└─ vector_store_idsarrayThe vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread.
No

deleteThreadResponse

NameTypeDescriptionRequiredDefault
deletedbooleanYes
idstringYes
objectstringYes
object Enum: DeleteThreadResponseObjectState
ValueDescription
thread.deletedThe delete thread response object state which is thread.deleted

listThreadsResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes

messageObject

Represents a message within a thread.
NameTypeDescriptionRequiredDefault
assistant_idstringIf applicable, the ID of the assistant that authored this message.Yes
attachmentsarrayA list of files attached to the message, and the tools they were added to.Yes
completed_atintegerThe Unix timestamp (in seconds) for when the message was completed.Yes
contentarrayThe content of the message as an array of text and/or images.Yes
created_atintegerThe Unix timestamp (in seconds) for when the message was created.Yes
idstringThe identifier, which can be referenced in API endpoints.Yes
incomplete_atintegerThe Unix timestamp (in seconds) for when the message was marked as incomplete.Yes
incomplete_detailsobjectOn an incomplete message, details about why the message is incomplete.Yes
└─ reasonstringThe reason the message is incomplete.No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
objectstringThe object type, which is always thread.message.Yes
rolestringThe entity that produced the message. One of user or assistant.Yes
run_idstringIf applicable, the ID of the run associated with the authoring of this message.Yes
statusstringThe status of the message, which can be either in_progress, incomplete, or completed.Yes
thread_idstringThe ID of the thread that this message belongs to.Yes
object Enum: MessageObjectType
ValueDescription
thread.messageThe message object type which is thread.message
status Enum: MessageObjectStatus
ValueDescription
in_progress
incomplete
completed
role Enum: MessageObjectRole
ValueDescription
user
assistant

messageDeltaObject

Represents a message delta, that is, any changed fields on a message during streaming.
NameTypeDescriptionRequiredDefault
deltaobjectThe delta containing the fields that have changed on the Message.Yes
└─ contentarrayThe content of the message as an array of text and/or images.No
└─ rolestringThe entity that produced the message. One of user or assistant.No
idstringThe identifier of the message, which can be referenced in API endpoints.Yes
objectstringThe object type, which is always thread.message.delta.Yes
object Enum: MessageDeltaObjectType
ValueDescription
thread.message.delta

createMessageRequest

NameTypeDescriptionRequiredDefault
attachmentsarrayA list of files attached to the message, and the tools they should be added to.No
contentstringThe content of the message.Yes
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
rolestringThe role of the entity that is creating the message. Allowed values include:
- user: Indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages.
- assistant: Indicates the message is generated by the assistant. Use this value to insert messages from the assistant into the conversation.
Yes
role Enum: CreateMessageRequestRole
ValueDescription
user
assistant
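The constraints stated for createMessageRequest (role limited to user or assistant, and metadata limited to 16 pairs with 64-character keys and 512-character values) can be enforced before a request is sent. This is a minimal sketch under assumptions: build_create_message_request and validate_metadata are hypothetical helper names, metadata values are assumed to be strings, and "characters" is approximated with Python's len.

```python
from typing import Dict, Optional

ALLOWED_ROLES = ("user", "assistant")


def validate_metadata(metadata: Dict[str, str]) -> None:
    """Check the documented metadata limits (16 pairs, 64/512 characters)."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key[:20]}...")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key: {key}")


def build_create_message_request(
    role: str, content: str, metadata: Optional[Dict[str, str]] = None
) -> dict:
    """Assemble a createMessageRequest body from validated inputs."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}")
    body = {"role": role, "content": content}
    if metadata is not None:
        validate_metadata(metadata)
        body["metadata"] = metadata
    return body


request_body = build_create_message_request(
    "user", "What does the report conclude?", {"source": "demo"}
)
```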

modifyMessageRequest

NameTypeDescriptionRequiredDefault
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No

deleteMessageResponse

NameTypeDescriptionRequiredDefault
deletedbooleanYes
idstringYes
objectstringYes
object Enum: DeleteMessageResponseObject
ValueDescription
thread.message.deletedThe delete message response object state

listMessagesResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes
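The list responses in this reference share the fields data, first_id, last_id, and has_more. One way to walk every page is sketched below. It assumes that last_id can be passed back as the cursor for the next page; the cursor parameter itself is not described in this section, so fetch_page is a hypothetical callable supplied by the caller that performs the actual request.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following has_more and last_id."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"]:
            break
        # Assumption: the last ID of this page is the cursor for the next one.
        cursor = page["last_id"]
```

Keeping the HTTP call behind fetch_page lets the same loop serve threads, messages, run steps, and vector stores.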

messageContentImageFileObject

References an image File in the content of a message.
NameTypeDescriptionRequiredDefault
image_fileobjectYes
└─ file_idstringThe File ID of the image in the message content.No
typestringAlways image_file.Yes
type Enum: MessageContentImageFileObjectType
ValueDescription
image_fileThe message content image file type

messageContentTextObject

The text content that is part of a message.
NameTypeDescriptionRequiredDefault
textobjectYes
└─ annotationsarrayNo
└─ valuestringThe data that makes up the text.No
typestringAlways text.Yes
type Enum: messageContentTextObjectType
ValueDescription
textThe message content text Object type

messageContentTextAnnotationsFileCitationObject

A citation within the message that points to a specific quote from a specific File associated with the assistant or the message. Generated when the assistant uses the “file_search” tool to search files.
NameTypeDescriptionRequiredDefault
end_indexintegerYes
file_citationobjectYes
└─ file_idstringThe ID of the specific File the citation is from.No
start_indexintegerYes
textstringThe text in the message content that needs to be replaced.Yes
typestringAlways file_citation.Yes
type Enum: FileCitationObjectType
ValueDescription
file_citationThe file citation object type

messageContentTextAnnotationsFilePathObject

A URL for the file that’s generated when the assistant used the code_interpreter tool to generate a file.
NameTypeDescriptionRequiredDefault
end_indexintegerYes
file_pathobjectYes
└─ file_idstringThe ID of the file that was generated.No
start_indexintegerYes
textstringThe text in the message content that needs to be replaced.Yes
typestringAlways file_path.Yes
type Enum: FilePathObjectType
ValueDescription
file_pathThe file path object type
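Both annotation objects carry start_index, end_index, and the text that needs to be replaced. The sketch below shows one way to substitute numbered markers for those spans. It is illustrative only: apply_annotations is a hypothetical helper, and it assumes the indices refer to positions in the text value, that start_index is inclusive and end_index is exclusive, and that annotations do not overlap.

```python
from typing import Dict, List, Tuple


def apply_annotations(value: str, annotations: List[Dict]) -> Tuple[str, List[str]]:
    """Replace each annotated span with [n]; return the new text and file IDs."""
    ordered = sorted(annotations, key=lambda ann: ann["start_index"])
    pieces: List[str] = []
    file_ids: List[str] = []
    cursor = 0
    for number, ann in enumerate(ordered, start=1):
        pieces.append(value[cursor:ann["start_index"]])  # text before the span
        pieces.append(f"[{number}]")                     # marker for the span
        detail = ann.get("file_citation") or ann.get("file_path") or {}
        file_ids.append(detail.get("file_id", ""))
        cursor = ann["end_index"]
    pieces.append(value[cursor:])  # text after the last span
    return "".join(pieces), file_ids
```

The returned file IDs line up with the markers, so the first entry belongs to [1], the second to [2], and so on.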

messageDeltaContentImageFileObject

References an image File in the content of a message.
NameTypeDescriptionRequiredDefault
image_fileobjectNo
└─ file_idstringThe File ID of the image in the message content.No
indexintegerThe index of the content part in the message.Yes
typestringAlways image_file.Yes
type Enum: MessageDeltaContentImageFileObjectType
ValueDescription
image_file

messageDeltaContentTextObject

The text content that is part of a message.
NameTypeDescriptionRequiredDefault
indexintegerThe index of the content part in the message.Yes
textobjectNo
└─ annotationsarrayNo
└─ valuestringThe data that makes up the text.No
typestringAlways text.Yes
type Enum: MessageDeltaContentTextObjectType
ValueDescription
text

messageDeltaContentTextAnnotationsFileCitationObject

A citation within the message that points to a specific quote from a specific File associated with the assistant or the message. Generated when the assistant uses the “file_search” tool to search files.
NameTypeDescriptionRequiredDefault
end_indexintegerNo
file_citationobjectNo
└─ file_idstringThe ID of the specific File the citation is from.No
└─ quotestringThe specific quote in the file.No
indexintegerThe index of the annotation in the text content part.Yes
start_indexintegerNo
textstringThe text in the message content that needs to be replaced.No
typestringAlways file_citation.Yes
type Enum: MessageDeltaContentTextAnnotationsFileCitationObjectType
ValueDescription
file_citation

messageDeltaContentTextAnnotationsFilePathObject

A URL for the file that’s generated when the assistant used the code_interpreter tool to generate a file.
NameTypeDescriptionRequiredDefault
end_indexintegerNo
file_pathobjectNo
└─ file_idstringThe ID of the file that was generated.No
indexintegerThe index of the annotation in the text content part.Yes
start_indexintegerNo
textstringThe text in the message content that needs to be replaced.No
typestringAlways file_path.Yes
type Enum: MessageDeltaContentTextAnnotationsFilePathObjectType
ValueDescription
file_path

runStepObject

Represents a step in execution of a run.
NameTypeDescriptionRequiredDefault
assistant_idstringThe ID of the assistant associated with the run step.Yes
cancelled_atintegerThe Unix timestamp (in seconds) for when the run step was cancelled.Yes
completed_atintegerThe Unix timestamp (in seconds) for when the run step completed.Yes
created_atintegerThe Unix timestamp (in seconds) for when the run step was created.Yes
expired_atintegerThe Unix timestamp (in seconds) for when the run step expired. A step is considered expired if the parent run is expired.Yes
failed_atintegerThe Unix timestamp (in seconds) for when the run step failed.Yes
idstringThe identifier of the run step, which can be referenced in API endpoints.Yes
last_errorobjectThe last error associated with this run step. Will be null if there are no errors.Yes
└─ codestringOne of server_error or rate_limit_exceeded.No
└─ messagestringA human-readable description of the error.No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
objectstringThe object type, which is always assistant.run.step.Yes
run_idstringThe ID of the run that this run step is a part of.Yes
statusstringThe status of the run step, which can be in_progress, cancelled, failed, completed, or expired.Yes
step_detailsrunStepDetailsMessageCreationObject or runStepDetailsToolCallsObjectThe details of the run step.Yes
thread_idstringThe ID of the thread that was run.Yes
typestringThe type of run step, which can be either message_creation or tool_calls.Yes
object Enum: RunStepObjectType
ValueDescription
assistant.run.stepThe object type, which is always assistant.run.step
type Enum: RunStepObjectType
ValueDescription
message_creationThe message_creation run step
tool_callsThe tool_calls run step
status Enum: RunStepObjectStatus
ValueDescription
in_progressThe in_progress run status
cancelledThe cancelled run status
failedThe failed run status
completedThe completed run status
expiredThe expired run status
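The runStepObject pairs each non-running status with a timestamp field (cancelled_at, failed_at, completed_at, expired_at). As a rough sketch, the helper below looks up the timestamp that corresponds to the current status. The name finished_at is hypothetical, and the code assumes the service populates the timestamp matching the status.

```python
from typing import Optional

# Status value -> timestamp field on runStepObject.
STATUS_TIMESTAMP_FIELD = {
    "cancelled": "cancelled_at",
    "failed": "failed_at",
    "completed": "completed_at",
    "expired": "expired_at",
}


def finished_at(run_step: dict) -> Optional[int]:
    """Return the Unix timestamp for the step's final status, if any."""
    field = STATUS_TIMESTAMP_FIELD.get(run_step["status"])
    if field is None:
        return None  # in_progress (or an unrecognized status)
    return run_step.get(field)
```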

runStepDeltaObject

Represents a run step delta, that is, any changed fields on a run step during streaming.
NameTypeDescriptionRequiredDefault
deltaobjectThe delta containing the fields that have changed on the run step.Yes
└─ step_detailsrunStepDeltaStepDetailsMessageCreationObject or runStepDeltaStepDetailsToolCallsObjectThe details of the run step.No
idstringThe identifier of the run step, which can be referenced in API endpoints.Yes
objectstringThe object type, which is always thread.run.step.delta.Yes
object Enum: RunStepDeltaObjectType
ValueDescription
thread.run.step.delta

listRunStepsResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes

runStepDetailsMessageCreationObject

Details of the message creation by the run step.
NameTypeDescriptionRequiredDefault
message_creationobjectYes
└─ message_idstringThe ID of the message that was created by this run step.No
typestringAlways message_creation.Yes
type Enum: RunStepDetailsMessageCreationObjectType
ValueDescription
message_creation

runStepDeltaStepDetailsMessageCreationObject

Details of the message creation by the run step.
NameTypeDescriptionRequiredDefault
message_creationobjectNo
└─ message_idstringThe ID of the message that was created by this run step.No
typestringAlways message_creation.Yes
type Enum: RunStepDeltaStepDetailsMessageCreationObjectType
ValueDescription
message_creation

runStepDetailsToolCallsObject

Details of the tool call.
NameTypeDescriptionRequiredDefault
tool_callsarrayAn array of tool calls the run step was involved in. These can be associated with one of three types of tools: code_interpreter, file_search or function.
Yes
typestringAlways tool_calls.Yes
type Enum: RunStepDetailsToolCallsObjectType
ValueDescription
tool_calls

runStepDeltaStepDetailsToolCallsObject

Details of the tool call.
NameTypeDescriptionRequiredDefault
tool_callsarrayAn array of tool calls the run step was involved in. These can be associated with one of three types of tools: code_interpreter, file_search or function.
No
typestringAlways tool_calls.Yes
type Enum: RunStepDeltaStepDetailsToolCallsObjectType
ValueDescription
tool_calls

runStepDetailsToolCallsCodeObject

Details of the Code Interpreter tool call the run step was involved in.
NameTypeDescriptionRequiredDefault
code_interpreterobjectThe Code Interpreter tool call definition.Yes
└─ inputstringThe input to the Code Interpreter tool call.No
└─ outputsarrayThe outputs from the Code Interpreter tool call. Code Interpreter can output one or more items, including text (logs) or images (image). Each of these is represented by a different object type.No
idstringThe ID of the tool call.Yes
typestringThe type of tool call. This is always going to be code_interpreter for this type of tool call.Yes
type Enum: RunStepDetailsToolCallsCodeObjectType
ValueDescription
code_interpreter

runStepDeltaStepDetailsToolCallsCodeObject

Details of the Code Interpreter tool call the run step was involved in.
NameTypeDescriptionRequiredDefault
code_interpreterobjectThe Code Interpreter tool call definition.No
└─ inputstringThe input to the Code Interpreter tool call.No
└─ outputsarrayThe outputs from the Code Interpreter tool call. Code Interpreter can output one or more items, including text (logs) or images (image). Each of these is represented by a different object type.No
idstringThe ID of the tool call.No
indexintegerThe index of the tool call in the tool calls array.Yes
typestringThe type of tool call. This is always going to be code_interpreter for this type of tool call.Yes
type Enum: RunStepDeltaStepDetailsToolCallsCodeObjectType
ValueDescription
code_interpreter

runStepDetailsToolCallsCodeOutputLogsObject

Text output from the Code Interpreter tool call as part of a run step.
NameTypeDescriptionRequiredDefault
logsstringThe text output from the Code Interpreter tool call.Yes
typestringAlways logs.Yes
type Enum: RunStepDetailsToolCallsCodeOutputLogsObjectType
ValueDescription
logs

runStepDeltaStepDetailsToolCallsCodeOutputLogsObject

Text output from the Code Interpreter tool call as part of a run step.
NameTypeDescriptionRequiredDefault
indexintegerThe index of the output in the outputs array.Yes
logsstringThe text output from the Code Interpreter tool call.No
typestringAlways logs.Yes
type Enum: RunStepDeltaStepDetailsToolCallsCodeOutputLogsObjectType
ValueDescription
logs

runStepDetailsToolCallsCodeOutputImageObject

NameTypeDescriptionRequiredDefault
imageobjectYes
└─ file_idstringThe File ID of the image.No
typestringAlways image.Yes
type Enum: RunStepDetailsToolCallsCodeOutputImageObjectType
ValueDescription
image

runStepDeltaStepDetailsToolCallsCodeOutputImageObject

NameTypeDescriptionRequiredDefault
imageobjectNo
└─ file_idstringThe file ID of the image.No
indexintegerThe index of the output in the outputs array.Yes
typestringAlways image.Yes
type Enum: RunStepDeltaStepDetailsToolCallsCodeOutputImageObject
ValueDescription
image

runStepDetailsToolCallsFileSearchObject

NameTypeDescriptionRequiredDefault
file_searchobjectFor now, this is always going to be an empty object.Yes
└─ resultsarrayThe results of the file search.No
idstringThe ID of the tool call object.Yes
typestringThe type of tool call. This is always going to be file_search for this type of tool call.Yes
type Enum: RunStepDetailsToolCallsFileSearchObjectType
ValueDescription
file_search

runStepDetailsToolCallsFileSearchResultObject

A result instance of the file search.
NameTypeDescriptionRequiredDefault
contentarrayThe content of the result that was found. The content is only included if requested via the include query parameter.No
file_idstringThe ID of the file that the result was found in.Yes
file_namestringThe name of the file that the result was found in.Yes
scorenumberThe score of the result. All values must be a floating point number between 0 and 1.Yes

runStepDeltaStepDetailsToolCallsFileSearchObject

NameTypeDescriptionRequiredDefault
file_searchobjectFor now, this is always going to be an empty object.Yes
idstringThe ID of the tool call object.No
indexintegerThe index of the tool call in the tool calls array.Yes
typestringThe type of tool call. This is always going to be file_search for this type of tool call.Yes
type Enum: RunStepDeltaStepDetailsToolCallsFileSearchObjectType
ValueDescription
file_search

runStepDetailsToolCallsFunctionObject

NameTypeDescriptionRequiredDefault
functionobjectThe definition of the function that was called.Yes
└─ argumentsstringThe arguments passed to the function.No
└─ namestringThe name of the function.No
└─ outputstringThe output of the function. This will be null if the outputs have not been submitted yet.No
idstringThe ID of the tool call object.Yes
typestringThe type of tool call. This is always going to be function for this type of tool call.Yes
type Enum: RunStepDetailsToolCallsFunctionObjectType
ValueDescription
function

runStepDeltaStepDetailsToolCallsFunctionObject

NameTypeDescriptionRequiredDefault
functionobjectThe definition of the function that was called.No
└─ argumentsstringThe arguments passed to the function.No
└─ namestringThe name of the function.No
└─ outputstringThe output of the function. This will be null if the outputs have not been submitted yet.No
idstringThe ID of the tool call object.No
indexintegerThe index of the tool call in the tool calls array.Yes
typestringThe type of tool call. This is always going to be function for this type of tool call.Yes
type Enum: RunStepDetailsToolCallsFunctionObjectType
ValueDescription
function
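In the delta variants, each tool call carries an index into the tool calls array, and most other fields are optional. A client that wants the complete call has to combine the deltas itself. The sketch below groups deltas by index and assumes that string fields such as name and arguments arrive as fragments to be concatenated in arrival order; that accumulation rule is an assumption rather than something this reference states, and merge_function_call_deltas is a hypothetical helper.

```python
from typing import Dict, List


def merge_function_call_deltas(deltas: List[dict]) -> List[dict]:
    """Combine function tool call deltas into whole calls, ordered by index."""
    merged: Dict[int, dict] = {}
    for delta in deltas:
        call = merged.setdefault(
            delta["index"],
            {"id": None, "type": "function",
             "function": {"name": "", "arguments": "", "output": None}},
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fragment = delta.get("function") or {}
        # Assumption: name and arguments are streamed as string fragments.
        call["function"]["name"] += fragment.get("name") or ""
        call["function"]["arguments"] += fragment.get("arguments") or ""
        if fragment.get("output") is not None:
            call["function"]["output"] = fragment["output"]
    return [merged[index] for index in sorted(merged)]
```

Once the stream ends, the accumulated arguments string can be parsed as a whole; parsing individual fragments would fail because each one is only part of the value.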

vectorStoreExpirationAfter

The expiration policy for a vector store.
NameTypeDescriptionRequiredDefault
anchorstringAnchor timestamp after which the expiration policy applies. Supported anchors: last_active_at.Yes
daysintegerThe number of days after the anchor time that the vector store will expire.Yes
anchor Enum: VectorStoreExpirationAfterAnchor
ValueDescription
last_active_atThe anchor timestamp after which the expiration policy applies.
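Since the policy is expressed as a number of days after the last_active_at anchor, the expected expiry time can be estimated on the client. This is only a sketch: it assumes a day means exactly 86,400 seconds and that the service applies no rounding, so the expires_at value returned by the service should be treated as authoritative.

```python
SECONDS_PER_DAY = 86_400


def estimate_expires_at(last_active_at: int, days: int) -> int:
    """Estimate the expiry Unix timestamp from the anchor and the day count."""
    return last_active_at + days * SECONDS_PER_DAY


# A store last active at 1700000000 with a 7-day policy.
estimated = estimate_expires_at(1_700_000_000, 7)
```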

vectorStoreObject

A vector store is a collection of processed files that can be used by the file_search tool.
NameTypeDescriptionRequiredDefault
created_atintegerThe Unix timestamp (in seconds) for when the vector store was created.Yes
expires_aftervectorStoreExpirationAfterThe expiration policy for a vector store.No
expires_atintegerThe Unix timestamp (in seconds) for when the vector store will expire.No
file_countsobjectYes
└─ cancelledintegerThe number of files that were cancelled.No
└─ completedintegerThe number of files that have been successfully processed.No
└─ failedintegerThe number of files that have failed to process.No
└─ in_progressintegerThe number of files that are currently being processed.No
└─ totalintegerThe total number of files.No
idstringThe identifier, which can be referenced in API endpoints.Yes
last_active_atintegerThe Unix timestamp (in seconds) for when the vector store was last active.Yes
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
Yes
namestringThe name of the vector store.Yes
objectenumThe object type, which is always vector_store.
Possible values: vector_store
Yes
statusstringThe status of the vector store, which can be either expired, in_progress, or completed. A status of completed indicates that the vector store is ready for use.Yes
usage_bytesintegerThe total number of bytes used by the files in the vector store.Yes
status Enum: VectorStoreObjectStatus
ValueDescription
expired
in_progress
completed

createVectorStoreRequest

NameTypeDescriptionRequiredDefault
chunking_strategyautoChunkingStrategyRequestParam or staticChunkingStrategyRequestParamThe chunking strategy used to chunk the file(s). If not set, will use the auto strategy. Only applicable if file_ids is non-empty.No
expires_aftervectorStoreExpirationAfterThe expiration policy for a vector store.No
file_idsarrayA list of file IDs that the vector store should use. Useful for tools like file_search that can access files.No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
namestringThe name of the vector store.No

updateVectorStoreRequest

NameTypeDescriptionRequiredDefault
expires_aftervectorStoreExpirationAfterThe expiration policy for a vector store.No
metadataobjectSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
No
namestringThe name of the vector store.No

listVectorStoresResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes

deleteVectorStoreResponse

NameTypeDescriptionRequiredDefault
deletedbooleanYes
idstringYes
objectstringYes
object Enum: DeleteVectorStoreResponseObject
ValueDescription
vector_store.deletedThe delete vector store response object state

vectorStoreFileObject

A list of files attached to a vector store.
NameTypeDescriptionRequiredDefault
chunking_strategyautoChunkingStrategyRequestParam or staticChunkingStrategyRequestParamThe chunking strategy used to chunk the file(s). If not set, will use the auto strategy. Only applicable if file_ids is non-empty.No
created_atintegerThe Unix timestamp (in seconds) for when the vector store file was created.Yes
idstringThe identifier, which can be referenced in API endpoints.Yes
last_errorobjectThe last error associated with this vector store file. Will be null if there are no errors.Yes
└─ codestringOne of server_error, invalid_file, or unsupported_file.No
└─ messagestringA human-readable description of the error.No
objectstringThe object type, which is always vector_store.file.Yes
statusstringThe status of the vector store file, which can be either in_progress, completed, cancelled, or failed. The status completed indicates that the vector store file is ready for use.Yes
usage_bytesintegerThe total vector store usage in bytes. Note that this may be different from the original file size.Yes
vector_store_idstringThe ID of the vector store that the file is attached to.Yes
object Enum: VectorStoreFileObjectType
ValueDescription
vector_store.file
status Enum: VectorStoreFileObjectStatus
ValueDescription
in_progress
completed
cancelled
failed

otherChunkingStrategyResponseParam

This is returned when the chunking strategy is unknown. Typically, this is because the file was indexed before the chunking_strategy concept was introduced in the API.
NameTypeDescriptionRequiredDefault
typestringAlways other.Yes
type Enum: OtherChunkingStrategyResponseParamType
ValueDescription
other

staticChunkingStrategyResponseParam

NameTypeDescriptionRequiredDefault
staticstaticChunkingStrategyYes
typestringAlways static.Yes
type Enum: StaticChunkingStrategyResponseParamType
ValueDescription
static

staticChunkingStrategy

NameTypeDescriptionRequiredDefault
chunk_overlap_tokensintegerThe number of tokens that overlap between chunks. The default value is 400.

Note that the overlap must not exceed half of max_chunk_size_tokens.
Yes
max_chunk_size_tokensintegerThe maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096.Yes
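The two constraints above (max_chunk_size_tokens between 100 and 4096, and chunk_overlap_tokens no more than half of it) are easy to check before building a request. The helper below is a hypothetical sketch that returns a staticChunkingStrategyRequestParam-shaped dictionary. The rejection of negative overlaps is an added assumption, not something this reference states.

```python
def build_static_chunking(
    max_chunk_size_tokens: int = 800, chunk_overlap_tokens: int = 400
) -> dict:
    """Validate the documented limits and return a static chunking strategy."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0:  # assumption: negative overlap is invalid
        raise ValueError("chunk_overlap_tokens must not be negative")
    if chunk_overlap_tokens * 2 > max_chunk_size_tokens:
        raise ValueError(
            "chunk_overlap_tokens must not exceed half of max_chunk_size_tokens"
        )
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


default_strategy = build_static_chunking()
```

The defaults of 800 and 400 sit exactly on the half-size boundary, so raising the overlap without also raising the chunk size fails validation.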

autoChunkingStrategyRequestParam

The default strategy. This strategy currently uses a max_chunk_size_tokens of 800 and chunk_overlap_tokens of 400.
NameTypeDescriptionRequiredDefault
typeenumAlways auto.
Possible values: auto
Yes

staticChunkingStrategyRequestParam

NameTypeDescriptionRequiredDefault
staticstaticChunkingStrategyYes
typeenumAlways static.
Possible values: static
Yes

chunkingStrategyRequestParam

The chunking strategy used to chunk the file(s). If not set, will use the auto strategy. This component can be one of the following: autoChunkingStrategyRequestParam or staticChunkingStrategyRequestParam.

createVectorStoreFileRequest

NameTypeDescriptionRequiredDefault
chunking_strategychunkingStrategyRequestParamThe chunking strategy used to chunk the file(s). If not set, will use the auto strategy.No
file_idstringA File ID that the vector store should use. Useful for tools like file_search that can access files.Yes

listVectorStoreFilesResponse

NameTypeDescriptionRequiredDefault
dataarrayYes
first_idstringYes
has_morebooleanYes
last_idstringYes
objectstringYes

deleteVectorStoreFileResponse

NameTypeDescriptionRequiredDefault
deletedbooleanYes
idstringYes
objectstringYes
object Enum: DeleteVectorStoreFileResponseObject
ValueDescription
vector_store.file.deleted

vectorStoreFileBatchObject

A batch of files attached to a vector store.
NameTypeDescriptionRequiredDefault
created_atintegerThe Unix timestamp (in seconds) for when the vector store files batch was created.Yes
file_countsobjectYes
└─ cancelledintegerThe number of files that were cancelled.No
└─ completedintegerThe number of files that have been processed.No
└─ failedintegerThe number of files that have failed to process.No
└─ in_progressintegerThe number of files that are currently being processed.No
└─ totalintegerThe total number of files.No
idstringThe identifier, which can be referenced in API endpoints.Yes
objectstringThe object type, which is always vector_store.file_batch.Yes
statusstringThe status of the vector store files batch, which can be either in_progress, completed, cancelled or failed.Yes
vector_store_idstringThe ID of the vector store that the File is attached to.Yes
object Enum: VectorStoreFileBatchObjectType
ValueDescription
vector_store.files_batch
status Enum: VectorStoreFileBatchObjectStatus
ValueDescription
in_progress
completed
cancelled
failed
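The file_counts object can be turned into a simple progress figure while a batch is processing. The sketch below is illustrative: batch_progress is a hypothetical helper, and it assumes that total equals the sum of the other four counters and that completed, failed, and cancelled files all count as finished.

```python
def batch_progress(file_counts: dict) -> float:
    """Return the fraction of files that are no longer in progress (0.0 to 1.0)."""
    total = file_counts.get("total", 0)
    if total == 0:
        return 0.0  # nothing to process yet
    finished = (
        file_counts.get("completed", 0)
        + file_counts.get("failed", 0)
        + file_counts.get("cancelled", 0)
    )
    return finished / total


progress = batch_progress(
    {"in_progress": 2, "completed": 5, "failed": 1, "cancelled": 0, "total": 8}
)
```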

createVectorStoreFileBatchRequest

NameTypeDescriptionRequiredDefault
chunking_strategychunkingStrategyRequestParamThe chunking strategy used to chunk the file(s). If not set, will use the auto strategy.No
file_idsarrayA list of File IDs that the vector store should use. Useful for tools like file_search that can access files.Yes

assistantStreamEvent

Represents an event emitted when streaming a Run. Each event in a server-sent events stream has an event and data property:
event: thread.created
data: {"id": "thread_123", "object": "thread", ...}
We emit events whenever a new object is created, transitions to a new state, or is being streamed in parts (deltas). For example, we emit thread.run.created when a new run is created, thread.run.completed when a run completes, and so on. When an Assistant chooses to create a message during a run, we emit a thread.message.created event, a thread.message.in_progress event, many thread.message.delta events, and finally a thread.message.completed event. We may add additional events over time, so we recommend handling unknown events gracefully in your code. This component can be one of the following:

threadStreamEvent

This component can be one of the following:

thread.created

Occurs when a new thread is created.
NameTypeDescriptionRequiredDefault
datathreadObjectRepresents a thread that contains messages.Yes
eventstringYes
Data: threadObject Event Enum: ThreadStreamEventEnum
ValueDescription
thread.createdThe thread created event
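The event and data framing shown for assistantStreamEvent can be read with a small parser. The following is a minimal sketch rather than a complete server-sent events client: it assumes events are separated by blank lines, it treats the data payload as JSON when it parses and as a raw string otherwise, and it skips unknown event names instead of failing, in line with the recommendation to handle unknown events gracefully. The names parse_sse and dispatch are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def _decode(payload: str) -> object:
    """Return parsed JSON when possible, otherwise the raw string."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return payload


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], object]]:
    """Yield (event, data) pairs from an iterable of text lines."""
    event: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event is not None or data_lines):
            # A blank line ends the current event.
            yield event, _decode("\n".join(data_lines))
            event, data_lines = None, []
    if event is not None or data_lines:
        # Flush a final event that was not followed by a blank line.
        yield event, _decode("\n".join(data_lines))


def dispatch(
    event: Optional[str], data: object, handlers: Dict[str, Callable[[object], None]]
) -> bool:
    """Call the handler registered for event; return False for unknown events."""
    handler = handlers.get(event) if event is not None else None
    if handler is None:
        return False  # unknown event: skip it instead of raising
    handler(data)
    return True
```

A caller would register handlers only for the events it cares about, for example thread.created or thread.run.completed, and let dispatch ignore the rest.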

runStreamEvent

This component can be one of the following:

thread.run.created

Occurs when a new run is created.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventCreated
ValueDescription
thread.run.created

thread.run.queued

Occurs when a run moves to a queued status.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventQueued
ValueDescription
thread.run.queued

thread.run.in_progress

Occurs when a run moves to an in_progress status.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventInProgress
ValueDescription
thread.run.in_progress

thread.run.requires_action

Occurs when a run moves to a requires_action status.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventRequiresAction
ValueDescription
thread.run.requires_action

thread.run.completed

Occurs when a run is completed.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventCompleted
ValueDescription
thread.run.completed

thread.run.failed

Occurs when a run fails.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventFailed
ValueDescription
thread.run.failed

thread.run.cancelling

Occurs when a run moves to a cancelling status.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventCancelling
ValueDescription
thread.run.cancelling

thread.run.cancelled

Occurs when a run is cancelled.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventCancelled
ValueDescription
thread.run.cancelled

thread.run.expired

Occurs when a run expires.
NameTypeDescriptionRequiredDefault
datarunObjectRepresents an execution run on a thread.Yes
eventstringYes
Data: runObject Event Enum: RunStreamEventExpired
ValueDescription
thread.run.expired

runStepStreamEvent

This component can be one of the following:

thread.run.step.created

Occurs when a run step is created.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventCreated
ValueDescription
thread.run.step.created

thread.run.step.in_progress

Occurs when a run step moves to an in_progress state.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventInProgress
ValueDescription
thread.run.step.in_progress

thread.run.step.delta

Occurs when parts of a run step are being streamed.
NameTypeDescriptionRequiredDefault
datarunStepDeltaObjectRepresents a run step delta i.e. any changed fields on a run step during streaming.
Yes
eventstringYes
Data: runStepDeltaObject Event Enum: RunStepStreamEventDelta
ValueDescription
thread.run.step.delta

thread.run.step.completed

Occurs when a run step is completed.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventCompleted
ValueDescription
thread.run.step.completed

thread.run.step.failed

Occurs when a run step fails.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventFailed
ValueDescription
thread.run.step.failed

thread.run.step.cancelled

Occurs when a run step is cancelled.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventCancelled
ValueDescription
thread.run.step.cancelled

thread.run.step.expired

Occurs when a run step expires.
NameTypeDescriptionRequiredDefault
datarunStepObjectRepresents a step in execution of a run.
Yes
eventstringYes
Data: runStepObject Event Enum: RunStepStreamEventExpired
ValueDescription
thread.run.step.expired

messageStreamEvent

This component can be one of the following:

thread.message.created

Occurs when a message is created.
NameTypeDescriptionRequiredDefault
datamessageObjectRepresents a message within a thread.Yes
eventstringYes
Data: messageObject Event Enum: MessageStreamEventCreated
ValueDescription
thread.message.created

thread.message.in_progress

Occurs when a message moves to an in_progress state.
NameTypeDescriptionRequiredDefault
datamessageObjectRepresents a message within a thread.Yes
eventstringYes
Data: messageObject Event Enum: MessageStreamEventInProgress
ValueDescription
thread.message.in_progress

thread.message.delta

Occurs when parts of a message are being streamed.
NameTypeDescriptionRequiredDefault
datamessageDeltaObjectRepresents a message delta i.e. any changed fields on a message during streaming.
Yes
eventstringYes
Data: messageDeltaObject Event Enum: MessageStreamEventDelta
ValueDescription
thread.message.delta

thread.message.completed

Occurs when a message is completed.
NameTypeDescriptionRequiredDefault
datamessageObjectRepresents a message within a thread.Yes
eventstringYes
Data: messageObject Event Enum: MessageStreamEventCompleted
ValueDescription
thread.message.completed

thread.message.incomplete

Occurs when a message ends before it is completed.
NameTypeDescriptionRequiredDefault
datamessageObjectRepresents a message within a thread.Yes
eventstringYes
Data: messageObject Event Enum: MessageStreamEventIncomplete
ValueDescription
thread.message.incomplete
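
Each stream event pairs an event name with a JSON data payload. The following is a minimal sketch of parsing such events and grouping them by family, assuming standard server-sent-event framing (`event:` and `data:` lines terminated by a blank line). The function names, the family labels, and the sample payloads are hypothetical and are not part of the API.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from raw server-sent-event lines."""
    event, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            # A blank line terminates one event.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []


def event_family(name: str) -> str:
    """Map an event name to a coarse family label (labels are illustrative)."""
    # Check the run-step prefix first: it also matches the shorter run prefix.
    if name.startswith("thread.run.step."):
        return "run_step"
    if name.startswith("thread.run."):
        return "run"
    if name.startswith("thread.message."):
        return "message"
    if name == "thread.created":
        return "thread"
    return "unknown"


# Hypothetical sample stream; real payloads carry full runObject / messageObject bodies.
sample = [
    "event: thread.run.created",
    'data: {"id": "run_123", "status": "queued"}',
    "",
    "event: thread.message.delta",
    'data: {"id": "msg_123", "delta": {}}',
    "",
]

events = list(parse_sse(sample))
families = [event_family(name) for name, _ in events]
```

Testing the longer `thread.run.step.` prefix before `thread.run.` keeps run step events from being classified as run events.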

Annotation

This component can be one of the following:

Click

A click action.
NameTypeDescriptionRequiredDefault
buttonenumIndicates which mouse button was pressed during the click. One of left, right, wheel, back, or forward.

Possible values: left, right, wheel, back, forward
Yes
typeenumSpecifies the event type. For a click action, this property is always set to click.

Possible values: click
Yes
xintegerThe x-coordinate where the click occurred.
Yes
yintegerThe y-coordinate where the click occurred.
Yes

CodeInterpreterFileOutput

The output of a code interpreter tool call that is a file.
NameTypeDescriptionRequiredDefault
filesarrayYes
typeenumThe type of the code interpreter file output. Always files.

Possible values: files
Yes

CodeInterpreterTextOutput

The output of a code interpreter tool call that is text.
NameTypeDescriptionRequiredDefault
logsstringThe logs of the code interpreter tool call.
Yes
typeenumThe type of the code interpreter text output. Always logs.

Possible values: logs
Yes

CodeInterpreterTool

A tool that runs code.
NameTypeDescriptionRequiredDefault
file_idsarrayThe IDs of the files to run the code on.
Yes
typeenumThe type of the code interpreter tool. Always code_interpreter.

Possible values: code_interpreter
Yes

CodeInterpreterToolCall

A tool call to run code.
NameTypeDescriptionRequiredDefault
codestringThe code to run.
Yes
idstringThe unique ID of the code interpreter tool call.
Yes
resultsarrayThe results of the code interpreter tool call.
Yes
statusenumThe status of the code interpreter tool call.

Possible values: in_progress, interpreting, completed
Yes
typeenumThe type of the code interpreter tool call. Always code_interpreter_call.

Possible values: code_interpreter_call
Yes

CodeInterpreterToolOutput

This component can be one of the following:

ComparisonFilter

A filter used to compare a specified attribute key to a given value using a defined comparison operation.
NameTypeDescriptionRequiredDefault
keystringThe key to compare against the value.Yes
typeenumSpecifies the comparison operator: eq, ne, gt, gte, lt, lte.
- eq: equals
- ne: not equal
- gt: greater than
- gte: greater than or equal
- lt: less than
- lte: less than or equal

Possible values: eq, ne, gt, gte, lt, lte
Yes
valuestring or number or booleanThe value to compare against the attribute key; supports string, number, or boolean types.Yes

CompoundFilter

Combine multiple filters using and or or.
NameTypeDescriptionRequiredDefault
filtersarrayArray of filters to combine. Items can be ComparisonFilter or CompoundFilter.Yes
typeenumType of operation: and or or.
Possible values: and, or
Yes
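
To illustrate how ComparisonFilter and CompoundFilter compose, here is a small local evaluator. It is a sketch only: the service applies filters on its side, and the handling of a missing attribute key (treated here as a non-match) is an assumption. The attribute names `region` and `year` are hypothetical.

```python
import operator
from typing import Any, Dict

# Comparison operators named in the ComparisonFilter type enum.
_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(flt: Dict[str, Any], attributes: Dict[str, Any]) -> bool:
    """Return True if the attributes satisfy the filter."""
    kind = flt["type"]
    if kind in _OPS:
        if flt["key"] not in attributes:
            return False  # assumption: a missing key never matches
        return _OPS[kind](attributes[flt["key"]], flt["value"])
    if kind == "and":
        return all(matches(f, attributes) for f in flt["filters"])
    if kind == "or":
        return any(matches(f, attributes) for f in flt["filters"])
    raise ValueError(f"unsupported filter type: {kind}")


# A CompoundFilter that nests two ComparisonFilters.
recent_eu = {
    "type": "and",
    "filters": [
        {"type": "eq", "key": "region", "value": "eu"},
        {"type": "gte", "key": "year", "value": 2023},
    ],
}
```

Because CompoundFilter items can themselves be CompoundFilter objects, the evaluator recurses on the `filters` array.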

ComputerAction

This component can be one of the following:

ComputerScreenshotImage

A computer screenshot image used with the computer use tool.
NameTypeDescriptionRequiredDefault
file_idstringThe identifier of an uploaded file that contains the screenshot.No
image_urlstringThe URL of the screenshot image.No
typeenumSpecifies the event type. For a computer screenshot, this property is always set to computer_screenshot.

Possible values: computer_screenshot
Yes

ComputerTool

A tool that controls a virtual computer.
NameTypeDescriptionRequiredDefault
display_heightnumberThe height of the computer display.
Yes
display_widthnumberThe width of the computer display.
Yes
environmentenumThe type of computer environment to control.

Possible values: mac, windows, ubuntu, browser
Yes
typeenumThe type of the computer use tool. Always computer_use_preview.

Possible values: computer-use-preview
Yes

ComputerToolCall

A tool call to a computer use tool.
NameTypeDescriptionRequiredDefault
actionComputerActionYes
call_idstringAn identifier used when responding to the tool call with output.
Yes
idstringThe unique ID of the computer call.Yes
pending_safety_checksarrayThe pending safety checks for the computer call.
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
Yes
typeenumThe type of the computer call. Always computer_call.
Possible values: computer_call
Yes

ComputerToolCallOutput

The output of a computer tool call.
NameTypeDescriptionRequiredDefault
acknowledged_safety_checksarrayThe safety checks reported by the API that have been acknowledged by the developer.
No
call_idstringThe ID of the computer tool call that produced the output.
Yes
idstringThe ID of the computer tool call output.
No
outputComputerScreenshotImageA computer screenshot image used with the computer use tool.
Yes
statusenumThe status of the message input. One of in_progress, completed, or incomplete. Populated when input items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the computer tool call output. Always computer_call_output.

Possible values: computer_call_output
Yes
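
A sketch of building a ComputerToolCallOutput item in reply to a ComputerToolCall, using only the fields listed above. The helper name, the sample IDs, and the screenshot URL are hypothetical. Safety checks are passed back only when the caller explicitly lists their IDs, on the assumption that acknowledgment should be a deliberate decision rather than automatic.

```python
from typing import Iterable


def build_computer_call_output(call: dict, screenshot_url: str,
                               acknowledged_ids: Iterable[str] = ()) -> dict:
    """Build a computer_call_output item for the given computer_call item."""
    wanted = set(acknowledged_ids)
    acknowledged = [
        check for check in call.get("pending_safety_checks", [])
        if check["id"] in wanted
    ]
    item = {
        "type": "computer_call_output",
        "call_id": call["call_id"],
        "output": {"type": "computer_screenshot", "image_url": screenshot_url},
    }
    if acknowledged:
        item["acknowledged_safety_checks"] = acknowledged
    return item


# Hypothetical computer_call item as it might appear in a response.
call = {
    "type": "computer_call",
    "id": "cu_1",
    "call_id": "call_1",
    "action": {"type": "click", "button": "left", "x": 10, "y": 20},
    "pending_safety_checks": [
        {"id": "sc_1", "code": "example_code", "message": "Example check."}
    ],
    "status": "completed",
}

reply = build_computer_call_output(call, "https://example.invalid/shot.png", ["sc_1"])
```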

ComputerToolCallOutputResource

NameTypeDescriptionRequiredDefault
acknowledged_safety_checksarrayThe safety checks reported by the API that have been acknowledged by the developer.
No
call_idstringThe ID of the computer tool call that produced the output.
Yes
idstringThe unique ID of the computer call tool output.
Yes
outputComputerScreenshotImageA computer screenshot image used with the computer use tool.
Yes
statusenumThe status of the message input. One of in_progress, completed, or incomplete. Populated when input items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the computer tool call output. Always computer_call_output.

Possible values: computer_call_output
Yes

ComputerToolCallSafetyCheck

A pending safety check for the computer call.
NameTypeDescriptionRequiredDefault
codestringThe type of the pending safety check.Yes
idstringThe ID of the pending safety check.Yes
messagestringDetails about the pending safety check.Yes

Content

Multi-modal input and output contents. This component can be one of the following:

Coordinate

An x/y coordinate pair, e.g. { x: 100, y: 200 }.
NameTypeDescriptionRequiredDefault
xintegerThe x-coordinate.
Yes
yintegerThe y-coordinate.
Yes

CreateModelResponseProperties

NameTypeDescriptionRequiredDefault
metadataMetadataSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
No
modelstringModel used to generate the responses.No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
No1
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
userstringA unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
No
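
The top_p description can be made concrete with a toy example. The sketch below applies the textbook definition of nucleus sampling to a hand-written distribution: keep the most probable tokens until their cumulative mass reaches top_p, then renormalize. It illustrates the concept only; how the service implements sampling is not documented here, and the token probabilities are invented.

```python
from typing import Dict


def nucleus(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of top tokens whose mass reaches top_p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Invented next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

narrow = nucleus(probs, 0.1)  # only the single most likely token survives
wider = nucleus(probs, 0.7)   # "a" and "b" are needed to reach 70% of the mass
```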

createResponse

NameTypeDescriptionRequiredDefault
includearraySpecify additional output data to include in the model response. See includable for the supported values.No
inputstring or arrayText, image, or file inputs to the model, used to generate a response.Yes
instructionsstringInserts a system (or developer) message as the first item in the model’s context.

When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
No
max_output_tokensintegerAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
No
parallel_tool_callsbooleanWhether to allow the model to run tool calls in parallel.
NoTrue
previous_response_idstringThe unique ID of the previous response to the model. Use this to create multi-turn conversations.No
reasoningReasoningConfiguration options for reasoning models.No
storebooleanWhether to store the generated model response for later retrieval via API.
NoTrue
streambooleanIf set to true, the model response data will be streamed to the client as it is generated using server-sent events.NoFalse
textobjectConfiguration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
- text inputs and outputs
- Structured Outputs
No
└─ formatTextResponseFormatConfigurationAn object specifying the format that the model must output.

Configuring { "type": "json_schema" } enables Structured Outputs, which ensures the model matches your supplied JSON schema. The default format is { "type": "text" } with no additional options.

Not recommended for gpt-4o and newer models:

Setting to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
No
tool_choiceToolChoiceOptions or ToolChoiceTypes or ToolChoiceFunctionHow the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.
No
toolsarrayAn array of tools the model may call while generating a response. You can specify which tool to use by setting the tool_choice parameter.

The two categories of tools you can provide the model are:

- Built-in tools
No
truncationenumThe truncation strategy to use for the model response.
- auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.

Possible values: auto, disabled
No
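
The sketch below assembles a createResponse request body from the properties above and serializes it to JSON. It does not send anything. The helper name, the deployment name `my-deployment`, and the response ID `resp_123` are hypothetical; the `model` field comes from CreateModelResponseProperties.

```python
import json
from typing import Optional, Union


def build_create_response_body(model: str,
                               user_input: Union[str, list],
                               instructions: Optional[str] = None,
                               previous_response_id: Optional[str] = None,
                               max_output_tokens: Optional[int] = None) -> dict:
    """Assemble a request body; optional fields are omitted when not set."""
    body = {
        "model": model,
        "input": user_input,       # the only required property in the table
        "store": True,             # documented default
        "stream": False,           # documented default
        "truncation": "disabled",  # documented default
    }
    if instructions is not None:
        body["instructions"] = instructions
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body


first = build_create_response_body(
    "my-deployment", "Summarize the release notes.",
    instructions="Answer in one sentence.",
)

# Instructions are not carried over from the previous response,
# so a follow-up turn has to supply them again (or supply new ones).
follow_up = build_create_response_body(
    "my-deployment", "Now translate the summary.",
    instructions="Answer in French.",
    previous_response_id="resp_123",
)

payload = json.dumps(follow_up)
```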

DoubleClick

A double click action.
NameTypeDescriptionRequiredDefault
typeenumSpecifies the event type. For a double click action, this property is always set to double_click.

Possible values: double_click
Yes
xintegerThe x-coordinate where the double click occurred.
Yes
yintegerThe y-coordinate where the double click occurred.
Yes

Drag

A drag action.
NameTypeDescriptionRequiredDefault
patharrayAn array of coordinates representing the path of the drag action. Coordinates appear as an array of objects, e.g. [{ x: 100, y: 200 }, { x: 200, y: 300 }].Yes
typeenumSpecifies the event type. For a drag action, this property is always set to drag.

Possible values: drag
Yes

EasyInputMessage

A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
NameTypeDescriptionRequiredDefault
contentstring or InputMessageContentListText, image, or audio input to the model, used to generate a response.
Can also contain previous assistant responses.
Yes
roleenumThe role of the message input. One of user, assistant, system, or developer.

Possible values: user, assistant, system, developer
Yes
typeenumThe type of the message input. Always message.

Possible values: message
No
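
A short sketch of building a multi-turn input list from EasyInputMessage items. The helper name and the message texts are hypothetical; the role values are the ones listed in the table above.

```python
ROLES = ("user", "assistant", "system", "developer")


def easy_message(role: str, content: str) -> dict:
    """Build an EasyInputMessage, rejecting roles outside the documented enum."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {role!r}")
    return {"type": "message", "role": role, "content": content}


conversation = [
    # Developer instructions take precedence over user instructions.
    easy_message("developer", "Reply in formal English."),
    easy_message("user", "What is an API version?"),
    # Assistant messages are presumed to come from earlier model turns.
    easy_message("assistant", "It identifies a specific release of the API."),
    easy_message("user", "How do I select one?"),
]
```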

FileCitation

A citation to a file.
NameTypeDescriptionRequiredDefault
file_idstringThe ID of the file.
Yes
indexintegerThe index of the file in the list of files.
Yes
typeenumThe type of the file citation. Always file_citation.

Possible values: file_citation
Yes

FilePath

A path to a file.
NameTypeDescriptionRequiredDefault
file_idstringThe ID of the file.
Yes
indexintegerThe index of the file in the list of files.
Yes
typeenumThe type of the file path. Always file_path.

Possible values: file_path
Yes

FileSearchRanker

The ranker to use for the file search. If not specified, the auto ranker is used.
PropertyValue
DescriptionThe ranker to use for the file search. If not specified, the auto ranker is used.
Typestring
Valuesauto
default_2024_08_21

FileSearchTool

A tool that searches for relevant content from uploaded files.
NameTypeDescriptionRequiredDefault
filtersComparisonFilter or CompoundFilterA filter to apply based on file attributes.No
max_num_resultsintegerThe maximum number of results to return. This number should be between 1 and 50 inclusive.
No
ranking_optionsobjectRanking options for search.No
└─ rankerenumThe ranker to use for the file search.
Possible values: auto, default-2024-11-15
No
└─ score_thresholdnumberThe score threshold for the file search, a number between 0 and 1.
Numbers closer to 1 will attempt to return only the most relevant results, but may return fewer results.
No0
typeenumThe type of the file search tool. Always file_search.

Possible values: file_search
Yes
vector_store_idsarrayThe IDs of the vector stores to search.
Yes
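
The documented ranges for FileSearchTool can be checked on the client before a request is sent. The following sketch builds the tool object and validates max_num_results (1 to 50 inclusive) and score_threshold (0 to 1). The helper name and the vector store ID `vs_123` are hypothetical.

```python
from typing import List, Optional


def build_file_search_tool(vector_store_ids: List[str],
                           max_num_results: Optional[int] = None,
                           score_threshold: Optional[float] = None,
                           filters: Optional[dict] = None) -> dict:
    """Build a file_search tool object, enforcing the documented ranges."""
    if not vector_store_ids:
        raise ValueError("vector_store_ids is required")
    tool = {"type": "file_search", "vector_store_ids": list(vector_store_ids)}
    if max_num_results is not None:
        if not 1 <= max_num_results <= 50:
            raise ValueError("max_num_results must be between 1 and 50 inclusive")
        tool["max_num_results"] = max_num_results
    if score_threshold is not None:
        if not 0 <= score_threshold <= 1:
            raise ValueError("score_threshold must be between 0 and 1")
        tool["ranking_options"] = {"score_threshold": score_threshold}
    if filters is not None:
        tool["filters"] = filters  # a ComparisonFilter or CompoundFilter object
    return tool


tool = build_file_search_tool(["vs_123"], max_num_results=5, score_threshold=0.5)
```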

FileSearchToolCall

The results of a file search tool call.
NameTypeDescriptionRequiredDefault
idstringThe unique ID of the file search tool call.
Yes
queriesarrayThe queries used to search for files.
Yes
resultsarrayThe results of the file search tool call.
No
statusenumThe status of the file search tool call. One of in_progress, searching, completed, incomplete, or failed.

Possible values: in_progress, searching, completed, incomplete, failed
Yes
typeenumThe type of the file search tool call. Always file_search_call.

Possible values: file_search_call
Yes

FunctionTool

Defines a function in your own code the model can choose to call.
NameTypeDescriptionRequiredDefault
descriptionstringA description of the function. Used by the model to determine whether or not to call the function.
No
namestringThe name of the function to call.
Yes
parametersobjectA JSON schema object describing the parameters of the function.
Yes
strictbooleanWhether to enforce strict parameter validation. Default true.
Yes
typeenumThe type of the function tool. Always function.

Possible values: function
Yes

FunctionToolCall

A tool call to run a function.
NameTypeDescriptionRequiredDefault
argumentsstringA JSON string of the arguments to pass to the function.
Yes
call_idstringThe unique ID of the function tool call generated by the model.
Yes
idstringThe unique ID of the function tool call.
Yes
namestringThe name of the function to run.
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the function tool call. Always function_call.

Possible values: function_call
Yes

FunctionToolCallOutput

The output of a function tool call.
NameTypeDescriptionRequiredDefault
call_idstringThe unique ID of the function tool call generated by the model.
Yes
idstringThe unique ID of the function tool call output. Populated when this item is returned via API.
No
outputstringA JSON string of the output of the function tool call.
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the function tool call output. Always function_call_output.

Possible values: function_call_output
Yes
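
FunctionTool, FunctionToolCall, and FunctionToolCallOutput together describe one round trip: the tool is declared, the model emits a call whose arguments field is a JSON string, and the client answers with an output item that carries the same call_id. A minimal sketch of the client side follows. The `get_weather` function, its schema, and the sample call are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local function that the model may ask the client to run."""
    return {"city": city, "forecast": "sunny"}


# FunctionTool declaration passed in the tools array.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the weather forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
    "strict": True,
}

HANDLERS = {"get_weather": get_weather}


def answer_function_call(call: dict) -> dict:
    """Run the requested function and wrap the result as a function_call_output."""
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    result = HANDLERS[call["name"]](**arguments)
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],    # must echo the call's call_id
        "output": json.dumps(result),  # output is also a JSON string
    }


# Hypothetical FunctionToolCall item taken from a model response.
call = {
    "type": "function_call",
    "id": "fc_1",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": '{"city": "Paris"}',
}

output_item = answer_function_call(call)
```

Note that the output item echoes call_id, not the id of the call item itself.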

FunctionToolCallOutputResource

NameTypeDescriptionRequiredDefault
call_idstringThe unique ID of the function tool call generated by the model.
Yes
idstringThe unique ID of the function call tool output.
Yes
outputstringA JSON string of the output of the function tool call.
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the function tool call output. Always function_call_output.

Possible values: function_call_output
Yes

includable

Specify additional output data to include in the model response. Currently supported values are:
  • file_search_call.results: Include the search results of the file search tool call.
  • message.input_image.image_url: Include image urls from the input message.
  • computer_call_output.output.image_url: Include image urls from the computer call output.
PropertyValue
DescriptionSpecify additional output data to include in the model response. Currently supported values are:
- file_search_call.results: Include the search results of the file search tool call.
- message.input_image.image_url: Include image urls from the input message.
- computer_call_output.output.image_url: Include image urls from the computer call output.
Typestring
Valuesfile_search_call.results
message.input_image.image_url
computer_call_output.output.image_url

InputAudio

An audio input to the model.
NameTypeDescriptionRequiredDefault
datastringBase64-encoded audio data.
Yes
formatenumThe format of the audio data. Currently supported formats are mp3 and wav.

Possible values: mp3, wav
Yes
typeenumThe type of the input item. Always input_audio.

Possible values: input_audio
Yes

InputContent

This component can be one of the following:

InputFile

A file input to the model.
NameTypeDescriptionRequiredDefault
file_datastringThe content of the file to be sent to the model.
No
file_idstringThe ID of the file to be sent to the model.
No
filenamestringThe name of the file to be sent to the model.
No
typeenumThe type of the input item. Always input_file.

Possible values: input_file
Yes

InputImage

An image input to the model.
NameTypeDescriptionRequiredDefault
detailenumThe detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.

Possible values: high, low, auto
Yes
file_idstringThe ID of the file to be sent to the model.
No
image_urlstringThe URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
No
typeenumThe type of the input item. Always input_image.

Possible values: input_image
Yes
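
Since image_url accepts a base64-encoded image in a data URL, local image bytes can be sent without hosting them. The sketch below builds an InputImage item that way. The helper name and the default MIME type are assumptions.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png",
                          detail: str = "auto") -> dict:
    """Build an input_image item whose image_url is a base64 data URL."""
    if detail not in ("high", "low", "auto"):
        raise ValueError("detail must be one of high, low, or auto")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "input_image",
        "detail": detail,
        "image_url": f"data:{mime_type};base64,{encoded}",
    }


# Placeholder bytes; in practice these would be read from an image file.
part = image_part_from_bytes(b"abc")
```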

InputItem

This component can be one of the following:

InputMessage

A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role.
NameTypeDescriptionRequiredDefault
contentInputMessageContentListA list of one or many input items to the model, containing different content types.
Yes
roleenumThe role of the message input. One of user, system, or developer.

Possible values: user, system, developer
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the message input. Always set to message.

Possible values: message
No

InputMessageContentList

A list of one or many input items to the model, containing different content types. No properties defined for this component.

InputMessageResource

NameTypeDescriptionRequiredDefault
contentInputMessageContentListA list of one or many input items to the model, containing different content types.
Yes
idstringThe unique ID of the message input.
Yes
roleenumThe role of the message input. One of user, system, or developer.

Possible values: user, system, developer
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the message input. Always set to message.

Possible values: message
No

InputText

A text input to the model.
NameTypeDescriptionRequiredDefault
textstringThe text input to the model.
Yes
typeenumThe type of the input item. Always input_text.

Possible values: input_text
Yes

Item

Content item used to generate a response. This component can be one of the following:

ItemReference

An internal identifier for an item to reference.
NameTypeDescriptionRequiredDefault
idstringThe ID of the item to reference.
Yes
typeenumThe type of item to reference. Always item_reference.

Possible values: item_reference
Yes

ItemResource

Content item used to generate a response. This component can be one of the following:

KeyPress

A collection of keypresses the model would like to perform.
NameTypeDescriptionRequiredDefault
keysarrayThe combination of keys the model is requesting to be pressed. This is an array of strings, each representing a key.
Yes
typeenumSpecifies the event type. For a keypress action, this property is always set to keypress.

Possible values: keypress
Yes

Metadata

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters. No properties defined for this component.
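
The limits stated for Metadata can be checked before a request is sent. The sketch below reads "16 key-value pairs" as an upper bound, which is an assumption, and reports every violation it finds. The function name and the sample keys are hypothetical.

```python
from typing import Dict, List

MAX_PAIRS = 16          # read as "at most 16" (assumption)
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def validate_metadata(metadata: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"too many pairs: {len(metadata)} > {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > MAX_KEY_LENGTH:
            problems.append(f"invalid key: {key!r}")
        if not isinstance(value, str) or len(value) > MAX_VALUE_LENGTH:
            problems.append(f"invalid value for key {key!r}")
    return problems


ok = validate_metadata({"team": "search", "ticket": "1234"})
too_long = validate_metadata({"k" * 65: "v"})
```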

ModelResponseProperties

NameTypeDescriptionRequiredDefault
metadataMetadataSet of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
No
modelstringModel used to generate the responses.No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
No1
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
No1
userstringA unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
No

Move

A mouse move action.
NameTypeDescriptionRequiredDefault
typeenumSpecifies the event type. For a move action, this property is always set to move.

Possible values: move
Yes
xintegerThe x-coordinate to move to.
Yes
yintegerThe y-coordinate to move to.
Yes
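
The computer actions defined in this section (Click, DoubleClick, Drag, KeyPress, and Move) are distinguished by their type property. A client that executes them typically dispatches on that value. The sketch below only renders each action as text, so it stays independent of any automation library; the function name is hypothetical.

```python
def describe_action(action: dict) -> str:
    """Render a computer action as a short human-readable string."""
    kind = action["type"]
    if kind == "click":
        return f"{action['button']} click at ({action['x']}, {action['y']})"
    if kind == "double_click":
        return f"double click at ({action['x']}, {action['y']})"
    if kind == "move":
        return f"move to ({action['x']}, {action['y']})"
    if kind == "keypress":
        return "press " + "+".join(action["keys"])
    if kind == "drag":
        path = action["path"]
        start, end = path[0], path[-1]
        return (f"drag from ({start['x']}, {start['y']}) "
                f"to ({end['x']}, {end['y']}) via {len(path)} points")
    raise ValueError(f"unsupported action type: {kind}")


click_text = describe_action({"type": "click", "button": "left", "x": 10, "y": 20})
drag_text = describe_action(
    {"type": "drag", "path": [{"x": 100, "y": 200}, {"x": 200, "y": 300}]}
)
```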

OutputAudio

An audio output from the model.
NameTypeDescriptionRequiredDefault
datastringBase64-encoded audio data from the model.
Yes
transcriptstringThe transcript of the audio data from the model.
Yes
typeenumThe type of the output audio. Always output_audio.

Possible values: output_audio
Yes

OutputContent

This component can be one of the following:

OutputItem

NameTypeDescriptionRequiredDefault
actionComputerActionYes
argumentsstringA JSON string of the arguments to pass to the function.
Yes
call_idstringAn identifier used when responding to the tool call with output.
Yes
contentarrayReasoning text contents.
Yes
idstringThe unique identifier of the reasoning content.
Yes
namestringThe name of the function to run.
Yes
pending_safety_checksarrayThe pending safety checks for the computer call.
Yes
queriesarrayThe queries used to search for files.
Yes
resultsarrayThe results of the file search tool call.
No
roleenumThe role of the output message. Always assistant.

Possible values: assistant
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
Yes
typeenumThe type of the object. Always reasoning.

Possible values: reasoning
Yes

OutputMessage

An output message from the model.
NameTypeDescriptionRequiredDefault
contentarrayThe content of the output message.
Yes
idstringThe unique ID of the output message.
Yes
roleenumThe role of the output message. Always assistant.

Possible values: assistant
Yes
statusenumThe status of the message input. One of in_progress, completed, or incomplete. Populated when input items are returned via API.

Possible values: in_progress, completed, incomplete
Yes
typeenumThe type of the output message. Always message.

Possible values: message
Yes

OutputText

A text output from the model.
NameTypeDescriptionRequiredDefault
annotationsarrayThe annotations of the text output.
Yes
textstringThe text output from the model.
Yes
typeenumThe type of the output text. Always output_text.

Possible values: output_text
Yes
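
A response's output can mix item types, so reading the generated text means selecting OutputMessage items and, within them, OutputText content parts. A minimal sketch follows, with a hypothetical helper name and sample items.

```python
from typing import List


def collect_output_text(output_items: List[dict]) -> str:
    """Concatenate the text of every output_text part in message items."""
    parts = []
    for item in output_items:
        if item.get("type") != "message":
            continue  # skip function calls, reasoning items, and so on
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)


# Hypothetical output array from a response.
output_items = [
    {"type": "reasoning", "id": "rs_1", "content": []},
    {
        "type": "message",
        "id": "msg_1",
        "role": "assistant",
        "status": "completed",
        "content": [
            {"type": "output_text", "text": "Hello, ", "annotations": []},
            {"type": "output_text", "text": "world.", "annotations": []},
        ],
    },
]

text = collect_output_text(output_items)
```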

RealtimeSessionCreateRequest

Realtime session object configuration.
NameTypeDescriptionRequiredDefault
input_audio_formatenumThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.

Possible values: pcm16, g711_ulaw, g711_alaw
No
input_audio_noise_reductionobjectConfiguration for input audio noise reduction. This can be set to null to turn off.
Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
No
└─ typeenumType of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

Possible values: near_field, far_field
No
input_audio_transcriptionobjectConfiguration for input audio transcription. Defaults to off and can be set to null to turn off once on. Input audio transcription isn’t native to the model, since the model consumes audio directly. Transcription runs asynchronously through the Transcriptions endpoint and should be treated as guidance of input audio content rather than precisely what the model heard. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
No
└─ languagestringThe language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format improves accuracy and latency.
No
└─ modelstringThe model to use for transcription, current options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, and whisper-1.
No
└─ promptstringAn optional text to guide the model’s style or continue a previous audio segment.
For whisper-1, the prompt is a list of keywords.
For gpt-4o-transcribe models, the prompt is a free text string, for example “expect words related to technology”.
No
instructionsstringThe default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format, (e.g. “be extremely succinct”, “act friendly”, “here are examples of good responses”) and on audio behavior (e.g. “talk quickly”, “inject emotion into your voice”, “laugh frequently”). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.

Note that the server sets default instructions which will be used if this field isn’t set and are visible in the session.created event at the start of the session.
No
max_response_output_tokensinteger or stringMaximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or inf for the maximum available tokens for a given model. Defaults to inf.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to ["text"].
No
modelstringThe name of the deployment used for this session.
No
output_audio_formatenumThe format of output audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, output audio is sampled at a rate of 24kHz.

Possible values: pcm16, g711_ulaw, g711_alaw
No
temperaturenumberSampling temperature for the model, limited to [0.6, 1.2]. For audio models a temperature of 0.8 is highly recommended for best performance.
No0.8
tool_choicestringHow the model chooses tools. Options are auto, none, required, or specify a function.
Noauto
toolsarrayTools (functions) available to the model.No
turn_detectionobjectConfiguration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger model response.
Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with uhhm, the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
No
└─ create_responsebooleanWhether or not to automatically generate a response when a VAD stop event occurs.
NoTrue
└─ eagernessenumUsed only for semantic_vad mode. The eagerness of the model to respond. low will wait longer for the user to continue speaking, high will respond more quickly. auto is the default and is equivalent to medium.

Possible values: low, medium, high, auto
No
└─ interrupt_responsebooleanWhether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs.
NoTrue
└─ prefix_padding_msintegerUsed only for server_vad mode. Amount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerUsed only for server_vad mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberUsed only for server_vad mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typeenumType of turn detection.

Possible values: server_vad, semantic_vad
No
voiceVoiceIdsSharedNo
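
The numeric limits in the table above (temperature within [0.6, 1.2], max_response_output_tokens between 1 and 4096 or inf) can be checked on the client before a session request is sent. The following is a minimal sketch, not an official client: the helper name build_realtime_session and the deployment name are hypothetical, and only a few of the documented fields are included.

```python
import json

# Defaults copied from the turn_detection rows above (server_vad mode).
DEFAULT_SERVER_VAD = {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
    "create_response": True,
}


def build_realtime_session(deployment, temperature=0.8,
                           max_response_output_tokens="inf",
                           turn_detection=DEFAULT_SERVER_VAD):
    """Hypothetical helper: validate documented limits and return a session dict."""
    if not 0.6 <= temperature <= 1.2:
        raise ValueError("temperature must be within [0.6, 1.2]")
    if max_response_output_tokens != "inf":
        is_valid_int = (
            isinstance(max_response_output_tokens, int)
            and not isinstance(max_response_output_tokens, bool)
            and 1 <= max_response_output_tokens <= 4096
        )
        if not is_valid_int:
            raise ValueError("max_response_output_tokens must be 1..4096 or 'inf'")
    return {
        "model": deployment,
        "temperature": temperature,
        "max_response_output_tokens": max_response_output_tokens,
        # None serializes to JSON null, which turns turn detection off.
        "turn_detection": None if turn_detection is None else dict(turn_detection),
    }


if __name__ == "__main__":
    # "my-realtime-deployment" is a placeholder deployment name.
    print(json.dumps(build_realtime_session("my-realtime-deployment"), indent=2))
```

Passing turn_detection=None produces a null value in the serialized JSON, which corresponds to the "set to null to turn off" behavior described for turn_detection; in that case the client is responsible for triggering model responses.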

RealtimeSessionCreateResponse

A new Realtime session configuration, with an ephemeral key. Default TTL for keys is one minute.
NameTypeDescriptionRequiredDefault
client_secretobjectEphemeral key returned by the API.Yes
└─ expires_atintegerTimestamp for when the token expires. Currently, all tokens expire after one minute.
No
└─ valuestringEphemeral key usable in client environments to authenticate connections to the Realtime API. Use this in client-side environments rather than a standard API token, which should only be used server-side.
No
input_audio_formatstringThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
No
input_audio_transcriptionobjectConfiguration for input audio transcription, defaults to off and can be set to null to turn off once on. Input audio transcription isn’t native to the model, since the model consumes audio directly. Transcription runs asynchronously through Whisper and should be treated as rough guidance rather than the representation understood by the model.
No
└─ modelstringThe model to use for transcription, whisper-1 is the only currently supported model.
No
instructionsstringThe default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. “be extremely succinct”, “act friendly”, “here are examples of good responses”) and on audio behavior (e.g. “talk quickly”, “inject emotion into your voice”, “laugh frequently”). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior.
Note that the server sets default instructions which will be used if this field isn’t set and are visible in the session.created event at the start of the session.
No
max_response_output_tokensinteger or stringMaximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or inf for the maximum available tokens for a given model. Defaults to inf.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to [“text”].
No
output_audio_formatstringThe format of output audio. Options are pcm16, g711_ulaw, or g711_alaw.
No
temperaturenumberSampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.
No
tool_choicestringHow the model chooses tools. Options are auto, none, required, or specify a function.
No
toolsarrayTools (functions) available to the model.No
turn_detectionobjectConfiguration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
No
└─ prefix_padding_msintegerAmount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerDuration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberActivation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typestringType of turn detection, only server_vad is currently supported.
No
voiceVoiceIdsSharedNo
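
Because the ephemeral key expires after about one minute, a client may want to check client_secret.expires_at before opening a connection. The sketch below assumes expires_at is a Unix timestamp in seconds (the table only says "Timestamp"); the helper name, the safety margin, and the sample values are hypothetical.

```python
import time


def client_secret_is_usable(session, now=None, safety_margin_s=5):
    """Return True if the ephemeral key exists and is not about to expire.

    Assumes expires_at is a Unix timestamp in seconds.
    """
    secret = session.get("client_secret") or {}
    expires_at = secret.get("expires_at")
    if not secret.get("value") or expires_at is None:
        # Without a key or an expiry time, treat the secret as unusable.
        return False
    current = time.time() if now is None else now
    return current + safety_margin_s < expires_at


if __name__ == "__main__":
    # Illustrative session payload; the key value is a placeholder.
    sample = {"client_secret": {"value": "ek_placeholder", "expires_at": 1_700_000_060}}
    print(client_secret_is_usable(sample, now=1_700_000_000))
```

If the check fails, the usual remedy is to request a new session from the server side rather than to reuse the expired key.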

RealtimeTranscriptionSessionCreateRequest

Realtime transcription session object configuration.
NameTypeDescriptionRequiredDefault
includearrayThe set of items to include in the transcription. Current available items are:
- item.input_audio_transcription.logprobs
No
input_audio_formatenumThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.

Possible values: pcm16, g711_ulaw, g711_alaw
No
input_audio_noise_reductionobjectConfiguration for input audio noise reduction. This can be set to null to turn off.
Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model.
Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
No
└─ typeenumType of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

Possible values: near_field, far_field
No
input_audio_transcriptionobjectConfiguration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
No
└─ languagestringThe language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format improves accuracy and latency.
No
└─ modelenumThe model to use for transcription, current options are gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, and whisper-1.

Possible values: gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1
No
└─ promptstringAn optional text to guide the model’s style or continue a previous audio segment.
For whisper-1, the prompt is a list of keywords.
For gpt-4o-transcribe models, the prompt is a free text string, for example “expect words related to technology”.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to [“text”].
No
turn_detectionobjectConfiguration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger model response.
Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with uhhm, the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
No
└─ create_responsebooleanWhether or not to automatically generate a response when a VAD stop event occurs. Not available for transcription sessions.
NoTrue
└─ eagernessenumUsed only for semantic_vad mode. The eagerness of the model to respond. low will wait longer for the user to continue speaking, high will respond more quickly. auto is the default and is equivalent to medium.

Possible values: low, medium, high, auto
No
└─ interrupt_responsebooleanWhether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs. Not available for transcription sessions.
NoTrue
└─ prefix_padding_msintegerUsed only for server_vad mode. Amount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerUsed only for server_vad mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberUsed only for server_vad mode. Activation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typeenumType of turn detection.

Possible values: server_vad, semantic_vad
No
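
Several turn_detection fields apply to only one mode: eagerness is used only for semantic_vad, while prefix_padding_ms, silence_duration_ms, and threshold are used only for server_vad. The following sketch shows one way a client could flag mismatched fields before sending a request. The function name is hypothetical, and requiring an explicit type is a choice made for this sketch rather than something the table mandates.

```python
SERVER_VAD_ONLY = {"prefix_padding_ms", "silence_duration_ms", "threshold"}
SEMANTIC_VAD_ONLY = {"eagerness"}
EAGERNESS_VALUES = {"low", "medium", "high", "auto"}


def check_turn_detection(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    if config is None:
        # null turns turn detection off, which is a valid setting.
        return []
    mode = config.get("type")
    if mode not in ("server_vad", "semantic_vad"):
        return [f"type must be server_vad or semantic_vad, got {mode!r}"]
    other_mode = "semantic_vad" if mode == "server_vad" else "server_vad"
    foreign = SEMANTIC_VAD_ONLY if mode == "server_vad" else SERVER_VAD_ONLY
    problems = [
        f"{key} is only used in {other_mode} mode"
        for key in sorted(foreign & set(config))
    ]
    if mode == "server_vad" and not 0.0 <= config.get("threshold", 0.5) <= 1.0:
        problems.append("threshold must be between 0.0 and 1.0")
    if mode == "semantic_vad" and config.get("eagerness", "auto") not in EAGERNESS_VALUES:
        problems.append("eagerness must be one of low, medium, high, auto")
    return problems


if __name__ == "__main__":
    print(check_turn_detection({"type": "semantic_vad", "silence_duration_ms": 300}))
```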

RealtimeTranscriptionSessionCreateResponse

A new Realtime transcription session configuration. When a session is created on the server via REST API, the session object also contains an ephemeral key. Default TTL for keys is one minute. This property isn’t present when a session is updated via the WebSocket API.
NameTypeDescriptionRequiredDefault
client_secretobjectEphemeral key returned by the API. Only present when the session is created on the server via REST API.
Yes
└─ expires_atintegerTimestamp for when the token expires. Currently, all tokens expire after one minute.
No
└─ valuestringEphemeral key usable in client environments to authenticate connections to the Realtime API. Use this in client-side environments rather than a standard API token, which should only be used server-side.
No
input_audio_formatstringThe format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
No
input_audio_transcriptionobjectConfiguration of the transcription model.
No
└─ languagestringThe language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format improves accuracy and latency.
No
└─ modelenumThe model to use for transcription. Can be gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, or whisper-1.

Possible values: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1
No
└─ promptstringAn optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
No
modalitiesThe set of modalities the model can respond with. To disable audio, set this to [“text”].
No
turn_detectionobjectConfiguration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
No
└─ prefix_padding_msintegerAmount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
No
└─ silence_duration_msintegerDuration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
No
└─ thresholdnumberActivation threshold for VAD (0.0 to 1.0). Defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
No
└─ typestringType of turn detection, only server_vad is currently supported.
No

Reasoning

Configuration options for reasoning models.
NameTypeDescriptionRequiredDefault
effortReasoningEffortConstrains effort on reasoning for reasoning models.
Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Yesmedium
summaryenumA summary of the reasoning performed by the model. This can be useful for debugging and understanding the model’s reasoning process.
One of concise or detailed.

Possible values: concise, detailed
No

ReasoningItem

A description of the chain of thought used by a reasoning model while generating a response.
NameTypeDescriptionRequiredDefault
contentarrayReasoning text contents.
Yes
idstringThe unique identifier of the reasoning content.
Yes
statusenumThe status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.

Possible values: in_progress, completed, incomplete
No
typeenumThe type of the object. Always reasoning.

Possible values: reasoning
Yes

Refusal

A refusal from the model.
NameTypeDescriptionRequiredDefault
refusalstringThe refusal explanation from the model.
Yes
typeenumThe type of the refusal. Always refusal.

Possible values: refusal
Yes

response

NameTypeDescriptionRequiredDefault
created_atnumberUnix timestamp (in seconds) of when this Response was created.
Yes
errorResponseErrorAn error object returned when the model fails to generate a Response.
Yes
idstringUnique identifier for this Response.
Yes
incomplete_detailsobjectDetails about why the response is incomplete.
Yes
└─ reasonenumThe reason why the response is incomplete.
Possible values: max_output_tokens, content_filter
No
instructionsstringInserts a system (or developer) message as the first item in the model’s context.

When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
Yes
max_output_tokensintegerAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and conversation state.
No
metadataMetadataSet of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Yes
modelstringModel used to generate the responses.Yes
objectenumThe object type of this resource - always set to response.

Possible values: response
Yes
outputarrayAn array of content items generated by the model.

- The length and order of items in the output array is dependent on the model’s response.
- Rather than accessing the first item in the output array and assuming it’s an assistant message with the content generated by the model, you might consider using the output_text property where supported in SDKs.
Yes
output_textstringSDK-only convenience property that contains the aggregated text output from all output_text items in the output array, if any are present.
Supported in the Python and JavaScript SDKs.
No
parallel_tool_callsbooleanWhether to allow the model to run tool calls in parallel.
YesTrue
previous_response_idstringThe unique ID of the previous response to the model. Use this to create multi-turn conversations.No
reasoningReasoningConfiguration options for reasoning models.
No
statusenumThe status of the response generation. One of completed, failed, in_progress, or incomplete.

Possible values: completed, failed, in_progress, incomplete
No
temperaturenumberWhat sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
Yes1
textobjectConfiguration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
- text inputs and outputs
- Structured Outputs
No
└─ formatTextResponseFormatConfigurationAn object specifying the format that the model must output.

Configuring { "type": "json_schema" } enables Structured Outputs, which ensures the model matches your supplied JSON schema. The default format is { "type": "text" } with no additional options.

Not recommended for gpt-4o and newer models:

Setting to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
No
tool_choiceToolChoiceOptions or ToolChoiceTypes or ToolChoiceFunctionHow the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.
Yes
toolsarrayAn array of tools the model may call while generating a response. You can specify which tool to use by setting the tool_choice parameter.

The two categories of tools you can provide the model are:

- Built-in tools
Yes
top_pnumberAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
Yes1
truncationenumThe truncation strategy to use for the model response.
- auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.

Possible values: auto, disabled
No
usageResponseUsageRepresents token usage details including input tokens, output tokens, a breakdown of output tokens, and the total tokens used.
No
userstringA unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
No
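
Since output_text is an SDK-only convenience property, a client that works with the raw JSON response needs to aggregate the text itself. The sketch below assumes that assistant messages appear in the output array as items with type "message" and a content list whose text parts have type "output_text" and a text field; that item shape is not spelled out in the table above, so treat it as an assumption. The sample response is illustrative only.

```python
def aggregate_output_text(response):
    """Concatenate the text of every output_text part found in response["output"]."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            # Skip tool calls, reasoning items, and other non-message output.
            continue
        for part in item.get("content") or []:
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)


if __name__ == "__main__":
    sample = {
        "object": "response",
        "output": [
            {"type": "reasoning", "id": "rs_1", "content": []},
            {
                "type": "message",
                "content": [
                    {"type": "output_text", "text": "Hello, "},
                    {"type": "output_text", "text": "world."},
                ],
            },
        ],
    }
    print(aggregate_output_text(sample))
```

Iterating over the whole array avoids the pitfall noted in the output description, where the first item isn't necessarily an assistant message.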

ResponseAudioDeltaEvent

Emitted when there is a partial audio response.
NameTypeDescriptionRequiredDefault
deltastringA chunk of Base64 encoded response audio bytes.
Yes
typeenumThe type of the event. Always response.audio.delta.

Possible values: response.audio.delta
Yes

ResponseAudioDoneEvent

Emitted when the audio response is complete.
NameTypeDescriptionRequiredDefault
typeenumThe type of the event. Always response.audio.done.

Possible values: response.audio.done
Yes
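
Each response.audio.delta event carries a Base64-encoded chunk, so a client reassembles the audio by decoding the chunks in arrival order until response.audio.done is received. This is a minimal sketch that treats events as already-parsed dictionaries; the function name and the sample chunks are hypothetical.

```python
import base64


def collect_audio(events):
    """Decode and concatenate audio delta chunks until the done event arrives."""
    audio = bytearray()
    for event in events:
        if event["type"] == "response.audio.delta":
            audio.extend(base64.b64decode(event["delta"]))
        elif event["type"] == "response.audio.done":
            break
    return bytes(audio)


if __name__ == "__main__":
    stream = [
        {"type": "response.audio.delta", "delta": "AAE="},
        {"type": "response.audio.delta", "delta": "AgM="},
        {"type": "response.audio.done"},
    ]
    print(collect_audio(stream).hex())
```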

ResponseAudioTranscriptDeltaEvent

Emitted when there is a partial transcript of audio.
NameTypeDescriptionRequiredDefault
deltastringThe partial transcript of the audio response.
Yes
typeenumThe type of the event. Always response.audio.transcript.delta.

Possible values: response.audio.transcript.delta
Yes

ResponseAudioTranscriptDoneEvent

Emitted when the full audio transcript is completed.
NameTypeDescriptionRequiredDefault
typeenumThe type of the event. Always response.audio.transcript.done.

Possible values: response.audio.transcript.done
Yes

ResponseCodeInterpreterCallCodeDeltaEvent

Emitted when a partial code snippet is added by the code interpreter.
NameTypeDescriptionRequiredDefault
deltastringThe partial code snippet added by the code interpreter.
Yes
output_indexintegerThe index of the output item for which the code interpreter call is in progress.
Yes
typeenumThe type of the event. Always response.code_interpreter_call.code.delta.

Possible values: response.code_interpreter_call.code.delta
Yes

ResponseCodeInterpreterCallCodeDoneEvent

Emitted when code snippet output is finalized by the code interpreter.
NameTypeDescriptionRequiredDefault
codestringThe final code snippet output by the code interpreter.
Yes
output_indexintegerThe index of the output item for which the code interpreter call is in progress.
Yes
typeenumThe type of the event. Always response.code_interpreter_call.code.done.

Possible values: response.code_interpreter_call.code.done
Yes

ResponseCodeInterpreterCallCompletedEvent

Emitted when the code interpreter call is completed.
NameTypeDescriptionRequiredDefault
code_interpreter_callCodeInterpreterToolCallA tool call to run code.
Yes
output_indexintegerThe index of the output item for which the code interpreter call is in progress.
Yes
typeenumThe type of the event. Always response.code_interpreter_call.completed.

Possible values: response.code_interpreter_call.completed
Yes

ResponseCodeInterpreterCallInProgressEvent

Emitted when a code interpreter call is in progress.
NameTypeDescriptionRequiredDefault
code_interpreter_callCodeInterpreterToolCallA tool call to run code.
Yes
output_indexintegerThe index of the output item for which the code interpreter call is in progress.
Yes
typeenumThe type of the event. Always response.code_interpreter_call.in_progress.

Possible values: response.code_interpreter_call.in_progress
Yes

ResponseCodeInterpreterCallInterpretingEvent

Emitted when the code interpreter is actively interpreting the code snippet.
NameTypeDescriptionRequiredDefault
code_interpreter_callCodeInterpreterToolCallA tool call to run code.
Yes
output_indexintegerThe index of the output item for which the code interpreter call is in progress.
Yes
typeenumThe type of the event. Always response.code_interpreter_call.interpreting.

Possible values: response.code_interpreter_call.interpreting
Yes

ResponseCompletedEvent

Emitted when the model response is complete.
NameTypeDescriptionRequiredDefault
responseresponseYes
typeenumThe type of the event. Always response.completed.

Possible values: response.completed
Yes

ResponseContentPartAddedEvent

Emitted when a new content part is added.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part that was added.
Yes
item_idstringThe ID of the output item that the content part was added to.
Yes
output_indexintegerThe index of the output item that the content part was added to.
Yes
partOutputContentYes
typeenumThe type of the event. Always response.content_part.added.

Possible values: response.content_part.added
Yes

ResponseContentPartDoneEvent

Emitted when a content part is done.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part that is done.
Yes
item_idstringThe ID of the output item that the content part was added to.
Yes
output_indexintegerThe index of the output item that the content part was added to.
Yes
partOutputContentYes
typeenumThe type of the event. Always response.content_part.done.

Possible values: response.content_part.done
Yes

ResponseCreatedEvent

An event that is emitted when a response is created.
NameTypeDescriptionRequiredDefault
responseresponseYes
typeenumThe type of the event. Always response.created.

Possible values: response.created
Yes

ResponseError

An error object returned when the model fails to generate a Response.
NameTypeDescriptionRequiredDefault
codeResponseErrorCodeThe error code for the response.
Yes
messagestringA human-readable description of the error.
Yes

ResponseErrorCode

The error code for the response.
PropertyValue
DescriptionThe error code for the response.
Typestring
Valuesserver_error
rate_limit_exceeded
invalid_prompt
vector_store_timeout
invalid_image
invalid_image_format
invalid_base64_image
invalid_image_url
image_too_large
image_too_small
image_parse_error
image_content_policy_violation
invalid_image_mode
image_file_too_large
unsupported_image_media_type
empty_image_file
failed_to_download_image
image_file_not_found
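
Client code often needs to branch on the error code. The sketch below copies the documented values into a set and sorts them into coarse groups. The grouping into "image" and "general" is purely an illustrative convention of this example, not something defined by the API, and the function name is hypothetical.

```python
# Values copied from the ResponseErrorCode list above.
RESPONSE_ERROR_CODES = frozenset({
    "server_error", "rate_limit_exceeded", "invalid_prompt",
    "vector_store_timeout", "invalid_image", "invalid_image_format",
    "invalid_base64_image", "invalid_image_url", "image_too_large",
    "image_too_small", "image_parse_error", "image_content_policy_violation",
    "invalid_image_mode", "image_file_too_large", "unsupported_image_media_type",
    "empty_image_file", "failed_to_download_image", "image_file_not_found",
})


def categorize_error(error):
    """Map a ResponseError-shaped dict to 'image', 'general', or 'unknown'."""
    code = error["code"]
    if code not in RESPONSE_ERROR_CODES:
        # Codes not in the documented list are reported separately.
        return "unknown"
    return "image" if "image" in code else "general"


if __name__ == "__main__":
    print(categorize_error({"code": "image_too_large", "message": "Image exceeds size limit."}))
```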

ResponseErrorEvent

Emitted when an error occurs.
NameTypeDescriptionRequiredDefault
codestringThe error code.
Yes
messagestringThe error message.
Yes
paramstringThe error parameter.
Yes
typeenumThe type of the event. Always error.

Possible values: error
Yes

ResponseFailedEvent

An event that is emitted when a response fails.
NameTypeDescriptionRequiredDefault
responseresponseYes
typeenumThe type of the event. Always response.failed.

Possible values: response.failed
Yes

ResponseFileSearchCallCompletedEvent

Emitted when a file search call is completed (results found).
NameTypeDescriptionRequiredDefault
item_idstringThe ID of the output item for which the file search call is initiated.
Yes
output_indexintegerThe index of the output item for which the file search call is initiated.
Yes
typeenumThe type of the event. Always response.file_search_call.completed.

Possible values: response.file_search_call.completed
Yes

ResponseFileSearchCallInProgressEvent

Emitted when a file search call is initiated.
NameTypeDescriptionRequiredDefault
item_idstringThe ID of the output item for which the file search call is initiated.
Yes
output_indexintegerThe index of the output item for which the file search call is initiated.
Yes
typeenumThe type of the event. Always response.file_search_call.in_progress.

Possible values: response.file_search_call.in_progress
Yes

ResponseFileSearchCallSearchingEvent

Emitted when a file search is currently searching.
NameTypeDescriptionRequiredDefault
item_idstringThe ID of the output item for which the file search call is initiated.
Yes
output_indexintegerThe index of the output item for which the file search call is searching.
Yes
typeenumThe type of the event. Always response.file_search_call.searching.

Possible values: response.file_search_call.searching
Yes

ResponseFunctionCallArgumentsDeltaEvent

Emitted when there is a partial function-call arguments delta.
NameTypeDescriptionRequiredDefault
deltastringThe function-call arguments delta that is added.
Yes
item_idstringThe ID of the output item that the function-call arguments delta is added to.
Yes
output_indexintegerThe index of the output item that the function-call arguments delta is added to.
Yes
typeenumThe type of the event. Always response.function_call_arguments.delta.

Possible values: response.function_call_arguments.delta
Yes

ResponseFunctionCallArgumentsDoneEvent

Emitted when function-call arguments are finalized.
NameTypeDescriptionRequiredDefault
argumentsstringThe function-call arguments.Yes
item_idstringThe ID of the item.Yes
output_indexintegerThe index of the output item.Yes
typeenumThe type of the event. Always response.function_call_arguments.done.

Possible values: response.function_call_arguments.done
Yes
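
Function-call arguments arrive as a series of delta events followed by a done event that carries the complete arguments string. The following sketch buffers deltas per item_id and parses the final string once the done event arrives. It assumes that arguments is a JSON-encoded string and that events are already parsed into dictionaries; the function name and sample values are hypothetical.

```python
import json


def accumulate_function_arguments(events):
    """Return (finished, pending).

    finished maps item_id to parsed arguments for calls that reached the done event.
    pending maps item_id to the partial argument text for calls still streaming.
    """
    pending = {}
    finished = {}
    for event in events:
        kind = event["type"]
        if kind == "response.function_call_arguments.delta":
            item_id = event["item_id"]
            pending[item_id] = pending.get(item_id, "") + event["delta"]
        elif kind == "response.function_call_arguments.done":
            item_id = event["item_id"]
            # The done event carries the full string, so prefer it over the buffer.
            finished[item_id] = json.loads(event["arguments"])
            pending.pop(item_id, None)
    return finished, pending


if __name__ == "__main__":
    stream = [
        {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
         "output_index": 0, "delta": '{"city":'},
        {"type": "response.function_call_arguments.delta", "item_id": "fc_1",
         "output_index": 0, "delta": ' "Paris"}'},
        {"type": "response.function_call_arguments.done", "item_id": "fc_1",
         "output_index": 0, "arguments": '{"city": "Paris"}'},
    ]
    print(accumulate_function_arguments(stream))
```

Keeping the partial text in pending lets a UI show arguments as they stream in, while parsing is deferred until the string is known to be complete.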

ResponseInProgressEvent

Emitted when the response is in progress.
NameTypeDescriptionRequiredDefault
responseresponseYes
typeenumThe type of the event. Always response.in_progress.

Possible values: response.in_progress
Yes

ResponseIncompleteEvent

An event that is emitted when a response finishes as incomplete.
NameTypeDescriptionRequiredDefault
responseresponseYes
typeenumThe type of the event. Always response.incomplete.

Possible values: response.incomplete
Yes

responseItemList

A list of Response items.
NameTypeDescriptionRequiredDefault
dataarrayA list of items used to generate this response.Yes
first_idstringThe ID of the first item in the list.Yes
has_morebooleanWhether there are more items available.Yes
last_idstringThe ID of the last item in the list.Yes
objectenumThe type of object returned, must be list.
Possible values: list
Yes
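
The has_more and last_id fields support cursor-style pagination. The sketch below iterates over every item by repeatedly requesting the next page. fetch_page is a hypothetical callable supplied by the caller, and passing last_id back as an "after" cursor is an assumption about the list endpoint rather than something stated in the table above.

```python
def iter_all_items(fetch_page):
    """Yield every item across pages.

    fetch_page(after) is a hypothetical callable that returns one
    responseItemList-shaped dict; after is None for the first page.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page["has_more"]:
            break
        after = page["last_id"]


if __name__ == "__main__":
    # In-memory stand-in for a paged endpoint, keyed by cursor.
    pages = {
        None: {"object": "list", "data": [{"id": "a"}, {"id": "b"}],
               "first_id": "a", "last_id": "b", "has_more": True},
        "b": {"object": "list", "data": [{"id": "c"}],
              "first_id": "c", "last_id": "c", "has_more": False},
    }
    print([item["id"] for item in iter_all_items(lambda after: pages[after])])
```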

ResponseModalities

Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"].
No properties defined for this component.

ResponseModalitiesTextOnly

Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. This API will soon support other output modalities, including audio and images.
No properties defined for this component.

ResponseOutputItemAddedEvent

Emitted when a new output item is added.
NameTypeDescriptionRequiredDefault
itemOutputItemYes
output_indexintegerThe index of the output item that was added.
Yes
typeenumThe type of the event. Always response.output_item.added.

Possible values: response.output_item.added
Yes

ResponseOutputItemDoneEvent

Emitted when an output item is marked done.
NameTypeDescriptionRequiredDefault
itemOutputItemYes
output_indexintegerThe index of the output item that was marked done.
Yes
typeenumThe type of the event. Always response.output_item.done.

Possible values: response.output_item.done
Yes

ResponseProperties

NameTypeDescriptionRequiredDefault
instructionsstringInserts a system (or developer) message as the first item in the model’s context.

When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
No
max_output_tokensintegerAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and conversation state.
No
previous_response_idstringThe unique ID of the previous response to the model. Use this to create multi-turn conversations.No
reasoningReasoningConfiguration options for reasoning models.
No
textobjectConfiguration options for a text response from the model. Can be plain text or structured JSON data. Learn more:
- text inputs and outputs
- Structured Outputs
No
└─ formatTextResponseFormatConfigurationAn object specifying the format that the model must output.

Configuring { "type": "json_schema" } enables Structured Outputs, which ensures the model matches your supplied JSON schema. The default format is { "type": "text" } with no additional options.

Not recommended for gpt-4o and newer models:

Setting to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
No
tool_choiceToolChoiceOptions or ToolChoiceTypes or ToolChoiceFunctionHow the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.
No
toolsarrayAn array of tools the model may call while generating a response. You can specify which tool to use by setting the tool_choice parameter.

The two categories of tools you can provide the model are:

- Built-in tools
No
truncationenumThe truncation strategy to use for the model response.
- auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.

Possible values: auto, disabled
No
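
Because instructions are not carried over when previous_response_id is used, a follow-up request has to restate any system (or developer) message it still needs. The following sketch assembles such a request body. The model and input fields are not part of the ResponseProperties table and are assumed here for illustration; the function name, deployment name, and response ID are placeholders.

```python
def build_followup_request(model, user_input, previous_response_id,
                           instructions=None, truncation="disabled"):
    """Assemble a request body for the next turn of a multi-turn conversation."""
    if truncation not in ("auto", "disabled"):
        raise ValueError("truncation must be 'auto' or 'disabled'")
    body = {
        "model": model,        # assumed field, not listed in the table above
        "input": user_input,   # assumed field, not listed in the table above
        "previous_response_id": previous_response_id,
        "truncation": truncation,
    }
    if instructions is not None:
        # Instructions from the previous response are not inherited,
        # so they are included again only when the caller supplies them.
        body["instructions"] = instructions
    return body


if __name__ == "__main__":
    print(build_followup_request(
        "my-deployment", "And in French?", "resp_placeholder",
        instructions="Answer in one sentence.", truncation="auto",
    ))
```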

ResponseRefusalDeltaEvent

Emitted when there is a partial refusal text.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part that the refusal text is added to.
Yes
deltastringThe refusal text that is added.
Yes
item_idstringThe ID of the output item that the refusal text is added to.
Yes
output_indexintegerThe index of the output item that the refusal text is added to.
Yes
typeenumThe type of the event. Always response.refusal.delta.

Possible values: response.refusal.delta
Yes

ResponseRefusalDoneEvent

Emitted when refusal text is finalized.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part in which the refusal text is finalized.
Yes
item_idstringThe ID of the output item in which the refusal text is finalized.
Yes
output_indexintegerThe index of the output item in which the refusal text is finalized.
Yes
refusalstringThe finalized refusal text.
Yes
typeenumThe type of the event. Always response.refusal.done.

Possible values: response.refusal.done
Yes

responseStreamEvent

NameTypeDescriptionRequiredDefault
annotationAnnotationYes
annotation_indexintegerThe index of the annotation that was added.
Yes
argumentsstringThe function-call arguments.Yes
codestringThe error code.
Yes
code_interpreter_callCodeInterpreterToolCallA tool call to run code.
Yes
content_indexintegerThe index of the content part in which the text content is finalized.
Yes
deltastringThe text delta that was added.
Yes
itemOutputItemThe output item that was marked done.
Yes
item_idstringThe ID of the output item in which the text content is finalized.
Yes
messagestringThe error message.
Yes
output_indexintegerThe index of the output item in which the text content is finalized.
Yes
paramstringThe error parameter.
Yes
partOutputContentThe content part that is done.
Yes
refusalstringThe refusal text that is finalized.
Yes
responseresponseThe response that was incomplete.
Yes
textstringThe text content that is finalized.
Yes
typeenumThe type of the event. Always response.output_text.done.

Possible values: response.output_text.done
Yes

ResponseTextAnnotationDeltaEvent

Emitted when a text annotation is added.
NameTypeDescriptionRequiredDefault
annotationAnnotationYes
annotation_indexintegerThe index of the annotation that was added.
Yes
content_indexintegerThe index of the content part that the text annotation was added to.
Yes
item_idstringThe ID of the output item that the text annotation was added to.
Yes
output_indexintegerThe index of the output item that the text annotation was added to.
Yes
typeenumThe type of the event. Always response.output_text.annotation.added.

Possible values: response.output_text.annotation.added
Yes

ResponseTextDeltaEvent

Emitted when there is an additional text delta.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part that the text delta was added to.
Yes
deltastringThe text delta that was added.
Yes
item_idstringThe ID of the output item that the text delta was added to.
Yes
output_indexintegerThe index of the output item that the text delta was added to.
Yes
typeenumThe type of the event. Always response.output_text.delta.

Possible values: response.output_text.delta
Yes

ResponseTextDoneEvent

Emitted when text content is finalized.
NameTypeDescriptionRequiredDefault
content_indexintegerThe index of the content part in which the text content is finalized.
Yes
item_idstringThe ID of the output item in which the text content is finalized.
Yes
output_indexintegerThe index of the output item in which the text content is finalized.
Yes
textstringThe finalized text content.
Yes
typeenumThe type of the event. Always response.output_text.done.

Possible values: response.output_text.done
Yes
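
As an illustration of how the delta and done events relate, the sketch below accumulates response.output_text.delta events per content part and prefers the text carried by response.output_text.done when it arrives. It assumes the events have already been parsed into dictionaries with the fields listed above; the function name collect_text is hypothetical.

```python
from collections import defaultdict


def collect_text(events):
    """Return the text of each content part, keyed by (item_id, content_index)."""
    buffers = defaultdict(str)  # text accumulated from delta events
    final = {}                  # text reported by done events
    for ev in events:
        if ev["type"] == "response.output_text.delta":
            buffers[(ev["item_id"], ev["content_index"])] += ev["delta"]
        elif ev["type"] == "response.output_text.done":
            final[(ev["item_id"], ev["content_index"])] = ev["text"]
    # Prefer the finalized text; fall back to the deltas seen so far.
    keys = set(buffers) | set(final)
    return {key: final.get(key, buffers[key]) for key in keys}


events = [
    {"type": "response.output_text.delta", "item_id": "msg_1",
     "output_index": 0, "content_index": 0, "delta": "Hel"},
    {"type": "response.output_text.delta", "item_id": "msg_1",
     "output_index": 0, "content_index": 0, "delta": "lo"},
    {"type": "response.output_text.done", "item_id": "msg_1",
     "output_index": 0, "content_index": 0, "text": "Hello"},
]
texts = collect_text(events)
```

Events of any other type are ignored, so the same loop can run over a mixed stream.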

ResponseUsage

Represents token usage details including input tokens, output tokens, a breakdown of output tokens, and the total tokens used.
Name | Type | Description | Required | Default
input_tokens | integer | The number of input tokens. | Yes |
output_tokens | integer | The number of output tokens. | Yes |
output_tokens_details | object | A detailed breakdown of the output tokens. | Yes |
└─ reasoning_tokens | integer | The number of reasoning tokens. | No |
total_tokens | integer | The total number of tokens used. | Yes |
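
The following sketch shows one way to read a ResponseUsage object once it has been parsed into a dictionary. It assumes that reasoning_tokens is a subset of output_tokens, as the "breakdown of output tokens" wording suggests; the function name summarize_usage and the sample numbers are hypothetical.

```python
def summarize_usage(usage):
    """Split output tokens into reasoning and non-reasoning parts."""
    details = usage.get("output_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)  # optional field
    return {
        "input": usage["input_tokens"],
        "reasoning": reasoning,
        # Assumption: reasoning tokens are counted inside output_tokens.
        "non_reasoning_output": usage["output_tokens"] - reasoning,
        "total": usage["total_tokens"],
    }


summary = summarize_usage({
    "input_tokens": 120,
    "output_tokens": 80,
    "output_tokens_details": {"reasoning_tokens": 50},
    "total_tokens": 200,
})
```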

Screenshot

A screenshot action.
NameTypeDescriptionRequiredDefault
typeenumSpecifies the event type. For a screenshot action, this property is always set to screenshot.

Possible values: screenshot
Yes

Scroll

A scroll action.
NameTypeDescriptionRequiredDefault
scroll_xintegerThe horizontal scroll distance.
Yes
scroll_yintegerThe vertical scroll distance.
Yes
typeenumSpecifies the event type. For a scroll action, this property is always set to scroll.

Possible values: scroll
Yes
xintegerThe x-coordinate where the scroll occurred.
Yes
yintegerThe y-coordinate where the scroll occurred.
Yes

StopConfiguration

Up to 4 sequences where the API stops generating further tokens. The returned text will not contain the stop sequence. This component can be one of the following:

TextResponseFormatConfiguration

An object specifying the format that the model must output. Configuring { "type": "json_schema" } enables Structured Outputs, which ensure the model matches your supplied JSON schema. The default format is { "type": "text" } with no additional options. Not recommended for gpt-4o and newer models: Setting to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it. This component can be one of the following:

TextResponseFormatJsonSchema

JSON Schema response format. Used to generate structured JSON responses. Learn more about Structured Outputs.
NameTypeDescriptionRequiredDefault
descriptionstringA description of what the response format is for, used by the model to determine how to respond in the format.
No
namestringThe name of the response format. Must contain only a-z, A-Z, 0-9, underscores, and dashes, with a maximum length of 64 characters.
No
schemaResponseFormatJsonSchemaSchemaThe schema for the response format, described as a JSON Schema object.Yes
strictbooleanWhether to enable strict schema adherence when generating the output.
If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true.
NoFalse
typeenumThe type of response format being defined. Always json_schema.
Possible values: json_schema
Yes
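
A client can check the name constraint before sending a request. The sketch below builds a TextResponseFormatJsonSchema object as a dictionary and validates the name locally; the helper json_schema_format is hypothetical, and the service may apply further checks that are not shown here.

```python
import re

# a-z, A-Z, 0-9, underscores, and dashes, at most 64 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def json_schema_format(schema, name=None, strict=False, description=None):
    """Build a json_schema response format object."""
    fmt = {"type": "json_schema", "schema": schema, "strict": strict}
    if name is not None:
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"invalid response format name: {name!r}")
        fmt["name"] = name
    if description is not None:
        fmt["description"] = description
    return fmt


fmt = json_schema_format({"type": "object"}, name="order-summary_v1", strict=True)
```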

Tool

This component can be one of the following:

ToolChoiceFunction

Use this option to force the model to call a specific function.
NameTypeDescriptionRequiredDefault
namestringThe name of the function to call.Yes
typeenumFor function calling, the type is always function.
Possible values: function
Yes

ToolChoiceOptions

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
PropertyValue
DescriptionControls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.
Typestring
Valuesnone
auto
required

ToolChoiceTypes

Indicates that the model should use a built-in tool to generate a response.
NameTypeDescriptionRequiredDefault
typeenumThe type of hosted tool the model should use. Allowed values are:
- file_search
- computer_use_preview

Possible values: file_search, computer_use_preview
Yes
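
The three tool_choice shapes (ToolChoiceOptions, ToolChoiceTypes, and ToolChoiceFunction) can be produced from a single helper. This is a sketch only: the helper make_tool_choice and its convention of treating any other string as a function name are assumptions made for illustration, not part of the API.

```python
OPTIONS = {"none", "auto", "required"}               # ToolChoiceOptions
HOSTED_TOOLS = {"file_search", "computer_use_preview"}  # ToolChoiceTypes


def make_tool_choice(choice):
    """Map a short string to a tool_choice value."""
    if choice in OPTIONS:
        return choice                      # plain string option
    if choice in HOSTED_TOOLS:
        return {"type": choice}            # built-in (hosted) tool
    # Hypothetical convention: anything else names a function to force.
    return {"type": "function", "name": choice}
```

For example, make_tool_choice("required") returns the string unchanged, while make_tool_choice("get_weather") returns a ToolChoiceFunction object for a function with that (hypothetical) name.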

Type

An action to type in text.
NameTypeDescriptionRequiredDefault
textstringThe text to type.
Yes
typeenumSpecifies the event type. For a type action, this property is always set to type.

Possible values: type
Yes

UpdateVectorStoreFileAttributesRequest

NameTypeDescriptionRequiredDefault
attributesVectorStoreFileAttributesA set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
Yes

UrlCitation

A citation for a web resource used to generate a model response.
NameTypeDescriptionRequiredDefault
end_indexintegerThe index of the last character of the URL citation in the message.
Yes
start_indexintegerThe index of the first character of the URL citation in the message.
Yes
titlestringThe title of the web resource.
Yes
typeenumThe type of the URL citation. Always url_citation.

Possible values: url_citation
Yes
urlstringThe URL of the web resource.
Yes
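
To show how start_index and end_index locate a citation, the sketch below slices the cited span out of the message text. Whether end_index is inclusive is an assumption here (the description above reads as inclusive), so the helper exposes it as a flag; cited_text and the sample citation are hypothetical.

```python
def cited_text(message, citation, end_inclusive=True):
    """Return the part of `message` that a url_citation refers to."""
    start = citation["start_index"]
    end = citation["end_index"]
    if end_inclusive:
        end += 1  # Python slices exclude the end position
    return message[start:end]


message = "See the docs for details."
citation = {
    "type": "url_citation",
    "start_index": 8,
    "end_index": 11,
    "title": "Docs",
    "url": "https://example.com/docs",
}
span = cited_text(message, citation)
```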

VectorStoreFileAttributes

A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers. No properties defined for this component.
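
The limits above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; validate_attributes is a hypothetical helper, and the service remains the authority on what it accepts.

```python
def validate_attributes(attrs):
    """Raise ValueError if `attrs` breaks the documented limits."""
    if len(attrs) > 16:
        raise ValueError("at most 16 key-value pairs are allowed")
    for key, value in attrs.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"key must be a string of at most 64 characters: {key!r}")
        if isinstance(value, str):
            if len(value) > 512:
                raise ValueError(f"string value for {key!r} exceeds 512 characters")
        elif not isinstance(value, (bool, int, float)):
            raise ValueError(f"value for {key!r} must be a string, boolean, or number")
    return attrs


attrs = validate_attributes({"department": "finance", "year": 2024, "draft": False})
```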

VectorStoreFileContentResponse

Represents the parsed content of a vector store file.
Name | Type | Description | Required | Default
data | array | Parsed content of the file. | Yes |
has_more | boolean | Indicates if there are more content pages to fetch. | Yes |
next_page | string | The token for the next page, if any. | Yes |
object | enum | The object type, which is always vector_store.file_content.page. Possible values: vector_store.file_content.page | Yes |

VectorStoreSearchRequest

Name | Type | Description | Required | Default
filters | ComparisonFilter or CompoundFilter | A filter to apply based on file attributes. | No |
max_num_results | integer | The maximum number of results to return. This number should be between 1 and 50 inclusive. | No | 10
query | string or array | A query string for a search. | Yes |
ranking_options | object | Ranking options for search. | No |
└─ ranker | enum | Possible values: auto, default-2024-11-15 | No |
└─ score_threshold | number | | No | 0
rewrite_query | boolean | Whether to rewrite the natural language query for vector search. | No | False
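
A request body for VectorStoreSearchRequest can be assembled and range-checked locally. This is a sketch under the assumption that the body is sent as JSON built from a dictionary; build_search_request is a hypothetical helper and it omits optional fields that were not supplied.

```python
RANKERS = ("auto", "default-2024-11-15")


def build_search_request(query, max_num_results=10, rewrite_query=False,
                         ranker=None, score_threshold=None, filters=None):
    """Build a vector store search request body."""
    if not 1 <= max_num_results <= 50:
        raise ValueError("max_num_results must be between 1 and 50 inclusive")
    if ranker is not None and ranker not in RANKERS:
        raise ValueError(f"ranker must be one of {RANKERS}")
    body = {
        "query": query,
        "max_num_results": max_num_results,
        "rewrite_query": rewrite_query,
    }
    ranking_options = {}
    if ranker is not None:
        ranking_options["ranker"] = ranker
    if score_threshold is not None:
        ranking_options["score_threshold"] = score_threshold
    if ranking_options:
        body["ranking_options"] = ranking_options
    if filters is not None:
        body["filters"] = filters
    return body


body = build_search_request("return policy", max_num_results=5, ranker="auto")
```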

VectorStoreSearchResultContentObject

NameTypeDescriptionRequiredDefault
textstringThe text content returned from search.Yes
typeenumThe type of content.
Possible values: text
Yes

VectorStoreSearchResultItem

NameTypeDescriptionRequiredDefault
attributesVectorStoreFileAttributesA set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
Yes
contentarrayContent chunks from the file.Yes
file_idstringThe ID of the vector store file.Yes
filenamestringThe name of the vector store file.Yes
scorenumberThe similarity score for the result.Yes

VectorStoreSearchResultsPage

Name | Type | Description | Required | Default
data | array | The list of search result items. | Yes |
has_more | boolean | Indicates if there are more results to fetch. | Yes |
next_page | string | The token for the next page, if any. | Yes |
object | enum | The object type, which is always vector_store.search_results.page. Possible values: vector_store.search_results.page | Yes |
search_query | array | | Yes |
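
The has_more and next_page fields support a simple pagination loop. The sketch below assumes a caller-supplied function, here called fetch_page (hypothetical), that takes a page token (or None for the first page) and returns a parsed VectorStoreSearchResultsPage dictionary. How the token is passed to the service is not covered in this section, so that detail is left inside fetch_page.

```python
def iter_search_results(fetch_page):
    """Yield every result item, following next_page while has_more is true."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page["data"]
        if not page["has_more"]:
            break
        token = page["next_page"]


# In-memory stand-in for the service, for illustration only.
_pages = {
    None: {"data": [{"file_id": "file-1"}], "has_more": True, "next_page": "p2"},
    "p2": {"data": [{"file_id": "file-2"}], "has_more": False, "next_page": None},
}
file_ids = [item["file_id"] for item in iter_search_results(_pages.__getitem__)]
```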

VoiceIdsShared

No properties defined for this component.

Wait

A wait action.
NameTypeDescriptionRequiredDefault
typeenumSpecifies the event type. For a wait action, this property is always set to wait.

Possible values: wait
Yes
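
The Screenshot, Scroll, Type, and Wait actions in this reference all carry a type field that identifies them, which makes them easy to dispatch on. The sketch below turns a parsed action dictionary into a short description; describe_action is a hypothetical helper, and a real client would perform the action instead of describing it.

```python
def describe_action(action):
    """Return a human-readable description of a computer-use action."""
    kind = action["type"]
    if kind == "screenshot":
        return "take a screenshot"
    if kind == "scroll":
        return (f"scroll by ({action['scroll_x']}, {action['scroll_y']}) "
                f"at ({action['x']}, {action['y']})")
    if kind == "type":
        return f"type {action['text']!r}"
    if kind == "wait":
        return "wait"
    raise ValueError(f"unsupported action type: {kind!r}")
```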

ReasoningEffort

Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
PropertyValue
DescriptionConstrains effort on reasoning for reasoning models.
Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
Typestring
Defaultmedium
Valueslow
medium
high
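
A client can apply the documented default and reject unsupported values before sending a request. This is a minimal sketch; normalize_effort is a hypothetical helper.

```python
SUPPORTED_EFFORTS = ("low", "medium", "high")


def normalize_effort(effort=None):
    """Return a supported reasoning effort, defaulting to medium."""
    if effort is None:
        return "medium"  # documented default
    if effort not in SUPPORTED_EFFORTS:
        raise ValueError(f"effort must be one of {SUPPORTED_EFFORTS}")
    return effort
```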

errorEvent

Emitted when an error occurs, for example because of an internal server error or a timeout.
NameTypeDescriptionRequiredDefault
dataerrorYes
eventstringYes
event Enum: ErrorEventEnum
ValueDescription
error

doneEvent

Occurs when a stream ends.
NameTypeDescriptionRequiredDefault
datastringYes
eventstringYes
event Enum: DoneEventEnum
ValueDescription
done
data Enum: DoneEventDataEnum
ValueDescription
[DONE]
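
The error and done events can be handled in the same loop that reads the stream. The sketch below assumes the stream uses server-sent events framing, with event: and data: lines separated by blank lines, and that the lines have already been decoded to strings; read_stream is a hypothetical helper.

```python
def read_stream(lines):
    """Yield (event, data) pairs until the done event; raise on error."""
    event = None
    for raw in lines:
        line = raw.strip()
        if not line:
            event = None  # a blank line ends the current event
            continue
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event == "done" or data == "[DONE]":
                return  # the stream has ended
            if event == "error":
                raise RuntimeError(f"stream error: {data}")
            yield event, data


sample = [
    "event: response.output_text.delta",
    'data: {"delta": "Hi"}',
    "",
    "event: done",
    "data: [DONE]",
    "",
]
received = list(read_stream(sample))
```

Anything that follows the done event is ignored, so the caller can stop reading from the connection as soon as the generator finishes.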

Next steps

Learn about models and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.