Memory in Microsoft Foundry Agent Service (preview)
Memory (preview) in Foundry Agent Service and the Memory Store API (preview) are licensed to you as part of your Azure subscription and are subject to terms applicable to “Previews” in the Microsoft Product Terms and the Microsoft Products and Services Data Protection Addendum, as well as the Microsoft Generative AI Services Previews terms in the Supplemental Terms of Use for Microsoft Azure Previews.
What is memory?
Memory is persistent knowledge retained by an agent across sessions. Generally, agent memory falls into two categories:- Short-term memory tracks the current session’s conversation and maintains immediate context for ongoing interactions. Agent orchestration frameworks typically manage this memory as part of the session context.
- Long-term memory retains distilled knowledge across sessions. The model can recall and build on previous user interactions over time. Long-term memory requires a persistent system that extracts, consolidates, and manages knowledge.
How memory works
Behind the scenes, memories are stored as items in a managed memory store. The system may apply consolidation and conflict‑resolution logic where applicable (for example, to merge duplicate or overlapping user profile information).Consolidation behavior can vary by memory type and may change during preview. For the latest behavior, see Create and use memory in Foundry Agent Service.
- Extraction: When a user interacts with an agent, the system actively extracts key information from the conversation, such as user preferences, facts, and relevant context. For example, preferences like “allergic to dairy” and summaries of recent activities are identified and stored.
- Consolidation: Extracted memories are consolidated to keep the memory store efficient and relevant. The system uses LLMs to merge similar or duplicate topics so that the agent doesn’t store redundant information. Conflicting facts, such as a new allergy, are resolved to maintain an accurate memory.
- Retrieval: When the agent needs to recall information, it searches the memory store for the most relevant memories. This allows the agent to quickly surface the right context, making conversations feel natural and informed. For best results, retrieve stable user profile information early in the conversation so the agent can personalize responses.
Memory types
Memory in Foundry Agent Service extracts and stores two types of long-term memory:| Type | Description | Configuration |
|---|---|---|
| User profile memory | Information and preferences about the user, such as preferred name, dietary restrictions, and language preference. These memories are considered “static” with respect to a conversation because they generally don’t depend on the current chat context. Retrieve user profile memories once at the beginning of each conversation. | Specify user_profile_details in a memory store. |
| Chat summary memory | A distilled summary of each topic or thread covered in a chat session. These memories allow users to continue conversations or reference prior sessions without repeating earlier context. Retrieve chat summary memories based on the current conversation to surface relevant threads. | Set chat_summary_enabled to true in a memory store. |
Working with memory
There are two ways to use memory for agent interactions:- Memory search tool: Attach the memory search tool to a prompt agent to enable reading from and writing to the memory store during conversations. This approach is ideal for most scenarios because it simplifies memory management. For more information, see Use memories via an agent tool.
- Memory store APIs: Interact directly with the memory store using the low-level APIs. This approach provides more control and flexibility for advanced use cases. For more information, see Use memories via APIs.
Use cases
The following examples illustrate how memory can enhance various types of agents.- Conversational agent
- Planning agent
- Research agent
- A customer support agent that remembers your name, previous issues and resolutions, ticket numbers, and your preferred contact method (chat, email, or call back). This memory helps you avoid repeating information, so conversations are more efficient and satisfying.
- A personal shopping assistant that remembers your size in specific brands, preferred colors, past returns, and recent purchases. The agent can suggest relevant items as soon as you start a session and avoid recommending products you already own.
Security risks
When you work with memory in Foundry Agent Service, the large language model (LLM) extracts and consolidates memories based on conversations. Protect memory against threats such as prompt injection and memory corruption. These risks arise when incorrect or harmful data is stored in the agent’s memory, potentially influencing agent responses and actions. To mitigate security risks, consider these actions:- Use Azure AI Content Safety and its prompt injection detection: Validate all prompts entering or leaving the memory system to prevent malicious content.
- Perform attack and adversarial testing: Regularly stress-test your agent for injection vulnerabilities through controlled adversarial exercises.
Limitations and quotas
- Memory currently requires compatible Azure OpenAI chat and embedding model deployments. For a list of supported models, see Azure OpenAI models and regions for Foundry Agent Service.
- You must set the
scopevalue explicitly. Automatic population from the user identity specified in the request isn’t currently supported.
Quotas
- Maximum scopes per memory store: 100
- Maximum memories per scope: 10,000
- Search memories: 1,000 requests per minute
- Update memories: 1,000 requests per minute
Pricing
Memory is currently in public preview. Pricing and billing for memory and the Memory Store API can change during preview. You’re billed for usage of the underlying chat and embedding models you configure. For current pricing details, see Foundry Agent Service pricing.Related content
- Follow the end-to-end setup: Create and use memory in Foundry Agent Service.
- Confirm model availability: Azure OpenAI models and regions for Foundry Agent Service.
- Build a complete agent: Microsoft Foundry Quickstart.