Use langchain-azure-ai and Foundry Memory to add long-term memory to your
applications. In this article, you create a memory-backed chain,
store user preferences, recall them in a new session, and run direct memory
queries.
This pattern works for both LangChain and LangGraph applications. The core idea
is to keep short-term chat history in your runtime and use Foundry Memory as the
long-term store for user-level context.
Foundry Memory is designed for long-term memory. Keep short-term, turn-by-turn
state in your LangChain or LangGraph runtime.
Prerequisites
- An Azure subscription. Create one for free.
- A Foundry project.
- A deployed Microsoft Foundry chat model for memory retrieval.
  - This tutorial uses “gpt-4.1”.
- A deployed chat model and embedding model for the memory store.
  - This tutorial uses text-embedding-3-large as the embedding model.
- Python 3.10 or later.
- Azure CLI signed in (az login) so that DefaultAzureCredential can authenticate. Your signed-in account needs the Azure AI Developer role.
Configure your environment
Install the required packages for this tutorial. Use langchain-azure-ai for
LangChain and LangGraph integration, azure-ai-projects for memory store
management, and azure-identity for authentication.
Understand the memory model
Foundry Memory stores and retrieves two long-term memory types:
- User profile memory: stable user facts and preferences, such as preferred name or dietary constraints.
- Chat summary memory: distilled summaries of prior discussion topics.
Use identifiers to scope memory:
- You can use a user ID as the stable identity for long-term memory. Keep it the same across sessions for the same user.
- You can use a session ID as the short-term conversation identity. Change it for each chat session.
- You can use a resource ID as a stable identifier for long-term memory shared across multiple users.
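A minimal sketch of the user/session identifier scheme, using only the Python standard library. MemoryScope and start_session are hypothetical names for illustration, not part of langchain-azure-ai; the point is that the user ID stays fixed while every new session gets a fresh ID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryScope:
    """Hypothetical helper that groups the identifiers described above."""

    user_id: str  # stable across sessions: keys long-term memory
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # new per chat session


def start_session(user_id: str) -> MemoryScope:
    # Reuse the same user_id, but mint a fresh session_id for every new chat.
    return MemoryScope(user_id=user_id)


session_a = start_session("user-123")
session_b = start_session("user-123")
```

Both sessions share long-term memory through the common user_id, while their short-term context stays separate.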
Create the memory store
Before you start, create a memory store. For this operation, use the Microsoft Foundry projects SDK, azure-ai-projects.
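The exact azure-ai-projects calls are not reproduced here. As a planning aid, the sketch below gathers the inputs that the prerequisites name (a chat deployment, an embedding deployment, and which long-term memory types to extract) into a hypothetical MemoryStoreSettings dataclass and validates them before you call the SDK. The class, its field names, and the environment variable names are assumptions, not azure-ai-projects types; check the SDK reference for the actual memory store definition objects.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MemoryStoreSettings:
    """Hypothetical settings holder; not an azure-ai-projects type."""

    project_endpoint: str
    store_name: str
    chat_model: str
    embedding_model: str
    user_profile_enabled: bool = True
    chat_summary_enabled: bool = True

    def validate(self) -> None:
        # The memory store needs both a chat deployment and an embedding deployment.
        if not self.chat_model or not self.embedding_model:
            raise ValueError("Both chat_model and embedding_model are required.")
        # At least one long-term memory type should be extracted.
        if not (self.user_profile_enabled or self.chat_summary_enabled):
            raise ValueError("Enable user profile memory, chat summary memory, or both.")


def settings_from_env(env: Mapping[str, str]) -> MemoryStoreSettings:
    # Environment variable names are illustrative assumptions.
    settings = MemoryStoreSettings(
        project_endpoint=env["AZURE_AI_PROJECT_ENDPOINT"],
        store_name=env.get("MEMORY_STORE_NAME", "langchain-memory-store"),
        chat_model=env.get("MEMORY_CHAT_MODEL", "gpt-4.1"),
        embedding_model=env.get("MEMORY_EMBEDDING_MODEL", "text-embedding-3-large"),
    )
    settings.validate()
    return settings
```

In an application you would call settings_from_env(os.environ) and pass the validated values to the SDK's memory store creation call.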
Using memory in LangGraph and LangChain
Foundry Memory integrates with LangGraph and LangChain through two classes:
- langchain_azure_ai.chat_message_history.AzureAIMemoryChatMessageHistory creates a memory-backed chat history.
- langchain_azure_ai.retrievers.AzureAIMemoryRetriever retrieves memories from the chat message history.
Two retrieval patterns are common:
- Retrieve user profile memory early in a conversation to personalize responses.
- Retrieve chat summary memory based on the current turn to recover relevant prior context.
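The two retrieval patterns can be sketched with a hypothetical in-memory stand-in for the memory store. InMemoryStandIn, MemoryItem, and the word-overlap scoring are assumptions made for illustration only; Foundry Memory ranks results with the embedding model, not by shared words.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    kind: str  # "user_profile" or "chat_summary"
    content: str


def overlap_score(query: str, text: str) -> int:
    # Toy relevance: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


class InMemoryStandIn:
    """Hypothetical stand-in for a memory store; for illustration only."""

    def __init__(self, items: list[MemoryItem]) -> None:
        self.items = items

    def user_profile(self) -> list[str]:
        # Profile memory doesn't depend on the current turn, so fetch it once per session.
        return [m.content for m in self.items if m.kind == "user_profile"]

    def chat_summaries(self, turn: str, k: int = 2) -> list[str]:
        # Summary memory is ranked against the current turn and trimmed to k results.
        scored = [(overlap_score(turn, m.content), m.content)
                  for m in self.items if m.kind == "chat_summary"]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [content for _, content in relevant[:k]]


store = InMemoryStandIn([
    MemoryItem("user_profile", "Prefers to be called Sam"),
    MemoryItem("chat_summary", "Discussed a vegetarian dinner menu"),
    MemoryItem("chat_summary", "Asked about hiking trails near Seattle"),
])
profile = store.user_profile()  # once, at session start
context = store.chat_summaries("Suggest a dinner menu")  # on every turn
```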
Example: Add a session-aware memory layer
In this example, you build a single LangChain runnable that retrieves relevant long-term memory, injects it into the prompt, and runs the model with both short-term chat history and long-term memory.
Create the chat message history
This example uses a stable user_id as the memory scope and a session_id for
per-session conversation context.
The example creates the memory-backed objects for each (user_id, session_id) pair
and caches them in session_histories, so retrieval state survives across turns in
the same session and the objects aren't recreated on every turn.
For this walkthrough, update_delay=0 makes memory updates visible immediately.
In production, keep the default delay unless you specifically need instant
extraction.
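RunnableWithMessageHistory obtains the history for each call from a factory function you supply; with its history_factory_config option, that factory can receive both user_id and session_id. The caching idea behind session_histories can be sketched without Azure dependencies. StandInHistory is a hypothetical placeholder for AzureAIMemoryChatMessageHistory (whose constructor arguments are not shown here), and it does not implement LangChain's BaseChatMessageHistory interface.

```python
from dataclasses import dataclass, field


@dataclass
class StandInHistory:
    """Hypothetical stand-in for AzureAIMemoryChatMessageHistory."""

    user_id: str
    session_id: str
    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


# Cache keyed by (user_id, session_id), mirroring session_histories in the text.
session_histories: dict[tuple[str, str], StandInHistory] = {}


def get_session_history(user_id: str, session_id: str) -> StandInHistory:
    key = (user_id, session_id)
    if key not in session_histories:
        # In the real example, the memory-backed history (with its store,
        # scope, and update_delay settings) would be created here once.
        session_histories[key] = StandInHistory(user_id, session_id)
    return session_histories[key]
```

Returning the same object for repeated calls is what lets state accumulate across turns; a new session_id yields a fresh history.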
Compose the runnable with memory retrieval
Create a runnable that implements the loop, and wrap it with RunnableWithMessageHistory so chat history
and long-term memory work together.
This pattern keeps your prompt deterministic: every turn explicitly includes
retrieved memory in the Memories section.
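To make the "deterministic prompt" idea concrete, here is a stdlib-only sketch of the prompt assembly step. The function names and the system wording are hypothetical; the (role, content) tuples are similar to the tuple form that LangChain's ChatPromptTemplate.from_messages accepts, but this code does not call LangChain.

```python
def format_memories(memories: list[str]) -> str:
    # Always emit the section, even when nothing was retrieved,
    # so every turn's prompt has the same shape.
    if not memories:
        return "Memories:\n(none)"
    return "Memories:\n" + "\n".join(f"- {m}" for m in memories)


def build_messages(memories: list[str], history: list[tuple[str, str]],
                   question: str) -> list[tuple[str, str]]:
    # Order: system prompt with memories, then short-term history, then the new turn.
    system = ("You are a helpful assistant. Use the memories below when relevant.\n\n"
              + format_memories(memories))
    return [("system", system), *history, ("human", question)]


messages = build_messages(
    ["Prefers metric units"],
    [("human", "Hi"), ("ai", "Hello! How can I help?")],
    "How far is a 10 km run in meters?",
)
```

Emitting a placeholder when no memories are found keeps the prompt structure stable, which makes model behavior easier to compare across turns.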
Run a practical cross-session scenario
This scenario shows the full value of long-term memory:
- In session A, the user shares preferences.
- In session B, the app recalls those preferences automatically.
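The cross-session flow can be simulated with two hypothetical dictionaries: long-term memory keyed by user_id only, and short-term history keyed by (user_id, session_id). The "I prefer" extraction rule is a toy assumption; Foundry Memory extracts preferences with a model.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two memory layers.
long_term: dict[str, list[str]] = defaultdict(list)
short_term: dict[tuple[str, str], list[str]] = defaultdict(list)


def chat_turn(user_id: str, session_id: str, text: str) -> str:
    short_term[(user_id, session_id)].append(text)
    # Toy extraction step: store explicit preferences as long-term memory.
    if text.lower().startswith("i prefer"):
        long_term[user_id].append(text)
        return "Noted."
    recalled = "; ".join(long_term[user_id]) or "nothing yet"
    return f"Recalled for {user_id}: {recalled}"


# Session A: the user shares a preference.
chat_turn("user-123", "session-a", "I prefer vegetarian recipes")
# Session B: new session_id and empty short-term history, same user_id.
reply = chat_turn("user-123", "session-b", "What should I cook tonight?")
```

Session B starts with no short-term history, yet the reply still reflects the preference because long-term memory is keyed by the stable user ID.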
Example: Query memory directly for non-chat use cases
Use an ad-hoc retriever when you need direct memory reads outside the conversation pipeline, for example in personalization middleware, profile cards or profile inspection tools, or workflow routing. The retriever returns at most k results, sorted by relevance.
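LangChain retrievers are called with retriever.invoke(query) and return Document objects. The stdlib sketch below only illustrates the top-k, relevance-ordered shape of a direct query; query_memories is a hypothetical helper, and its bag-of-words cosine similarity is a stand-in for the embedding model the memory store actually uses.

```python
import math
from collections import Counter


def _vector(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real memory store uses an embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_memories(memories: list[str], query: str, k: int = 3) -> list[tuple[str, float]]:
    """Return up to k memories with a nonzero score, most relevant first."""
    q = _vector(query)
    scored = [(m, _cosine(q, _vector(m))) for m in memories]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


profile_card = query_memories(
    ["Name is Sam", "Is vegetarian", "Likes hiking on weekends", "Prefers metric units"],
    "is sam vegetarian",
    k=2,
)
```

Here "Is vegetarian" scores about 0.82 and "Name is Sam" about 0.67, so they come back in that order.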
Example: Use memory in graphs
LangGraph uses the same conceptual pattern:
- Keep user_id stable for long-term memory.
- Use thread_id (or equivalent) for short-term thread context.
- Retrieve memory before calling the model node.
In a StateGraph, inject retrieval in your model node and
append the memory text to your model input. Another common strategy is to use
a pre-model hook.
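The retrieve-then-model flow can be sketched as two plain node functions. In a real StateGraph you would register them with add_node and connect them with add_edge; here run_in_order calls them in sequence and merges each returned partial update into the state, which mirrors how nodes update graph state. GraphState, fake_search, and the stub reply are hypothetical and stand in for your memory retriever and chat model.

```python
from typing import Callable, TypedDict


class GraphState(TypedDict):
    user_id: str  # stable: scopes long-term memory
    thread_id: str  # per thread: scopes short-term context
    messages: list[tuple[str, str]]
    memory_text: str
    last_prompt: str


def make_retrieve_node(search: Callable[[str, str], list[str]]) -> Callable[[GraphState], dict]:
    # `search(user_id, query)` is a hypothetical adapter around your memory retriever.
    def retrieve_memory(state: GraphState) -> dict:
        query = state["messages"][-1][1]
        return {"memory_text": "\n".join(search(state["user_id"], query))}

    return retrieve_memory


def model_node(state: GraphState) -> dict:
    # Put the retrieved memory text into the model input.
    prompt = f"Memories:\n{state['memory_text']}\n\nUser: {state['messages'][-1][1]}"
    reply = "stub reply"  # placeholder for a real chat model call
    return {"last_prompt": prompt, "messages": state["messages"] + [("ai", reply)]}


def run_in_order(state: GraphState, nodes: list[Callable[[GraphState], dict]]) -> GraphState:
    # Emulates retrieve -> model edges: each node's partial update is merged in.
    for node in nodes:
        state = {**state, **node(state)}
    return state


def fake_search(user_id: str, query: str) -> list[str]:
    # Hypothetical memory lookup that ignores the query.
    return ["Prefers metric units"] if user_id == "user-123" else []


final = run_in_order(
    {"user_id": "user-123", "thread_id": "thread-1",
     "messages": [("human", "How long is a marathon?")],
     "memory_text": "", "last_prompt": ""},
    [make_retrieve_node(fake_search), model_node],
)
```

Keeping retrieval in its own node (or hook) leaves the model node unaware of where memories come from, which makes it easy to swap the stand-in search for the real retriever.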
Understand preview limits and operational guidance
Before moving to production, validate these constraints:
- Memory is in preview and behavior can change.
- Memory requires compatible chat and embedding deployments.
- Quotas apply per store and per scope, including search and update request rates.
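Because search and update rates are limited per store and per scope, callers that hit a quota usually benefit from retrying with exponential backoff. The sketch below is a generic, stdlib-only wrapper under that assumption; QuotaExceededError is a hypothetical exception you would map your client's throttling (HTTP 429) error to. Azure SDK clients typically include their own retry policy, so check your client's behavior before layering another one on top.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical error type: map your client's throttling (HTTP 429) error to it."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on QuotaExceededError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except QuotaExceededError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Base delay doubles (0.5s, 1s, 2s, ...) plus up to 100% random jitter.
            sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))
    raise RuntimeError("unreachable")
```

The injectable sleep parameter keeps the wrapper testable without real waiting.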