Platform Architecture

Azure AI Foundry is built on a distributed, cloud-native architecture designed to provide scalable, secure, and reliable AI services. Understanding this architecture helps you make informed decisions about deployment patterns, security configurations, and integration strategies.

Architectural overview

Azure AI Foundry follows a multi-layered architecture that separates concerns while providing seamless integration between components. The platform is designed with enterprise requirements in mind: high availability, global scale, security, and compliance.

Core architectural principles

  • Separation of concerns: The platform separates data plane operations (model inference, training) from control plane operations (management, configuration) to ensure scalability and security.
  • Multi-tenancy with isolation: While the platform serves multiple customers, each tenant’s resources are logically and physically isolated to ensure security and performance.
  • Regional distribution: Services are distributed across Azure regions to provide low latency and compliance with data residency requirements.
  • Event-driven design: Many platform operations are asynchronous and event-driven, enabling scalability and resilience.

High-level architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Management Layer                         │
├─────────────────────────────────────────────────────────────────┤
│  Portal UI  │  REST APIs  │  SDKs  │  CLI  │  ARM Templates    │
├─────────────────────────────────────────────────────────────────┤
│                        Service Layer                           │
├─────────────────────────────────────────────────────────────────┤
│ Model Catalog │ Deployment │ Evaluation │ Safety │ Monitoring  │
├─────────────────────────────────────────────────────────────────┤
│                        Compute Layer                           │
├─────────────────────────────────────────────────────────────────┤
│  Serverless   │   Managed   │   Dedicated  │    Edge          │
│  Endpoints    │   Compute   │   Clusters   │  Deployment      │
├─────────────────────────────────────────────────────────────────┤
│                        Infrastructure Layer                     │
├─────────────────────────────────────────────────────────────────┤
│   Azure Infrastructure (Compute, Storage, Networking)          │
└─────────────────────────────────────────────────────────────────┘

Organizational structure

Hub and project model

Azure AI Foundry uses a hierarchical model that aligns with organizational structures and governance requirements:
Azure AI Foundry Hub
  • Central governance and resource management point
  • Shared infrastructure (compute, storage, networking)
  • Organization-wide policies and compliance settings
  • Centralized billing and cost management
  • Cross-project resource sharing and collaboration
Projects
  • Isolated development environments for specific AI initiatives
  • Project-specific datasets, models, and deployments
  • Team-based access control and permissions
  • Independent lifecycle management
  • Inherit governance from parent hub
This model enables organizations to maintain centralized control while providing teams with the autonomy needed for effective AI development.
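
For example, a team member can connect to a project programmatically. The sketch below assumes the azure-ai-projects Python SDK; the endpoint URL is a placeholder, and the exact constructor can vary between SDK versions:

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to one project under a hub; identity and governance settings are
# inherited from the hub's Microsoft Entra ID configuration.
project = AIProjectClient(
    endpoint="https://<your-hub>.services.ai.azure.com/api/projects/<your-project>",  # placeholder
    credential=DefaultAzureCredential(),
)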

Service architecture components

Model catalog and registry

The model catalog serves as a centralized repository for AI models from multiple sources:
Model Catalog Architecture:
┌─────────────────────────────────────────────────────────────┐
│                     Model Catalog                          │
├─────────────────────────────────────────────────────────────┤
│  Microsoft  │  OpenAI  │  Hugging  │  Partners │  Custom   │
│   Models    │  Models  │   Face    │  Models   │  Models   │
├─────────────────────────────────────────────────────────────┤
│              Model Metadata & Versioning                   │
├─────────────────────────────────────────────────────────────┤
│         Security Scanning & Compliance Validation          │
├─────────────────────────────────────────────────────────────┤
│              Deployment & Lifecycle Management             │
└─────────────────────────────────────────────────────────────┘
Key characteristics:
  • Versioning: Models are versioned with semantic versioning and metadata tracking
  • Security: All models undergo security scanning and vulnerability assessment
  • Compliance: Models are tagged with compliance and governance metadata
  • Performance metrics: Benchmarking data and performance characteristics are tracked

Deployment engine

The deployment engine provides multiple deployment patterns to accommodate different use cases (a minimal call sketch follows these lists):
Serverless API deployments:
  • Multi-tenant infrastructure with automatic scaling
  • Usage-based pricing model
  • Managed entirely by Azure AI Foundry
  • Ideal for variable or unpredictable workloads
Managed compute deployments:
  • Dedicated compute resources for consistent performance
  • Customer-controlled scaling and configuration
  • Enhanced isolation and security
  • Suitable for production workloads with predictable traffic
Real-time endpoints:
  • Low-latency inference for interactive applications
  • WebSocket support for streaming responses
  • Load balancing across multiple instances
  • Auto-scaling based on traffic patterns
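
As an illustration of the serverless pattern, the sketch below calls a pay-as-you-go endpoint with the azure-ai-inference Python SDK; the endpoint URL, key, and prompt are placeholders:

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Client for a serverless chat-completions endpoint; scaling and infrastructure
# are managed by the platform, so only the endpoint and key are needed.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),            # placeholder
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the deployment options."),
    ],
)
print(response.choices[0].message.content)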

Safety and governance layer

Built-in safety and governance capabilities are integrated throughout the platform:
Safety Architecture:
┌─────────────────────────────────────────────────────────────┐
│                Input Processing                             │
├─────────────────────────────────────────────────────────────┤
│ Content      │ Prompt      │ PII         │ Custom         │
│ Filtering    │ Injection   │ Detection   │ Policies       │
├─────────────────────────────────────────────────────────────┤
│                Model Inference                              │
├─────────────────────────────────────────────────────────────┤
│ Output       │ Bias        │ Toxicity    │ Quality        │
│ Filtering    │ Detection   │ Screening   │ Assurance      │
├─────────────────────────────────────────────────────────────┤
│                Monitoring & Alerting                        │
└─────────────────────────────────────────────────────────────┘
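
The input-filtering stage can also be exercised directly. The sketch below uses the Azure AI Content Safety SDK (azure-ai-contentsafety) to screen a prompt before inference; the endpoint and key are placeholders:

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

# Analyze a prompt before it reaches the model; each harm category is
# returned with a severity score that custom policies can act on.
result = client.analyze_text(AnalyzeTextOptions(text="<user prompt to screen>"))
for category in result.categories_analysis:
    print(category.category, category.severity)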

Network architecture

Public network configuration

In the default public network configuration, Azure AI Foundry services are accessible via public endpoints with built-in security measures:
Public Network Flow:
Internet → Azure Front Door → API Gateway → Service Endpoints
    ↓              ↓             ↓              ↓
  WAF/DDoS    Load Balancing  Authentication  Rate Limiting
Security layers:
  • Web Application Firewall (WAF) for common web attacks
  • DDoS protection against volumetric attacks
  • API gateway for authentication and rate limiting
  • Network security groups for traffic filtering
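
Clients should treat rate limiting as expected behavior. The sketch below, using only the Python standard library, retries a request when the gateway answers HTTP 429 and honors its Retry-After header:

import time
import urllib.error
import urllib.request

def call_with_retry(request: urllib.request.Request, max_retries: int = 5) -> bytes:
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(request) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < max_retries - 1:
                # Back off for the interval the gateway requested, falling
                # back to exponential backoff if the header is absent.
                delay = float(err.headers.get("Retry-After", 2 ** attempt))
                time.sleep(delay)
            else:
                raise
    raise RuntimeError("retries exhausted")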

Private network configuration

For enhanced security, Azure AI Foundry supports private networking through Azure Private Link:
Private Network Flow:
On-premises → ExpressRoute/VPN → Azure VNet → Private Endpoints → Services
     ↓              ↓              ↓              ↓              ↓
  Corporate      Encrypted      Network       Private IP    Service
  Firewall       Tunnel         Security      Addressing    Access
Benefits of private networking:
  • Traffic never traverses the public internet
  • Integration with existing corporate networks
  • Enhanced control over network routing and security
  • Compliance with strict data governance requirements
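
A quick way to confirm that private networking is in effect is to check what a service hostname resolves to from inside the VNet. The sketch below uses only the Python standard library; the private-link hostname is a placeholder:

import ipaddress
import socket

hostname = "<your-resource>.privatelink.services.ai.azure.com"  # placeholder
address = ipaddress.ip_address(socket.gethostbyname(hostname))
# From inside the VNet this should be a private RFC 1918 address;
# from the public internet the name should not resolve to your endpoint.
print(f"{hostname} -> {address} (private: {address.is_private})")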

Data flow and processing

Training data flow

Understanding how data flows through the platform during model training and fine-tuning:
Training Data Flow:
Data Sources → Data Validation → Preprocessing → Training → Model Registry
     ↓              ↓              ↓              ↓           ↓
  Multiple       Schema         Feature       Distributed  Versioned
  Formats        Validation     Engineering   Computing    Storage
Data processing characteristics:
  • Multi-format support: JSON, CSV, Parquet, images, audio, video
  • Validation pipelines: Automated data quality and schema validation
  • Preprocessing: Built-in data transformation and feature engineering
  • Distributed training: Automatic parallelization across compute resources
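
As a concrete example of the validation stage, the sketch below checks a JSONL training file line by line; the required keys are an assumption for illustration, not the platform's actual schema:

import json

REQUIRED_KEYS = {"messages"}  # assumed schema: each record holds a chat transcript

def validate_jsonl(path: str) -> list[str]:
    """Return human-readable validation errors, empty if the file is clean."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                errors.append(f"line {lineno}: invalid JSON ({err})")
                continue
            if not isinstance(record, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                errors.append(f"line {lineno}: missing keys {sorted(missing)}")
    return errors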

Inference data flow

How data flows during model inference and response generation:
Inference Flow:
Client Request → Authentication → Content Safety → Model → Response Processing → Client
      ↓              ↓              ↓              ↓            ↓              ↓
   API/SDK        Token/Key      Input Filter   GPU/CPU     Output Filter   JSON/Stream
Performance optimizations:
  • Caching: Intelligent caching of model weights and intermediate results
  • Batching: Automatic request batching for improved throughput
  • Load balancing: Traffic distribution across multiple model instances
  • Edge optimization: Regional deployment for reduced latency
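
Streaming is worth showing concretely, since it changes how clients consume the response. The sketch below assumes the azure-ai-inference SDK; endpoint, key, and prompt are placeholders:

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),            # placeholder
)

# With stream=True the call returns incremental updates instead of one body,
# letting interactive applications render tokens as they arrive.
stream = client.complete(
    stream=True,
    messages=[UserMessage(content="Explain the inference flow briefly.")],
)
for update in stream:
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="", flush=True)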

Security architecture

Identity and access management

Azure AI Foundry integrates with Microsoft Entra ID (formerly Azure Active Directory) for comprehensive identity management:
Identity Architecture:
Users/Apps → Microsoft Entra ID → RBAC → Azure AI Foundry Resources
     ↓              ↓              ↓              ↓
  Identity      Authentication  Authorization  Resource Access
  Provider      & MFA           Policies       Control
Security features:
  • Multi-factor authentication: Required for sensitive operations
  • Conditional access: Context-aware access policies
  • Privileged access: Just-in-time access for administrative operations
  • Audit logging: Comprehensive logging of all access and operations
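
In code, this typically reduces to acquiring a Microsoft Entra ID token through azure-identity. The sketch below uses the Cognitive Services scope commonly accepted by Azure AI endpoints:

from azure.identity import DefaultAzureCredential

# DefaultAzureCredential tries managed identity, environment variables,
# Azure CLI login, and other sources in a fixed order.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")
print(f"token acquired, expires at {token.expires_on}")  # sent as a Bearer token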

Data protection and encryption

Data is protected at multiple layers throughout the platform (a key-creation sketch follows these lists).
Encryption at rest:
  • Azure Storage Service Encryption (SSE) with customer-managed keys
  • Transparent Data Encryption (TDE) for databases
  • Key management through Azure Key Vault
Encryption in transit:
  • TLS 1.2+ for all API communications
  • VPN/ExpressRoute for private network connections
  • mTLS for service-to-service communication
Data isolation:
  • Tenant-specific encryption keys
  • Logical isolation in multi-tenant services
  • Physical isolation for dedicated deployments
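
For customer-managed keys, the key itself lives in Azure Key Vault. The sketch below creates an RSA key with azure-keyvault-keys; the vault URL and key name are placeholders, and associating the key with a storage or AI resource is a separate configuration step not shown here:

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

key_client = KeyClient(
    vault_url="https://<your-vault>.vault.azure.net",  # placeholder
    credential=DefaultAzureCredential(),
)
# Create the RSA key that the encryption-at-rest configuration will reference.
key = key_client.create_rsa_key("ai-foundry-cmk", size=3072)  # hypothetical name
print(key.id)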

Monitoring and observability

Platform telemetry

Azure AI Foundry provides comprehensive monitoring and observability capabilities:
Monitoring Architecture:
Applications → Azure Monitor → Log Analytics → Alerts & Dashboards
     ↓              ↓              ↓              ↓
   Metrics        Collection     Storage &      Visualization
   & Logs         & Routing      Analysis       & Alerting
Telemetry categories:
  • Performance metrics: Latency, throughput, error rates
  • Resource utilization: CPU, memory, GPU usage
  • Business metrics: API calls, token usage, costs
  • Security events: Authentication failures, policy violations

Application insights

Built-in integration with Azure Application Insights provides deep application monitoring:
  • Request tracing: End-to-end request tracking across services
  • Dependency mapping: Automatic discovery of service dependencies
  • Performance profiling: Code-level performance analysis
  • Custom telemetry: Application-specific metrics and events
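
Custom telemetry can be wired up with the azure-monitor-opentelemetry distro, which routes OpenTelemetry data into Application Insights. The connection string and span name below are placeholders:

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(connection_string="InstrumentationKey=<your-key>")  # placeholder

tracer = trace.get_tracer(__name__)
# The span records latency plus any custom attributes alongside the
# platform's built-in request and dependency telemetry.
with tracer.start_as_current_span("summarize-request") as span:  # hypothetical name
    span.set_attribute("model", "<deployment-name>")
    # ... call the model here ...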

Scalability and performance

Auto-scaling mechanisms

Azure AI Foundry implements multiple auto-scaling strategies (an illustrative scaling calculation follows these lists).
Horizontal scaling:
  • Automatic instance provisioning based on load
  • Regional load balancing for global applications
  • Compute cluster auto-scaling for training workloads
Vertical scaling:
  • Dynamic resource allocation within instances
  • GPU memory optimization for large models
  • Storage performance scaling based on I/O patterns
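
The arithmetic behind target-tracking horizontal scaling is simple to state. The sketch below illustrates the general technique, not the platform's internal policy:

import math

def desired_instances(current: int, observed: float, target: float,
                      minimum: int = 1, maximum: int = 20) -> int:
    # Scale the fleet in proportion to how far the observed load
    # (e.g. average GPU utilization) is from the target.
    desired = math.ceil(current * observed / target)
    return max(minimum, min(maximum, desired))

# 4 instances at 85% utilization against a 60% target -> scale out to 6.
print(desired_instances(current=4, observed=85.0, target=60.0))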

Performance optimization strategies

Model optimization:
  • Quantization and pruning for reduced model size
  • ONNX runtime optimization for inference performance
  • Hardware-specific optimizations (GPU, CPU, NPU)
Infrastructure optimization:
  • Intelligent workload placement across regions
  • Predictive scaling based on usage patterns
  • Resource pooling and sharing for efficiency
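
As one concrete optimization, an exported model can be served through ONNX Runtime with a hardware-specific execution provider. The model path below is a placeholder, and the provider list falls back from GPU to CPU:

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported or quantized model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which provider was actually selected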

Integration patterns

API-first design

Azure AI Foundry follows API-first design principles, enabling seamless integration:
Integration Patterns:
External Systems → REST APIs → Service Layer → AI Models
      ↓              ↓           ↓              ↓
   Various        Standard    Business       Model
   Clients        Protocols   Logic          Inference
Integration approaches:
  • Synchronous APIs: Real-time request/response patterns
  • Asynchronous APIs: Queue-based processing for long-running tasks
  • Streaming APIs: Real-time data streaming and processing
  • Webhook APIs: Event-driven integration patterns
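
The asynchronous pattern usually reduces to submit-then-poll. The sketch below uses only the Python standard library; the URLs, header name, and status values are hypothetical stand-ins for any long-running job API:

import json
import time
import urllib.request

def submit_and_wait(submit_url: str, payload: dict, poll_interval: float = 2.0) -> dict:
    req = urllib.request.Request(
        submit_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Assumed header: many Azure long-running operations return a URL to poll.
        operation_url = resp.headers["Operation-Location"]
    while True:
        with urllib.request.urlopen(operation_url) as resp:
            status = json.load(resp)
        if status.get("status") in ("succeeded", "failed"):  # assumed status values
            return status
        time.sleep(poll_interval)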

Ecosystem connectivity

The platform provides extensive connectivity to the broader Azure and Microsoft ecosystem.
Azure services integration:
  • Azure Cosmos DB for document storage
  • Azure AI Search (formerly Azure Cognitive Search) for vector search
  • Azure Service Bus for messaging
  • Azure Functions for serverless computing
Microsoft 365 integration:
  • SharePoint and OneDrive for document processing
  • Teams for collaborative AI applications
  • Outlook for email-based AI workflows
  • Power Platform for low-code AI solutions

Deployment architectures

Single-region deployment

Suitable for development, testing, and applications with regional user bases:
Single Region:
Load Balancer → App Gateway → AI Foundry Services → Storage
      ↓              ↓              ↓              ↓
   Traffic        SSL Termination  Processing     Data
   Distribution   & Routing       & Inference    Persistence

Multi-region deployment

Required for global applications, disaster recovery, and compliance:
Multi-Region:
Global Load Balancer → Regional Deployments → Data Replication
         ↓                    ↓                    ↓
   Traffic Routing      Independent Regions   Consistent Data
   & Failover          with Local Storage     Across Regions
Multi-region considerations:
  • Data residency: Ensuring data stays within required geographic boundaries
  • Latency optimization: Routing users to the nearest region
  • Disaster recovery: Automated failover between regions
  • Consistency: Managing data consistency across distributed deployments
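
Client-side failover can complement a global load balancer. The sketch below tries regional endpoints in order of preference and moves on when one is unreachable; the endpoints are placeholders:

import urllib.error
import urllib.request

REGIONAL_ENDPOINTS = [
    "https://<app>-eastus.example.net/health",  # placeholder, nearest region first
    "https://<app>-westeu.example.net/health",  # placeholder fallback
]

def first_healthy_response() -> bytes:
    last_error = None
    for endpoint in REGIONAL_ENDPOINTS:
        try:
            with urllib.request.urlopen(endpoint, timeout=3) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err  # region unavailable; try the next one
    raise RuntimeError(f"all regions failed: {last_error}")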

Future architecture evolution

Emerging technologies

Azure AI Foundry architecture continues to evolve with emerging technologies.
Edge computing:
  • Model deployment to edge devices and locations
  • Hybrid cloud-edge processing scenarios
  • Offline-capable AI applications
Quantum computing:
  • Integration with Azure Quantum services
  • Quantum-enhanced AI algorithms
  • Hybrid classical-quantum processing
Confidential computing:
  • Hardware-based trusted execution environments
  • Processing encrypted data without decryption
  • Enhanced privacy for sensitive AI workloads
Understanding these architectural principles helps you design applications that leverage Azure AI Foundry’s capabilities while meeting your specific requirements for scale, security, and performance.