Platform Architecture

Azure AI Foundry is built on a distributed, cloud-native architecture designed to provide scalable, secure, and reliable AI services. Understanding this architecture helps you make informed decisions about deployment patterns, security configurations, and integration strategies.

Architectural overview

Azure AI Foundry follows a multi-layered architecture that separates concerns while providing seamless integration between components. The platform is designed with enterprise requirements in mind: high availability, global scale, security, and compliance.

Core architectural principles

  • Separation of concerns: The platform separates data plane operations (model inference, training) from control plane operations (management, configuration) to ensure scalability and security.
  • Multi-tenancy with isolation: While the platform serves multiple customers, each tenant’s resources are logically and physically isolated to ensure security and performance.
  • Regional distribution: Services are distributed across Azure regions to provide low latency and compliance with data residency requirements.
  • Event-driven design: Many platform operations are asynchronous and event-driven, enabling scalability and resilience.

High-level architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Management Layer                         │
├─────────────────────────────────────────────────────────────────┤
│  Portal UI  │  REST APIs  │  SDKs  │  CLI  │  ARM Templates    │
├─────────────────────────────────────────────────────────────────┤
│                        Service Layer                           │
├─────────────────────────────────────────────────────────────────┤
│ Model Catalog │ Deployment │ Evaluation │ Safety │ Monitoring  │
├─────────────────────────────────────────────────────────────────┤
│                        Compute Layer                           │
├─────────────────────────────────────────────────────────────────┤
│  Serverless   │   Managed   │   Dedicated  │    Edge          │
│  Endpoints    │   Compute   │   Clusters   │  Deployment      │
├─────────────────────────────────────────────────────────────────┤
│                        Infrastructure Layer                     │
├─────────────────────────────────────────────────────────────────┤
│   Azure Infrastructure (Compute, Storage, Networking)          │
└─────────────────────────────────────────────────────────────────┘

Organizational structure

Hub and project model

Azure AI Foundry uses a hierarchical model that aligns with organizational structures and governance requirements:
Azure AI Foundry Hub
  • Central governance and resource management point
  • Shared infrastructure (compute, storage, networking)
  • Organization-wide policies and compliance settings
  • Centralized billing and cost management
  • Cross-project resource sharing and collaboration
Projects
  • Isolated development environments for specific AI initiatives
  • Project-specific datasets, models, and deployments
  • Team-based access control and permissions
  • Independent lifecycle management
  • Inherit governance from parent hub
This model enables organizations to maintain centralized control while providing teams with the autonomy needed for effective AI development.
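
For example, a team member can connect to a project programmatically. The sketch below assumes the azure-ai-projects Python SDK; the endpoint URL is a placeholder, and the exact constructor can vary between SDK versions:

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to one project under a hub; identity and governance settings are
# inherited from the hub's Microsoft Entra ID configuration.
project = AIProjectClient(
    endpoint="https://<your-hub>.services.ai.azure.com/api/projects/<your-project>",  # placeholder
    credential=DefaultAzureCredential(),
)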

Service architecture components

Model catalog and registry

The model catalog serves as a centralized repository for AI models from multiple sources:
Model Catalog Architecture:
┌─────────────────────────────────────────────────────────────┐
│                     Model Catalog                          │
├─────────────────────────────────────────────────────────────┤
│  Microsoft  │  OpenAI  │  Hugging  │  Partners │  Custom   │
│   Models    │  Models  │   Face    │  Models   │  Models   │
├─────────────────────────────────────────────────────────────┤
│              Model Metadata & Versioning                   │
├─────────────────────────────────────────────────────────────┤
│         Security Scanning & Compliance Validation          │
├─────────────────────────────────────────────────────────────┤
│              Deployment & Lifecycle Management             │
└─────────────────────────────────────────────────────────────┘
Key characteristics:
  • Versioning: Models are versioned with semantic versioning and metadata tracking
  • Security: All models undergo security scanning and vulnerability assessment
  • Compliance: Models are tagged with compliance and governance metadata
  • Performance metrics: Benchmarking data and performance characteristics are tracked

Deployment engine

The deployment engine provides multiple deployment patterns to accommodate different use cases (a minimal call sketch follows these lists):
Serverless API deployments:
  • Multi-tenant infrastructure with automatic scaling
  • Usage-based pricing model
  • Managed entirely by Azure AI Foundry
  • Ideal for variable or unpredictable workloads
Managed compute deployments:
  • Dedicated compute resources for consistent performance
  • Customer-controlled scaling and configuration
  • Enhanced isolation and security
  • Suitable for production workloads with predictable traffic
Real-time endpoints:
  • Low-latency inference for interactive applications
  • WebSocket support for streaming responses
  • Load balancing across multiple instances
  • Auto-scaling based on traffic patterns
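
As an illustration of the serverless pattern, the sketch below calls a pay-as-you-go endpoint with the azure-ai-inference Python SDK; the endpoint URL, key, and prompt are placeholders:

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Client for a serverless chat-completions endpoint; scaling and infrastructure
# are managed by the platform, so only the endpoint and key are needed.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),            # placeholder
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the deployment options."),
    ],
)
print(response.choices[0].message.content)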

Safety and governance layer

Built-in safety and governance capabilities are integrated throughout the platform:
Safety Architecture:
┌─────────────────────────────────────────────────────────────┐
│                Input Processing                             │
├─────────────────────────────────────────────────────────────┤
│ Content      │ Prompt      │ PII         │ Custom         │
│ Filtering    │ Injection   │ Detection   │ Policies       │
├─────────────────────────────────────────────────────────────┤
│                Model Inference                              │
├─────────────────────────────────────────────────────────────┤
│ Output       │ Bias        │ Toxicity    │ Quality        │
│ Filtering    │ Detection   │ Screening   │ Assurance      │
├─────────────────────────────────────────────────────────────┤
│                Monitoring & Alerting                        │
└─────────────────────────────────────────────────────────────┘
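
The input-filtering stage can also be exercised directly. The sketch below uses the Azure AI Content Safety SDK (azure-ai-contentsafety) to screen a prompt before inference; the endpoint and key are placeholders:

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

# Analyze a prompt before it reaches the model; each harm category is
# returned with a severity score that custom policies can act on.
result = client.analyze_text(AnalyzeTextOptions(text="<user prompt to screen>"))
for category in result.categories_analysis:
    print(category.category, category.severity)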

Network architecture

Public network configuration

In the default public network configuration, Azure AI Foundry services are accessible via public endpoints with built-in security measures:
Public Network Flow:
Internet → Azure Front Door → API Gateway → Service Endpoints
    ↓              ↓             ↓              ↓
  WAF/DDoS    Load Balancing  Authentication  Rate Limiting
Security layers:
  • Web Application Firewall (WAF) for common web attacks
  • DDoS protection against volumetric attacks
  • API gateway for authentication and rate limiting
  • Network security groups for traffic filtering
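
Clients should treat rate limiting as expected behavior. The sketch below, using only the Python standard library, retries a request when the gateway answers HTTP 429 and honors its Retry-After header:

import time
import urllib.error
import urllib.request

def call_with_retry(request: urllib.request.Request, max_retries: int = 5) -> bytes:
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(request) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < max_retries - 1:
                # Back off for the interval the gateway requested, falling
                # back to exponential backoff if the header is absent.
                delay = float(err.headers.get("Retry-After", 2 ** attempt))
                time.sleep(delay)
            else:
                raise
    raise RuntimeError("retries exhausted")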

Private network configuration

For enhanced security, Azure AI Foundry supports private networking through Azure Private Link:
Private Network Flow:
On-premises → ExpressRoute/VPN → Azure VNet → Private Endpoints → Services
     ↓              ↓              ↓              ↓              ↓
  Corporate      Encrypted      Network       Private IP    Service
  Firewall       Tunnel         Security      Addressing    Access
Benefits of private networking:
  • Traffic never traverses the public internet
  • Integration with existing corporate networks
  • Enhanced control over network routing and security
  • Compliance with strict data governance requirements
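
A quick way to confirm that private networking is in effect is to check what a service hostname resolves to from inside the VNet. The sketch below uses only the Python standard library; the private-link hostname is a placeholder:

import ipaddress
import socket

hostname = "<your-resource>.privatelink.services.ai.azure.com"  # placeholder
address = ipaddress.ip_address(socket.gethostbyname(hostname))
# From inside the VNet this should be a private RFC 1918 address;
# from the public internet the name should not resolve to your endpoint.
print(f"{hostname} -> {address} (private: {address.is_private})")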

Data flow and processing

Training data flow

Understanding how data flows through the platform during model training and fine-tuning:
Training Data Flow:
Data Sources → Data Validation → Preprocessing → Training → Model Registry
     ↓              ↓              ↓              ↓           ↓
  Multiple       Schema         Feature       Distributed  Versioned
  Formats        Validation     Engineering   Computing    Storage
Data processing characteristics:
  • Multi-format support: JSON, CSV, Parquet, images, audio, video
  • Validation pipelines: Automated data quality and schema validation
  • Preprocessing: Built-in data transformation and feature engineering
  • Distributed training: Automatic parallelization across compute resources
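
As a concrete example of the validation stage, the sketch below checks a JSONL training file line by line; the required keys are an assumption for illustration, not the platform's actual schema:

import json

REQUIRED_KEYS = {"messages"}  # assumed schema: each record holds a chat transcript

def validate_jsonl(path: str) -> list[str]:
    """Return human-readable validation errors, empty if the file is clean."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                errors.append(f"line {lineno}: invalid JSON ({err})")
                continue
            if not isinstance(record, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                errors.append(f"line {lineno}: missing keys {sorted(missing)}")
    return errors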

Inference data flow

How data flows during model inference and response generation:
Inference Flow:
Client Request → Authentication → Content Safety → Model → Response Processing → Client
      ↓              ↓              ↓              ↓            ↓              ↓
   API/SDK        Token/Key      Input Filter   GPU/CPU     Output Filter   JSON/Stream
Performance optimizations:
  • Caching: Intelligent caching of model weights and intermediate results
  • Batching: Automatic request batching for improved throughput
  • Load balancing: Traffic distribution across multiple model instances
  • Edge optimization: Regional deployment for reduced latency
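
Streaming is worth showing concretely, since it changes how clients consume the response. The sketch below assumes the azure-ai-inference SDK; endpoint, key, and prompt are placeholders:

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),            # placeholder
)

# With stream=True the call returns incremental updates instead of one body,
# letting interactive applications render tokens as they arrive.
stream = client.complete(
    stream=True,
    messages=[UserMessage(content="Explain the inference flow briefly.")],
)
for update in stream:
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="", flush=True)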

Security architecture

Identity and access management

Azure AI Foundry integrates with Microsoft Entra ID (formerly Azure Active Directory) for comprehensive identity management:
Identity Architecture:
Users/Apps → Microsoft Entra ID → RBAC → Azure AI Foundry Resources
     ↓              ↓              ↓              ↓
  Identity      Authentication  Authorization  Resource Access
  Provider      & MFA           Policies       Control
Security features:
  • Multi-factor authentication: Required for sensitive operations
  • Conditional access: Context-aware access policies
  • Privileged access: Just-in-time access for administrative operations
  • Audit logging: Comprehensive logging of all access and operations
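
In code, this typically reduces to acquiring a Microsoft Entra ID token through azure-identity. The sketch below uses the Cognitive Services scope commonly accepted by Azure AI endpoints:

from azure.identity import DefaultAzureCredential

# DefaultAzureCredential tries managed identity, environment variables,
# Azure CLI login, and other sources in a fixed order.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")
print(f"token acquired, expires at {token.expires_on}")  # sent as a Bearer token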

Data protection and encryption

Data is protected at multiple layers throughout the platform (a key-creation sketch follows these lists).
Encryption at rest:
  • Azure Storage Service Encryption (SSE) with customer-managed keys
  • Transparent Data Encryption (TDE) for databases
  • Key management through Azure Key Vault
Encryption in transit:
  • TLS 1.2+ for all API communications
  • VPN/ExpressRoute for private network connections
  • mTLS for service-to-service communication
Data isolation:
  • Tenant-specific encryption keys
  • Logical isolation in multi-tenant services
  • Physical isolation for dedicated deployments
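
For customer-managed keys, the key itself lives in Azure Key Vault. The sketch below creates an RSA key with azure-keyvault-keys; the vault URL and key name are placeholders, and associating the key with a storage or AI resource is a separate configuration step not shown here:

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

key_client = KeyClient(
    vault_url="https://<your-vault>.vault.azure.net",  # placeholder
    credential=DefaultAzureCredential(),
)
# Create the RSA key that the encryption-at-rest configuration will reference.
key = key_client.create_rsa_key("ai-foundry-cmk", size=3072)  # hypothetical name
print(key.id)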

Monitoring and observability

Platform telemetry

Azure AI Foundry provides comprehensive monitoring and observability capabilities:
Monitoring Architecture:
Applications → Azure Monitor → Log Analytics → Alerts & Dashboards
     ↓              ↓              ↓              ↓
   Metrics        Collection     Storage &      Visualization
   & Logs         & Routing      Analysis       & Alerting
Telemetry categories:
  • Performance metrics: Latency, throughput, error rates
  • Resource utilization: CPU, memory, GPU usage
  • Business metrics: API calls, token usage, costs
  • Security events: Authentication failures, policy violations

Application insights

Built-in integration with Azure Application Insights provides deep application monitoring:
  • Request tracing: End-to-end request tracking across services
  • Dependency mapping: Automatic discovery of service dependencies
  • Performance profiling: Code-level performance analysis
  • Custom telemetry: Application-specific metrics and events
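
Custom telemetry can be wired up with the azure-monitor-opentelemetry distro, which routes OpenTelemetry data into Application Insights. The connection string and span name below are placeholders:

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(connection_string="InstrumentationKey=<your-key>")  # placeholder

tracer = trace.get_tracer(__name__)
# The span records latency plus any custom attributes alongside the
# platform's built-in request and dependency telemetry.
with tracer.start_as_current_span("summarize-request") as span:  # hypothetical name
    span.set_attribute("model", "<deployment-name>")
    # ... call the model here ...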

Scalability and performance

Auto-scaling mechanisms

Azure AI Foundry implements multiple auto-scaling strategies (an illustrative scaling calculation follows these lists).
Horizontal scaling:
  • Automatic instance provisioning based on load
  • Regional load balancing for global applications
  • Compute cluster auto-scaling for training workloads
Vertical scaling:
  • Dynamic resource allocation within instances
  • GPU memory optimization for large models
  • Storage performance scaling based on I/O patterns
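
The arithmetic behind target-tracking horizontal scaling is simple to state. The sketch below illustrates the general technique, not the platform's internal policy:

import math

def desired_instances(current: int, observed: float, target: float,
                      minimum: int = 1, maximum: int = 20) -> int:
    # Scale the fleet in proportion to how far the observed load
    # (e.g. average GPU utilization) is from the target.
    desired = math.ceil(current * observed / target)
    return max(minimum, min(maximum, desired))

# 4 instances at 85% utilization against a 60% target -> scale out to 6.
print(desired_instances(current=4, observed=85.0, target=60.0))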

Performance optimization strategies

Model optimization:
  • Quantization and pruning for reduced model size
  • ONNX runtime optimization for inference performance
  • Hardware-specific optimizations (GPU, CPU, NPU)
Infrastructure optimization:
  • Intelligent workload placement across regions
  • Predictive scaling based on usage patterns
  • Resource pooling and sharing for efficiency
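
As one concrete optimization, an exported model can be served through ONNX Runtime with a hardware-specific execution provider. The model path below is a placeholder, and the provider list falls back from GPU to CPU:

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported or quantized model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which provider was actually selected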

Integration patterns

API-first design

Azure AI Foundry follows API-first design principles, enabling seamless integration:
Integration Patterns:
External Systems → REST APIs → Service Layer → AI Models
      ↓              ↓           ↓              ↓
   Various        Standard    Business       Model
   Clients        Protocols   Logic          Inference
Integration approaches:
  • Synchronous APIs: Real-time request/response patterns
  • Asynchronous APIs: Queue-based processing for long-running tasks
  • Streaming APIs: Real-time data streaming and processing
  • Webhook APIs: Event-driven integration patterns
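
The asynchronous pattern usually reduces to submit-then-poll. The sketch below uses only the Python standard library; the URLs, header name, and status values are hypothetical stand-ins for any long-running job API:

import json
import time
import urllib.request

def submit_and_wait(submit_url: str, payload: dict, poll_interval: float = 2.0) -> dict:
    req = urllib.request.Request(
        submit_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Assumed header: many Azure long-running operations return a URL to poll.
        operation_url = resp.headers["Operation-Location"]
    while True:
        with urllib.request.urlopen(operation_url) as resp:
            status = json.load(resp)
        if status.get("status") in ("succeeded", "failed"):  # assumed status values
            return status
        time.sleep(poll_interval)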

Ecosystem connectivity

The platform provides extensive connectivity to the broader Azure and Microsoft ecosystem.
Azure services integration:
  • Azure Cosmos DB for document storage
  • Azure AI Search (formerly Azure Cognitive Search) for vector search
  • Azure Service Bus for messaging
  • Azure Functions for serverless computing
Microsoft 365 integration:
  • SharePoint and OneDrive for document processing
  • Teams for collaborative AI applications
  • Outlook for email-based AI workflows
  • Power Platform for low-code AI solutions

Deployment architectures

Single-region deployment

Suitable for development, testing, and applications with regional user bases:
Single Region:
Load Balancer → App Gateway → AI Foundry Services → Storage
      ↓              ↓              ↓              ↓
   Traffic        SSL Termination  Processing     Data
   Distribution   & Routing       & Inference    Persistence

Multi-region deployment

Required for global applications, disaster recovery, and compliance:
Multi-Region:
Global Load Balancer → Regional Deployments → Data Replication
         ↓                    ↓                    ↓
   Traffic Routing      Independent Regions   Consistent Data
   & Failover          with Local Storage     Across Regions
Multi-region considerations:
  • Data residency: Ensuring data stays within required geographic boundaries
  • Latency optimization: Routing users to the nearest region
  • Disaster recovery: Automated failover between regions
  • Consistency: Managing data consistency across distributed deployments
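
Client-side failover can complement a global load balancer. The sketch below tries regional endpoints in order of preference and moves on when one is unreachable; the endpoints are placeholders:

import urllib.error
import urllib.request

REGIONAL_ENDPOINTS = [
    "https://<app>-eastus.example.net/health",  # placeholder, nearest region first
    "https://<app>-westeu.example.net/health",  # placeholder fallback
]

def first_healthy_response() -> bytes:
    last_error = None
    for endpoint in REGIONAL_ENDPOINTS:
        try:
            with urllib.request.urlopen(endpoint, timeout=3) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err  # region unavailable; try the next one
    raise RuntimeError(f"all regions failed: {last_error}")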

Future architecture evolution

Emerging technologies

Azure AI Foundry architecture continues to evolve with emerging technologies.
Edge computing:
  • Model deployment to edge devices and locations
  • Hybrid cloud-edge processing scenarios
  • Offline-capable AI applications
Quantum computing:
  • Integration with Azure Quantum services
  • Quantum-enhanced AI algorithms
  • Hybrid classical-quantum processing
Confidential computing:
  • Hardware-based trusted execution environments
  • Processing encrypted data without decryption
  • Enhanced privacy for sensitive AI workloads
Understanding these architectural principles helps you design applications that leverage Azure AI Foundry’s capabilities while meeting your specific requirements for scale, security, and performance.