Monitor Performance
Learn how to set up comprehensive monitoring for your Azure AI Foundry applications to ensure optimal performance, track usage patterns, and proactively identify issues before they impact users.
Overview
Effective monitoring involves tracking multiple layers:
- Application performance - Response times, error rates, availability
- AI model performance - Token usage, model latency, quality metrics
- Infrastructure performance - CPU, memory, network, scaling events
- Business metrics - User engagement, cost per interaction, conversion rates
Setting up Application Insights
Application Insights provides comprehensive application performance monitoring for Azure applications.
Basic setup
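The setup itself is typically a one-line call to `configure_azure_monitor` from the `azure-monitor-opentelemetry` package, passing your connection string. As a library-free sketch of the surrounding wiring (the function and config keys below are illustrative, not part of any SDK):

```python
import logging
import os

# In production you would call configure_azure_monitor() from the
# azure-monitor-opentelemetry package with this connection string.
# This stdlib-only sketch shows the surrounding wiring.

def build_telemetry_config(env: dict) -> dict:
    """Read the Application Insights connection string and sampling ratio."""
    conn = env.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "")
    if not conn:
        logging.warning("No connection string set; telemetry disabled.")
    return {
        "connection_string": conn,
        "enabled": bool(conn),
        # Sample a fraction of traces to control ingestion cost.
        "sampling_ratio": float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0")),
    }

config = build_telemetry_config(os.environ)
```

Reading the connection string from the environment (rather than hard-coding it) keeps the same build deployable across dev, staging, and production.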
Custom telemetry for AI operations
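A minimal sketch of collecting AI-specific telemetry (stdlib only; in a real app these records would be emitted as Application Insights custom events, and the class and field names here are illustrative):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class AITelemetry:
    """Collect per-model latency and token counts for each completion."""
    def __init__(self):
        self.events = []
        self.tokens_by_model = defaultdict(int)

    @contextmanager
    def track_completion(self, model: str):
        record = {"model": model, "prompt_tokens": 0, "completion_tokens": 0}
        start = time.perf_counter()
        try:
            yield record
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            self.tokens_by_model[model] += (
                record["prompt_tokens"] + record["completion_tokens"]
            )
            self.events.append(record)

telemetry = AITelemetry()
with telemetry.track_completion("gpt-4o") as rec:
    # Token counts would come from the model response's usage field.
    rec["prompt_tokens"], rec["completion_tokens"] = 120, 45
```

The context manager guarantees latency is recorded even when the model call raises, so error cases still produce telemetry.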
Track specific AI-related metrics such as token usage, per-model latency, and response quality alongside the standard request telemetry.
Monitoring dashboards
Create Application Insights workbooks
Navigate to Application Insights → Workbooks → New to create custom dashboards.
Key metrics to track
Performance Metrics:
- Average response time by model
- 95th percentile response time
- Request rate (requests per second)
- Error rate and error types
- Token processing rate
Business Metrics:
- Cost per conversation
- Daily/monthly active users
- User satisfaction scores
- Feature adoption rates
Infrastructure Metrics:
- CPU and memory utilization
- Auto-scaling events
- Database query performance
- Cache hit rates
Real-time alerting
Set up alerts in Azure Monitor
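Alert rules are configured in the Azure portal or CLI; the evaluation logic behind a high-error-rate rule looks roughly like this (stdlib sketch, illustrative thresholds):

```python
import time
from collections import deque

class ErrorRateAlert:
    """Fire when errors exceed a threshold fraction over a sliding window."""
    def __init__(self, threshold=0.05, window_seconds=300):
        self.threshold = threshold
        self.window = window_seconds
        self.samples = deque()  # (timestamp, is_error)

    def record(self, is_error: bool, now: float = None):
        now = time.time() if now is None else now
        self.samples.append((now, is_error))
        # Evict samples that fell out of the window.
        cutoff = now - self.window
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def firing(self) -> bool:
        if not self.samples:
            return False
        errors = sum(1 for _, e in self.samples if e)
        return errors / len(self.samples) > self.threshold

alert = ErrorRateAlert(threshold=0.05, window_seconds=300)
for i in range(100):
    alert.record(is_error=(i % 10 == 0), now=1000.0 + i)  # 10% error rate
```

A sliding window (rather than a lifetime average) is what lets the alert recover automatically once the error burst passes.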
High error rate alert: fire when the error rate exceeds an agreed threshold, for example 5% of requests over a five-minute window, and page the on-call team.
Integrate with notification systems
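For Slack, an incoming webhook accepts a small JSON payload. A sketch of formatting and sending one (the webhook URL is a placeholder; the payload-builder name is illustrative):

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def build_alert_payload(severity: str, message: str) -> dict:
    """Format an alert for Slack's incoming-webhook API."""
    icon = {
        "critical": ":red_circle:",
        "warning": ":warning:",
    }.get(severity, ":information_source:")
    return {"text": f"{icon} *{severity.upper()}*: {message}"}

def send_to_slack(payload: dict) -> None:
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on non-2xx responses

payload = build_alert_payload("critical", "Error rate above 5% for 5 minutes")
```

Keeping payload construction separate from delivery makes the formatting unit-testable without hitting the network.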
Slack integration: route alert notifications into a team channel through an incoming webhook, typically invoked from an Azure Monitor action group.
Advanced monitoring techniques
Distributed tracing
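The core of distributed tracing is a correlation ID that follows each request through every service it touches. A stdlib sketch using `contextvars` (in practice the ID is forwarded as a header such as W3C `traceparent`, and OpenTelemetry handles this automatically):

```python
import contextvars
import logging
import uuid

# Correlation ID that follows a request across function and service boundaries.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

def handle_request():
    token = correlation_id.set(uuid.uuid4().hex)
    try:
        return call_downstream_service()
    finally:
        correlation_id.reset(token)

def call_downstream_service():
    # In a real system the ID would be forwarded as an HTTP header
    # so logs from every hop can be joined on it.
    cid = correlation_id.get()
    logging.info("calling model API", extra={"correlation_id": cid})
    return cid

cid = handle_request()
```

`contextvars` (rather than a global) keeps IDs isolated per request even under async or threaded serving.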
Track requests across multiple services by propagating a correlation or trace ID through every downstream call.
Custom metrics and logs
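Structured logging means emitting one queryable record per event instead of free text. A minimal JSON formatter for the standard `logging` module (the field whitelist is illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so fields can be queried directly."""
    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Merge structured fields passed via `extra=`.
        for key in ("model", "duration_ms", "tokens"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("ai.app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("completion finished", extra={"model": "gpt-4o", "duration_ms": 840})
```

Once logs are JSON, a Kusto query in Application Insights can aggregate on `model` or `duration_ms` without regex parsing.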
Implement structured logging so telemetry can be filtered and aggregated by field rather than parsed out of free text.
Performance benchmarking
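An automated benchmark repeatedly times a call and fails the build when a latency budget is exceeded. A stdlib sketch (the budget and run count are illustrative; the lambda stands in for a call to your deployed model endpoint):

```python
import statistics
import time

def benchmark(fn, runs: int = 50, p95_budget_ms: float = 2000.0):
    """Run fn repeatedly; report P50/P95 latency and pass/fail vs. the budget."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    p50, p95 = cuts[49], cuts[94]
    return {"p50_ms": p50, "p95_ms": p95, "passed": p95 <= p95_budget_ms}

# Stand-in workload for a call to the deployed model endpoint.
result = benchmark(lambda: sum(range(1000)), runs=20)
```

Gating on P95 rather than the mean catches tail-latency regressions that averages hide.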
Implement automated performance testing so latency regressions are caught before they reach production.
Cost monitoring and optimization
Track and analyze costs
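Cost per interaction can be derived directly from token counts. A sketch (the prices below are hypothetical placeholders; check your actual Azure pricing):

```python
# Hypothetical per-1K-token prices in USD; substitute your real Azure rates.
PRICE_PER_1K = {
    "gpt-4o": {"input": 0.0050, "output": 0.0150},
    "gpt-4o-mini": {"input": 0.0006, "output": 0.0024},
}

def interaction_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one model call from its token usage."""
    p = PRICE_PER_1K[model]
    return (prompt_tokens / 1000) * p["input"] + (completion_tokens / 1000) * p["output"]

def daily_spend(events) -> dict:
    """Aggregate per-model spend from a stream of usage events."""
    totals = {}
    for e in events:
        cost = interaction_cost(e["model"], e["prompt_tokens"], e["completion_tokens"])
        totals[e["model"]] = totals.get(e["model"], 0.0) + cost
    return totals

spend = daily_spend([
    {"model": "gpt-4o", "prompt_tokens": 1000, "completion_tokens": 1000},
])
```

Emitting the computed cost as a custom metric alongside each request makes "cost per conversation" a first-class dashboard dimension.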
Best practices
1. Establish baseline metrics
Track these key performance indicators from day one:
- Response time percentiles (P50, P95, P99)
- Error rates by error type and endpoint
- Token usage patterns by model and user segment
- Cost per interaction and daily spending trends
2. Implement progressive alerting
Set up alerts with different severity levels:
- Info: 80% of budget used, response time above the P95 baseline
- Warning: 90% of budget used, error rate above 1%
- Critical: Budget exceeded, error rate above 5%, service unavailable
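The tiers above can be sketched as a small classifier (thresholds mirror the list; the function name is illustrative):

```python
def classify_alerts(budget_used: float, error_rate: float, available: bool = True):
    """Map current readings to Info/Warning/Critical severity levels."""
    alerts = []
    if not available:
        alerts.append(("critical", "service unavailable"))
    # Budget tiers: 80% info, 90% warning, exceeded critical.
    if budget_used >= 1.0:
        alerts.append(("critical", "budget exceeded"))
    elif budget_used >= 0.9:
        alerts.append(("warning", "90% of budget used"))
    elif budget_used >= 0.8:
        alerts.append(("info", "80% of budget used"))
    # Error-rate tiers: 1% warning, 5% critical.
    if error_rate > 0.05:
        alerts.append(("critical", "error rate above 5%"))
    elif error_rate > 0.01:
        alerts.append(("warning", "error rate above 1%"))
    return alerts

alerts = classify_alerts(budget_used=0.92, error_rate=0.02)
```

Encoding the tiers in one place keeps alert thresholds reviewable and testable rather than scattered across rule definitions.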
3. Monitor user experience
Track metrics that directly impact users:
- Time to first token for streaming responses
- Conversation quality scores from user feedback
- Feature adoption rates and usage patterns
- User retention and engagement metrics
4. Automate responses
Implement automated responses to common issues:
- Auto-scaling based on queue depth or response time
- Circuit breakers for upstream service failures
- Fallback responses when AI services are unavailable
- Cost controls that pause expensive operations when budgets are exceeded
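One of the automations above, the circuit breaker, can be sketched in a few lines (stdlib only; the failure and cooldown thresholds are illustrative):

```python
import time

class CircuitBreaker:
    """After max_failures consecutive errors, short-circuit calls and
    serve a fallback until the cooldown passes."""
    def __init__(self, max_failures=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()      # circuit open: skip the AI service
            self.opened_at = None      # cooldown over: try again
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result

def flaky():
    raise TimeoutError("model endpoint timed out")

breaker = CircuitBreaker(max_failures=2)
responses = [breaker.call(flaky, lambda: "fallback") for _ in range(3)]
```

While the circuit is open the application answers from the fallback immediately, which both protects users from timeouts and gives the upstream service room to recover.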
Common monitoring patterns
Health checks for AI services
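A health check for an AI dependency typically issues a tiny test prompt and reports status plus latency. A minimal stdlib sketch (the probe callable and field names are illustrative):

```python
import json
import time

def check_ai_service(probe, max_latency_ms=5000.0, clock=time.perf_counter):
    """Run a zero-arg probe against the AI dependency and classify the result."""
    start = clock()
    try:
        probe()
        latency_ms = (clock() - start) * 1000
        status = "healthy" if latency_ms <= max_latency_ms else "degraded"
        return {"status": status, "latency_ms": latency_ms}
    except Exception as exc:
        latency_ms = (clock() - start) * 1000
        return {"status": "unhealthy", "latency_ms": latency_ms, "error": str(exc)}

# A /health endpoint would serialize this dict to JSON.
report = check_ai_service(lambda: "pong")
body = json.dumps(report)

def failing():
    raise TimeoutError("no response")

bad = check_ai_service(failing)
```

Distinguishing "degraded" (slow) from "unhealthy" (erroring) lets load balancers drain traffic gradually instead of flapping.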
Synthetic monitoring
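Synthetic monitoring runs scripted user journeys, such as asking a known question and checking the answer, on a fixed schedule. A minimal sketch (stdlib only; the journey function is a stand-in for a real call to your chat endpoint):

```python
import time

class SyntheticMonitor:
    """Run a scripted journey on a schedule and track availability.
    `journey` is a zero-arg callable that raises on failure."""
    def __init__(self, journey, interval_seconds=300):
        self.journey = journey
        self.interval = interval_seconds
        self.results = []

    def run_once(self) -> bool:
        start = time.perf_counter()
        try:
            self.journey()
            ok = True
        except Exception:
            ok = False
        self.results.append(
            {"ok": ok, "duration_ms": (time.perf_counter() - start) * 1000}
        )
        return ok

    def availability(self) -> float:
        if not self.results:
            return 1.0
        return sum(r["ok"] for r in self.results) / len(self.results)

def ask_sample_question():
    # Would call the deployed chat endpoint with a known prompt
    # and verify the response contains the expected content.
    pass

monitor = SyntheticMonitor(ask_sample_question)
monitor.run_once()
```

Because synthetic probes run even when no real users are active, they catch outages during quiet hours that request-based metrics would miss.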
Next steps
Optimize Costs
Learn strategies to reduce AI operation costs
Scale Applications
Handle increasing traffic and usage patterns
Effective monitoring is an ongoing process. Start with basic metrics and gradually add more sophisticated monitoring as your application grows. Remember that monitoring itself has costs: balance the depth of monitoring with the value it provides.

