AI Models Overview
Azure AI Foundry provides access to a comprehensive catalog of AI models from Microsoft, leading AI companies, and the open-source community. Understanding the different types of models and their capabilities helps you choose the right tools for your applications.
Model categories
Language models
Large Language Models (LLMs) that understand and generate human-like text:
Chat completion models:
- GPT-4o, GPT-4o mini - Advanced reasoning and conversation
- Claude 3.5 Sonnet - Strong analytical capabilities
- Llama 3.1 series - Open-source alternatives
Specialized language models:
- Code generation models for programming tasks
- Domain-specific models for legal, medical, and financial text
- Multilingual models for global applications
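Most of the chat models above accept requests in the same messages format. The sketch below builds that payload shape; the model name, temperature, and token limit are illustrative defaults, not recommendations for any specific deployment.

```python
# Sketch of the chat-completion request shape most of these models accept.
# The model name and parameter values are placeholders.

def build_chat_request(system_prompt: str, user_message: str,
                       model: str = "gpt-4o-mini") -> dict:
    """Assemble a chat-completion payload in the common messages format."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,   # higher = more varied, lower = more deterministic
        "max_tokens": 512,    # cap on the generated response length
    }

request = build_chat_request(
    "You are a concise technical assistant.",
    "Summarize the difference between LLMs and vision models.",
)
print(request["model"])                          # gpt-4o-mini
print([m["role"] for m in request["messages"]])  # ['system', 'user']
```

The same payload works against serverless and managed deployments; only the endpoint and credentials differ.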
Vision models
Models that process and understand visual content:
Image understanding:
- GPT-4 Vision - Analyze and describe images
- Florence models - Object detection and recognition
- Custom vision models for specific use cases
Image generation:
- DALL-E 3 - Create images from text descriptions
- Stable Diffusion - Open-source image generation
- Custom image models for branded content
Multimodal models
Models that work across text, images, audio, and video:
Vision-language models:
- GPT-4o - Combined text and image understanding
- LLaVA models - Open-source vision-language capabilities
Audio processing:
- Whisper - Speech-to-text transcription
- Azure Speech Services - Text-to-speech synthesis
- Audio classification and analysis models
Model deployment patterns
Serverless APIs
Best for:
- Variable workloads
- Getting started quickly
- Cost-effective experimentation
Characteristics:
- Pay-per-use pricing
- Automatic scaling
- Shared infrastructure
- Managed by Azure
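Pay-per-use pricing means cost scales directly with tokens consumed. A minimal estimate, using made-up per-token rates (check the current Azure AI Foundry pricing page for real numbers):

```python
# Illustrative serverless cost estimate. The per-token prices below are
# placeholders, not actual Azure rates.

PRICES_PER_1K_TOKENS = {               # (input, output) USD per 1,000 tokens
    "small-model": (0.00015, 0.0006),
    "large-model": (0.0025,  0.0100),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Pay-per-use cost: tokens consumed times the per-token rate."""
    in_rate, out_rate = PRICES_PER_1K_TOKENS[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# 1M requests/month averaging 500 input / 200 output tokens on the small model:
monthly = 1_000_000 * estimate_cost("small-model", 500, 200)
print(f"${monthly:,.2f}/month")  # $195.00/month at these sample rates
```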
Managed compute
Best for:
- Consistent workloads
- Predictable performance
- Enhanced security requirements
Characteristics:
- Dedicated resources
- Customizable configurations
- Predictable costs
- Customer-controlled scaling
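A quick way to compare the two patterns is a break-even calculation: above a certain monthly volume, a dedicated deployment becomes cheaper than paying per request. The prices here are hypothetical.

```python
# Rough break-even sketch: at what monthly volume does dedicated (managed)
# compute become cheaper than pay-per-use? All prices are hypothetical.
import math

def breakeven_requests(serverless_cost_per_request: float,
                       dedicated_monthly_cost: float) -> int:
    """Requests/month above which dedicated compute wins on cost."""
    return math.ceil(dedicated_monthly_cost / serverless_cost_per_request)

# e.g. $0.002 per request serverless vs. a $3,000/month dedicated deployment:
print(breakeven_requests(0.002, 3000.0))  # 1500000
```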
Choosing the right model
Consider your use case
Content generation:
- Blog posts, marketing copy → GPT-4o, Claude 3.5
- Code development → GPT-4o, CodeLlama
- Creative writing → GPT-4o, Claude 3.5
Analysis and extraction:
- Document processing → GPT-4o with vision
- Data analysis → GPT-4o, Claude 3.5
- Sentiment analysis → Specialized language models
Interactive applications:
- Chatbots → GPT-4o mini (cost-effective)
- Virtual assistants → GPT-4o (high capability)
- Customer support → Fine-tuned models
Latency requirements:
- Real-time chat → Smaller, faster models
- Batch processing → Larger, more capable models
- Streaming responses → Models with streaming support
Quality needs:
- High-stakes applications → GPT-4o, Claude 3.5
- General purpose → GPT-4o mini
- Specialized domains → Fine-tuned models
Cost constraints:
- High volume, simple tasks → Smaller models
- Complex reasoning → Larger models
- Mixed workloads → Model routing strategies
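A model routing strategy can be as simple as classifying each request and sending it to the cheapest model that can handle it. The model names, keyword list, and length threshold below are illustrative choices, not a recommended policy.

```python
# Minimal model-routing sketch for mixed workloads: send short/simple requests
# to a cheap model and longer or explicitly complex ones to a capable model.

COMPLEX_HINTS = ("analyze", "prove", "compare", "plan", "debug")

def route_model(prompt: str, max_simple_words: int = 50) -> str:
    words = prompt.lower().split()
    if len(words) > max_simple_words or any(h in words for h in COMPLEX_HINTS):
        return "large-capable-model"   # complex reasoning
    return "small-fast-model"          # high volume, simple tasks

print(route_model("Translate 'hello' to French"))           # small-fast-model
print(route_model("Analyze the tradeoffs in this design"))  # large-capable-model
```

Production routers typically use a lightweight classifier model rather than keywords, but the control flow is the same.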
Model capabilities and limitations
Understanding model strengths
GPT-4o series:
- Excellent reasoning and problem-solving
- Strong code generation capabilities
- Good multilingual support
- Vision understanding (GPT-4o)
Claude 3.5 series:
- Strong analytical thinking
- Excellent for complex reasoning
- Good safety characteristics
- Large context windows
Open-source models:
- Customizable and fine-tunable
- No vendor lock-in
- Community-driven improvements
- Cost-effective for specific use cases
Common limitations
Context length:
- Most models have token limits
- Longer conversations may lose context
- Consider conversation management strategies
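One common conversation-management strategy is to keep the system prompt and drop the oldest turns until the history fits a token budget. This sketch approximates token counts by whitespace splitting; a real tokenizer (e.g. tiktoken) would be more accurate.

```python
# Trim chat history to a token budget: keep the system prompt, then keep the
# newest turns that still fit. Token counts here are whitespace approximations.

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    def tokens(m): return len(m["content"].split())
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(tokens(m) for m in system)
    kept: list[dict] = []
    for m in reversed(turns):          # newest turns first
        if tokens(m) <= budget:
            kept.insert(0, m)
            budget -= tokens(m)
        else:
            break                      # stop at the first turn that overflows
    return system + kept

history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "one two three four five"},
    {"role": "assistant", "content": "six seven"},
    {"role": "user", "content": "eight nine ten"},
]
print(len(trim_history(history, max_tokens=8)))  # 3: system + two newest turns
```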
Knowledge cutoffs:
- Models trained on data up to specific dates
- May not know recent events
- Consider RAG for current information
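The core RAG pattern is: retrieve relevant documents at request time, then ground the prompt in them. This toy sketch ranks a two-document store by word overlap; real systems use embedding similarity and a vector index, and the documents here are invented examples.

```python
# Toy retrieval-augmented generation (RAG) sketch: rank a small document store
# by word overlap with the question, then prepend the top match to the prompt.

DOCS = [
    "The 2024 model catalog added several new open-source vision models.",
    "Serverless deployments bill per token with automatic scaling.",
]

def retrieve(question: str, docs: list[str]) -> str:
    q = set(question.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question: str) -> str:
    context = retrieve(question, DOCS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How are serverless deployments billed?"))
```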
Bias and fairness:
- Models reflect training data biases
- Important for sensitive applications
- Use evaluation tools to assess bias
Model lifecycle management
Versioning and updates
Model versions:
- Each model has multiple versions
- Newer versions often improve capabilities
- Plan for version transitions
Deployment strategies:
- Blue-green deployments for zero downtime
- Gradual rollouts for risk mitigation
- A/B testing for performance comparison
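A gradual rollout can be implemented by hashing a stable identifier into a traffic bucket, so a fixed percentage of users reach the new model version and each user consistently sees the same one. The version names and percentage are placeholders.

```python
# Sketch of a gradual-rollout router: deterministically bucket each user id so
# a fixed share of traffic reaches the new model version.
import hashlib

def pick_version(user_id: str, new_version_pct: int = 10) -> str:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "model-v2" if bucket < new_version_pct else "model-v1"

# Stable assignment: the same user always gets the same version.
print(pick_version("user-42") == pick_version("user-42"))  # True
```

The same bucketing supports A/B testing: log which version served each request and compare quality metrics per bucket before widening the rollout.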
Monitoring and optimization
Performance metrics:
- Latency and throughput
- Error rates and availability
- Cost per request/token
Quality assessment:
- Response relevance and accuracy
- User satisfaction scores
- Automated evaluation metrics
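The performance metrics above can be computed from ordinary request logs. This sketch derives nearest-rank latency percentiles and cost per 1K tokens; the log fields and values are hypothetical.

```python
# Basic performance monitoring: p50/p95 latency and cost per 1K tokens from a
# batch of (hypothetical) request logs.

def percentile(values: list[float], p: float) -> float:
    """Crude nearest-rank percentile; fine for monitoring dashboards."""
    s = sorted(values)
    idx = min(len(s) - 1, int(p / 100 * len(s)))
    return s[idx]

logs = [
    {"latency_ms": 120, "tokens": 300, "cost": 0.0006},
    {"latency_ms": 340, "tokens": 800, "cost": 0.0016},
    {"latency_ms": 95,  "tokens": 150, "cost": 0.0003},
    {"latency_ms": 900, "tokens": 900, "cost": 0.0018},
]

latencies = [r["latency_ms"] for r in logs]
cost_per_1k = 1000 * sum(r["cost"] for r in logs) / sum(r["tokens"] for r in logs)
print(percentile(latencies, 50), percentile(latencies, 95), round(cost_per_1k, 4))
```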
Getting started with models
Exploration approach
1. Start with the playground - Test models interactively
2. Try different prompts - Understand model behavior
3. Compare models - Find the best fit for your use case
4. Measure performance - Establish baseline metrics
5. Scale gradually - Move from testing to production

Best practices
Prompt engineering:
- Write clear, specific instructions
- Provide examples for better results
- Use consistent formatting
- Test different approaches
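These practices combine naturally into a few-shot prompt: a clear instruction, consistently formatted worked examples, then the real input. The task and examples below are invented for illustration.

```python
# Sketch of a few-shot prompt builder: instruction, worked examples in a
# consistent Input/Output format, then the query.

def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    lines = [f"Task: {task}", ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I love this product", "positive"), ("Terrible experience", "negative")],
    "The support team was very helpful",
)
print(prompt)
```

Ending with a bare `Output:` cues the model to continue in the established format.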
Safety and governance:
- Implement content filtering
- Monitor for inappropriate outputs
- Set up usage alerts and limits
- Document model usage policies
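A minimal governance layer can run a pre-flight check before each request: block prompts matching filtered terms and enforce a per-user limit. The blocklist and limit here are placeholders, and keyword matching is only a stand-in; production systems should use a dedicated service such as Azure AI Content Safety.

```python
# Governance sketch: pre-flight content filter + per-user daily request limit.
# Blocklist and limit values are illustrative placeholders.
from collections import defaultdict

BLOCKED_TERMS = {"ssn", "credit card number"}
DAILY_LIMIT = 100
usage: dict[str, int] = defaultdict(int)

def preflight(user_id: str, prompt: str) -> tuple[bool, str]:
    text = prompt.lower()
    if any(term in text for term in BLOCKED_TERMS):
        return False, "blocked: prompt matched a filtered term"
    if usage[user_id] >= DAILY_LIMIT:
        return False, "blocked: daily usage limit reached"
    usage[user_id] += 1
    return True, "ok"

print(preflight("alice", "Summarize this meeting"))  # (True, 'ok')
print(preflight("alice", "What is Bob's SSN?"))      # (False, 'blocked: ...')
```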
Understanding these fundamentals helps you make informed decisions about which models to use and how to deploy them effectively in your applications.