AI Models Overview

Azure AI Foundry provides access to a comprehensive catalog of AI models from Microsoft, leading AI companies, and the open-source community. Understanding the different types of models and their capabilities helps you choose the right tools for your applications.

Model categories

Language models

Large Language Models (LLMs) that understand and generate human-like text:
Chat completion models:
  • GPT-4o, GPT-4o mini - Advanced reasoning and conversation
  • Claude 3.5 Sonnet - Strong analytical capabilities
  • Llama 3.1 series - Open-source alternatives
Specialized language models:
  • Code generation models for programming tasks
  • Domain-specific models for legal, medical, and financial text
  • Multilingual models for global applications

Vision models

Models that process and understand visual content:
Image understanding:
  • GPT-4 Vision - Analyze and describe images
  • Florence models - Object detection and recognition
  • Custom vision models for specific use cases
Image generation:
  • DALL-E 3 - Create images from text descriptions
  • Stable Diffusion - Open-source image generation
  • Custom image models for branded content

Multimodal models

Models that work across text, images, audio, and video:
Vision-language models:
  • GPT-4o - Combined text and image understanding
  • LLaVA models - Open-source vision-language capabilities
Audio processing:
  • Whisper - Speech-to-text transcription
  • Azure Speech Services - Text-to-speech synthesis
  • Audio classification and analysis models

Model deployment patterns

Serverless APIs

Best for:
  • Variable workloads
  • Getting started quickly
  • Cost-effective experimentation
Characteristics:
  • Pay-per-use pricing
  • Automatic scaling
  • Shared infrastructure
  • Managed by Azure

Managed compute

Best for:
  • Consistent workloads
  • Predictable performance
  • Enhanced security requirements
Characteristics:
  • Dedicated resources
  • Customizable configurations
  • Predictable costs
  • Customer-controlled scaling
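The tradeoff between the two patterns is largely a volume question: pay-per-use wins at low or spiky volume, while a dedicated endpoint becomes cheaper past a break-even point. The sketch below illustrates the arithmetic; both prices are hypothetical placeholders, not real Azure rates.

```python
# Sketch: when does managed (dedicated) compute beat serverless pay-per-use?
# Both prices below are hypothetical placeholders, not real Azure rates.

SERVERLESS_PRICE_PER_1K_TOKENS = 0.002   # hypothetical $/1K tokens
DEDICATED_PRICE_PER_HOUR = 3.50          # hypothetical $/hour for an endpoint

def monthly_cost_serverless(tokens_per_month: float) -> float:
    """Pay-per-use: cost scales linearly with token volume."""
    return tokens_per_month / 1000 * SERVERLESS_PRICE_PER_1K_TOKENS

def monthly_cost_dedicated(hours_per_month: float = 730) -> float:
    """Dedicated: flat cost regardless of volume (capacity permitting)."""
    return hours_per_month * DEDICATED_PRICE_PER_HOUR

def break_even_tokens() -> float:
    """Monthly token volume at which the two options cost the same."""
    return monthly_cost_dedicated() / SERVERLESS_PRICE_PER_1K_TOKENS * 1000

print(monthly_cost_serverless(10_000_000))  # low volume: serverless is cheaper
print(monthly_cost_dedicated())
print(break_even_tokens())  # above this volume, dedicated wins
```

At 10M tokens/month the hypothetical serverless bill is $20 versus $2,555 for the dedicated endpoint; the crossover here sits above a billion tokens per month, which is why serverless is the usual starting point.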

Choosing the right model

Consider your use case

Content generation:
  • Blog posts, marketing copy → GPT-4o, Claude 3.5
  • Code development → GPT-4o, CodeLlama
  • Creative writing → GPT-4o, Claude 3.5
Analysis and extraction:
  • Document processing → GPT-4o with vision
  • Data analysis → GPT-4o, Claude 3.5
  • Sentiment analysis → Specialized language models
Interactive applications:
  • Chatbots → GPT-4o mini (cost-effective)
  • Virtual assistants → GPT-4o (high capability)
  • Customer support → Fine-tuned models

Performance considerations

Latency requirements:
  • Real-time chat → Smaller, faster models
  • Batch processing → Larger, more capable models
  • Streaming responses → Models with streaming support
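Streaming matters for latency because the user can start reading while the model is still generating. A minimal sketch of the consumer side, with a generator standing in for a real streaming API (e.g. a chat endpoint called with streaming enabled):

```python
# Sketch: consuming a streamed response incrementally so users see output
# as it arrives instead of waiting for the full completion.
# fake_stream simulates the chunks a streaming API would yield.

from typing import Iterator

def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield the response a few characters at a time, like a token stream."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def consume_stream(chunks: Iterator[str]) -> str:
    """Render each chunk as it arrives and return the assembled reply."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # display immediately
        parts.append(chunk)
    print()
    return "".join(parts)

reply = consume_stream(fake_stream("Streaming lets users read while the model writes."))
```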
Quality needs:
  • High-stakes applications → GPT-4o, Claude 3.5
  • General purpose → GPT-4o mini
  • Specialized domains → Fine-tuned models
Cost constraints:
  • High volume, simple tasks → Smaller models
  • Complex reasoning → Larger models
  • Mixed workloads → Model routing strategies
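A model routing strategy for mixed workloads can be as simple as a heuristic classifier in front of two deployments. The sketch below assumes the two model names from this article; the length and keyword heuristics are illustrative, not a prescribed rule.

```python
# Sketch of a simple model-routing strategy: send cheap/simple requests to a
# small model and complex ones to a larger model. The heuristics here
# (prompt length, keyword hints) are illustrative assumptions.

SMALL_MODEL = "gpt-4o-mini"   # cost-effective for high-volume, simple tasks
LARGE_MODEL = "gpt-4o"        # more capable, reserved for complex reasoning

COMPLEX_HINTS = ("analyze", "prove", "step by step", "compare", "explain why")

def route(prompt: str, max_simple_len: int = 200) -> str:
    """Pick a model deployment based on crude complexity signals."""
    text = prompt.lower()
    if len(prompt) > max_simple_len or any(h in text for h in COMPLEX_HINTS):
        return LARGE_MODEL
    return SMALL_MODEL

print(route("What is the capital of France?"))               # -> gpt-4o-mini
print(route("Analyze the tradeoffs between these designs"))  # -> gpt-4o
```

In production the routing signal is usually learned or measured (e.g. a small classifier, or escalation on low-confidence answers), but the structure is the same.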

Model capabilities and limitations

Understanding model strengths

GPT-4o series:
  • Excellent reasoning and problem-solving
  • Strong code generation capabilities
  • Good multilingual support
  • Vision understanding (GPT-4o)
Claude 3.5 series:
  • Strong analytical thinking
  • Excellent for complex reasoning
  • Good safety characteristics
  • Large context windows
Open-source models:
  • Customizable and fine-tunable
  • No vendor lock-in
  • Community-driven improvements
  • Cost-effective for specific use cases

Common limitations

Context length:
  • Most models have token limits
  • Longer conversations may lose context
  • Consider conversation management strategies
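One common conversation-management strategy is a sliding window: drop the oldest turns while always preserving the system message. The sketch below approximates token counts with word counts; a real application would use the model's tokenizer.

```python
# Sketch: keep a conversation within a model's context limit by dropping the
# oldest turns while preserving the system message. Word count stands in for
# a real tokenizer here.

def approx_tokens(message: dict) -> int:
    return len(message["content"].split())

def trim_history(messages: list, budget: int) -> list:
    """Return the system message plus the most recent turns that fit."""
    system, turns = messages[0], messages[1:]
    kept, used = [], approx_tokens(system)
    for msg in reversed(turns):            # walk newest-first
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "first question about an old topic"},
    {"role": "assistant", "content": "a long answer " * 10},
    {"role": "user", "content": "latest question"},
]
trimmed = trim_history(history, budget=20)  # old turns dropped, system kept
```

Alternatives include summarizing the dropped turns into a single message rather than discarding them outright.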
Knowledge cutoffs:
  • Models trained on data up to specific dates
  • May not know recent events
  • Consider RAG for current information
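The core RAG pattern is: retrieve relevant snippets from your own documents, then inject them into the prompt so the model can answer about information newer than its training cutoff. In this sketch, keyword-overlap scoring stands in for a real embedding-based vector search, and the documents are invented examples.

```python
# Sketch of the RAG pattern: retrieve relevant snippets and ground the
# prompt in them. Keyword overlap stands in for embedding similarity;
# the documents are hypothetical.

import re

DOCS = [
    "The 2025 product launch introduced the Contoso X2 device.",
    "Our refund policy allows returns within 30 days of purchase.",
    "The cafeteria serves lunch between 11am and 2pm.",
]

def words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, doc: str) -> int:
    """Count shared words between query and document."""
    return len(words(query) & words(doc))

def retrieve(query: str, k: int = 1) -> list:
    """Return the k documents most relevant to the query."""
    return sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy?")
```

Swapping the scorer for embedding similarity and the list for a vector index turns this into a production retrieval layer; the prompt-assembly step stays the same.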
Bias and fairness:
  • Models reflect training data biases
  • Important for sensitive applications
  • Use evaluation tools to assess bias

Model lifecycle management

Versioning and updates

Model versions:
  • Each model has multiple versions
  • Newer versions often improve capabilities
  • Plan for version transitions
Deployment strategies:
  • Blue-green deployments for zero downtime
  • Gradual rollouts for risk mitigation
  • A/B testing for performance comparison
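Gradual rollouts and A/B tests both need sticky assignment: a given user should always land on the same variant while the rollout percentage ramps up. A common way to get this is hash-based bucketing; the model version names below are placeholders.

```python
# Sketch of a gradual rollout: deterministically assign a percentage of
# users to a new model version via a stable hash, so assignment is sticky
# as the rollout percentage increases. Version names are placeholders.

import hashlib

def assign_variant(user_id: str, rollout_pct: int,
                   old: str = "model-v1", new: str = "model-v2") -> str:
    """Bucket users 0-99 by hash; buckets below rollout_pct get the new model."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return new if bucket < rollout_pct else old

print(assign_variant("user-42", 0))    # 0% rollout: everyone on model-v1
print(assign_variant("user-42", 100))  # 100% rollout: everyone on model-v2
```

Because the bucket depends only on the user ID, raising `rollout_pct` from 10 to 50 moves new users onto the new version without reshuffling users already assigned to it.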

Monitoring and optimization

Performance metrics:
  • Latency and throughput
  • Error rates and availability
  • Cost per request/token
Quality assessment:
  • Response relevance and accuracy
  • User satisfaction scores
  • Automated evaluation metrics

Getting started with models

Exploration approach

  1. Start with the playground - Test models interactively
  2. Try different prompts - Understand model behavior
  3. Compare models - Find the best fit for your use case
  4. Measure performance - Establish baseline metrics
  5. Scale gradually - Move from testing to production

Best practices

Prompt engineering:
  • Write clear, specific instructions
  • Provide examples for better results
  • Use consistent formatting
  • Test different approaches
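The prompt-engineering practices above (clear instructions, examples, consistent formatting) can be captured in a reusable template. The sentiment task and labels in this sketch are illustrative.

```python
# Sketch: a few-shot prompt template with a clear instruction, worked
# examples, and consistent formatting. Task and labels are illustrative.

EXAMPLES = [
    ("The checkout flow is fast and painless.", "positive"),
    ("The app crashes every time I open it.", "negative"),
]

def build_prompt(review: str) -> str:
    """Assemble instruction + worked examples + the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    lines += [f"Review: {review}", "Sentiment:"]
    return "\n".join(lines)

prompt = build_prompt("Support resolved my issue in minutes.")
```

Ending the prompt at `Sentiment:` constrains the model to complete just the label, which keeps outputs easy to parse and compare across test runs.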
Safety and governance:
  • Implement content filtering
  • Monitor for inappropriate outputs
  • Set up usage alerts and limits
  • Document model usage policies

Understanding these fundamentals helps you make informed decisions about which models to use and how to deploy them effectively in your applications.