Skip to main content

Explore AI Models in Azure AI Foundry

Learn about different types of AI models and their capabilities by exploring the model catalog and testing them in interactive playgrounds. This tutorial guides you through discovering, evaluating, and comparing models for your specific use cases.

What you’ll learn

By the end of this tutorial, you will understand:
  • How to navigate the Azure AI Foundry model catalog
  • The differences between various AI model types
  • How to test models with your own prompts
  • How to compare model performance for your use case
  • Best practices for model selection

Prerequisites

Step 1: Navigate the model catalog

Let’s start by exploring what models are available in Azure AI Foundry.
  1. In your Azure AI Foundry project, go to “Models”
  2. Click “Explore all models” to see the full catalog
  3. Notice the different categories:
    • Language models - For text generation and understanding
    • Vision models - For image processing and generation
    • Audio models - For speech and audio processing
    • Multimodal models - For combining text, images, and audio
Take a moment to browse through each category and read the model descriptions.

Step 2: Compare language models

Let’s compare different language models to understand their strengths.

Test a large, capable model (GPT-4o)

  1. Find GPT-4o in the catalog
  2. Click “Try in playground” (or deploy if needed)
  3. Test it with this complex reasoning prompt:
You're planning a dinner party for 8 people. 3 are vegetarian, 2 have gluten allergies, and 1 is allergic to nuts. Plan a complete menu (appetizer, main course, dessert) that everyone can enjoy, including shopping list and preparation timeline.
Notice how the model:
  • Handles complex constraints
  • Provides detailed, structured responses
  • Shows strong reasoning capabilities

Test a smaller, efficient model (GPT-4o mini)

  1. Switch to GPT-4o mini in the playground
  2. Try the same dinner party prompt
  3. Compare the responses:
    • Speed difference
    • Level of detail
    • Accuracy of the solution

Test an open-source alternative

  1. Try Llama 3.1 8B or similar open-source model
  2. Use the same prompt
  3. Observe the differences in:
    • Response style
    • Completeness
    • Processing time
What you should notice:
  • Larger models often provide more detailed responses
  • Smaller models are faster and more cost-effective
  • Open-source models offer customization opportunities

Step 3: Explore specialized capabilities

Code generation

Test how different models handle programming tasks: Prompt for all models:
Create a Python function that:
1. Takes a list of dictionaries representing students with 'name', 'age', and 'grades' (list of numbers)
2. Calculates the average grade for each student
3. Returns the top 3 students with highest averages
4. Include error handling and documentation

Provide example usage.
Compare:
  • Code quality and completeness
  • Documentation clarity
  • Error handling approach
  • Example quality

Creative writing

Test creative capabilities with this prompt:
Write a short story (300 words) about an AI that discovers it can dream. The story should:
- Be written in first person from the AI's perspective
- Include sensory details about the dream experience
- Have an emotional arc
- End with a surprising realization
Observe:
  • Creativity and originality
  • Emotional depth
  • Narrative structure
  • Writing style

Analysis and reasoning

Test analytical capabilities:
Analyze this business scenario:
A coffee shop has 50 customers per day on weekdays and 80 on weekends. Average spend is $6. Rent is $3000/month, staff costs $8000/month, supplies are 30% of revenue. 

Should they:
1. Extend weekday hours (adds $500 monthly cost, might increase weekday customers by 20%)
2. Add weekend catering service (adds $1000 setup cost, $300 monthly, might add $2000 weekend revenue)
3. Do nothing

Provide analysis with calculations.
Compare:
  • Mathematical accuracy
  • Business reasoning
  • Consideration of trade-offs
  • Clarity of recommendations

Step 4: Test vision capabilities

If available, explore vision-enabled models:

Upload an image and test understanding

  1. Go to GPT-4o (or another vision-enabled model)
  2. Upload an image - try different types:
    • A photograph of a scene
    • A chart or graph
    • A handwritten note
    • An artwork
  3. Ask questions like:
    • “Describe what you see in detail”
    • “What mood or emotion does this convey?”
    • “If this is a graph, what insights can you draw?”

Test multimodal reasoning

Try combining image analysis with other tasks:
[Upload an image of a room]
Based on this room, suggest:
1. Three design improvements under $500
2. Color scheme changes that would make it feel more spacious
3. Furniture rearrangements for better flow

Explain your reasoning for each suggestion.

Step 5: Evaluate for your specific use case

Now think about your own application needs:

Define your requirements

Consider:
  • Response quality needs - How accurate must responses be?
  • Speed requirements - Real-time or batch processing?
  • Cost constraints - High volume or occasional use?
  • Customization needs - Generic or domain-specific?

Test with your domain

Create prompts that match your actual use case: For customer service:
A customer emails: "I ordered a blue shirt last week but received a red one. I need it for an event tomorrow. What can you do to help?"

Provide a helpful, empathetic response that:
- Acknowledges the problem
- Offers immediate solutions
- Prevents future issues
For content creation:
Write a product description for an eco-friendly yoga mat that:
- Highlights sustainability features
- Appeals to health-conscious consumers
- Includes key specifications
- Has a compelling call-to-action
- Stays under 150 words
For data analysis:
Given this sales data pattern: Q1: $100k, Q2: $120k, Q3: $80k, Q4: $150k

Analyze trends, identify potential causes for Q3 dip, and recommend strategies for Q1 next year. Consider seasonal factors, market conditions, and business cycles.

Step 6: Document your findings

Create a comparison table for your testing:
ModelSpeedQualityCostBest For
GPT-4oSlowExcellentHighComplex reasoning
GPT-4o miniFastGoodLowHigh volume tasks
Llama 3.1MediumGoodMediumCustomizable scenarios

What you’ve learned

Through this hands-on exploration, you’ve discovered: Model variety - Different models excel at different tasks ✅ Performance trade-offs - Speed vs. quality vs. cost considerations ✅ Capability assessment - How to evaluate models for your specific needs ✅ Testing methodology - Systematic approaches to model comparison

Key insights to remember

No one-size-fits-all: Different tasks may require different models, even within the same application. Context matters: Model performance varies significantly based on prompt quality and task complexity. Cost vs. capability: Smaller models can often handle simpler tasks effectively at much lower cost. Testing is essential: Always test models with your actual use cases before making decisions.

Next steps

Now that you understand model capabilities, you’re ready to:
This tutorial focused on experiential learning - you learned about AI models by actually using them. The hands-on comparison gave you intuitive understanding of their capabilities and trade-offs. Use this knowledge to make informed decisions about which models to deploy in your applications.