Learn how to fine-tune models in Microsoft Foundry for your datasets and use cases. Fine-tuning enables:
  • Higher-quality results than what you can get just from prompt engineering.
  • The ability to train on more examples than what can fit into a model’s request context limit.
  • Token savings due to shorter prompts.
  • Lower-latency requests, particularly when you’re using smaller models.
In contrast to few-shot learning, fine-tuning improves the model by training on more examples than fit in a prompt. Because the model’s weights adapt to your task, you can include fewer examples or instructions in each request, which reduces tokens per call and can lower cost and latency.
We use low-rank adaptation (LoRA) to fine-tune models in a way that reduces the complexity of training without significantly affecting model performance. Rather than updating a full high-rank weight matrix, this method approximates the weight update with a lower-rank one. Training this much smaller set of parameters during the supervised training phase makes fine-tuning more manageable and efficient. For users, it also makes training faster and more affordable than other techniques.
In this article, you learn how to:
  • Choose appropriate datasets and formats for fine-tuning.
  • Trigger a fine-tuning job, monitor the status, and fetch results.
  • Deploy and evaluate a fine-tuned model.
  • Clean up your resources when you no longer need them.
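To make the low-rank adaptation (LoRA) idea from the introduction more concrete, here is a minimal, illustrative sketch in plain Python. It assumes the standard LoRA formulation, in which a frozen weight matrix W is adapted as W + scale * (B x A), with B and A being small trainable matrices of rank r. The function names, matrix sizes, and rank below are hypothetical and chosen only for illustration; this is not the service's actual training code.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning trains every entry of the d_out x d_in matrix.
    A rank-r adapter trains B (d_out x r) and A (r x d_in) instead.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def init_lora(d_out, d_in, rank, seed=0):
    """Create adapter matrices B (all zeros) and A (small random values)."""
    rng = random.Random(seed)
    # B starts at zero, so the adapted layer initially equals the frozen one.
    b = [[0.0] * rank for _ in range(d_out)]
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    return b, a


def effective_weight(w, b, a, scale=1.0):
    """Combine the frozen weights with the low-rank update: W + scale * (B x A)."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full fine-tuning: {full} trainable parameters")
    print(f"rank-8 adapter:   {lora} trainable parameters")
```

For this hypothetical 4096 x 4096 layer, the rank-8 adapter trains 65,536 parameters instead of 16,777,216, which is the kind of reduction that makes training faster and cheaper. The base weights stay untouched; only B and A are updated during training.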