Date: September 23, 2025

Topic: Fine-Tuning

Recall

Notes

Fine-Tuning

When to Fine-tune


Through fine-tuning, we continue training the model on new examples, which biases it towards the fine-tuning dataset.

GPT-2 Fine-Tuning Example

Without Fine-tuning

image.png

After Fine-tuning on Tolkien’s Silmarillion (which never uses the word “fox”)

image.png


Through fine-tuning, we shift the model’s predicted probability distribution over the next word. For example, after fine-tuning, the word “elves” is more likely than the word “company”.
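The shift in probabilities can be illustrated with a toy sketch. This is not GPT-2: the "model" here is assumed to be a single vector of logits over a three-word vocabulary, with no context and no hidden layers. The starting logits, the tiny corpus, the learning rate, and the helper names `softmax` and `fine_tune` are all made up for illustration. It only shows the mechanism: gradient steps on cross-entropy loss move probability mass towards words that appear in the fine-tuning data.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune(logits, target_ids, lr=0.5, epochs=20):
    """Plain SGD on cross-entropy loss for a context-free softmax 'model'."""
    logits = list(logits)  # copy so the 'pretrained' logits stay untouched
    for _ in range(epochs):
        for t in target_ids:
            probs = softmax(logits)
            for i in range(len(logits)):
                # Gradient of cross-entropy w.r.t. logit i: p_i - 1[i == t]
                grad = probs[i] - (1.0 if i == t else 0.0)
                logits[i] -= lr * grad
    return logits

vocab = ["company", "elves", "fox"]

# Hypothetical 'pretrained' logits: "company" starts out most likely.
pretrained_logits = [2.0, 0.0, 1.0]

# Hypothetical fine-tuning corpus: mostly "elves", never "fox".
corpus = ["elves", "elves", "elves", "company", "elves"]
target_ids = [vocab.index(w) for w in corpus]

before = dict(zip(vocab, softmax(pretrained_logits)))
tuned_logits = fine_tune(pretrained_logits, target_ids)
after = dict(zip(vocab, softmax(tuned_logits)))

for word in vocab:
    print(f"{word:8s} before={before[word]:.3f} after={after[word]:.3f}")
```

In a real language model the same kind of update is applied to the network weights for every context, so the shift depends on the preceding words rather than being one global change.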

Encoder/Decoder Computation Graph

image.png


Prompting vs. Fine-Tuning


<aside> 📌 SUMMARY: Fine-tuning is useful when we have a dataset that we want to bias the model towards, but that is too small to train a model from scratch. It shifts the distribution of outputs towards that dataset. With very large models, chances are that such specific datasets are already covered by the model’s distribution, so careful prompting can steer the model towards those outputs instead.

</aside>


Date: September 23, 2025

Topic: Instruction Tuning

Recall

Instruction tuning is a form of fine-tuning that aligns the model towards a question-answer format. Humans correct the model’s outputs, so the model becomes less likely to treat the prompt as a text-continuation problem.
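A minimal sketch of what one instruction-tuning training example might look like. The `### Instruction:` / `### Response:` template and the helpers `format_example` and `build_loss_mask` are hypothetical, not taken from these notes or from any particular library. It also assumes that the loss is computed only on the response, which is a common choice but not the only one, and it masks characters rather than tokens to stay dependency-free.

```python
def format_example(instruction, response, eos="<|endoftext|>"):
    """Wrap an instruction/response pair in a (hypothetical) prompt template.

    Returns the prompt alone and the full training text (prompt + response + EOS).
    """
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return prompt, prompt + response + eos

def build_loss_mask(prompt, full_text):
    """Return 0 for prompt positions and 1 for response positions.

    Character-level for simplicity; a real pipeline would mask token positions.
    """
    return [0] * len(prompt) + [1] * (len(full_text) - len(prompt))

prompt, text = format_example(
    "Name one of the peoples in Tolkien's legendarium.",
    "The Elves.",
)
mask = build_loss_mask(prompt, text)

print(text)
print("positions trained on:", sum(mask), "of", len(mask))
```

Because the model is only penalised on the response part, it learns to produce an answer after the template rather than to continue the instruction text itself.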


<aside> 📌 SUMMARY: Instruction tuning helps align the model towards the structure of human conversation, so that it answers prompts instead of only continuing text.

</aside>


Date: September 24, 2025

Topic: Reinforcement Learning