This week’s selection is “A Beginner’s Guide To Using A Circular Sock Knitting Machine: Machine With Metal Needles” by Harry Rogers. This is a book about building a sock-knitting machine. Yes, a ...
Supervised fine-tuning teaches a model from example outputs. Reinforcement learning (RL) teaches from *rewards* -- the model generates its own outputs, and a reward function scores them. The model ...
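As a rough illustration of the loop described above, here is a minimal sketch in which a stand-in policy generates its own outputs and a reward function scores them. Everything here is hypothetical: `toy_policy`, `reward_fn`, and `collect_rollouts` are illustrative names, and the random choice over fixed strings only stands in for sampling from a real model.

```python
import random

def toy_policy(prompt, rng):
    # Hypothetical stand-in for a language model: samples one canned completion.
    return rng.choice(["4", "5", "four", "22"])

def reward_fn(prompt, completion):
    # Hypothetical reward: 1.0 for the exact expected answer, 0.0 otherwise.
    return 1.0 if completion.strip() == "4" else 0.0

def collect_rollouts(prompt, n, seed=0):
    # The policy generates n outputs of its own; each one is scored by the reward function.
    rng = random.Random(seed)
    rollouts = []
    for _ in range(n):
        completion = toy_policy(prompt, rng)
        rollouts.append((completion, reward_fn(prompt, completion)))
    return rollouts

rollouts = collect_rollouts("What is 2 + 2?", n=8)
```

Unlike supervised fine-tuning, no example output is supplied here: the only training signal is the score attached to each sampled completion.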
Tune KL penalty, group size, and advantage normalization. RL training has several hyperparameters beyond learning rate that critically affect stability and performance. This tutorial covers the most ...
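To make the three knobs concrete, the sketch below assumes a group-relative setup: several completions are sampled per prompt (the group size), their rewards are standardized within the group (advantage normalization), and a coefficient `beta` scales a KL-style penalty against a reference model. The function names, `beta`, and `eps` are assumptions for illustration, not taken from the tutorial.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    # Standardize rewards within one group of completions for the same prompt.
    # eps guards against division by zero when every reward in the group is equal.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def kl_penalized_reward(reward, logp_policy, logp_ref, beta):
    # Subtract a simple per-sample KL estimate, log pi(y|x) - log pi_ref(y|x),
    # scaled by beta. Larger beta keeps the policy closer to the reference model.
    return reward - beta * (logp_policy - logp_ref)

# A group of 4 completions for one prompt, two of them rewarded.
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
shaped = kl_penalized_reward(1.0, logp_policy=-1.0, logp_ref=-2.0, beta=0.1)
```

Under these assumptions, a larger group gives a less noisy estimate of the group mean and standard deviation at the cost of more generation per prompt, and a group whose rewards are all identical yields zero advantages and therefore no learning signal.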