Research Problems in LLM Pretraining

What is pretraining

Pretraining = solve a single optimization problem (minimize next-token prediction loss) at unprecedented scale.
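
To make the objective concrete, here is a minimal sketch of the next-token prediction loss for a single sequence, written in plain Python. The function name `next_token_loss` and the list-of-lists layout for `logits` are assumptions made for illustration. A real training stack would compute the same quantity on batched tensors.

```python
import math

def next_token_loss(logits, tokens):
    """Mean cross-entropy of predicting tokens[t + 1] from logits[t].

    logits: list of T rows, each a list of V unnormalized scores.
    tokens: list of T integer token ids in [0, V).
    (Both names and shapes are illustrative assumptions.)
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    steps = len(tokens) - 1
    for t in range(steps):
        row = logits[t]
        # Numerically stable log-sum-exp gives the log of the softmax normalizer.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability assigned to the token that actually came next.
        total += log_z - row[tokens[t + 1]]
    return total / steps
```

With uniform scores over a vocabulary of size V, this returns log(V), the loss of a model that has learned nothing. Pretraining drives the average of this quantity down over a very large corpus.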

[Figure: next-token prediction (animation)]

Design Space in Pretraining

The search space is high-dimensional and every trial (a full training run) is expensive. Because the design areas interact, jointly optimizing all of them is intractable.
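
A small arithmetic sketch shows why interactions matter. The axis names and the number of candidate settings per axis below are hypothetical, chosen only for illustration. The code compares a full joint sweep with a sweep that varies one axis at a time.

```python
from math import prod

# Hypothetical number of candidate settings per design axis (illustrative only).
choices = {
    "model size": 5,
    "architecture variant": 4,
    "data mixture": 6,
    "learning-rate schedule": 5,
}

def sweep_sizes(choices):
    """Return (joint_grid_runs, one_axis_at_a_time_runs)."""
    joint = prod(choices.values())         # every combination of settings
    one_at_a_time = sum(choices.values())  # vary one axis, hold the others fixed
    return joint, one_at_a_time

joint, one_at_a_time = sweep_sizes(choices)
print(joint, one_at_a_time)  # prints "600 20"
```

Varying one axis at a time needs 20 runs instead of 600, but it only finds the joint optimum if the axes do not interact. When they do interact, the cheap sweep can miss the best combination, and the full grid grows multiplicatively with every added axis.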

[Figure: the pretraining design space]

1. Scaling Laws

2. Model Architecture

3. Data Selection & Mixture