bonsai
Foundations
~18 min
eval-design
datasets
rubrics

Designing evals that actually catch regressions

From vibes to a dataset that pays rent.


Most teams start with "vibes-based" evals: a few prompts the founder runs by hand. That's fine for week one and a liability by month three. A real eval set has three parts: a curated dataset, an explicit rubric, and a scoring function with known noise characteristics.
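To make those three parts concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `EvalCase`, `score_case`, and `run_eval` are hypothetical, the rubric is reduced to "phrases a passing answer must contain", and noise is estimated as the run-to-run standard deviation of the mean score. A real rubric and scorer would likely be richer.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One row of the curated dataset (hypothetical schema)."""
    case_id: str
    prompt: str
    must_include: tuple[str, ...]  # the explicit rubric for this case


def score_case(case: EvalCase, output: str) -> float:
    """Scoring function: fraction of rubric phrases found, case-insensitive."""
    if not case.must_include:
        raise ValueError(f"case {case.case_id} has an empty rubric")
    text = output.lower()
    hits = sum(1 for phrase in case.must_include if phrase.lower() in text)
    return hits / len(case.must_include)


def run_eval(
    cases: Sequence[EvalCase],
    model: Callable[[str], str],
    repeats: int = 3,
) -> tuple[float, float]:
    """Run the full set `repeats` times.

    Returns (mean score, standard deviation across runs). The second
    value is a rough noise estimate: a score change smaller than it is
    hard to distinguish from run-to-run variation.
    """
    run_scores = [
        mean(score_case(c, model(c.prompt)) for c in cases)
        for _ in range(repeats)
    ]
    noise = stdev(run_scores) if repeats > 1 else 0.0
    return mean(run_scores), noise


# Illustrative data only; not taken from any real eval set.
CASES = [
    EvalCase("refund-01", "How do I get a refund?", ("30 days", "receipt")),
    EvalCase("hours-01", "When are you open?", ("9am",)),
]


def stub_model(prompt: str) -> str:
    # Deterministic stand-in for a real model call.
    if "refund" in prompt:
        return "Refunds are accepted within 30 days."
    return "We open at 9am."


avg, noise = run_eval(CASES, stub_model)
print(f"score={avg:.2f} noise={noise:.2f}")
```

With a deterministic stub the noise estimate is zero. With a sampled model it generally will not be, and that number is what tells you whether a drop between two versions is a regression or just variance.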