Foundations
~18 min
eval-design
datasets
rubrics
Designing evals that actually catch regressions
From vibes to a dataset that pays rent.
Step 1 of 14
Most teams start with "vibes-based" evals: a few prompts the founder runs by hand. That's fine for week one and a liability by month three. A real eval set has three parts: a curated dataset, an explicit rubric, and a scoring function with known noise characteristics.
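To make those three pieces concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not a prescribed design: the `EvalCase` schema, the substring-match rubric, the `toy_model` and `make_flaky_model` stand-ins, and the choice of repeated runs with a standard deviation as the "known noise" estimate.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One curated dataset entry (hypothetical schema)."""
    case_id: str
    prompt: str
    expected: str


def rubric_score(output: str, case: EvalCase) -> float:
    """Explicit rubric (assumed): 1.0 if the expected answer appears in the output."""
    return 1.0 if case.expected.lower() in output.lower() else 0.0


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Mean rubric score over the whole dataset for a single run."""
    return statistics.mean([rubric_score(model(c.prompt), c) for c in cases])


def noise_profile(
    model: Callable[[str], str], cases: list[EvalCase], repeats: int = 5
) -> tuple[float, float]:
    """Repeat the eval to estimate the run-to-run mean and standard deviation."""
    scores = [run_eval(model, cases) for _ in range(repeats)]
    return statistics.mean(scores), statistics.stdev(scores)


# A tiny curated dataset (illustrative cases only).
CASES = [
    EvalCase("math-1", "What is 2 + 2?", "4"),
    EvalCase("geo-1", "Capital of France?", "Paris"),
]


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for a model: gets one case right, one wrong."""
    answers = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "I think it is Lyon.",
    }
    return answers.get(prompt, "")


def make_flaky_model(p_correct: float, seed: int) -> Callable[[str], str]:
    """Stand-in for a sampled model: answers correctly with probability p_correct."""
    rng = random.Random(seed)
    truth = {c.prompt: c.expected for c in CASES}

    def model(prompt: str) -> str:
        return truth[prompt] if rng.random() < p_correct else "not sure"

    return model


if __name__ == "__main__":
    print("deterministic:", noise_profile(toy_model, CASES))
    print("flaky:", noise_profile(make_flaky_model(0.8, seed=0), CASES))
```

The deterministic stand-in scores the same on every run, so its spread is zero. The flaky one does not, and that spread is the number to compare against before calling a score drop a regression: a change smaller than the run-to-run noise is not evidence of anything.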