← Curriculum
Intermediate
~15 minscoring
judges
calibration
LLM-as-judge: useful, biased, calibratable
Make a model grade another model — without lying to yourself.
Step 1 of 14
LLM-as-judge is the most cost-effective way to score open-ended generations at scale. It is also the most common source of false confidence in AI QA. Treat your judge like an unreliable contractor — measure it, calibrate it, constrain it.