Intermediate · ~10 min · requires API key
Build an LLM-as-judge
Author a rubric, judge a real generation, see the bias.
Author a structured rubric, generate a candidate response with Claude, then run a judge call that scores the response per criterion with evidence quotes. Toggle the 'verbose response' option to see verbosity bias in action.
Learning objectives
- Write a per-criterion rubric instead of a holistic score.
- Force structured JSON output from the judge.
- Observe verbosity bias by comparing scores on short vs. long responses.
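The rubric and judge call described above can be sketched as follows. The criterion names, 1–5 scale, and helper functions are illustrative assumptions, not the lab's exact implementation; the prompt string would be sent to Claude, and the parser handles the common case of the model wrapping its JSON in a code fence.

```python
import json

# A minimal per-criterion rubric (names and descriptions are illustrative).
RUBRIC = [
    {"name": "factual_accuracy", "description": "Claims are correct and verifiable."},
    {"name": "completeness", "description": "All parts of the question are addressed."},
    {"name": "clarity", "description": "The response is easy to follow."},
]

def build_judge_prompt(question: str, response: str) -> str:
    """Assemble a judge prompt that scores each criterion separately
    and demands JSON output with an evidence quote per criterion."""
    criteria = "\n".join(f"- {c['name']}: {c['description']}" for c in RUBRIC)
    return (
        "You are grading a response against a rubric. Score each criterion "
        "from 1 to 5 and quote the evidence you relied on.\n\n"
        f"Question:\n{question}\n\nResponse:\n{response}\n\n"
        f"Criteria:\n{criteria}\n\n"
        "Reply with ONLY a JSON object of the form:\n"
        '{"scores": [{"criterion": "<name>", "score": <1-5>, '
        '"evidence": "<verbatim quote>"}]}'
    )

def parse_judge_reply(reply: str) -> dict:
    """Parse the judge's JSON, tolerating a ```json fenced wrapper."""
    text = reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(text)
```

Keeping the rubric as data (rather than hard-coding it into the prompt) makes it easy to add or reweight criteria without rewriting the judge call.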
1. Generate a candidate response
Tip: try generating once with the concise setting, score it, then regenerate with the verbose setting and score the same content again. Verbose responses often score higher even when the underlying claims are the same; that's verbosity bias.
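The concise-vs-verbose comparison can be sketched like this, with made-up per-criterion scores standing in for two real judge calls (the numbers and the `mean_score` helper are hypothetical):

```python
import statistics

def mean_score(judge_output: dict) -> float:
    """Average the per-criterion scores from one judge call."""
    return statistics.mean(s["score"] for s in judge_output["scores"])

# Hypothetical judge outputs for the same underlying claims,
# phrased concisely vs. verbosely.
concise = {"scores": [{"criterion": "factual_accuracy", "score": 4},
                      {"criterion": "completeness", "score": 3},
                      {"criterion": "clarity", "score": 4}]}
verbose = {"scores": [{"criterion": "factual_accuracy", "score": 4},
                      {"criterion": "completeness", "score": 5},
                      {"criterion": "clarity", "score": 4}]}

gap = mean_score(verbose) - mean_score(concise)
print(f"verbosity gap: {gap:+.2f}")  # a persistent positive gap suggests verbosity bias
```

Comparing per-criterion scores (rather than a single holistic number) also shows *where* the bias lands; in this made-up example only `completeness` moved, which is typical when a longer answer merely restates the same claims at greater length.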