bonsai
Intermediate
~10 min
requires API key

Build an LLM-as-judge

Author a rubric, judge a real generation, see the bias.

Author a structured rubric, generate a candidate response with Claude, then run a judge call that scores the response per-criterion with evidence quotes. Toggle the 'verbose response' option to see verbosity bias in action.

Learning objectives
  • Write a per-criterion rubric instead of a holistic score.
  • Force structured JSON output from the judge.
  • Observe verbosity bias by comparing scores on short vs. long responses.

1. Generate a candidate response

Tip: try generating once with 'concise', score it, then regenerate with 'verbose' and score the same content again. Verbose responses often score higher even when the underlying claims are the same; that's verbosity bias.
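One way to sketch this step in code: build the generation request with a verbosity toggle, so the same question can be asked in a 'concise' or 'verbose' style. The model id, prompt, and style hints below are illustrative assumptions, not values from the lab.

```python
# Sketch of step 1: build a candidate-generation request with a
# verbosity toggle. Model id and prompt text are assumptions.

CANDIDATE_PROMPT = "Explain why the sky is blue."

def build_generation_request(style: str) -> dict:
    """Return kwargs suitable for an Anthropic messages.create call."""
    style_hint = {
        "concise": "Answer in 2-3 sentences.",
        "verbose": "Answer thoroughly, with background, caveats, and examples.",
    }[style]
    return {
        "model": "claude-sonnet-4-5",  # assumed model id
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": f"{CANDIDATE_PROMPT}\n\n{style_hint}"}
        ],
    }

# With an API key configured, you would hand these kwargs to the SDK:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_generation_request("concise"))

print(build_generation_request("verbose")["messages"][0]["content"])
```

Keeping the question identical across both styles is what makes the later score comparison a clean test for verbosity bias.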

2. Author the rubric
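A per-criterion rubric can be expressed as structured data rather than free text, which makes it easy to render into the judge prompt and to score criterion by criterion. The criterion names, descriptions, and scale below are illustrative, not the lab's own rubric.

```python
# Sketch of step 2: a structured rubric. Criteria are assumptions.

RUBRIC = [
    {
        "id": "accuracy",
        "description": "Claims are factually correct and verifiable.",
        "scale": "1-5",
    },
    {
        "id": "completeness",
        "description": "Covers the key points the question requires.",
        "scale": "1-5",
    },
    {
        "id": "clarity",
        "description": "Well organized and easy to follow.",
        "scale": "1-5",
    },
]

def rubric_as_text(rubric: list[dict]) -> str:
    """Render the rubric for inclusion in the judge prompt."""
    return "\n".join(
        f"- {c['id']} ({c['scale']}): {c['description']}" for c in rubric
    )

print(rubric_as_text(RUBRIC))
```

Scoring each criterion separately, instead of asking for one holistic number, is the core move: it forces the judge to commit to evidence per dimension.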

3. Judge
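The judge step can be sketched as two pieces: a prompt that demands strict JSON with a per-criterion score and a verbatim evidence quote, and a parser for the judge's reply. The JSON shape and the canned reply below are assumptions for illustration; no API call is made here.

```python
import json

def build_judge_prompt(rubric_text: str, candidate: str) -> str:
    """Ask the judge to score per-criterion and reply with JSON only."""
    return (
        "Score the response against each criterion below.\n"
        f"Rubric:\n{rubric_text}\n\n"
        f"Response:\n{candidate}\n\n"
        "Reply with JSON only, shaped like:\n"
        '{"scores": [{"id": "...", "score": 1, "evidence": "verbatim quote"}]}'
    )

def parse_judge_output(raw: str) -> list[dict]:
    """Parse the judge's JSON, tolerating an accidental fenced code block."""
    raw = raw.strip()
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    return json.loads(raw)["scores"]

# Canned judge reply, standing in for a real model call:
canned = (
    '{"scores": [{"id": "accuracy", "score": 4,'
    ' "evidence": "Rayleigh scattering"}]}'
)
for item in parse_judge_output(canned):
    print(item["id"], item["score"], repr(item["evidence"]))
```

Requiring a verbatim evidence quote per score makes the judge's reasoning auditable, and running this same call on the concise and verbose candidates is what surfaces the bias.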