Intermediate
~10 min · requires API key
Pairwise & positional bias
Same content, swapped order — watch the judge change its mind.
The lab pairs two candidate responses (A and B) with a pairwise judge that picks a winner and a battery that runs the comparison in both orderings. It reports win counts for A and B, plus the positional gap: how often the first-shown candidate wins regardless of content.
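Both reported numbers can be computed from a list of trials. Here is a minimal Python sketch. It assumes each trial is recorded as a `(first_shown, winner)` pair. The function name `summarize` is hypothetical. The choice to express the gap as the first-position win rate minus 0.5 is also an assumption for illustration, not the lab's own definition.

```python
from collections import Counter


def summarize(trials):
    """Summarize pairwise trials (hypothetical helper).

    Each trial is a (first_shown, winner) pair, both "A" or "B".
    Returns per-candidate win counts and the first-position win rate.
    """
    if not trials:
        raise ValueError("need at least one trial")
    wins = Counter(winner for _, winner in trials)
    first_wins = sum(1 for first, winner in trials if first == winner)
    first_rate = first_wins / len(trials)
    return {
        "A": wins["A"],
        "B": wins["B"],
        "first_win_rate": first_rate,
        # Assumed definition: distance from the 0.5 that a content-only
        # judge scores when both orderings are run equally often.
        "positional_gap": first_rate - 0.5,
    }
```

With equal runs per ordering, a judge that always picks whichever response comes first still splits the wins 3 to 3 between A and B, yet its first-position win rate is 1.0. Win counts alone can hide a fully position-driven judge, which is why the gap is worth reporting separately.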
Learning objectives
- Recognize positional bias as a distinct failure mode of pairwise judges.
- Use order-swapped batteries to separate content quality from position effects.
- Decide when pairwise scoring is appropriate vs. rubric-based scoring.
1. Set up the comparison
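Below is one way to set up the comparison so that only the order varies. It is sketched with a hypothetical prompt template, and `build_pairwise_prompt` and `ordered_pair` are illustrative names, not the lab's API. The slots are labeled by position rather than as A/B, which keeps the judge from keying on the letter itself. The returned `label_map` translates the judge's positional verdict back to a candidate.

```python
def build_pairwise_prompt(question, first, second):
    """Render a pairwise judging prompt (hypothetical template).

    Responses are labeled by position ("Response 1"/"Response 2"),
    not by identity, so the judge never sees which one is A or B.
    """
    return (
        "You are comparing two answers to the same question.\n"
        f"Question: {question}\n\n"
        f"Response 1:\n{first}\n\n"
        f"Response 2:\n{second}\n\n"
        "Reply with exactly one character: 1 or 2."
    )


def ordered_pair(a, b, a_first):
    """Return (first, second, label_map) for one ordering.

    label_map translates the judge's positional verdict ("1"/"2")
    back to the candidate identity ("A"/"B").
    """
    if a_first:
        return a, b, {"1": "A", "2": "B"}
    return b, a, {"1": "B", "2": "A"}
```

Calling `ordered_pair(a, b, a_first=True)` and `ordered_pair(a, b, a_first=False)` produces the two prompts a swapped battery needs. They are identical except for order.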
2. Run the battery
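The lab runs the judge 3 times with A shown first, then 3 times with B shown first: identical content, swapped order. An unbiased judge picks the same candidate in both orderings, so a candidate's win rate should not depend on whether it is shown first. If the winner tracks position instead (A wins when A is first, B wins when B is first), the judge is responding to order, not content.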
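Here is a minimal offline sketch of the battery. The `judge` callable and its `"1"`/`"2"` return convention are assumptions. In the lab the judge is a model call, which is why an API key is required. This sketch substitutes two toy stand-ins: one always picks the first slot, and one prefers the longer answer regardless of slot.

```python
def run_battery(judge, a, b, runs_per_order=3):
    """Run a pairwise judge runs_per_order times in each ordering.

    judge(first, second) is a hypothetical callable returning "1" or "2".
    Returns a list of (first_shown, winner) tuples, values in {"A", "B"}.
    """
    trials = []
    for a_first in (True, False):
        first, second = (a, b) if a_first else (b, a)
        first_label, second_label = ("A", "B") if a_first else ("B", "A")
        for _ in range(runs_per_order):
            verdict = judge(first, second)
            winner = first_label if verdict == "1" else second_label
            trials.append((first_label, winner))
    return trials


# Toy stand-in judges so the sketch runs without an API key.
def always_first(first, second):
    return "1"  # pure positional bias: ignores content entirely


def prefers_longer(first, second):
    return "1" if len(first) >= len(second) else "2"  # content-driven (toy)


a = "Paris is the capital of France."
b = "Paris."
for judge in (always_first, prefers_longer):
    trials = run_battery(judge, a, b)
    a_wins = sum(w == "A" for _, w in trials)
    first_wins = sum(f == w for f, w in trials)
    print(judge.__name__, "A wins:", a_wins, "first-shown wins:", first_wins)
# always_first A wins: 3 first-shown wins: 6
# prefers_longer A wins: 6 first-shown wins: 3
```

The positional judge splits the wins evenly, yet the first-shown response wins every trial. The content-driven judge picks A in both orderings, so the first-shown response wins only half the time.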