Intermediate
~10 min
requires API key

Pairwise & positional bias

Same content, swapped order — watch the judge change its mind.

This lab uses two candidate responses (A and B), a pairwise judge that picks a winner, and a battery that runs the judge on both orderings. It reports the A/B win counts and the positional gap: how often the first-shown candidate wins regardless of content.
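To make the two reported numbers concrete, here is a minimal sketch of how they could be computed from a list of trials. The lab's exact formula is not shown on this page, so the definition of `positional_gap` below (first-slot win rate minus second-slot win rate) is an assumption, and the `summarize` helper and the sample trials are hypothetical, made up for illustration.

```python
from collections import Counter


def summarize(trials):
    """Summarize pairwise trials.

    trials: list of (first_shown, winner) pairs, each value "A" or "B".
    Assumes every trial has a winner (no ties).
    """
    wins = Counter(winner for _, winner in trials)
    # Count trials where the winner was the candidate shown first.
    first_wins = sum(1 for first, winner in trials if winner == first)
    first_rate = first_wins / len(trials)
    return {
        "a_wins": wins["A"],
        "b_wins": wins["B"],
        "first_slot_win_rate": first_rate,
        # Assumed definition: first-slot rate minus second-slot rate.
        # 0 means no position effect; 1 means the first slot always wins.
        "positional_gap": first_rate - (1 - first_rate),
    }


# Invented example data: three trials per ordering.
trials = [
    ("A", "A"), ("A", "A"), ("A", "B"),  # A shown first
    ("B", "B"), ("B", "B"), ("B", "A"),  # B shown first
]
summary = summarize(trials)
print(summary)
```

In this invented sample the overall win counts are tied at 3 each, yet the first-shown candidate wins 4 of 6 trials. That is the pattern the positional gap is meant to expose: equal totals can hide a position effect.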

Learning objectives
  • Recognize positional bias as a distinct failure mode of pairwise judges.
  • Use order-swapped batteries to separate content quality from position effects.
  • Decide when pairwise scoring is appropriate vs. rubric-based scoring.

1. Set up the comparison

2. Run the battery

The lab runs the judge 3 times with A shown first, then 3 times with B shown first. The content is identical in both cases; only the presentation order changes. If the judge is unbiased, each candidate's win rate should be the same in both orderings.
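The battery described above can be sketched as follows. This is not the lab's implementation: `run_battery` and both stub judges are hypothetical, and a real run would replace the stub with a model call (hence the API key). The assumed judge interface takes the two texts in display order and returns "first" or "second".

```python
def run_battery(judge, text_a, text_b, runs_per_order=3):
    """Run the judge on both orderings and tally wins per candidate."""
    texts = {"A": text_a, "B": text_b}
    results = {"A_first": {"A": 0, "B": 0}, "B_first": {"A": 0, "B": 0}}
    orderings = {"A_first": ("A", "B"), "B_first": ("B", "A")}
    for name, (first, second) in orderings.items():
        for _ in range(runs_per_order):
            slot = judge(texts[first], texts[second])
            # Map the winning slot back to the candidate label.
            winner = first if slot == "first" else second
            results[name][winner] += 1
    return results


def always_first(first_text, second_text):
    """Stub judge with maximal positional bias: ignores content."""
    return "first"


def prefers_longer(first_text, second_text):
    """Stub judge that looks only at content (length), not position."""
    return "first" if len(first_text) > len(second_text) else "second"


a = "short answer"
b = "a longer, more detailed answer"
biased = run_battery(always_first, a, b)
content_based = run_battery(prefers_longer, a, b)
print(biased)         # winner flips when the order flips
print(content_based)  # same winner in both orderings
```

With the position-only stub, the winner flips whenever the order flips, so A's win rate differs between orderings. With the content-only stub, B wins in both orderings and the tallies match, which is what the text describes for an unbiased judge.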