Pairwise & positional bias

Same content, swapped order — watch the judge change its mind.

This lab takes two candidate responses (A and B) and a pairwise judge that picks a winner, then runs a battery that covers both orderings. It reports the win counts for A and B and the positional gap: how often the first-shown candidate wins regardless of content.

Learning objectives

· Recognize positional bias as a distinct failure mode of pairwise judges.
· Use order-swapped batteries to separate content quality from position effects.
· Decide when pairwise scoring is appropriate vs. rubric-based scoring.

1. Set up the comparison

Prompt

Candidate A

Candidate B

Criterion the judge will apply

2. Run the battery

Runs per order

The lab runs the judge 3 times with A shown first, then 3 times with B shown first. The content is identical; only the order changes. If the judge is unbiased, each candidate's win rate should be the same in both orderings.
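The battery can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual implementation: `run_battery`, `always_first`, and `prefers_longer` are hypothetical names, and the two toy judges stand in for a real judge call. The sketch assumes the judge returns the string "first" or "second", and it reports the share of runs won by the first-shown candidate. In a balanced battery, that share should sit near 0.5 when position has no effect.

```python
from collections import Counter
from typing import Callable, Dict

# Assumed judge interface: takes (first_shown, second_shown) and
# returns "first" or "second". A real judge would call a model here.
Judge = Callable[[str, str], str]


def always_first(first: str, second: str) -> str:
    """Toy judge with maximal positional bias: ignores content."""
    return "first"


def prefers_longer(first: str, second: str) -> str:
    """Toy judge that looks only at content (length), never position."""
    return "first" if len(first) >= len(second) else "second"


def run_battery(judge: Judge, a: str, b: str, runs_per_order: int = 3) -> Dict[str, float]:
    """Run the judge with A shown first, then with B shown first."""
    wins = Counter()   # wins per candidate label
    first_wins = 0     # wins by whichever candidate was shown first
    orderings = [("A", a, "B", b), ("B", b, "A", a)]
    for first_label, first_text, second_label, second_text in orderings:
        for _ in range(runs_per_order):
            if judge(first_text, second_text) == "first":
                wins[first_label] += 1
                first_wins += 1
            else:
                wins[second_label] += 1
    total = 2 * runs_per_order
    return {
        "A_wins": wins["A"],
        "B_wins": wins["B"],
        "first_shown_win_rate": first_wins / total,
    }


if __name__ == "__main__":
    a, b = "short", "a longer answer"
    print(run_battery(always_first, a, b))    # position decides every run
    print(run_battery(prefers_longer, a, b))  # content decides every run
```

With `always_first`, A and B split the wins evenly even though the content never changed, and the first-shown win rate is 1.0. With `prefers_longer`, B wins every run in both orderings and the first-shown win rate is 0.5. Note that three runs per order is a very small sample, so with a real, non-deterministic judge a modest deviation from 0.5 may simply be noise.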