bonsai
← Curriculum
Advanced
~18 min
red-team
jailbreaks
prompt-injection

Red-teaming and adversarial testing

If you don't break it, your users will.

Step 1 of 13

Red-teaming is structured adversarial testing: deliberately trying to make the system produce unsafe, off-policy, or incorrect outputs. Done well, it produces a regression suite that catches future failures. Done poorly, it's anecdote collection.