bonsai

Cultivate AI you can trust

5 questions · ~5 min

Foundations of QA for AI

Mental models, definitions, and the seven-layer taxonomy.

Q1

You add a new prompt revision and the eval pass rate drops from 86% to 83% on a 50-case set. The team's standard deviation across re-runs is ~4 points. What's the right call?
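One way to frame the numbers in this question is to compare the observed drop against the run-to-run noise. The sketch below is illustrative only: the function names are hypothetical, and the binomial model is an assumption about how the 50 cases behave, not something the question states.

```python
import math


def binomial_se_points(pass_rate: float, n_cases: int) -> float:
    """Standard error of a pass rate, in percentage points, under a binomial model (assumed)."""
    return 100 * math.sqrt(pass_rate * (1 - pass_rate) / n_cases)


def drop_in_sd_units(before: float, after: float, rerun_sd: float) -> float:
    """How many re-run standard deviations the observed change spans."""
    return (before - after) / rerun_sd


# Figures taken from the question: 86% -> 83%, 50 cases, re-run SD of about 4 points.
se = binomial_se_points(0.86, 50)
ratio = drop_in_sd_units(86.0, 83.0, 4.0)

print(f"sampling SE at 86% on 50 cases: {se:.1f} points")
print(f"observed drop: {ratio:.2f} re-run SDs")
```

With these inputs, the 3-point drop is smaller than one re-run standard deviation, and the sampling error of a 50-case set is of a similar size to the change being measured.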

Q2

Which is NOT a typical bias of LLM-as-judge?

Q3

Your eval set is 30 hand-picked edge cases from team intuition. Pass rate is 90%. What's the most important next step?

Q4

Which sentence best captures the difference between groundedness and faithfulness in RAG eval?

Q5

You're evaluating a tool-using agent. Final-task success rate is 85%. What's a critical signal you're probably missing?