The QA-for-AI curriculum
A working engineer's path through the field. Foundations to frontier — read in any order, but if you're new, start with What is QA for AI?
Foundations
3 lessons

What is QA for AI?
Why testing non-deterministic systems demands a new playbook.
Designing evals that actually catch regressions
From vibes to a dataset that pays rent.
Human labeling and calibration
Your eval set is only as honest as the humans who labeled it.
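One calibration check this lesson builds toward can be sketched in a few lines: Cohen's kappa, the agreement between two labelers corrected for chance. The labels below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "fail"]
b = ["pass", "fail", "fail", "pass", "fail", "fail"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```

Raw percent agreement here is 5/6, but kappa discounts the matches two coin-flipping annotators would produce anyway.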
Intermediate
5 lessons

LLM-as-judge: useful, biased, calibratable
Make a model grade another model — without lying to yourself.
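One concrete move from this lesson, sketched under assumptions: score the judge against human gold labels, tracking raw agreement and a leniency bias separately. The function name and data are hypothetical.

```python
def calibrate_judge(judge_verdicts, human_verdicts):
    """Compare an LLM judge's pass/fail calls (1/0) against human gold labels."""
    n = len(human_verdicts)
    agreement = sum(j == h for j, h in zip(judge_verdicts, human_verdicts)) / n
    # Positive bias: the judge passes more than humans do (lenient); negative: harsh.
    bias = (sum(judge_verdicts) - sum(human_verdicts)) / n
    return {"agreement": agreement, "leniency_bias": bias}

judge = [1, 1, 1, 0, 1, 1]  # 1 = pass
human = [1, 0, 1, 0, 1, 0]
print(calibrate_judge(judge, human))
```

Agreement alone hides the direction of the error; a judge that is 67% accurate because it waves everything through needs a different fix than one that is 67% accurate because it nitpicks.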
Evaluating structured outputs
Parse rate is not correctness — they're two different evals.
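The distinction can be made concrete with a minimal sketch (`eval_structured` is a hypothetical helper, not a library API):

```python
import json

def eval_structured(output: str, expected: dict):
    """Score parseability and semantic correctness as separate signals."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return {"parses": False, "correct": False}
    # Valid JSON, but does it contain the right values?
    return {"parses": True, "correct": parsed == expected}

print(eval_structured('{"city": "Oslo"}', {"city": "Oslo"}))  # parses and correct
print(eval_structured('{"city": "Olso"}', {"city": "Oslo"}))  # parses, wrong value
print(eval_structured('{"city": Oslo}', {"city": "Oslo"}))    # not even JSON
```

Reporting the two numbers separately tells you whether to fix the output format or the output content.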
Evaluating RAG: retrieval and generation are different problems
If you grade end-to-end you'll never know what's broken.
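A minimal sketch of the split, with hypothetical helpers; the generation grader here is a crude substring check, a stand-in for whatever judge you actually use:

```python
def retrieval_recall(retrieved_ids, gold_ids, k=5):
    """Did the retriever surface the documents needed to answer?"""
    return len(set(retrieved_ids[:k]) & set(gold_ids)) / len(gold_ids)

def generation_faithful(answer, gold_answer):
    """Placeholder generation grade; swap in your judge of choice."""
    return gold_answer.lower() in answer.lower()

# Grading the two stages separately tells you *which* one failed.
retrieved = ["doc7", "doc2", "doc9"]
print(retrieval_recall(retrieved, gold_ids=["doc2", "doc4"]))  # 0.5
print(generation_faithful("The answer is 42.", "42"))          # True
```

An end-to-end fail with recall at 1.0 is a generation bug; the same fail with recall at 0.0 means the model never saw the evidence.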
CI for prompts, models, and tools
Treat prompts like code — but accept that the build is probabilistic.
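One way to make a probabilistic build gate concrete, as a hedged sketch: run each case several times and gate on aggregate pass rate rather than a single all-or-nothing run. All names are hypothetical, and the model below is a random stand-in.

```python
import random

def run_eval_case(case, model, trials=5):
    """A flaky check passed once proves little; sample and take the rate."""
    passes = sum(model(case["prompt"]) == case["expected"] for _ in range(trials))
    return passes / trials

def ci_gate(cases, model, threshold=0.9):
    """Fail the build only when the aggregate pass rate drops below threshold."""
    rate = sum(run_eval_case(c, model) for c in cases) / len(cases)
    return rate >= threshold, rate

# Stand-in for a model call: right ~95% of the time on this toy case.
random.seed(0)
model = lambda prompt: "4" if random.random() < 0.95 else "5"
ok, rate = ci_gate([{"prompt": "2+2?", "expected": "4"}] * 20, model)
print(ok, rate)
```

The threshold and trial count are knobs you tune against flake cost; the point is that the gate is a rate, not a boolean per case.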
Cost and latency as quality signals
A perfect answer the user never waited for is a failed answer.
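The idea can be sketched as a latency-aware scoring rule; the linear-decay policy below is an arbitrary illustration, not a recommendation.

```python
def score_with_latency(correct: bool, latency_s: float, budget_s: float = 2.0):
    """An answer that misses the latency budget loses credit even if right."""
    if not correct:
        return 0.0
    # Full credit inside budget, linear decay to zero at 2x budget (arbitrary policy).
    if latency_s <= budget_s:
        return 1.0
    return max(0.0, 1.0 - (latency_s - budget_s) / budget_s)

print(score_with_latency(True, 1.2))  # 1.0
print(score_with_latency(True, 3.0))  # 0.5
print(score_with_latency(True, 5.0))  # 0.0
```

Folding latency (or cost per token) into the score keeps "make it slower but slightly better" tradeoffs visible in the same dashboard as accuracy.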
Advanced
3 lessons

Agent evals: trajectories, not outcomes
When the system uses tools, only grading the final answer is malpractice.
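A trajectory grade might look like this minimal sketch (the step schema and helper names are invented):

```python
def contains_in_order(called, required):
    """Is `required` a subsequence of the tools actually called?"""
    it = iter(called)
    return all(tool in it for tool in required)

def grade_trajectory(trajectory, expected_tools, final_ok):
    """Score the path the agent took, not just where it landed."""
    called = [step["tool"] for step in trajectory]
    return {
        "final_answer_ok": final_ok,
        "required_tools_in_order": contains_in_order(called, expected_tools),
        "extra_calls": len(called) - len(expected_tools),
    }

traj = [{"tool": "search"}, {"tool": "search"}, {"tool": "calculator"}]
print(grade_trajectory(traj, expected_tools=["search", "calculator"], final_ok=True))
```

A right answer reached through redundant or forbidden tool calls is a latent bug (and a cost problem); grading the trajectory surfaces it before the outcome metric ever moves.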
Red-teaming and adversarial testing
If you don't break it, your users will.
Drift, observability, and the production loop
Pre-launch evals are necessary; only production telemetry makes them sufficient.