Advanced
~19 min
agents
tool-use
trajectories
Agent evals: trajectories, not outcomes
When the system uses tools, grading only the final answer is malpractice.
Agentic systems plan, call tools, observe the results, and iterate. A correct final answer can hide an inefficient or unsafe trajectory. An incorrect final answer can come from a perfect plan that ran into a flaky tool. That is why outcome-only evals are malpractice for agents: they reward the first run and blame the plan in the second.
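To make the distinction concrete, here is a minimal Python sketch that grades the two kinds of run described above in both ways. It assumes a hypothetical trajectory format in which each step records only the tool called and whether the call succeeded. The names `Step`, `Trajectory`, `grade_outcome`, and `grade_trajectory`, the `max_steps` budget, and treating `delete_file` as the "unsafe" tool are all illustrative assumptions, not part of any particular eval framework.

```python
from dataclasses import dataclass, field


# Hypothetical trajectory schema: one record per tool call the agent made.
@dataclass
class Step:
    tool: str
    ok: bool = True  # False if the tool itself failed (timeout, error response, ...)


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)
    final_answer: str = ""


def grade_outcome(traj: Trajectory, expected: str) -> bool:
    """Outcome-only grading: looks at the final answer and nothing else."""
    return traj.final_answer.strip() == expected.strip()


def grade_trajectory(
    traj: Trajectory,
    expected: str,
    max_steps: int = 5,  # assumed step budget for "efficient"
    unsafe_tools: frozenset = frozenset({"delete_file"}),  # assumed unsafe tool set
) -> dict:
    """Trajectory-aware grading: returns separate verdicts instead of one score."""
    correct = grade_outcome(traj, expected)
    tool_failures = sum(not s.ok for s in traj.steps)
    return {
        "correct": correct,
        "safe": not any(s.tool in unsafe_tools for s in traj.steps),
        "efficient": len(traj.steps) <= max_steps,
        "tool_failures": tool_failures,
        # Heuristic (assumption): a wrong answer after a failed tool call is
        # attributed to the tool, not the plan, and flagged for re-run.
        "blame_tool": (not correct) and tool_failures > 0,
    }


# Run 1: right answer, but a wasteful and unsafe path to it.
lucky = Trajectory(
    steps=[Step("search") for _ in range(6)] + [Step("delete_file"), Step("read_file")],
    final_answer="42",
)
# Run 2: a reasonable plan whose only tool call failed.
sound = Trajectory(steps=[Step("search", ok=False)], final_answer="unknown")

print(grade_outcome(lucky, "42"), grade_trajectory(lucky, "42"))
print(grade_outcome(sound, "42"), grade_trajectory(sound, "42"))
```

Outcome-only grading passes the first run and fails the second. The trajectory grader instead flags the first run as unsafe and over budget, and attributes the second run's failure to the tool rather than the plan. Returning separate verdicts rather than a single number is a deliberate choice in this sketch: it keeps a correct answer from masking an unsafe path.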