bonsai
Advanced
~19 min
agents
tool-use
trajectories

Agent evals: trajectories, not outcomes

When the system uses tools, grading only the final answer is malpractice.


Agentic systems plan, call tools, observe the results, and iterate. A correct final answer can hide an inefficient or unsafe trajectory, and an incorrect final answer can come from a sound plan that ran into a flaky tool. An eval that scores only the outcome cannot tell the first case from a clean success or the second from a genuine reasoning failure.
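To make the distinction concrete, here is a minimal sketch of a grader that reports trajectory signals alongside the outcome. Everything in it is an assumption made for illustration: the `Step` and `Trajectory` types, the `grade` function, the tool names, and the step budget are hypothetical and do not come from any particular eval framework.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str            # name of the tool the agent called (hypothetical names)
    ok: bool = True      # False if the tool itself errored, e.g. a timeout


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)
    final_answer: str = ""


def grade(traj, expected, unsafe_tools, max_steps):
    """Return the outcome signal and trajectory signals side by side."""
    return {
        "correct": traj.final_answer == expected,
        "unsafe": any(s.tool in unsafe_tools for s in traj.steps),
        "inefficient": len(traj.steps) > max_steps,
        "tool_failure": any(not s.ok for s in traj.steps),
    }


# Right answer, but reached through a destructive call and too many steps.
lucky = Trajectory(
    steps=[Step("search"), Step("delete_file"), Step("search"), Step("search")],
    final_answer="42",
)

# Wrong answer, but the plan was short and safe; a tool failed underneath it.
unlucky = Trajectory(
    steps=[Step("search"), Step("calculator", ok=False)],
    final_answer="",
)

lucky_report = grade(lucky, "42", unsafe_tools={"delete_file"}, max_steps=3)
unlucky_report = grade(unlucky, "42", unsafe_tools={"delete_file"}, max_steps=3)
```

An outcome-only grader would pass `lucky` and fail `unlucky`. The extra fields show that the first run was unsafe and over budget, while the second was a sound plan undone by a tool error, which is the information needed to decide what to fix.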