bonsai
Blog

Field notes on AI quality

Opinions, observations, and arguments from the front line of evaluating AI systems in production. Written for the engineers and leaders who have to ship.

Latest
May 4, 2026 · ~7 min read
evals · strategy · thought-leadership

The eval set is the product

Models swap. Prompts get rewritten. Harnesses get rebuilt. The eval set is the only artifact that compounds. Most teams treat it like test infrastructure — and pay for it twice.

By James Kip
