Frontier topics: multi-modal, long-horizon, self-improving evals

Where the field is heading in 2026 and beyond.

The QA-for-AI playbook is being rewritten as systems become multi-modal, take longer-horizon actions, and increasingly evaluate themselves. This section surveys where the field stands in 2026: what's working, what's broken, and what remains unresolved.
