Hands-on labs, wired to live Claude calls
Each lab calls a server route that proxies to the Anthropic API with prompt caching, structured outputs, and adaptive thinking. Bring your own API key: it is stored only in your browser and sent along with each lab call. All calls use the claude-opus-4-7 model.
Your Anthropic API key
Stored only in this browser's localStorage. Sent to this app's server only when a lab makes a Claude call, used once, and never logged or persisted server-side. Use a key scoped to a sandbox workspace.
Build an LLM-as-judge
Author a rubric, judge a real generation, see the bias.
Rubric builder & dry-run
Iterate on a rubric and see how scores change live.
Prompt-injection sandbox
Attack a system prompt — and see the defenses.
Groundedness checker
Score per-claim faithfulness against a context window.
Pairwise & positional bias
Same content, swapped order — watch the judge change its mind.
Agent trajectory critique
Grade the agent's path, not just its answer.
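As a rough illustration of the idea behind the Pairwise & positional bias lab, the swap test can be sketched in a few lines of Python. This is a minimal sketch, not the lab's actual code: the names `positional_bias_check`, `always_first_judge`, and `longer_answer_judge` are hypothetical, and the judges here are local stand-ins for a real Claude call.

```python
from typing import Callable, Dict

# Assumption: a judge takes (first_answer, second_answer) and returns
# "first" or "second" to name the slot it prefers.
Judge = Callable[[str, str], str]


def positional_bias_check(judge: Judge, answer_a: str, answer_b: str) -> Dict[str, object]:
    """Run the judge twice with the answers swapped and compare the verdicts."""
    forward = judge(answer_a, answer_b)   # A is shown first
    reverse = judge(answer_b, answer_a)   # B is shown first

    # Map slot verdicts back to the underlying answers.
    forward_winner = "A" if forward == "first" else "B"
    reverse_winner = "B" if reverse == "first" else "A"

    return {
        "forward_winner": forward_winner,
        "reverse_winner": reverse_winner,
        # A content-driven judge picks the same answer in both orders.
        "consistent": forward_winner == reverse_winner,
    }


def always_first_judge(first: str, second: str) -> str:
    """Hypothetical stand-in: a maximally position-biased judge."""
    return "first"


def longer_answer_judge(first: str, second: str) -> str:
    """Hypothetical stand-in: a judge that looks only at content (length)."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    print(positional_bias_check(always_first_judge, "short", "a longer answer"))
    print(positional_bias_check(longer_answer_judge, "short", "a longer answer"))
```

In this sketch the position-biased judge flips its winner when the order is swapped, so `consistent` is false, while the content-only judge picks the same answer in both orders. A real judge would replace the stand-ins with a model call, and the same comparison would apply to its verdicts.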