Give Claude a Verifier, Not Just a Task: /goal, Stop Hooks, and Evidence-Based Confirmation

Chris Harper

2 min read

Jun 24, 2026 · 21:09 UTC

Workflow

Claude Code

Best Practices

TL;DR: Claude's default "done" signal is visual — the task looks complete. A /goal condition, Stop hook, or reviewer subagent closes the loop so Claude catches its own mistakes before you do.

The most common source of subtle bugs in autonomous Claude Code sessions isn't hallucination — it's the absence of a mechanically-failing check. Without a verifier, Claude stops when the work looks done, and you become the review loop.

Three tiers of verifier, in increasing setup cost:

Tier 1: In-prompt verification (zero setup)

Describe what pass looks like in the task itself:

Implement validateEmail(). Test cases: user@example.com → true, invalid → false, user@.com → false.
Run the tests after implementing and iterate until they all pass.

Works immediately. Use for any task where you can describe the acceptance criteria in one sentence.

Tier 2: /goal conditions (session-level gate)

Set a condition Claude must satisfy before stopping:

/goal All unit tests pass, TypeScript reports zero errors, and pnpm build exits 0

A separate evaluator re-checks after every turn. Claude keeps working until the condition holds.

Tier 3: Stop hooks (deterministic gate)

Add to .claude/settings.json:

{
  "hooks": {
    "Stop": [{"command": "pnpm test && pnpm build"}]
  }
}

The hook runs before Claude can end a turn. If it fails, Claude gets the output and retries. Claude Code overrides after 8 consecutive failures — a built-in safety valve. Use this for CI-equivalent gates on unattended runs.

Evidence over assertions

Pair any verifier with this habit: ask Claude to show evidence, not just claim success.

Fix the session timeout bug. When done, show me the failing test you wrote and
the output confirming it passes — don't just say it's fixed.

Reviewing evidence (test output, build exit code, a screenshot) is faster than re-running verification yourself and works for sessions you weren't watching.

For autonomous runs: chain a reviewer subagent

When done, use a subagent to review the diff against the requirements — report
gaps in correctness or coverage only, not style preferences.

The reviewer runs in a fresh context with no bias toward the implementation. Gaps come back to the main session for immediate fixes — no copying between windows.