Let Claude Decide How Hard to Think: Adaptive Reasoning and Task Budgets in Your Agent Loop

Chris Harper

Jul 5, 2026 · 04:15 UTC

Replace fixed extended-thinking budgets with thinking: {type: "adaptive"}. Claude scales reasoning depth per request, and the task budgets beta (enabled with the task-budgets-2026-03-13 header) helps keep agent loops from running away.

If you've used Claude's extended thinking API, you've hit the guessing game: budget_tokens: 4000 is wasteful on easy questions and too small on hard ones. Adaptive thinking eliminates the guess.

Drop-in replacement

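import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-8-20250922",
    max_tokens=16384,
    thinking={"type": "adaptive"},     # no budget_tokens: Claude decides how much to think
    output_config={"effort": "high"},  # effort is set in output_config, not inside thinking
    messages=[{"role": "user", "content": your_prompt}],  # your_prompt: your input string
)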

The effort levels (low, medium, high, max) nudge how aggressively Claude applies reasoning — use high for complex agent tasks, medium for day-to-day coding, low for routing and classification where speed beats depth. On Opus 4.7+, xhigh is also available.
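
The guidance above can be captured in a small lookup. This is a minimal sketch, not part of the SDK: the task categories and the pick_effort name are hypothetical, and only the effort strings come from the list above.

```python
# Hypothetical task categories mapped to the effort levels named in the text.
EFFORT_BY_TASK = {
    "routing": "low",
    "classification": "low",
    "coding": "medium",
    "agent": "high",
}


def pick_effort(task_kind: str, default: str = "medium") -> str:
    """Return an effort level for a task category, falling back to `default`."""
    return EFFORT_BY_TASK.get(task_kind, default)
```

Keeping the mapping in one place means the choice can be tuned per workload without touching each call site.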

Note: For Opus 4.7 and later models, the old budget_tokens field returns a 400 error — switch to adaptive + effort entirely.
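
If request parameters are built in several places, a small shim can catch leftover fixed budgets during migration. The sketch below assumes the legacy shape thinking={"type": "enabled", "budget_tokens": N}; the to_adaptive helper is hypothetical and leaves the effort setting to the caller.

```python
import copy


def to_adaptive(request_kwargs: dict) -> dict:
    """Return a copy of the kwargs with any fixed thinking budget replaced
    by adaptive mode. The input dict is not modified."""
    migrated = copy.deepcopy(request_kwargs)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Drop the fixed budget and let the model scale its own reasoning.
        migrated["thinking"] = {"type": "adaptive"}
    return migrated
```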

Why it matters for agents: interleaved thinking

Adaptive mode auto-enables interleaved thinking — Claude reasons between tool calls, not just at the start of the turn. A search-retrieve-summarize loop becomes:

Think → plan the search
Call search(query) → get results
Think → evaluate results, plan next call
Call read(url) → get content
Think → draft summary
Output

Without interleaved thinking, Claude forms its whole plan upfront before seeing any tool results. Adaptive mode fixes this at no extra config cost.
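
The loop above can be sketched as a plain tool-use driver. This is an illustration under assumptions, not a reference implementation: run_tool_loop and the handlers mapping are hypothetical names, and the client is passed in so any object shaped like anthropic.Anthropic() works. The one detail that matters for interleaved thinking is that the assistant turn is sent back unmodified, so the thinking blocks stay attached to the tool calls they preceded.

```python
def run_tool_loop(client, request_kwargs, messages, handlers, max_turns=10):
    """Call the model until it stops asking for tools; return the final response.

    client         -- an anthropic.Anthropic() instance (or an object shaped like one)
    request_kwargs -- model, max_tokens, thinking, tools, ... passed through unchanged
    messages       -- conversation so far; extended in place
    handlers       -- hypothetical mapping of tool name -> Python callable
    """
    for _ in range(max_turns):
        response = client.messages.create(messages=messages, **request_kwargs)
        if response.stop_reason != "tool_use":
            return response
        # Echo the whole assistant turn back, thinking blocks included.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = handlers[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")
```

The max_turns guard is a client-side safety net only; it is independent of any server-side budget.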

task_budget beta: cap an entire loop

For long-running agent loops, add the task-budgets-2026-03-13 beta header and enable task budgets:

response = client.beta.messages.create(
    model="claude-opus-4-8-20250922",
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        # Total token budget for the whole loop; 128000 is an illustrative value.
        "task_budget": {"type": "tokens", "total": 128000},
    },
    betas=["task-budgets-2026-03-13"],
    messages=conversation_history,
)

Claude sees a server-injected countdown of the tokens remaining across the loop, so it can self-moderate and wrap up gracefully when the budget runs low rather than being cut off mid-thought. Check usage.task_budget_tokens_used in the response to profile your loops.
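
For profiling on the client side, a running total built from the standard usage.input_tokens and usage.output_tokens fields can complement the server-side budget. The sketch below is an assumption-laden illustration: LoopTokenMeter and its limit are hypothetical, and the way it counts tokens may not match how the server charges against a task budget.

```python
class LoopTokenMeter:
    """Client-side running total of tokens used across the calls in one loop."""

    def __init__(self, limit: int):
        self.limit = limit  # hypothetical ceiling chosen by the caller
        self.used = 0

    def record(self, response) -> int:
        """Add one response's input and output tokens; return tokens remaining."""
        usage = response.usage
        self.used += usage.input_tokens + usage.output_tokens
        return self.remaining

    @property
    def remaining(self) -> int:
        """Tokens left before the client-side limit, never below zero."""
        return max(self.limit - self.used, 0)

    @property
    def exhausted(self) -> bool:
        """True once the recorded total reaches or exceeds the limit."""
        return self.used >= self.limit
```

Calling meter.record(response) after each turn and breaking out of the loop when meter.exhausted is true gives a hard stop that does not depend on the model's own pacing.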

Sources: Adaptive thinking — Anthropic Platform Docs | Task budgets — Anthropic Platform Docs
