
Let Claude Decide How Hard to Think: Adaptive Reasoning and Task Budgets in Your Agent Loop
Chris Harper
2 min read
Jul 5, 2026 · 04:15 UTC
Replace fixed extended-thinking budgets with thinking: {type: "adaptive"} — Claude scales reasoning depth per request, and the task_budget beta header prevents runaway agent loops.
If you've used Claude's extended thinking API, you've hit the guessing game: budget_tokens: 4000 is wasteful on easy questions and too small on hard ones. Adaptive thinking eliminates the guess.
Drop-in replacement
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-8-20250922",
max_tokens=16384,
thinking={"type": "adaptive", "effort": "high"},
messages=[{"role": "user", "content": your_prompt}]
)
The effort levels (low, medium, high, max) nudge how aggressively Claude applies reasoning — use high for complex agent tasks, medium for day-to-day coding, low for routing and classification where speed beats depth. On Opus 4.7+, xhigh is also available.
Note: For Opus 4.7 and later models, the old budget_tokens field returns a 400 error — switch to adaptive + effort entirely.
Why it matters for agents: interleaved thinking
Adaptive mode auto-enables interleaved thinking — Claude reasons between tool calls, not just at the start of the turn. A search-retrieve-summarize loop becomes:
- Think → plan the search
- Call
search(query)→ get results - Think → evaluate results, plan next call
- Call
read(url)→ get content - Think → draft summary
- Output
Without interleaved thinking, Claude forms its whole plan upfront before seeing any tool results. Adaptive mode fixes this at no extra config cost.
task_budget beta: cap an entire loop
For long-running agent loops, add the task-budgets-2026-03-13 beta header and enable task budgets:
response = client.beta.messages.create(
model="claude-opus-4-8-20250922",
max_tokens=16384,
thinking={"type": "adaptive", "effort": "high"},
task_budget={"type": "enabled"},
betas=["task-budgets-2026-03-13"],
messages=conversation_history,
)
Claude sees a server-injected countdown showing remaining tokens across the loop — it self-moderates and wraps up gracefully when budget runs low, rather than truncating mid-thought. Check usage.task_budget_tokens_used in the response to profile your loops.
Sources: Adaptive thinking — Anthropic Platform Docs | Task budgets — Anthropic Platform Docs