
Process 100,000 Claude Requests at Half Price: The Message Batches API
Chris Harper
Jun 28, 2026 · 20:32 UTC
TL;DR: Submit up to 100,000 Claude API requests in a single batch and pay 50% of standard rates. Most batches finish within an hour, though processing can take up to 24 hours, and there is no streaming.
If you're running document extraction pipelines, generating test suites, or classifying content at scale, and you can wait for the answers, Message Batches halves the cost of that work. Here's the minimal working pattern:
```python
import time

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Submit up to 100,000 requests in one call
batch = client.messages.batches.create(
    requests=[
        Request(
            custom_id="doc-001",  # your join key: results arrive unordered
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=256,
                messages=[{"role": "user", "content": "Extract the invoice total from: ..."}],
            ),
        ),
        # ... more requests
    ]
)

# Poll until processing ends (most batches finish in under 1 hour)
while True:
    status = client.messages.batches.retrieve(batch.id)
    if status.processing_status == "ended":
        break
    time.sleep(60)

# Iterate results and match them back via custom_id
for item in client.messages.batches.results(batch.id):
    if item.result.type == "succeeded":
        print(item.custom_id, item.result.message.content[0].text)
    else:
        # "errored", "canceled", or "expired": log it and resubmit if needed
        print(item.custom_id, "failed:", item.result.type)
```
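The `while True` loop above polls at a fixed interval and never gives up. A slightly sturdier sketch, assuming you want capped exponential backoff plus a deadline a little past the 24-hour processing limit, is below. `wait_for_batch` and its parameters are hypothetical helpers, not part of the SDK: `get_status` is any zero-argument callable that returns the batch's `processing_status`, and `sleep` and `clock` are injected so the logic can be exercised without network calls.

```python
import time
from typing import Callable


def wait_for_batch(
    get_status: Callable[[], str],
    initial_delay: float = 30.0,
    max_delay: float = 600.0,
    timeout: float = 25 * 3600,  # assumption: a margin past the 24-hour processing limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports "ended", with capped exponential backoff."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status == "ended":
            return status
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # 30 s, 60 s, 120 s, ... capped at max_delay
```

With the client from the example, a call might look like `wait_for_batch(lambda: client.messages.batches.retrieve(batch.id).processing_status)`. Backing off keeps the number of status calls low on batches that run for hours while still reacting quickly to short ones.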
Key numbers:
- Up to 100,000 requests or 256 MB per batch
- 50% off all input and output tokens (e.g. Sonnet 4.6 drops from $3/$15 to $1.50/$7.50 per million input/output tokens)
- Hard limit: requests not processed within 24 hours expire; results stay available for 29 days after the batch is created
- Streaming, Fast mode, and threads are not supported in batches
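If a workload exceeds either per-batch limit, it has to be split across several batches. A minimal sketch, assuming each request is a plain dict with `custom_id` and `params` keys (the same shape `Request` builds) and approximating its size by its compact UTF-8 JSON encoding. `chunk_requests` is a hypothetical helper; treating 256 MB as 256 × 1024 × 1024 bytes is an assumption, and the estimate ignores HTTP envelope overhead, so leave some headroom in practice.

```python
import json
from typing import Iterable, Iterator

MAX_REQUESTS = 100_000         # per-batch request cap
MAX_BYTES = 256 * 1024 * 1024  # assumption: 256 MB read as binary megabytes


def chunk_requests(
    requests: Iterable[dict],
    max_requests: int = MAX_REQUESTS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[list[dict]]:
    """Yield lists of requests that stay under both the count and the size limit."""
    chunk: list[dict] = []
    chunk_bytes = 0
    for req in requests:
        # Approximate the request's payload size by its compact JSON encoding.
        size = len(json.dumps(req, separators=(",", ":")).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} alone exceeds max_bytes")
        # Start a new chunk if adding this request would break either limit.
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(req)
        chunk_bytes += size
    if chunk:
        yield chunk
```

Each yielded chunk can be passed as `requests=` to `client.messages.batches.create`, one batch per chunk. Because chunks preserve input order and every request keeps its `custom_id`, results from all batches can still be joined back to the same input records.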
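Double savings with prompt caching. If your requests share a large common prefix (a system prompt, a reference document), mark it with cache_control; batch discounts and cache discounts stack. Put the identical cached block in every request of the batch. The default cache lifetime is 5 minutes (refreshed each time the entry is read), and a batch can take longer than that to work through, so cache hits inside a batch are best-effort; the docs suggest the 1-hour cache duration for batches with shared context.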
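A sketch of the shared-prefix pattern, assuming the common reference text goes in the `system` prompt and every request reuses the same block verbatim. `build_cached_requests` and the invoice prompt are hypothetical; `{"type": "ephemeral"}` is the documented cache marker, and the optional `"ttl": "1h"` value corresponds to the longer cache duration in the prompt-caching docs (check them for model support and pricing). Prefixes shorter than the model's minimum cacheable length are not cached at all.

```python
from typing import Optional


def build_cached_requests(
    docs: dict[str, str],
    shared_context: str,
    model: str = "claude-sonnet-4-6",
    ttl: Optional[str] = None,
) -> list[dict]:
    """Build batch requests that all start with the same cached system block."""
    cache_control = {"type": "ephemeral"}
    if ttl is not None:
        cache_control["ttl"] = ttl  # e.g. "1h"; omit for the default 5-minute lifetime
    # One system block, reused verbatim so every request has an identical prefix.
    system = [{"type": "text", "text": shared_context, "cache_control": cache_control}]
    return [
        {
            "custom_id": custom_id,  # hypothetical ids such as "doc-001"
            "params": {
                "model": model,
                "max_tokens": 256,
                "system": system,
                "messages": [
                    {"role": "user", "content": f"Extract the invoice total from:\n{text}"}
                ],
            },
        }
        for custom_id, text in docs.items()
    ]
```

Only the per-document text varies between requests, which is what lets the cached prefix be reused; anything that changes the system block (even whitespace) produces a different prefix and a cache miss.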
Use custom_id as your join key. Results arrive in arbitrary order, not the order you submitted them. Give each request a unique, meaningful id up front and use it to stitch results back to your input records; don't rely on ordering.
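Stitching results back can be sketched as a dictionary join. This assumes each result is a dict shaped like a line of the batch results file: a `custom_id` plus a `result` whose `type` is `succeeded`, `errored`, `canceled`, or `expired`. With the Python SDK you could likely get that shape from each item via pydantic's `model_dump()`, but that is an assumption about your SDK version. `join_results` is a hypothetical helper that returns the joined records, the failed ids with their result type, and any ids that never came back.

```python
from typing import Iterable


def join_results(inputs: dict[str, dict], results: Iterable[dict]):
    """Match batch results to input records by custom_id, ignoring arrival order.

    Returns (joined, failed, missing):
      joined  -> {custom_id: (input_record, response_text)} for succeeded requests
      failed  -> {custom_id: result_type} for errored / canceled / expired requests
      missing -> custom_ids that produced no result line at all
    """
    joined: dict[str, tuple[dict, str]] = {}
    failed: dict[str, str] = {}
    for line in results:
        cid = line["custom_id"]
        if cid not in inputs:
            raise KeyError(f"result for unknown custom_id {cid!r}")
        result = line["result"]
        if result["type"] == "succeeded":
            # Concatenate the text blocks of the returned message.
            text = "".join(
                block["text"]
                for block in result["message"]["content"]
                if block["type"] == "text"
            )
            joined[cid] = (inputs[cid], text)
        else:
            failed[cid] = result["type"]
    missing = set(inputs) - set(joined) - set(failed)
    return joined, failed, missing
```

Raising on an unknown custom_id is a deliberate choice: it surfaces a mismatch between the batch and your input set instead of silently dropping data. The `failed` and `missing` ids are natural candidates for a follow-up batch.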
Sources: Message Batches API — Anthropic Docs