Process 100,000 Claude Requests at Half Price: The Message Batches API

Chris Harper


Jun 28, 2026 · 20:32 UTC


TL;DR: Submit up to 100,000 Claude API requests in a single batch, pay 50% of standard rates, and get results back asynchronously, usually within an hour (24 hours at most). No streaming required.

If you're running document extraction pipelines, generating test suites, or classifying content at scale, Message Batches halves your API bill. Here's the minimal working pattern:

import time

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

# Submit up to 100,000 requests in one call
batch = client.messages.batches.create(
    requests=[
        Request(
            custom_id="doc-001",          # your join key — results arrive unordered
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=256,
                messages=[{"role": "user", "content": "Extract the invoice total from: ..."}],
            ),
        ),
        # ... more requests
    ]
)

# Poll until done (most batches finish in under 1 hour)
while True:
    status = client.messages.batches.retrieve(batch.id)
    if status.processing_status == "ended":
        break
    time.sleep(60)

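# Iterate results and match each one back to its input via custom_id
for item in client.messages.batches.results(batch.id):
    if item.result.type == "succeeded":
        print(item.custom_id, item.result.message.content[0].text)
    else:
        # "errored", "canceled", or "expired": log it so the request can be retried
        print(item.custom_id, "did not succeed:", item.result.type)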
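The polling loop above runs until the batch ends, with no upper bound. A minimal sketch of a bounded version follows, assuming you want to stop waiting once the 24-hour processing limit has passed. The helper name wait_for_batch and its parameters are hypothetical, not part of the SDK; it takes the retrieve function as an argument so it can be exercised without network access.

```python
import time
from typing import Any, Callable


def wait_for_batch(
    retrieve: Callable[[str], Any],
    batch_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 60 * 60,  # the article's 24-hour hard limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll retrieve(batch_id) until processing_status is "ended" or the deadline passes.

    Hypothetical helper: `retrieve` is expected to behave like
    client.messages.batches.retrieve and return an object with a
    `processing_status` attribute.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = retrieve(batch_id)
        if status.processing_status == "ended":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {status.processing_status!r} "
                f"after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

With the client from the example above, this would be called as wait_for_batch(client.messages.batches.retrieve, batch.id).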

Key numbers:

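Up to 100,000 requests or 256 MB per batch, whichever limit is reached first
50% off all input and output tokens (e.g. Sonnet 4.6 drops to $1.50 input / $7.50 output per MTok)
Hard limit: 24 hours of processing, after which unfinished requests expire; results stay accessible for 29 days after the batch is created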
Streaming, Fast mode, and threads are not supported in batches
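The limits and the discount above translate into two small helpers. This is a sketch under stated assumptions, not SDK functionality: split_into_batches and batch_cost_usd are hypothetical names, the request size is approximated by the JSON-serialized length of each request dict, and "256 MB" is read as 256 × 1024 × 1024 bytes. The standard rates in the usage note ($3 / $15 per MTok) are the ones implied by the article's discounted Sonnet 4.6 figures.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_REQUESTS_PER_BATCH = 100_000         # request-count limit from the list above
MAX_BYTES_PER_BATCH = 256 * 1024 * 1024  # "256 MB", interpreted as MiB (assumption)


def split_into_batches(
    requests: Iterable[Dict[str, Any]],
    max_requests: int = MAX_REQUESTS_PER_BATCH,
    max_bytes: int = MAX_BYTES_PER_BATCH,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of requests that stay under both the count and size limits.

    Size is estimated from the JSON encoding of each request dict, which
    only approximates what the SDK actually sends.
    """
    chunk: List[Dict[str, Any]] = []
    chunk_bytes = 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {request.get('custom_id')!r} exceeds the size limit")
        # Close the current chunk before it would break either limit.
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk


def batch_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_mtok: float,
    output_per_mtok: float,
    discount: float = 0.5,  # the 50% batch discount
) -> float:
    """Estimate the cost of a batch from token counts and standard per-MTok rates."""
    standard = (
        input_tokens / 1_000_000 * input_per_mtok
        + output_tokens / 1_000_000 * output_per_mtok
    )
    return standard * (1 - discount)
```

For example, batch_cost_usd(1_000_000, 1_000_000, 3.00, 15.00) gives 9.0: one million input and one million output tokens at the discounted $1.50 and $7.50 rates. Each list yielded by split_into_batches can be passed as the requests argument of a separate client.messages.batches.create call.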

Double savings with prompt caching. If your requests share a large prefix (a system prompt, a reference document), add cache_control blocks: batch discounts and cache discounts stack. Cache entries expire after 5 minutes by default, and cache hits inside a batch are best-effort, so keep the shared prefix identical across requests and keep a steady flow of batch requests to maintain a warm cache.
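As a sketch of what that looks like in practice, the helper below builds batch requests as plain dicts that all carry the same system block marked with cache_control. The function name build_cached_requests and its parameters are hypothetical; the model name and max_tokens are copied from the example earlier in this post. The resulting list can be passed as requests to client.messages.batches.create.

```python
from typing import Any, Dict, List


def build_cached_requests(
    shared_prefix: str,
    documents: Dict[str, str],
    model: str = "claude-sonnet-4-6",  # same model as the example above
    max_tokens: int = 256,
) -> List[Dict[str, Any]]:
    """Build one batch request per document, all sharing a cacheable system prompt.

    `documents` maps custom_id -> user message text (hypothetical input shape).
    """
    # The prefix must be identical in every request for a cache hit to be possible.
    system = [
        {
            "type": "text",
            "text": shared_prefix,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "system": system,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for custom_id, text in documents.items()
    ]
```

Only the per-document user message varies between requests; everything before it is shared, which is what makes the prefix cacheable.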

Use custom_id as your join key. Results arrive in arbitrary order. Assign a meaningful id to each request up front and use it to stitch results back to your input records — don't rely on ordering.
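One way to do the stitching, sketched under assumptions: join_results is a hypothetical helper, records is your own mapping from custom_id to input record, and results is anything shaped like the items yielded by client.messages.batches.results (an object with custom_id and result attributes, as in the example above).

```python
from typing import Any, Dict, Iterable


def join_results(
    records: Dict[str, Dict[str, Any]],
    results: Iterable[Any],
) -> Dict[str, Dict[str, Any]]:
    """Attach each batch result to its input record, keyed by custom_id.

    Records with no matching result keep status "missing", so gaps are
    visible instead of silently dropped.
    """
    joined = {
        custom_id: {**record, "status": "missing", "output": None}
        for custom_id, record in records.items()
    }
    for item in results:
        row = joined.get(item.custom_id)
        if row is None:
            continue  # result for an id we did not submit; skipped in this sketch
        row["status"] = item.result.type
        if item.result.type == "succeeded":
            row["output"] = item.result.message.content[0].text
    return joined
```

Because the lookup is by key, the order in which results arrive does not matter, and any record still marked "missing" or carrying a non-succeeded status is a candidate for resubmission.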

Sources: Message Batches API — Anthropic Docs

CloudCodeTree
