Use Small Language Models for Routine Agentic Tasks (New ArXiv)

Chris Harper

1 min read

May 29, 2026

Best Practices

LLM

A recent paper from NVIDIA Research, "Small Language Models are the Future of Agentic AI", argues that small language models (SLMs) can match or surpass large language models (LLMs) on tool use, function calling, and RAG at 10x–100x lower token cost and with lower latency. The practical pattern is heterogeneous model routing: SLMs handle narrow, predictable sub-tasks, while LLMs handle complex reasoning steps. Key techniques for closing the remaining quality gap are guided decoding, strict JSON Schema outputs, schema-first prompting, and lightweight LoRA/QLoRA fine-tuning.
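To make the routing pattern concrete, here is a minimal Python sketch. It is not taken from the paper. The task names in ROUTINE_TASKS, the Router class, the tool-call shape checked by validate_tool_call, and the two stub model functions are all hypothetical, and the hand-rolled field check stands in for real JSON Schema validation or guided decoding. The idea shown: send routine task types to the SLM first, accept its output only if it parses into the expected structure, and otherwise fall back to the LLM.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A model is anything that maps a prompt string to a raw text completion.
ModelFn = Callable[[str], str]

# Hypothetical task types treated as narrow and predictable.
ROUTINE_TASKS = {"function_call", "extract_fields", "rag_answer"}

# Hypothetical tool-call shape; a stand-in for a full JSON Schema.
REQUIRED_FIELDS = {"tool": str, "arguments": dict}


def validate_tool_call(raw: str) -> Optional[dict]:
    """Return the parsed tool call, or None if it is not valid JSON
    or does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data


@dataclass
class Router:
    slm: ModelFn  # cheap model for routine sub-tasks
    llm: ModelFn  # expensive model for complex steps and fallback

    def run(self, task_type: str, prompt: str) -> Tuple[str, Optional[dict]]:
        """Return (model_used, parsed_output)."""
        if task_type in ROUTINE_TASKS:
            parsed = validate_tool_call(self.slm(prompt))
            if parsed is not None:
                return "slm", parsed
        # Non-routine task, or the SLM output failed validation.
        return "llm", validate_tool_call(self.llm(prompt))


# Stub models so the sketch runs without any inference backend.
def fake_slm(prompt: str) -> str:
    return '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'


def fake_llm(prompt: str) -> str:
    return '{"tool": "make_plan", "arguments": {"steps": 3}}'


def broken_slm(prompt: str) -> str:
    return "Sure! Here is the weather call you asked for."


if __name__ == "__main__":
    router = Router(slm=fake_slm, llm=fake_llm)
    print(router.run("function_call", "Weather in Oslo?"))
    print(router.run("multi_step_planning", "Plan a migration."))
    print(Router(slm=broken_slm, llm=fake_llm).run("function_call", "Weather?"))
```

In a real deployment the stubs would be replaced by calls to actual model endpoints, and the validation step would typically be enforced at generation time through guided decoding rather than checked after the fact.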

Sources: arXiv · NVIDIA Research

CloudCodeTree
