Subject

Agents & Tool Use

Building LLM agents that actually do useful work: the agent loop, tool calling across major APIs, the Model Context Protocol, and the failure modes that make agents the wrong shape for many problems.

A dark workbench with a robotic hand selecting tools from a labeled rack

An LLM agent sounds harder than it is. The standard shape, taken across the major lab docs and the published practitioner write-ups, fits in a paragraph.

You have a model, a set of tools, and a conversation. On each turn the model produces either a tool call or a final answer. If it is a tool call, you run the tool, append the result to the conversation, and ask the model again. Loop until the model returns a final answer or until you hit a safety cap. That is the whole architecture. It is usually called the ReAct loop (reason + act), from a 2022 paper that named the shape; every major API supports it natively today.

The minimal loop

In pseudocode:

messages = [system_prompt, user_message]

while True:
    response = model.call(messages, tools=tool_specs)

    if response is final_text:
        return response

    for call in response.tool_calls:
        result = execute_tool(call.name, call.arguments)
        messages.append(tool_result(call.id, result))

    if step_count > max_steps:
        return "I could not finish in the allowed number of steps."

Three or four tool calls is a typical task. Twenty is a warning sign. Past a hundred you are usually paying for confusion, not progress.

The pieces that actually matter

The tools. Each tool has a name, a one-sentence description, a JSON schema for arguments, and a strict return shape. The model picks tools based mostly on the descriptions. Vague descriptions produce vague calling. Good descriptions read like the first line of a man page: what it does, what it takes, what it returns. Give the model the smallest set of tools that covers the task.

The system prompt. The role, the scope, the refusal behavior, the output format. Anthropic, OpenAI, and Google all weight system messages higher than user messages, which is exactly what you want for the parts that should not drift across the loop.

The context budget. The conversation grows by every tool call and result. By turn 10 of a chatty loop you can be carrying tens of thousands of tokens of intermediate observations. Trim aggressively: summarize old observations, drop completed scratchpad sections, keep only what the next decision actually needs.

The cost ceiling. Always cap max steps. Always cap dollars per task. Always cap risky tool usage (writes, network calls, anything that costs money or sends messages). An agent without caps is one bad loop away from a six-figure mistake.

When to graduate from the basic loop

Three patterns earn their complexity:

Planner + executor. A more capable model produces a structured plan; a cheaper model executes each step in its own ReAct loop. Pays off when tasks have many steps or when planning and execution benefit from different model sizes.

Critic / self-review. A separate model call inspects partial output, catches hallucinations or unsafe actions, and either flags them or asks the planner to revise. Pays off when wrong answers are expensive and detection is cheaper than prevention.

Multi-agent split. Several agents with different roles communicate via a shared scratchpad or a message bus. Pays off rarely. Most "multi-agent" architectures we have seen would be better as one well-scoped agent with smarter tools.

Build the simple loop first. Add a planner when you can name the specific failure that demands one. Add a critic when you can show a wrong answer that the critic would have caught. Resist multi-agent until you have a real, documented reason.

Where to read the primary docs

The shape across the three vendors is now genuinely standard. If you can build the loop with one API, the others are minor adaptations.

Forthcoming

  • Tool Definitions That Actually Work
  • Agent Evals Explained
  • Multi Agent Vs Single Agent

Where to go next

A short editorial reading list. Pick whichever fits how you like to learn.

  • NerdSip: 5-minute AI micro-course on almost any topic, on iOS and Android