How do I know if my task is single-prompt vs agent shaped?

Ask: can a competent human do this in one read-and-respond pass with no external lookups? If yes, single prompt. Does it need 2–4 deterministic steps that always happen in the same order? Prompt chain. Does the model need to choose what to do next based on a previous tool result? That is where agents start to earn their cost.

Is RAG always cheaper than an agent?

For document-shaped tasks (question answering over a corpus, summarization over a knowledge base), almost always yes. RAG retrieves, the model answers, you are done. Agents add cost only when retrieval alone cannot decide what to do next. If your task is "find the answer in these docs," start with RAG.

What is the most common agent failure I should worry about?

Cost runaway, by a wide margin. An agent without strict max-step and max-cost caps will, sooner or later, find a loop and burn money in it. The runner-up is tool-selection error: the model picks the wrong tool or invokes the right tool with wrong arguments, often after a confidently stated reasoning step. Caps limit the damage from both, although they do not stop the model from picking the wrong tool in the first place.

When Not to Use an Agent

Agents are useful. Agents are also expensive, slow, and harder to reason about than the simpler alternatives. Most "we should build an agent for this" conversations would land somewhere better if the team paused to ask whether a single prompt, a prompt chain, or a RAG setup would do the same job for an order of magnitude less effort.

When a simpler shape wins

Single prompt. If a task can be done in one call with structured output, do it in one call. Examples: extract action items from a transcript, classify a ticket, rewrite a paragraph, translate a sentence. A reasoning-mode model handles "think before answering" internally; you do not need an external loop to coordinate it.
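A minimal sketch of the one-call shape, using ticket classification as the example. The label set, the prompt wording, and the `call_model` callable are all assumptions made for illustration; `call_model` stands in for whatever client you actually use and is not a real library API.

```python
import json
from typing import Callable

# Hypothetical label set; replace with your own categories.
ALLOWED_LABELS = {"billing", "bug", "feature_request", "other"}


def classify_ticket(ticket_text: str, call_model: Callable[[str], str]) -> dict:
    """One call, structured output, validated before anything downstream sees it."""
    prompt = (
        "Classify the support ticket. Reply with JSON only, shaped like "
        '{"label": "...", "reason": "..."}. '
        "Allowed labels: " + ", ".join(sorted(ALLOWED_LABELS)) + ".\n\n"
        "Ticket:\n" + ticket_text
    )
    raw = call_model(prompt)
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or data.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected model output: {raw!r}")
    return data


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return '{"label": "billing", "reason": "Mentions a double charge."}'


result = classify_ticket("I was charged twice this month.", fake_model)
```

There is no loop here: the call either returns a valid label or raises, and both outcomes are easy to test.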

Prompt chain. If a task has 2–4 deterministic steps that always happen in the same order ("first extract, then enrich, then summarize"), a prompt chain is more reliable than an agent. Each step is its own narrow prompt with a clear evaluation. Failures attribute cleanly to the step that failed.
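A prompt chain can be sketched as a fixed list of steps run in order. The step names, the templates, and the `call_model` callable below are hypothetical; the point is only that the order lives in code and every failure carries the name of the step that produced it.

```python
from typing import Callable, List, Tuple

# (step name, prompt template with an {input} slot)
Step = Tuple[str, str]


def run_chain(text: str, steps: List[Step], call_model: Callable[[str], str]) -> str:
    """Run the steps in a fixed order, feeding each output into the next step."""
    current = text
    for name, template in steps:
        output = call_model(template.format(input=current))
        if not output.strip():
            # The failure names the step, so it attributes cleanly.
            raise RuntimeError(f"step {name!r} returned empty output")
        current = output
    return current


STEPS: List[Step] = [
    ("extract", "Extract the key facts from:\n{input}"),
    ("enrich", "Add missing context to these facts:\n{input}"),
    ("summarize", "Summarize in two sentences:\n{input}"),
]
```

Because each step is a narrow prompt, you can evaluate the steps separately, for example by running only the extract step against a labeled set.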

Retrieval-augmented generation. If a task is "answer questions about this knowledge base," RAG plus a reasoning model beats most agents. Retrieve the relevant chunks, send them with the question, generate the answer. No tool-selection loop, no planner, no orchestrator. The model never has to decide "should I call the search tool?" because you always do.
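The retrieve-then-answer shape can be sketched in a few lines. This version ranks chunks by plain word overlap, which is a deliberately crude stand-in for whatever retriever you would really use (embeddings, BM25, a search index). The function names and the `call_model` callable are assumptions for illustration.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> set:
    """Lowercased alphanumeric words; a toy tokenizer for the sketch."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by word overlap with the question and keep the top k."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def answer(question: str, chunks: List[str],
           call_model: Callable[[str], str], k: int = 2) -> str:
    """Retrieval always runs; the model never decides whether to search."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)
```

The control flow is fixed: one retrieval, one generation. Swapping in a better retriever changes `retrieve` and nothing else.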

Workflow graph. If a task has branching but the branching is deterministic ("if this is a refund request, go down path A; otherwise, path B"), a workflow graph beats an agent. Use the model for the classification call; use code for the routing.
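A sketch of that split, with the model doing only the classification and a plain dictionary doing the routing. The handler names, the label strings, and the `call_model` callable are hypothetical.

```python
from typing import Callable, Dict


def handle_refund(message: str) -> str:
    return "path A: refund workflow"


def handle_other(message: str) -> str:
    return "path B: general workflow"


# Routing is ordinary code, so it is deterministic and unit-testable.
ROUTES: Dict[str, Callable[[str], str]] = {"refund": handle_refund}


def route(message: str, call_model: Callable[[str], str]) -> str:
    """Use the model for one classification call; use code for the branch."""
    label = call_model(
        "Is this message a refund request? Answer 'refund' or 'other'.\n\n" + message
    ).strip().lower()
    handler = ROUTES.get(label, handle_other)  # unknown labels fall through to path B
    return handler(message)
```

If the classifier returns something unexpected, the fallback keeps the task on a known path instead of letting the model improvise one.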

In each of those cases, an agent will at best produce the same output as the simpler shape, but at higher cost, higher latency, more places to fail, and less observability.

When an agent actually earns it

A real agent shape pays off when:

The task has many steps and the order is not predictable in advance.
Each step's choice of tool depends on previous steps' results.
The task may take 2–20 tool calls and you cannot tell which up front.
The model genuinely needs to plan, reflect, and adapt — not just execute a recipe.

Examples that fit: debugging a system from a vague bug report, writing and refining code while running tests, doing open-ended research over the web, automating a multi-step internal workflow with branching.

If you cannot describe what makes your task look like that, you probably do not have an agent task.

The failure modes you will hit

Three failures show up repeatedly with production agents:

Loop-stuck behavior. The model keeps calling the same tool with slightly different arguments, never converging. Usually a sign that the tool result is missing the information the model needs, or that the model's understanding of the task is wrong. Cap max steps. Log every step. When stuck, escalate to a human rather than running for another ten turns.
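The step cap, the per-step log, and the escalation can be sketched as a small loop wrapper. Everything named here (`run_capped_loop`, `NeedsHuman`, the action dictionary shape, the default limits) is an assumption for illustration, not any framework's API. Exact-duplicate detection will not catch calls whose arguments drift slightly, so the step cap remains the backstop.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class NeedsHuman(Exception):
    """Raised when the loop should stop and hand the task to a person."""


def run_capped_loop(
    choose_action: Callable[[List[dict]], Optional[dict]],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 8,
    max_repeats: int = 2,
) -> List[dict]:
    """Run tool calls until the policy returns None, a cap trips, or a call repeats."""
    history: List[dict] = []
    call_counts: Dict[str, int] = {}
    for step in range(1, max_steps + 1):
        action = choose_action(history)  # expected shape: {"tool": str, "args": dict}
        if action is None:  # the policy signals it is done
            return history
        key = json.dumps([action["tool"], action["args"]], sort_keys=True)
        call_counts[key] = call_counts.get(key, 0) + 1
        if call_counts[key] > max_repeats:
            raise NeedsHuman(f"step {step}: repeated call {key}")
        result = tools[action["tool"]](**action["args"])
        history.append({"step": step, **action, "result": result})
        print(f"step={step} tool={action['tool']} args={action['args']}")  # log every step
    raise NeedsHuman(f"no result after {max_steps} steps")
```

Catching `NeedsHuman` at the call site is where you would open a ticket or page someone, with the logged history attached.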

Tool-selection errors. The model picks the wrong tool, or the right tool with wrong arguments, often after a confidently-stated reasoning step. Mitigations: give fewer tools at a time, write sharper tool descriptions, and (where possible) constrain the tool list per task type via a router rather than exposing every tool every turn.
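Constraining the tool list per task type can be as simple as a lookup table applied before each run. The task types and tool names below are invented for the example; only the pattern matters.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from task type to the only tools that task may see.
TOOLS_BY_TASK: Dict[str, List[str]] = {
    "billing": ["lookup_invoice", "issue_refund"],
    "debugging": ["read_logs", "run_tests"],
}


def tools_for(task_type: str, registry: Dict[str, Callable]) -> Dict[str, Callable]:
    """Return only the tools allowed for this task type; unknown types get none."""
    allowed = TOOLS_BY_TASK.get(task_type, [])
    return {name: registry[name] for name in allowed if name in registry}
```

A model that is shown two tools has far fewer ways to pick the wrong one than a model that is shown twenty.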

Cost explosions. The conversation grows with every tool call and result, and the model rereads the whole history on every turn, so input tokens accumulate much faster than the step count suggests. A 30-step loop on a frontier model can quietly hit four-figure bills. Always set a max-spend cap. Always trim observation history. Always alert when a single task exceeds expected cost.
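To see why rereading matters, here is a rough sketch of the arithmetic plus a spend cap. The token counts and the price passed to `charge` are made-up inputs, and the model is simplified: each step is assumed to add a fixed number of tokens to the history.

```python
from typing import Optional


def cumulative_input_tokens(base_tokens: int, tokens_per_step: int, steps: int,
                            keep_last: Optional[int] = None) -> int:
    """Total input tokens read across a loop that resends its history every turn.

    Turn i rereads the base prompt plus i earlier call/result pairs, or at most
    keep_last of them when the observation history is trimmed.
    """
    total = 0
    for i in range(steps):
        kept = i if keep_last is None else min(i, keep_last)
        total += base_tokens + kept * tokens_per_step
    return total


class BudgetExceeded(Exception):
    """Raised when a task spends more than its cap."""


class SpendCap:
    def __init__(self, max_usd: float) -> None:
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_million_tokens: float) -> None:
        """Record the cost of one call and stop the task once the cap is passed."""
        self.spent_usd += tokens / 1_000_000 * usd_per_million_tokens
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}")


untrimmed = cumulative_input_tokens(2_000, 1_500, 30)             # 712_500
trimmed = cumulative_input_tokens(2_000, 1_500, 30, keep_last=5)  # 262_500
```

With these assumed numbers, 30 steps read 712,500 input tokens untrimmed and 262,500 when only the last five observations are kept. The untrimmed total grows roughly with the square of the step count, which is why trimming and a hard cap belong together.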

A working test

Before you build an agent, try this:

1. Write the simplest one-prompt version of the task. Run it on 10–20 representative inputs.
2. If it fails on a noticeable fraction, write the simplest two- or three-step chain version. Re-test.
3. If it still fails, add a retrieval step. Re-test.
4. Only if all three of the above fail do you have a problem that needs an agent.

Most projects discover at step 1 or 2 that they did not need an agent at all.
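The test above can be run as a small harness that tries each shape in order and stops at the first one that passes often enough. The 0.9 threshold is an assumption standing in for "does not fail on a noticeable fraction", and `passes` is whatever check you trust for your task (exact match, a rubric, a human label).

```python
from typing import Callable, List, Optional, Tuple

Shape = Tuple[str, Callable[[str], str]]  # (shape name, function that runs it)


def first_passing_shape(
    shapes: List[Shape],
    inputs: List[str],
    passes: Callable[[str, str], bool],
    min_pass_rate: float = 0.9,
) -> Tuple[str, Optional[float]]:
    """Try the simplest shapes first; report the first one that clears the bar."""
    if not inputs:
        raise ValueError("need at least one representative input")
    for name, run in shapes:
        passed = sum(1 for item in inputs if passes(item, run(item)))
        rate = passed / len(inputs)
        if rate >= min_pass_rate:
            return name, rate
    # Every simpler shape failed: only now is an agent worth considering.
    return "agent", None
```

Pass the shapes in the order single prompt, chain, chain with retrieval, so the harness mirrors the steps of the test.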

The honest summary

Agents are a useful tool with a real overhead curve. Use them when the task genuinely earns the overhead. The minute you can replace the agent with a chain or a single prompt without losing quality, the simpler shape is the better engineering. The marketing pressure is in the other direction; resist it.
