A prompt chain is a tiny pipeline. Step one takes the input, produces output A. Step two takes output A (and maybe the original input), produces output B. Repeat as needed. The whole thing looks like a function composed of model calls.
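In code, the chain is literally function composition; a minimal sketch, assuming a call_model helper that sends one prompt and returns the model's text (both prompts here are placeholders):

def step_one(user_input):
    return call_model(f"Do the first task on:\n{user_input}")     # produces output A

def step_two(output_a):
    return call_model(f"Do the second task, given:\n{output_a}")  # produces output B

output_b = step_two(step_one(user_input))  # the whole chain, composed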
When chaining earns its weight
A chain pays off in three cases:
- The steps need different context. Step one is "summarize this 80-page report." Step two is "based on that summary, recommend the three biggest decisions for a CFO." The second prompt does not want the full report; it wants the summary. Stuffing both into one prompt makes the model average across them.
- An intermediate result is useful. If you are extracting structured data, the parsed JSON might be used by other systems even when no further LLM call happens. Chaining gives you a clean handoff point.
- One step is cheap and fragile; the next is expensive and unforgiving. Validate cheaply first ("is this query in scope?") and only call the expensive model after, as in the sketch below this list.
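A minimal sketch of that third case, assuming two hypothetical helpers: call_small_model for the cheap gate and call_large_model for the expensive answer:

def answer(query):
    # Cheap gate: a small, fast model decides whether the query is in scope.
    verdict = call_small_model(
        f"Is this question about our product? Answer yes or no.\n{query}"
    )
    if "yes" not in verdict.lower():
        return "Sorry, that is outside what I can help with."
    # Only now pay for the expensive, unforgiving step.
    return call_large_model(f"Answer this product question:\n{query}")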
When chaining is overhead
A chain is unnecessary when the task fits in one prompt without forcing the model to context-switch. Rewriting a paragraph, answering a single question, drafting an email — these do not need pipelines. Chaining them just adds latency, doubles the cost, and creates more places to fail.
A useful test: would you, as a human doing this task, want to write a draft, hand it to a colleague, and have them do the next bit? If yes, chain. If you would just do it in your head, one prompt is enough.
A working example
A common shape: extract entities, then enrich.
import json
from concurrent.futures import ThreadPoolExecutor

# Step 1: extract. call_model is assumed to return the model's raw text,
# which here should be the JSON the prompt asks for.
prompt_1 = """
Extract every company mentioned in the email below.
Return JSON: { "companies": ["..."] }
Email:
""" + email
companies = json.loads(call_model(prompt_1))["companies"]

# Step 2: enrich each company. The calls are independent, so run them in parallel.
def enrich(company):
    prompt_2 = f"""
Write a one-sentence summary of {company}, focused on what they do.
Return JSON: {{ "name": "...", "summary": "..." }}
"""
    return json.loads(call_model(prompt_2))

with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(enrich, companies))
Two prompts, each narrow. Step one cares only about extraction. Step two cares only about a one-sentence summary of one company. Each step has a tight evaluation: "did it pull all the companies?" and "is the summary accurate and one sentence?" If you tried both in one prompt, you would get sloppier output and harder failure attribution.
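That tight evaluation can be a plain unit test; a minimal sketch, where SAMPLE_EMAIL is a hypothetical fixture whose companies are known in advance:

import json

# Hypothetical fixture: an email that mentions exactly two companies.
SAMPLE_EMAIL = "Quick update: Acme signed the contract, and Globex wants a demo."

def test_extraction():
    prompt = """
Extract every company mentioned in the email below.
Return JSON: { "companies": ["..."] }
Email:
""" + SAMPLE_EMAIL
    out = json.loads(call_model(prompt))
    assert set(out["companies"]) == {"Acme", "Globex"}  # did it pull all the companies?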
What chains don't fix
Chains are a structural pattern, not a knowledge fix. If the model is wrong about the underlying facts, no amount of chaining helps. Pair chains with retrieval (RAG) when the wrongness is about information, not about thinking.
The handy rule: chain when each step has a different job, not when each step has a different paragraph.