A prompt chain is a tiny pipeline. Step one takes the input, produces output A. Step two takes output A (and maybe the original input), produces output B. Repeat as needed. The whole thing looks like a function composed of model calls.
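In code, the chain is literally function composition; a minimal sketch, assuming a call_model helper that sends one prompt and returns the model's text (both prompts here are placeholders):

def step_one(user_input):
    return call_model(f"Do the first task on:\n{user_input}")     # produces output A

def step_two(output_a):
    return call_model(f"Do the second task, given:\n{output_a}")  # produces output B

output_b = step_two(step_one(user_input))  # the whole chain, composed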
When chaining earns its weight
A chain pays off in three cases:
- The steps need different context. Step one is "summarize this 80-page report." Step two is "based on that summary, recommend the three biggest decisions for a CFO." The second prompt does not want the full report; it wants the summary. Stuffing both into one prompt makes the model average across them.
- An intermediate result is useful. If you are extracting structured data, the parsed JSON might be used by other systems even when no further LLM call happens. Chaining gives you a clean handoff point.
- One step is cheap and fragile; the next is expensive and unforgiving. Validate cheaply first ("is this query in scope?") and only call the expensive model after, as in the sketch below this list.
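A minimal sketch of that third case, assuming two hypothetical helpers: call_small_model for the cheap gate and call_large_model for the expensive answer:

def answer(query):
    # Cheap gate: a small, fast model decides whether the query is in scope.
    verdict = call_small_model(
        f"Is this question about our product? Answer yes or no.\n{query}"
    )
    if "yes" not in verdict.lower():
        return "Sorry, that is outside what I can help with."
    # Only now pay for the expensive, unforgiving step.
    return call_large_model(f"Answer this product question:\n{query}")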
When chaining is overhead
A chain is unnecessary when the task fits in one prompt without forcing the model to context-switch. Rewriting a paragraph, answering a single question, drafting an email — these do not need pipelines. Chaining them just adds latency, doubles the cost, and creates more places to fail.
A useful test: would you, as a human doing this task, want to write a draft, hand it to a colleague, and have them do the next bit? If yes, chain. If you would just do it in your head, one prompt is enough.
A working example
A common shape: extract entities, then enrich.
import json
from concurrent.futures import ThreadPoolExecutor

# Step 1: extract. call_model is assumed to return the model's raw text,
# which here should be the JSON the prompt asks for.
prompt_1 = """
Extract every company mentioned in the email below.
Return JSON: { "companies": ["..."] }
Email:
""" + email
companies = json.loads(call_model(prompt_1))["companies"]

# Step 2: enrich each company. The calls are independent, so run them in parallel.
def enrich(company):
    prompt_2 = f"""
Write a one-sentence summary of {company}, focused on what they do.
Return JSON: {{ "name": "...", "summary": "..." }}
"""
    return json.loads(call_model(prompt_2))

with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(enrich, companies))
Two prompts, each narrow. Step one cares only about extraction. Step two cares only about a one-sentence summary of one company. Each step has a tight evaluation: "did it pull all the companies?" and "is the summary accurate and one sentence?" If you tried both in one prompt, you would get sloppier output and harder failure attribution.
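That tight evaluation can be a plain unit test; a minimal sketch, where SAMPLE_EMAIL is a hypothetical fixture whose companies are known in advance:

import json

# Hypothetical fixture: an email that mentions exactly two companies.
SAMPLE_EMAIL = "Quick update: Acme signed the contract, and Globex wants a demo."

def test_extraction():
    prompt = """
Extract every company mentioned in the email below.
Return JSON: { "companies": ["..."] }
Email:
""" + SAMPLE_EMAIL
    out = json.loads(call_model(prompt))
    assert set(out["companies"]) == {"Acme", "Globex"}  # did it pull all the companies?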
What chains don't fix
Chains are a structural pattern, not a knowledge fix. If the model is wrong about the underlying facts, no amount of chaining helps. Pair chains with retrieval (RAG) when the wrongness is about information, not about thinking.
The handy rule: chain when each step has a different job, not when each step has a different paragraph.