Is "context engineering" really different, or just a rebrand of prompt engineering?

It is different. Prompt engineering optimizes the wording of one instruction. Context engineering decides which documents, which prior turns, which tool results, which system rules, in which order, at what total token cost. The first is sentence-level work; the second is application architecture. They overlap but they are not the same job.

The phrase has been popularized by practitioners over 2024–2025 — Drew Breunig, Andrej Karpathy, and others have used it in public talks and write-ups. Anthropic's own docs use adjacent terms like "input engineering" and "context management" to describe the same discipline. There is no single canonical paper; treat it as a community-converged term rather than an academic one.

Do I still need prompt engineering?

Yes, but for less than you used to. The four fundamentals (instruction, context, examples, output spec) still apply at the prompt level. What changed is that for non-trivial applications, most of the variance in output quality now comes from the context layer — what you put in the window — not from the wording. Wording matters; context matters more.

Prompt Engineering vs. Context Engineering

For the first few years of working with large language models, the job was prompt engineering: read the task, write the cleverest instruction, ship the model. As context windows grew from 4k to 200k to a million tokens, and as tool use and retrieval became standard, the bottleneck moved. Output quality stopped depending mostly on how you worded the instruction and started depending mostly on what else lived in the window alongside it.

That second job is what people now call context engineering.

The working definition

Prompt engineering is sentence-level. Context engineering is system-level.

Concretely, context engineering covers:

Selection — which retrieved chunks, which past conversation turns, which tool results, which examples actually make it into the window.
Ordering — where each piece goes. Instruction first or last? Retrieved chunks before or after the question? System prompt how long?
Compaction — when the history overflows, what gets summarized, what gets dropped, what gets kept verbatim.
Allocation — system prompt vs user prompt vs assistant history vs tool results. How many tokens does each get.
Memory — what persists across sessions, what is rebuilt each time, what gets indexed for retrieval later.

A prompt engineer asks "what should the instruction say?" A context engineer asks "what should the whole input look like by the time the model sees it?" If you are coming up to speed on either discipline, an app built to make you sharper is a low-commitment way to build the foundation before the deeper articles below.

Why the shift happened

Three things converged.

Context windows grew, but using them well stayed hard. Frontier models advertise 200k, 1M, even 10M tokens. Research like Liu et al.'s "Lost in the Middle" showed that the advertised window and the usable window are not the same number. Filling a long context is easy; making the model actually use the middle of it is not. Choosing what to include became more important than how to phrase the include.

Retrieval became standard. Once your application is reading from a knowledge base, the question stops being "what is the best phrasing?" and starts being "which five passages out of 50,000 should the model see?" That is a context problem.

Agents and tool use multiplied the input. Every tool call produces an observation that becomes part of the next turn's context. By turn 10 of a chatty loop, you can be carrying tens of thousands of tokens of intermediate state, and most of the model's output quality depends on what survived from earlier turns. That is a context problem too.

What a context-engineering mindset looks like

A few practical heuristics that fall out of the discipline:

Treat the context window as a budget, not a buffer. You have N tokens. Allocate them deliberately. Track where they go. When you overflow, decide what to cut rather than letting the API decide.
Position matters. Critical instructions at the top of the system message. Critical context immediately before the model's turn. The middle of a long context is the part you trust least.
Smaller is often better. Adding more retrieved chunks does not monotonically improve answers; past a few well-chosen passages, additional chunks usually hurt. Test before you stuff.
Compact aggressively in long sessions. Summaries beat verbatim history once you cross a few thousand tokens of chat. Drop tool observations that nothing downstream needs.
Make context construction observable. Log what went into the window. When the model is wrong, ninety percent of the time you can see why by reading the actual input it saw — not the input you thought it saw.

The rest of this cluster

Each linked article goes deeper on one piece:

How to Choose What Goes in the Context Window — the selection problem and the ordering rules.
Context Rot, Explained — what goes wrong when contexts get long.
Memory Systems for LLM Apps — short-term and long-term patterns.

Prompt engineering will not go away — the four levers from How to Prompt AI Effectively still matter. They just stopped being the whole job.

Comments 2

u/ctx_c3 · 1 month ago

the rename annoyed me at first but its a real distinction once your app has retrieval, tools and history all fighting over the same window
u/mia12 · 1 month ago

honestly context engineering is just prompt engineering once your app gets real. the line between them is blurry but i get why you split them out

Prompt Engineering vs. Context Engineering

Article summary

The working definition

Why the shift happened

What a context-engineering mindset looks like

The rest of this cluster

Frequently asked questions

See also

Where to go next

Comments 2