A chat-shaped LLM product is two things at once: a place where the user can have a conversation, and a place where the user can edit, retry, and rewind that conversation. The first part is obvious. The second part is where products diverge, because the editing mechanics are also context-management decisions in disguise.

This article works through the three or four mechanics that every chat product needs, the design choices that shape how they feel, and the trade-offs in cost and clarity that each one carries.

The mental model: conversation as editable text, not as a log

The wrong frame is "conversation = irreversible log of turns." That is how chat apps for humans work (Slack, iMessage), and it leaks into LLM products by default because the codebase looks similar. In an LLM product, every turn was generated by a probabilistic process, and the user is going to want to redo, edit, or fork some of them. Designing as if the log is irreversible forces users to start over for every problem.

The right frame is "conversation = editable script that the model is helping write." The user can rewind, rewrite, and replay. The history is mutable. The mechanics below are the affordances that make that editability concrete.

Mechanic 1: regenerate the last response

The cheapest and most universal mechanic. The user clicks "regenerate" (or the equivalent affordance) and the model produces a new response to the same input. Everything before the regeneration point is preserved.

Design choices to make:

  • Show or hide the previous response? ChatGPT replaces it with the new one; Claude.ai puts both behind a "1 of 2" pager so the user can flip between them. The pager is friendlier — losing a response you liked while regenerating is a frustration nobody needs.
  • Allow a different model on regenerate? "Regenerate with [larger model]" or "Regenerate with [faster model]" is a power-user move that pays back in feedback density. ChatGPT exposes this; Claude.ai has historically not.
  • Preserve cache? The conversation prefix is unchanged on regenerate, so prompt caching still works. Only the final-turn output differs. This is the cheapest editing mechanic for that reason.

The single most common product-quality complaint about chat LLMs in 2024–2026 user research was "the answer was almost right and I had no way to nudge it." A regenerate button — plus the next two mechanics — is the answer.

Mechanic 2: edit and resubmit the user message

The user realizes their last message was unclear, too long, missing context, or just wrong. They want to fix it without starting a new chat. Edit-and-resubmit does this: the user message becomes editable in place, the user saves the edit, the model re-responds, and the original (now-stale) response is discarded.

Design choices:

  • Should the discarded response stay accessible? Most products discard it; some keep it behind an "earlier version" toggle. The toggle is rarely used in practice and clutters the UI; discarding is fine if the action is undoable.
  • Should the edit be confirmed? A confirmation dialog ("This will discard the current response, continue?") is annoying on every edit and useful exactly once. Skip it; rely on undo instead.
  • What about edits to user messages that are not the last one? ChatGPT and Claude both support this. Editing a message three turns back invalidates everything after it. This is the case where users most need a clear visual signal that "everything after this point will be regenerated" — make the discarded turns visibly fade or disappear during the confirmation.
  • Cache implications. Editing a message invalidates the cache for everything from that message onward. For long conversations with expensive caching, an edit can be a meaningful re-cost. Worth being aware of, not worth surfacing to the user.

Mechanic 3: branching (forking from a turn)

Instead of replacing or discarding the previous response, the user creates a parallel thread from a chosen turn. The original thread continues to exist; the new one diverges and explores an alternative.

Where branching shines:

  • Coding agents. "Try this fix in branch A; try this other fix in branch B; compare the results." Cursor's agent mode and Aider both support this for exactly this reason.
  • Research tools. "What if I asked this slightly differently? What if I asked for a different format?"
  • Creative writing. "Generate three openings; keep the one I like; discard the rest."

Where branching is overkill:

  • General-purpose chat. The mental model is "one conversation"; branches turn it into a tree, and most users do not want to manage a tree.
  • Customer support and similar single-thread workflows. A linear history is the right metaphor.

Design choices when you ship it:

  • Make the branch visible. A small tree visualization or a tab strip showing "Thread A / Thread B" is the minimum. Hidden branches confuse users.
  • Default branches to the same model and parameters as the parent, with an explicit affordance to change.
  • Cache benefits. Branches share the cached prefix up to the branch point. Forking is cheaper than starting fresh for that reason.
  • Cleanup. Users accumulate branches they no longer care about; provide a way to merge a branch back to main or to discard branches without ceremony.

Mechanic 4: undo

The escape hatch. The user did something — edited a message, regenerated, branched, deleted — and wants the previous state back. The most common reason: a regenerate came out worse than the original.

Most products skip this. They should not. The cost is small (one extra state to remember) and the UX win is real, because it lets users experiment without anxiety. Without undo, every regenerate is a small risk — the new response might be worse and the good one is gone.

Implementation pattern: keep the immediate previous state of the conversation in memory, expose it as "undo" via a button or keyboard shortcut. Do not try to ship a full multi-step undo stack — it is more product surface than the value justifies. One step back is enough.

Cost-aware editing

Each mechanic has a different cost profile:

  • Regenerate: Cache stays valid for the conversation prefix. You pay for one new last-turn response. Cheapest.
  • Edit and resubmit (last message): Cache stays valid up to the prior model response, invalidates from the edited user message. You pay for one new last-turn response.
  • Edit a turn in the middle: Cache invalidates from the edited turn onward. You pay for re-generating every turn after the edit, if your product chooses to do so (some do; some just discard them and let the user continue manually).
  • Branch: Cache valid up to the branch point. Each branch pays separately from there.

For high-volume products with caching enabled, this is worth a back-of-envelope cost model. The expensive operation is editing far back in a long conversation; the cheap one is regenerate. Designing the affordances so the cheap operation feels like the default helps both the bill and the user's sense of how the product behaves.

Conversation hygiene: when to summarize, when to clear

Even with good editing mechanics, conversations get long. Context costs grow, latency grows, the model's attention drifts. Two affordances worth offering:

  • "Start a new conversation" — clean break. Should be visible enough that users find it without hunting; not so prominent that they hit it by accident.
  • "Summarize and continue" — the model writes a paragraph summarizing the conversation so far; the conversation continues with that summary in place of the verbatim history. Useful when the user wants to keep going but the context is getting expensive or the model is losing the thread.

The latter is more work to ship well (you have to design how the summary appears in the UI, what the user can edit, what gets preserved) and is worth doing once you have a class of users who hit context limits in normal use.

What ChatGPT, Claude, and Cursor each get right

A quick comparison of how three well-shipped products handle these mechanics:

  • ChatGPT ships regenerate-with-model-choice and edit-message well. Branching is missing for most contexts; the workaround is "new chat from this message."
  • Claude.ai ships regenerate as a pager (1 of N) which preserves prior attempts, and supports message editing. Branching also missing in the main chat surface.
  • Cursor ships branching as a first-class feature in agent mode because the use case demands it. Regenerate is implicit in the diff-accept flow.

There is no single right answer; the mechanics map to the product's use case. The mistake is shipping a chat product with none of these and hoping users will "start a new conversation" every time they get an imperfect response. They will not. They will just stop using it.