Every shipped LLM product hits the same set of failure states, and the difference between a polished product and a frustrating one is largely whether those states were designed for or not. A generic "Something went wrong" with a refresh button is the worst possible answer to almost any LLM failure. It abandons the user at the moment they need the most help.
This article is the catalog of failure states worth designing for, and the copy and recovery patterns that work for each.
The eight failure shapes
The error space for an LLM call breaks roughly into eight named shapes. Each one comes from a different layer (provider, network, model behavior, your own constraints), and each one has a different recovery path.
1. Refusal. The model declined to answer. Could be a safety refusal, a capability refusal ("I cannot do that"), or a "this is outside my knowledge" hedge. The right state: name what the model would not do, suggest a refined question, and (for genuine capability gaps) offer to escalate to a more capable model if your product supports it.
2. Content filter. The provider's safety system blocked the request or the response. Distinct from refusal because the user did not get a model-generated answer at all. Copy needs to be careful — say what triggered the filter without being preachy, and let the user rephrase. Avoid the "your content violated our policies" tone; users find it patronizing and often the filter was over-triggered.
3. Rate limit. The provider returned 429, or your own application-layer limiter rejected the call. In the first case you (the app) are over your quota; in the second, the individual user is over theirs. Recovery is "try again in N seconds", and if you know N, say it. If the rate limit is per-user (you implement it), give the user a concrete window or upgrade path. If it is per-app, an honest "we are at capacity, try again in a minute" beats blaming the user.
4. Timeout. The model took too long. Often the underlying provider is overloaded and responses are crawling; sometimes the call is just legitimately long. Recovery: offer to retry, optionally with a faster model. Do not auto-retry silently with the same model — you will eat cost and the user will not know what is happening.
5. Context overflow. The conversation got long enough to exceed the model's context window. Recovery: offer to start a new conversation, or to summarize the current one. This is the one place a "summarize and continue" feature pays off, and the lack of one is a common reason users hit a dead end.
6. Model overloaded. Provider-specific (Anthropic's overloaded_error, returned with HTTP 529; OpenAI's 503 service_unavailable). The provider is healthy enough to receive the request but cannot fulfill it right now. Recovery: retry with backoff is fine for backend jobs; for user-facing calls, surface "the model is busy, try again in a minute or switch to [smaller model]" and let the user decide.
7. Network drop. The connection dropped mid-stream. Common on flaky mobile networks. Recovery: keep the partial response visible (it is genuinely useful), explain that the response was interrupted, offer to continue or retry.
8. Partial response with completion failure. The model started streaming and then errored mid-response (rare but real, and usually an upstream provider issue). Same pattern as the network drop: preserve the partial, name the failure, offer a continuation path.
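The "preserve the partial" recovery shared by shapes 7 and 8 can be sketched as follows. This is a minimal sketch, assuming the response arrives as an AsyncIterable of text chunks; collectStream and StreamResult are hypothetical names for illustration, not part of any provider SDK.

```typescript
// Outcome of consuming a stream: it either finished cleanly, or it was
// interrupted and we still hold whatever text had already arrived.
type StreamResult =
  | { status: "complete"; text: string }
  | { status: "interrupted"; partial: string; cause: unknown };

// Consume text chunks from any async iterable (a hypothetical stand-in for
// a provider SDK stream). onChunk lets the UI render text as it arrives.
async function collectStream(
  chunks: AsyncIterable<string>,
  onChunk: (textSoFar: string) => void = () => {},
): Promise<StreamResult> {
  let text = "";
  try {
    for await (const chunk of chunks) {
      text += chunk;
      onChunk(text);
    }
    return { status: "complete", text };
  } catch (cause) {
    // Do not discard what the user has already seen. Hand it back so the
    // UI can keep it visible and offer "continue" or "retry".
    return { status: "interrupted", partial: text, cause };
  }
}
```

The design choice is that an interruption is a return value rather than a thrown exception, so the caller cannot forget to handle the partial text.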
The mistake: generic errors
The single biggest mistake in LLM error UX is collapsing all eight failure shapes into one generic "Something went wrong, please try again" state. It is the default that Next.js, Remix, and most SDK error boundaries produce; it is also useless for the user because:
- It does not name the cause, so the user cannot tell if it is their fault or the system's.
- It does not offer a specific recovery, so the user can only refresh and hope.
- It is identical across all eight states, which trains users to assume "this product breaks a lot" rather than "I hit a known edge case."
Replacing the generic state with eight named states is a one-week project that meaningfully changes how users perceive product reliability. It is not a refactor — it is a copy and routing change.
What good copy looks like
Three principles for error and refusal copy, with examples.
Name what happened. Avoid "Something went wrong." Prefer "This conversation is too long" or "The model is at capacity" or "We could not reach the model."
Offer a concrete next step. Avoid "Please try again." Prefer "Start a new conversation," "Try again in a minute," "Switch to the faster model," "Continue from where the response stopped."
Match tone to severity. A rate limit on a free tier is not a catastrophe, so keep it light. A content-filter trip on a healthcare app needs more care: be specific about what triggered it and what to rephrase, without lecturing.
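One way to keep the first two principles honest is to put the copy in a single function that must return both a named cause and a concrete action for every failure kind. The sketch below is illustrative only: FailureKind, ErrorCopy, and errorCopy are assumed names, and the strings are examples rather than recommended final copy.

```typescript
// Failure kinds used for illustration; the names are assumptions.
type FailureKind =
  | "rate_limited"
  | "timeout"
  | "context_overflow"
  | "overloaded"
  | "network_drop";

interface ErrorCopy {
  title: string;  // names what happened
  action: string; // label for the primary recovery button
}

function errorCopy(kind: FailureKind, retryAfterMs?: number): ErrorCopy {
  switch (kind) {
    case "rate_limited": {
      const title = "Requests are rate-limited right now.";
      // If the wait is known, say it; otherwise fall back to a rough window.
      if (retryAfterMs === undefined) {
        return { title, action: "Try again in a minute" };
      }
      const seconds = Math.ceil(retryAfterMs / 1000);
      const unit = seconds === 1 ? "second" : "seconds";
      return { title, action: `Try again in ${seconds} ${unit}` };
    }
    case "timeout":
      return { title: "The model took too long to respond.", action: "Switch to the faster model" };
    case "context_overflow":
      return { title: "This conversation is too long.", action: "Start a new conversation" };
    case "overloaded":
      return { title: "The model is at capacity.", action: "Try again in a minute" };
    case "network_drop":
      return { title: "We could not reach the model.", action: "Continue from where the response stopped" };
  }
}
```

Because the switch is exhaustive over FailureKind, adding a new kind without writing copy for it becomes a compile error under strict settings, which is the point of the exercise.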
Worth studying: Claude.ai's overloaded message ("Claude's servers are overloaded right now, please try again shortly"). It names the cause, sets expectations, and does not blame the user. The same screen on a less-designed product would say "Error 503."
Refusal copy specifically
Refusals are the trickiest because they are partly intentional (the model is supposed to refuse certain things) and partly over-triggered (the model refuses things it should answer). Three patterns that work:
Show what the model said. The model's own refusal text is often the most useful explanation — "I cannot help with X because Y." Surface it rather than replacing it with a generic message.
Make the rephrase path obvious. "Did you mean to ask about [reformulated version]?" works when you can suggest a less-trigger-y version. Failing that, a "rephrase your question" button is better than a dead end.
Distinguish refusal from filter. A refusal is the model declining; a filter is the safety layer blocking. The copy and the recovery differ — a refusal can sometimes be rephrased into compliance, a filter trip usually cannot. Mixing them confuses users.
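The refusal-versus-filter distinction can be sketched as a small classifier. This assumes you have already normalized the provider response into the hypothetical NormalizedResponse shape below; the field names are assumptions for illustration and do not correspond to any SDK's real response object.

```typescript
// Hypothetical normalized view of a provider response. How you populate
// these flags depends on your provider; that mapping is not shown here.
interface NormalizedResponse {
  text: string;                  // model-generated text, possibly empty
  blockedBySafetyLayer: boolean; // the provider's filter intervened
  modelDeclined: boolean;        // the model itself refused
}

type Outcome =
  | { kind: "answered"; text: string }
  | { kind: "refused"; modelText: string } // surface the model's own words
  | { kind: "filtered" };                  // no model-generated text to show

function classify(res: NormalizedResponse): Outcome {
  // Check the filter first: a filtered response carries no model answer,
  // so any text present should not be presented as the model's refusal.
  if (res.blockedBySafetyLayer) return { kind: "filtered" };
  if (res.modelDeclined) return { kind: "refused", modelText: res.text };
  return { kind: "answered", text: res.text };
}
```

Keeping the two outcomes as separate variants means the UI cannot accidentally render a filter trip with refusal copy, or the reverse.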
Implementation: the error adapter pattern
The pattern that scales across providers and stays maintainable: a small server-side adapter that translates raw provider errors into a stable enum of UI states.
type LlmUiError =
  | { kind: "refused"; modelText: string }
  | { kind: "filtered"; trigger?: string }
  | { kind: "rate_limited"; retryAfterMs?: number }
  | { kind: "timeout" }
  | { kind: "context_overflow"; suggestedAction: "summarize" | "new_chat" }
  | { kind: "overloaded"; fallbackModel?: string }
  | { kind: "network_drop"; partial?: string }
  | { kind: "unknown"; raw: unknown };
The adapter maps OpenAI 429s, Anthropic overloaded_errors, generic network errors, and your own application errors into this single shape. The front end switches on kind to render the appropriate state. Diagnostics (the raw error) ride along but are hidden by default.
This pattern survives provider changes (you only update the adapter), survives adding new providers (each one gets its own adapter into the same enum), and gives you one place to audit how each failure shape is presented to the user.
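A minimal sketch of one such adapter follows. It assumes the raw error has already been reduced to the hypothetical RawProviderError shape; that interface, the toLlmUiError name, and the "context_length_exceeded" type string are illustrative assumptions, so confirm the actual status codes and type strings against your provider's documentation.

```typescript
// Repeated from the type above so this sketch compiles on its own.
type LlmUiError =
  | { kind: "refused"; modelText: string }
  | { kind: "filtered"; trigger?: string }
  | { kind: "rate_limited"; retryAfterMs?: number }
  | { kind: "timeout" }
  | { kind: "context_overflow"; suggestedAction: "summarize" | "new_chat" }
  | { kind: "overloaded"; fallbackModel?: string }
  | { kind: "network_drop"; partial?: string }
  | { kind: "unknown"; raw: unknown };

// Hypothetical reduced view of a provider error (an assumption).
interface RawProviderError {
  status?: number;            // HTTP status, if a response was received
  type?: string;              // provider error type string, if any
  retryAfterSeconds?: number; // parsed from a Retry-After header, if present
  timedOut?: boolean;         // set by your own client-side timeout
}

function toLlmUiError(err: RawProviderError): LlmUiError {
  if (err.timedOut) return { kind: "timeout" };
  if (err.status === 429) {
    return err.retryAfterSeconds === undefined
      ? { kind: "rate_limited" }
      : { kind: "rate_limited", retryAfterMs: err.retryAfterSeconds * 1000 };
  }
  if (err.type === "overloaded_error" || err.status === 503 || err.status === 529) {
    return { kind: "overloaded" };
  }
  // Illustrative type string; the real identifier varies by provider.
  if (err.type === "context_length_exceeded") {
    return { kind: "context_overflow", suggestedAction: "summarize" };
  }
  // No status at all is treated here as "the request never completed".
  if (err.status === undefined) return { kind: "network_drop" };
  // Anything unrecognized keeps the raw error for diagnostics.
  return { kind: "unknown", raw: err };
}
```

Refusals and filter trips are absent on purpose: they usually arrive as successful responses rather than errors, so they would be mapped into the same enum by a separate response-side function.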
What not to do
Failure modes worth avoiding entirely:
- Silent auto-retry on user-facing calls. The user thinks the app is slow; you are spending double on tokens; if both attempts fail, you have a worse error state than if you had just shown the first failure.
- Retry-on-refusal. The model refused for a reason; re-running the same prompt rarely changes the answer and trains users that refusals are arbitrary.
- "Try a different model" without explaining why. Buttons that silently swap models behind the user's back break the user's mental model of what they are talking to.
- Catch-all toast notifications. Errors that disappear after 3 seconds, before the user has read them, with no recovery action. The toast pattern is wrong for any error that the user might need to act on.
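The retry rules in this list, combined with the earlier note that backoff is fine for backend jobs, can be written down as an explicit policy so that retries never happen by accident. This is a sketch under assumptions: retryDecision, CallSite, the set of kinds treated as transient, and the backoff constants are all illustrative choices, not a prescribed configuration.

```typescript
type Kind =
  | "refused" | "filtered" | "rate_limited" | "timeout"
  | "context_overflow" | "overloaded" | "network_drop" | "unknown";

type CallSite = "user_facing" | "background";

type RetryDecision =
  | { retry: false }
  | { retry: true; delayMs: number };

// Kinds that may succeed on a later attempt without changing the request.
const TRANSIENT: ReadonlySet<Kind> = new Set<Kind>([
  "rate_limited", "timeout", "overloaded", "network_drop",
]);

// attempt is the number of retries already made (0 for the first failure).
function retryDecision(
  kind: Kind,
  site: CallSite,
  attempt: number,
  maxAttempts = 3,
): RetryDecision {
  // User-facing calls never retry silently; the UI surfaces the failure.
  if (site === "user_facing") return { retry: false };
  // Refusals, filter trips, and overflows will not change on a re-run.
  if (!TRANSIENT.has(kind)) return { retry: false };
  if (attempt >= maxAttempts) return { retry: false };
  // Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
  const delayMs = Math.min(30000, 1000 * 2 ** attempt);
  return { retry: true, delayMs };
}
```

A production version would likely add jitter to the delay and honor a provider-supplied retry window when one is available.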
The eight-state catalog is not exotic; it is the realistic shape of failure in any production LLM app. Designing for it once is the difference between a product that feels reliable and one that feels broken.