The Prompt Bench | The lab bench for prompting, local models, and what's coming next.

What to learn first

6 articles
Prompting Fundamentals

The mechanics every prompt rests on: system vs. user messages, context windows, zero- and few-shot patterns, templates, and how to give a model what it needs to answer well.
Take me to the Fundamentals hub
Selected articles
4 articles
Prompting Patterns

Reusable techniques like role prompting, tree-of-thoughts, ReAct, prompt chaining, self-consistency, and negative prompting — with the kind of example you can paste and modify.
Take me to the Patterns hub
Selected articles
4 articles
Prompting By Use Case

Long-form, specific guides for the actual things people prompt for: code, writing, summarization, data extraction, classification, and analysis.
Take me to the Use Cases hub
Selected articles
4 articles
Local Models

Running LLMs on your own hardware: how the stack works, which runtimes to pick, what quantization actually changes, and which open-weight models are genuinely usable right now.
Take me to the Local hub
Selected articles
4 articles
Model Benchmarks

Honest head-to-heads between frontier and open-weight models. We disclose the prompts, the temperature, the seed, and the limits — every comparison is timestamped.
Take me to the Bench hub
Selected articles
4 articles
Release Radar

A dated, sourced tracker of new and rumored model releases. Every claim is tagged Confirmed, Strong signal, or Speculation, with a link back to the primary source.
Take me to the Radar hub
Selected articles
4 articles
Agents & Tool Use

Building LLM agents that actually do useful work: the agent loop, tool calling across major APIs, the Model Context Protocol, and the failure modes that make agents the wrong shape for many problems.
Take me to the Agents hub
Selected articles
4 articles
Context Engineering

The discipline that replaced clever prompts once context windows got large: deciding what information goes into the model's input, in what order, how much, and what to leave out. Includes the failure modes of long contexts and the memory patterns production systems use.
Take me to the Context hub
Selected articles
4 articles
Evals & Testing

How working teams actually evaluate LLM applications: building golden sets, designing rubrics, using LLM-as-judge without inheriting its biases, regression-testing prompts in CI, and which of the dozen eval frameworks is worth picking up.
Take me to the Evals hub
Selected articles
4 articles
RAG & Retrieval

Retrieval-augmented generation, end to end: the pipeline that actually fires in production, how to chunk documents without breaking meaning, when to use vector vs. keyword vs. hybrid retrieval, and when RAG beats long-context (and when it does not).
Take me to the RAG hub
Selected articles
4 articles
AI Coding Tools & Workflow

The tools and workflows working engineers actually use for AI-assisted coding in 2026: Claude Code, Cursor, Aider, Codex, Copilot, and the workflow patterns that turn them from autocomplete novelties into real productivity.
Take me to the Coding Tools hub
Selected articles
4 articles
Multimodal Prompting

Prompting when the input is not just text: images, document scans, audio, and video. How the major multimodal models handle each modality, where they reliably fail, and what changes when you stop typing prompts and start mixing media.
Take me to the Multimodal hub
Selected articles
4 articles
Safety & Guardrails

How working teams defend LLM applications: prompt injection (direct and indirect), the OWASP LLM Top 10, defense-in-depth patterns that actually work, and the guardrail frameworks worth knowing about.
Take me to the Safety hub
Selected articles
4 articles
Patterns Under Pressure

AI tooling that is getting absorbed by foundation models or providers, what won't age well, and how to bet on tools that survive the next two model generations. Opinionated but grounded in the absorption patterns that have already played out.
Take me to the Pressure hub
Selected articles
4 articles
GEO & AI Citation

Generative Engine Optimization: how to get your content cited by Claude, ChatGPT, Perplexity, and Gemini. The discipline that took over from SEO when answer engines started doing the reading for users — what the citation mechanics actually look like, what works, and how to measure it.
Take me to the GEO hub
Selected articles
5 articles
Cost & Performance Engineering

Where your LLM bill actually comes from, and what to do about it. Prompt caching across providers, when to drop to a smaller model, streaming vs. batching, latency budgets, and cutting tokens without cutting accuracy.
Take me to the Cost & Performance hub
Selected articles
5 articles
AI Product UX

Designing UIs around LLMs that feel good despite the underlying model being slow, non-deterministic, and occasionally wrong. Streaming patterns, citation rendering, error and refusal states, and the conversation mechanics — regenerate, edit, branch — that separate a polished LLM product from a wrapper around an API.
Take me to the Product UX hub
Selected articles
5 articles
Self-Hosted AI Agents

The 2025–2026 wave of open-source autonomous agent runtimes you run yourself: OpenClaw, Hermes Agent, and where they sit next to vendor-shipped offerings like Claude Code Channels. How they actually work, what each does well, and how to pick one without locking yourself in.
Take me to the Self-Hosted Agents hub
Selected articles
5 articles
Voice & Realtime AI

The full stack of real-time conversational AI: how voice agents actually work end to end, the latency budgets that separate "feels like a person" from "feels like a robot," the platforms (Vapi, Retell, Bland, LiveKit) that ship voice agents in production, and the ASR and TTS layers underneath.
Take me to the Voice & Realtime hub
Selected articles
4 articles
Fine-Tuning & Adaptation

When prompting runs out of road and you need to actually change the model: full fine-tuning versus LoRA and QLoRA, how to build a dataset that teaches rather than confuses, and how distillation shrinks a big model into a fast one you can afford to run.
Take me to the Fine-Tuning hub
Selected articles
4 articles
Embeddings & Vector Search

The numbers that turn meaning into geometry. What embeddings are, how to pick an embedding model, how vector databases store and search them, and how to chunk text so the vectors actually capture what a passage is about.
Take me to the Embeddings hub
Selected articles

A lab bench for the AI you actually use

Recent articles

How to Choose an Embedding Model

Chunking Text for Embeddings

Vector Databases Compared

What Are Embeddings?