Do AI assistants actually read llms.txt?

Not in any documented, formal way as of 2026. Anthropic, OpenAI, Google, and Perplexity have not committed to reading llms.txt as part of their crawling or citation logic. The spec is positioned as a hint, not a directive, and the major providers treat it as such. Some smaller AI tools and agentic systems do read it; many indexing tools and AI dev tools surface it as part of their own crawl.

Then why bother shipping it?

Three reasons. One, it is cheap: it is a markdown file at /llms.txt with a short list of important URLs. Two, more tools are parsing it even where no formal commitment exists, so shipping it now means you are ready if adoption broadens. Three, it serves as documentation of your own canonical pages, which is useful internally and which agents and dev tools can use today.

How is llms.txt different from sitemap.xml or robots.txt?

sitemap.xml is a machine-readable list of every URL for search-engine crawlers. robots.txt is an opt-in/opt-out for crawler access. llms.txt is curated — a short, opinionated list of your highest-value pages with descriptions, designed for an LLM to read directly. Sitemap is comprehensive; llms.txt is selective. They serve different purposes and complement each other.

The llms.txt Standard, Explained

llms.txt is a spec proposed by Jeremy Howard in September 2024. The idea is small enough to describe in a sentence: put a markdown file at /llms.txt on your site that lists key URLs with short descriptions, in a format that is easy for LLMs to read without crawling your whole HTML.

The standard is unfussy by design. It is worth understanding clearly because the discussion around it tends either to oversell it ("LLMs will read this") or to dismiss it entirely ("nobody reads it").

What the spec actually says

From llmstxt.org:

File location: at the root of your domain, https://example.com/llms.txt.
Format: markdown, structured with a few simple conventions.
Content: an H1 with the site or project name (the only required element), an optional blockquote summary, and a curated list of important pages, optionally grouped under H2 sections. Each entry has:
- A markdown link giving the page name and its URL
- Optional notes after a colon, typically a short description written for LLMs

A minimal example, from a site like the one you are reading:

# The Prompt Bench

> Working-engineer guides to prompting, local models, agents, and AI engineering practice.

## Core clusters

- [Prompting Fundamentals](https://thepromptbench.com/prompting-fundamentals/): Pillar guides to instruction, context, examples, and output spec.
- [RAG & Retrieval](https://thepromptbench.com/rag-and-retrieval/): How retrieval-augmented generation actually works in production.
- [Agents & Tool Use](https://thepromptbench.com/agents-and-tools/): Building LLM agents that do useful work.

## Authoritative articles

- [How to Build an LLM Agent](https://thepromptbench.com/agents-and-tools/how-to-build-an-llm-agent/): Pillar overview of the ReAct loop and production considerations.

That is essentially the whole spec. There is also an extended convention — llms-full.txt — where the full markdown content of all listed pages is concatenated into one file, useful for agents that want to ingest a whole site without crawling.
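
Because the format is this small, a reader for it fits in a few dozen lines. The following is a minimal sketch, not a reference implementation: the names `Link`, `LlmsTxt`, and `parse_llms_txt` are made up for illustration, and accepting a dash as well as the colon separator is a leniency assumed here rather than something the spec asks for.

```python
import re
from dataclasses import dataclass, field

# "- [name](url)" with optional notes. The spec describes ":" as the
# separator; accepting a dash as well is a leniency added for this sketch.
LINK_RE = re.compile(
    r"^[-*]\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*[:\u2014\u2013-]\s*(?P<notes>.*))?$"
)


@dataclass
class Link:
    name: str
    url: str
    notes: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1 title, blockquote summary, and H2-grouped link lists."""
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                notes = (match["notes"] or "").strip()
                doc.sections[current].append(
                    Link(match["name"], match["url"], notes)
                )
    return doc
```

Lines that are not recognised are skipped rather than rejected, which matches the spirit of a format with no schema validation.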

What it intentionally is not

The spec is explicit about its limits:

- It is not binding. Unlike robots.txt, whose allow/disallow rules crawlers are expected to honor, llms.txt does not control access at all. It is only a hint to AI consumers about what is worth reading.
- It is not a versioned protocol. No version negotiation, no schema validation, and only one required element (the H1 title). The point is simplicity.
- It is not a substitute for sitemap.xml or robots.txt. Sitemap is comprehensive for search-engine crawling; robots.txt controls crawler access. llms.txt is the editorial layer: "if you only read a few pages of my site, read these."

Who reads it (and who does not, yet)

As of mid-2026, the formal picture:

- Anthropic, OpenAI, Google, Perplexity: no public commitment to reading llms.txt as part of their citation logic. They crawl your site with their own crawlers and select citations based on their own retrieval logic. llms.txt is not an input to that pipeline in any documented way.
- AI development tools and agentic systems: many do read it. Coding agents, AI documentation tools, smaller assistants, and several agentic dev frameworks parse llms.txt when present.
- Indexing services aimed at AI: services that build LLM-ready indexes from web content (for vector DBs, agentic memory, etc.) frequently honor llms.txt.

Practitioner posts often overstate the major-platform adoption. The honest answer is "not yet, formally" — but the ecosystem of smaller tools is real, and shipping llms.txt is cheap.
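
To make "reads it" concrete, here is a rough sketch of what a small tool might do: derive the root-level /llms.txt URL from any page URL, try it, and fall back to sitemap.xml when it is missing. This is an assumption about how such a tool could work, not a description of any particular product. The names `llms_txt_url`, `http_get`, and `pick_index`, the User-Agent string, and the "body starts with #" check are all invented for this example.

```python
from typing import Callable, Optional, Tuple
from urllib.error import URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def site_url(page_url: str, path: str) -> str:
    """Return scheme://host + path for the site that hosts page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def llms_txt_url(page_url: str) -> str:
    """Return the root-level /llms.txt URL for the site hosting page_url."""
    return site_url(page_url, "/llms.txt")


def http_get(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch url as text; return None on any HTTP or network error."""
    request = Request(url, headers={"User-Agent": "llms-txt-probe/0.1 (example)"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, TimeoutError, ValueError):
        return None


def pick_index(
    page_url: str, fetch: Callable[[str], Optional[str]] = http_get
) -> Tuple[Optional[str], str]:
    """Prefer the curated llms.txt; fall back to sitemap.xml if it is absent."""
    body = fetch(llms_txt_url(page_url))
    # Heuristic (an assumption of this sketch): a real llms.txt starts with
    # a markdown heading, which filters out HTML "not found" pages.
    if body and body.lstrip().startswith("#"):
        return "llms.txt", body
    body = fetch(site_url(page_url, "/sitemap.xml"))
    if body:
        return "sitemap.xml", body
    return None, ""
```

Passing `fetch` as a parameter keeps the selection logic testable without network access, for example `pick_index("https://example.com/a", fetch=fake_pages.get)` with a plain dict of URL to body.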

Why ship it anyway

Three reasons:

It is cheap. A working llms.txt is 20–50 lines of markdown. Generating it from your existing site structure is a one-evening exercise. The cost is negligible.

The ecosystem is moving. Tools and platforms that want to ingest your site for LLM use are increasingly checking for llms.txt first. Even where the major chat assistants do not read it, smaller integrations do, and the trajectory points at broader adoption.

It clarifies your own thinking. Writing llms.txt forces you to articulate which pages on your site are highest-value for someone trying to understand what you offer. That clarity is useful internally, regardless of which crawlers consume it.
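
On the first point, generating the file from a list of pages you already maintain can be very short. The sketch below assumes you can export that list yourself (from a CMS, a static-site config, or by hand); `Page` and `render_llms_txt` are hypothetical names, and the output uses the "- [name](url): notes" entry shape.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Page:
    section: str      # H2 heading the entry is grouped under
    title: str
    url: str          # canonical URL you want cited
    description: str  # short notes written for an LLM reader


def render_llms_txt(site_name: str, summary: str, pages: Iterable[Page]) -> str:
    """Render an llms.txt document: H1, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]

    # Group pages by section while preserving first-seen section order.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)

    for name, items in sections.items():
        lines += [f"## {name}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.description}" for p in items]
        lines.append("")

    # Exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Writing the result to the web root as llms.txt during the site build keeps it in step with the rest of the content, though the curation itself (which pages go in the list) still has to be done by a person.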

How to write a useful one

A few practical tips:

- Curate, do not enumerate. An llms.txt with 200 entries is useless. An llms.txt with the 10–30 pages that actually matter to a reader trying to understand your site is useful. Pick well.
- Write the descriptions for the LLM, not the human. What would a model need to know about this page to decide whether to include it in an answer? "Pillar guide to RAG, end-to-end pipeline" is better than "Our RAG article."
- Group by section if your site has sections. Headings such as ## Tutorials, ## Reference, and ## API docs let an LLM understand the shape of your site.
- Link to the canonical URLs. Use the URLs you want cited, not redirects or slugs that might change.
- Update it when content changes meaningfully. A stale llms.txt is worse than no llms.txt because it sends agents to dead or wrong pages.
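
Some of these tips can be checked mechanically. The sketch below is one possible lint pass, under stated assumptions: `lint_llms_txt` is a hypothetical helper, the 30-entry ceiling comes from the 10–30 guideline above, and the 20-character minimum for descriptions is an arbitrary threshold chosen for illustration. It does not check whether links are still live, since that needs network requests.

```python
import re
from urllib.parse import urlsplit

# "- [name](url)" with optional notes after ":" (a dash is tolerated too).
LINK_RE = re.compile(
    r"^[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?:\s*[:\u2014\u2013-]\s*(.*))?$"
)


def lint_llms_txt(
    text: str, max_entries: int = 30, min_description_chars: int = 20
) -> list[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []

    entries = []
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            entries.append((match[1], match[2], (match[3] or "").strip()))

    if not text.lstrip().startswith("# "):
        warnings.append("missing H1 title at the top of the file")
    if len(entries) > max_entries:
        warnings.append(
            f"{len(entries)} entries; consider curating down to {max_entries}"
        )

    seen = set()
    for name, url, notes in entries:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            warnings.append(f"{name}: URL is not absolute")
        if url in seen:
            warnings.append(f"{name}: duplicate URL")
        seen.add(url)
        if len(notes) < min_description_chars:
            warnings.append(f"{name}: description is missing or very short")
    return warnings
```

Running a check like this in CI is one way to notice when the file has drifted, although it cannot judge whether the chosen pages are the right ones.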

The honest verdict

llms.txt in mid-2026 is in the "useful but under-honored" stage. The standard is sensible, the implementation cost is trivial, and the adoption curve points up even if it has not flattened into universal support. The downside of shipping it is essentially zero. The upside is being early to a standard that may or may not become required.

The Prompt Bench ships one. So should you, if your site has any content worth the small amount of effort it takes to curate the index.

Comments 2

u/llms_l2 · 1 month ago

low effort, high optionality. added ours and promptly forgot about it. if it becomes a real thing we're ready, if not no harm done

u/wjones · 1 month ago

we added an llms.txt to our site a while back, low effort and might help. jury is still very much out on whether anything actually reads it tho
