By Greg Nowak. Last updated 2026-07-01.
llms.txt is easy to undersell because it looks like a small text file at the root of a website. It is also easy to oversell. It is not a magic AI visibility button, and it does not replace technical SEO, structured content, robots.txt, or sitemaps.
The better way to think about it is as a curated, Markdown-based guide for language models. It tells AI tools which parts of a site matter, and where to find the material the business is willing to stand behind.
That makes it a CMS governance issue. If your site contains product pages, documentation, pricing, policies, support boundaries, or API references, an llms.txt file can either reduce confusion or create more of it. A current, selective file helps point AI tools toward useful material. A stale or overstuffed one can surface old docs, unclear positioning, expired policy language, or pages that were never meant to be treated as primary guidance.
The file is a map, not a ranking lever
The llms.txt proposal describes a standard location, usually /llms.txt, for a Markdown file that gives language models a concise overview of a site and links to more detailed Markdown resources. The suggested structure is deliberately simple. It starts with an H1 naming the project or site, which is the only required element. That is followed by a short summary in a blockquote, optional free-text context, and H2 sections whose file lists link to useful resources, each with a short description. A section titled Optional has a special meaning: its links are secondary and can be skipped when a shorter context is needed.
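Here is a minimal Python sketch of that structure. It assumes the ordering described in the proposal: H1, blockquote summary, free-text context, H2 file lists, then Optional. The `Site` and `Link` classes and the example.com URLs are hypothetical, and a real generator would pull these values from the CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class Site:
    name: str                                                # rendered as the H1
    summary: str = ""                                        # rendered as a blockquote
    context: list[str] = field(default_factory=list)         # plain paragraphs, no headings
    sections: dict[str, list[Link]] = field(default_factory=dict)  # H2 title -> file list
    optional: list[Link] = field(default_factory=list)       # secondary, skippable links


def file_list(links: list[Link]) -> list[str]:
    """Render '- [title](url): note' entries; the note is omitted when empty."""
    return [
        f"- [{link.title}]({link.url})" + (f": {link.note}" if link.note else "")
        for link in links
    ]


def render_llms_txt(site: Site) -> str:
    out = [f"# {site.name}", ""]
    if site.summary:
        out += [f"> {site.summary}", ""]
    for paragraph in site.context:
        out += [paragraph, ""]
    for heading, links in site.sections.items():
        out += [f"## {heading}", "", *file_list(links), ""]
    if site.optional:
        out += ["## Optional", "", *file_list(site.optional), ""]
    return "\n".join(out).rstrip() + "\n"


if __name__ == "__main__":
    site = Site(
        name="Example Co",
        summary="Hypothetical software vendor with a public REST API.",
        sections={"Docs": [Link("API reference", "https://example.com/docs/api.md", "current version")]},
        optional=[Link("Changelog", "https://example.com/changelog.md")],
    )
    print(render_llms_txt(site))
```

Keeping the Optional list as a separate field, rather than as just another section, makes the "can be skipped" decision explicit in the data model.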
That simplicity is the point. Large websites are hard for language models to consume in one pass. Context windows are limited, and turning complex HTML into clean text can be unreliable. A curated file gives models and agents a clearer route to the pages, docs, and policies that matter most.
It is also important to keep the boundaries clear. Sitemaps list indexable human-readable pages. robots.txt communicates acceptable automated access. llms.txt offers a curated overview for language models to use mainly at inference time, when a user asks an AI tool for help. In business terms, it is closer to an editorial index than an SEO switch.
Live examples show the governance choices
Stripe uses llms.txt as a dense documentation index. Its file also opens with operational guidance about checking current package versions instead of relying on memorized version numbers. That is not just link curation. It is a clear instruction about how Stripe wants technical information handled.
Cloudflare uses a different pattern for its developer documentation. Its llms.txt works as a directory across product areas, with individual products linking to their own llms.txt files. For a large documentation estate, that is sensible. A single flat file would quickly become hard to use, so the root file becomes a router to smaller, more manageable indexes.
Vercel is another useful comparison. Its main file stays organized as a documentation index, while full documentation content is handled separately. That distinction matters for teams deciding whether their root llms.txt should be a compact guide, a broad content dump, or a pointer to richer machine-readable material.
Drupal makes the topic especially concrete for CMS teams. The LLM support project describes a recipe for making Drupal sites more AI-friendly, including authoring and output of /llms.txt, Markdown output for entities, and token support for authoring. It also notes that outputting all website content into one llms-full.txt file is not supported because Drupal site content can be too large. That is the right instinct. More content is not automatically better context.
| Decision | Risky version | Better version |
|---|---|---|
| Scope | Export everything the CMS can produce | Select canonical pages, current docs, key policies, and useful support material |
| Structure | Publish one long, mixed list | Use H2 sections that match real product, documentation, or service areas |
| Ownership | Create the file once as a developer task | Assign editorial and technical owners with review before publication |
| Standards | Treat it as a sitemap or robots.txt replacement | Keep it aligned with sitemaps, robots rules, canonical pages, and structured content |
| Testing | Publish without checking model behavior | Test it against real customer, support, product, and developer questions |
| Maintenance | Update it only during redesigns | Review or regenerate it when business-critical CMS content changes |
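The Ownership and Maintenance rows can be turned into a routine check. This sketch assumes each llms.txt entry is tracked with hypothetical `editorial_owner`, `technical_owner`, and `last_reviewed` fields. The 90-day review window is an arbitrary placeholder for whatever interval the team actually agrees on.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Entry:
    url: str
    editorial_owner: Optional[str]   # hypothetical CMS field
    technical_owner: Optional[str]   # hypothetical CMS field
    last_reviewed: date


def review_findings(entries: list[Entry], today: date, max_age_days: int = 90) -> list[str]:
    """List governance problems: missing owners or reviews older than the window."""
    findings = []
    cutoff = today - timedelta(days=max_age_days)
    for entry in entries:
        if not entry.editorial_owner or not entry.technical_owner:
            findings.append(f"{entry.url}: missing owner")
        if entry.last_reviewed < cutoff:
            findings.append(f"{entry.url}: last reviewed {entry.last_reviewed.isoformat()}")
    return findings
```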
What to audit before publishing
Start with the editorial question, not the technical one: what should an AI tool understand about this website if it only had a short, curated context?
For a software company, that may include current API docs, SDK guidance, versioning, authentication, webhooks, security, pricing boundaries, and support escalation. For a service business, it may mean service definitions, geography, eligibility, policies, and the pages that should be treated as canonical.
Then decide what does not belong in the primary context. The Optional section exists for a reason: useful but secondary material can go there, where tools can skip it when context is tight. Archive pages, duplicated landing pages, old campaign material, partial translations, thin tag pages, and outdated docs usually belong in neither place, because they can make model-assisted answers worse. A useful llms.txt file reduces ambiguity. It should not republish the whole site in a different format.
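One way to make that decision repeatable is to sort the inventory into three buckets. The page types, the `canonical`, `current`, and `secondary` flags, and the exclusion list below are hypothetical CMS fields chosen to mirror the paragraph above. They are not a standard schema.

```python
from dataclasses import dataclass

# Hypothetical page types that should stay out of llms.txt entirely.
EXCLUDE_TYPES = {"archive", "campaign", "tag", "duplicate", "partial-translation"}


@dataclass
class Page:
    url: str
    page_type: str           # hypothetical CMS field, e.g. "docs", "policy", "archive"
    canonical: bool
    current: bool
    secondary: bool = False  # useful but skippable, e.g. a changelog


def triage(pages: list[Page]) -> dict[str, list[str]]:
    """Split an inventory into primary links, Optional links, and exclusions."""
    buckets: dict[str, list[str]] = {"primary": [], "optional": [], "excluded": []}
    for page in pages:
        if page.page_type in EXCLUDE_TYPES or not page.canonical or not page.current:
            buckets["excluded"].append(page.url)
        elif page.secondary:
            buckets["optional"].append(page.url)
        else:
            buckets["primary"].append(page.url)
    return buckets
```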
Finally, check alignment. If llms.txt promotes a page that is no longer canonical, contradicts the sitemap, or conflicts with current robots guidance, it becomes another source of drift. The same problem appears when CMS editors update product pages, policy pages, or docs without triggering a review of AI-facing files. The file needs to sit inside the publishing workflow, not beside it.
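Python's standard library is enough to automate a basic alignment check. This sketch assumes the proposal's convention of serving Markdown at the page URL with `.md` appended, so it maps each link back to its HTML page before comparing it with the sitemap. Robots rules are checked against the link itself. It only reads a single `<urlset>` sitemap, not sitemap indexes.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect <loc> values from a single <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}


def html_equivalent(url: str) -> str:
    """Map 'page.md' to 'page' and 'dir/index.html.md' to 'dir/' (assumed convention)."""
    if url.endswith(".md"):
        url = url[: -len(".md")]
        if url.endswith("/index.html"):
            url = url[: -len("index.html")]
    return url


def alignment_issues(llms_urls: list[str], sitemap_xml: str, robots_txt: str,
                     user_agent: str = "*") -> list[str]:
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    listed = sitemap_urls(sitemap_xml)
    issues = []
    for url in llms_urls:
        if not robots.can_fetch(user_agent, url):
            issues.append(f"{url}: disallowed by robots.txt")
        if html_equivalent(url) not in listed:
            issues.append(f"{url}: page not in sitemap")
    return issues
```

Running this in the same CI job that publishes sitemap changes is one way to keep the file inside the publishing workflow instead of beside it.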
A practical workflow
A sensible implementation starts with a content inventory. Identify the pages and Markdown outputs that are safe, current, canonical, and useful for model-assisted answers. Group them around the way customers, buyers, developers, and support teams ask questions, not merely around how the CMS stores posts or nodes.
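Grouping can follow the questions rather than the content types. The journey names below are hypothetical labels drawn from the questions this article mentions: buying, integrating, troubleshooting, and escalating. A real inventory would store them as an editorial field in the CMS, and each group would become an H2 section.

```python
# Hypothetical journey labels, ordered the way a reader moves through them.
JOURNEYS = ["Before buying", "Integrating", "Troubleshooting", "Escalating"]


def group_by_journey(pages: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group approved page URLs by the question they answer, ignoring the CMS content type."""
    grouped: dict[str, list[str]] = {}
    for page in pages:
        journey = page["journey"]
        if journey not in JOURNEYS:
            raise ValueError(f"{page['url']}: unknown journey {journey!r}")
        grouped.setdefault(journey, []).append(page["url"])
    # Return groups in journey order, not CMS insertion order; empty journeys are dropped.
    return {j: grouped[j] for j in JOURNEYS if j in grouped}
```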
Where possible, generate the file from the CMS. Drupal now has ecosystem support for llms.txt-oriented output, and WordPress teams can use the same principle: generate from approved content instead of copy-pasting a static file. Automation should make review repeatable, not remove it. Editors approve the scope. Developers validate the format. Business owners confirm that high-risk claims are still current.
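A publish gate can encode that split of responsibilities. In this sketch, the format check covers only the proposal's H1 rule, plus an assumption that there should be exactly one H1. The sign-off role names are hypothetical. It shows the shape of the gate rather than a complete validator.

```python
from dataclasses import dataclass

REQUIRED_SIGNOFFS = {"editor", "developer", "business_owner"}  # hypothetical role names


@dataclass
class Draft:
    text: str
    signoffs: set[str]


def format_errors(text: str) -> list[str]:
    """Minimal structural checks; a real validator would also check file-list entries."""
    lines = text.splitlines()
    errors = []
    if not lines or not lines[0].startswith("# "):
        errors.append("first line must be an H1")
    if sum(1 for line in lines if line.startswith("# ")) > 1:
        errors.append("expected exactly one H1")
    return errors


def ready_to_publish(draft: Draft) -> tuple[bool, list[str]]:
    """Publish only when the format is valid and every role has signed off."""
    problems = format_errors(draft.text)
    missing = REQUIRED_SIGNOFFS - draft.signoffs
    if missing:
        problems.append("missing sign-off: " + ", ".join(sorted(missing)))
    return (not problems, problems)
```

The automation makes every release go through the same review, but it never approves content on its own: the sign-offs still come from people.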
Then test the result. The proposal recommends expanding llms.txt into an LLM context file and checking whether models can answer questions about the content. Those checks should use the questions people actually ask before buying, integrating, troubleshooting, or escalating. The goal is not to make the business sound impressive. The goal is to make answers accurate, current, and bounded by material the business trusts.
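The test step can be sketched as two functions. The first expands the index into a single context string and skips the Optional section by default. The second runs real questions against that context. `fetch` and `ask` are hypothetical callables supplied by the caller, for example an HTTP client and a wrapper around a model API. The `<doc>` wrapper is an arbitrary choice, and the substring check is a crude stand-in for human review of the answers.

```python
import re
from typing import Callable

LINK = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")


def expand_to_context(llms_txt: str, fetch: Callable[[str], str],
                      include_optional: bool = False) -> str:
    """Append each linked document after the index, skipping Optional unless asked."""
    parts = [llms_txt.strip()]
    in_optional = False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            in_optional = line[3:].strip() == "Optional"
            continue
        match = LINK.match(line)
        if match and (include_optional or not in_optional):
            title, url = match.groups()
            parts.append(f'<doc title="{title}" url="{url}">\n{fetch(url).strip()}\n</doc>')
    return "\n\n".join(parts)


def run_checks(context: str, ask: Callable[[str, str], str],
               cases: list[tuple[str, str]]) -> list[str]:
    """Return the questions whose answer does not mention the expected fact."""
    failures = []
    for question, must_mention in cases:
        answer = ask(context, question)
        if must_mention.lower() not in answer.lower():
            failures.append(question)
    return failures
```

Running the same cases with `include_optional=True` shows whether the answers actually depend on material you chose to mark as skippable.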
For GrN.dk clients, the value is in this governance layer. Greg can help audit what belongs in the file, generate and validate it from WordPress or Drupal content, connect it to editorial ownership, and test whether it supports better AI-assisted answers. Done well, llms.txt becomes a small but useful bridge between canonical website content and AI-assisted discovery. Done casually, it is just another public file waiting to go stale.