
The Complete Guide to llms.txt

llms.txt is a plain-text file that tells AI models what your site is about — clearly, directly, in a format they can actually use. Here is how to write one.

llms.txt is a plain-text Markdown file placed at the root of your website — at yourdomain.com/llms.txt — that tells AI models and AI-powered crawlers what your site is about, what you offer, and which pages matter most. It is one of the most immediately actionable signals in Generative Engine Optimization, and one of the few that you have complete control over.

Where llms.txt came from

The web has long had conventions for giving machines structured information about a site. robots.txt, introduced in 1994, tells crawlers which pages they may access. sitemap.xml, popularized in the mid-2000s, tells search engines which URLs exist and how they are organized. Both files are machine-readable and placed at predictable root-level paths — conventions that any crawler can rely on.

As large language models began powering search and retrieval tools, a gap became apparent: there was no equivalent convention for telling an AI model what a site actually means — what it is about, what it offers, and which content represents it best. HTML pages are written for human readers. Navigation menus, hero images, footer links and cookie banners are meaningful to humans but add noise for an AI trying to understand a site's core purpose.

In 2024, researcher Jeremy Howard proposed llms.txt as a solution. The specification is documented at llmstxt.org and is deliberately simple: a Markdown-formatted text file at the root of a domain that summarizes the site in a format AI systems can parse directly, without inference or layout interpretation.

The difference between llms.txt and llms-full.txt

The llmstxt.org specification defines two files, each serving a different purpose:

llms.txt is the concise version. Its job is to give an AI model a fast, accurate picture of the site: who it belongs to, what it does, and which pages are the most important entry points. It should be short enough to fit comfortably within a typical LLM context window — ideally under 2,000 words. Think of it as the elevator pitch your site gives to any AI that comes looking.

llms-full.txt is the expanded version. It contains the full text of every important page on the site, concatenated into a single document. This gives retrieval systems the complete content of your site in one place, which is useful for tools that perform deep context retrieval rather than just identifying the most relevant pages. The tradeoff is size: llms-full.txt can be very large, and not all AI crawlers will fetch or process it in full.

Most businesses should implement llms.txt first. Only add llms-full.txt once the concise version is complete and accurate.

What a good llms.txt contains

A well-formed llms.txt follows a consistent Markdown structure. Here is an example for a hypothetical GEO consultancy:

# Meridian

> Meridian (meridianai.ch) is a Generative Engine Optimization (GEO)
> consultancy based in Zurich, Switzerland. We help brands become
> accurately cited by ChatGPT, Claude, Gemini and Perplexity through
> structured data, llms.txt implementation and AI visibility audits.

## Services

- GEO Audit: A structured assessment of what major AI models currently
  say about your brand, scored across citation rate, sentiment,
  factual accuracy and share of voice.
- GEO Implementation: Hands-on optimization of the technical and
  content signals that determine AI model citations — schema markup,
  llms.txt, entity architecture and quotable content.
- GEO Monitoring: Ongoing monthly tracking of AI Visibility across
  ChatGPT, Claude, Gemini and Perplexity with alerting on sentiment
  drift.

## Key Facts

- Founded: 2025
- Location: Zurich, Switzerland
- Languages: English, German, French
- Industries served: B2B services, professional services, SaaS,
  finance, hospitality

## Important pages

- https://meridianai.ch/ : Homepage with full service overview
- https://meridianai.ch/blog/what-is-geo : Definition of GEO
- https://meridianai.ch/blog/what-is-ai-visibility : AI Visibility
  metrics explained
- https://meridianai.ch/blog/llms-txt-guide : This guide

Notice the structure: the file opens with a level-one heading containing the brand name, followed immediately by a blockquote (the > lines) that serves as the description. Sections use level-two headings. URLs are absolute. Facts are stated plainly.
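That structure is simple enough that a retrieval tool can extract the essentials with a few lines of code. As an illustrative sketch (not a real crawler implementation — the function and interface names here are invented for this example), here is how a consumer might pull out the title and the blockquote description:

```typescript
// Hypothetical sketch: extract the two most important fields from an
// llms.txt file -- the first level-one heading and the blockquote
// description that follows it. Not a full Markdown parser.

interface LlmsTxtSummary {
  title: string | null;
  description: string | null;
}

function parseLlmsTxt(text: string): LlmsTxtSummary {
  const lines = text.split("\n");
  let title: string | null = null;
  const descriptionLines: string[] = [];

  for (const line of lines) {
    if (title === null && line.startsWith("# ")) {
      title = line.slice(2).trim(); // first H1 is the brand name
      continue;
    }
    if (title !== null && line.startsWith(">")) {
      // Blockquote lines after the title form the description.
      descriptionLines.push(line.replace(/^>\s?/, "").trim());
      continue;
    }
    if (title !== null && descriptionLines.length > 0 && line.trim() === "") {
      break; // a blank line ends the description block
    }
  }

  return {
    title,
    description:
      descriptionLines.length > 0 ? descriptionLines.join(" ") : null,
  };
}
```

The point of the sketch is the design constraint it reveals: because the title and description sit at fixed, predictable positions, even a trivial line-by-line reader recovers them without a Markdown library.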

Does llms.txt actually influence AI model answers?

The honest answer depends on which type of AI system you are asking about.

For retrieval-augmented systems — Perplexity, ChatGPT with Search, Gemini with grounding, and similar tools that fetch live web content before generating an answer — llms.txt has a clear and direct influence. These tools crawl websites looking for concise, authoritative summaries of what a site contains. A well-written llms.txt is exactly that: a single document that answers the crawler's question without requiring it to parse dozens of HTML pages. It is also worth noting that OpenAI's GPTBot crawler has been observed fetching llms.txt files directly, suggesting the file influences ChatGPT's retrieval layer even beyond third-party integrations.

For parametric models answering without live web search — using only what they absorbed during training — the effect is indirect and slower. An llms.txt file published today will not change what GPT-4o or Claude 3.5 Sonnet currently know about your brand. It may, however, influence the next generation of model weights if your file is crawled before the next training data cut-off.

The practical conclusion: implement llms.txt now because it has an immediate, measurable impact on retrieval-layer tools, and a compounding benefit on parametric knowledge over time.

How to write yours

Five principles for writing an effective llms.txt:

  1. Put the description line first. The blockquote immediately beneath the title is the single most important element. It should state, in one to three sentences, exactly what your organization is, what it does, and who it serves. Write it as if you were explaining your business to an intelligent colleague who has never heard of you. Avoid marketing language — "world-class" and "innovative" mean nothing to a model; "GEO consultancy based in Zurich" means everything.
  2. Use plain Markdown only. No HTML, no JavaScript, no images. Tables are acceptable but keep them small. Bullet lists work well. The goal is zero parsing friction for any text-based system.
  3. State explicit facts. Include your founding year, location, industries served, key team members, primary services and any distinguishing credentials. These facts are what AI models will quote when asked about your brand — make them accurate and quotable.
  4. Use absolute URLs. Any link in the file should be a full https:// URL, not a relative path. AI crawlers do not always have a base URL context when processing the file.
  5. Keep it under 2,000 words. The purpose of llms.txt is to be concise. If you find yourself writing more than 2,000 words, you are probably duplicating your website rather than summarizing it. Save the full content for llms-full.txt instead.
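Two of these principles — absolute URLs and the word budget — are mechanical enough to check automatically. The following is a minimal lint sketch, assuming Markdown-style `[label](url)` links; the function name, interface, and 2,000-word threshold are taken from the principles above, not from any official tool:

```typescript
// Hypothetical lint sketch for principles 4 and 5: every Markdown link
// target must be an absolute https:// URL, and the file should stay
// under the word limit. Bare URLs (as in the example above) are not
// Markdown links and are ignored by this check.

interface LintResult {
  wordCount: number;
  underWordLimit: boolean;
  relativeLinks: string[]; // link targets that are not absolute https URLs
}

function lintLlmsTxt(text: string, wordLimit = 2000): LintResult {
  // Split on any whitespace run and drop empty strings.
  const wordCount = text.split(/\s+/).filter(Boolean).length;

  // Match the target inside inline Markdown links: [label](target)
  const relativeLinks: string[] = [];
  for (const match of text.matchAll(/\]\(([^)]+)\)/g)) {
    if (!match[1].startsWith("https://")) relativeLinks.push(match[1]);
  }

  return { wordCount, underWordLimit: wordCount <= wordLimit, relativeLinks };
}
```

A check like this fits naturally into a build or deploy step, so a relative link or runaway word count never reaches production.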

Frequently asked questions

Is llms.txt an official standard?

No. llms.txt is a community-proposed convention, not an official internet standard like robots.txt. It was proposed by Jeremy Howard and documented at llmstxt.org in 2024. However, it has seen rapid adoption because the underlying principle — providing a machine-readable summary of a site — is immediately useful for AI systems that perform retrieval-augmented generation. The absence of a formal standard does not reduce its practical value.

Does llms.txt replace robots.txt?

No. robots.txt is a crawl-permission file — it tells crawlers which parts of your site they may or may not access. llms.txt is a content-summary file — it tells AI systems what your site is about and which pages are most important. The two serve different purposes and should coexist. In fact, you may wish to explicitly allow AI crawlers in robots.txt and then point them to llms.txt for structured content.

What content type should llms.txt be served with?

llms.txt should be served with the content type text/plain. Most web servers will do this automatically for files with a .txt extension. If you are generating llms.txt dynamically via a server route — for example in Next.js using a route handler — ensure you set the Content-Type header to text/plain; charset=utf-8 explicitly.
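For the dynamic case, here is a minimal sketch of such a route handler, assuming the Next.js App Router (the file would live at something like `app/llms.txt/route.ts`). Route handlers return standard web `Response` objects, so the sketch needs no framework imports; the body content is a placeholder:

```typescript
// Hypothetical app/llms.txt/route.ts for a Next.js App Router project.
// The file body here is a placeholder -- in practice you would generate
// or load your real llms.txt content.

const LLMS_TXT = `# Meridian

> Meridian is a GEO consultancy based in Zurich, Switzerland.
`;

export async function GET(): Promise<Response> {
  return new Response(LLMS_TXT, {
    status: 200,
    headers: {
      // Explicit charset avoids ambiguity for non-ASCII content.
      "Content-Type": "text/plain; charset=utf-8",
    },
  });
}
```

Without the explicit header, some frameworks would default a dynamically generated response to `text/html`, which can confuse crawlers expecting plain text.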

Should llms.txt be listed in robots.txt?

You do not need to list llms.txt in robots.txt — it is a public file at a predictable path and AI crawlers that look for it will find it automatically. However, some practitioners add a comment in robots.txt pointing to llms.txt as a courtesy signal to crawlers. More importantly, ensure your robots.txt does not accidentally block the /llms.txt path by an overly broad Disallow rule.
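As an illustrative robots.txt fragment (the courtesy pointer is a plain comment, not a recognized directive — no standard directive for llms.txt exists):

```text
# robots.txt -- illustrative fragment
User-agent: *
Allow: /

# Courtesy pointer for AI crawlers (a comment, not a directive):
# AI summary available at https://yourdomain.com/llms.txt

# Avoid overly broad rules like the following, which would
# also match and block /llms.txt via prefix matching:
# Disallow: /llms
```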

How long does it take for AI models to reflect a new llms.txt?

For retrieval-augmented systems like Perplexity, ChatGPT with Search and Gemini with grounding, the effect can appear within days to a few weeks — as fast as those systems re-crawl your domain. For parametric models that answer without live web search, llms.txt has no direct effect until the next training run, which can be months away. This is why it is important to measure parametric and retrieval performance separately.