What Is llms.txt? [2026 Format Guide + Real Example]

llms.txt is a plain Markdown file you place at the root of your site to give AI tools a structured, low-noise map of your most important content. Here is what it actually is, how to write one, and what it honestly does and does not do.

The Short Answer

llms.txt is a Markdown file placed at https://yourdomain.com/llms.txt. It was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI and fast.ai. The idea: give AI systems a curated index of your site so they can understand your content without parsing raw HTML.

The spec lives at llmstxt.org and is maintained informally. As of mid-2026, it has no backing from W3C, IETF, or any recognized standards body. Companies like Anthropic, Stripe, Cursor, Cloudflare, Vercel, Supabase, and Mintlify have published their own llms.txt files. AI coding tools (Cursor, Claude Code, GitHub Copilot, Windsurf) fetch it routinely. The chat-based LLMs (ChatGPT, Gemini) show limited confirmed usage.

~10% of tech sites have published one

Sep 2024: proposed by Jeremy Howard

97% of tracked domains received zero AI requests (May 2026)

A Real llms.txt Format Example

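Below is a representative llms.txt following the spec at llmstxt.org. The file is plain Markdown. Per the spec, only the H1 is strictly required, but the blockquote summary and the H2 link sections are what make the file useful, so treat everything except the Optional section as expected. Use this as a copy-paste starting point.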

llms.txt
# Acme SaaS

> Acme SaaS helps marketing teams publish to Reddit, LinkedIn,
> and X from one dashboard. Built for B2B SaaS founders.

## Docs

- [Getting Started Guide](https://acme.com/docs/getting-started): Install and connect your first subreddit in under 5 minutes.
- [API Reference](https://acme.com/docs/api): Full REST API for post scheduling and analytics.
- [Rate Limits](https://acme.com/docs/rate-limits): Per-endpoint limits and how to handle 429 responses.

## Blog

- [How Reddit Marketing Works in 2026](https://acme.com/blog/reddit-marketing-2026): Data-backed breakdown of what converts.
- [SaaS Founder Case Study](https://acme.com/blog/founder-case-study): How one founder hit $10k MRR with Reddit alone.

## Legal

- [Privacy Policy](https://acme.com/privacy): What data we collect and how it is handled.
- [Terms of Service](https://acme.com/terms): Usage terms and acceptable use policy.

## Optional

- [Changelog](https://acme.com/changelog): Version history and release notes.
- [Old API v1 Docs](https://acme.com/docs/v1): Legacy API for users on older integrations.

Note: The H1 is your site or product name. The blockquote is a one-to-two sentence summary. H2 headings organize sections. Each bullet is a Markdown link followed by a colon and a short description. That is the entire format.
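To make the format concrete, here is a minimal Python sketch of a parser for it. It is not a reference implementation from llmstxt.org: the names `LlmsTxt`, `LINK_RE`, and `parse_llms_txt` are hypothetical, and the regular expression assumes simple links with no parentheses or spaces inside the URL.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](URL): description"; the ": description" part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Maps each H2 heading to a list of (title, url, description) tuples.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    summary_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            # The first H1 is the site or product name.
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line.lstrip(">").strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    doc.summary = " ".join(summary_lines)
    return doc
```

Calling `parse_llms_txt` on the Acme SaaS example above would yield the title, the two blockquote lines joined into one summary, and one entry in `sections` for each of Docs, Blog, Legal, and Optional.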

How to Create an llms.txt File (Step by Step)

Creating an llms.txt file takes under an hour for most sites. These steps follow the spec at llmstxt.org.

1. Open a plain text editor and start with a single H1 heading

The very first line must be a Markdown H1 heading with your site or product name. Example: # MyProduct. Do not use a tagline or description here, just the name. This is how the spec identifies your property to AI systems that parse the file.

2. Write a one-to-two sentence blockquote summary

Immediately below the H1, add a Markdown blockquote (a line starting with >) that describes what your site does and who it is for. Keep it factual and under 40 words. This becomes the context an AI agent uses when deciding whether your site is relevant to a query.

3. Create H2 sections for each content category

Use Markdown H2 headings (## Heading) to group your links into logical categories. Common sections: Docs, Blog, API, Legal, Tools, Guides. There is no fixed list of required section names; use whatever matches your content structure. Most sites need three to five sections.

4. Add bullet-point links with short descriptions under each section

Each bullet follows the pattern: - [Page Title](URL): One sentence description. The description should tell an AI what it will find at that URL, not just restate the title. Example: - [Rate Limits](https://example.com/docs/rate-limits): Per-endpoint request limits and how to handle 429 responses.

5. Add an Optional section for lower-priority pages

The spec supports an H2 section titled exactly "Optional" for pages that are useful but not essential to understanding your site. Put changelog entries, legacy docs, archived posts, and deep-config guides here. AI agents with limited context windows can skip Optional sections.

6. Save the file as llms.txt and serve it at the root of your domain

The file must be accessible at https://yourdomain.com/llms.txt with a Content-Type of text/plain or text/markdown. In Next.js you can place it in the /public directory. For other frameworks, serve it as a static file. Verify it is publicly accessible with a curl command or by visiting the URL in an incognito tab.

7. Optionally create llms-full.txt for coding tool use cases

If your primary audience is developers using AI coding assistants, create a second file at /llms-full.txt that inlines the full text content of each linked page rather than just the link. Tools like Cursor and Windsurf will load the entire file in one request to avoid multiple fetches. This file can be large, so only include it if developer documentation is a core use case.
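If you would rather generate the file than hand-write it, steps 1 through 5 can be sketched in a few lines of Python. This is an illustrative sketch, not tooling from the spec: the `Link` and `Site` classes and the `render_llms_txt` function are hypothetical names, and the example data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class Site:
    name: str     # step 1: becomes the H1 heading
    summary: str  # step 2: becomes the blockquote
    # steps 3-4: maps an H2 section name to a list of Link objects
    sections: dict = field(default_factory=dict)
    # step 5: lower-priority links, rendered last under "## Optional"
    optional: list = field(default_factory=list)


def render_llms_txt(site: Site) -> str:
    lines = [f"# {site.name}", "", f"> {site.summary}", ""]
    groups = list(site.sections.items())
    if site.optional:
        groups.append(("Optional", site.optional))
    for heading, links in groups:
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    site = Site(
        name="MyProduct",
        summary="MyProduct is a placeholder used for this example.",
        sections={
            "Docs": [
                Link(
                    "Rate Limits",
                    "https://example.com/docs/rate-limits",
                    "Per-endpoint request limits and how to handle 429 responses.",
                )
            ]
        },
    )
    print(render_llms_txt(site))
```

For step 6, the returned string can be written to whatever directory your framework serves static files from, for example the /public directory in Next.js.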

What to Include in llms.txt (Reference Table)

Use this table to decide which pages belong in your llms.txt and which to leave out. The guiding rule: include pages that help an AI understand what your product does and how to use it. Exclude pages that exist for design or navigation purposes.

| Page Type | Include? | Section to Use | Why / Why Not |
| --- | --- | --- | --- |
| API reference docs | Yes | ## Docs or ## API | Core for developer-facing products. AI coding tools read this directly. |
| Getting started guide | Yes | ## Docs | Sets the mental model for what the product does and how it works. |
| Pricing page | Yes | ## Product | Helps AI systems answer "how much does X cost" accurately. |
| Blog posts (core topics) | Yes | ## Blog | Substantive long-form content is what AI retrieval systems index most. |
| Legal pages (Privacy, ToS) | Yes | ## Legal | Helps AI agents answer compliance and data questions correctly. |
| Homepage | Skip | N/A | The blockquote summary already covers homepage-level context. |
| Changelog / release notes | Optional | ## Optional | Useful for deep research but not essential to core understanding. |
| Category/tag archive pages | Skip | N/A | Navigation pages with no standalone content value. |
| Old/deprecated docs | Optional | ## Optional | Flag as legacy with a note. Helps avoid AI citing outdated info. |
| Contact/about pages | Skip | N/A | Low informational value. Wastes context window space. |
| Case studies/testimonials | Yes | ## Blog or ## Resources | Provides social proof context AI systems use when evaluating credibility. |
| Free tool pages | Yes | ## Tools | Helps AI systems surface your tools as recommendations. |

Who Needs llms.txt, Who Does Not

Given the real adoption data (97% of domains with a valid llms.txt received zero AI requests for it in May 2026, per research tracking approximately 38,000 domains), you should make an honest assessment of whether this is worth your time. Here is a clear decision framework.

You should publish llms.txt

Developer tool or API product

AI coding tools (Cursor, Claude Code, Copilot) actively fetch /llms.txt for documentation sites. Your docs become more accessible to developers using AI assistants.

SaaS with technical documentation

If your product has an API, SDK, or detailed how-to docs, llms.txt makes that content more parseable by AI systems your target users already use daily.

Your users are developers or AI-native teams

Audiences already using AI tools in their workflow are the most likely to have those tools fetch your llms.txt.

You want to be cited by Perplexity

Perplexity has publicly confirmed it retrieves and uses llms.txt for page selection. If Perplexity is a traffic source you care about, this is a concrete signal.

Your content is documentation-heavy

The format was designed for sites with many structured docs pages. The more your site looks like documentation, the more useful llms.txt becomes.

Lower priority for you

Pure content/blog site

ChatGPT and Google AI Overviews do not reliably consume llms.txt in their main interfaces. Content sites see more impact from structured data, E-E-A-T signals, and entity mentions.

Local business or service site

Local search intent is handled by Google Business Profile and local schema. llms.txt adds no meaningful signal for location-based queries.

E-commerce store

Product discovery via AI is driven by structured product data (schema.org/Product, merchant feeds) not llms.txt. Invest there instead.

Your goal is ChatGPT citations specifically

The model behind ChatGPT's main chat interface is trained on static data and does not fetch /llms.txt at query time. Publish anyway (zero cost), but do not expect it to move ChatGPT citations.

You have no content worth curating

If your site is thin (under 15-20 meaningful pages), an llms.txt offers little over what AI systems already see. Build the content first.

Honest Adoption Status (Mid-2026)

There is a lot of hype around llms.txt that does not match the actual data. Here is what independent research shows as of mid-2026.

Confirmed

Who has published one

Anthropic, Stripe, Cursor, Cloudflare, Vercel, Supabase, Mintlify, LangGraph, and several hundred other developer-focused companies. Around 10% of tech and documentation sites overall.

Confirmed

Which AI tools actively fetch it

AI coding tools fetch it routinely: Cursor, Windsurf, Claude Code, GitHub Copilot, Cline, and Aider all look for /llms.txt and /llms-full.txt when pointed at a documentation site. Perplexity has publicly confirmed it retrieves llms.txt for page prioritization.

Unconfirmed

ChatGPT and Gemini usage

OpenAI has not made a public commitment to reading llms.txt in production ChatGPT systems. Google has not either for Gemini. Observable retrieval patterns suggest some usage, but no official confirmation.

Sobering

Real request volume

Research tracking approximately 38,000 domains with valid llms.txt files found that 97% received zero requests for their llms.txt in May 2026. The practical traffic impact for most sites is currently near zero.

None

Standards body status

llms.txt has no backing from W3C, IETF, or any recognized standards body as of mid-2026. It is a community convention maintained at llmstxt.org. This means no enforcement, no guarantee of consistent implementation, and no official crawl behavior.

llms.txt vs robots.txt vs sitemap.xml

These three files are often confused because they all live at the site root and all relate to how automated systems interact with your content. They serve completely different audiences and should all coexist.

| File | Format | Primary Audience | Purpose | Standards Status |
| --- | --- | --- | --- | --- |
| llms.txt | Markdown | AI systems and coding tools | Curated content map for AI context windows | Community proposal (llmstxt.org) |
| robots.txt | Plain text directives | Web crawlers (Googlebot, Bingbot) | Allow/deny crawling of specific paths | IETF Proposed Standard (RFC 9309) |
| sitemap.xml | XML | Search engine crawlers | Index of all pages with metadata (dates, priority) | Sitemaps.org protocol, widely adopted |

You need all three. They do not overlap or replace each other. Removing sitemap.xml in favor of llms.txt would hurt how search engines discover and index your pages.

llms.txt Glossary

Key terms you will encounter when implementing or discussing llms.txt.

llms.txt

A plain Markdown file placed at the root of a domain (https://yourdomain.com/llms.txt) that gives AI systems a curated index of the site's most important content. Proposed by Jeremy Howard in September 2024. Format: H1 site name, blockquote summary, H2-organized link sections with short descriptions per link.

llms-full.txt

The companion file to llms.txt that inlines the full text content of linked pages rather than just the URLs. AI coding tools (Cursor, Windsurf, Cline) use this to load a site's entire documentation corpus in a single request without needing to follow each link. File size can be large for sites with substantial docs.

Context Window

The maximum amount of text an LLM can hold in active memory during a single interaction, measured in tokens. llms.txt is designed to fit within a typical context window (roughly 100,000-200,000 tokens for modern models). This is why the file should be concise rather than comprehensive.

GEO (Generative Engine Optimization)

The practice of optimizing your content so AI-powered answer engines (ChatGPT, Perplexity, Claude, Google AI Overviews) cite or recommend your brand. llms.txt is one technical signal in a GEO strategy, alongside structured data, authoritative content, and entity mentions across the web.

AI Retrieval

When an AI system fetches external content at query time rather than relying solely on its training data. AI agents and some search-integrated AI tools perform retrieval. The distinction matters for llms.txt: it is most useful for retrieval-based systems, less so for pure training-based chat responses.

Optional Section

An H2 section in an llms.txt file titled exactly "Optional" that groups lower-priority pages (changelogs, legacy docs, archives). AI agents with limited context budgets can skip Optional sections and still get a complete picture of the site. All other H2 sections are treated as core content that should not be skipped.
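The Context Window and Optional Section entries combine into a simple rule a consuming tool could apply: if the file is too large for the available budget, drop the Optional section first. The Python sketch below illustrates that idea only. The function names are hypothetical, and the four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (an assumption): about 4 characters per token for English text.
    return len(text) // 4


def drop_optional(text: str) -> str:
    """Remove the '## Optional' section and everything under it up to the next H2."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # The spec's skip rule applies only to a section titled exactly "Optional".
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


def fit_to_budget(text: str, max_tokens: int) -> str:
    """Return the file unchanged if it fits, otherwise without its Optional section."""
    if estimate_tokens(text) <= max_tokens:
        return text
    return drop_optional(text)
```

A real agent would use its model's own tokenizer and might trim further, but the ordering shown here (core sections kept, Optional dropped first) is the behavior the Optional section is meant to allow.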

4 Common llms.txt Mistakes to Avoid

Based on a review of published llms.txt files from over 100 sites, these are the most frequent errors that undermine the file's usefulness to AI systems.

1. Dumping every URL on your site. llms.txt is meant to be a curated index, not a sitemap. Listing every page, including tag archives, pagination URLs, and admin pages, wastes the AI's context window and buries your most important content. Keep it to your 15-50 most valuable pages.

2. Writing descriptions that just restate the page title. A description like "- [API Reference](url): API Reference" adds zero context. Write descriptions that answer what an AI will learn from that page. "Per-endpoint rate limits and how to handle 429 responses" is far more useful than "Rate Limits page".

3. Skipping the blockquote summary. The H1 + blockquote at the top is how AI systems understand your site before reading any links. Sites that start directly with H2 sections leave AI tools without the foundational context they need to prioritize your content correctly. The spec strictly requires only the H1, but the blockquote is what supplies that context, so treat it as mandatory.

4. Treating llms.txt as a replacement for good content. If your pages lack depth, clarity, or authority, llms.txt cannot fix that. AI systems that do retrieve your llms.txt will then visit the linked pages and judge their quality. Posting to communities and building genuine mentions, the kind that tools like MediaFast help create, matters more than the technical file itself.
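The first three mistakes are mechanical enough to check automatically. Below is a small Python lint sketch under stated assumptions: `lint_llms_txt` and its helpers are hypothetical names, the 50-link default comes from the "15-50 most valuable pages" guideline above, and the "restates the title" test is a simple text comparison rather than anything defined by the spec.

```python
import re

# Matches "- [Title](URL): description"; the description is optional.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def _norm(text: str) -> str:
    """Lowercase and collapse punctuation so 'API Reference.' equals 'api reference'."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def lint_llms_txt(text: str, max_links: int = 50) -> list:
    """Return human-readable warnings for common llms.txt mistakes."""
    warnings = []
    non_empty = [line.strip() for line in text.splitlines() if line.strip()]

    if not non_empty or not non_empty[0].startswith("# "):
        warnings.append("first line is not an H1 heading")

    # Mistake 3: the summary must appear before the first H2 section.
    first_h2 = next(
        (i for i, line in enumerate(non_empty) if line.startswith("## ")),
        len(non_empty),
    )
    if not any(line.startswith(">") for line in non_empty[:first_h2]):
        warnings.append("no blockquote summary before the first H2 section")

    # Mistake 1: too many links suggests a sitemap dump rather than a curated index.
    links = [m for m in map(LINK_RE.match, non_empty) if m]
    if len(links) > max_links:
        warnings.append(f"{len(links)} links; consider trimming to {max_links} or fewer")

    # Mistake 2: descriptions that are empty or only repeat the title.
    for match in links:
        title = match.group(1)
        desc = (match.group(3) or "").strip()
        if not desc:
            warnings.append(f"'{title}' has no description")
        elif _norm(desc) in (_norm(title), _norm(title) + " page"):
            warnings.append(f"'{title}' description only restates the title")
    return warnings
```

An empty list means none of these checks fired. It says nothing about the fourth mistake: whether the linked pages are worth reading is not something a linter can judge.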

How to Verify Your llms.txt Is Accessible

After publishing, confirm your file is publicly accessible and correctly formatted. Run these three checks.

1. Curl the URL directly

curl -I https://yourdomain.com/llms.txt

Should return HTTP 200 with Content-Type: text/plain or text/markdown. Any 404 or redirect means the file is not served correctly.

2. Open in an incognito tab

Visit https://yourdomain.com/llms.txt in a private browser window

Confirms the file is publicly accessible without authentication. You should see raw Markdown text.

3. Validate with an online checker

Search for "llms.txt validator" or check llmstxt.org for any linked validation tools

Some community-built validators check Markdown structure against the spec. Useful for catching formatting issues.
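If you want to script the first check, for example in a deploy pipeline, the same conditions can be expressed in Python using only the standard library. This is a sketch under assumptions: `check_response` and `check_url` are hypothetical helper names, and the H1 test is a convenience check added here, not something the curl command performs.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = ("text/plain", "text/markdown")


def check_response(status: int, content_type: str, body: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")
    if not body.lstrip().startswith("# "):
        problems.append("body does not start with a Markdown H1")
    return problems


def check_url(url: str) -> list:
    """Fetch the URL and run the checks above.

    Note: urllib follows redirects silently, so unlike `curl -I` this will not
    flag a redirect; it reports on the final response only.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return check_response(
                resp.status, resp.headers.get("Content-Type", ""), body
            )
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses instead of returning them.
        return [f"expected HTTP 200, got {err.code}"]
```

Calling `check_url("https://yourdomain.com/llms.txt")` after a deploy returns an empty list when the file is reachable, served with an accepted Content-Type, and begins with an H1.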

llms.txt FAQ

Honest answers to the questions developers and marketers ask most.

Does ChatGPT read llms.txt?

The honest answer is: for most ChatGPT usage, no. The model behind ChatGPT's main chat interface is trained on a static snapshot and does not fetch /llms.txt at query time. Where llms.txt does show confirmed usage is in AI coding tools like Cursor, Claude Code, GitHub Copilot, and Windsurf, which routinely fetch /llms.txt and /llms-full.txt when pointed at a documentation site. Perplexity has publicly confirmed it retrieves llms.txt for page prioritization. OpenAI's usage in production systems remains unconfirmed.

Is llms.txt an official standard?

No. As of mid-2026, llms.txt is a community convention proposed by Jeremy Howard of Answer.AI and fast.ai in September 2024. It has no backing from W3C, IETF, or any recognized standards body. The spec at llmstxt.org is maintained informally. robots.txt, by contrast, is specified in RFC 9309 and has well-documented behavior from every major crawler. llms.txt is an emerging proposal with partial and inconsistent adoption.

What is the difference between llms.txt and llms-full.txt?

llms.txt is the index file, essentially a curated table of contents linking to your most important pages with short descriptions. llms-full.txt is the expanded version that contains the actual full text of those pages inlined, so an AI agent does not need to follow each link separately. AI coding tools that want to load your entire documentation corpus in one request use llms-full.txt. For most sites, publishing only llms.txt is sufficient.

How long should an llms.txt file be?

The file should be as short as possible while still covering all critical pages. The goal is to fit within a typical LLM context window without padding. Most well-maintained llms.txt files are between 200 and 800 lines. Anthropic's llms.txt, for example, links to around 50-80 documentation pages with brief descriptions. Avoid dumping every page; include only the pages that help an AI understand your product, API, or core content.

Does llms.txt replace sitemap.xml?

No. sitemap.xml is crawled by Google, Bing, and other search engines to index your pages for traditional search. llms.txt targets AI systems, not search crawlers. They serve completely different audiences. You need both. Removing your sitemap.xml in favor of llms.txt would hurt how search engines discover and index your pages.

What is the Optional section in llms.txt?

The spec allows an H2 section labeled "Optional" to group lower-priority pages that are useful for deep research but not part of the core mental model of your site. Examples include changelog pages, legacy API docs, advanced configuration guides, or archived blog posts. AI agents that run with limited context windows can skip Optional sections and still get a complete picture of what your site offers.