llms.txt is a plain Markdown file you place at the root of your site to give AI tools a structured, low-noise map of your most important content. Here is what it actually is, how to write one, and what it honestly does and does not do.
llms.txt is a Markdown file placed at https://yourdomain.com/llms.txt. It was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI and fast.ai. The idea: give AI systems a curated index of your site so they can understand your content without parsing raw HTML.
The spec lives at llmstxt.org and is maintained informally. As of mid-2026, it has no backing from W3C, IETF, or any recognized standards body. Companies like Anthropic, Stripe, Cursor, Cloudflare, Vercel, Supabase, and Mintlify have published their own llms.txt files. AI coding tools (Cursor, Claude Code, GitHub Copilot, Windsurf) fetch it routinely. The chat-based LLMs (ChatGPT, Gemini) show limited confirmed usage.
~10% of tech sites have published one
Sep 2024: proposed by Jeremy Howard
97% of domains with a valid llms.txt received zero AI requests for it (May 2026)
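Below is a representative llms.txt following the spec at llmstxt.org. The file is plain Markdown. The spec strictly requires only the H1; the blockquote summary and the H2 link sections are optional but strongly recommended, and the Optional section is reserved for lower-priority pages. Use this as a copy-paste starting point.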
    # Acme SaaS

    > Acme SaaS helps marketing teams publish to Reddit, LinkedIn,
    > and X from one dashboard. Built for B2B SaaS founders.

    ## Docs

    - [Getting Started Guide](https://acme.com/docs/getting-started): Install and connect your first subreddit in under 5 minutes.
    - [API Reference](https://acme.com/docs/api): Full REST API for post scheduling and analytics.
    - [Rate Limits](https://acme.com/docs/rate-limits): Per-endpoint limits and how to handle 429 responses.

    ## Blog

    - [How Reddit Marketing Works in 2026](https://acme.com/blog/reddit-marketing-2026): Data-backed breakdown of what converts.
    - [SaaS Founder Case Study](https://acme.com/blog/founder-case-study): How one founder hit $10k MRR with Reddit alone.

    ## Legal

    - [Privacy Policy](https://acme.com/privacy): What data we collect and how it is handled.
    - [Terms of Service](https://acme.com/terms): Usage terms and acceptable use policy.

    ## Optional

    - [Changelog](https://acme.com/changelog): Version history and release notes.
    - [Old API v1 Docs](https://acme.com/docs/v1): Legacy API for users on older integrations.
Note: The H1 is your site or product name. The blockquote is a one-to-two sentence summary. H2 headings organize sections. Each bullet is a Markdown link followed by a colon and a short description. That is the entire format.
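Because the format is this small, a short script can read it back into structured data. The following is a minimal sketch of one possible reading of the format, not a reference parser: the names `parse_llms_txt` and `LlmsTxt` are hypothetical, and it assumes every link sits on a single line in the `- [Title](URL): description` pattern.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](URL): description"; the description part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    """Hypothetical container for the parsed pieces of an llms.txt file."""
    title: str = ""
    summary: str = ""
    # Section name -> list of (link title, URL, description) tuples.
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None          # name of the H2 section being read, if any
    summary_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    doc.summary = " ".join(summary_lines)
    return doc
```

Calling `parse_llms_txt` on the Acme SaaS example should give a title of `Acme SaaS`, a two-sentence summary, and four sections keyed by their H2 names.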
Creating an llms.txt file takes under an hour for most sites. These steps follow the spec at llmstxt.org.
Open a plain text editor and start with a single H1 heading
The very first line must be a Markdown H1 heading with your site or product name. Example: # MyProduct. Do not use a tagline or description here, just the name. This is how the spec identifies your property to AI systems that parse the file.
Write a one-to-two sentence blockquote summary
Immediately below the H1, add a Markdown blockquote (a line starting with >) that describes what your site does and who it is for. Keep it factual and under 40 words. This becomes the context an AI agent uses when deciding whether your site is relevant to a query.
Create H2 sections for each content category
Use Markdown H2 headings (## Heading) to group your links into logical categories. Common sections: Docs, Blog, API, Legal, Tools, Guides. There is no fixed list of required section names, so use whatever matches your content structure. Most sites need three to five sections.
Add bullet-point links with short descriptions under each section
Each bullet follows the pattern: - [Page Title](URL): One sentence description. The description should tell an AI what it will find at that URL, not just restate the title. Example: - [Rate Limits](https://example.com/docs/rate-limits): Per-endpoint request limits and how to handle 429 responses.
Add an Optional section for lower-priority pages
The spec supports an H2 section titled exactly "Optional" for pages that are useful but not essential to understanding your site. Put changelog entries, legacy docs, archived posts, and deep-config guides here. AI agents with limited context windows can skip Optional sections.
Save the file as llms.txt and serve it at the root of your domain
The file must be accessible at https://yourdomain.com/llms.txt with a Content-Type of text/plain or text/markdown. In Next.js you can place it in the /public directory. For other frameworks, serve it as a static file. Verify it is publicly accessible with a curl command or by visiting the URL in an incognito tab.
Optionally create llms-full.txt for coding tool use cases
If your primary audience is developers using AI coding assistants, create a second file at /llms-full.txt that inlines the full text content of each linked page rather than just the link. Tools like Cursor and Windsurf will load the entire file in one request to avoid multiple fetches. This file can be large, so only include it if developer documentation is a core use case.
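If the list of pages already lives in code or a CMS export, the steps above can be sketched as a small generator instead of hand-editing the file. This is an illustrative sketch only: `render_llms_txt` and its arguments are hypothetical names, and the blank-line layout is one reasonable choice rather than something the spec mandates.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Build llms.txt text: H1, blockquote summary, then H2 link sections.

    `sections` maps a section name to (title, url, description) tuples and
    is rendered in insertion order.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "MyProduct",
        "MyProduct does X for Y.",  # placeholder summary
        {
            "Docs": [(
                "Rate Limits",
                "https://example.com/docs/rate-limits",
                "Per-endpoint request limits and how to handle 429 responses.",
            )],
        },
    ))
```

Writing the returned string to `llms.txt` in your static directory (for example `/public` in a Next.js project) covers the save-and-serve step.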
Use this table to decide which pages belong in your llms.txt and which to leave out. The guiding rule: include pages that help an AI understand what your product does and how to use it. Exclude pages that exist for design or navigation purposes.
| Page Type | Include? | Section to Use | Why / Why Not |
|---|---|---|---|
| API reference docs | Yes | ## Docs or ## API | Core for developer-facing products. AI coding tools read this directly. |
| Getting started guide | Yes | ## Docs | Sets the mental model for what the product does and how it works. |
| Pricing page | Yes | ## Product | Helps AI systems answer "how much does X cost" accurately. |
| Blog posts (core topics) | Yes | ## Blog | Substantive long-form content is what AI retrieval systems index most. |
| Legal pages (Privacy, ToS) | Yes | ## Legal | Helps AI agents answer compliance and data questions correctly. |
| Homepage | Skip | N/A | The blockquote summary already covers homepage-level context. |
| Changelog / release notes | Optional | ## Optional | Useful for deep research but not essential to core understanding. |
| Category/tag archive pages | Skip | N/A | Navigation pages with no standalone content value. |
| Old/deprecated docs | Optional | ## Optional | Flag as legacy with a note. Helps avoid AI citing outdated info. |
| Contact/about pages | Skip | N/A | Low informational value. Wastes context window space. |
| Case studies/testimonials | Yes | ## Blog or ## Resources | Provides social proof context AI systems use when evaluating credibility. |
| Free tool pages | Yes | ## Tools | Helps AI systems surface your tools as recommendations. |
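When the candidate pages come from a crawl or a sitemap export, the table can be approximated with a simple path-based filter. The sketch below is an assumption-heavy illustration: the function name `classify` and the path prefixes are hypothetical and must be adapted to your own URL structure.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes; adjust them to match your own site.
SKIP_PREFIXES = ("/tag/", "/category/", "/contact", "/about")
OPTIONAL_PREFIXES = ("/changelog", "/docs/v1")


def classify(url: str) -> str:
    """Return "skip", "optional", or "include" for a candidate URL."""
    path = urlparse(url).path or "/"
    # The homepage and navigation-only pages are left out entirely.
    if path == "/" or path.startswith(SKIP_PREFIXES):
        return "skip"
    # Changelogs and legacy docs go under the Optional section.
    if path.startswith(OPTIONAL_PREFIXES):
        return "optional"
    return "include"
```

Prefix matching is deliberately crude (`/about` also matches `/about-us`), so review the result by hand before publishing.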
Given the real adoption data (97% of domains with a valid llms.txt received zero AI requests for it in May 2026, per research tracking 38,000 domains), you should make an honest assessment of whether this is worth your time. Here is a clear decision framework.
You should publish llms.txt
Developer tool or API product
AI coding tools (Cursor, Claude Code, Copilot) actively fetch /llms.txt for documentation sites. Your docs become more accessible to developers using AI assistants.
SaaS with technical documentation
If your product has an API, SDK, or detailed how-to docs, llms.txt makes that content more parseable by AI systems your target users already use daily.
Your users are developers or AI-native teams
Audiences already using AI tools in their workflow are the most likely to have those tools fetch your llms.txt.
You want to be cited by Perplexity
Perplexity has publicly confirmed it retrieves and uses llms.txt for page selection. If Perplexity is a traffic source you care about, this is a concrete signal.
Your content is documentation-heavy
The format was designed for sites with many structured docs pages. The more your site looks like documentation, the more useful llms.txt becomes.
Lower priority for you
Pure content/blog site
ChatGPT and Google AI Overviews do not reliably consume llms.txt in their main interfaces. Content sites see more impact from structured data, E-E-A-T signals, and entity mentions.
Local business or service site
Local search intent is handled by Google Business Profile and local schema. llms.txt adds no meaningful signal for location-based queries.
E-commerce store
Product discovery via AI is driven by structured product data (schema.org/Product, merchant feeds), not llms.txt. Invest there instead.
Your goal is ChatGPT citations specifically
The model behind ChatGPT's main chat interface is trained on static data, and the interface does not fetch /llms.txt at query time. Publish anyway (zero cost), but do not expect it to move ChatGPT citations.
You have no content worth curating
If your site is thin (under 15-20 meaningful pages), an llms.txt offers little over what AI systems already see. Build the content first.
There is a lot of hype around llms.txt that does not match the actual data. Here is what independent research shows as of mid-2026.
Who has published one
Anthropic, Stripe, Cursor, Cloudflare, Vercel, Supabase, Mintlify, LangGraph, and several hundred other developer-focused companies. Around 10% of tech and documentation sites overall.
Which AI tools actively fetch it
AI coding tools fetch it routinely: Cursor, Windsurf, Claude Code, GitHub Copilot, Cline, and Aider all look for /llms.txt and /llms-full.txt when pointed at a documentation site. Perplexity has publicly confirmed it retrieves llms.txt for page prioritization.
ChatGPT and Gemini usage
OpenAI has not made a public commitment to reading llms.txt in production ChatGPT systems, and neither has Google for Gemini. Observable retrieval patterns suggest some usage, but there is no official confirmation.
Real request volume
Research tracking approximately 38,000 domains with valid llms.txt files found that 97% received zero requests for their llms.txt in May 2026. The practical traffic impact for most sites is currently near zero.
Standards body status
llms.txt has no backing from W3C, IETF, or any recognized standards body as of mid-2026. It is a community convention maintained at llmstxt.org. This means no enforcement, no guarantee of consistent implementation, and no official crawl behavior.
llms.txt, robots.txt, and sitemap.xml are often confused because they all live at the site root and all relate to how automated systems interact with your content. They serve completely different audiences and should all coexist.
| File | Format | Primary Audience | Purpose | Standards Status |
|---|---|---|---|---|
| llms.txt | Markdown | AI systems and coding tools | Curated content map for AI context windows | Community proposal (llmstxt.org) |
| robots.txt | Plain text directives | Web crawlers (Googlebot, Bingbot) | Allow/deny crawling of specific paths | Long-standing convention, standardized by the IETF as RFC 9309 |
| sitemap.xml | XML | Search engine crawlers | Index of all pages with metadata (dates, priority) | Sitemaps.org protocol, widely adopted |
You need all three. They do not overlap or replace each other. Removing sitemap.xml in favor of llms.txt would break your traditional search indexing entirely.
Key terms you will encounter when implementing or discussing llms.txt.
llms.txt: A plain Markdown file placed at the root of a domain (https://yourdomain.com/llms.txt) that gives AI systems a curated index of the site's most important content. Proposed by Jeremy Howard in September 2024. Format: H1 site name, blockquote summary, H2-organized link sections with short descriptions per link.
llms-full.txt: The companion file to llms.txt that inlines the full text content of linked pages rather than just the URLs. AI coding tools (Cursor, Windsurf, Cline) use this to load a site's entire documentation corpus in a single request without needing to follow each link. File size can be large for sites with substantial docs.
Context window: The maximum amount of text an LLM can hold in active memory during a single interaction, measured in tokens. llms.txt is designed to fit within a typical context window (roughly 100,000-200,000 tokens for modern models). This is why the file should be concise rather than comprehensive.
GEO (Generative Engine Optimization): The practice of optimizing your content so AI-powered answer engines (ChatGPT, Perplexity, Claude, Google AI Overviews) cite or recommend your brand. llms.txt is one technical signal in a GEO strategy, alongside structured data, authoritative content, and entity mentions across the web.
Retrieval: When an AI system fetches external content at query time rather than relying solely on its training data. AI agents and some search-integrated AI tools perform retrieval. The distinction matters for llms.txt: it is most useful for retrieval-based systems, less so for pure training-based chat responses.
Optional section: An H2 section in an llms.txt file titled exactly "Optional" that groups lower-priority pages (changelogs, legacy docs, archives). AI agents with limited context budgets can skip Optional sections and still get a complete picture of the site. All other H2 sections are treated as core content that should not be skipped.
Based on a review of published llms.txt files from over 100 sites, these are the most frequent errors that undermine the file's usefulness to AI systems.
Dumping every URL on your site. llms.txt is meant to be a curated index, not a sitemap. Listing every page, down to tag archives, pagination URLs, and admin pages, wastes the AI's context window and buries your most important content. Keep it to your 15-50 most valuable pages.
Writing descriptions that just restate the page title. A description like "- [API Reference](url): API Reference" adds zero context. Write descriptions that answer what an AI will learn from that page. "Per-endpoint rate limits and how to handle 429 responses" is far more useful than "Rate Limits page".
Skipping the blockquote summary. The H1 + blockquote at the top is how AI systems understand your site before reading any links. Sites that start directly with H2 sections leave AI tools without the foundational context they need to prioritize your content correctly. The spec strictly requires only the H1, but the blockquote is what supplies this context, so treat it as mandatory in practice.
Treating llms.txt as a replacement for good content. If your pages lack depth, clarity, or authority, llms.txt cannot fix that. AI systems that do retrieve your llms.txt will then visit the linked pages and judge their quality. Posting to communities and building genuine mentions, such as the kind tools like MediaFast help create, matters more than the technical file itself.
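The three structural mistakes in this list are mechanical enough to check automatically. Below is a rough linter sketch, offered as an illustration rather than an official validator: `lint_llms_txt` is a hypothetical name, the 50-link ceiling simply mirrors the guidance above, and a missing summary is reported as a problem to review.

```python
import re

# Matches "- [Title](URL): description"; the description part is optional.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def _norm(text: str) -> str:
    """Lowercase and collapse punctuation so near-identical strings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def lint_llms_txt(text: str, max_links: int = 50) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line is not an H1")

    # The summary must appear before the first H2 section.
    first_h2 = next(
        (i for i, ln in enumerate(lines) if ln.startswith("## ")), len(lines)
    )
    if not any(ln.startswith(">") for ln in lines[:first_h2]):
        problems.append("no blockquote summary before the first H2")

    link_count = 0
    for ln in lines:
        match = LINK_RE.match(ln)
        if not match:
            continue
        link_count += 1
        title, _url, desc = match.groups()
        desc = (desc or "").strip()
        if not desc:
            problems.append(f"'{title}' has no description")
        elif _norm(desc) in (_norm(title), _norm(title + " page")):
            problems.append(f"'{title}' description restates the title")

    if link_count > max_links:
        problems.append(f"{link_count} links exceeds the limit of {max_links}")
    return problems
```

The fourth mistake, thin content behind the links, cannot be caught by a script like this and still needs human review.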
After publishing, confirm your file is publicly accessible and correctly formatted. Run these three checks.
1. Curl the URL directly
curl -I https://yourdomain.com/llms.txt
Should return HTTP 200 with Content-Type: text/plain or text/markdown. Any 404 or redirect means the file is not served correctly.
2. Open in an incognito tab
Visit https://yourdomain.com/llms.txt in a private browser window.
Confirms the file is publicly accessible without authentication. You should see raw Markdown text.
3. Validate with an online checker
Search for "llms.txt validator" or check llmstxt.org for any linked validation tools.
Some community-built validators check Markdown structure against the spec. Useful for catching formatting issues.
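The first check can also be scripted, which is handy in a deploy pipeline. This is a sketch using only the Python standard library; the helper names `evaluate` and `check_llms_txt` are hypothetical, and the server is assumed to answer HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = ("text/plain", "text/markdown")


def evaluate(status: int, content_type: str) -> list[str]:
    """Compare a response's status and Content-Type against the expectations."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {content_type or '(missing)'}")
    return problems


def check_llms_txt(url: str, timeout: float = 10.0) -> list[str]:
    """Send a HEAD request and report problems; network errors propagate."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            problems = evaluate(
                response.status, response.headers.get("Content-Type", "")
            )
            # urlopen follows redirects silently, so compare the final URL.
            if response.url != url:
                problems.append(f"redirected to {response.url}")
            return problems
    except urllib.error.HTTPError as exc:
        content_type = exc.headers.get("Content-Type", "") if exc.headers else ""
        return evaluate(exc.code, content_type)


if __name__ == "__main__":
    for problem in check_llms_txt("https://yourdomain.com/llms.txt"):
        print(problem)
```

An empty result means the file passed the same criteria as the curl check: HTTP 200, an accepted Content-Type, and no redirect.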
Honest answers to the questions developers and marketers ask most.
For most ChatGPT usage, the honest answer is that llms.txt is not read. The model behind ChatGPT's main chat interface is trained on a static snapshot, and the interface does not fetch /llms.txt at query time. Where llms.txt does show confirmed usage is in AI coding tools like Cursor, Claude Code, GitHub Copilot, and Windsurf, which routinely fetch /llms.txt and /llms-full.txt when pointed at a documentation site. Perplexity has publicly confirmed it retrieves llms.txt for page prioritization. OpenAI's usage in production systems remains unconfirmed.
llms.txt is not an official standard. As of mid-2026, it is a community convention proposed by Jeremy Howard of Answer.AI and fast.ai in September 2024. It has no backing from W3C, IETF, or any recognized standards body. The spec at llmstxt.org is maintained informally. robots.txt, by contrast, is standardized as RFC 9309 and has well-documented behavior from every major crawler. llms.txt is an emerging proposal with partial and inconsistent adoption.
llms.txt is the index file, essentially a curated table of contents linking to your most important pages with short descriptions. llms-full.txt is the expanded version that contains the actual full text of those pages inlined, so an AI agent does not need to follow each link separately. AI coding tools that want to load your entire documentation corpus in one request use llms-full.txt. For most sites, publishing only llms.txt is sufficient.
The file should be as short as possible while still covering all critical pages. The goal is to fit within a typical LLM context window without padding. Most well-maintained llms.txt files are between 200 and 800 lines. Anthropic's llms.txt, for example, links to around 50-80 documentation pages with brief descriptions. Avoid dumping every page; include only the pages that help an AI understand your product, API, or core content.
llms.txt does not replace sitemap.xml. sitemap.xml is crawled by Google, Bing, and other search engines to index your pages for traditional search. llms.txt targets AI systems, not search crawlers. They serve completely different audiences. You need both. Removing your sitemap.xml in favor of llms.txt would break your traditional search indexing.
The spec allows an H2 section labeled "Optional" to group lower-priority pages that are useful for deep research but not part of the core mental model of your site. Examples include changelog pages, legacy API docs, advanced configuration guides, or archived blog posts. AI agents that run with limited context windows can skip Optional sections and still get a complete picture of what your site offers.
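How a consumer might act on the Optional section can be sketched in a few lines. This is only an illustration of the idea, not how any particular tool behaves: `drop_optional` and `fit_to_budget` are hypothetical names, and the budget is measured in characters to avoid assuming a specific tokenizer.

```python
def drop_optional(text: str) -> str:
    """Remove the H2 section titled exactly "Optional" and everything under it."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # A new H2 starts; skip it only if it is the Optional section.
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


def fit_to_budget(text: str, max_chars: int) -> str:
    """Keep the full file if it fits, otherwise fall back to the core sections."""
    return text if len(text) <= max_chars else drop_optional(text)
```

Sections that come after Optional are preserved, since skipping stops at the next H2 heading.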