Does llms.txt Mean a Business Is Open to AI?
A four-market study pairs llms.txt adoption with AI-crawler access on the same domains. Adoption is emerging and overwhelmingly valid — but it does not predict whether a site actually lets AI in. More than a third of adopters block the very crawlers they publish guidance for.
Publishing llms.txt is not a reliable indicator of AI openness. Across four markets, businesses that adopted llms.txt were no more likely to permit AI crawlers than businesses that had not: the difference in blocking rates was not statistically significant (36.0% of adopters vs 40.7% of non-adopters, p = 0.165).
Adoption reached 15.0% across the four markets and 91.2% of adopters published a valid file. Adoption is genuine and conformance is high — but machine-readable AI guidance and actual AI access remain largely independent signals.
Three findings, in order
The study measures two things on the same domains: whether a site publishes llms.txt (AI guidance), and whether it permits AI crawlers in robots.txt (AI access). The relationship between them is the result.
The headline. Publishing AI guidance does not reliably indicate that a business is open to AI access. The data do not support the claim that adopters are meaningfully more open, so the study does not make it.
Adoption is emerging, and valid when present
Adoption of llms.txt is reported across the four markets, with conformance reported separately. A file counts as present only when the server returns a genuine HTTP 200 response, verified against a random control path to exclude soft-404 servers. It counts as conformant only when it is a non-empty, markdown-structured, non-HTML file with a heading.
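The two definitions above can be sketched as a pair of small checks. This is a minimal illustration, not the study's instrument: the function names, the rule that a control path returning 200 marks a soft-404 server, and the heading and HTML heuristics are all assumptions made for the example.

```python
import re

# Assumed heuristic: a markdown ATX heading such as "# Title" on its own line.
HEADING = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)


def is_present(target_status: int, control_status: int) -> bool:
    """Presence: /llms.txt returns 200 and a random control path does not.

    If a random, nonexistent path also returns 200, the server answers 200
    for everything (a soft 404), so the 200 on /llms.txt proves nothing.
    """
    return target_status == 200 and control_status != 200


def is_conformant(body: str) -> bool:
    """Conformance: non-empty, not an HTML page, and contains a heading."""
    text = body.strip()
    if not text:
        return False  # empty file or placeholder with only whitespace
    lowered = text.lower()
    if lowered.startswith("<!doctype") or "<html" in lowered:
        return False  # an HTML page served at the llms.txt path
    return HEADING.search(text) is not None
```

Keeping the two checks separate mirrors the design choice in the text: a file can be present without being conformant, and only a two-step measurement reveals that gap.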
Conformance among adopters is high in every market — 96.1% in the United States, 90.9% in Australia, 88.9% in Singapore, 84.0% in Great Britain, and 91.2% pooled. The ~9% that are present-but-invalid are empty files, HTML pages served at the path, or placeholders — caught only because presence and conformance were measured as separate things.
How to read these rates. Because the sample consists primarily of established businesses rather than a random sample of websites, these adoption rates should not be interpreted as representative of the broader web. They are a benchmark for substantial businesses in four markets, not an internet-wide estimate.
The headline: adoption does not predict openness
The central question is whether llms.txt adopters permit AI crawlers more often than non-adopters. The answer is no, at least not to any degree the data can distinguish from chance.
Effect estimate. Risk ratio 0.89 (95% CI 0.75–1.05); odds ratio 0.82 (95% CI 0.63–1.07). Direction: adopters block slightly less often. Magnitude: small, about 11% lower relative risk of blocking. Uncertainty: large. Both confidence intervals include 1, so the data are consistent with no difference, and with a small difference in either direction. An interval that includes 1 expresses the same conclusion as the non-significant p-value, as a range.
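For readers who want to see how a risk ratio, an odds ratio, and their intervals come out of a 2×2 table, here is a minimal sketch using the standard Wald intervals on the log scale. The counts are hypothetical, chosen only to approximate the reported 36.0% and 40.7% blocking rates; the study's actual cell counts and interval method are not given in this section, so the output will not match the published intervals exactly.

```python
import math


def effect_estimates(a: int, n1: int, c: int, n2: int, z: float = 1.96) -> dict:
    """Risk ratio and odds ratio with Wald 95% CIs.

    a of n1 adopters block AI crawlers; c of n2 non-adopters block them.
    """
    b, d = n1 - a, n2 - c  # adopters / non-adopters that do not block

    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def wald_ci(estimate: float, se: float) -> tuple:
        log_est = math.log(estimate)
        return math.exp(log_est - z * se), math.exp(log_est + z * se)

    return {
        "rr": rr,
        "rr_ci": wald_ci(rr, se_log_rr),
        "or": odds_ratio,
        "or_ci": wald_ci(odds_ratio, se_log_or),
    }


# Hypothetical counts, NOT the study's data: 90/250 = 36.0%, 556/1367 = 40.7%.
result = effect_estimates(a=90, n1=250, c=556, n2=1367)
rr_low, rr_high = result["rr_ci"]
print(f"RR {result['rr']:.2f}, CI crosses 1: {rr_low < 1 < rr_high}")
```

With these illustrative counts both intervals straddle 1, which is the pattern the text describes: a point estimate slightly below 1 that the data cannot separate from no effect.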
The contradiction. 36% of llms.txt adopters simultaneously block AI crawlers in robots.txt — publishing instructions for AI while forbidding it from fetching the site. The United States is highest at 43.4%. The same null, expressed as a number people remember.
The interpretation follows a pre-registered decision rule: the protocol fixed how to read the result before the result was seen. Because adopters and non-adopters showed statistically indistinguishable AI-crawler blocking rates (p = 0.165), the study concludes that llms.txt adoption is not a reliable indicator of AI-crawler openness within the observed sample.
This suggests that, at present, llms.txt functions more as a guidance mechanism than as a signal of AI openness. A business publishing llms.txt is telling AI systems how to read its content; it is not, by that act, telling them they may. The two decisions live in different files and, on this evidence, do not move together.
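The contradiction case, a domain that publishes llms.txt while its robots.txt disallows AI crawlers, can be detected with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the crawler list is illustrative (the study's actual list is not given in this section), and "blocked" is taken to mean that the crawler may not fetch the root path `/`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens; the study's crawler list is an unknown here.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_ai_crawlers(robots_txt: str, agents=AI_CRAWLERS) -> list:
    """Return the agents that robots.txt forbids from fetching '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


def is_contradiction(has_llms_txt: bool, robots_txt: str) -> bool:
    """True when a site publishes AI guidance yet blocks an AI crawler."""
    return has_llms_txt and bool(blocked_ai_crawlers(robots_txt))


example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_crawlers(example_robots))
print(is_contradiction(True, example_robots))
```

Because the check takes the robots.txt body as a string, it can be run against stored snapshots as well as live fetches, which suits a point-in-time study.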
Adoption tracks sector, not openness
Adoption does vary cleanly by sector, and it tracks technical sophistication, not access posture. Technology firms adopt llms.txt roughly seven times more often than firms in the legal or education sectors.
llms.txt is a developer-facing convention, and the sectors closest to web engineering adopt it first. This says nothing about whether those sectors are more open to AI — only that they are more aware of the convention.
The access layer, reconfirmed
Because this study (v1.3) re-runs the v1.2 access scan, it independently reconfirms the AI-crawler-access findings on the same frozen sampling frames, a small reliability result in its own right.
The figures match the frozen v1.2 benchmark. The only deviation is in Australia, which differs from the expected count by one domain, an artefact of live infrastructure response. The same instrument on the same frames returns the same access picture.
Supporting market volumes
Each market has a companion volume reporting its descriptive figures — adoption, conformance, contradiction, and access benchmark. These are descriptive breakdowns; the inferential finding lives in this pooled analysis, where the sample (n = 1,617) supports it.
How this was measured
Instrument & design
- Unequal paired samples. Great Britain and Singapore carry higher infrastructure non-response, so their paired denominators are smaller; pooled figures weight by observable sample.
- Point-in-time. Each layer is observed once per domain, and both robots.txt and llms.txt can change. This is a snapshot, not a trend.
- Presence, not richness. Conformance is a structural check, not a judgement of guidance quality.
- Retrieval, not outcomes. The study measures access and adoption, not whether AI systems actually cite the sites. That question is left to the planned citation study.
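The first point above, pooling weighted by observable sample, means larger paired samples contribute more to the pooled figure. A minimal sketch of the difference between that and a simple average of market rates follows; the market labels and counts are hypothetical placeholders, not study data.

```python
def pooled_rate(markets: dict) -> float:
    """Sample-weighted rate: total events over total observable domains."""
    events = sum(e for e, _ in markets.values())
    total = sum(n for _, n in markets.values())
    return events / total


def unweighted_mean_rate(markets: dict) -> float:
    """Simple average of per-market rates, ignoring sample size."""
    rates = [e / n for e, n in markets.values()]
    return sum(rates) / len(rates)


# Hypothetical (events, observable domains) per market, for illustration only.
example = {"market_a": (30, 100), "market_b": (10, 20)}
print(pooled_rate(example), unweighted_mean_rate(example))
```

When one market has a much smaller denominator, the two numbers diverge, which is why the smaller Great Britain and Singapore samples matter when reading the pooled figures.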