Study 02 · Global Digital Authority Benchmark Series · Cross-Market Analysis

Does llms.txt Mean a Business Is Open to AI?

A four-market study pairs llms.txt adoption with AI-crawler access on the same domains. Adoption is emerging and overwhelmingly valid — but it does not predict whether a site actually lets AI in. More than a third of adopters block the very crawlers they publish guidance for.

Research instrument: PTODA C01 Crawler v1.3
Scan date: 17 June 2026
Paired sample: 1,617 domains · 4 markets
36% of llms.txt adopters simultaneously block AI crawlers in robots.txt

Publishing llms.txt is not a reliable indicator of AI openness. Across four markets, businesses that adopted llms.txt were no more likely to permit AI crawlers than businesses that had not — the difference in blocking was not statistically significant (36.0% vs 40.7%, p = 0.165). More than one third of adopters block the crawlers they are guiding.

Adoption reached 15.0% across the four markets and 91.2% of adopters published a valid file. Adoption is genuine and conformance is high — but machine-readable AI guidance and actual AI access remain largely independent signals.

Three findings, in order

The study measures two things on the same domains: whether a site publishes llms.txt (AI guidance), and whether it permits AI crawlers in robots.txt (AI access). The relationship between them is the result.

Adoption is emerging: 15.0%
284 of 1,894 scannable domains publish llms.txt, about one in seven, roughly 18 months after the proposal appeared. Not fringe, not standard.

Adoption quality is high: 91.2%
When a business adopts llms.txt, it almost always publishes a valid file. This refutes the "everyone ships broken files" narrative.

Adoption ≠ openness: n.s.
No statistically significant relationship was found between adopting llms.txt and permitting AI crawlers. On this evidence, the two signals are largely independent.

The headline. Publishing AI guidance does not reliably indicate that a business is open to AI access. The data does not support the claim that adopters are meaningfully more open — so it is not made.

Adoption is emerging, and valid when present

llms.txt adoption across the four markets, with conformance reported separately. A file counts as present only when it returns a genuine 200 (verified against a random control path to exclude soft-404 servers); conformant only when it is a non-empty, markdown-structured, non-HTML file with a heading.
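
The presence and conformance rules above can be sketched as a single classifier. This is a minimal illustration under stated assumptions, not the PTODA crawler's code: `fetch` is a hypothetical callable that maps a path to a `(status, body)` pair, the category names are invented for the example, and the heading and HTML heuristics are simple stand-ins for whatever the instrument actually uses. Placeholder detection is left out because the source does not define it.

```python
import re
import secrets
from typing import Callable, Tuple

# Assumed heuristics: a markdown ATX heading, and an HTML document start.
HEADING = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)
HTML_START = re.compile(r"^\s*(<!doctype\s+html|<html\b)", re.IGNORECASE)


def classify_llms_txt(fetch: Callable[[str], Tuple[int, str]]) -> str:
    """Return 'unobservable', 'absent', 'present-invalid' or 'conformant'.

    `fetch` is a hypothetical helper: path -> (HTTP status, response body).
    """
    # Control request: a random path that should not exist on a real site.
    control_status, _ = fetch("/" + secrets.token_hex(16) + ".txt")
    if control_status == 200:
        # Soft-404 server: a 200 on /llms.txt would prove nothing.
        return "unobservable"

    status, body = fetch("/llms.txt")
    if status != 200:
        return "absent"

    text = body.strip()
    # Present but invalid: empty, an HTML page, or no markdown heading.
    if not text or HTML_START.match(text) or not HEADING.search(text):
        return "present-invalid"
    return "conformant"
```

Keeping "present" and "conformant" as separate outcomes is what lets the present-but-invalid files show up at all; a single yes/no check would count them as adoption.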

llms.txt adoption by market (% of scannable domains)
Singapore: 19.4%
United States: 17.0%
Australia: 12.7%
Great Britain: 12.4%
Pooled: 15.0%

Conformance among adopters is high in every market — 96.1% in the United States, 90.9% in Australia, 88.9% in Singapore, 84.0% in Great Britain, and 91.2% pooled. The ~9% that are present-but-invalid are empty files, HTML pages served at the path, or placeholders — caught only because presence and conformance were measured as separate things.

How to read these rates. Because the sample consists primarily of established businesses rather than a random sample of websites, these adoption rates should not be interpreted as representative of the broader web. They are a benchmark for substantial businesses in four markets, not an internet-wide estimate.

The headline: adoption does not predict openness

The central question is whether llms.txt adopters actually permit AI crawlers more often than non-adopters. The answer is no, at least not to any degree the data can distinguish from chance.

Block rate, adopters: 36.0%
Of domains that publish llms.txt, this share also block at least one AI retrieval crawler.

Block rate, non-adopters: 40.7%
Of domains without llms.txt, this share block at least one AI retrieval crawler.

Difference: not significant (p = 0.165)
The 4.7-point gap is within the range expected by chance. It cannot be reported as an openness effect.

Effect estimate. Risk ratio 0.89 (95% CI 0.75–1.05); odds ratio 0.82 (95% CI 0.63–1.07). Direction: adopters block slightly less. Magnitude: small, ~11% lower relative risk. Uncertainty: large. Both confidence intervals include 1, so the data are consistent with anything from a modest reduction in blocking to a slight increase, including no difference at all. An interval that includes 1 expresses the same conclusion as the non-significant p-value, as a range.
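
For readers who want to see how figures of this kind are derived from a 2×2 table, here is a minimal sketch. The report does not state which interval or test method was used, so the choices below are assumptions: a log-scale (Katz) interval for the risk ratio, a Woolf interval for the odds ratio, and an uncorrected Pearson chi-square test. The function name and arguments are hypothetical, and the cell counts behind the published figures are not given in this report.

```python
import math


def effect_estimates(adopt_block, adopt_total, non_block, non_total, z=1.96):
    """Risk ratio, odds ratio, 95% intervals and p-value for a 2x2 table.

    Rows: adopters / non-adopters. Columns: block / do not block.
    Assumes every cell is non-zero.
    """
    a, b = adopt_block, adopt_total - adopt_block
    c, d = non_block, non_total - non_block

    risk_ratio = (a / adopt_total) / (c / non_total)
    odds_ratio = (a * d) / (b * c)

    # Standard errors on the log scale (Katz for RR, Woolf for OR).
    se_log_rr = math.sqrt(1 / a - 1 / adopt_total + 1 / c - 1 / non_total)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def interval(estimate, se):
        centre = math.log(estimate)
        return math.exp(centre - z * se), math.exp(centre + z * se)

    # Pearson chi-square with 1 degree of freedom, no continuity correction.
    n = adopt_total + non_total
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 df

    return {
        "risk_ratio": risk_ratio,
        "risk_ratio_ci": interval(risk_ratio, se_log_rr),
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": interval(odds_ratio, se_log_or),
        "p_value": p_value,
    }
```

Using only the rounded block rates, the point estimates come out at roughly 0.360 / 0.407 ≈ 0.88 for the risk ratio and (0.360 / 0.640) / (0.407 / 0.593) ≈ 0.82 for the odds ratio, in line with the reported values up to rounding. The intervals and p-value need the underlying counts.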

The contradiction. 36% of llms.txt adopters simultaneously block AI crawlers in robots.txt: they publish instructions for AI while forbidding at least one AI retrieval crawler from fetching the site. The United States is highest at 43.4%. This is the same null result, expressed as a number people remember.

The interpretation follows a pre-registered decision rule: the protocol fixed how to read the result before the result was seen. Because adopters and non-adopters exhibited statistically indistinguishable AI-crawler blocking rates (p = 0.165), the study concludes that llms.txt adoption is not a reliable indicator of AI-crawler openness within the observed sample.

This suggests that, at present, llms.txt functions more as a guidance mechanism than as a signal of AI openness. A business publishing llms.txt is telling AI systems how to read its content; it is not, by that act, telling them they may. The two decisions live in different files and, on this evidence, do not move together.
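
Because the two decisions live in different files, checking for the contradiction means reading both. The sketch below uses Python's standard `urllib.robotparser` to test whether a robots.txt body disallows the site root for any crawler in a list. The crawler list is an assumption chosen for illustration, since this report does not enumerate the user agents it scanned for, and testing only the root path `/` is a simplification of whatever blocking rule the instrument applies.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for illustration; not the study's actual list.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocks_any_ai_crawler(robots_txt: str, agents=AI_CRAWLERS) -> bool:
    """True if robots.txt disallows the site root for at least one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return any(not parser.can_fetch(agent, "/") for agent in agents)


def is_contradiction(publishes_llms_txt: bool, robots_txt: str) -> bool:
    """True when a site publishes AI guidance yet blocks an AI crawler."""
    return publishes_llms_txt and blocks_any_ai_crawler(robots_txt)
```

Note that a blanket `User-agent: *` / `Disallow: /` rule counts as blocking here, since it applies to AI crawlers that have no rule of their own.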

Contradiction rate by market (adopters who also block AI)
United States: 43.4%
Great Britain: 32.4%
Australia: 31.8%
Singapore: 22.2%

Adoption tracks sector, not openness

Adoption does vary cleanly by sector, and it tracks technical sophistication, not access posture. Technology firms adopt llms.txt roughly seven times more often than legal or education organisations.

llms.txt adoption by sector (pooled across four markets)
Technology / SaaS: 37.3%
Accounting / finance: 23.7%
Professional svcs: 20.0%
Retail / e-commerce: 13.9%
Healthcare: 10.6%
Real estate: 10.0%
Hospitality / tourism: 9.4%
Building / trades: 7.8%
Education / training: 5.4%
Legal: 5.0%

llms.txt is a developer-facing convention, and the sectors closest to web engineering adopt it first. This says nothing about whether those sectors are more open to AI — only that they are more aware of the convention.

The access layer, reconfirmed

Because v1.3 re-runs the v1.2 access scan, this study independently reconfirms the AI-crawler-access findings on the same frozen frames — a small reliability result in its own right.

AI retrieval-crawler block rate by market (reproduced)
Australia: 42.8%
United States: 42.2%
Great Britain: 38.8%
Singapore: 33.1%

These figures match the frozen v1.2 benchmark. The only deviation is Australia, which differs from the expected count by a single domain, an artefact of live infrastructure response. The same instrument on the same frames returns the same access picture.

Supporting market volumes

Each market has a companion volume reporting its descriptive figures — adoption, conformance, contradiction, and access benchmark. These are descriptive breakdowns; the inferential finding lives in this pooled analysis, where the sample (n = 1,617) supports it.

United States: adoption 17.0% · conformance 96.1% · contradiction 43.4%.
Australia: adoption 12.7% · conformance 90.9% · contradiction 31.8%.
Great Britain: adoption 12.4% · conformance 84.0% · contradiction 32.4%.
Singapore: adoption 19.4% · conformance 88.9% · contradiction 22.2%.

How this was measured

Instrument & design

Instrument
PTODA C01 Crawler v1.3, a strict superset of the v1.2 access scanner.
Two layers, one pass
robots.txt access (Layer 1) and llms.txt adoption (Layer 2) measured on each domain in the same scan, then cross-tabulated (Layer 3).
Presence control
A random control path is fetched first; servers that return 200 for every path (soft-404 behaviour) are excluded so adoption is not inflated.
Presence vs conformance
Reported separately. A present file is conformant only if non-empty, markdown, headed, and not HTML or placeholder.
Denominators
Rates computed only where the relevant layer was observable. The paired analysis uses domains where both layers were observable (n = 1,617).
Determinism
Same domain in the same state returns the same classification. Validation fixtures pass before every scan.
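
The cross-tabulation and denominator rules described above can be sketched as follows. This is an illustration under assumptions, not the instrument's code: each domain is represented as a hypothetical `(adopts_llms_txt, blocks_ai)` pair, with `None` standing for a layer that could not be observed, and only domains where both layers were observed enter the paired table.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (adopts_llms_txt, blocks_ai); None means that layer was not observable.
Record = Tuple[Optional[bool], Optional[bool]]


def paired_crosstab(records: Iterable[Record]) -> Counter:
    """Count domains per (adopts, blocks) cell, paired observations only."""
    table = Counter()
    for adopts, blocks in records:
        if adopts is None or blocks is None:
            continue  # excluded from the paired denominator
        table[(adopts, blocks)] += 1
    return table


def block_rate(table: Counter, adopts: bool) -> float:
    """Share of adopters (or non-adopters) that block an AI crawler."""
    blocked = table[(adopts, True)]
    total = blocked + table[(adopts, False)]
    return blocked / total if total else float("nan")
```

Dropping unpaired domains before counting is why the paired sample (n = 1,617) is smaller than the number of scannable domains used for the adoption rate.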