Study 02 · Global Digital Authority Benchmark Series · Cross-Market Analysis

Does llms.txt Mean a Business Is Open to AI?

A four-market study pairs llms.txt adoption with AI-crawler access on the same domains. Adoption is emerging and overwhelmingly valid — but it does not predict whether a site actually lets AI in. More than a third of adopters block the very crawlers they publish guidance for.

Author: Douglas Lord

Published by: Periodic Table of Digital Authority

Research instrument: PTODA C01 Crawler v1.3

Methodology: PTODA C01 Crawler Methodology v1.3

Scan date: 17 June 2026

Paired sample: 1,617 domains · 4 markets

36%

of llms.txt adopters simultaneously block AI crawlers in robots.txt

Publishing llms.txt is not a reliable indicator of AI openness. Across four markets, businesses that adopted llms.txt were no more likely to permit AI crawlers than businesses that had not — the difference in blocking was not statistically significant (36.0% vs 40.7%, p = 0.165). More than one third of adopters block the crawlers they are guiding.

Adoption reached 15.0% across the four markets and 91.2% of adopters published a valid file. Adoption is genuine and conformance is high — but machine-readable AI guidance and actual AI access remain largely independent signals.

Three findings, in order

The study measures two things on the same domains: whether a site publishes llms.txt (AI guidance), and whether it permits AI crawlers in robots.txt (AI access). The relationship between them is the result.

15.0%

Adoption is emerging

284 of 1,894 scannable domains publish llms.txt — one in seven, ~18 months after the proposal appeared. Not fringe, not standard.

91.2%

Adoption quality is high

When a business adopts llms.txt, it almost always publishes a valid file. Within this sample, that contradicts the "everyone ships broken files" narrative.

n.s.

Adoption ≠ openness

No statistically significant relationship between adopting llms.txt and permitting AI crawlers. On this evidence, the two signals show no detectable association.

The headline. Publishing AI guidance does not reliably indicate that a business is open to AI access. The data does not support the claim that adopters are meaningfully more open — so it is not made.

Adoption is emerging, and valid when present

llms.txt adoption across the four markets, with conformance reported separately. A file counts as present only when it returns a genuine 200 (verified against a random control path to exclude soft-404 servers); conformant only when it is a non-empty, markdown-structured, non-HTML file with a heading.
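The presence rule above can be sketched as follows. This is a minimal illustration, not the PTODA crawler's code: the function name `llms_txt_present`, the injected `fetch_status` callable, and the shape of the random control path are all assumptions made for the example.

```python
import secrets
from typing import Callable, Optional


def llms_txt_present(domain: str, fetch_status: Callable[[str], int]) -> Optional[bool]:
    """Return True/False for presence, or None when presence is unobservable.

    `fetch_status` is a hypothetical helper that takes a URL and returns
    the HTTP status code; it is injected so the logic can be tested offline.
    """
    # Fetch a random path that should not exist on any real site.
    control_url = f"https://{domain}/{secrets.token_hex(16)}.txt"
    if fetch_status(control_url) == 200:
        # The server answers 200 for everything (soft-404), so a 200 on
        # /llms.txt would prove nothing. Exclude the domain.
        return None
    # Only now is a 200 on the real path treated as a genuine file.
    return fetch_status(f"https://{domain}/llms.txt") == 200
```

Returning `None` rather than `False` for soft-404 servers keeps those domains out of the denominator instead of counting them as non-adopters.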

llms.txt adoption by market (% of scannable domains)

Singapore: 19.4%
United States: 17.0%
Australia: 12.7%
Great Britain: 12.4%
Pooled: 15.0%

Conformance among adopters is high in every market — 96.1% in the United States, 90.9% in Australia, 88.9% in Singapore, 84.0% in Great Britain, and 91.2% pooled. The ~9% that are present-but-invalid are empty files, HTML pages served at the path, or placeholders — caught only because presence and conformance were measured as separate things.
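A structural conformance check along these lines might look like the sketch below. It assumes that "markdown-structured with a heading" can be approximated by the presence of at least one ATX heading line, and it does not attempt placeholder detection, because the study's placeholder criteria are not described here. The name `is_conformant` is illustrative only.

```python
import re


def is_conformant(body: str) -> bool:
    """Structural check only: non-empty, not HTML, and has a markdown heading."""
    text = body.strip()
    if not text:
        return False  # empty or whitespace-only file
    head = text[:512].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        return False  # an HTML page served at the llms.txt path
    # Require at least one ATX-style heading such as "# Title".
    return re.search(r"^#{1,6}[ \t]+\S", text, flags=re.MULTILINE) is not None
```

A check like this says nothing about whether the guidance is useful; it only separates files that are structurally plausible from files that are merely present.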

How to read these rates. Because the sample consists primarily of established businesses rather than a random sample of websites, these adoption rates should not be interpreted as representative of the broader web. They are a benchmark for substantial businesses in four markets, not an internet-wide estimate.

The headline: adoption does not predict openness

The central question is whether llms.txt adopters permit AI crawlers more often than non-adopters. The answer is no: the data show no difference that can be distinguished from chance.

36.0%

Block rate — adopters

Of domains that publish llms.txt, this share also block at least one AI retrieval crawler.

40.7%

Block rate — non-adopters

Of domains without llms.txt, this share block at least one AI retrieval crawler.

p = 0.165

Difference: not significant

The 4.7-point gap is within the range expected by chance. It cannot be reported as an openness effect.
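For readers who want to see how a comparison like this is tested, here is a minimal sketch of a two-sided two-proportion z-test. The test choice is an assumption, since the page does not state which test produced p = 0.165. The counts are hypothetical, chosen only so that the block rates round to 36.0% and 40.7% on a total of 1,617 domains; they are not the study's actual cell counts.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: both groups share one underlying rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL counts: 90 of 250 adopters block, 556 of 1,367 non-adopters block.
p = two_proportion_p_value(90, 250, 556, 1367)
print(f"p = {p:.3f}")
```

With these illustrative counts the p-value lands well above the conventional 0.05 threshold, which is the sense in which a gap of this size is "within the range expected by chance".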

Effect estimate. Risk ratio 0.89 (95% CI 0.75–1.05); odds ratio 0.82 (95% CI 0.63–1.07). Direction: adopters block slightly less. Magnitude: small, ~11% lower relative risk. Uncertainty: large — the confidence interval crosses 1, so the data are consistent with no difference in either direction. The interval crossing 1 is the same conclusion as the non-significant p-value, expressed as a range.
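The sketch below shows one common way to compute a risk ratio and an odds ratio with 95% Wald intervals on the log scale. Whether the study used this exact interval method is an assumption. The counts are hypothetical, picked only to reproduce block rates of roughly 36.0% and 40.7%, so the resulting intervals will be close to, but not identical to, the published ones.

```python
import math


def _with_ci(estimate: float, se_log: float, z: float = 1.96):
    """Return (estimate, lower, upper) using a Wald interval on the log scale."""
    centre = math.log(estimate)
    return estimate, math.exp(centre - z * se_log), math.exp(centre + z * se_log)


def risk_ratio(a: int, n1: int, c: int, n2: int):
    """a of n1 exposed have the outcome; c of n2 unexposed have the outcome."""
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return _with_ci(rr, se_log)


def odds_ratio(a: int, n1: int, c: int, n2: int):
    b, d = n1 - a, n2 - c  # counts without the outcome
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return _with_ci(ratio, se_log)


# HYPOTHETICAL counts: adopters 90 of 250 block; non-adopters 556 of 1,367 block.
rr, rr_lo, rr_hi = risk_ratio(90, 250, 556, 1367)
odds, or_lo, or_hi = odds_ratio(90, 250, 556, 1367)
print(f"RR {rr:.2f} ({rr_lo:.2f} to {rr_hi:.2f}); OR {odds:.2f} ({or_lo:.2f} to {or_hi:.2f})")
```

Both intervals straddle 1 for these illustrative counts, which is the pattern the effect estimate describes: a small point estimate in favour of adopters, with too much uncertainty to rule out no difference.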

The contradiction. 36% of llms.txt adopters simultaneously block AI crawlers in robots.txt — publishing instructions for AI while forbidding it from fetching the site. The United States is highest at 43.4%. The same null, expressed as a number people remember.

The interpretation follows a pre-registered decision rule: the protocol fixed how the result would be read before the result was seen. Because adopters and non-adopters exhibited statistically indistinguishable AI-crawler blocking rates (p = 0.165), the study concludes that llms.txt adoption is not a reliable indicator of AI-crawler openness within the observed sample.

This suggests that, at present, llms.txt functions more as a guidance mechanism than as a signal of AI openness. A business publishing llms.txt is telling AI systems how to read its content; it is not, by that act, telling them they may. The two decisions live in different files and, on this evidence, do not move together.
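Because the two decisions live in different files, pairing them is mostly a matter of reading robots.txt for AI user agents and cross-tabulating against llms.txt presence. The sketch below uses Python's standard `urllib.robotparser`. The crawler list and the rule "blocked means the root path is disallowed" are assumptions for illustration; the study's actual user-agent list and blocking definition are set out in its methodology, not here.

```python
from urllib.robotparser import RobotFileParser

# ASSUMED list of AI retrieval crawlers, for illustration only.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_crawlers(robots_txt: str, agents=AI_CRAWLERS) -> list:
    """Return the agents that may not fetch the site root under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in agents if not parser.can_fetch(ua, "/")]


def classify(robots_txt: str, has_llms_txt: bool) -> str:
    """Place one domain in the 2x2 table of adoption versus blocking."""
    blocks = bool(blocked_crawlers(robots_txt))  # at least one crawler blocked
    if has_llms_txt:
        return "adopter_blocks" if blocks else "adopter_open"
    return "non_adopter_blocks" if blocks else "non_adopter_open"


example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(classify(example, has_llms_txt=True))
```

A domain classified as `adopter_blocks` is the contradiction case: it publishes guidance for AI systems while disallowing at least one of them.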

Contradiction rate by market (adopters who also block AI)

United States: 43.4%
Great Britain: 32.4%
Australia: 31.8%
Singapore: 22.2%

Adoption tracks sector, not openness

Where adoption varies cleanly is by sector — and it tracks technical sophistication, not access posture. Technology firms adopt llms.txt roughly seven times more often than legal or education.

llms.txt adoption by sector (pooled across four markets)

Technology / SaaS: 37.3%
Accounting / finance: 23.7%
Professional services: 20.0%
Retail / e-commerce: 13.9%
Healthcare: 10.6%
Real estate: 10.0%
Hospitality / tourism: 9.4%
Building / trades: 7.8%
Education / training: 5.4%
Legal: 5.0%

llms.txt is a developer-facing convention, and the sectors closest to web engineering adopt it first. This says nothing about whether those sectors are more open to AI — only that they are more aware of the convention.

The access layer, reconfirmed

Because v1.3 re-runs the v1.2 access scan, this study independently reconfirms the AI-crawler-access findings on the same frozen frames — a small reliability result in its own right.

AI retrieval-crawler block rate by market (reproduced)

Australia: 42.8%
United States: 42.2%
Great Britain: 38.8%
Singapore: 33.1%

These figures match the frozen v1.2 benchmark. The only deviation is in Australia, where the result is within one domain of the expected count, an artefact of live infrastructure response. The same instrument on the same frames returns the same access picture.

Supporting market volumes

Each market has a companion volume reporting its descriptive figures — adoption, conformance, contradiction, and access benchmark. These are descriptive breakdowns; the inferential finding lives in this pooled analysis, where the sample (n = 1,617) supports it.

17.0%

United States

Adoption 17.0% · conformance 96.1% · contradiction 43.4%.

12.7%

Australia

Adoption 12.7% · conformance 90.9% · contradiction 31.8%.

12.4%

Great Britain

Adoption 12.4% · conformance 84.0% · contradiction 32.4%.

19.4%

Singapore

Adoption 19.4% · conformance 88.9% · contradiction 22.2%.

How this was measured

Instrument & design

Instrument

PTODA C01 Crawler v1.3, a strict superset of the v1.2 access scanner.

Two layers, one pass

robots.txt access (Layer 1) and llms.txt adoption (Layer 2) measured on each domain in the same scan, then cross-tabulated (Layer 3).

Presence control

A random control path is fetched first; servers that return 200 for every path (soft-404) are excluded so adoption is not inflated.

Presence vs conformance

Reported separately. A present file is conformant only if non-empty, markdown, headed, and not HTML or placeholder.

Denominators

Rates computed only where the relevant layer was observable. The paired analysis uses domains where both layers were observable (n = 1,617).

Determinism

Same domain in the same state returns the same classification. Validation fixtures pass before every scan.

Unequal paired samples. Great Britain and Singapore carry higher infrastructure non-response, so their paired denominators are smaller; pooled figures weight by observable sample.
Point-in-time. Both layers are a single observation per domain; robots.txt and llms.txt can change. This is a snapshot, not a trend.
Presence, not richness. Conformance is a structural check, not a judgement of guidance quality.
Retrieval, not outcomes. The study measures access and adoption, not whether AI systems actually cite the sites — that is the planned citation study.
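The denominator rule from the design notes (rates computed only where a layer was observable, and the paired analysis restricted to domains where both were) can be sketched as below. The record shape, a pair of optional booleans with `None` meaning "not observable", is an assumption made for the example, as are the function names.

```python
from typing import Iterable, List, Optional, Tuple

# (adopts_llms_txt, blocks_ai_crawler); None marks a layer that could not be observed.
Record = Tuple[Optional[bool], Optional[bool]]


def rate(flags: Iterable[Optional[bool]]) -> Optional[float]:
    """Share of True among observed values; unobservable entries leave the denominator."""
    observed = [f for f in flags if f is not None]
    return sum(observed) / len(observed) if observed else None


def paired_block_rates(records: List[Record]):
    """Return (paired n, adopter block rate, non-adopter block rate)."""
    paired = [(a, b) for a, b in records if a is not None and b is not None]
    adopters = [b for a, b in paired if a]
    non_adopters = [b for a, b in paired if not a]
    return len(paired), rate(adopters), rate(non_adopters)


# Toy records, not study data.
toy = [(True, True), (True, False), (False, True), (None, True), (False, None)]
adoption_rate = rate(a for a, _ in toy)
paired_n, adopter_rate, non_adopter_rate = paired_block_rates(toy)
```

Concatenating the records from all markets before calling these functions weights each market by its observable sample, which is consistent with how the pooled figures are described.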