A structured benchmark of 754 publicly identifiable British business websites (commercial operating entities) across 10 industry groups, measuring which AI search crawlers can — and cannot — access them.
Of 536 domains whose crawler policy could be directly observed, 208 — 38.8% — block at least one crawler used by AI search systems to discover and cite content.
We separate observable AI-crawler policy from infrastructure non-response. A block declared in robots.txt is a policy decision; a 403, timeout or unscannable response is an access outcome, not evidence of crawler policy. These are reported separately below.
Among the blocking sites, 88.9% apply broad access restrictions that catch AI crawlers incidentally, rather than rules aimed specifically at AI.
Policy-layer figures are based on 536 domains whose robots.txt was successfully retrieved and parsed. A further 218 domains returned no observable policy (66 access-denied, 152 unscannable) and are reported separately in the Infrastructure layer section. They are not counted as open or blocked.
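The denominators above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts quoted in the text. The variable and function names are illustrative, and the two counterfactual rates are derived here to show why non-observed domains are kept out of the headline figure; they are not figures reported by the study.

```python
# Counts quoted in the text above.
TOTAL_DOMAINS = 754
POLICY_OBSERVED = 536   # robots.txt retrieved and parsed
ACCESS_DENIED = 66
UNSCANNABLE = 152
BLOCKING = 208          # block at least one AI search crawler


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction, guarding against an empty base."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


no_observable_policy = ACCESS_DENIED + UNSCANNABLE
# The two layers must add up to the full sample.
assert POLICY_OBSERVED + no_observable_policy == TOTAL_DOMAINS

# Headline rate: policy-observed domains only.
rate = share(BLOCKING, POLICY_OBSERVED)

# Illustrative counterfactuals (not reported figures):
# what the rate would be if non-observed domains were miscounted.
rate_if_counted_open = share(BLOCKING, TOTAL_DOMAINS)
rate_if_counted_blocked = share(BLOCKING + no_observable_policy, TOTAL_DOMAINS)

print(f"policy-layer block rate:      {rate:.1%}")                     # 38.8%
print(f"if non-response were 'open':  {rate_if_counted_open:.1%}")     # 27.6%
print(f"if non-response were 'block': {rate_if_counted_blocked:.1%}")  # 56.5%
```

The spread between the two counterfactuals shows how strongly the headline figure depends on how the 218 non-observed domains are treated.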
The central finding: most businesses that block AI crawlers are not actively choosing to exclude AI search. They have broad access restrictions, set years ago, that now inadvertently catch AI crawlers. This is a configuration problem, not a strategic decision.
As noted above, a 403, timeout or unscannable response is an access outcome, not evidence of crawler policy. The 218 such domains (66 access-denied, 152 unscannable) are reported here and excluded from every policy-layer figure. Great Britain has the largest number of unscannable domains in the series, and the highest unscannable rate among the large-market volumes.
Why this matters: a domain that denies the crawler at the infrastructure layer has not expressed an AI-crawler policy — it has prevented one from being read. Treating such a response as “open” would overstate access; treating it as “blocked” would overstate restriction. Reporting it separately keeps the policy-layer figures based only on directly observed robots.txt behaviour.
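The separation between policy and access outcome can be sketched as a small classifier. This is an illustration under stated assumptions, not the PTODA C01 Crawler's actual logic, which is not documented in this section. The `Outcome` enum, the `classify_fetch` function, and the rule that any other response falls into the unscannable bucket are all hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Optional


class Outcome(Enum):
    POLICY_OBSERVED = "policy-observed"
    ACCESS_DENIED = "access-denied"
    UNSCANNABLE = "unscannable"


def classify_fetch(status: Optional[int], body: Optional[str]) -> Outcome:
    """Classify one robots.txt fetch attempt.

    status is None when the connection failed or timed out.
    """
    if status is None:
        # No response at all: nothing could be read.
        return Outcome.UNSCANNABLE
    if status == 403:
        # The server refused the crawler; no policy was expressed.
        return Outcome.ACCESS_DENIED
    if status == 200 and body is not None:
        # A readable file, even an empty one, is an observable policy.
        return Outcome.POLICY_OBSERVED
    # ASSUMPTION: every other response is treated as unscannable.
    return Outcome.UNSCANNABLE


def policy_layer(outcomes: Iterable[Outcome]) -> List[Outcome]:
    """Keep only the outcomes that may enter policy-layer figures."""
    return [o for o in outcomes if o is Outcome.POLICY_OBSERVED]


sample = [
    classify_fetch(200, "User-agent: *\nDisallow: /"),
    classify_fetch(403, None),
    classify_fetch(None, None),
]
print([o.value for o in sample])
print(len(policy_layer(sample)))
```

Only the first of the three sample fetches would count toward a block rate; the other two would be reported in the infrastructure layer.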
The Great Britain signature: infrastructure non-response. Great Britain is the mirror image of the United States. Where the US actively denies the crawler at the edge, Great Britain simply does not respond. Its 152 unscannable domains (20.2% of the sample, and the largest such count in the series) returned no readable robots.txt because of connection failure or timeout, and they carry no managed-WAF signature: 148 of the 152 sit behind no identifiable managed CDN. This is passive non-response, not deliberate mitigation. The unscannable domains are spread across mainstream sectors (healthcare, building, accounting) rather than clustered in tech or challenger firms, so this is a feature of British business web infrastructure rather than an artefact of which businesses were sampled. Markets differ in how access is restricted, not only in how much.
Of the 208 sites blocking AI retrieval crawlers, the source of the block was classified into three categories.
The infrastructure-imposed subset is the most commercially significant finding: these site owners may be blocking AI search discovery without ever having made that decision. The indeterminate category — 82 Cloudflare-hosted sites — most likely represents explicit blocks, but the configuration path cannot be confirmed by automated means alone.
Block rates vary across the 10 sectors. Real Estate and Education & Training are highest; Healthcare lowest. Rates are computed on policy-observed domains per sector (readable robots.txt only).
Real Estate at 59.3% and Education & Training at 58.2% are the highest-blocking British sectors. Education is notably high, driven by university and college domains running deliberate, managed robots.txt policies. Technology & SaaS at 48.3% continues the cross-market pattern of AI-aware sectors blocking heavily. Healthcare at 20.9% is the most open, the lowest healthcare rate in the series.
Group A (retrieval/citation crawlers) drives the headline finding. Group B (training crawlers) is reported separately, because blocking training crawlers is often a deliberate and legitimate content-protection decision.
The Googlebot parity finding holds in Great Britain. Googlebot is blocked at 34.5%, right alongside the AI retrieval crawlers (ClaudeBot 37.9%, GPTBot 37.7%). Most British AI-crawler blocks are broad restrictions, not targeted AI decisions. The 14 retrieval crawlers cluster tightly (34.5%–37.9%), indicating that where AI is blocked, it is typically blocked uniformly across operators rather than selectively.
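The parity reasoning can be illustrated with Python's standard `urllib.robotparser`. This is a rough sketch, not the study's classification procedure. The heuristic that a block counts as "broad" when Googlebot is blocked as well, the `classify_block` function, the sample robots.txt strings, and the decision to test only the root path `/` are all assumptions made for illustration.

```python
from typing import List, Sequence
from urllib.robotparser import RobotFileParser

# Crawler names taken from the text; the grouping is illustrative.
AI_RETRIEVAL_AGENTS = ["GPTBot", "ClaudeBot"]
REFERENCE_AGENT = "Googlebot"


def blocked_agents(robots_txt: str, agents: Sequence[str], path: str = "/") -> List[str]:
    """Return the agents that robots_txt disallows from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


def classify_block(robots_txt: str) -> str:
    """ASSUMED heuristic: a block that also catches Googlebot is 'broad'."""
    if not blocked_agents(robots_txt, AI_RETRIEVAL_AGENTS):
        return "open"
    if blocked_agents(robots_txt, [REFERENCE_AGENT]):
        return "broad"
    return "ai-specific"


# Hypothetical robots.txt files.
BROAD = "User-agent: *\nDisallow: /\n"
TARGETED = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
OPEN = "User-agent: *\nDisallow:\n"

for name, text in [("BROAD", BROAD), ("TARGETED", TARGETED), ("OPEN", OPEN)]:
    print(name, classify_block(text))
```

A site with a wildcard disallow blocks Googlebot and the AI crawlers alike, which is the pattern the near-identical block rates suggest is dominant.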
Block rates by content management system among policy-observed domains. WordPress and Drupal have usable bases in the British sample; the others rest on small samples and should be read with caution.
Most sites return no identifiable CMS signature, so platform-level rates are based on the minority that do. WordPress at 25.6% (n=82) sits below the overall British block rate of 38.8%. Drupal at 7.7% (n=26) is markedly lower again, consistent with its heavy public-sector and enterprise skew, where robots.txt tends to be deliberately and conservatively managed. Shopify (n=9), Webflow (n=5), Joomla (n=2) and Squarespace (n=1) have too few domains in this sample to report a meaningful rate.
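One way to see why small bases call for caution is to put an interval around each rate. The sketch below uses the Wilson score interval, a standard method for proportions that the study itself does not report using. The blocking counts are reconstructed from the published rates and sample sizes by rounding, so they are an assumption rather than published data.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def reconstruct_count(rate: float, n: int) -> int:
    """ASSUMPTION: recover the blocking count by rounding rate * n."""
    return round(rate * n)


# Rates and sample sizes as quoted in the text.
platforms = {"WordPress": (0.256, 82), "Drupal": (0.077, 26)}

for name, (rate, n) in platforms.items():
    blocked = reconstruct_count(rate, n)
    low, high = wilson_interval(blocked, n)
    print(f"{name}: {blocked}/{n} blocked, interval {low:.1%} to {high:.1%}")
```

The smaller the base, the wider the interval, which is why the platforms with single-digit samples are listed without a rate.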
Roles. This study is published by the Periodic Table of Digital Authority (PTODA), the methodology owner. It was conducted using the PTODA C01 Crawler v1.2, a deterministic robots.txt reference instrument, under PTODA C01 Crawler Methodology v1.2. AUTHORITY44 provided technical infrastructure and execution support as commercial operator. Douglas Lord is the founder of both PTODA and AUTHORITY44; this relationship is disclosed in full. The sample was constructed from named public directories with no reference to commercial relationships. The methodology is fully documented and reproducible. This study publishes aggregate, anonymised findings only. No named individual site results are published.
Attribution chain: Douglas Lord (researcher, author) · Periodic Table of Digital Authority (publisher & methodology owner) · PTODA C01 Crawler v1.2 (research instrument) · AUTHORITY44™ (commercial operator) · Digital Dominator Pty Ltd ABN 28 616 931 116 (operating entity).
Intellectual property notice: This study, its methodology, findings, data, and all associated content are the original work of Douglas Lord and the property of Digital Dominator Pty Ltd (ABN 28 616 931 116). The Periodic Table of Digital Authority™ is a coined framework and trade mark pending (TM 2644497). AUTHORITY44™ is a trade mark pending (TM 2643932). All rights reserved.
You may cite findings from this study with appropriate attribution identifying the author (Douglas Lord), the publisher (Periodic Table of Digital Authority — periodictableofdigitalauthority.com), and the research instrument (PTODA C01 Crawler v1.2). You may not reproduce this study in full, present these findings as your own research, or use the framework name or trade marks without prior written consent. Use of this research is subject to the Terms of Use.