AI Crawler Access Across Four Markets
Policy, infrastructure and scale. A harmonised benchmark of AI crawler access across Australia, the United States, Great Britain and Singapore, measured with a single frozen instrument so the four volumes can be compared like for like.
Across Australia, the United States, Great Britain and Singapore, 660 of the 1,643 domains with a readable robots.txt policy block at least one of the AI retrieval crawlers that AI search systems use to discover and cite content. The pooled rate is 40.2%; market rates range from 33.3% (Singapore) to 42.4% (Australia).
This analysis draws together four country volumes measured on one frozen instrument. The headline rate is consistent across markets; the more interesting findings are in how access is restricted and in which organisations restrict it most.
Four markets, one instrument
All four volumes use the same 21-user-agent crawler set, the same policy/infrastructure two-layer model, and the same commercial-operator entity definition. Policy-layer rates are computed only on domains whose robots.txt could actually be read.
Sample totals: 2,239 domains approached, 1,643 with a readable robots.txt policy, 660 blocking at least one retrieval crawler. Per market (blocking / readable policy): AU 133/314, US 261/619, GB 208/536, SG 58/174.
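The per-market and pooled rates quoted in this analysis follow directly from these counts. The short sketch below reproduces that arithmetic; the variable and function names are illustrative only and are not taken from the PTODA instrument.

```python
# Per-market counts from the sample totals: (blocking, readable policy).
COUNTS = {
    "AU": (133, 314),
    "US": (261, 619),
    "GB": (208, 536),
    "SG": (58, 174),
}


def block_rate(blocking: int, readable: int) -> float:
    """Policy-layer block rate as a percentage, rounded to one decimal place."""
    return round(100 * blocking / readable, 1)


# Rate per market, computed only on domains with a readable robots.txt.
market_rates = {market: block_rate(b, n) for market, (b, n) in COUNTS.items()}

# Pooled rate: sum the counts first, then divide (not an average of the rates).
pooled_blocking = sum(b for b, _ in COUNTS.values())
pooled_readable = sum(n for _, n in COUNTS.values())
pooled_rate = block_rate(pooled_blocking, pooled_readable)

print(market_rates)
print(pooled_blocking, pooled_readable, pooled_rate)
```

Pooling the counts before dividing weights each market by its sample size, which is why the pooled figure sits closer to the larger US and GB samples than a simple mean of the four rates would.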
AI crawler blocking is common in every market
In all four markets, between a third and a little over two in five businesses with a readable crawler policy block at least one AI retrieval crawler. Australia (42.4%) and the United States (42.2%) sit almost level at the top; Great Britain (38.8%) is a few points lower; Singapore (33.3%) is the most open. No market is close to fully open, and none blocks the majority. The consistency of the band, roughly 33% to 42% across four very different economies, is itself the first finding: AI-crawler restriction is now a normal feature of the business web, not a quirk of any one market.
Markets restrict access in different ways
The headline rate measures how much access is restricted. The infrastructure layer reveals how, and here the markets diverge sharply. The Periodic Table of Digital Authority (PTODA) model separates a policy decision (a block declared in robots.txt) from an access outcome (a 403, timeout or unscannable response), and reports the second layer separately.
The United States restricts access by actively denying the crawler at the edge; Great Britain and Singapore restrict it by not responding. Australia sits between the two, with a modest 39 access-denials and 56 unscannables. This is the analysis's most novel result: two markets with nearly identical headline rates (the US at 42.2%, Australia at 42.4%) arrive there through different mechanisms. Reporting only a single block percentage would have hidden this entirely.
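The policy/infrastructure split can be pictured as a classification step applied to each robots.txt fetch before any policy is read. The following is a minimal sketch under stated assumptions, not the PTODA C01 Crawler's actual logic: the Outcome names, the classify_fetch helper and the handling of status codes other than 200 and 403 are all hypothetical, and the sample attempts are made up.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Outcome(Enum):
    POLICY_OBSERVED = "policy_observed"  # robots.txt was read; policy layer applies
    ACCESS_DENIED = "access_denied"      # active refusal at the edge (HTTP 403)
    UNSCANNABLE = "unscannable"          # timeout or other unreadable response


def classify_fetch(status: Optional[int], timed_out: bool = False) -> Outcome:
    """Assign one robots.txt fetch attempt to a layer (illustrative rule only)."""
    if timed_out or status is None:
        return Outcome.UNSCANNABLE
    if status == 403:
        return Outcome.ACCESS_DENIED
    if status == 200:
        return Outcome.POLICY_OBSERVED
    # Assumption: any other status is lumped in with unscannable responses.
    return Outcome.UNSCANNABLE


# Made-up fetch results: (HTTP status or None, timed out?)
attempts = [(200, False), (403, False), (None, True), (200, False)]
tally = Counter(classify_fetch(status, timed_out) for status, timed_out in attempts)
print({outcome.value: count for outcome, count in tally.items()})
```

Keeping the access outcome separate from the policy decision means a domain that returns a 403 is counted in the infrastructure layer rather than being guessed into the policy-layer block rate.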
Most blocking is not AI-specific
In every market, the large majority of sites that block AI crawlers also block Googlebot, the conventional search crawler used here as a baseline control. When a site blocks Googlebot as well as AI crawlers, the block is a broad restriction that catches AI crawlers incidentally rather than a deliberate decision to exclude AI search.
Between 86% and 90% of blocks in each market are broad restrictions. Only 10% to 14% are deliberate AI-only blocks, where a site keeps Googlebot accessible but excludes AI crawlers. The practical implication is the same everywhere: most businesses that are invisible to AI search did not choose to be. They carry broad access restrictions, often set years ago, that now catch AI crawlers as a side effect.
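The Googlebot control can be illustrated with Python's standard-library robots.txt parser. This is a sketch under assumptions rather than the study's instrument: the two AI user agents are an illustrative subset (the full 21-user-agent set is not listed here), only the site root is tested, and the labels "broad", "ai_only" and "open" are hypothetical names for the categories described above.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset only; assumed, not confirmed, to be in the study's crawler set.
AI_AGENTS = ["OAI-SearchBot", "PerplexityBot"]
CONTROL_AGENT = "Googlebot"


def can_fetch_root(robots_txt: str, agent: str) -> bool:
    """Return True if the given user agent may fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "/")


def classify_block(robots_txt: str) -> str:
    """Label a policy as 'open', 'broad' (Googlebot also blocked) or 'ai_only'."""
    ai_blocked = any(not can_fetch_root(robots_txt, a) for a in AI_AGENTS)
    if not ai_blocked:
        return "open"
    if not can_fetch_root(robots_txt, CONTROL_AGENT):
        return "broad"
    return "ai_only"


# Three made-up policies, one per category.
BROAD = "User-agent: *\nDisallow: /\n"
AI_ONLY = "User-agent: OAI-SearchBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
OPEN = "User-agent: *\nDisallow:\n"

for name, policy in [("broad", BROAD), ("ai_only", AI_ONLY), ("open", OPEN)]:
    print(name, "->", classify_block(policy))
```

Testing only the root path is a simplification; a policy that blocks selected directories would need a fuller set of test URLs to classify properly.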
In Western markets, larger organisations block more often
Three of the four volumes carry an organisational-scale tier, assigned before crawling and held in a separate frozen metadata file. In two of those three — the United States and Great Britain — enterprises block AI retrieval crawlers more often than smaller organisations. The third, Singapore, runs the other way, and is treated separately as Finding 5.
In the United States the relationship is a clean gradient: enterprise (46.8%) above mid-market (40.4%) above challenger (36.8%). In Great Britain enterprises stand clearly apart (44.5%) while mid-market and challenger sit together near 36%. The likely mechanism is governance: larger organisations have dedicated IT, security and legal functions that set conservative, managed robots.txt and edge policies, where smaller firms more often run platform defaults.
Scope of this finding. The enterprise-blocks-more pattern appears in the two Western tiered volumes, the United States and Great Britain. Singapore is also tiered but inverts the pattern (Finding 5). Australia was sampled as an untiered practitioners-only pilot and does not contribute to the tier analysis. The US challenger tier (n=57) rests on a modest base, so the effect is reported as observed rather than as a tested statistical difference.
Singapore inverts the scale relationship
Singapore is the most open market overall, and within it the scale relationship runs backwards.
Where US and British enterprises block most, Singapore's enterprises block least (28.8%), and its challenger firms block most (37.7%) — the reverse of the Western pattern. This is not an artefact of global-firm branch offices: Singapore's enterprise tier is 87% Singapore-headquartered, and those home-grown enterprises drive the low rate. A plausible reading is that Singapore's largest businesses are globally oriented and optimise for visibility in a small market where being found matters, while smaller local firms are more defensive about their content. With modest tier sizes the inversion is reported as an observed pattern rather than a confirmed effect, but it is the clearest sign in the series that organisational scale and national market interact, rather than scale acting alone.
Read the United States study in full
The largest market in the series, and the one with the most distinctive mechanism. The US volume details the active-edge-denial pattern — the managed-WAF signature that sets it apart from the other three markets — across all 10 sectors and three organisational tiers.
AI Crawler Access in the United States 2026. 42.2% of 619 policy-observed US business websites block at least one AI retrieval crawler (808 domains approached). The US records the series' highest rate of active edge denial: 146 domains, 18.1% of the 808 approached, behind managed WAFs.
How this analysis was produced
Data: PTODA C01 Crawler v1.2 · series master frozen 17 June 2026.
Roles. This study is published by the Periodic Table of Digital Authority (PTODA), the methodology owner. It was conducted using the PTODA C01 Crawler v1.2, a deterministic robots.txt reference instrument, under PTODA C01 Crawler Methodology v1.2. AUTHORITY44 provided technical infrastructure and execution support as commercial operator. Douglas Lord is the founder of both PTODA and AUTHORITY44; this relationship is disclosed in full. The sample was constructed from named public directories with no reference to commercial relationships. The methodology is fully documented and reproducible. This study publishes aggregate, anonymised findings only. No named individual site results are published.
Attribution chain: Douglas Lord (researcher, author) · Periodic Table of Digital Authority (publisher & methodology owner) · PTODA C01 Crawler v1.2 (research instrument) · AUTHORITY44™ (commercial operator) · Digital Dominator Pty Ltd ABN 28 616 931 116 (operating entity).
Intellectual property notice: This study, its methodology, findings, data, and all associated content are the original work of Douglas Lord and the property of Digital Dominator Pty Ltd (ABN 28 616 931 116). The Periodic Table of Digital Authority™ is a coined framework and trade mark pending (TM 2644497). AUTHORITY44™ is a trade mark pending (TM 2643932). All rights reserved.
You may cite findings from this study with appropriate attribution identifying the author (Douglas Lord), the publisher (Periodic Table of Digital Authority — periodictableofdigitalauthority.com), and the research instrument (PTODA C01 Crawler v1.2). You may not reproduce this study in full, present these findings as your own research, or use the framework name or trade marks without prior written consent. Use of this research is subject to the Terms of Use.