Study 01 · Global Digital Authority Benchmark Series · Cross-Market Analysis

AI Crawler Access Across Four Markets

Policy, infrastructure and scale. A harmonised benchmark of AI crawler access across Australia, the United States, Great Britain and Singapore, measured with a single frozen instrument so the four volumes can be compared like for like.

Author: Douglas Lord

Published by: Periodic Table of Digital Authority

Research instrument: PTODA C01 Crawler v1.2

Methodology: PTODA C01 Crawler Methodology v1.2

Scan date: 17 June 2026

Sample: 2,239 domains · 4 markets · 39 sector cells

40%

of business websites with an observable crawler policy block at least one AI retrieval crawler, pooled across four markets

Across Australia, the United States, Great Britain and Singapore, 660 of 1,643 domains with a readable robots.txt policy block at least one of the AI retrieval crawlers used by AI search systems to discover and cite content. The pooled rate is 40.2%; market rates range from 33.3% (Singapore) to 42.4% (Australia).

This analysis draws together four country volumes measured on one frozen instrument. The headline rate is broadly consistent across markets; the more interesting findings lie in how access is restricted and in which organisations restrict it most.

At a glance

Four markets, one instrument

All four volumes use the same 21-user-agent crawler set, the same policy/infrastructure two-layer model, and the same commercial-operator entity definition. Policy-layer rates are computed only on domains whose robots.txt could actually be read.

Policy-layer block rate by market (% of policy-observed domains blocking ≥1 retrieval crawler)

Australia: 42.4%
United States: 42.2%
Great Britain: 38.8%
Singapore: 33.3%

Sample totals: 2,239 domains approached, 1,643 with a readable robots.txt policy, 660 blocking at least one retrieval crawler. Per market: AU 133/314, US 261/619, GB 208/536, SG 58/174.
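The pooled rate divides the summed counts, so it is not the same as averaging the four market rates. A minimal Python sketch, using only the counts published above, reproduces each figure and contrasts the pooled rate with an unweighted mean, which would give Singapore's 174 policy-observed domains the same weight as the 619 in the US:

```python
# Counts from the "Sample totals" line: (blocking >=1 retrieval crawler, policy-observed)
COUNTS = {
    "AU": (133, 314),
    "US": (261, 619),
    "GB": (208, 536),
    "SG": (58, 174),
}


def rate(blocked: int, observed: int) -> float:
    """Block rate as a percentage of policy-observed domains."""
    return 100 * blocked / observed


# Per-market policy-layer rates, rounded as reported.
per_market = {m: round(rate(b, n), 1) for m, (b, n) in COUNTS.items()}

# Pooled rate: sum numerators and denominators first, then divide once.
total_blocked = sum(b for b, _ in COUNTS.values())
total_observed = sum(n for _, n in COUNTS.values())
pooled = round(rate(total_blocked, total_observed), 1)

# Unweighted mean of the four market rates, shown only for contrast.
unweighted_mean = sum(rate(b, n) for b, n in COUNTS.values()) / len(COUNTS)

print(per_market)                             # {'AU': 42.4, 'US': 42.2, 'GB': 38.8, 'SG': 33.3}
print(total_blocked, total_observed, pooled)  # 660 1643 40.2
print(round(unweighted_mean, 1))              # 39.2
```

The pooled figure sits above the unweighted mean mainly because Singapore, the lowest-rate market, also has the smallest policy-observed base.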

Finding 1

AI crawler blocking is common in every market

In all four markets, between a third and a little over two in five businesses with a readable crawler policy block at least one AI retrieval crawler. Australia (42.4%) and the United States (42.2%) sit almost level at the top; Great Britain (38.8%) is a few points lower; Singapore (33.3%) is the most open. No market is close to fully open, and in none does a majority block. The consistency of the band, roughly 33% to 42% across four very different economies, is itself the first finding: AI-crawler restriction is now a normal feature of the business web, not a quirk of any one market.

Finding 2 · the novel result

Markets restrict access in different ways

The headline rate measures how much access is restricted. The infrastructure layer reveals how, and here the markets diverge sharply. The PTODA model separates a policy decision (a block declared in robots.txt) from an access outcome (an active denial such as a 403, or an unscannable response such as a timeout), and reports the second layer separately.

18.1%

United States — active edge denial

146 of 808 domains returned HTTP 401, 403 or 429: the edge actively refused the request. The largest identifiable share sits behind managed WAFs (Cloudflare 41, Akamai 26), consistent with deliberate bot mitigation.

20.2%

Great Britain — non-response

152 of 754 domains were unscannable through connection failure or timeout, almost all (148 of 152) with no identifiable managed-CDN signature. Passive non-response, not denial.

30.2%

Singapore — non-response

81 of 268 domains unscannable against just 13 active denials. The same passive pattern as Great Britain, more pronounced.

The United States restricts access by actively denying the crawler at the edge; Great Britain and Singapore restrict it by not responding. Australia sits between the two, with a modest 39 access denials and 56 unscannable domains. This is the analysis's most novel result: two markets with nearly identical policy-layer rates (the US at 42.2%, Australia at 42.4%) restrict access through different mechanisms at the infrastructure layer. Reporting only a single block percentage would have hidden this entirely.

Finding 3

Most blocking is not AI-specific

In every market, the large majority of blocked sites also block Googlebot, the conventional search crawler used here as a baseline control. If a site that blocks AI crawlers also blocks Googlebot, the block is most likely a broad restriction that catches AI crawlers incidentally, rather than a deliberate decision to exclude AI search.
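The Googlebot control can be sketched with Python's standard-library `urllib.robotparser`. This is an illustration, not the PTODA C01 Crawler: the three retrieval user-agent tokens below are an assumed sample rather than the study's 14-agent list, and treating "cannot fetch the site root" as a block is likewise an assumption.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: an illustrative sample of retrieval user-agent tokens,
# not the PTODA C01 list of 14 retrieval agents.
RETRIEVAL_AGENTS = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot")
BASELINE_AGENT = "Googlebot"  # conventional search crawler used as the control


def classify_block(robots_txt: str, path: str = "/") -> str:
    """Classify one readable robots.txt as 'open', 'broad' or 'ai_only'.

    ASSUMPTION: a crawler counts as blocked when it may not fetch `path`
    (the site root by default); the study's exact rule is not reproduced here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    if all(parser.can_fetch(ua, path) for ua in RETRIEVAL_AGENTS):
        return "open"  # no retrieval crawler in the sample is blocked
    if not parser.can_fetch(BASELINE_AGENT, path):
        return "broad"  # Googlebot blocked too: AI crawlers caught incidentally
    return "ai_only"  # Googlebot allowed, at least one retrieval crawler excluded


blanket = "User-agent: *\nDisallow: /\n"
targeted = "User-agent: PerplexityBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
training_only = "User-agent: GPTBot\nDisallow: /\n"

print(classify_block(blanket))        # broad
print(classify_block(targeted))       # ai_only
print(classify_block(training_only))  # open: GPTBot is not in the retrieval sample
```

The last case shows why the crawler list matters: a robots.txt that names only a training crawler leaves every agent in this retrieval sample allowed.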

Share of blocked sites that also block Googlebot (broad restriction, not AI-specific)

Australia: 90.2%
United States: 89.7%
Great Britain: 88.9%
Singapore: 86.2%

Roughly 86% to 90% of blocks in each market are broad restrictions. Only about 10% to 14% are deliberate AI-only blocks, where a site keeps Googlebot accessible but excludes AI crawlers. The practical implication is the same everywhere: most businesses that are invisible to AI search did not specifically choose to exclude it. They carry broad access restrictions, often set years ago, that now catch AI crawlers as a side effect.

Finding 4 · scale

In Western markets, larger organisations block more often

Three of the four volumes assign each domain an organisational-scale tier, set before crawling and held in a separate frozen metadata file. In two of those three, the United States and Great Britain, enterprises block AI retrieval crawlers more often than smaller organisations. The third, Singapore, runs the other way and is treated separately as Finding 5.

Block rate by organisational tier (policy-observed domains)

US · Enterprise: 46.8%
US · Mid-market: 40.4%
US · Challenger: 36.8%

GB · Enterprise: 44.5%
GB · Mid-market: 35.6%
GB · Challenger: 36.2%

In the United States the relationship is a clean gradient: enterprise (46.8%) above mid-market (40.4%) above challenger (36.8%). In Great Britain enterprises stand clearly apart (44.5%), while mid-market and challenger firms sit together near 36%. The likely mechanism is governance: larger organisations have dedicated IT, security and legal functions that set conservative, managed robots.txt and edge policies, whereas smaller firms more often run platform defaults.

Scope of this finding. The enterprise-blocks-more pattern appears in the two Western tiered volumes, the United States and Great Britain. Singapore is also tiered but inverts the pattern (Finding 5). Australia was sampled as an untiered practitioners-only pilot and does not contribute to the tier analysis. The US challenger tier (n=57) rests on a modest base, so the effect is reported as observed rather than as a tested statistical difference.

Finding 5 · the outlier

Singapore inverts the scale relationship

Singapore is the most open market overall, and within it the scale relationship runs backwards.

Singapore block rate by tier (policy-observed domains)

SG · Enterprise: 28.8% (n=52)
SG · Mid-market: 33.3% (n=69)
SG · Challenger: 37.7% (n=53)

Where US and British enterprises block most, Singapore's enterprises block least (28.8%) and its challenger firms block most (37.7%), the reverse of the Western pattern. This is not an artefact of global-firm branch offices: Singapore's enterprise tier is 87% Singapore-headquartered, and those home-grown enterprises drive the low rate. A plausible reading is that Singapore's largest businesses are globally oriented and optimise for visibility in a small market where being found matters, while smaller local firms are more defensive about their content. With modest tier sizes the inversion is reported as an observed pattern rather than a confirmed effect, but it is the clearest sign in the series that organisational scale and national market interact, rather than scale acting alone.
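A standard pooled two-proportion z-test shows why the inversion is reported as an observed pattern. This test is not part of the published methodology, and the counts used (15 of 52 enterprises, 20 of 53 challengers) are back-calculated from the rounded rates and tier sizes above, not read from the dataset.

```python
from math import erfc, sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# ASSUMPTION: implied counts, 28.8% of 52 -> 15 and 37.7% of 53 -> 20.
z, p = two_proportion_z(15, 52, 20, 53)
print(f"z = {z:.2f}, p = {p:.2f}")
```

On these implied counts the enterprise-to-challenger gap gives a z of roughly 1 and a two-sided p-value well above 0.05, so a gap of this size could plausibly arise by chance at these tier sizes.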

Featured country volume

Read the United States study in full

The largest market in the series, and the one with the most distinctive mechanism. The US volume details the active-edge-denial pattern (the managed-WAF signature that sets it apart from the other three markets) across all 10 sectors and three organisational tiers.

AI Crawler Access in the United States 2026. 42.2% of 619 policy-observed US business websites block at least one AI retrieval crawler (808 domains approached). The US records the series' highest rate of active edge denial: 146 domains (18.1%), with the largest identifiable share behind managed WAFs. Read the United States volume →

Or read the other country volumes: Australia · Great Britain · Singapore.

Methodology

How this analysis was produced

Instrument

PTODA C01 Crawler v1.2 — a deterministic robots.txt scanner testing 21 AI user-agents (14 retrieval, 7 training). Same input produces the same result.

Harmonisation

All four volumes use one frozen crawler list, one commercial-operator entity rule (portals, aggregators, government, industry bodies and not-for-profits excluded), and one policy/infrastructure two-layer model, so the markets are comparable like for like.

Two layers

The policy layer (open / partial / blocked, from a readable robots.txt) is reported separately from the infrastructure layer (access-denied: 401/403/429; unscannable: connection failures, timeouts and 5xx). The infrastructure layer is excluded from every policy rate.
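The routing of each scan outcome into one of the two layers can be sketched as a small function. The status-code groups come from the definition above; the function names, the assumption that the status refers to the robots.txt request, and the "unclassified" bucket for cases the source does not cover (such as a 404) are hypothetical.

```python
from typing import Iterable, Optional, Tuple

ACCESS_DENIED = frozenset({401, 403, 429})


def layer_for(status: Optional[int], robots_readable: bool = False) -> str:
    """Route one scan outcome to a layer of the two-layer model (sketch).

    status: HTTP status of the (assumed) robots.txt request, or None for a
    connection failure or timeout. robots_readable: whether a 2xx body parsed.
    """
    if status is None:
        return "unscannable"  # connection failure or timeout
    if status in ACCESS_DENIED:
        return "access_denied"  # the edge actively refused the request
    if 500 <= status <= 599:
        return "unscannable"  # server error: no usable response
    if 200 <= status <= 299 and robots_readable:
        return "policy"  # eligible for policy-layer rates
    return "unclassified"  # ASSUMPTION: handling of e.g. 404 is not specified


def policy_rate(scans: Iterable[Tuple[str, bool]]) -> float:
    """Block rate (%) over policy-layer rows only; infrastructure rows are excluded."""
    flags = [blocked for layer, blocked in scans if layer == "policy"]
    if not flags:
        raise ValueError("no policy-observed domains")
    return 100 * sum(flags) / len(flags)


# Hypothetical toy scans: (layer, blocks >=1 retrieval crawler)
scans = [
    (layer_for(200, robots_readable=True), True),
    (layer_for(200, robots_readable=True), False),
    (layer_for(403), False),
    (layer_for(None), False),
]
print(policy_rate(scans))  # 50.0
```

Because the denied and unscannable rows never reach the denominator, a market with many edge denials is not made to look more open or more closed at the policy layer.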

Tiering

Organisational scale (enterprise / mid-market / challenger) was assigned before crawling and held in a frozen metadata file, joined to observations on domain. Tiers were never assigned after seeing results. US, GB and SG are tiered; AU is not.
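The pre-assigned tier join amounts to a read-only lookup keyed on domain. In this sketch the domains and values are hypothetical toy data; `MappingProxyType` makes the metadata read-only so the join cannot alter a tier after observations arrive, and domains missing from the metadata (as with the untiered Australian pilot) fall into an "untiered" bucket.

```python
from collections import defaultdict
from types import MappingProxyType

# Hypothetical frozen metadata (domain -> tier), fixed before crawling and read-only here.
TIERS = MappingProxyType({
    "alpha.example": "enterprise",
    "beta.example": "enterprise",
    "gamma.example": "mid-market",
    "delta.example": "challenger",
})

# Hypothetical observations: True/False = blocks >=1 retrieval crawler,
# None = not policy-observed (an infrastructure-layer outcome).
OBSERVATIONS = {
    "alpha.example": True,
    "beta.example": False,
    "gamma.example": None,
    "delta.example": True,
    "epsilon.example": False,  # absent from the metadata: untiered
}


def rates_by_tier(tiers, observations):
    """Join observations to tiers on domain and return block rate (%) per tier."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [blocked, policy_observed]
    for domain, blocked in observations.items():
        if blocked is None:
            continue  # infrastructure outcomes never enter a policy rate
        tier = tiers.get(domain, "untiered")
        counts[tier][0] += int(blocked)
        counts[tier][1] += 1
    return {tier: 100 * b / n for tier, (b, n) in counts.items()}


print(rates_by_tier(TIERS, OBSERVATIONS))
# {'enterprise': 50.0, 'challenger': 100.0, 'untiered': 0.0}
```

A tier with no policy-observed domains (here, mid-market) simply receives no rate, which avoids dividing by zero.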

Dataset

AI Crawler Access Study Series v1.2, frozen 17 June 2026. The authoritative source for every figure on this page.

Country studies

Full per-market detail: Australia · United States · Great Britain · Singapore.

Data: PTODA C01 Crawler v1.2 · series master frozen 17 June 2026.

Disclosure & Intellectual Property

Roles. This study is published by the Periodic Table of Digital Authority (PTODA), the methodology owner. It was conducted using the PTODA C01 Crawler v1.2, a deterministic robots.txt reference instrument, under PTODA C01 Crawler Methodology v1.2. AUTHORITY44 provided technical infrastructure and execution support as commercial operator. Douglas Lord is the founder of both PTODA and AUTHORITY44; this relationship is disclosed in full. The sample was constructed from named public directories with no reference to commercial relationships. The methodology is fully documented and reproducible. This study publishes aggregate, anonymised findings only. No named individual site results are published.

Attribution chain: Douglas Lord (researcher, author) · Periodic Table of Digital Authority (publisher & methodology owner) · PTODA C01 Crawler v1.2 (research instrument) · AUTHORITY44™ (commercial operator) · Digital Dominator Pty Ltd ABN 28 616 931 116 (operating entity).

Intellectual property notice: This study, its methodology, findings, data, and all associated content are the original work of Douglas Lord and the property of Digital Dominator Pty Ltd (ABN 28 616 931 116). The Periodic Table of Digital Authority™ is a coined framework and trade mark pending (TM 2644497). AUTHORITY44™ is a trade mark pending (TM 2643932). All rights reserved.

You may cite findings from this study with appropriate attribution identifying the author (Douglas Lord), the publisher (Periodic Table of Digital Authority — periodictableofdigitalauthority.com), and the research instrument (PTODA C01 Crawler v1.2). You may not reproduce this study in full, present these findings as your own research, or use the framework name or trade marks without prior written consent. Use of this research is subject to the Terms of Use.