Study 01 · Global Digital Authority Benchmark Series · Australia 2026

Nearly 2 in 5 Australian business websites are blocking the AI crawlers they need

A structured benchmark of 500 publicly identifiable Australian business websites across 10 industry groups, measuring which AI search crawlers can — and cannot — access them.

Author: Douglas Lord
Instrument: AUTHORITY44™
Scan date: 6 June 2026, 20:00 AEST
Sample: 500 domains · 10 sectors · 440 scannable
38%
of Australian business websites block at least one AI retrieval crawler

Across 440 scannable sites, 169 — 38.4% — block at least one crawler used by AI search systems to discover and cite content.

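Of those 169 sites, 91.7% (155) also block Googlebot, which points to a broad access restriction that catches AI crawlers incidentally rather than an AI-specific decision. Most of these businesses likely do not know this is happening.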

Key findings

The numbers at a glance

All figures are based on 440 scannable domains. The other 60 were excluded as unscannable: 23 major enterprise sites with confirmed bot protection and 37 that were inaccessible.

61.6%
Fully open to all AI crawlers
271 of 440 sites — no restrictions preventing AI search discovery
33.2%
Fully blocked to all AI crawlers
146 sites block all tested retrieval crawlers
5.2%
Partially blocked
23 sites block some crawlers but not all
91.7%
Broad blocks — not targeted at AI
155 of 169 blocked sites also block Googlebot — the block is a broad restriction, not an AI-specific decision
8.3%
Deliberate AI-only blocks
14 sites specifically blocked AI crawlers while keeping Googlebot accessible
50
False positives prevented
Sites with WordPress /wp-admin/ or Crawl-delay directives correctly classified as open — not AI blocks

The central finding: Most blocked businesses are not actively choosing to exclude AI search. They have broad access restrictions, likely set years ago, that now inadvertently catch AI crawlers. For most of these sites this is a configuration problem, not a strategic decision.
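The categories above amount to a small decision rule. The sketch below is one possible reading of that rule, not the AUTHORITY44 implementation: the `blocked` mapping (crawler name to whether the site blocks it) is a hypothetical input shape, the crawler names are the retrieval crawlers listed later in this study, and a block is treated as broad whenever Googlebot is blocked too.

```python
# Retrieval crawlers as named in this study's per-crawler table.
RETRIEVAL_CRAWLERS = (
    "GPTBot", "ClaudeBot", "anthropic-ai",
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
)


def classify_site(blocked: dict[str, bool]) -> tuple[str, str]:
    """Return (access, scope) for one site.

    `blocked` maps a crawler name to True when the site blocks it.
    This input shape is hypothetical; the study does not publish its data model.
    """
    ai_flags = [blocked.get(name, False) for name in RETRIEVAL_CRAWLERS]

    if not any(ai_flags):
        return "open", "none"

    access = "fully blocked" if all(ai_flags) else "partially blocked"
    # Assumption: a site that also blocks Googlebot has a broad restriction,
    # while one that leaves Googlebot open has an AI-only block.
    scope = "broad" if blocked.get("Googlebot", False) else "ai-only"
    return access, scope


# Example inputs: a blanket block, and a block aimed at a single AI crawler.
blanket = {name: True for name in RETRIEVAL_CRAWLERS} | {"Googlebot": True}
targeted = {"GPTBot": True, "Googlebot": False}

print(classify_site(blanket))
print(classify_site(targeted))
```

Keeping access level and scope as separate outputs mirrors how the study reports them: one split across all 440 scannable sites, the other across the 169 blocking sites only.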

Block origin

Intentional vs infrastructure-imposed

Of the 169 sites blocking AI retrieval crawlers, the source of the block was classified into three categories.

60.9%
Explicit — author-set
103 sites. Block is in the site's own robots.txt. May be intentional or legacy configuration.
33.1%
Indeterminate
56 Cloudflare-hosted sites without a managed-robots signature. Likely explicit blocks — cannot be confirmed by automated analysis alone.
5.9%
Infrastructure-imposed
10 sites. Block originates from Cloudflare's managed robots.txt feature — a platform default the owner may never have consciously set.

The infrastructure-imposed subset is the most commercially significant finding: these site owners may be blocking AI search discovery without ever having made that decision. The indeterminate category — 56 Cloudflare-hosted sites — most likely represents explicit blocks, but the configuration path cannot be confirmed by automated means alone.
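The three origin categories can be read as a short decision tree. The sketch below assumes that reading: a managed-robots signature means infrastructure-imposed, a Cloudflare-hosted site without one is indeterminate, and everything else is explicit. The two boolean inputs are hypothetical; the study does not describe how it detects either signal.

```python
def classify_block_origin(on_cloudflare: bool, has_managed_signature: bool) -> str:
    """Classify where a block comes from, following the study's three categories.

    Both flags are hypothetical inputs that a scanner would need to derive
    from the site's robots.txt and hosting signals.
    """
    if has_managed_signature:
        return "infrastructure-imposed"
    if on_cloudflare:
        # The block may well be author-set, but the path cannot be confirmed.
        return "indeterminate"
    return "explicit"


# Counts as reported in this study.
origin_counts = {"explicit": 103, "indeterminate": 56, "infrastructure-imposed": 10}
total_blocking = sum(origin_counts.values())
shares = {name: round(100 * n / total_blocking, 1) for name, n in origin_counts.items()}

print(total_blocking, shares)
```

The counts sum to the 169 blocking sites, so the three shares are computed on the same denominator as the headline block figure.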

Sector analysis

Block rates by industry

Block rates vary widely across the 10 sectors, from 21.4% to 52.6%. Accounting & Finance and Technology & SaaS are highest; Professional Services and Legal are lowest.

% blocking ≥1 retrieval crawler (of scannable domains per sector)
Accounting & Finance
52.6%
Technology & SaaS
45.8%
Real Estate
41.7%
Retail & Ecommerce
40.5%
Healthcare
39.6%
Education & Training
38.6%
Hospitality & Tourism
38.5%
Building & Trades
33.3%
Legal
32.6%
Professional Services
21.4%

Accounting & Finance at 52.6% is the highest-blocking sector — notable given AI search is reshaping how Australians find financial advice and compare products. Technology & SaaS at 45.8% is the study's most counterintuitive finding: the sector most aware of AI is among the most likely to be invisible to it.

Per-crawler analysis

Which crawlers are blocked most

Group A (retrieval/citation crawlers) drives the headline finding. Group B (training crawlers) is reported separately — blocking training crawlers is often a deliberate and legitimate content-protection decision.

Group A — Retrieval & Citation Crawlers
GPTBot · OpenAI · 37.5%
ClaudeBot · Anthropic · 37.3%
anthropic-ai · Anthropic · 35.9%
Googlebot · baseline · 35.2%
OAI-SearchBot · OpenAI · 35.0%
ChatGPT-User · OpenAI · 35.0%
PerplexityBot · Perplexity · 34.8%
Group B — Training Crawlers (separate)
CCBot · Common Crawl · 38.2%
Bytespider · ByteDance · 38.0%
Applebot-Extended · Apple · 37.7%
Amazonbot · Amazon · 37.7%
Google-Extended · Gemini training · 37.5%
meta-externalagent · Meta · 37.3%
FacebookBot · Meta · 36.1%

The Googlebot parity finding is the most important number in the dataset. Googlebot is blocked at 35.2% — nearly identical to the AI retrieval crawlers. This confirms that the vast majority of AI crawler blocks are broad restrictions, not targeted AI decisions. Businesses blocking AI crawlers are mostly also blocking Google.
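The parity claim can be checked against the counts reported in the key findings. The short calculation below is a consistency check only. It assumes the 35.2% Googlebot rate corresponds to the 155 sites that block both Googlebot and at least one AI retrieval crawler, which the study implies but does not state.

```python
SCANNABLE = 440            # scannable domains
BLOCKING_ANY = 169         # sites blocking at least one retrieval crawler
ALSO_BLOCK_GOOGLEBOT = 155 # of those, sites that also block Googlebot


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as used in the report."""
    return round(100 * part / whole, 1)


googlebot_rate = pct(ALSO_BLOCK_GOOGLEBOT, SCANNABLE)   # share of all scannable sites
broad_share = pct(ALSO_BLOCK_GOOGLEBOT, BLOCKING_ANY)   # share of blocking sites
ai_only_sites = BLOCKING_ANY - ALSO_BLOCK_GOOGLEBOT
ai_only_share = pct(ai_only_sites, BLOCKING_ANY)

print(googlebot_rate, broad_share, ai_only_sites, ai_only_share)
```

Under that assumption the per-crawler Googlebot figure and the broad-block share are the same 155 sites measured against two different denominators.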

Platform analysis

CMS correlation

Block rates differ markedly by content management system, which likely reflects both platform defaults and operator technical sophistication.

Block rate by detected CMS
Joomla
66.7% (n=3)
WordPress
38.6%
Shopify
36.4%
Drupal
6.2%
Webflow
0%

Drupal at 6.2% is the standout. Drupal is predominantly used by enterprise and government organisations, which tend to have active IT governance and make deliberate robots.txt decisions. Webflow at 0% may reflect a newer generation of sites run by operators who are more AI-search aware. WordPress at 38.6% tracks the overall sample average of 38.4%, as expected for the broadest cross-section of Australian business websites. Joomla's 66.7% rests on only three sites and should not be read as a platform trend.
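Sample size matters when reading these rates. As an illustration, the sketch below computes a Wilson score interval, a standard confidence interval for a proportion; the study itself does not report intervals, so this is an added check rather than part of its method. It uses two figures that can be derived from the report: 169 of 440 sites overall, and 2 of 3 Joomla sites (66.7% with n=3).

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


overall = wilson_interval(169, 440)  # headline block rate
joomla = wilson_interval(2, 3)       # 66.7% of n=3 implies 2 of 3 sites

print(f"overall: {overall[0]:.1%} to {overall[1]:.1%}")
print(f"joomla:  {joomla[0]:.1%} to {joomla[1]:.1%}")
```

The overall interval is a few points either side of 38.4%, while the Joomla interval spans most of the possible range, which is why a three-site rate carries little weight.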

Methodology

How this study was conducted

Study specification

Sample
500 publicly identifiable Australian business websites across 10 industry groups (50 per sector), sourced from named public directories. No client sites. No sites selected by outcome.
Sectors
Retail/Ecommerce, Real Estate, Legal, Healthcare, Building/Trades, Accounting/Finance, Hospitality/Tourism, Education/Training, Technology/SaaS, Professional Services
Measurement
Public robots.txt parsed per user-agent. Homepage meta robots and X-Robots-Tag headers examined. CMS and CDN/host detected from homepage signals.
Bot identity
A44-Research-Bot/1.0 — identified honestly in every request. Full info at authority44.ai/bot. robots.txt respected; polite rate limits applied.
Scan date
6 June 2026, 20:00 AEST. Point-in-time snapshot.
False positive prevention
WordPress /wp-admin/ disallows, Crawl-delay directives, and sitemap declarations are explicitly excluded from the blocked classification. Validated against 14 fixture tests before the batch ran.
URL structure
Root-level domains only. Businesses whose primary AU presence is a sub-path of an international domain were replaced with root-domain equivalents. Some businesses that would otherwise qualify are not included.
Unscannable
60 domains (12%) excluded. 23 confirmed major brands protected by bot-detection. 37 inaccessible. Denominator for all findings: 440.
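The robots.txt measurement and the false-positive rules in the specification above can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch, not the AUTHORITY44 parser: it assumes a crawler counts as blocked only when it may not fetch the site root `/`, and the three robots.txt samples are invented fixtures. It does not cover the meta robots or X-Robots-Tag checks.

```python
from urllib.robotparser import RobotFileParser


def blocked_at_root(robots_txt: str, user_agent: str) -> bool:
    """True when robots.txt forbids `user_agent` from fetching "/".

    Assumption: "blocked" means the site root is disallowed. Path-specific
    disallows, Crawl-delay and Sitemap lines therefore never count as blocks.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Invented fixtures for illustration.
WORDPRESS_DEFAULT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

BROAD_BLOCK = """\
User-agent: *
Disallow: /
"""

AI_ONLY_BLOCK = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

for label, text in [("wordpress", WORDPRESS_DEFAULT),
                    ("broad", BROAD_BLOCK),
                    ("ai-only", AI_ONLY_BLOCK)]:
    print(label,
          {agent: blocked_at_root(text, agent) for agent in ("GPTBot", "Googlebot")})
```

Testing only the root path is what keeps a typical WordPress file classified as open; a scanner that flagged any Disallow line would misreport it as a block.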
Limitations

Caveats

Disclosure & Intellectual Property

Douglas Lord created the Periodic Table of Digital Authority methodology and founded AUTHORITY44, the platform used to conduct this study. The sample was constructed from named public directories with no reference to commercial relationships. The methodology is fully documented and reproducible. This study publishes aggregate, anonymised findings only. No named individual site results are published.

Attribution chain: Douglas Lord (researcher, author) · Periodic Table of Digital Authority (methodology) · AUTHORITY44™ (instrument) · Digital Dominator Pty Ltd ABN 28 616 931 116 (operating entity).

Intellectual property notice: This study, its methodology, findings, data, and all associated content are the original work of Douglas Lord and the property of Digital Dominator Pty Ltd (ABN 28 616 931 116). The Periodic Table of Digital Authority™ is a coined framework and trade mark pending (TM 2644497). AUTHORITY44™ is a trade mark pending (TM 2643932). All rights reserved.

You may cite findings from this study with appropriate attribution identifying the author (Douglas Lord), the publisher (Periodic Table of Digital Authority — periodictableofdigitalauthority.com), and the measurement instrument (AUTHORITY44). You may not reproduce this study in full, present these findings as your own research, or use the framework name or trade marks without prior written consent. Use of this research is subject to the Terms of Use.