Verifying Googlebot with Reverse DNS Lookup
Any client can set its User-Agent header to Googlebot, so the user-agent string alone proves nothing. To trust that a request in your access log really came from Google's crawler — before you grant it crawl privileges, exclude it from rate limits, or attribute crawl budget to it — you must run a forward-confirmed reverse DNS (FCrDNS) check on the source IP. This is the exact verification method Google documents, and it is the only check that does not rely on a list of IP ranges that drift over time.
This guide takes a claimed-Googlebot IP straight from your logs, confirms its PTR hostname ends in googlebot.com, google.com, or googleusercontent.com, then forward-resolves that hostname back to the same IP. You will build a single-IP verifier, then a batch script that extracts every unique self-declared Googlebot IP from an access log and verifies them with a local cache. Once you can confirm the real crawler, the inverse problem of detecting fake Googlebot traffic becomes a one-line filter.
The Symptom: Unverifiable Crawler Claims
You are auditing crawl activity as part of identifying search engine bots in server logs and notice that the volume of "Googlebot" requests is far higher than Search Console's crawl stats report. The first job is to confirm which of those requests are genuine.
Pull the IPs claiming to be Googlebot. Using the standard combined-log fields documented in awk and grep commands for log filtering, isolate the source IP, status code, and request path for matching rows:
grep -i "googlebot" access.log | awk '{print $1, $9, $7}' | sort | uniq -c | sort -nr | head
Expected Output:
14213 66.249.66.1 200 /products/widgets/
9881 66.249.66.135 200 /blog/
4402 66.249.66.74 200 /category/tools/
870 45.143.200.18 200 /wp-login.php
231 185.220.101.7 304 /sitemap.xml
The top three are in Google's 66.249.66.0/24 block and look plausible; the bottom two (45.143.200.18, 185.220.101.7) are not, yet both sent the Googlebot user-agent. The user-agent string cannot tell them apart. Only DNS can.
Concept: Why Forward-Confirmed Reverse DNS Works
A reverse DNS (PTR) lookup asks "what hostname owns this IP?" Google controls the PTR records for its crawler IPs, so a genuine Googlebot IP reverse-resolves to a hostname under crawl-*.googlebot.com (or *.google.com / *.googleusercontent.com for some Google products). An attacker who merely spoofs the user-agent rarely touches DNS, so their IP reverse-resolves to nothing or to their hosting provider's hostname.
But a PTR record alone is not enough: whoever controls an IP's reverse zone can publish any hostname there, including one that claims to sit under googlebot.com, and a lookup could be poisoned. So you add the forward-confirmed half: take the hostname the PTR returned and resolve it forward (A for IPv4, AAAA for IPv6). Only Google controls the forward zone for googlebot.com, so if the forward lookup returns the original IP, the chain is closed and the IP is trusted. Spoofers who only fake the user-agent fail the PTR step; forged, misconfigured, or poisoned PTR records fail the forward step. Passing both is FCrDNS.
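The two-step logic can be sketched as a pure decision function that takes values you have already resolved, which makes it easy to exercise without touching DNS. The name fcrdns_verdict and its argument order are illustrative assumptions, not part of dig or any Google tooling.

```shell
#!/usr/bin/env bash
# Sketch: FCrDNS decision logic, kept separate from the DNS lookups.
# fcrdns_verdict is a hypothetical helper. Arguments:
#   $1 = source IP from the log
#   $2 = PTR hostname returned for that IP (may be empty)
#   $3 = newline-separated addresses the hostname forward-resolves to
fcrdns_verdict() {
  local ip=$1 ptr=${2%.} fwd=${3-}    # ${2%.} drops a trailing root dot
  [ -z "$ptr" ] && { echo "FAIL no-ptr"; return 1; }
  case "$ptr" in
    *.googlebot.com|*.google.com|*.googleusercontent.com) ;;
    *) echo "FAIL bad-suffix"; return 1 ;;
  esac
  # Forward confirmation: the original IP must appear as a whole line.
  if printf '%s\n' "$fwd" | grep -qxF -- "$ip"; then
    echo "VERIFIED"; return 0
  fi
  echo "FAIL no-forward-match"; return 1
}

# Genuine chain: PTR is under googlebot.com and resolves back to the same IP.
fcrdns_verdict 66.249.66.1 crawl-66-249-66-1.googlebot.com. 66.249.66.1
# Forged PTR: the name looks right, but the forward record points elsewhere.
fcrdns_verdict 45.143.200.18 crawl-66-249-66-1.googlebot.com. 66.249.66.1 || true
```

The first call prints VERIFIED and the second prints FAIL no-forward-match, which is the case a PTR-only check would miss.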
Step-by-Step: Build the Verifier
Step 1: Reverse-resolve a single IP.
Take the busiest IP from the log output and ask for its PTR record with dig -x.
dig +short -x 66.249.66.1
Expected Output:
crawl-66-249-66-1.googlebot.com.
Now run the same command against the suspicious IP from the log:
dig +short -x 45.143.200.18
Expected Output:
(empty — no PTR, or an unrelated hosting hostname like 18.200.143.45.example-vps.net.)
The genuine IP returns a googlebot.com hostname; the spoofer returns nothing or an unrelated host. An empty or non-Google PTR is an immediate fail.
Step 2: Confirm the PTR hostname's suffix.
Never match googlebot as a substring anywhere in the hostname — an attacker can register googlebot.com.evil.example. Anchor the check to the end of the hostname. Strip the trailing dot and test the suffix.
host=$(dig +short -x 66.249.66.1 | sed 's/\.$//')
case "$host" in
*.googlebot.com|*.google.com|*.googleusercontent.com) echo "suffix OK: $host" ;;
*) echo "suffix FAIL: $host" ;;
esac
Expected Output:
suffix OK: crawl-66-249-66-1.googlebot.com
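To see why the anchoring matters, the same case pattern can be wrapped in a small function and run against lookalike hostnames. This is a minimal sketch: is_google_host is a hypothetical helper name, and the lowercasing step is an added assumption based on DNS names being case-insensitive.

```shell
#!/usr/bin/env bash
# Sketch: anchored suffix test exercised against lookalike hostnames.
# is_google_host is a hypothetical helper name.
is_google_host() {
  local host
  # Lowercase the name and drop any trailing root dot before matching.
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  host=${host%.}
  case "$host" in
    *.googlebot.com|*.google.com|*.googleusercontent.com) return 0 ;;
    *) return 1 ;;
  esac
}

for h in crawl-66-249-66-1.googlebot.com. \
         googlebot.com.evil.example \
         fakegooglebot.com \
         18.200.143.45.example-vps.net; do
  if is_google_host "$h"; then echo "OK   $h"; else echo "FAIL $h"; fi
done
```

Only the first hostname passes. The second fails because the pattern is anchored to the end of the name, and the third fails because the pattern requires a dot immediately before googlebot.com.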
Step 3: Forward-resolve the hostname back to the IP.
Resolve the PTR hostname's A record and check it equals the IP you started with.
dig +short crawl-66-249-66-1.googlebot.com
Expected Output:
66.249.66.1
The forward lookup returns the original IP. The chain is closed: this IP is the real Googlebot.
Step 4: Wrap the three checks into a verify function.
Combine the PTR lookup, suffix match, and forward confirmation into one reusable script. It exits 0 (verified), 1 (failed), or 2 (DNS error/timeout) so it composes cleanly in pipelines.
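#!/usr/bin/env bash
# verify-bot.sh <ip>: forward-confirmed reverse DNS for Googlebot
# Exit status: 0 verified, 1 failed, 2 DNS error/timeout or bad usage
set -u
[ $# -eq 1 ] || { echo "usage: verify-bot.sh <ip>" >&2; exit 2; }
ip="$1"
# Step 1: reverse lookup. dig exits non-zero when no server answers.
raw=$(dig +short +time=3 +tries=2 -x "$ip" 2>/dev/null) || { echo "$ip ERROR dns-ptr"; exit 2; }
ptr=$(printf '%s\n' "$raw" | head -1 | sed 's/\.$//')
[ -z "$ptr" ] && { echo "$ip FAIL no-ptr"; exit 1; }
# Step 2: anchored suffix match on the PTR hostname.
case "$ptr" in
  *.googlebot.com|*.google.com|*.googleusercontent.com) : ;;
  *) echo "$ip FAIL bad-suffix $ptr"; exit 1 ;;
esac
# Step 3: forward lookup; request AAAA when the source IP is IPv6.
case "$ip" in *:*) rr=AAAA ;; *) rr=A ;; esac
fwd=$(dig +short +time=3 +tries=2 "$ptr" "$rr" 2>/dev/null) || { echo "$ip ERROR dns-forward $ptr"; exit 2; }
if printf '%s\n' "$fwd" | grep -qxF -- "$ip"; then
  echo "$ip VERIFIED $ptr"; exit 0
else
  echo "$ip FAIL no-forward-match $ptr -> ${fwd:-none}"; exit 1
fi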
Run it against both IPs:
chmod +x verify-bot.sh
./verify-bot.sh 66.249.66.1
./verify-bot.sh 45.143.200.18
Expected Output:
66.249.66.1 VERIFIED crawl-66-249-66-1.googlebot.com
45.143.200.18 FAIL no-ptr
Step 5: Batch-verify every unique bot IP from the log, with caching.
Reverse DNS over thousands of duplicate IPs is slow and hammers your resolver. Extract the unique claimed-Googlebot IPs first, then verify each once and memoize the result to a cache file so re-runs are instant.
#!/usr/bin/env bash
# batch-verify.sh <access.log> — verify all self-declared Googlebot IPs
set -u
log="$1"
cache="${HOME}/.cache/bot-fcrdns.tsv"
mkdir -p "$(dirname "$cache")"; touch "$cache"
grep -i "googlebot" "$log" | awk '{print $1}' | sort -u | while read -r ip; do
cached=$(awk -F'\t' -v ip="$ip" '$1==ip {print $2}' "$cache")
if [ -n "$cached" ]; then
echo -e "$ip\t$cached\t(cached)"
continue
fi
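./verify-bot.sh "$ip" >/dev/null 2>&1
case $? in
0) result="VERIFIED" ;;
1) result="FAKE" ;;
# Any other status is a lookup or script error: report it, but do not cache it.
*) echo -e "$ip\tERROR\t(not cached)"; continue ;;
esac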
printf '%s\t%s\n' "$ip" "$result" >> "$cache"
echo -e "$ip\t$result"
done
./batch-verify.sh access.log
Expected Output:
185.220.101.7 FAKE
45.143.200.18 FAKE
66.249.66.1 VERIFIED
66.249.66.135 VERIFIED
66.249.66.74 VERIFIED
You now have a definitive per-IP verdict. Feed the VERIFIED set into your crawl-budget accounting and route the FAKE set into rate-limiting, covered in detecting fake Googlebot traffic.
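Splitting the log by verdict can be sketched as a two-file awk join. The sketch assumes the two-column TSV cache that batch-verify.sh writes (IP, tab, verdict) and a log whose first whitespace-separated field is the client IP; filter_by_verdict is a hypothetical helper name, and the two log rows are made-up demo data.

```shell
#!/usr/bin/env bash
# Sketch: keep only the log rows whose source IP carries a given verdict.
# filter_by_verdict <verdict> <cache.tsv> <access.log>
filter_by_verdict() {
  local verdict=$1 cache=$2 log=$3
  awk -v want="$verdict" '
    # First file (cache): remember every IP whose verdict matches.
    NR == FNR { split($0, f, "\t"); if (f[2] == want) keep[f[1]] = 1; next }
    # Second file (log): print rows whose first field is a remembered IP.
    ($1 in keep)
  ' "$cache" "$log"
}

# Demo with throwaway files so the sketch runs without a real log.
tmp=$(mktemp -d)
printf '66.249.66.1\tVERIFIED\n45.143.200.18\tFAKE\n' > "$tmp/cache.tsv"
printf '%s\n' \
  '66.249.66.1 - - [01/Jan/2024:00:00:01 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Googlebot/2.1"' \
  '45.143.200.18 - - [01/Jan/2024:00:00:02 +0000] "GET /wp-login.php HTTP/1.1" 200 128 "-" "Googlebot/2.1"' \
  > "$tmp/access.log"
filter_by_verdict VERIFIED "$tmp/cache.tsv" "$tmp/access.log"   # genuine crawler rows
filter_by_verdict FAKE     "$tmp/cache.tsv" "$tmp/access.log"   # spoofer rows
rm -rf "$tmp"
```

Against a real log, point the second and third arguments at ~/.cache/bot-fcrdns.tsv and access.log.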
Production Warning: Reverse DNS adds a network round-trip per IP. Run batch verification against an archived log offline, never inline in a request hot path. If you must verify live (for example in a WAF rule), cache verdicts with a TTL of a few hours and fail open for verified-crawler ranges so a resolver outage never blocks the real Googlebot.
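The cache written by batch-verify.sh never expires. One way to add a TTL is to store a timestamp alongside each verdict, as in the sketch below. The three-column layout (IP, verdict, epoch seconds), the function names cache_put and cache_get, and the four-hour default are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: verdict cache with a TTL. Layout: ip<TAB>verdict<TAB>epoch-seconds.
CACHE_TTL=${CACHE_TTL:-14400}   # 4 hours; an arbitrary example value

cache_put() {   # cache_put <file> <ip> <verdict>
  printf '%s\t%s\t%s\n' "$2" "$3" "$(date +%s)" >> "$1"
}

cache_get() {   # cache_get <file> <ip>; prints verdict, returns 1 on miss or expiry
  [ -f "$1" ] || return 1
  awk -F'\t' -v ip="$2" -v now="$(date +%s)" -v ttl="$CACHE_TTL" '
    $1 == ip { verdict = $2; stamp = $3 }     # the last entry for an IP wins
    END { if (verdict != "" && now - stamp <= ttl) print verdict; else exit 1 }
  ' "$1"
}

cache=$(mktemp)
cache_put "$cache" 66.249.66.1 VERIFIED
cache_get "$cache" 66.249.66.1                   # fresh entry
printf '45.143.200.18\tFAKE\t0\n' >> "$cache"    # stamp of 0 is long expired
cache_get "$cache" 45.143.200.18 || echo "expired, re-verify"
rm -f "$cache"
```

Because entries are appended rather than rewritten, the file grows over time; a periodic cleanup that drops expired rows would be a natural addition.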
Edge Cases
IPv6 Googlebot. Google increasingly crawls from IPv6 (2001:4860:4801::/48). dig -x handles IPv6 PTRs natively, but the forward confirmation must request the AAAA record (a bare dig +short hostname asks only for A) and must compare addresses in a consistent form. Start with the reverse lookup:
dig +short -x 2001:4860:4801:10::1 # -> crawl-*-...googlebot.com.
If your log stores compressed IPv6 (2001:4860:4801:10::1) but dig returns a differently-formatted forward record, compare by canonical form (e.g. pipe both through sipcalc or Python's ipaddress) rather than string equality.
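If neither sipcalc nor Python is available, the canonical form can be approximated in pure bash by expanding every address to eight zero-padded lowercase groups. This is a sketch under stated assumptions: expand_ipv6 is a hypothetical helper, it expects well-formed input, and it does not handle zone IDs or embedded IPv4 notation such as ::ffff:1.2.3.4.

```shell
#!/usr/bin/env bash
# Sketch: expand an IPv6 address to 8 zero-padded lowercase groups so two
# spellings of the same address compare equal as plain strings.
expand_ipv6() {
  local addr left right g out="" i missing
  local -a l=() r=() groups=()
  addr=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  if [[ $addr == *::* ]]; then
    left=${addr%%::*}; right=${addr#*::}
    IFS=: read -ra l <<< "$left"
    IFS=: read -ra r <<< "$right"
    missing=$(( 8 - ${#l[@]} - ${#r[@]} ))    # groups hidden by "::"
    groups=("${l[@]}")
    for ((i = 0; i < missing; i++)); do groups+=(0); done
    groups+=("${r[@]}")
  else
    IFS=: read -ra groups <<< "$addr"
  fi
  for g in "${groups[@]}"; do
    out+=$(printf '%04x' "0x$g"):             # pad each group to 4 hex digits
  done
  printf '%s\n' "${out%:}"
}

a=$(expand_ipv6 2001:4860:4801:10::1)
b=$(expand_ipv6 2001:4860:4801:0010:0:0:0:1)
echo "$a"
[ "$a" = "$b" ] && echo "same address"
```

Run both the logged IP and each forward-lookup result through the same function before comparing them.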
Other Google properties. AdsBot, Google-InspectionTool, and the Google Read Aloud / APIs-Google fetchers may resolve to google.com or googleusercontent.com rather than googlebot.com. The suffix list in Steps 2 and 4 already covers these. Google also publishes IP ranges per crawler category as JSON files: googlebot.json for common crawlers, special-crawlers.json for special-case crawlers, and user-triggered-fetchers.json for user-triggered fetchers such as Google Site Verifier. When FCrDNS is impractical or inconclusive for one of these fetchers, match the IP against the corresponding file instead.
Verification
Confirm your verified set holds up by spot-checking a verified IP end to end in one command — PTR, suffix, and forward match in a single readable line:
ip=66.249.66.1; ptr=$(dig +short -x $ip | sed 's/\.$//'); \
echo "$ip -> $ptr -> $(dig +short $ptr)"
Expected Output:
66.249.66.1 -> crawl-66-249-66-1.googlebot.com -> 66.249.66.1
The IP you started with is the IP you end with, through a googlebot.com hostname. That round-trip is the proof. Re-run batch-verify.sh and confirm the second run prints (cached) for every IP — instant, no DNS traffic.
Common Mistakes
- Substring-matching the hostname. Checking *googlebot* instead of an anchored *.googlebot.com suffix lets an attacker pass with a hostname like googlebot.com.attacker.net. Always anchor to the end of the FQDN with a leading dot.
- Skipping forward confirmation. A PTR record can be set by whoever controls the IP's reverse zone. Trusting the PTR alone, without resolving the hostname back to the same IP, defeats the entire point. The forward step is mandatory.
- Verifying by hard-coded IP range. Matching against 66.249.0.0/16 looks convenient but Google's ranges change without notice and you will eventually block real crawlers or trust stale ranges. FCrDNS is the future-proof, Google-sanctioned method; IP-range matching is only a fast pre-filter.
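If you do keep a range check as a pre-filter, the membership test itself is simple integer masking. The sketch below is IPv4-only; ip_to_int and in_cidr are hypothetical helper names, and 66.249.66.0/24 is just the block seen in the sample log above, not an authoritative Googlebot range. A match should only decide which IPs go to FCrDNS first, never replace it.

```shell
#!/usr/bin/env bash
# Sketch: IPv4 CIDR membership as a cheap pre-filter ahead of FCrDNS.
# Assumes dotted-quad input without leading zeros in any octet.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {   # in_cidr <ip> <network/prefix>; returns 0 when ip is inside
  local net=${2%/*} bits=${2#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

for ip in 66.249.66.135 45.143.200.18; do
  if in_cidr "$ip" 66.249.66.0/24; then
    echo "$ip in-range, still run FCrDNS"
  else
    echo "$ip out-of-range"
  fi
done
```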
Frequently Asked Questions
Is reverse DNS verification the method Google officially recommends?
Yes. Google documents forward-confirmed reverse DNS as the way to verify Googlebot: run a reverse lookup on the IP, confirm the hostname ends in googlebot.com, google.com, or googleusercontent.com, then run a forward lookup on that hostname and confirm it returns the original IP. Google also publishes JSON IP-range files for cases where DNS verification is impractical, but FCrDNS is the canonical check.
How do I avoid overloading my DNS resolver when verifying thousands of IPs?
Deduplicate first. A busy log may contain millions of Googlebot rows but only a few hundred unique IPs. Extract unique IPs with awk '{print $1}' | sort -u, verify each once, and memoize verdicts to a cache file keyed by IP as shown in Step 5. Cached re-runs touch the network zero times.
Can a fake bot ever pass forward-confirmed reverse DNS?
Not without controlling Google's DNS zones, which it cannot. The realistic errors come from mistakes on your side: substring matching or skipping forward confirmation lets fakes through, while comparing IPv6 addresses in inconsistent formats rejects genuine crawlers. Get the three checks right and FCrDNS is effectively unspoofable.
Related Guides
- Detecting Fake Googlebot Traffic in Access Logs — apply this verifier in reverse to quantify and block spoofers.
- awk and grep Commands for Log Filtering — extract the IPs and fields the verifier consumes.
- CLI One-Liners for Quick Audits — fast triage commands that pair with reverse-DNS checks.
Part of the Identifying Search Engine Bots in Server Logs series.