How Much 'AI Bot' Traffic Is Fake? We Verified 3,392 Hits (2026 Data)

7 min read · LumenGEO Research

A user agent is just a string of text the visitor chooses to send. Anyone can claim to be GPTBot or ChatGPT-User; it takes one line of code. So we stopped trusting the label and checked it. Across 3,392 AI-bot fetches to lumengeo.co over two weeks, we checked every claimed identity we could against the IP ranges the AI companies actually publish. At least 2% were provably spoofed: hits that named a checkable bot but came from outside that company's own network. That includes hosts that presented four different AI bot identities, spanning two companies, in a single day. This is first-party server-log data, with the method and the limits stated honestly below.

Most "AI bot traffic" reporting takes the user agent at face value. That is the one thing you cannot do. The user agent is self-reported by the visitor, and a scraper that wants to look trustworthy will simply put GPTBot or ChatGPT-User in that field. The only way to know whether a hit is real is to check where it came from against the source-IP ranges each AI company publishes for its crawlers. We ran that check on every request that claimed an OpenAI or Perplexity crawler, and we logged the rest as unverifiable. Here is what two weeks of verified-against-spoofed data looks like.

Last updated: June 2026

Data window: 14 to 28 June 2026

The 67 spoofed hits we found are a provable floor, not a ceiling. They are requests that claimed a checkable bot identity and failed the check. The true amount of fake traffic could be higher — some of it hidden in the 40% we cannot verify at all — but we only report what we can prove. "Unverifiable" is not "fake."

What we measured (read this first)

Every request to lumengeo.co from a known AI user agent is logged with the bot name it claims, the path it requested, the date, and the source IP. We then compare that source IP to the crawler IP ranges the major AI companies publish. Each hit lands in one of three buckets:

  • Verified — the hit claimed a checkable bot identity (an OpenAI or Perplexity crawler) and the source IP falls inside that company's published range. The claim checks out.
  • Failed — the hit claimed a checkable bot identity, but the source IP falls outside that company's published range. The claim is provably false. This is the impersonation signal.
  • Unverifiable — the hit claimed a bot we do not currently IP-check (for example Bingbot, ClaudeBot, Bytespider, or Meta's crawlers, which we did not machine-verify in this window). We cannot say whether these are real or fake, so we do not.
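
The three buckets above can be sketched as a small classifier. This is a minimal illustration, not the pipeline used for this study: the `PUBLISHED_RANGES` values are placeholder documentation networks (RFC 5737), not real vendor ranges, and the names `CHECKABLE_BOTS` and `classify_hit` are hypothetical. It also assumes one range list per company.

```python
from ipaddress import ip_address, ip_network

# Placeholder networks from the RFC 5737 documentation blocks, NOT real
# vendor ranges. In practice these would be loaded from each vendor's
# published crawler IP list.
PUBLISHED_RANGES = {
    "openai": [ip_network("192.0.2.0/24")],
    "perplexity": [ip_network("198.51.100.0/24")],
}

# Claimed bot names that map to a vendor we can IP-check.
CHECKABLE_BOTS = {
    "GPTBot": "openai",
    "OAI-SearchBot": "openai",
    "ChatGPT-User": "openai",
    "PerplexityBot": "perplexity",
    "Perplexity-User": "perplexity",
}


def classify_hit(claimed_bot: str, source_ip: str) -> str:
    """Return 'verified', 'failed', or 'unverifiable' for one logged hit."""
    vendor = CHECKABLE_BOTS.get(claimed_bot)
    if vendor is None:
        # A bot we do not IP-check: status unknown, not fake.
        return "unverifiable"
    addr = ip_address(source_ip)
    in_range = any(addr in net for net in PUBLISHED_RANGES[vendor])
    return "verified" if in_range else "failed"
```

With the placeholder ranges, `classify_hit("GPTBot", "203.0.113.5")` lands in the failed bucket, while any hit claiming ClaudeBot is unverifiable regardless of its IP.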

The window is 14 to 28 June 2026: roughly two weeks, one site, 3,392 logged AI-bot hits from 13 distinct bots. This is a small, single-site sample in the GEO and AI-search niche, and we only machine-checked OpenAI and Perplexity ranges. Treat the shape as directional, not as an internet-wide census. The full caveats are in the methodology section.

The headline: 2% provably fake, 39% unverifiable, 59% checks out

| Bucket | Share of all hits | Hits | What it means |
|---|---|---|---|
| Verified | 58.6% | 1,987 | Claimed a checkable bot and the IP matched |
| Failed (provably spoofed) | 2.0% | 67 | Claimed a checkable bot, IP came from outside its network |
| Unverifiable | 39.4% | 1,338 | A vendor we did not IP-check; status unknown |

Two ways to read the spoof rate, both true. Against all 3,392 hits, the 67 failures are 2.0%. But the fairer denominator is the 2,054 hits we could actually check (verified plus failed) — and against those, the failure rate is 3.3%. Either way, the number is not zero, and that is the point: in a clean log of "AI bots," a measurable slice is forged.
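
For readers who want to reproduce the two denominators, here is the arithmetic as a short sketch. The three counts are the ones reported in this article; the variable names are only illustrative.

```python
# Bucket counts reported in this article.
verified, failed, unverifiable = 1987, 67, 1338

total = verified + failed + unverifiable  # every logged AI-bot hit
checkable = verified + failed             # hits we could IP-check

share_of_all = failed / total
share_of_checkable = failed / checkable

print(f"{total} total hits, {checkable} checkable")
print(f"spoofed vs all hits:       {share_of_all:.1%}")
print(f"spoofed vs checkable hits: {share_of_checkable:.1%}")
```

The same 67 failures read as 2.0% or 3.3% depending only on which denominator is used, which is why both are reported.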

About 1 in 50 of all AI-bot hits — and 1 in 30 of the hits we could verify — were provable impersonation. And 40% of the traffic could not be checked at all. Raw, unverified bot logs are not a safe basis for blocking, billing, or analytics decisions.

One IP, four bot identities, two AI companies: caught in the act

The single most damning evidence is timing. On 2026-06-19, individual IP addresses presented multiple different AI-company bot identities on the same day. There is no legitimate explanation for this. OpenAI's servers do not also identify themselves as Perplexity. A real crawler operates from one company's infrastructure and announces one bot. A host that announces four is a scraper rotating through trusted names to look harmless.

| Source IP | Bot identities claimed (same day) | Date |
|---|---|---|
| 45.45.237.7 | ChatGPT-User (4), GPTBot (4), OAI-SearchBot (1), PerplexityBot (4) | 2026-06-19 |
| 64.50.191.33 | ChatGPT-User (5), GPTBot (5), OAI-SearchBot (3), PerplexityBot (3) | 2026-06-19 |
| 38.248.95.248 | ChatGPT-User (3), GPTBot (2), OAI-SearchBot (1), PerplexityBot (3) | 2026-06-19 |
| 142.188.3.52 | GPTBot (6) | 2026-06-19 |

The first three IPs each impersonated three OpenAI crawlers and Perplexity's crawler within hours. That is not a misconfiguration; it is a deliberate disguise.
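
A log can be scanned for this signature without any IP ranges at all: group hits by source IP and day, then flag any host that claimed crawlers from more than one company. The sketch below assumes hits are available as `(source_ip, date, claimed_bot)` tuples; the `BOT_VENDOR` mapping and the function name `multi_vendor_hosts` are hypothetical.

```python
from collections import defaultdict

# Vendor for each claimed bot (only the bots named in the table).
BOT_VENDOR = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "PerplexityBot": "Perplexity",
}


def multi_vendor_hosts(hits):
    """Return {(ip, date): vendors} for hosts claiming 2+ vendors in one day.

    `hits` is an iterable of (source_ip, date, claimed_bot) tuples.
    """
    vendors_seen = defaultdict(set)
    for ip, date, bot in hits:
        vendor = BOT_VENDOR.get(bot)
        if vendor is not None:
            vendors_seen[(ip, date)].add(vendor)
    return {key: v for key, v in vendors_seen.items() if len(v) > 1}


# Two hosts from the table, reduced to one hit per claimed identity.
sample = [
    ("45.45.237.7", "2026-06-19", "ChatGPT-User"),
    ("45.45.237.7", "2026-06-19", "GPTBot"),
    ("45.45.237.7", "2026-06-19", "OAI-SearchBot"),
    ("45.45.237.7", "2026-06-19", "PerplexityBot"),
    ("142.188.3.52", "2026-06-19", "GPTBot"),
]
flagged = multi_vendor_hosts(sample)
```

On this sample only 45.45.237.7 is flagged. A host like 142.188.3.52, which claimed a single identity, is invisible to this heuristic and can only be caught by the IP-range check.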

A single host that claims four different bot identities across two AI vendors in one day is the clearest signature of bot impersonation we found. No legitimate crawler from OpenAI also identifies as PerplexityBot.

This was not a one-day event. We also saw recurring single-IP ChatGPT-User impostors across multiple days, from ranges like 158.46.x.x (17, 21, 26, and 27 June) and 91.108.x.x, plus one-off offenders (178.171.75.50, 185.89.110.56, 194.102.123.37, 77.83.51.156). The pattern outside the 19 June cluster is opportunistic and lower-volume: a steady trickle of requests wearing the most-trusted AI bot name in the log.

Which bot identity gets faked most

Not every bot name is equally abused. Here is the failure rate by claimed identity, counted only against the hits we could check for that bot:

| Claimed identity | Failed / checkable | Fail rate |
|---|---|---|
| GPTBot (OpenAI, training) | 17 / 157 | 10.8% |
| PerplexityBot (Perplexity, retrieval) | 10 / 134 | 7.5% |
| OAI-SearchBot (OpenAI, retrieval) | 9 / 329 | 2.7% |
| ChatGPT-User (OpenAI, agent) | 31 / 1,393 | 2.2% |
| Perplexity-User (Perplexity, agent) | 0 / — | 0% |

GPTBot is the most-faked identity by rate: more than one in ten of the requests claiming to be OpenAI's training crawler came from outside OpenAI's network. That is a notable inversion — GPTBot is the bot many sites block, so impersonating it is a strange choice unless the goal is to slip a scraper past filters that whitelist it, or simply to launder a scraper's identity behind a recognizable name. ChatGPT-User produced the most failures in raw count (31) because it is by far the highest-volume bot, but its rate is low. Perplexity-User had no failures at all.
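
The gap between "most faked by rate" and "most faked by count" is easy to check from the table. The sketch below uses the failed and checkable counts as published; Perplexity-User is left out because its checkable count is not given in the table.

```python
# (failed, checkable) per claimed identity, copied from the table.
counts = {
    "GPTBot": (17, 157),
    "PerplexityBot": (10, 134),
    "OAI-SearchBot": (9, 329),
    "ChatGPT-User": (31, 1393),
}

# Fail rate is measured only against the hits we could check for that bot.
rates = {bot: failed / checkable for bot, (failed, checkable) in counts.items()}

by_rate = sorted(rates, key=rates.get, reverse=True)       # highest rate first
by_count = max(counts, key=lambda bot: counts[bot][0])     # most raw failures

for bot in by_rate:
    print(f"{bot:<14} {rates[bot]:.1%}")
```

Ranking by rate puts GPTBot first, while ranking by raw failures puts ChatGPT-User first, simply because it has roughly nine times as many checkable hits.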

GPTBot was impersonated most by rate — 10.8% of requests claiming to be it failed verification. The bot name a scraper borrows is a choice, and trusted names get borrowed.

What this means for your bot logs and your decisions

Real teams make real decisions on raw bot logs every week. They block "AI scrapers" by user agent. They celebrate that "ChatGPT is crawling us." They feed bot hits into analytics and, in some products, into billing. Every one of those decisions assumes the user agent is honest.

It is not reliably honest. If roughly 1 in 30 of your checkable AI-bot hits is forged, and a further 40% can't be checked at all, then an unverified log is the wrong instrument for any of those calls:

  • Blocking by user agent alone can block nothing real (the spoofer just changes the string) while giving you false confidence that you "stopped the scrapers."
  • Counting "AI crawled us" as a citation signal overstates your reach when some of those fetches were never the AI company at all.
  • Acting on the unverifiable middle in either direction — treating it as all-real or all-fake — is a guess. It is genuinely unknown until you check it.

How to tell real from fake

The check is simple in principle: do not trust the user agent, trust the source. For the bots that support it, that means confirming the request's IP belongs to the company it claims, using either the published IP ranges or a reverse-then-forward DNS lookup. A request claiming to be GPTBot from an IP that does not resolve to OpenAI's infrastructure is, by definition, not GPTBot.
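
For vendors that document a crawler hostname, the reverse-then-forward DNS lookup mentioned above might look like the following sketch. It is an illustration under assumptions, not our production check: the function name `forward_confirmed_rdns` is hypothetical, the allowed hostname suffixes must come from each vendor's own documentation, and `socket.gethostbyname_ex` only covers IPv4. The resolver functions are parameters so the logic can be exercised without network access.

```python
import socket


def forward_confirmed_rdns(ip, allowed_suffixes,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """True if ip reverse-resolves to an allowed host that resolves back to ip."""
    try:
        hostname = reverse(ip)[0]          # reverse lookup: IP -> hostname
    except OSError:
        return False                       # no PTR record or lookup failure
    host = hostname.rstrip(".").lower()
    # Match whole domain labels so "evilbot.example.com" cannot pass
    # as "bot.example.com".
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        addresses = forward(host)[2]       # forward lookup: hostname -> IPv4 list
    except OSError:
        return False
    # The forward step matters: anyone who controls an IP block can set
    # its PTR record to any name, but cannot make that name resolve back.
    return ip in addresses
```

A call such as `forward_confirmed_rdns(source_ip, ["bot.example.com"])` uses the real resolver; the suffix shown is a placeholder, not a real vendor domain.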

We walk through the exact verification method — published ranges, reverse-DNS, and the per-bot specifics — in how to verify AI crawlers. To check your own traffic against a known-bot reference without writing code, use the AI crawler check tool. For the full roster of bots and what each one does, see the AI crawler list for 2026.

Methodology and limitations

The data is from lumengeo.co server logs, 14 to 28 June 2026: 3,392 requests from known AI user agents across 13 distinct bots. Each hit's source IP was compared to published crawler ranges. "Verified" means the claimed bot is one we IP-check and the IP matched; "failed" means the claimed bot is one we IP-check and the IP did not match; "unverifiable" means we do not currently run an IP check for that vendor.

The honest limits:

  • One site, one niche. This is a single GEO-focused site, so the bot mix likely skews toward AI-curious crawling. Your spoof rate will differ.
  • Two weeks is short. The smoking-gun cluster fell on one day; a longer window would refine every percentage here.
  • We only machine-checked OpenAI and Perplexity ranges. That is why 39.4% is unverifiable. Those hits — Bingbot, ClaudeBot, Bytespider, Meta, and others — include genuine, well-behaved bots. Do not read "unverifiable" as "fake." It means exactly what it says: unknown.
  • 67 is the floor, not the total. We only count a failure when a hit claimed something we could check and failed. The real volume of impersonation could be higher (some hiding in the unverifiable bucket) and is certainly not lower. We report only the provable number.

For how legitimate crawls turn into actual AI citations, see our 12-day AI crawler traffic study, and for deciding which bots to allow or block, robots.txt for AI crawlers. We will refresh this study as the log grows.

FAQ

How much AI bot traffic is fake?

In our two-week first-party log of 3,392 AI-bot hits, 2.0% were provably spoofed — they claimed a checkable bot identity but came from an IP outside that company's published range. Measured only against the 2,054 hits we could actually verify, the spoof rate is 3.3%. A further 39.4% came from vendors we did not IP-check, so their status is unknown. The honest summary: at least 1 in 50 of all AI-bot hits was forged, and that is a floor, not a ceiling.

Can you spoof GPTBot or ChatGPT's crawler?

Yes, trivially. The user agent is a self-reported text string the visitor controls, so any request can claim to be GPTBot or ChatGPT-User with a single line of code. We saw this directly: 10.8% of requests claiming to be GPTBot, and 2.2% claiming to be ChatGPT-User, came from outside OpenAI's published IP ranges. The only defense is to verify the source IP rather than trusting the label.

Is "unverifiable" AI bot traffic fake?

No, and conflating the two is the most common mistake. "Unverifiable" in our data means the hit claimed a bot we do not currently IP-check (such as Bingbot or ClaudeBot), so we cannot confirm or deny it. Many of those are genuine, well-behaved crawlers. We separate provably fake (67 hits that failed a check) from simply unchecked (1,338 hits) on purpose, and we only call something fake when we can prove it.

Which AI bot identity gets impersonated most?

By failure rate, GPTBot — 17 of 157 checkable requests (10.8%) failed verification. PerplexityBot was next at 7.5%, then OAI-SearchBot at 2.7% and ChatGPT-User at 2.2%. ChatGPT-User produced the most failures in raw count (31) only because it is the highest-volume bot. Perplexity-User had zero failures in our window.

How do I verify that an AI bot is real?

Don't trust the user agent — check the source IP. Confirm that the request's IP belongs to the company it claims, using each vendor's published crawler IP ranges or a reverse-then-forward DNS lookup. A request claiming to be a given bot from an IP that doesn't resolve to that company's infrastructure is not that bot. Our how to verify AI crawlers guide has the step-by-step method, and the AI crawler check tool lets you test your own traffic.