How to Identify Bot Traffic in GA4

You’ve probably seen it: all of a sudden, your time-series charts in GA4 are rendered useless by a single-day spike in traffic. If only it were the runaway success of your latest blog post. Fingers crossed, but it might also be a bot.

GA4 automatically excludes traffic from bots that properly identify themselves, like Google’s search crawler. But it’s not uncommon to see spam traffic coming through every now and again, and that non-human traffic can make it hard to see what’s actually happening with your website.

Identifying bot traffic

First, decide whether a surge or spike in traffic really is a bot. To do this, you need to isolate the anomalous traffic in reporting and decide whether it looks human. You are looking for a dimension, or combination of dimensions, that shows unexpected or unreasonable growth and likely has a very low engagement rate. The low engagement rate is not a given, though: sometimes spam traffic generates metrics that look very much like human behavior. If you have a high enough volume of ecommerce purchases or lead form submissions, a lack of these conversions in the suspicious traffic can also help you identify it as non-human. If not, you may have to make a call based on instinct, e.g., “do I really believe that half of my visitors come from Ashburn, Virginia?”
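The check described above can be sketched in a few lines of Python. This is only an illustration, not a GA4 feature: the `DimensionRow` fields, the `flag_suspects` helper, the thresholds, and the sample numbers are all assumptions, and the rows stand in for figures you would copy out of a report for a normal period and for the suspicious period. Because some bots produce human-looking engagement, treat the output as a list of candidates to investigate rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class DimensionRow:
    value: str              # e.g. a city or a session source / medium (hypothetical data)
    baseline_sessions: int  # sessions in a normal comparison period
    spike_sessions: int     # sessions in the suspicious period
    engaged_sessions: int   # engaged sessions in the suspicious period


def flag_suspects(rows, min_growth=5.0, max_engagement_rate=0.10):
    """Return (value, growth, engagement_rate) for rows that look non-human.

    Both thresholds are arbitrary starting points, not recommended values.
    """
    suspects = []
    for row in rows:
        if row.spike_sessions == 0:
            continue
        # Treat a zero baseline as 1 so brand-new values give a large, finite growth.
        growth = row.spike_sessions / max(row.baseline_sessions, 1)
        # Engagement rate = engaged sessions / sessions.
        engagement_rate = row.engaged_sessions / row.spike_sessions
        if growth >= min_growth and engagement_rate <= max_engagement_rate:
            suspects.append((row.value, growth, engagement_rate))
    # Largest growth first.
    return sorted(suspects, key=lambda s: s[1], reverse=True)


rows = [
    DimensionRow("Denver", 400, 430, 260),
    DimensionRow("Ashburn", 20, 5000, 50),
    DimensionRow("Chicago", 300, 320, 190),
]

for value, growth, rate in flag_suspects(rows):
    print(f"{value}: {growth:.0f}x growth, {rate:.1%} engagement rate")
```

With the made-up numbers above, only Ashburn is flagged: its sessions grew 250-fold while only 1% of them were engaged.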

Below are the reports and dimensions we tend to look at when we are doing this type of diagnosis.

  • Acquisition > Traffic acquisition – set your primary dimension to Session source / medium: look for a source you’ve never seen before that doesn’t make sense. Also take note if there’s been a big increase in ‘(direct) / (none)’ traffic, since bot traffic often has no attributable source. This is not useful for filtering, because you get plenty of real direct traffic too, but it is a sign that something is off.
  • User attributes > Demographic details – set your primary dimension to Country: look for countries outside of your market area – in particular, countries whose primary language is different from the language(s) of your site.
  • User attributes > Demographic details – set your primary dimension to City: look for relatively low-population cities with disproportionately high traffic volume.
  • Tech > Tech details – set your primary dimension to Browser*: this one is not very useful by itself, but often bot traffic will be identifiable by a specific combination of browser, OS and screen resolution. Note that Browser version is not available as a dimension in regular GA4 reports, but it is in Explorations. I have often found that bot traffic is identifiable by a big spike in a specific, slightly-out-of-date browser version.
  • Tech > Tech details – set your primary dimension to OS with version*: per the previous bullet, this dimension can be helpful in combination with others. You may also see high volumes of traffic from old OS versions that look suspicious, for example Windows 7 and 8.
  • Tech > Tech details – set your primary dimension to Screen resolution*: for some reason, bots often show up with an 800×600 screen resolution or other resolutions that were common 20 years ago.
  • Engagement > Landing page – it is most common for bots to request your home page, which is not a useful differentiator, but sometimes they make high-volume requests of a page that doesn’t even exist on your site, which is a dead giveaway. For example, a few years back, there was a bot that requested the page path /bottraffic.live on millions of websites.
  • IP address – this one is a bit tricky, since GA4 neither reports on nor stores IP addresses for visitors. But it is possible to block traffic from specific IP addresses in GA4, and bot traffic often comes from a narrow range of IP addresses. Your hosting provider or CDN may provide reporting that includes the IP addresses of visitors, or you may be able to access web server log files that include IP addresses.

*The dimensions with asterisks often work better when combined with other dimensions – for example, combining screen resolution with browser may reveal suspicious traffic in a way that the individual dimensions do not. You can add a secondary dimension to any of the above reports to report on two dimensions, but you’ll need to use Explorations or Looker Studio in order to analyze more than two dimensions at the same time.
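If you export the data, combining dimensions takes only a few lines of Python. The sketch below is an assumption-laden example rather than a description of any GA4 export format: the column names, the `combo_report` helper, and the sample rows are all hypothetical, and you would substitute whatever columns your Exploration or Looker Studio export actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per browser / OS / screen resolution combination.
SAMPLE_CSV = """browser,os_with_version,screen_resolution,sessions,engaged_sessions
Chrome,Windows 10,1920x1080,1200,780
Safari,iOS 17.5,390x844,900,600
Chrome,Windows 7,800x600,2500,25
Chrome,Windows 10,800x600,40,22
"""


def combo_report(csv_text, dimensions):
    """Sum sessions and engaged sessions for each combination of the given dimensions."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sessions, engaged_sessions]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[d] for d in dimensions)
        totals[key][0] += int(row["sessions"])
        totals[key][1] += int(row["engaged_sessions"])
    report = [
        (key, sessions, engaged / sessions if sessions else 0.0)
        for key, (sessions, engaged) in totals.items()
    ]
    # Highest-volume combinations first.
    return sorted(report, key=lambda r: r[1], reverse=True)


dimensions = ["browser", "os_with_version", "screen_resolution"]
for key, sessions, rate in combo_report(SAMPLE_CSV, dimensions):
    print(" / ".join(key), sessions, f"{rate:.1%}")
```

In this invented data, grouping by browser alone blends the suspicious sessions into all Chrome traffic (3,740 sessions at roughly 22% engagement), while the three-dimension grouping isolates Chrome / Windows 7 / 800x600 with 2,500 sessions at 1% engagement.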

Stick With It

Bot hunting definitely takes tenacity. The process of identifying the characteristics of a particular bot can be time-consuming. Here’s an example of the steps I took over several months to identify one particularly tricky GA4 bot.

Take Action

Once you’ve identified a bot’s characteristics, what do you do next?

  • If the bot is still hitting your website, you can set up a filter to prevent that traffic from being logged in the first place. Read our post How to filter bot traffic from GA4.
  • If you’re looking at past traffic, unfortunately there’s no way to remove that data entirely. So you need to remove it from reports as you’re viewing or creating them. Read our post How to remove bot traffic from GA4 reports, explorations, and Looker Studio.
  • And how can you keep on top of future bots? We like to use GA4’s Custom Insights to get notified as soon as possible about anomalous traffic. That’s what we depend on as part of the Managed Analytics service we deliver for clients, so they can worry less about their analytics infrastructure and go back to using the data to make better marketing decisions.
Nico Brooks

Nico loves marketing analytics, running, and analytics about running. He's Two Octobers' Head of Analytics, and loves teaching. Learn more about Nico or read more blogs he has written.