Many websites have recently reported spikes in spam traffic. One that appears to be affecting a lot of sites is a jump in traffic coming from Warsaw, Poland, mostly from a small group of referring domains.
GA4 automatically excludes traffic from bots and spiders that identify themselves as such, but does not do a good job of filtering out spam traffic that is not self-identifying. In most cases, these bots are not trying to do any harm to your site, apart from adding unwanted load to your web server. But they are polluting your Google Analytics metrics, making it hard to separate the signal from the noise.
So, what can you do about it?
First, you’ll need to determine whether a surge or spike in traffic really is bot traffic, by isolating the strange-looking traffic. Read our blog post How to identify bot traffic in GA4.
In regular reporting, your only option is to add an exclude filter each time you want to view a report with spam traffic removed. Unfortunately, you can’t save filters, so you have to recreate the filter every time you view a report. It is possible to customize a report and save it with a filter applied, but if you do that you won’t be able to apply additional filters to the report.
To the right is an example of a filter that removes the spam traffic from the report screenshotted above. This example excludes traffic that we know to be spam, but sometimes there is no way to exclude spam traffic without also excluding a little bit of genuine traffic. For example, we recently dealt with a bot with the following characteristics:
The latter two are obviously very common for human visitors, and there are also circumstances where GA can’t identify the location of a real visitor, but because of the volume of bot traffic, it was worthwhile to apply the filter, even though some real humans would be excluded.
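To make the trade-off concrete, here is a minimal sketch of what an exclude filter does, applied to rows exported from a report rather than inside GA4 itself. The field names, the referring domains and the sample numbers are all made up for illustration; only the Warsaw location comes from the example at the top of this post. It returns the rows that survive the filter, plus a count of how many sessions were removed, so you can judge whether the filter is cutting too deep.

```typescript
// One row of an exported report. Field names are illustrative, not GA4 API names.
interface ReportRow {
  city: string;
  sessionSource: string;
  sessions: number;
}

// Hypothetical exclusion rule: the source domains below are placeholders.
const SPAM_CITY = "Warsaw";
const SPAM_SOURCES = new Set([
  "spam-referrer-one.example",
  "spam-referrer-two.example",
]);

// A row counts as spam only when BOTH conditions match, which keeps
// genuine Warsaw visitors arriving from other sources in the report.
function isSpam(row: ReportRow): boolean {
  return row.city === SPAM_CITY && SPAM_SOURCES.has(row.sessionSource);
}

// Returns the rows that pass the filter and the number of sessions removed.
function excludeSpam(rows: ReportRow[]): { kept: ReportRow[]; removedSessions: number } {
  const kept = rows.filter((row) => !isSpam(row));
  const removedSessions = rows
    .filter((row) => isSpam(row))
    .reduce((total, row) => total + row.sessions, 0);
  return { kept, removedSessions };
}

// Made-up sample rows, for illustration only.
const sampleRows: ReportRow[] = [
  { city: "Warsaw", sessionSource: "spam-referrer-one.example", sessions: 900 },
  { city: "Warsaw", sessionSource: "google", sessions: 12 },
  { city: "Chicago", sessionSource: "google", sessions: 340 },
];

const result = excludeSpam(sampleRows);
console.log(result.kept.length, result.removedSessions); // 2 900
```

Dropping the source condition and filtering on city alone would also remove the 12 genuine Warsaw sessions in this sample, which is the kind of collateral loss described above.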
In Explorations, you can create a segment that excludes the dimensions associated with spam traffic and apply it to multiple report tabs in the same Exploration.
And in Looker Studio, you can create a filter and apply it at the chart, page or dashboard level. The latter is typically the best option, so you don’t have to remember to apply it each time.
Each of these methods removes bot traffic from reporting, but you may be wondering, “what if I want to prevent it from showing up in GA4 in the first place?”
The only built-in mechanism for doing this is to filter traffic based on IP address(es). It is also possible to prevent GA4 tags from firing in Google Tag Manager (GTM) based on certain traffic attributes. Or you can set a value for the traffic_type parameter in GTM that flags the traffic as spam, then filter it out in GA4. Setting up either of the GTM approaches requires proficiency with GTM and possibly some custom JavaScript, depending on the combination of dimensions you are filtering on. I describe this approach in more detail in this post: How to Filter Bot Traffic from GA4.
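As a rough idea of what that custom logic might look like, here is a minimal sketch of a function that decides which traffic_type value to send, based on the referrer. The domain list and the "spam" value are assumptions made for illustration, not the setup from the post linked above. Inside GTM this logic would live in a Custom JavaScript variable, so it would need to be adapted to plain JavaScript and read the referrer from the page.

```typescript
// Hypothetical list of referring domains you have identified as spam.
const SPAM_REFERRER_DOMAINS = [
  "spam-referrer-one.example",
  "spam-referrer-two.example",
];

// Returns the value to send as the traffic_type event parameter,
// or undefined to leave the parameter unset for normal traffic.
// The "spam" value is an assumption; use whatever value your GA4
// data filter is configured to match.
function classifyTrafficType(referrer: string): string | undefined {
  let hostname: string;
  try {
    hostname = new URL(referrer).hostname.toLowerCase();
  } catch {
    return undefined; // empty or malformed referrer: treat as normal traffic
  }
  // Match the domain itself or any of its subdomains.
  const isSpam = SPAM_REFERRER_DOMAINS.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
  return isSpam ? "spam" : undefined;
}

console.log(classifyTrafficType("https://www.spam-referrer-one.example/page"));
console.log(classifyTrafficType("https://www.google.com/"));
```

Matching on the hostname rather than the full referrer string avoids false positives when a spam domain merely appears somewhere in a legitimate URL's path or query string.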
The actual feature in GA4 is called an “Internal traffic filter”, but it should have been named “IP address filter”, since it is filtering by IP address, internal or otherwise. Google’s documentation describes this pretty well, so I won’t walk through each step here, but a couple of words of advice:
Thanks for reading, and good luck bot hunting! It’s a skill we all may need to lean on a bit more in the future 😉