Many websites have recently reported spikes in spam traffic. One that appears to be affecting a lot of sites is a jump in traffic coming from Warsaw, Poland, mostly from a small group of referring domains.
GA4 automatically excludes traffic from bots and spiders that identify themselves as such, but does not do a good job of filtering out spam traffic that is not self-identifying. In most cases, these bots are not trying to do any harm to your site, apart from adding unwanted load to your web server. But they are polluting your Google Analytics metrics, making it hard to separate the signal from the noise.
So, what can you do about it?
First, you’ll need to decide whether a surge or spike in traffic really is bot traffic, by isolating the strange-looking sessions. Read our blog post How to identify bot traffic in GA4.
Removing bot traffic from GA4 reports
In regular reporting, your only option is to add an exclude filter each time you want to view a report with spam traffic removed. Unfortunately, you can’t save filters, so you have to recreate the filter every time you view a report. It is possible to customize a report and save it with a filter applied, but if you do that you won’t be able to apply additional filters to the report.
To the right is an example of a filter that removes the spam traffic from the report screenshotted above. This example excludes traffic that we know to be spam, but sometimes there is no way to exclude spam traffic without also excluding a little bit of genuine traffic. For example, we recently dealt with a bot with the following characteristics:
- Country = (not set)
- Browser = Chrome
- OS = Windows 10
The latter two are obviously very common for human visitors, and there are also circumstances where GA can’t identify the location of a real visitor, but because of the volume of bot traffic, it was worthwhile to apply the filter, even though some real humans would be excluded.
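To make the trade-off concrete, here is a minimal Python sketch of the same exclude logic applied to a handful of made-up sessions. It is not GA4 code: the `Session` class, the `BOT_SIGNATURE` values, and the sample rows are all assumptions for illustration, and a real filter would be configured in the GA4 UI as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical stand-in for one row of exported GA4 data."""
    country: str
    browser: str
    os: str


# The three characteristics of the bot described above (illustrative values).
BOT_SIGNATURE = {"country": "(not set)", "browser": "Chrome", "os": "Windows 10"}


def matches_bot_signature(session: Session) -> bool:
    """True only when ALL three dimensions match the bot's signature."""
    return all(getattr(session, key) == value for key, value in BOT_SIGNATURE.items())


def exclude_bot_like(sessions: list[Session]) -> list[Session]:
    """Keep every session that does not match the full signature."""
    return [s for s in sessions if not matches_bot_signature(s)]


# Made-up sample data: only the first row matches on all three dimensions.
sample = [
    Session("(not set)", "Chrome", "Windows 10"),
    Session("Poland", "Chrome", "Windows 10"),
    Session("(not set)", "Safari", "iOS"),
    Session("United States", "Chrome", "Windows 10"),
]

kept = exclude_bot_like(sample)
print(f"kept {len(kept)} of {len(sample)} sessions")
```

Because the conditions are combined with AND, a Chrome on Windows 10 visitor with a known country is kept. The only genuine visitors lost are those who happen to match all three values at once.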
In Explorations, you can create a segment that excludes the dimensions associated with spam traffic and apply it to multiple report tabs in the same Exploration.
And in Looker Studio, you can create a filter and apply it at the chart, page or dashboard level. The dashboard level is typically the best option, so you don’t have to remember to apply it each time.
Each of these methods removes bot traffic from reporting, but you may be wondering, “what if I want to prevent it from showing up in GA4 in the first place?”
The only built-in mechanism for doing this is to filter traffic based on IP address(es). It is also possible to prevent GA4 tags from firing in Google Tag Manager (GTM) based on certain traffic attributes. Or you can set a value for the traffic_type parameter in GTM that flags the traffic as spam, then filter it out in GA4. Setting up either of the latter two approaches requires proficiency with GTM and possibly some custom JavaScript, depending on the combination of dimensions you are filtering on. I describe these approaches in more detail in this post: How to Filter Bot Traffic from GA4.
Creating an IP-based traffic filter
The actual feature in GA4 is called an “Internal traffic filter”, but it should have been named “IP address filter”, since it is filtering by IP address, internal or otherwise. Google’s documentation describes this pretty well, so I won’t walk through each step here, but a couple of words of advice:
- Use an IP address range instead of a single IP address. It is fairly common for a router to be configured with a block of addresses and assign them dynamically to nodes in its network. So, a web server that has the address 192.0.2.34 one day might have 192.0.2.45 the next. I typically block a range of 256 addresses, which in this case would be done by specifying 192.0.2.0/24. (An explanation of how this notation works.) This does run the risk of blocking non-spam traffic from other addresses in the block, but that risk is fairly small.
- Use a meaningful name for traffic_type – this will help you out a lot if you add more filters down the road. The value defaults to “internal”, but there is no reason not to give it a more descriptive name, for example “grets_bot”.
- Don’t forget to create a data filter after you’ve defined the “internal” traffic. This is described in Step 2 in Google’s documentation, but I find it counterintuitive that the setup takes place in two different places in the GA4 admin UI, and I missed this step the first few times I set one up.
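If the /24 notation in the first tip is unfamiliar, the short Python sketch below uses the standard-library `ipaddress` module to show what the range 192.0.2.0/24 covers. The helper names `in_block` and `block_for` are hypothetical and exist only for this illustration. This is just a way to sanity-check a range before entering it in GA4, not part of the GA4 setup itself.

```python
import ipaddress

# The range from the example above: a /24 block covers 256 addresses.
block = ipaddress.ip_network("192.0.2.0/24")


def in_block(address: str) -> bool:
    """Check whether a single IP address falls inside the block."""
    return ipaddress.ip_address(address) in block


def block_for(address: str, prefix: int = 24) -> ipaddress.IPv4Network:
    """Derive the enclosing block from one observed address.

    strict=False lets us pass a host address (e.g. 192.0.2.34)
    rather than the network address itself.
    """
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


print(block.num_addresses)      # size of the range
print(block[0], block[-1])      # first and last address in the range
print(in_block("192.0.2.34"))   # the address seen one day
print(in_block("192.0.2.45"))   # the address seen the next day
print(block_for("192.0.2.34"))  # the range to enter in the filter
```

Both example addresses fall inside the same block, which is why filtering on the range keeps working when the bot's address changes within it.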
Thanks for reading, and good luck bot hunting! It’s a skill we all may need to lean on a bit more in the future 😉