Weird traffic in Analytics? Keep reading…
You may have noticed some random traffic spikes in your Google Analytics graphs lately. Most of us would read this as good news, but unfortunately we should place very little trust in easy wins: the easier and more apparently random a recorded metric is, the less confidence we should give it.
Today's main topic is spammy fake affiliate traffic, using semalt.com as the specific example; the same approach applies to a fistful of other offenders already out there (darodar, iloveitaly and webbuttons, to name a few), and their number will probably keep growing over time.
How to detect this spam traffic?
Start your Google Analytics session and choose the Property you want to inspect.
Then go to “All Traffic” in the “Acquisition” section (see the illustration to the right; it is in Spanish but quite self-explanatory).
Depending on the case, you may spot those spam traffic sources immediately, or find them mixed in among your main traffic sources.
Most of the time, the spam traffic we are dealing with here arrives as referral traffic. The sample case I am using has little overall traffic, so the spam sources catch the eye quickly:
Spammers' behavior can badly contaminate your web analytics.
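Beyond the Analytics reports, you can also confirm these hits at the server level by searching the raw access log for the spammy referrer. A minimal sketch (the log path below is an assumption; adjust it for your hosting setup):

```shell
# Hypothetical Apache access log location; adjust for your host
grep -i "semalt" /var/log/apache2/access.log | head -n 5
```

Each matching line shows the visitor IP and the spoofed Referer header the spam bot sent, independently of what Analytics records.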
There is an easy way to mitigate this spammy traffic and the harm it does to the health and reliability of your data: filter those visits with a small edit to the main .htaccess file of the website you wish to keep clean.
If you are already familiar with .htaccess, you may skip this section and jump straight to the details of what to add.
The .htaccess file holds rules that the web server will follow; it is usually helpful for redirections or for SEO-friendly rewriting of non-semantic/unreadable URLs. You can access it using any FTP client (FileZilla, among others) with the right permissions, or through the remote file manager included in most hosting providers' control panels.
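For reference, a typical .htaccess use case looks like the following: a hypothetical 301 redirect from an old URL to a new one (the file names are made up for illustration, and it assumes mod_rewrite is enabled on the server):

```apache
# Hypothetical example: permanently redirect an old page to its new location
RewriteEngine On
RewriteRule ^old-page\.html$ /new-page/ [R=301,L]
```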
If the .htaccess file does not exist at the moment of inspection, you can create it on your own computer with any plain-text editor (Windows Notepad, UltraEdit, Notepad++ and the like). After filling it with the directives from the next section (“Filter spam traffic with Htaccess”), upload it to the root folder of your site, then rename it and set its permissions: the name must be literally .htaccess, and the permissions should be restrictive, typically 644. The leading period makes the file “hidden” once it is in the remote folder, unless you configure your FTP client to show hidden files.
To assign restrictive permissions to the .htaccess file (for web security reasons), right-click on it in your FTP client and…
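If you manage the server over SSH instead of an FTP client, the rename-and-restrict step can be sketched from the command line (the site root path is an assumption for illustration):

```shell
# Hypothetical site root; adjust to your hosting setup
cd /var/www/html
# Rename the uploaded file to the literal name Apache expects
mv htaccess.txt .htaccess
# Restrictive permissions: owner read/write, group and others read-only
chmod 644 .htaccess
# Verify: the mode column should read -rw-r--r--
ls -l .htaccess
```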
Filter spam traffic with Htaccess
These are the directives you need to add to your .htaccess to filter out spam-robot traffic. This tactic is recommended because these spammers ignore any robots.txt rules, and their opt-out policies are anything but trustworthy.
You may duplicate the first line to add new hostile referrers:
SetEnvIfNoCase Referer semalt.com spambot=yes
Order Allow,Deny
Allow from all
Deny from env=spambot
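As a sketch of how the first line scales to several offenders, the block below adds the other spam referrers mentioned later in this article. Note that the Referer argument is a regular expression, so a bare name matches any domain containing it, and that the Order/Allow/Deny directives are Apache 2.2 syntax (on Apache 2.4 they need mod_access_compat):

```apache
# Flag any visit whose Referer matches a known spam source
SetEnvIfNoCase Referer semalt spambot=yes
SetEnvIfNoCase Referer darodar spambot=yes
SetEnvIfNoCase Referer iloveitaly spambot=yes
SetEnvIfNoCase Referer webbuttons spambot=yes
# Let everyone in except requests flagged above (Apache 2.2 syntax)
Order Allow,Deny
Allow from all
Deny from env=spambot
```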
Filter spam traffic straight from Google Analytics
There is an even easier alternative, but it relies on the fair play of the spam-bot operators, which is unlikely… Still, this method is a perfect complement to the .htaccess approach above.
Google uses the IAB's inventory and listing of spiders to feed the clockwork behind a checkbox you can find in Analytics for filtering out any known spiders and bots; that is why they are called known spiders and bots. If you did not know this checkbox existed, here is how to find it:
- Start a Google Analytics session
- Go to the Admin section (assuming you have the right access permissions)
- Choose the Account and Property to auto-filter
- Choose the View that should receive the automated filter for known spiders and bots
- Open the View Settings for that View and look for the relevant checkbox (“Exclude all hits from known bots and spiders”) near the end of the settings page
Update for both filtering methods
After a few days observing Analytics, I spotted a genuinely surprising behavior. It seems both solutions have their own strong points, but pay attention:
Implementing both solutions at once seems to cancel out their benefits and lets ghost traffic reach your Google Analytics data.
Below is empirical data collected from two twin Views of one of my web labs. You will see that, within the same time range and looking specifically at referral traffic, the “htaccess only” version does the job and filters out the undesirable spiders: semalt, webbuttons, darodar, iloveitaly and others, as specified in the .htaccess edit above: