Filter out spam spider traffic on Google Analytics

By 28 December 2014 SEO 3 Comments

Weird traffic in Analytics? Keep reading…

You may have noticed some random traffic peaks in your Google Analytics graphs lately. Most people would read this as good news, but unfortunately easy wins deserve very little trust: the easier and more apparently random a recorded metric looks, the less confidence we should place in it.

Today's topic is fake traffic from spammy affiliates, using semalt.com as the specific example. The same approach applies to a handful of other cases already out there, whose number will probably grow over time. A few examples:

  • lumb.co/co.lumb
  • darodar.com
  • econom.co

How to detect this spam traffic?

Log in to Google Analytics and choose the Property you want to inspect.

Then go to “All Traffic” in the “Acquisition” section (see the illustration; it is in Spanish but quite self-explanatory).

Depending on the case, these spam traffic sources may stand out immediately or be mixed in among your main traffic sources.

Most of the time, the spam traffic we are dealing with today shows up under the referrals source. The sample case used here has little overall traffic, so the spam sources quickly catch the eye:

referrer spam traffic in Google Analytics

Spammers' behavior can badly contaminate your web analytics.

There is an easy way to mitigate this spammy traffic and the harm it does to the health and reliability of your data: filter out those visits with a minor edit to the main .htaccess file of the website you wish to keep clean.

If you are familiar with .htaccess you may skip this section and jump straight to the details of the addition.

Htaccess file

The .htaccess file holds rules that the web server will follow; it is typically used for redirections or for SEO-friendly rewriting of non-semantic, unreadable URLs. You can access it with any FTP client (FileZilla and others) and the right permissions, or through the remote file manager that most hosting providers offer in their control panel.

If the .htaccess file does not exist yet, you can create it on your own computer with a plain-text editor such as Windows Notepad, UltraEdit or Notepad++. After adding the rules from the next section (“Filter spam traffic with Htaccess”), upload it to the root folder of your site, then set its name and permissions. The name must be literally .htaccess, and permissions should be restrictive: 644 is a common choice (the owner can read and write; everyone else can only read). The leading period makes the file “invisible” in the remote folder unless you configure your FTP client to show hidden files.

Configure Filezilla: show htaccess and other hidden files

To assign restrictive permissions (for web security reasons) to the .htaccess file, right-click on it and…

FTP give htaccess restrictive permission 644

assign restrictive permission 644 to htaccess file with Filezilla

Filter spam traffic with Htaccess

These are the lines to add to your .htaccess file to filter out spam robot traffic. This tactic is recommended because these spammers ignore any robots.txt rules, and their opt-out policies are anything but trustworthy.

You can repeat the first line, changing the domain, to add new unwanted referrers:

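# Flag any request whose Referer header matches semalt.com (case-insensitive)
SetEnvIfNoCase Referer semalt.com spambot=yes
# Allow everyone, then deny the requests flagged above
Order allow,deny
Allow from all
Deny from env=spambot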
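
As a rough sketch of how that first line can be repeated, here is how the block might look for three of the domains named in this article (semalt.com, darodar.com and econom.co). Two things here are my own assumptions rather than part of the original rules: the patterns are anchored regular expressions, so that an unrelated domain that merely contains the same text is not blocked by accident, and the access rules are wrapped in IfModule checks so the same file should work on both Apache 2.2 and Apache 2.4. Treat it as a starting point and test it on your own server before relying on it.

```apache
# Flag requests whose Referer header points at a known spam domain.
# Each pattern is a regular expression: dots are escaped, and the optional
# group ([^/]+\.)? also covers subdomains such as www.
SetEnvIfNoCase Referer "^https?://([^/]+\.)?semalt\.com([/:]|$)" spambot=yes
SetEnvIfNoCase Referer "^https?://([^/]+\.)?darodar\.com([/:]|$)" spambot=yes
SetEnvIfNoCase Referer "^https?://([^/]+\.)?econom\.co([/:]|$)" spambot=yes

# Apache 2.4 and later: grant access to everyone except flagged requests.
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env spambot
    </RequireAll>
</IfModule>

# Apache 2.2: same effect with the older Order/Allow/Deny directives.
<IfModule !mod_authz_core.c>
    Order allow,deny
    Allow from all
    Deny from env=spambot
</IfModule>
```

To block another referrer, copy one of the SetEnvIfNoCase lines and change only the domain. Keep in mind that these rules only affect requests that actually reach your web server.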

Filter spam traffic straight from Google Analytics

There is an even easier alternative, but it relies on fair play from the spambot operators, which is unlikely. Still, this method is perfectly complementary to the .htaccess approach above.

Google uses the IAB's inventory of spiders to power the checkbox in Analytics that filters out known spiders and bots (hence “known”). In case you did not know this checkbox exists, here is how to find it:

  • Log in to Google Analytics
  • Go to the Admin section (assuming you have the right access permissions)
  • Choose the Account and Property to autofilter
  • Choose the View that should receive the automated known spiders and bots filter
  • Open [Configuration] for the chosen View and look for the relevant checkbox near the end of that settings page

Configure your Properties in Google Analytics for robots and spiders autofilter.

Activate autofilter for known spiders and robots

Update: combining both filtering methods

After some days observing Analytics I spotted a truly surprising behavior. Both solutions seem to have their own strong points, but pay attention:

Implementing both solutions at once seems to cancel out their benefits and lets ghost traffic reach your Google Analytics data.

Below is empirical data collected from two twin Views in one of my web labs. Within the same time range, and looking specifically at referral traffic, the “htaccess only” version does the job and filters out undesirable spiders such as semalt, webbuttons, darodar, iloveitaly and others (as specified in the htaccess edit above):

Test ghost traffic filtering for semalt – This View picks up ghost hits and thus shows data distortion

Test ghost traffic filtering for semalt – No spam nor ghost traffic
