I’ve been sick of analytic spam for some time. I first noticed it in April of 2014. At the time we had lots of regular traffic coming to our main site, so I really didn’t look into it much. Later I noticed a large number of referrals, and all of them were showing as bounces. I finally started to take notice, but took no action. Recently, though, I’ve been doing some major research for a new project I’m working on and came across some new information on how to fight them.
As anyone with an analytics account will have noticed, there are now many more than one of these analytic spam bots visiting our sites. My business partner just kept telling me “build some filters in analytics”. It was the lazy way out, but he’s busy, so I didn’t press him too hard. I did build up a bunch of filters, but some of the spam kept showing up anyway (not all of it, but some). I wanted a solution that would get rid of the bots completely, and I didn’t want to jack up any of my live sites by messing around with .htaccess. So I decided to test things out on some sites that never left the development phase but still had analytics on them.
Not too long ago I started really digging into learning the ins and outs of Google Analytics. While doing some searching on analytic spam, I found something I’d never read about before: Google Tag Manager (GTM). Yes, I realize I’m a bit behind the times, but one thing that came up was a way to protect my analytics ID from the spammers. What I read was that the spam bots come by, scrape your analytics ID from your site, and then never have to visit your site again. They can do what is called “ghost spamming”. This is where they ping the analytics servers with your info and it registers as a hit on your site. Because of this, you can have your .htaccess file set up to block all traffic and you’ll still show visits to your site. I’ll talk more about this in a bit.
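To make the ghost spam idea a little more concrete, here is a rough sketch of what a single hit looks like once it is decoded. It assumes the hit follows Google’s publicly documented Measurement Protocol for Universal Analytics, where `tid` is the analytics ID, `dh` is the hostname, `dp` is the page path and `dr` is the referrer. The ID, hostnames and the `describe_hit` helper are placeholders made up for illustration, not anything taken from my accounts.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload. The tracking ID and both hostnames are placeholders.
SAMPLE_HIT = (
    "v=1&t=pageview&tid=UA-00000000-1&cid=555"
    "&dh=example.com&dp=%2F&dr=http%3A%2F%2Fspam-referrer.example%2F"
)


def describe_hit(payload: str) -> dict:
    """Decode a URL-encoded hit payload into a flat dict of its fields."""
    fields = parse_qs(payload)
    # parse_qs returns a list per key; each key appears once here, so keep the first value.
    return {key: values[0] for key, values in fields.items()}


hit = describe_hit(SAMPLE_HIT)
print("analytics ID:", hit["tid"])
print("claimed host:", hit["dh"])
print("claimed referrer:", hit["dr"])
```

Every field in the payload is just text supplied by whoever sends it. Nothing in it shows that a browser ever loaded a page on the site, which is why blocking traffic on the web server does nothing against this kind of hit.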
What I found out is that using the GTM container “hides” your analytics ID from the spam bots. This got me excited, and I wanted to try it out. I switched one of my sites over to GTM and loaded the analytics through it. The next day I still had spam. Then I remembered that they had already scraped my ID and could ghost spam. OK, so I deleted the analytics ID I had for a site that never made it out of development. The data was useless anyways, but if you want to keep yours, you can just create a new ID for the site and keep both running.
In addition to setting up GTM with a new analytics ID, I also added an updated .htaccess file to block the 22 analytic spam bots I had visiting my sites. This way none of those bots could access my site. Nearly an hour later I had my first spam under the new ID. I was beyond pissed. So I set up another analytics ID for a website that could not possibly exist, one made just to test an idea out.
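For anyone wondering what that kind of referrer blocking boils down to, here is a minimal sketch of the same check written in Python rather than .htaccess rules. The `BLOCKED_REFERRERS` set and the `is_blocked_referrer` function are names I made up for illustration, and the list holds only one spam referrer from my reports, not the full 22.

```python
from urllib.parse import urlparse

# Illustrative blocklist. A real one would hold every spam referrer seen in the reports.
BLOCKED_REFERRERS = {"floating-share-buttons.com"}


def is_blocked_referrer(referrer: str) -> bool:
    """Return True if the Referer header points at a blocked domain or one of its subdomains."""
    # urlparse only fills in hostname when the value has a scheme such as http://
    host = (urlparse(referrer).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_REFERRERS
    )


print(is_blocked_referrer("http://floating-share-buttons.com/"))
print(is_blocked_referrer("https://www.google.com/search?q=test"))
```

A check like this can only act on requests that actually reach the web server. A ghost hit goes straight to the analytics servers, so it never passes through this code or through .htaccess at all.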
I knew it would be at least an hour before I had anything to see, so I started looking at other things I had. My wife has a domain that she recently renewed after she accidentally let it expire. Its analytics show about 1500 spam hits over the 3 months the domain was expired. This is pretty much all I needed to see to know that ghost spamming is real. Below is an image of the traffic she had while her domain was expired.
I woke up today and checked the analytics for the new ID that I set up but did not put on a site. Spam. 17 different hits at noon, and 38 nearly 6 hours after. 32 are from floating-share-buttons.com. The only good news is that right now I only have 4 total bots trashing my analytic data, and not the 22 I’ve had to edit my .htaccess file to block. The problem is they’re still trashing my data.
I looked at the analytics dashboard next. The ID I’m given is an 8-digit number. My older sites start with 37, 47, and 49, and the new ones that I just set up last night start with 64. So based on what I’m seeing, they’re just assigning a new unique number that is likely 1 higher than the last one in their system. That means there are around 65 million analytics IDs out there. I run a crawler of my own on a pretty crappy VPS, and it can still crawl over 1 million URLs a day.
What I think is happening is that these spam bots don’t even need to visit our sites. They just cycle through the analytics ID numbers and tell Google we had traffic. If my crawler on a crappy VPS can crawl 1 million URLs a day, I’m sure that with a few much better servers they could run through 65 million numbers, pinging Google’s servers and recording fake traffic. With the money they’re making, I’m sure they could cycle through them 30+ times a day like the floating buttons spammers.
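Here is the back-of-the-envelope math behind that guess, written out as a short script. The inputs are my own rough estimates from above (65 million IDs, 1 million requests a day for my crawler, 30 passes a day), so treat the results as ballpark figures and nothing more.

```python
# Rough estimates taken from the post, not measured values.
ESTIMATED_IDS = 65_000_000        # guessed from the leading digits of new IDs
MY_CRAWLER_PER_DAY = 1_000_000    # what my cheap VPS crawler manages
PASSES_PER_DAY = 30               # how often the worst spammer seems to hit an ID
SECONDS_PER_DAY = 24 * 60 * 60

# Days for one machine like mine to touch every ID once.
days_for_one_pass = ESTIMATED_IDS / MY_CRAWLER_PER_DAY

# Total requests needed to hit every ID 30 times in a day.
requests_per_day = ESTIMATED_IDS * PASSES_PER_DAY
requests_per_second = requests_per_day / SECONDS_PER_DAY

# How many machines at my crawler's rate that volume would take.
crawler_equivalents = requests_per_day / MY_CRAWLER_PER_DAY

print(f"One pass at my crawler's rate: {days_for_one_pass:.0f} days")
print(f"Requests per day for {PASSES_PER_DAY} passes: {requests_per_day:,}")
print(f"Sustained rate: {requests_per_second:,.0f} requests per second")
print(f"Equivalent to {crawler_equivalents:,.0f} crawlers like mine")
```

With these numbers, one pass takes my crawler about 65 days, and 30 passes a day works out to about 1.95 billion requests, or roughly 22,570 per second. That is far more than my VPS could do, although firing off a tiny hit is presumably a lot cheaper than crawling a whole page.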
I’ve tried everything I can find to beat these guys. Because of a flaw in how Google has their analytics set up, which lets people fake data, there just is no way to beat them right now. I’ve done a number of searches trying to find anything from Google on what they’re doing about it, and I don’t see anything. I see lots of people trying to work on a fix, but I’m pretty sure the only fix at this point is to switch to a different analytics provider. And if these spammers are making anywhere near the money I assume they are, they’re likely already looking for flaws in other analytics software, so they can keep making money if people start jumping ship.
I hit up @googleanalytics on Twitter a few hours ago and reminded them:
At this point I think all we can do is bug them about it until they respond and tell us they’re going to fix it. We know that having a horrible bounce rate can hurt our sites’ rankings in the search results. We have no idea if these fake bounces are having the same effect. We need a fix for this. Please email or tweet them and use #analyticspam