Need to track users ... but not bots. Any ideas?
#1

skattabrain
I have a tracking script in place, and I need to differentiate between actual people and bots. The CI library's check for this,
Code:
$this->agent->is_robot()
doesn't seem to be doing the job at all. I know the docs say its list of robots is not comprehensive. So what do I do?

Is there some "current" list of user agents out there I can upload and update now and then?

I'm going to have a completely HUGE table if I don't deal with the bots.

Any ideas are appreciated.
#2

BrianDHall
Surely there is a file out there somewhere to help with this, but be aware that such lists are utterly dependent on bots correctly reporting their user agent.

Malicious bots (spammers and countless others) don't report their user agent honestly; they intentionally pretend to be regular users. There is simply no sorting those out without some major site trickery.
#3

skattabrain
Understood. This application is meant to weed out some noise, so if a bot wants to pretend it's a browser, I'm not going to lose sleep over it. But logs filled with thousands of entries from Google need to be avoided.

I've created my own tool: when I see a bot I don't like in my report, I add it to a purge list, and the purge then runs on a cron job.

This was a good start: http://www.user-agents.org/ but it looks like it hasn't been updated in a while. My solution works for now, though granted, it does require some manual effort.
#4

InsiteFX
Here is a link to a list of well-known search engine bots:

Search Engine Bots

Enjoy
InsiteFX
#5

cahva
Download the AWStats package and you'll find the file "wwwroot/cgi-bin/lib/robots.pm" in it. It has quite a nice list of robots. Of course some are missing, but you can add the rest to your purge list. :)
#6

skattabrain
@InsiteFX - thanks, but that list leaves a lot to be desired! :)

@Cahva - nice idea! I think I'll try that!
#7

davidbehler
Edit: please delete... I should start reading the topic before answering it...
#8

skattabrain
I checked the AWStats file, but I don't think it will work. Its entries aren't complete user agents, and some are so short that if I look for these patterns in every user agent, I'm afraid of false positives. Here are some examples of bots in that file: 'nhse', 'robi', 'rules', etc., and there are plenty more.

So here's what I've done: I use the User Agent class, and my log table has fields filled from the $this->agent->is_browser() function to mark "known" humans. Then I look at all traffic that is not "known good", select the agents I want to purge, and add them to my purge list. A cron job fires every hour to purge the unwanted traffic.
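The hourly purge step described above could be sketched roughly as follows. This is not the poster's actual code: it assumes hypothetical log rows with 'id', 'agent' and 'is_browser' keys (the flag stored at logging time from is_browser()), and a purge list of complete user agent strings. The function name `ids_to_purge` is also hypothetical.

```php
<?php
// Sketch of the cron purge step. Assumed (hypothetical) row shape:
// ['id' => int, 'agent' => string, 'is_browser' => bool].
// The purge list holds complete user agent strings picked by hand.

function ids_to_purge(array $rows, array $purgeList): array
{
    // Flip the list so each agent string becomes a key for fast lookup.
    $purge = array_flip($purgeList);
    $ids = [];
    foreach ($rows as $row) {
        // Never purge traffic already marked as a "known good" browser.
        if ($row['is_browser']) {
            continue;
        }
        // Purge only agents that were explicitly added to the list.
        if (isset($purge[$row['agent']])) {
            $ids[] = $row['id'];
        }
    }
    return $ids;
}

$purgeList = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
];

$rows = [
    ['id' => 1, 'agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'is_browser' => false],
    ['id' => 2, 'agent' => 'Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0', 'is_browser' => true],
    ['id' => 3, 'agent' => 'SomeUnknownAgent/0.1', 'is_browser' => false],
];

echo implode(',', ids_to_purge($rows, $purgeList)), "\n"; // 1
```

Exact matching on complete agent strings sidesteps the short-pattern false positives mentioned earlier; row 3 is neither a known browser nor on the list, so it stays in the log for manual review. In a real setup the returned ids would feed a DELETE query run by the cron job, which is left out here.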
#9

InsiteFX
Also, a bot does not need to send a user agent at all. Plus, a bot can hide behind a proxy; some even hook on to Google's or Yahoo's proxy servers. So there is really no way to catch them all. Look at your server log files and you will see what I am talking about!

Enjoy
InsiteFX
#10

skattabrain
Yeah, I have seen it, but I need to at least cut the noise. From what I can see, though, CI's ability to recognize a real person seems decent, so maybe I should be less careful and just use $this->agent->is_browser().