User Agent Library Question
#1

[eluser]tinawina[/eluser]
I have a click-tracking scenario in place that is getting clobbered by robots. I have a robots.txt file in place and that is helping somewhat.

I am wondering about how best to deploy the User Agent class. If I wanted to ensure that robots -- good or bad ones -- do not access a part of my site, would the following code do the trick?

Code:
if ( ($this->agent->is_browser() === true) || ($this->agent->is_mobile() === true) )
{
   // allow action
}
else
{
   // do not allow action
}

I know that there is a check specifically for robots ($this->agent->is_robot()), but the list of robots in the config for user agents is short, and robots are cropping up all of the time. I'm wondering if this would be more of a catch-all.

Thanks for any help/insight!
#2

[eluser]tinawina[/eluser]
Ok - I've been doing a lot of reading up about robots today.... Here's where I've landed.

We want to avoid incrementing the page-view count when a robot visits the page. We do want good robots (e.g., Google) to index the page, however.

I put this code in place in my page-display controller a little earlier today and am feeling better about things. My approach is to check for a definite robot (definite meaning it is saved to my /config/user_agents file already) or a potential robot that I need to investigate and save to my config file if appropriate.

I sure would appreciate a heads-up from anyone out there who sees a problem or potential "yikes!" with this approach:

Code:
$this->load->library('user_agent');

$info = $this->agent->agent_string();
    
/* We check that the user-agent is not a browser or a mobile device.
   If it is either -- the click should be counted (assumption: this is a real person).
   If it is neither, it is possibly a robot.
*/

if ( ($this->agent->is_browser() === false) && ($this->agent->is_mobile() === false) )
{
    $where = array('info' => $info); // check whether user_agent info already exists in our dbase
    
    $get_records = $this->omni_data->getRecords('robots', $where);
    
// If no records are returned, this user-agent's info is not in our
// robots table yet, so we insert a new record for this user-agent.

    if ($get_records === false)
    {
        $ip_address = $this->input->ip_address();
    
        $data = array(
            'page' => 'ratings',
            'ip_address' => $ip_address,
            'info' => $info
        );
        $this->omni_data->insert('robots', $data);

/* Now email me to let me know there's a potential robot to add
   to my /config/user_agents file
*/
        mail('[email protected]', 'New potential robot user-agent', $info, 'FROM: [email protected]');
    }
}
    
/* Now we do a quick check of the $this->agent->agent_string() variable.
   Looking at the data I collected just today, many of the bots include
   "bot", "spider", "search", or "crawl" in this string. If that is true
   in this instance, do not let this bot's visit be counted in click-data.
   (This is an on-the-fly catch of any robot that is not in our
   /config/user_agents file yet.) As well, if $this->agent->is_robot()
   returns true, don't count the visit.
*/

// Count the click only if no robot keyword matches AND the library does not flag a robot.
if ( !preg_match('/bot|spider|crawl|search/i', $info) && ($this->agent->is_robot() === false) )
{
    // Ok - I'm pretty certain this is not a robot. Update clicks_listing_page table
}
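
For illustration, the keyword check described in the comment above can be pulled out into a small function that is testable outside the controller. This is only a sketch: looks_like_robot() and should_count_click() are hypothetical names, not part of CodeIgniter, and the keyword list is just the four words mentioned in this post.

```php
<?php
// Hypothetical helper (not a CodeIgniter function): true when the
// user-agent string contains a keyword that often appears in crawler names.
function looks_like_robot($agent_string)
{
    // Case-insensitive match on the four keywords observed in the logs.
    return preg_match('/bot|spider|crawl|search/i', $agent_string) === 1;
}

// Hypothetical helper: count the click only when BOTH checks pass,
// i.e. no keyword matched and the caller's known-robot flag is false.
function should_count_click($agent_string, $is_known_robot)
{
    return !looks_like_robot($agent_string) && !$is_known_robot;
}
```

In the controller, the second argument would presumably be $this->agent->is_robot(). Keyword matching of this kind can produce false positives (for example, a browser toolbar that adds "search" to the agent string), so it is a heuristic at best.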


I updated our /config/user_agents file so that it now includes the following robots (will be updating this as needed from here on out):

Code:
'askjeeves'    => 'AskJeeves',
'baiduspider'    => 'Baiduspider',
'cazoodlebot'    => 'CazoodleBot',
'charlotte'    => 'searchme',
'dotbot'    => 'DotBot',
'fastcrawler'    => 'FastCrawler',
'gigabot'    => 'Gigablast',
'googlebot'    => 'Googlebot',
'infoseek'    => 'InfoSeek Robot 1.0',
'msnbot'    => 'MSNBot',
'slurp'        => 'Inktomi Slurp',
'yahoo'        => 'Yahoo',
'lycos'        => 'Lycos',
'teoma'        => 'Ask Jeeves Teoma',
'twiceler'    => 'Twiceler'
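
As a rough illustration of how a key => name list like this can be applied, the sketch below treats each key as a case-insensitive substring of the agent string. That matching rule is an assumption about how the library uses the config, and match_robot() is a hypothetical helper, not a CodeIgniter function; only a subset of the entries is repeated here.

```php
<?php
// Illustrative subset of the robots list above (key => display name).
$robots = array(
    'googlebot' => 'Googlebot',
    'msnbot'    => 'MSNBot',
    'slurp'     => 'Inktomi Slurp',
    'twiceler'  => 'Twiceler',
);

// Hypothetical helper: returns the display name of the first key found
// anywhere in the agent string (case-insensitive), or null if none match.
function match_robot($agent_string, array $robots)
{
    foreach ($robots as $key => $name) {
        if (stripos($agent_string, $key) !== false) {
            return $name;
        }
    }
    return null;
}
```

Under this assumption, keys work best as short lowercase tokens that appear verbatim in the crawler's agent string.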

Does this look like I've got it all handled? Thanks for any help!



