Can I - or should I - block bots from creating DB sessions

[eluser]little brittle[/eluser]
I have my CI site using DB sessions with 'sess_use_database'=true, and set cookie expiration for 2 weeks.

The problem is, I currently have 60,000 rows in my CI_Sessions table, and I'm not getting a ton of traffic. Based on the useragent, many of those sessions are for Googlebot or other bots, sometimes with the same IP address. None of these bots require a session to retrieve the data they need. My concern is that when my site is getting a thousand times more traffic, I will have millions of unnecessary database entries, making it more expensive to find and retrieve valid session data.

Is there a way to prevent bots from creating sessions? Is it a good idea? Has anyone encountered something like this?

I like the idea.

If you can see sessions for bots, then the sessions library must be writing the session to the database, and I agree that it shouldn't be. I think the sessions library may need to be rewritten to take into account any clients that don't support cookies (such as bots).

I would have thought that the sessions library would clear old entries once the cookie is due to expire, but I could be wrong.

60,000 is a very long way from millions, and a database should be able to happily work with a few million rows anyway, especially if you're just querying a single word or number in a single column.

You could modify the sessions lib to utilize the User Agent library's is_robot() method.

[eluser]little brittle[/eluser]
[quote author="simshaun" date="1234410641"]You could modify the sessions lib to utilize the User Agent library's is_robot() method.[/quote]
Is there an easy way to do that? I'd rather not edit core files, since it makes it a pain to upgrade.

Create a file in your ./system/application/libraries directory named MY_Session.php

Inside that, create a class named MY_Session that extends CI_Session.

Override any necessary functionality.

Load the session library as usual (without the MY_ prefix); CI will pick up MY_Session automatically.
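The steps above could look roughly like this. It's a sketch only, and it rests on an assumption: that the 1.7-era CI_Session::sess_create() checks $this->sess_use_database before inserting a row, so switching the flag off for the current request leaves the bot with a cookie-only session. ua_looks_like_bot() is a made-up helper with a deliberately crude keyword list; inside a CI app the User Agent library's is_robot() could stand in for it. The class_exists() guard is only there so the file also runs outside CI.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): crude keyword match on the
// user agent string. The keyword list is illustrative, not exhaustive.
function ua_looks_like_bot($user_agent)
{
    $needles = array('bot', 'crawl', 'spider', 'slurp');
    foreach ($needles as $needle) {
        if (stripos((string) $user_agent, $needle) !== FALSE) {
            return TRUE;
        }
    }
    return FALSE;
}

// Guarded so this file can be run on its own; inside CI the loader has
// already included the core Session library, so CI_Session exists.
if (class_exists('CI_Session')) {
    class MY_Session extends CI_Session
    {
        // Override: for bots, skip the DB insert and fall back to a
        // cookie-only session for this request.
        function sess_create()
        {
            $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
            if (ua_looks_like_bot($ua)) {
                // Assumption: sess_create() only writes to the DB
                // when this flag is TRUE.
                $this->sess_use_database = FALSE;
            }
            parent::sess_create();
        }
    }
}
```

Saved as ./system/application/libraries/MY_Session.php, nothing else in the app should need to change. Bots that spoof a browser user agent will still get a DB row.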

Just turn up the garbage collection probability (the rand() check) for the session management.

Are you only storing session data for two weeks for those users that actually are logged in?

[eluser]little brittle[/eluser]
Thanks for the help TheFuzzyOne, I'll try that out.

bd3521: Yes, the earliest record in my sessions table is from two weeks ago. It doesn't appear to be storing anything older than that. It just seemed like a lot of unnecessary data was being retained, and I wanted to find a workaround.

I just looked at my database, and the ci_sessions table accounts for 80% of my db size, and it keeps growing by leaps and bounds. I'm surprised this issue hasn't been discussed more.

Session Library:: function _sess_gc()
var $gc_probability = 5;

Bump this up a little bit in MY_Session, as TheFuzzyOne suggested.

[eluser]little brittle[/eluser]
[quote author="bd3521" date="1234864432"]Session Library:: function _sess_gc()
var $gc_probability = 5;

bump this up a little bit as the TheFuzzyOne suggested in My_Session[/quote]
But doesn't garbage collection only delete expired sessions that are lingering in the database? I don't think I have any sessions that are expired. My problem is that I have legit sessions created by robots that haven't expired yet. I'm trying to find a way to prevent them from being entered in the DB.
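To make the distinction concrete, here is a rough standalone model of the two checks garbage collection appears to make: a random roll against gc_probability, then a delete of rows whose last_activity is older than now minus the expiration. The function names are hypothetical and this is not the library's actual code, just the logic as described in this thread.

```php
<?php
// Hypothetical model of the GC decision; names are made up for illustration.

// $roll stands in for a random number in 0..99.
function gc_should_run($gc_probability, $roll)
{
    return $roll < $gc_probability;
}

// Rows with last_activity below this timestamp count as expired.
function gc_cutoff($now, $sess_expiration)
{
    return $now - $sess_expiration;
}

function is_expired($last_activity, $now, $sess_expiration)
{
    return $last_activity < gc_cutoff($now, $sess_expiration);
}

$two_weeks = 14 * 24 * 60 * 60;   // 1209600 seconds
$now       = 1234864432;          // timestamp taken from the quoted post

// A bot session from 13 days ago has not expired, so GC would keep it
// no matter how often it runs.
var_dump(is_expired($now - 13 * 86400, $now, $two_weeks)); // bool(false)

// A session from 15 days ago is past the cutoff and would be deleted.
var_dump(is_expired($now - 15 * 86400, $now, $two_weeks)); // bool(true)
```

So raising gc_probability only changes how often the cleanup fires. It would not remove unexpired bot sessions; a shorter expiration or skipping the insert for bots would.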

What are some of your oldest ci_sessions activity dates?
