Can I (or should I) block bots from creating DB sessions?
#1

little brittle
My CI site uses DB sessions ('sess_use_database' = true), with the cookie expiration set to 2 weeks.
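For reference, a setup like this would look roughly as follows in application/config/config.php. This is a sketch assuming CodeIgniter 1.7-era key names; sess_table_name is shown at its default value, and the two-week figure is written out in seconds, which is the unit sess_expiration takes.

```php
<?php
// Excerpt (sketch) of application/config/config.php, CodeIgniter 1.7-era keys.
$config['sess_use_database'] = TRUE;              // store each session as a DB row
$config['sess_table_name']   = 'ci_sessions';     // default session table name
$config['sess_expiration']   = 14 * 24 * 60 * 60; // two weeks = 1209600 seconds
```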

The problem is that I currently have 60,000 rows in my CI_Sessions table, and I'm not getting a ton of traffic. Based on the user agent, many of those sessions belong to Googlebot or other bots, sometimes from the same IP address. None of these bots need a session to retrieve the data they want. My concern is that once my site gets a thousand times more traffic, I'll have millions of unnecessary database entries, making it more expensive to find and retrieve valid session data.

Is there a way to prevent bots from creating sessions? Is it a good idea? Has anyone encountered something like this?
#2

TheFuzzy0ne
I like the idea.

If you can see sessions for bots, then the sessions library must be writing the session to the database, and I agree that it shouldn't be. I think the sessions library may need to be rewritten to take into account any clients that don't support cookies (such as bots).

I would have thought that the sessions library would clear old entries once the cookie is due to expire, but I could be wrong.

60,000 is a very long way from millions, and a database should be able to happily work with a few million rows anyway, especially if you're just querying a single word or number in a single column.
#3

simshaun
You could modify the sessions lib to utilize the User Agent library's is_robot() method.
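Roughly speaking, is_robot() compares the visitor's user-agent string, case-insensitively, against the keys of the $robots array in config/user_agents.php. The stand-alone sketch below reproduces that idea; the function name is_robot_agent() and the three-entry list are hypothetical, so check the real library and list in your own install before relying on them.

```php
<?php
// Hypothetical robot list, modeled on the $robots array in CodeIgniter's
// config/user_agents.php; keys are matched as case-insensitive substrings.
$robots = array(
    'googlebot' => 'Googlebot',
    'msnbot'    => 'MSNBot',
    'slurp'     => 'Inktomi Slurp',
);

// Returns true when the user-agent string contains any known robot key.
function is_robot_agent($userAgent, array $robots)
{
    foreach (array_keys($robots) as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Usage: a session library would skip the DB write when this returns true.
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(is_robot_agent($ua, $robots)); // bool(true)
```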
#4

little brittle
[quote author="simshaun" date="1234410641"]You could modify the sessions lib to utilize the User Agent library's is_robot() method.[/quote]
Is there an easy way to do that? I'd rather not edit core files, since it makes it a pain to upgrade.
#5

TheFuzzy0ne
Create a file in your ./system/application/libraries directory named MY_Session.php

Inside that, create a class named MY_Session that extends CI_Session.

Override any necessary functionality.

Keep loading the library with the usual name, $this->load->library('session'); CodeIgniter picks up MY_Session automatically.
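The override pattern can be illustrated without CodeIgniter itself. In this sketch, SessionStore is a hypothetical stand-in for CI_Session and BotAwareSessionStore for MY_Session; the class names, the create() method and the in-memory $rows array are all assumptions. In a real MY_Session you would override whichever CI_Session method performs the INSERT (in the 1.7 library that appears to be sess_create(); verify against your copy of system/libraries/Session.php).

```php
<?php
// Hypothetical stand-in for CI_Session: create() "inserts" a row into an
// in-memory array instead of the ci_sessions table.
class SessionStore
{
    public $rows = array();

    public function create($sessionId, $userAgent)
    {
        $this->rows[$sessionId] = array(
            'user_agent'    => $userAgent,
            'last_activity' => time(),
        );
    }
}

// Hypothetical stand-in for MY_Session: same API, but robots never get a row.
class BotAwareSessionStore extends SessionStore
{
    private $robotNeedles = array('googlebot', 'msnbot', 'slurp');

    public function create($sessionId, $userAgent)
    {
        if ($this->isRobot($userAgent)) {
            return; // skip the write; the crawler does not need a stored session
        }
        parent::create($sessionId, $userAgent); // normal visitors behave as before
    }

    private function isRobot($userAgent)
    {
        foreach ($this->robotNeedles as $needle) {
            if (stripos($userAgent, $needle) !== false) {
                return true;
            }
        }
        return false;
    }
}

$store = new BotAwareSessionStore();
$store->create('a1', 'Googlebot/2.1 (+http://www.google.com/bot.html)');
$store->create('b2', 'Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0');
echo count($store->rows), "\n"; // 1
```

Because only one method is overridden and everything else is delegated to parent::, the core file stays untouched, which keeps upgrades simple.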
#6

bd3521
Just turn up the garbage-collection probability in the session library (the random roll that decides whether old sessions get purged).

Are you only storing session data for two weeks for those users that actually are logged in?
#7

little brittle
Thanks for the help, TheFuzzy0ne, I'll try that out.

bd3521: Yes, the earliest record in my sessions table is from two weeks ago. It doesn't appear to be storing anything older than that. It just seemed like a lot of unnecessary data was being retained, and I wanted to find a workaround.

I just looked at my database, and the ci_sessions table accounts for 80% of my db size, and it keeps growing by leaps and bounds. I'm surprised this issue hasn't been discussed more.
#8

bd3521
Session library, function _sess_gc():
var $gc_probability = 5;

Bump this up a little in MY_Session, as TheFuzzy0ne suggested.
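Assuming the 1.7 library behaves as its source suggests, _sess_gc() runs on about gc_probability percent of requests and then deletes only rows whose last_activity is older than sess_expiration. The hypothetical function below models that logic with an in-memory array and an injectable random source so the outcome can be checked; the function name and data layout are illustrative only.

```php
<?php
// Hypothetical model of probabilistic session garbage collection.
// $rows: session_id => last_activity (Unix timestamp).
// $rand: callable returning a non-negative int; injected so the roll is testable.
function sess_gc_sketch(array $rows, $now, $expiration, $probability, callable $rand)
{
    // Run the cleanup on roughly $probability percent of calls.
    if (($rand() % 100) >= $probability) {
        return $rows;
    }
    $cutoff = $now - $expiration;
    // Keep only rows whose last activity is at or after the cutoff.
    return array_filter($rows, function ($lastActivity) use ($cutoff) {
        return $lastActivity >= $cutoff;
    });
}

$now  = 1000000000;       // any fixed timestamp
$week = 7 * 24 * 60 * 60;
$rows = array(
    'old_visitor' => $now - 3 * $week, // older than two weeks: expired
    'googlebot'   => $now - 1 * $week, // robot session, still inside the window
);

// Force the cleanup to run: a roll of 0 is below any probability above 0.
$after = sess_gc_sketch($rows, $now, 2 * $week, 5, function () { return 0; });
print_r(array_keys($after)); // only 'googlebot' survives
```

With a two-week expiration, the robot session from last week survives even when the cleanup is forced to run, so raising gc_probability only purges expired rows sooner; it does not stop robots from creating new ones.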
#9

little brittle
[quote author="bd3521" date="1234864432"]Session library, function _sess_gc():
var $gc_probability = 5;

Bump this up a little in MY_Session, as TheFuzzy0ne suggested.[/quote]
But doesn't garbage collection only delete expired sessions that are lingering in the database? I don't think I have any sessions that are expired. My problem is that I have legit sessions created by robots that haven't expired yet. I'm trying to find a way to prevent them from being entered in the DB.
#10

bd3521
What are some of your oldest ci_sessions activity dates?