Checking for non-logged in users before showing cached pages [Adv.]
#1

[eluser]Xeoncross[/eluser]
Caching the HTML output of non-important pages (like blog posts) really helps to speed up a site by skipping the loading of the entire system (and the rendering time) and just spitting out a pre-made copy of the page. This is one way to keep large waves of traffic to your site index (or whatever) from eating resources.

However, one of the problems with caching pages is that in order to check whether or not you can show the page, you have to load the system (DB, session lib, etc.) to see if the user is logged in. This somewhat defeats the point of showing the cached page, since you've wasted all that memory anyway.

With the session lib loaded you can then determine whether it is safe to show a cached version of the page or whether you should re-render the page for the logged-in user. And vice versa; a page rendered for a logged-in user shouldn't be shown to a guest!

So here is my idea. Since most sites use cookies for sessions, and cookies can be checked without loading any libraries - why not check them?

At first I thought that I would test for the existence of ANY cookie

Code:
if (empty($_COOKIE)) {
    // No cookies at all, so this can't be a logged-in user.
    show_cache(); // fictitious helper that outputs the cached page
    die();
}

...

But the problem is that in order for the page to be created the first time, the whole system has to be loaded - and by the end of that a session has already been started for the user, so I couldn't create the cache because the check would now fail.


Code:
...

if (empty($_COOKIE)) {
    // Never reached: by this point the session lib has already set a cookie.
    create_cache(); // fictitious helper that saves the rendered page
    die();
}


So then I thought of another way to handle this - perhaps you could set a cookie called "logged_in" in addition to the session cookie when a user successfully logs in. Then you could test for the "logged_in" cookie, and its absence would prove the visitor is a guest whether they have a session or not. It would even work for non-cookie user agents (a false/non-existent cookie as a sign of a guest).
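
For illustration, here is roughly what I mean. The cookie name and these helper functions are made up for the example, not part of CI or any library:

Code:
// Called right after a successful login, alongside whatever the session lib sets.
function mark_logged_in()
{
    // The value doesn't matter for security; it's only a hint for the cache check.
    setcookie('logged_in', '1', time() + 7200, '/');
}

// Called on logout so the visitor is treated as a guest again.
function mark_logged_out()
{
    setcookie('logged_in', '', time() - 3600, '/');
}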

Any Ideas?

Quote:NOTE: this only pertains to full-page caching. Partial-page caching (widgets, SQL results, etc.) should be handled with memcached and/or other methods.
#2

[eluser]Damien K.[/eluser]
If I understand you correctly, you're trying to manage cached pages for authenticated users, although your subject suggests otherwise. I'm not sure if this will help you, but I'll throw it out there anyway.

Generally speaking, I find that most protected pages are dynamic and thus have little use for caching. Caching the site index page, which is public, may make sense for some websites. But say you do have a need: for example, you want to show protected blog posts, which rarely change after they are posted. One may then take the route of caching those posts and recreating the cache whenever an individual post is updated.

This begs the question: how do you handle authentication in the first place? In order to access any protected URL (i.e., page), you always have to detect whether the user is logged in, and to do this securely you have to do it on the server side. Other technologies such as Java and .NET handle session data on the application server (a database is not always required, unless you have a session server because your web/application servers are load balanced), so testing whether a user is logged in can be done simply by checking a flag in the session. However, for PHP/CI you will have your sessions in a database for validation against user cookies. So before you even get to your caching "problem" for protected pages, you have to test whether the user is authenticated/authorized. You cannot skip this test for protected pages, hence you will always be hitting the database (to validate the cookie).
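
To make that concrete, here is a rough sketch of what every protected request ends up doing with database-backed sessions in CI; the controller, flag name and error message are just examples, not your actual code:

Code:
class Posts extends Controller {

    function edit($id)
    {
        // Loading the session library already hits the database to validate
        // the visitor's cookie, before we even look at any flag.
        $this->load->library('session');

        if ( ! $this->session->userdata('logged_in'))
        {
            show_error('You must be logged in to edit posts.');
        }

        // ...render the protected page for the authenticated user...
    }
}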

For public pages, on the other hand, you can serve the cached page if they're static. If they're dynamic, then there's no point in caching unless you want to refresh the content of the page over a certain period of time, like every 2 hours.

One feature available in "enterprise" technologies is that you can pre-compile pages, which is something like caching. You can do something similar by creating a cache of all of your pages upon "application start-up". One way to do this is to "visit" all your pages the way a bot would, so that the initial visit triggers a "create_cache()" for each page.
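
A crude way to do that in PHP would be a small warm-up script run at deploy time; the URL list here is obviously hypothetical:

Code:
// Request every public URL once so each page's cache gets created up front.
$urls = array(
    'http://example.com/',
    'http://example.com/blog',
    'http://example.com/about',
);

foreach ($urls as $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // fetch and discard the HTML
    curl_exec($ch);
    curl_close($ch);
}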

Something tells me that your authentication/authorization tier is not clearly separated from the business logic of your application, or at least not "securely" implemented, hence you're running into this "problem". I'm not quite sure what you're asking exactly, so maybe I'm WAY off. But I hope this has helped.

Performance is not as high a concern for me, but many in the community think otherwise. If it is, you probably won't leverage a framework for those particular use cases. Leveraging tools from the developer community usually comes with some sort of performance hit. However, I'll take the hit for what the tools can offer. Furthermore, if you have that much traffic causing a performance issue, you will probably have enough revenue or justification to scale up and out.
#3

[eluser]Xeoncross[/eluser]
Thanks for the reply. My topic, apparently, is relatively novel, which is why I can't find much information on it. Let me try to explain it further.

Assume:
User Auth library uses sessions
Sessions are stored in DB
Checking a user's session requires loading the DB & user lib.

Let's say you have a page like a blog post. Most people will see the post and comments. Admins will see edit links next to each comment. Here we have two forms of the same page. Normal CI page caching checks for the existence of a cache BEFORE loading the system - so your user lib, sessions, and database are not loaded and can't even be checked. Therefore, if you create a cache for a guest - then an admin will also see that page (without the edit links), because at this point the system can't tell an admin from a guest.

The solution to this problem is to load the user lib, sessions, and database and THEN check to see if it is alright to show the cache. This is inefficient.

So my idea is to base the check on something that can be accessed without loading all of this, something that is native to PHP, something like... cookies. Now hear me out on my cookie idea, and keep in mind that this has nothing to do with actually logging people in based on this cookie.


A user logs in and is given a session cookie along with a cookie called "logged_in" which is set to true. This cookie's only point in life is to give us a heads-up that the user is *probably* logged in. All guests and non-logged-in people have no such cookie (unless they manually create one in their browser).

So back to the blog post. First we check for isset($_COOKIE['logged_in']). If it is found, then we know that this user has probably logged in, so we can justify spending our resources to load the DB, sessions, and user lib to check for sure. If this cookie is not found, then we know they couldn't possibly be logged in, so we show the cache.

So if a guest manually creates a $_COOKIE['logged_in'], our system wastes time showing them fresh pages (but they still don't see edit links), admins with a valid $_COOKIE['logged_in'] see fresh pages (with links), and everyone else sees caches.
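
As a rough sketch, show_cache() from earlier would boil down to something like this at the very top of index.php, before the framework loads (the cache path and file-naming scheme are made up):

Code:
// Anyone without the "logged_in" hint cookie gets the pre-built file, if it exists.
if (empty($_COOKIE['logged_in']))
{
    $cache_file = './cache/'.md5($_SERVER['REQUEST_URI']).'.html';

    if (is_file($cache_file))
    {
        readfile($cache_file);
        exit;
    }
}

// ...otherwise fall through and boot the framework as usual...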

make sense?
#4

[eluser]BrianDHall[/eluser]
Makes sense to me - I don't see anything wrong with your logic or theory. It seems a perfectly reasonable way to short-circuit public semi-static page rendering.
#5

[eluser]Damien K.[/eluser]
@OP, thank you for putting your question in context. I'm inclined to say that this use case is not a good candidate for caching, because (1) comments are dynamic (new ones can be added) and caching will not show new comments. Okay, we'll assume the post has closed off comments because it's an old post and the admin still wants to be able to edit the comments. And (2) because the page contains "dynamic" content (i.e., edit links).

Clearly, we have members of the community who insist on caching. :-)

I see why you're doing this, though. I assume you have a few million hits a day viewing your blog archives and cannot scale. ;-) First, how are you caching your pages? CI's built-in feature or your own? I see you mention CI, and I also see "show_cache()" and "create_cache()", which I don't think are CI functions. If you're using CI, where are you testing for the logged_in cookie?

However, the short (and, I believe, ideal) answer is: don't cache. I'm not trying to evade your question, because this is clearly implementable.

Btw, to put things in perspective, I don't believe Facebook even worries about these things. It is unlikely that their code-base is even efficient, at least not the majority of it. I don't have evidence to back this up, but the people who code for them are no different from you and I. Facebook would scale via hardware, which is the cheapest way. I do like how you're approaching this, though.
#6

[eluser]Xeoncross[/eluser]
[quote author="Damien K." date="1253604321"]I'm inclined to say that this use case is not a good candidate for caching, because (1) comments are dynamic (new ones can be added) and caching will not show new comments.
[/quote]

That is solved by removing the cache whenever a new comment is added by a logged-in user. So it's not a problem.
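
Sticking with the made-up file-cache scheme from my earlier sketch, invalidation is just deleting the file when a comment is saved:

Code:
// Hypothetical helper: drop the cached copy of a post so the next guest
// request regenerates it with the new comment included.
function delete_post_cache($post_id)
{
    $cache_file = './cache/'.md5('/blog/post/'.$post_id).'.html';

    if (is_file($cache_file))
    {
        unlink($cache_file);
    }
}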


Quote:I see why you're doing this though. I assume you have a few million hits a day viewing your blog archives and cannot scale.

Actually, I am a die hard performance guy that can't stand page loads over 40ms.


Quote:First, how are you caching your pages? CI's built-in feature or your own? I see you mention CI and I also see "show_cache()" and "create_cache()", which I don't think are CI functions. If you're using CI, where are you testing for the logged_in cookie?

Actually, the use of CI is just an example. Any framework will work. And yes, the show/create_cache() functions are fictitious.

Quote:However, the short (and I believe is the ideal) answer is: don't cache.

What if I were to tell you that not caching would result in a 10-fold increase in the required resources? Memcache and partial-page caching are what take a site from 100 pages a second to 1,000+.
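
For the partial-page side, I mean something along these lines (the server address, key, and TTL are arbitrary, and build_recent_comments_html() is a stand-in for whatever expensive query builds the fragment):

Code:
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Try the cached fragment first.
$widget = $mc->get('widget:recent_comments');

if ($widget === false)
{
    $widget = build_recent_comments_html(); // hypothetical, hits the DB
    $mc->set('widget:recent_comments', $widget, 120); // keep it for 2 minutes
}

echo $widget;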

Plus I really don't like managing 5 servers when one will work. ;-)
#7

[eluser]Damien K.[/eluser]
Quote:Actually, I am a die hard performance guy that can’t stand page loads over 40ms.

Okay, sounds fair.

Quote:What if I were to tell you that not caching would result in a 10 fold increase in the required resources? Memcache and partial page caching is what takes a site from 100 pages a second to +1,000.

If I recall correctly, this is O(n) complexity. In computer theory, it means insignificant re: performance. :-)

Quote:So back to the blog post. First we check for isset($_COOKIE[‘logged_in’]). If it is found, then we know that this user has probably logged in, so we can justify spending our resources to load the DB, sessions, and user lib to check for sure. If this cookie is not found, then we know they couldn’t possibly be logged in, so we show the cache.

So if a guest manually creates a $_COOKIE[‘logged_in’], our system wastes time showing them fresh pages (but they still don’t see edit links), admins with a valid $_COOKIE[‘logged_in’] see fresh pages (with links), and everyone else sees caches.

Looks like you have solved your own problem? Are you looking for an alternate implementation? If you're thinking about doing this in CI, you may have to fight the framework.
#8

[eluser]Chad Fulton[/eluser]
Damien:

I think you underestimate the power of page caching for non-logged-in users. (btw, Facebook most certainly caches its content quite aggressively)

Xeoncross:

I think this sounds like a good solution. I'm not sure how you're implementing it, but I have used the cache_override hook for this sort of thing, so if you haven't seen it, you might look into it.
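
If it helps, the general shape is something like this (hooks have to be enabled in config.php, and the class/file names plus the cache path scheme below are my own examples, not CI's internals):

Code:
// application/config/hooks.php
$hook['cache_override'] = array(
    'class'    => '',
    'function' => 'check_page_cache',
    'filename' => 'cache_hook.php',
    'filepath' => 'hooks'
);

// application/hooks/cache_hook.php
function check_page_cache()
{
    // Guests (no logged_in hint cookie) get the cached copy if one exists.
    if (empty($_COOKIE['logged_in']))
    {
        $cache_file = APPPATH.'cache/'.md5($_SERVER['REQUEST_URI']).'.html';

        if (is_file($cache_file))
        {
            readfile($cache_file);
            exit;
        }
    }
    // Fall through: the request continues and the page renders normally.
}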
#9

[eluser]Damien K.[/eluser]
I appreciate the need for caching. I'm certain large sites have some sort of caching in place. I'm a strong believer in code refactoring and in crossing bridges when I reach them. Xeoncross is different in that he has a preference for, or "specialization" in, it.

cache_override... interesting. Didn't come across this in my readings...
#10

[eluser]kurucu[/eluser]
It sounds like a great idea, Xeoncross. I'd personally not worry until the site was large enough for it to be a problem, but I see the point completely. What's more, in addition to your performance sentiments, doing it early will likely save a lot of time when it later becomes necessary.

What I would say is that an understanding of the site in question is important. If, for example, a public blog is commented on every few moments, then you either give up caching, as creating and destroying the cache becomes a burden, or only update the cache every 10 minutes. Deciding on matters like that depends on the system in question. Also, with some knowledge of the session handling of the implemented system, adding new cookies may not be necessary, as there might already be nice easy hooks to grab hold of.
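
A time-based refresh can be as simple as comparing the cache file's age before serving it (the path scheme here is only illustrative):

Code:
$cache_file = './cache/'.md5($_SERVER['REQUEST_URI']).'.html';
$max_age    = 600; // ten minutes, in seconds

// Serve the cache only while it is still fresh enough.
if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age)
{
    readfile($cache_file);
    exit;
}

// ...otherwise render the page and write the output back to $cache_file...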

Of course the implementation you hinted at is possible, but I would argue that after all this it might only be relevant to some sites/pages. I've stuck to partial caching so that the dynamic content stays dynamic and the relatively static content is cached (general page elements, blog content, etc.). I wouldn't want pre-caching of the whole page to break things like form validation and flash messages.

Essentially, if you're escaping your framework (and therefore probably the business logic of your site, as it was written within the framework), either your framework is too heavy or it has been misused.

Wow. What a conflicted post I've written!



