Cache Functions Suggestions
#31

[eluser]Lewis[/eluser]
Actually, section31's solution is the best. Here's a (summarised) extract from 'Advanced PHP Programming' by George Schlossnagle.

Advisory locks work well, but there are a few reasons to consider not using them:
- If your files reside on an NFS filesystem, flock is not guaranteed to work at all.
- Certain operating systems (Windows) don't allow non-blocking flocks (see the sketch after this list).
- If more than one person accesses the page during a write, each one will still generate and overwrite the content multiple times.
- If the process crashes while writing to the cache, a reading process will read the partial, corrupt file.
- There are certain (although rare) situations in which the lock is never removed. (I believe an EE user had this problem - I saw a link somewhere to a forum post about it.)
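
To make the non-blocking flock point concrete, here's a rough illustration - this is not from the book, and the cache-file handling around it is assumed:

Code:
<?php
// Illustration only: a non-blocking advisory lock attempt with flock().
// LOCK_NB tells flock() to return immediately instead of waiting for the
// lock - the mode the excerpt says Windows doesn't support.
$fp = fopen($cache_file, 'c');

if ($fp !== false && flock($fp, LOCK_EX | LOCK_NB))
{
    // We hold the exclusive lock: safe to rewrite the cache here.

    flock($fp, LOCK_UN);
}

if ($fp !== false)
{
    fclose($fp);
}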

File swaps work by taking advantage of a nuance. When you use unlink() on a file, what really happens is that the filename-to-inode mapping is removed. The filename no longer exists, but the storage associated with it remains unchanged (for the moment), and it still has the same inode associated with it. In fact, the operating system does not reallocate that space until all open file handles on that inode are closed. This means that any processes that are reading from that file while it is unlinked are not interrupted; they simply continue to read from the old file data. When the last of the processes holding an open descriptor on that inode closes, the space allocated for that inode is released back for reuse.

... It then goes on to give this example code (I've shortened it):

Code:
<?php
if (file_exists($cache_file)) {
    // .. Load the cache here
}
else {
    // PID is unique to each process, so no worrying about clashes
    $cache_file_tmp = $cache_file . '.' . getmypid();

    // .. Write to $cache_file_tmp

    // Then rename it
    rename($cache_file_tmp, $cache_file);
}

The rename() function performs atomic moves when the source and destination are on the same filesystem, meaning the swap happens in a single step. The benefits of using this methodology:
- The code is much shorter and incurs fewer system calls (and so is generally faster).
- Because you never modify the cache file directly, you eliminate the possibility of writing a partial or corrupted cache file.
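
For concreteness, here's a minimal sketch of the write path the excerpt elides - the function name and the use of file_put_contents() are my own, not from the book:

Code:
<?php
// Sketch only: write the whole payload to a PID-suffixed temp file,
// then atomically swap it into place with rename().
function write_cache_atomic($cache_file, $content)
{
    // PID is unique per process, so concurrent writers never share a name.
    $cache_file_tmp = $cache_file . '.' . getmypid();

    if (file_put_contents($cache_file_tmp, $content) === false)
    {
        return false;
    }

    // Readers holding the old inode keep reading the old data;
    // new readers see the fresh file immediately.
    return rename($cache_file_tmp, $cache_file);
}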

So it looks like section31's method would be much more beneficial.
#32

[eluser]Rick Jolly[/eluser]
Lewis, that won't work on Windows - although the rename inconsistency might be fixed in future PHP releases. See here: http://bugs.php.net/bug.php?id=44805.

So @unlink would have to be used before rename. There are some good suggestions here, but they're untested. If there were a magic bullet, why would the Zend Framework use expensive cache corruption detection?
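
For what it's worth, a rough sketch of that workaround - the function name is mine, and the brief gap where no cache file exists is the trade-off being described:

Code:
<?php
// Sketch only: on Windows, rename() won't overwrite an existing target,
// so the old cache file has to be unlinked first. That leaves a short
// window where the cache file is missing and readers get a cache miss.
function swap_cache_file($tmp_file, $cache_file)
{
    if (DIRECTORY_SEPARATOR === '\\')
    {
        @unlink($cache_file);
    }

    return @rename($tmp_file, $cache_file);
}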
#33

[eluser]Elliot Haughin[/eluser]
Ok, there's been some great discussion here. So, I'd like to propose a method that I believe will solve the problems.

Code:
$this->lockfile = 'cache.lock';

function lock_cache()
{
    if ( file_exists($this->lockfile) )
    {
        $modified = file_get_contents($this->lockfile);

        // modified is a unix timestamp as the contents of the lockfile.
        // slick huh? - gets around the filemtime issue.
        // Let's check the $modified time of cache lock is reasonable;
        
        // What time is it Mr. Wolf?
        
        $now = time();

        // We want to do numeric calculations only.
        // If the cache lock contents are corrupt, this will strip out
        // bad characters, taking the value rather than the string.

        $modified = intval($modified);

        // Is the locktime in the future?

        $diff = $now - $modified;

        if ($diff < 0 || $diff > 3 )
        {
            // File was written in the future? -
            // This isn't Marty McFly!
            // OR
            // The file was written more than 3 seconds before $now

            // The cache lock isn't valid... it was left behind accidentally.

            unlink($this->lockfile);
        }
    }


    // Time to test one last time - it might have been deleted
    // because it wasn't supposed to be there.


    if ( !file_exists($this->lockfile) )
    {
        // Write a little 'do not disturb' sign and
        // put it on the door :)
        // - its contents are the timestamp right now.

        // 'x+' fails if the file appeared between the check and now,
        // in which case someone else already holds the lock.

        $lockhandle = @fopen($this->lockfile, 'x+');

        if ($lockhandle === false)
        {
            return false;
        }

        fwrite($lockhandle, time());
        fclose($lockhandle);

        return true;
    }
    else
    {
        return false;
    }
}

function unlock_cache()
{
    @unlink($this->lockfile);
}

function write_cache()
{
    if ( $this->lock_cache() )
    {
        // NORMAL WRITING OF CACHE FILE HERE
        
        
        
        $this->unlock_cache();
    }
}

Again, this isn't tested. I've probably missed semi-colons everywhere.
But read it through starting from the write_cache() function. Hopefully it makes sense, and you can see where I'm going with this.
#34

[eluser]Lewis[/eluser]
Rick Jolly, as mentioned in the article, if you unlink the file then all existing connections to that file remain open and functional - so it won't interfere with anyone already reading the cache. Unlinking the file may be needed on Windows, but apart from the extra function call there are no problems introduced by calling unlink. I still see it as the best solution, despite having to work around the Windows issue.
#35

[eluser]TheFuzzy0ne[/eluser]
Perhaps I am way out of my depth here, but please bear with me, and let me know if I have overstepped my boundaries.

Why not follow a process similar to this:

There can be a subdirectory within the cache directory - I will call it "tmp" for the sake of simplicity.

The cache class could check the tmp directory first, for the cache file it needs. If it's in there, then that file is rendered. If it's not, then the file is read from the cache directory.

A file is copied over to the tmp directory when the cache is being updated in the cache directory. Once the file has been cached in the cache directory, the file in the tmp directory can be deleted, so the cache is then pulled from the newly cached file.

The only problem I can see here is that there may be race conditions which need to be ironed out, but the idea is very simple, and I feel it can work. What's more, you don't even need to bother with locking the file that's being written to, as it's not even looked at so long as the copied file exists in the tmp directory. The cache file in the tmp directory is being read from and rendered even before the original file is written to, and continues to be read from until it's deleted shortly after the main cache file has been written.
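
If it helps to picture it, here's a rough sketch of the read side of that idea - the directory layout and function name are just illustrative, not an actual CodeIgniter API:

Code:
<?php
// Sketch only: serve from cache/tmp/ while the main cache file is being
// rewritten, otherwise serve from the cache directory itself.
function read_cache($cache_dir, $filename)
{
    $tmp_copy  = $cache_dir . 'tmp/' . $filename;
    $main_file = $cache_dir . $filename;

    // A copy in tmp/ signals that the main file is being regenerated,
    // so prefer it while it exists.
    if (file_exists($tmp_copy))
    {
        return file_get_contents($tmp_copy);
    }

    if (file_exists($main_file))
    {
        return file_get_contents($main_file);
    }

    return FALSE; // Cache miss: the caller regenerates the page.
}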

Now I know you guys are a bunch of tech-heads, and I'm sure my idea is flawed somewhere, but in my mind, I can't see how. If that is the case, could someone please explain what I'm missing (or just tell me to shut up and go away). Big Grin

Another suggestion would be to allow database caching as an alternative. I'm not sure how much more/less efficient it could be, but I am sure it's doable. Obviously, I'm not saying scrap the disk caching, but rather, it could be an alternative.

Thanks.
#36

[eluser]Lone[/eluser]
One possible issue I just wanted to verify (and I don't think it has been covered?): if you have a controller function whose end output differs based on the POST data sent to it, does that make the 'cache' for that function incorrect, since it will be saved based on whatever POST data was sent when the cache was originally written?

It isn't normal practice for us to do this, but I know there are others out there who could be caught out by it.
#37

[eluser]Lewis[/eluser]
[quote author="TheFuzzy0ne" date="1209489308"]Perhaps I am way out of my depth here, but please bear with me, and let me know if I have overstepped my boundaries.

Why not follow a process similar to this.

There can be a subdirectory within the cache directory - I will call it "tmp" for the sake of simplicity.

The cache class could check the tmp directory first, for the cache file it needs. If it's in there, then that file is rendered. If it's not, then the file is read from the cache directory.

A file is copied over to the tmp directory when the cache is being updated in the cache directory. Once the file has been cached in the cache directory, the file in the tmp directory can be deleted, so the cache is then pulled from the newly cached file.

The only problem I can see here is that there may be race conditions which need to be ironed out, but the idea is very simple, and I feel it can work. What's more, you don't even need to bother with locking the file that's being written to, as it's not even looked at so long as the copied file exists in the tmp directory. The cache file in the tmp directory is being read from and rendered even before the original file is written to, and continues to be read from until it's deleted shortly after the main cache file has been written.

Now I know you guys are a bunch of tech-heads, and I'm sure my idea is flawed somewhere, but in my mind, I can't see how. If that is the case, could someone please explain what I'm missing (or just tell me to shut up and go away). Big Grin

Another suggestion would be to allow database caching as an alternative. I'm not sure how much more/less efficient it could be, but I am sure it's doable. Obviously, I'm not saying scrap the disk caching, but rather, it could be an alternative.

Thanks.[/quote]

Well actually, your solution is pretty good. It is essentially the same as the original one suggested and the one I back, except instead of putting the file in a directory and moving it, you only really need to rename the file. I'm not sure whether moving would have much (if any) performance difference compared to renaming, but it means you have to have another folder chmodded, so I'd say the rename is a slightly more elegant approach. But the theory was spot on, so well done!

Update: I will explain the main downside though, just so you know:

- If more than one person visits the site before a temporary cache file is generated, they will all generate their own versions and overwrite each other. This is not so much a problem in itself, but having all those people regenerate the content means the cache is not in use, and resource usage could climb if this happens in quick succession. With a flock, each PHP script would wait until the previous one is done - but it would still overwrite the cache anyway, so instead of all the users consuming resources at once they consume them one after another, and you keep them waiting on top of that. So it's not much of a downside, because there's no real workaround for it either way.

@Lone, that will always happen with a cache. The idea of a cache is to store static or semi-static data. A blog is a perfect example: let's say you write an article and it gets Dugg. You have thousands of people visiting your blog at once, and your blog software is working out the entry, what categories it's in, the name of the author, the sidebar, etc. - and chances are the output is going to be exactly the same for all of these visitors. Instead of regenerating it every time and using up all your server's resources, you can simply save this generated version and keep serving it up for all those requests. Then you save all that extra overhead and your server doesn't die under the load. Data that is user-input dependent you just wouldn't cache, as the chances are you'll never serve that exact same page enough times to make it worthwhile.
#38

[eluser]Lone[/eluser]
Lewis: Yup, I understand why it is very important in situations like that. I guess my thought was of a sample usage where it applies to all of the functions in a controller, by putting it in the __construct() function.

That's where my concern about it acting that way came from - but obviously most people would be inserting it into a particular function that serves more static data, so it would not be as much of a concern.



