Caching and headers
[eluser]Arjen van Bochoven[/eluser]
Hey Allard, thank you for looking at this! [quote author="Derek Allard" date="1226594930"]On quick inspection let me say that a serialized array may be an approach you want to re-consider. If output got large, PHP could blow up in our faces serializing arrays with that data..[/quote] As far as I understand, serialize() does not do much with strings: a string is encoded as s:size:"value"; so the overhead is constant. But I must admit I haven't done any benchmarks. Can someone confirm that serialize() can handle large chunks of data? Arjen
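For what it's worth, the claim about serialize()'s string handling is easy to check from the command line. A quick sketch (my own illustration, not code from this thread):

```php
<?php
// serialize() stores a string almost verbatim as  s:<byte-length>:"<value>";
// so the overhead is a small constant per value, not proportional to size.
$output = str_repeat('x', 1000);
$serialized = serialize($output);

// Only a handful of bytes are added, regardless of the string's length:
echo strlen($serialized) - strlen($output), "\n";

// Bundling headers and output into one array stays just as compact:
$cache = serialize(array(
    'headers' => array('Content-Type: text/html'),
    'output'  => $output,
));
$restored = unserialize($cache);
echo $restored['output'] === $output ? "round-trip ok\n" : "mismatch\n";
```

So the encoding cost is per-value bookkeeping, not per-byte work on the string itself.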
[eluser]Aquillyne[/eluser]
It looks like my original solution of just prepending the data with a timestamp (like before) plus headers (unlike before) may be simpler and work better, even if less elegant. I'm so glad this has finally gathered some support; I've been battling with this one on and off for months. I did originally post in the feature requests forum too, but I believe it was closed because it was considered a double post. Please add this to the core, it's sorely missing. The core code has a comment saying "we need to add header caching"!
[eluser]narkaT[/eluser]
[quote author="Arjen van Bochoven" date="1226596677"]But I must admit I haven't done any benchmarks, can someone confirm the serialize() function can handle large chunks of data?[/quote] Define "large". I've got an application that uses serialize() and a database to cache data; the biggest serialized string in the DB is currently about 12kb long and runs fast and stable. The js-file I used for benchmarking the code is about 31kb big. I'm really impressed by the effectiveness of (un)serialize; I never expected it to be so damn fast. [quote author="Aquillyne" date="1226598092"]It looks like my original solution of just prepending the data with timestamp (like before) plus headers (unlike before) may be simpler and work better, even if less elegant.[/quote] I'll try to benchmark both versions soon, code chunks welcome.
[eluser]Arjen van Bochoven[/eluser]
I would consider 10MB+ large. If it can handle files of this size faster and in less memory than the original cache functions, it would surely prove that it does not "explode in our face". :-)
[eluser]narkaT[/eluser]
Okay, here are the results for displaying cached data:

1. a simple "string-disassembling" variant: ~0.00016 s
2. serialize(): ~0.00018 s
3. RegExp: ~0.00076 s

Memory consumption (measured using memory_get_peak_usage()) is exactly the same with all 3 methods. The "winner" looks like this Code: class MY_Output extends CI_Output {
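The thread only preserves the first line of that class, so here is a rough sketch of what a "string-disassembling" cache format looks like in principle: a fixed prefix holding the expiry timestamp and headers, split back out with strpos(). The function names, delimiters, and layout are my own illustration, not narkaT's actual MY_Output code:

```php
<?php
// Hypothetical strpos-based cache layout (NOT the thread's real code):
//   <expire-timestamp>TS:<header1>\n<header2>HDR:<output>

function cache_pack($expire, array $headers, $output)
{
    return $expire . 'TS:' . implode("\n", $headers) . 'HDR:' . $output;
}

function cache_unpack($cache)
{
    $ts_end  = strpos($cache, 'TS:');            // end of the timestamp
    $hdr_end = strpos($cache, 'HDR:', $ts_end);  // end of the header block
    return array(
        'expire'  => (int) substr($cache, 0, $ts_end),
        'headers' => explode("\n", substr($cache, $ts_end + 3, $hdr_end - $ts_end - 3)),
        'output'  => substr($cache, $hdr_end + 4),
    );
}

$packed = cache_pack(time() + 3600, array('Content-Type: text/html'), '<html>hi</html>');
$data   = cache_unpack($packed);
echo $data['output'], "\n"; // prints <html>hi</html>
```

The speed comes from strpos() and substr() doing plain byte scans with no pattern compilation, which is why it edges out the RegExp variant above.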
[eluser]Arjen van Bochoven[/eluser]
Ok, I did some benchmarking with a 'reasonably large' file (6.2MB). As you can see, the memory results are interesting: the CI built-in cache triples memory usage, as does narkaT's strpos() method. My own proposed method roughly doubles memory usage (which could be improved).

Not cached: Page rendered in 1.1956 seconds, Memory usage: 6.94MB
CI 1.7.0 caching using preg_match(): Page rendered in 0.4045 seconds, Memory usage: 18.86MB
My solution, using serialize(): Page rendered in 0.3594 seconds, Memory usage: 12.74MB
narkaT's solution using strpos(): Page rendered in 0.4231 seconds, Memory usage: 18.9MB

@narkaT: You assume the second parameter of set_header() is a single char, but that is nowhere enforced; people can send in an empty string if they want. You should also set the type of the second parameter (which should be a boolean).

edit: Make sure you use the {memory_usage} pseudo-variable in your view to measure memory usage, otherwise you get the cached value.
[eluser]narkaT[/eluser]
[quote author="Arjen van Bochoven" date="1226941283"]@narkaT: You assume the second parameter of set_header() is a single char, but that is nowhere enforced, people can send in an empty string if they want. You should also set the type of the second var (which should be a boolean).[/quote] That's right, I should have done some "cleaning" before posting the code. I've edited the above code. [quote author="Arjen van Bochoven" date="1226941283"]edit: Make sure you use the {memory_usage} pseudo variable in your view to measure memory usage, otherwise you get the cached value.[/quote] I measured the memory usage directly in the extended output class using memory_get_peak_usage(), very "hacky" though. I was confused that in my benchmark there was no difference between the memory consumptions, even though I used the same PHP function as CI to get the memory usage, so I switched to the {memory_usage} method you suggested. That let me find the problem and cut down the memory usage drastically: my previous benchmarks reported the same memory usage because I measured the usage before calling the _display() function, so the call to _display() itself was the cause of the large memory usage. Passing the output by reference solved that issue for both my strpos() approach and your serialize() approach. I've done some benchmarking after optimizing the scripts. Both runs used str_repeat() for generating the data (I'll leave out the non-cached results), one with 7Mb and one with 50Kb:

7Mb
built-in preg_match(): 21.38MB - 0.0360 s
serialize(): 7.4MB - 0.0239 s
strpos(): 7.41MB - 0.0352 s

50Kb
built-in preg_match(): 0.53MB - 0.0041 s
serialize(): 0.45MB - 0.0044 s
strpos(): 0.46MB - 0.0045 s

serialize() is clearly the most memory-friendly solution, and when caching big chunks of data it is also the fastest.
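A minimal harness in the spirit of those numbers: time a cache-read strategy and report peak memory. The bench() helper and the serialize() reader below are my own placeholder illustration, not the thread's actual benchmark code:

```php
<?php
// Rough benchmark sketch: wall time + peak memory around one strategy.

function bench($label, $fn, $cache)
{
    $t0 = microtime(true);
    $fn($cache);
    printf("%-10s %.4f s  peak %.2f MB\n",
        $label,
        microtime(true) - $t0,
        memory_get_peak_usage() / 1048576);
}

// 50Kb payload generated with str_repeat(), as in the thread's runs.
$cache = serialize(array(
    'headers' => array('Content-Type: text/html'),
    'output'  => str_repeat('x', 50 * 1024),
));

bench('serialize', function ($cache) {
    $data = unserialize($cache);  // headers and output decoded in one step
}, $cache);
```

The key detail narkaT found applies here too: measure peak memory after the display/decode call, not before it, or every strategy will report the same number.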
[eluser]Arjen van Bochoven[/eluser]
Very nice optimization, passing the data by reference. I think we have a winner here! I've updated the wiki page and added a link to this thread. Arjen
[eluser]Arjen van Bochoven[/eluser]
I've spoken too soon. I forgot that "call-time pass-by-reference" is deprecated, so to make this work we have to change the function definition of _display() from Code: function _display($output = '') to Code: function _display(&$output = '') I'll revert the wiki to the last version until we sort this one out. Arjen
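A minimal sketch of that change (the class name here is a hypothetical stand-in for CI_Output, not the actual CI file): the & moves from the call site into the signature, which PHP allows even with a default value, so callers need no call-time &:

```php
<?php
// Hypothetical stand-in for CI_Output; illustrates only the signature change.
class Output_Sketch
{
    // Before: function _display($output = '')
    // After:  the reference lives in the definition instead of the call:
    function _display(&$output = '')
    {
        echo $output;
    }
}

$o = new Output_Sketch();
$big = str_repeat('x', 16);
$o->_display($big);   // plain call; writing $o->_display(&$big) is deprecated
echo "\n";
```

With the reference in the definition, existing call sites keep working unchanged while the large output string is no longer duplicated into the function.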