Welcome Guest, Not a member yet? Register   Sign In
Improving performance of csv_from_result method
#1

[eluser]qwertyqwerty[/eluser]
I was having a tough time of it today, wondering why on earth it was taking over one minute to convert 20,000 records to a csv string using the $this->dbutil->csv_from_result() method. Whilst 20,000 rows is a fair amount, it shouldn't take that long and after trying to export the query from phpMyAdmin, it only took a second or two, I knew something had to be up.

I'm no PHP expert but it seems that the problematic code is the following:

Code:
foreach ($row as $item)
{
    $out .= $enclosure.str_replace($enclosure, $enclosure.$enclosure, $item).$enclosure.$delim;            
}
$out = rtrim($out);
$out .= $newline;

I don't see why you need to all these string operations when this can be achieved in one line using the implode function.

Code:
$out .= '"' . implode('","', $row) . '"' . $newline;

Using implode, lets look at the difference

Code:
// My version
function csv_from_result($query, $delim = ",", $newline = "\n", $enclosure = '"')
{
    if ( ! is_object($query) OR ! method_exists($query, 'field_names'))
    {
        show_error('You must submit a valid result object');
    }    

    $out = '';
    
    // First generate the headings from the table column names
    $out = '"' . implode('","', $query->list_fields()) . '"' . $newline;

    // Next blast through the result array and build out the rows
    foreach ($query->result_array() as $row)
    {
        $out .= '"' . implode('","', $row) . '"' . $newline;
    }
    return $out;
}

// Original version
function csv_from_result($query, $delim = ",", $newline = "\n", $enclosure = '"')
{
    if ( ! is_object($query) OR ! method_exists($query, 'field_names'))
    {
        show_error('You must submit a valid result object');
    }    

    $out = '';
    
    // First generate the headings from the table column names
    foreach ($query->list_fields() as $name)
    {
        $out .= $enclosure.str_replace($enclosure, $enclosure.$enclosure, $name).$enclosure.$delim;
    }
    
    $out = rtrim($out);
    $out .= $newline;
    
    // Next blast through the result array and build out the rows
    foreach ($query->result_array() as $row)
    {
        foreach ($row as $item)
        {
            $out .= $enclosure.str_replace($enclosure, $enclosure.$enclosure, $item).$enclosure.$delim;            
        }
        $out = rtrim($out);
        $out .= $newline;
    }

    return $out;
}

To say this code is faster is an understatement as I've shaved off roughly 50 seconds of processing time.

Hope this helps someone and please bear in mind I'm not a PHP expert and I may have just misunderstood why the original code was done the way it was.

Cheers,
Ryan
#2

[eluser]googlymoogly[/eluser]
I ran into this myself recently, and did something similar using implode(). It turned out over 200 (!) times faster on a result set containing a little over 26,000 rows, 6 columns.

200 times.

Your code with the hard-coded delimiters and enclosures (mine had the same functionality as the original) should be even faster than that.

I suggest rewriting this function for the next release, because the original version is, if I may be so blunt, pretty atrocious. Even xml_from_result() is faster. Now I have no trouble at all generating .CSV files containing 200,000 rows.
#3

[eluser]johnwbaxter[/eluser]
@googlymoogly

Fancy sharing your function with us?

I just tried Ryno in Stereo's version of that function and it reduced the time it took to generate the csv by 3/4! Not a bad result!

Thanks Ryno in Stereo!
#4

[eluser]qwertyqwerty[/eluser]
No worries!
#5

[eluser]googlymoogly[/eluser]
audiopleb: it's identical to Ryno's without the hardcoded $delim and $enclosure.

Code:
if (! function_exists('csv_from_result'))
{
    function csv_from_result($query, $delim=",", $newline="\n", $enclosure='"')
    {
        if ( !is_object($query) || !method_exists($query, 'field_names')) {
            show_error('You must submit a valid result object');
        }    

        $out = '';
        
        // First generate the headings from the table column names
        $out = $enclosure . implode($enclosure . $delim . $enclosure, $query->list_fields()) . $enclosure . $newline;

        // Next blast through the result array and build out the rows
        foreach ($query->result_array() as $row) {
            $out .=  $enclosure . implode($enclosure . $delim . $enclosure, $row) . $enclosure . $newline;
        }
        return $out;
    }
}
#6

[eluser]johnwbaxter[/eluser]
Oh right so no quicker then.

Thanks though!
#7

[eluser]johnwbaxter[/eluser]
I just re-read what i wrote and i hope you don't think i was being ungrateful! That certainly wasn't what i was saying.

I meant "okay cool i can carry on with what i've got as yours are both the same"

Thanks to you both!
#8

[eluser]googlymoogly[/eluser]
No worries. (:
#9

[eluser]cynge[/eluser]
An important note on these useful functions though... With the posted updates, double-escaping the enclosures was left out. In case you need it to be closer to the original:

Code:
// First generate the headings from the table column names
$out = rtrim( $enclosure . implode($enclosure . $delim . $enclosure, str_replace($enclosure, $enclosure.$enclosure, $query->list_fields())) . $enclosure ) . $newline;
        
// Next blast through the result array and build out the rows
foreach ($query->result_array() as $row)
{
    $out .= rtrim( $enclosure . implode($enclosure . $delim . $enclosure, str_replace($enclosure, $enclosure.$enclosure, $row)) . $enclosure ) . $newline;
}




Theme © iAndrew 2016 - Forum software by © MyBB