Login

***MGatner*** · 03-01-2019, 09:50 AM

Hi all- I've tested this as thoroughly as I can and continue to be baffled where the buildup is coming from. If anyone wants to recreate or make suggestions please do. Note: I have already worked around this, but would like to determine if this is a framework issue so it gets addressed.

I am using a custom CLI command (App\Commands) to import a large CSV into the database. The actual processing is very simple, and uses existing models (with internal commands only, no modification) to save to the database. After processing a reasonable ~200k lines the script crashes with "Allowed memory size ... exhausted". However, there should be no accruing memory storage as all the variables are reused on each iteration. I suspect something in the model is hanging onto a variable by reference such that it doesn't get cleared on each iteration.

Here's a generalized (but still accurate) version of the code:

PHP Code:
public function run(array $params)
{
    $myModel = new MyModel();
    $myModel2 = new MyModel2();
    
    $file = "data.csv";
    $row = 0;
    
    if (($handle = fopen($file, "r")) !== false):
    
        while (($data = fgetcsv($handle, 1000, ",")) !== false):
            $row++;
            
/* VALIDATE DATA */
            // validate line
            $num = count($data);
            if ($num != 4):
                CLI::write("Invalid number of columns on line {$row}: ". CLI::color(implode(",", $data), 'yellow'));
                return;
            endif;
            
            // get the timestamp
            if ($timestamp = strtotime($data[2])):
            else:
                CLI::write("Invalid timestamp on line {$row}: ". CLI::color($data[2], 'yellow'));
                return;
            endif;

/* PRIMARY ENTRY */            
            // build the row
            $row = [
                'source' => "import",
                'created_at' => date("Y-m-d H:i:s", $timestamp)
            ];
            
            // add to the database
            $myModel->save($row);
            $ref_id = $myModel->getInsertID();

/* DEPENDENT ENTRY */            
            // build the row
            $row2 = [
                'ref_id' => $ref_id,
                'content' => $data[1],
            ];
            $myModel2->save($row2);

        endwhile;
        
        fclose($handle);
    endif;
} 

***kilishan*** · 03-01-2019, 01:32 PM

By default in development mode, each query object is saved for displaying in the debug bar, etc.

Can you check app/Config/Toolbar.php? There should be a setting, $maxQueries in there that determines the maximum number of queries that are saved. If this is something that you've been using for a bit and upgrading through the releases, that setting might have gotten missed putting in, so you should add it:

Code:
/*

    |--------------------------------------------------------------------------

    | Max Queries

    |--------------------------------------------------------------------------

    | If the Database Collector is enabled, it will log every query that the

    | the system generates so they can be displayed on the toolbar's timeline

    | and in the query log. This can lead to memory issues in some instances

    | with hundreds of queries.

    |

    | $maxQueries defines the maximum amount of queries that will be stored.

    |

    */

    public $maxQueries = 100;

If the data from your CSV rows is rather large than you might just need to lower that value.

Support Development • CodeIgniter 4 Foundations • Practical CodeIgniter 3 • CodeIgniter Tutorials

***MGatner*** · 03-04-2019, 11:49 AM

This particular app started on alpha-5, and does have `$maxQueries = 100` so I don't believe that is it. I will play around with those settings and maybe disabling the toolbar to see if it makes a difference. Any docs available on the toolbar function & code?

***MGatner*** · 03-05-2019, 02:29 PM

Still no resolution on this one, but some progress! The scope (330k lines/db calls) should be well within normal operational limits. I've checked Collector\Database and made sure it's only holding 100 maxQueries.
Here's the clincher: running the same script in the same setting with CI_ENVIRONMENT set to "production" has way higher capacity (I tested up to 10x more). So I'm definitely suspicious of system/Debug/Toolbar/Collectors but so far have been unable to find the culprit.

**John_Betong** · 03-05-2019, 04:17 PM

Have you tried importing the CSV data using the "Load Data" command? I found it was amazingly fast:

https://dev.mysql.com/doc/refman/8.0/en/load-data.html

***kilishan*** · 03-05-2019, 04:42 PM

I’ve used that in the past too. Worked great on CDV with a giant data set. Loaded into a temp table, did normalization within the table and then copied to the final table.

Support Development • CodeIgniter 4 Foundations • Practical CodeIgniter 3 • CodeIgniter Tutorials

**John_Betong** · 03-05-2019, 04:49 PM

Further to my orevious post, here is a post of mine from another forum:

// quote start
This may be of interest...

I was looking at different options to reduce the six minute time time taken to download and convert two XLS spreadsheets into a common database table. I reduced the time down to less than a minute mostly by using the following instead of stepping through each CSV row then adding to the table of over 22,000 rows.

https://dev.mysql.com/doc/refman/5.7/en/load-data.html 3

It was not easy to setup because the table did not exactly match the CSV fields so I now create a new matching table, import the CSV data. After importing I then add additional columns then finally add a primary index. The latter two operations only take fractions of a second. Before adding the primary key it was necessary to delete duplicates which only took a fraction of a second.

Adding and verifying a couple of hundred updates (from another CSV file with a different structure) does not take long.

***MGatner*** · 03-06-2019, 10:17 AM

I'll check that out, thanks for the reference. Worth noting though that this is an issue with any script running a large number of calls, not just data imports. After importing all the data mentioned above I ran a second command to cross-reference and tally the data and yet again had to set my environment to production before running it.

jobyjosephme · 09-10-2024, 06:30 AM

can you makesure your log level in config/logger.php

this should be zero when running big script (wont be an issue for short scripts)