Memory consumption bug
#1

Hi all - I've tested this as thoroughly as I can and am still baffled about where the buildup is coming from. If anyone wants to try to reproduce it or has suggestions, please do. Note: I have already worked around this, but I'd like to determine whether it is a framework issue so it gets addressed.

I am using a custom CLI command (App\Commands) to import a large CSV into the database. The actual processing is very simple and uses existing models (built-in methods only, no modifications) to save to the database. After processing a reasonable ~200k lines the script crashes with "Allowed memory size ... exhausted". However, there should be no accumulating memory, since all the variables are reused on each iteration. I suspect something in the model is holding onto a variable by reference so it doesn't get freed each time around.

Here's a generalized (but still accurate) version of the code:

PHP Code:
public function run(array $params)
{
    $myModel  = new MyModel();
    $myModel2 = new MyModel2();

    $file = "data.csv";
    $line = 0; // line counter (kept separate from the $row data array below)

    if (($handle = fopen($file, "r")) !== false):

        while (($data = fgetcsv($handle, 1000, ",")) !== false):
            $line++;

            /* VALIDATE DATA */
            // validate line
            $num = count($data);
            if ($num != 4):
                CLI::write("Invalid number of columns on line {$line}: " . CLI::color(implode(",", $data), 'yellow'));
                return;
            endif;

            // get the timestamp
            if (($timestamp = strtotime($data[2])) === false):
                CLI::write("Invalid timestamp on line {$line}: " . CLI::color($data[2], 'yellow'));
                return;
            endif;

            /* PRIMARY ENTRY */
            // build the row
            $row = [
                'source'     => "import",
                'created_at' => date("Y-m-d H:i:s", $timestamp),
            ];

            // add to the database
            $myModel->save($row);
            $ref_id = $myModel->getInsertID();

            /* DEPENDENT ENTRY */
            // build the row
            $row2 = [
                'ref_id'  => $ref_id,
                'content' => $data[1],
            ];
            $myModel2->save($row2);

        endwhile;

        fclose($handle);
    endif;
}
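
A quick way to see where the growth happens is to log memory usage every few thousand rows inside the loop. This is only a diagnostic sketch (the 5,000-row interval and the formatting are arbitrary), dropped in after the two save() calls:

PHP Code:
// inside the while loop, after $myModel2->save($row2);
if ($line % 5000 === 0):
    CLI::write(sprintf(
        'Line %d: %.1f MB current / %.1f MB peak',
        $line,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    ));
endif;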

#2

By default in development mode, each query object is saved for displaying in the debug bar, etc.

Can you check app/Config/Toolbar.php? There should be a setting, $maxQueries, in there that determines the maximum number of queries that are saved. If this is an app you've been running for a while and upgrading through the releases, that setting might have been missed when merging config changes, so you should add it:

Code:
/*
|--------------------------------------------------------------------------
| Max Queries
|--------------------------------------------------------------------------
| If the Database Collector is enabled, it will log every query that the
| system generates so they can be displayed on the toolbar's timeline
| and in the query log. This can lead to memory issues in some instances
| with hundreds of queries.
|
| $maxQueries defines the maximum number of queries that will be stored.
|
*/
public $maxQueries = 100;

If the data from your CSV rows is rather large, then you might just need to lower that value.
#3

This particular app started on alpha-5 and does have `$maxQueries = 100`, so I don't believe that is it. I will play around with those settings and maybe try disabling the toolbar to see if it makes a difference. Are there any docs available on the toolbar's functionality and code?
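
For the "disable the toolbar" experiment, one option is to drop the Database collector from the toolbar config. A sketch assuming the stock collector list in app/Config/Toolbar.php (adjust to whatever your copy actually contains):

PHP Code:
// app/Config/Toolbar.php
public $collectors = [
    \CodeIgniter\Debug\Toolbar\Collectors\Timers::class,
    // \CodeIgniter\Debug\Toolbar\Collectors\Database::class,  // disabled while testing the import
    \CodeIgniter\Debug\Toolbar\Collectors\Logs::class,
    \CodeIgniter\Debug\Toolbar\Collectors\Views::class,
    \CodeIgniter\Debug\Toolbar\Collectors\Files::class,
    \CodeIgniter\Debug\Toolbar\Collectors\Routes::class,
    \CodeIgniter\Debug\Toolbar\Collectors\Events::class,
];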
#4

Still no resolution on this one, but some progress! The scope (330k lines/db calls) should be well within normal operational limits. I've checked the Database collector and made sure it's only holding the 100 maxQueries.
Here's the clincher: running the same script in the same environment, but with CI_ENVIRONMENT set to "production", gets much further (I tested up to 10x the volume). So I'm definitely suspicious of system/Debug/Toolbar/Collectors, but so far I've been unable to find the culprit.
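
For reference, the environment can also be overridden for a single run instead of editing .env; here import:csv is just a stand-in for the actual command name (Unix-like shell assumed):

Code:
CI_ENVIRONMENT=production php spark import:csv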
#5

Have you tried importing the CSV data using the "Load Data" command? I found it was amazingly fast:

https://dev.mysql.com/doc/refman/8.0/en/load-data.html
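
The LOAD DATA call can still go through the framework's database connection, so the rest of the command stays in place. A minimal sketch, assuming a MySQL server/driver with local_infile enabled and a hypothetical import_raw table whose columns match the CSV:

PHP Code:
$db = \Config\Database::connect();

// one round trip instead of hundreds of thousands of individual INSERTs
$db->query("
    LOAD DATA LOCAL INFILE " . $db->escape('/path/to/data.csv') . "
    INTO TABLE import_raw
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (source_ref, content, created_at)
");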
#6

I’ve used that in the past too. It worked great on a giant CSV data set. I loaded into a temp table, did the normalization within that table, and then copied to the final table.
#7


Further to my previous post, here is a post of mine from another forum:


Quote:
This may be of interest...

I was looking at different options to reduce the six-minute time taken to download and convert two XLS spreadsheets into a common database table. I got it down to under a minute, mostly by using the following instead of stepping through each CSV row and inserting it into the table of over 22,000 rows.

https://dev.mysql.com/doc/refman/5.7/en/load-data.html

It was not easy to set up because the table did not exactly match the CSV fields, so now I create a new table that matches the CSV and import the data into that. After importing I add the additional columns and finally add a primary index; those two operations take only fractions of a second. Before adding the primary key it was necessary to delete duplicates, which also took only a fraction of a second.

Adding and verifying a couple of hundred updates (from another CSV file with a different structure) does not take long.
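
For anyone wanting to try the same approach, here is that staging-table flow sketched against CodeIgniter's connection; every table and column name below is a placeholder, and the dedupe/keying steps depend entirely on the real schema:

PHP Code:
$db = \Config\Database::connect();

// 1. staging table that matches the CSV exactly
$db->query("CREATE TABLE csv_staging (source_ref VARCHAR(64), content TEXT, created_at DATETIME)");

// 2. bulk-load the file into csv_staging (see the LOAD DATA sketch above)

// 3. remove duplicate rows before keying the table
$db->query("CREATE TABLE csv_clean AS SELECT DISTINCT * FROM csv_staging");

// 4. add the extra column(s) and the primary key
$db->query("ALTER TABLE csv_clean ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY");

// 5. copy into the final table
$db->query("INSERT INTO final_table (source_ref, content, created_at)
            SELECT source_ref, content, created_at FROM csv_clean");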
#8

I'll check that out, thanks for the reference. Worth noting though that this is an issue with any script running a large number of calls, not just data imports. After importing all the data mentioned above I ran a second command to cross-reference and tally the data and yet again had to set my environment to production before running it.