CodeIgniter Forums

Full Version: Memory consumption bug
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi all- I've tested this as thoroughly as I can and continue to be baffled where the buildup is coming from. If anyone wants to recreate or make suggestions please do. Note: I have already worked around this, but would like to determine if this is a framework issue so it gets addressed.

I am using a custom CLI command (App\Commands) to import a large CSV into the database. The actual processing is very simple, and uses existing models (with internal commands only, no modification) to save to the database. After processing a reasonable ~200k lines the script crashes with "Allowed memory size ... exhausted". However, there should be no accruing memory storage as all the variables are reused on each iteration. I suspect something in the model is hanging onto a variable by reference such that it doesn't get cleared on each iteration.

Here's a generalized (but still accurate) version of the code:

PHP Code:
public function run(array $params)
$myModel = new MyModel();
$myModel2 = new MyModel2();
$file "data.csv";
$row 0;
    if ((
$handle fopen($file"r")) !== false):
        while ((
$data fgetcsv($handle1000",")) !== false):
            // validate line
$num count($data);
            if (
$num != 4):
CLI::write("Invalid number of columns on line {$row}: "CLI::color(implode(","$data), 'yellow'));
// get the timestamp
if ($timestamp strtotime($data[2])):
CLI::write("Invalid timestamp on line {$row}: "CLI::color($data[2], 'yellow'));

/* PRIMARY ENTRY */            
            // build the row
$row = [
'source' => "import",
'created_at' => date("Y-m-d H:i:s"$timestamp)
// add to the database
$ref_id $myModel->getInsertID();

/* DEPENDENT ENTRY */            
            // build the row
$row2 = [
'ref_id' => $ref_id,
'content' => $data[1],


By default in development mode, each query object is saved for displaying in the debug bar, etc.

Can you check app/Config/Toolbar.php? There should be a setting, $maxQueries in there that determines the maximum number of queries that are saved. If this is something that you've been using for a bit and upgrading through the releases, that setting might have gotten missed putting in, so you should add it:

    | Max Queries
    | If the Database Collector is enabled, it will log every query that the
    | the system generates so they can be displayed on the toolbar's timeline
    | and in the query log. This can lead to memory issues in some instances
    | with hundreds of queries.
    | $maxQueries defines the maximum amount of queries that will be stored.
    public $maxQueries = 100;

If the data from your CSV rows is rather large than you might just need to lower that value.
This particular app started on alpha-5, and does have `$maxQueries = 100` so I don't believe that is it. I will play around with those settings and maybe disabling the toolbar to see if it makes a difference. Any docs available on the toolbar function & code?
Still no resolution on this one, but some progress! The scope (330k lines/db calls) should be well within normal operational limits. I've checked Collector\Database and made sure it's only holding 100 maxQueries.
Here's the clincher: running the same script in the same setting with CI_ENVIRONMENT set to "production" has way higher capacity (I tested up to 10x more). So I'm definitely suspicious of system/Debug/Toolbar/Collectors but so far have been unable to find the culprit.
Have you tried importing the CSV data using the "Load Data" command? I found it was amazingly fast:
I’ve used that in the past too. Worked great on CDV with a giant data set. Loaded into a temp table, did normalization within the table and then copied to the final table.
Further to my orevious post, here is a post of mine from another forum:

// quote start
This may be of interest...

I was looking at different options to reduce the six minute time time taken to download and convert two XLS spreadsheets into a common database table. I reduced the time down to less than a minute mostly by using the following instead of stepping through each CSV row then adding to the table of over 22,000 rows. 3

It was not easy to setup because the table did not exactly match the CSV fields so I now create a new matching table, import the CSV data. After importing I then add additional columns then finally add a primary index. The latter two operations only take fractions of a second. Before adding the primary key it was necessary to delete duplicates which only took a fraction of a second.

Adding and verifying a couple of hundred updates (from another CSV file with a different structure) does not take long.
I'll check that out, thanks for the reference. Worth noting though that this is an issue with any script running a large number of calls, not just data imports. After importing all the data mentioned above I ran a second command to cross-reference and tally the data and yet again had to set my environment to production before running it.