Memory consumption bug
Hi all - I've tested this as thoroughly as I can and am still baffled as to where the buildup is coming from. If anyone wants to recreate it or make suggestions, please do. Note: I have already worked around this, but I'd like to determine whether it's a framework issue so it gets addressed.
I am using a custom CLI command (App\Commands) to import a large CSV into the database. The actual processing is very simple and uses existing models (only their built-in methods, no modifications) to save to the database. After processing a reasonable ~200k lines, the script crashes with "Allowed memory size ... exhausted". However, there should be no accumulating memory use, as all the variables are reused on each iteration. I suspect something in the model is holding onto a variable by reference so that it doesn't get cleared on each iteration. Here's a generalized (but still accurate) version of the code:
PHP Code:
public function run(array $params)
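A minimal sketch of the kind of loop being described (the model and field names here are invented for illustration, not taken from the original code) might look like this:
PHP Code:
public function run(array $params)
{
    // Hypothetical reconstruction: stream the CSV one row at a time
    // and save each row through an existing model.
    $model  = new \App\Models\RecordModel(); // assumed model name
    $handle = fopen(WRITEPATH . 'import.csv', 'rb');

    while (($row = fgetcsv($handle)) !== false) {
        // Variables are reused on every iteration, so memory use
        // should in theory stay flat.
        $data = [
            'field_one' => $row[0],
            'field_two' => $row[1],
        ];

        $model->insert($data);
    }

    fclose($handle);
}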
By default in development mode, each query object is saved for displaying in the debug bar, etc.
Can you check app/Config/Toolbar.php? There should be a setting, $maxQueries, in there that determines the maximum number of queries that are saved. If this is an app you've been using for a while and upgrading through the releases, that setting might not have made it into your config when it was added to the framework, so you should add it (a sketch of what it looks like is below). If the data from your CSV rows is rather large, then you might just need to lower that value.
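Roughly, the setting looks like this (the comment wording here is approximate; check the copy shipped with your framework version for the exact text):
PHP Code:
/*
 * --------------------------------------------------------------------
 * Max Queries
 * --------------------------------------------------------------------
 * If the Database Collector is collecting queries, it will stop
 * collecting after this number of queries, which keeps memory usage
 * in check on query-heavy requests.
 */
public $maxQueries = 100;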
This particular app started on alpha-5 and does have `$maxQueries = 100`, so I don't believe that is it. I will play around with those settings and maybe disable the toolbar to see if it makes a difference. Are any docs available on the toolbar's function and code?
Still no resolution on this one, but some progress! The scope (330k lines / DB calls) should be well within normal operational limits. I've checked Collector\Database and confirmed it's only holding the 100 queries allowed by $maxQueries.
Here's the clincher: running the same script on the same setup with CI_ENVIRONMENT set to "production" has far higher capacity (I tested up to 10x the volume). So I'm definitely suspicious of system/Debug/Toolbar/Collectors, but so far I've been unable to find the culprit.
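For anyone trying to reproduce this, the workaround is to switch the environment for that one run, for example (the command name here is made up for illustration):
Code:
CI_ENVIRONMENT=production php spark csv:import
or set CI_ENVIRONMENT = production in your .env file before kicking off the import.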
Have you tried importing the CSV data using the "Load Data" command? I found it was amazingly fast:
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
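A rough sketch of what that could look like from inside a CodeIgniter command (the table and file names are invented, and LOCAL INFILE has to be enabled on both the MySQL server and the client for this to work):
PHP Code:
$db = \Config\Database::connect();

// Bulk-load the CSV straight into a table whose columns match the file.
// Table name and path are placeholders.
$sql = "LOAD DATA LOCAL INFILE '/path/to/import.csv'
        INTO TABLE import_staging
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES";

$db->query($sql);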
I've used that in the past too. It worked great on a CSV with a giant data set. I loaded it into a temp table, did the normalization within the table, and then copied it to the final table.
Further to my previous post, here is a post of mine from another forum:
Quote: This may be of interest... I was looking at different options to reduce the six-minute time taken to download and convert two XLS spreadsheets into a common database table. I reduced the time to less than a minute, mostly by using the following instead of stepping through each CSV row and adding it to the table of over 22,000 rows: https://dev.mysql.com/doc/refman/5.7/en/load-data.html It was not easy to set up because the table did not exactly match the CSV fields, so I now create a new matching table and import the CSV data into it. After importing, I add additional columns and finally add a primary index; the latter two operations only take fractions of a second. Before adding the primary key it was necessary to delete duplicates, which also only took a fraction of a second. Adding and verifying a couple of hundred updates (from another CSV file with a different structure) does not take long.
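A rough outline of that staging-table workflow, continuing the LOAD DATA sketch above (all table and column names are invented, and the dedupe step assumes the duplicates are exact full-row copies):
PHP Code:
$db = \Config\Database::connect();

// 1. Create a staging table whose columns match the CSV exactly.
$db->query("CREATE TABLE import_staging (ref_code VARCHAR(32), amount DECIMAL(10,2))");

// 2. Bulk-load the CSV into it with LOAD DATA (see the earlier sketch).

// 3. Add any extra columns the final schema needs.
$db->query("ALTER TABLE import_staging ADD COLUMN imported_at DATETIME NULL");

// 4. Remove exact duplicate rows by copying distinct rows into a fresh
//    table, then add the primary key to the cleaned-up copy
//    (assumes ref_code is unique once exact duplicates are gone).
$db->query("CREATE TABLE import_clean LIKE import_staging");
$db->query("INSERT INTO import_clean SELECT DISTINCT * FROM import_staging");
$db->query("ALTER TABLE import_clean ADD PRIMARY KEY (ref_code)");

// 5. Swap the cleaned table in and drop the leftovers.
$db->query("RENAME TABLE import_staging TO import_old, import_clean TO import_staging");
$db->query("DROP TABLE import_old");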
I'll check that out, thanks for the reference. Worth noting, though, that this is an issue with any script running a large number of database calls, not just data imports. After importing all the data mentioned above, I ran a second command to cross-reference and tally the data, and yet again had to set my environment to production before running it.
Can you make sure of your log threshold in app/Config/Logger.php?
It should be zero when running a big script (it won't be an issue for short scripts); see the sketch below.
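In app/Config/Logger.php the relevant setting is the $threshold property; 0 disables logging entirely. A minimal sketch:
PHP Code:
<?php

namespace Config;

use CodeIgniter\Config\BaseConfig;

class Logger extends BaseConfig
{
    // 0 disables logging entirely; raise it again once the import is done.
    public $threshold = 0;

    // ... rest of the config unchanged ...
}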