CSV Importing Advice
Hi,
I'm looking for advice on the best way to import a CSV file. The file has the following fields:

Code:
FirstName, LastName, EmailAddress, Department, Role

Everything is working, except that it is slow. I haven't tested it with more than 10 users in the CSV file, but I expect there will be times when a CSV file contains thousands of users. I would also like to provide better feedback as the import progresses, for things like an email address already existing in the database (I'd love it if it were like the WordPress importer, which shows a message per user imported).

Here is my import function:

PHP Code:
public function import()

Here is the CSVReader class:

PHP Code:
class CSVReader {

Any advice is much appreciated. I was thinking that maybe I'd be better off using something like MySQL's LOAD DATA LOCAL INFILE into a dedicated imports table and then working off that data.
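As a rough, hypothetical sketch of that LOAD DATA LOCAL INFILE idea (not the poster's code; the staging table, file path, and connection details are all placeholders, and local-infile must be enabled on both the MySQL server and the PHP client):

PHP Code:
// Bulk-load the raw CSV into a staging table, then validate and
// normalise from there with set-based SQL instead of per-row queries.
$pdo = new PDO(
    'mysql:host=localhost;dbname=app;charset=utf8mb4',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // required for LOCAL INFILE
);

$pdo->exec("
    LOAD DATA LOCAL INFILE '/tmp/users.csv'
    INTO TABLE imports
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (FirstName, LastName, EmailAddress, Department, Role)
");

Once the rows sit in the staging table, duplicate emails or unknown departments can be reported back per user with ordinary queries.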
I use parseCSV.
https://github.com/parsecsv/parsecsv-for-php

Super easy, and I've used it for large files. In fact, I'm using it this week on files with 60K rows, and there's no lag.
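For anyone who hasn't used it, a minimal read loop looks roughly like this (the file name and field access are placeholders; recent releases namespace the class as \ParseCsv\Csv, while older ones expose a global parseCSV class):

PHP Code:
require 'vendor/autoload.php';

// auto() detects the delimiter and maps the header row onto each record.
$csv = new \ParseCsv\Csv();
$csv->auto('users.csv');

foreach ($csv->data as $row) {
    // $row is an associative array keyed by the CSV header,
    // e.g. $row['EmailAddress'] for the fields in the original post.
}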
(04-25-2017, 11:25 AM)skunkbad Wrote: I use parseCSV. Thanks. I'll have a look at it.
The problem looks like you're doing several db interactions at each point during this loop. You're checking to see if the company exists, potentially inserting a new department, and updating the user to show they belong to that department. That many db interactions are going to cause a slowdown.
The best thing you can do is to break that down into a few individual steps to minimize the number of times you hit the db. It might mean looping over the CSV and collecting different arrays of info, like users, departments, whatever. Then you could pull all departments in one query. You could collect the department/user correlations and perform a single insert_batch. Things like that.
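As a rough illustration of that batched approach (not the poster's actual code; the table and column names are guesses based on the fields listed in the original post), using CodeIgniter 3's Query Builder:

PHP Code:
// Pass 1: collect everything from the CSV in memory.
$users = [];
$departmentNames = [];

foreach ($csvRows as $row) {
    $users[] = [
        'first_name' => $row['FirstName'],
        'last_name'  => $row['LastName'],
        'email'      => $row['EmailAddress'],
    ];
    $departmentNames[] = $row['Department'];
}

// One query to find which departments already exist.
$existing = $this->db
    ->select('id, name')
    ->where_in('name', array_unique($departmentNames))
    ->get('departments')
    ->result_array();

// One batched insert for the users instead of one INSERT per row.
$this->db->insert_batch('users', $users);

Any missing departments could then be created with a second insert_batch before linking users to them.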
I use PHPOffice's PHPExcel, which is now called PhpSpreadsheet.

PHPExcel: https://github.com/PHPOffice/PHPExcel
PhpSpreadsheet: https://github.com/PHPOffice/PhpSpreadsheet

I can easily read and write CSV, Excel, OpenOffice spreadsheets, etc.
God Bless CI Contributors
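For the CSV case specifically, reading a file with PhpSpreadsheet might look like this minimal sketch (the file name is a placeholder):

PHP Code:
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Csv;

// Load a CSV through PhpSpreadsheet and dump the sheet to an array.
$reader = new Csv();
$reader->setDelimiter(',');
$spreadsheet = $reader->load('users.csv');

// Each element of $rows is a numerically indexed array of cell values.
$rows = $spreadsheet->getActiveSheet()->toArray();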
(04-25-2017, 12:46 PM)kilishan Wrote: The problem looks like you're doing several db interactions at each point during this loop. You're checking to see if the company exists, potentially inserting a new department, and updating the user to show they belong to that department. That many db interactions are going to cause a slowdown.

You're right. I always felt that doing a db interaction for each line wasn't the right choice. I will have to have a good think about how to improve it. Thanks.
(04-25-2017, 04:57 PM)marksman Wrote: I use PHPOffice's PHPExcel, which is now called PhpSpreadsheet.

I do plan on providing an Excel file upload in the future, but I felt a CSV file was easier to start with.
I use three different methods to import CSV into PostgreSQL/MySQL:

1. fgetcsv
2. https://github.com/goodby/csv
3. Direct load into a table (where I can create and load the full data from the CSV)

For me it is important to use transactions, as otherwise the individual insert or update operations would take ages. The CSV files I process are relatively large extracts from SAP (400K-600K lines).

- fgetcsv gives me the best performance.
- goodby-csv is slightly slower, but only noticeably so over 500K lines.
- Direct load into tables only works for source files I trust 100% not to change their structure and not to contain characters that could break the CSV (line breaks, etc.).

(04-26-2017, 03:06 AM)keithmclaughlin Wrote: (04-25-2017, 04:57 PM)marksman Wrote: I use PHPOffice's PHPExcel, which is now called PhpSpreadsheet.

I used to work with CSV before (back in the PHP 4 days), manually imploding/exploding commas into text and setting headers to export, then boom! It's CSV. I also thought that way was easy, but when I started to use PHPExcel I found it much easier! Just like working with a real spreadsheet, but implemented programmatically.
God Bless CI Contributors
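Tying the thread together, the fgetcsv-plus-transactions approach described above might look like this minimal sketch (connection details, file name, and table/column names are all assumptions, not code from any poster):

PHP Code:
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO users (first_name, last_name, email) VALUES (?, ?, ?)'
);

$handle = fopen('users.csv', 'r');
$header = fgetcsv($handle); // consume the header row

// A single transaction around the whole import avoids paying commit
// overhead on every row, which is what makes per-row inserts
// "take ages" on 400K+ line files.
$pdo->beginTransaction();
try {
    while (($row = fgetcsv($handle)) !== false) {
        $stmt->execute([$row[0], $row[1], $row[2]]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}
fclose($handle);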