Welcome Guest, Not a member yet? Register   Sign In
CSV Importing Advice
#1

Hi,

I'm looking for advice on the best way to import a CSV file.

The file has the following fields:
Code:
FirstName, LastName, EmailAddress, Department, Role

Everything is working except that it is slow. I haven't tested it with more than 10 users in the CSV file. I expect that there will be times where a CSV file will contain thousands of users.

I would also like to provide better feedback as the import progresses for things like an email address already existing in the database (I'd love if it was like the WordPress importer that shows a message per user imported).

Here is my import function:

PHP Code:
public function import()
{
 
   $this->load->helper('form');

 
   if ( ! empty($_FILES) )
 
   {
 
       $config['upload_path'] = $this->company_lib->get_uploads_folder($this->user->company_id);
 
       $config['allowed_types'] = 'csv';
 
       $config['max_size'] = $this->config->item('max_file_size');
 
       $this->load->library('upload'$config);

 
       if ( ! $this->upload->do_upload('file') )
 
       {
 
           $this->flasher->set_danger($this->upload->display_errors(''''), 'setup/import'TRUE);
 
       }
 
       else
        
{
 
           $data $this->upload->data();

 
           $this->load->library('csvreader');
 
           $this->load->helper('string');

 
           if ($data['file_ext'] == '.csv'// will eventually provide excel file uploads too
 
           {
 
               $fields $this->csvreader->parse_file($data['full_path']);

 
               if ( ! empty($fields) )
 
               {
 
                   foreach ($fields as $field)
 
                   {
 
                       $email $field['EmailAddress'];
 
                       $first_name $field['FirstName'];
 
                       $last_name $field['LastName'];
 
                       $department_name $field['Department'];
 
                       $role $field['Role'] + 2// add 2 because csv contains 1 for department manager & 2 for employee (for simplicity)

 
                       // make sure they cannot add a user with a role of 1 (admin) or 2 (manager)
 
                       if ($role >= 3)
 
                       {
 
                           if ( ! $this->ion_auth->email_check($email) )
 
                           {
 
                               $salt $this->config->item('store_salt''ion_auth') ? $this->ion_auth->salt() : FALSE;
 
                               $password random_string('alnum'12);
 
                               $hashed_password $this->ion_auth->hash_password($password$salt);

 
                               // User must be created first, in order to have the id
 
                               $user_data = [
 
                                   'company_id' => $this->user->company_id,
 
                                   'password' => $hashed_password,
 
                                   'email' => $email,
 
                                   'active' => 1,
 
                                   'is_dep_manager' => $role == NULL
                                
];

 
                               $user_id $this->user_model->insert($user_data); // insert new user into users table

 
                               unset($user_data);

 
                               $this->ion_auth_model->add_to_group($role$user_id); // add user to users_groups table

 
                               // send welcome email to user
 
                               $email_data = ['email' => $email'password' => $password];
 
                               $this->email_lib->send_email($emaillang('setup_welcome_email_subject'), 'setup/emails/welcome'$email_data);

 
                               // once user is inserted, create their profile
 
                               $profile_data = [
 
                                   'user_id'           => $user_id,
 
                                   'first_name'        => $first_name,
 
                                   'last_name'         => $last_name,
 
                               ];
 
                               $this->profile_model->insert($profile_data);

 
                               $this->load->model('department_model');

 
                               // check if department for the company exists in the database
 
                               $department $this->department_model
                                                   
->fields('id')
 
                                                  ->where(['name' => $department_name'company_id' => $this->user->company_id])
 
                                                  ->get();

 
                               // if department does NOT exist, create one
 
                               if ( ! $department )
 
                               {
 
                                   $department_data = [];

 
                                   $department_data['name'] = $department_name;
 
                                   $department_data['company_id'] = $this->user->company_id;

 
                                   if ($role == 3// User is a department manager, so set them as the assigned user for the department
 
                                   {
 
                                       $department_data['assigned_user_id'] = $user_id;
 
                                   }

 
                                   $department_id $this->department_model->insert($department_data);

 
                                   // update the user with the new department id
 
                                   $this->user_model->update(['department_id' => $department_id], $user_id);
 
                               }
 
                               else
                                
{
 
                                   // a department already exists, so set the users department_id to the existing department
 
                                   $this->user_model->update(['department_id' => $department['id']], $user_id);
 
                               }

 
                               $this->flasher->set_success(lang('setup_import_successful'), NULLTRUE);
 
                           }
 
                           else
                            
{
 
                               $this->flasher->set_danger(sprintf(lang('setup_import_failed_email'), $email), NULLTRUE);
 
                           }
 
                       }
 
                       else
                        
{
 
                           $this->flasher->set_danger(lang('setup_import_failed_role'), NULLTRUE);
 
                       }
 
                   
 
               }
 
               else
                
{
 
                   $this->flasher->set_danger(lang('setup_import_failed_file_is_empty'), 'setup/import'TRUE);
 
               }

 
               unlink($data['full_path']); // delete the uploaded csv file

 
               $this->update_setup_step(6);

 
               redirect('setup/finish');
 
           }
 
       }
 
   }
 
   else
    
{
 
       $this->template->title(lang('setup_import_heading'))
 
                            ->set_js('bootstrap-filestyle.min')
 
                            ->build('setup/import'$this->data);
 
   }


Here is the CSVReader class:

PHP Code:
class CSVReader {

 
   var $fields/** columns names retrieved after parsing */ 
 
   var $separator ';'/** separator used to explode each line */
 
   var $enclosure '"'/** enclosure used to decorate each field */

 
   var $max_row_size 4096/** maximum row size to be used for decoding */

 
   function parse_file($p_Filepath)
 
   {
 
       $file           fopen($p_Filepath'r');
 
       $this->fields   fgetcsv($file$this->max_row_size$this->separator$this->enclosure);
 
       $keys_values    explode(','$this->fields[0]);

 
       $content    = array();
 
       $keys       $this->escape_string($keys_values);

 
       $i 1;
 
       while( ($row fgetcsv($file$this->max_row_size$this->separator$this->enclosure)) != false )
 
       {
 
           if $row != null // skip empty lines
 
           {
 
               $values explode(','$row[0]);
 
               if (count($keys) == count($values))
 
               {
 
                   $arr        = array();
 
                   $new_values = array();
 
                   $new_values $this->escape_string($values);
 
                   for ($j 0$j count($keys); $j++)
 
                   {
 
                       if ($keys[$j] != "")
 
                       {
 
                           $arr[$keys[$j]] = $new_values[$j];
 
                       }
 
                   }
 
                   $content[$i] = $arr;
 
                   $i++;
 
               }
 
           }
 
       }
 
       fclose($file);
 
       return $content;
 
   }

 
   function escape_string($data)
 
   {
 
       $result = array();
 
       foreach($data as $row)
 
       {
 
           $result[] = str_replace('"'''$row);
 
       }
 
       return $result;
 
     


Any advice is much appreciated.

I was thinking that maybe I'd be better using something like MySQL Local Load into a specific imports table and then working off that data.
Reply
#2

(This post was last modified: 04-25-2017, 11:26 AM by skunkbad.)

I use parseCSV.

https://github.com/parsecsv/parsecsv-for-php

Super easy, and I've used it for large files. In fact, I'm using this week on files with 60K rows, and there's no lag.
Reply
#3

(04-25-2017, 11:25 AM)skunkbad Wrote: I use parseCSV.

https://github.com/parsecsv/parsecsv-for-php

Super easy, and I've used it for large files. In fact, I'm using this week on files with 60K rows, and there's no lag.

Thanks. I'll have a look at it.
Reply
#4

The problem looks like you're doing several db interactions at each point during this loop. You're checking to see if the company exists, potentially inserting a new department and updating the user to show they belong to that department. That many db interactions is going to cause a slowdown.

The best thing you can do is to break that down into a few individual steps to minimize the number of times you hit the db. It might mean looping over the CSV and collecting different arrays of info, like users, departments, whatever. Then you could pull all departments in one query. You could collect the department/user correlations and perform a single insert_batch. Things like that.
Reply
#5

(This post was last modified: 04-25-2017, 04:57 PM by marksman.)

I use PHPOffice's PHPExcel and now it is called PHPSpreadsheet

PHPExcel: https://github.com/PHPOffice/PHPExcel
PHPSpreadsheet: https://github.com/PHPOffice/PhpSpreadsheet

I can easily read and write CSV, Excel, OpenOffice Spreadsheets, etc.
God Bless CI Contributors Smile
Reply
#6

(04-25-2017, 12:46 PM)kilishan Wrote: The problem looks like you're doing several db interactions at each point during this loop. You're checking to see if the company exists, potentially inserting a new department and updating the user to show they belong to that department. That many db interactions is going to cause a slowdown.

The best thing you can do is to break that down into a few individual steps to minimize the number of times you hit the db. It might mean looping over the CSV and collecting different arrays of info, like users, departments, whatever. Then you could pull all departments in one query. You could collect the department/user correlations and perform a single insert_batch. Things like that.

You're right. I always felt that doing a db interaction for each line wasn't the right choice. I will have to have a good think about how to improve it. Thanks.
Reply
#7

(04-25-2017, 04:57 PM)marksman Wrote: I use PHPOffice's PHPExcel and now it is called PHPSpreadsheet

PHPExcel: https://github.com/PHPOffice/PHPExcel
PHPSpreadsheet: https://github.com/PHPOffice/PhpSpreadsheet

I can easily read and write CSV, Excel, OpenOffice Spreadsheets, etc.

I do plan on providing an excel file upload in the future but I felt a csv file was easier to start with.
Reply
#8

I use 3 different methods to import csv into postgresql/mysql.

1, fgetcsv
2, https://github.com/goodby/csv
3, direct load into table (where i can create and load the full data from csv.

For me it is important to use transactions as otherwise the individual insert or update operations would take ages.
My CSV files i process are relatively large extracts from SAP (400K - 600K lines).

- fgetcsv gives me the best performance
- goodby-csv is slightly slower but only noticeable over 500k lines.
- direct load into tables only works for source files that i trust 100% that it will not change its structure and there won't be any characters in there that could break the csv files (line breaks, etc..)
Reply
#9

(This post was last modified: 04-26-2017, 02:20 PM by marksman.)

(04-26-2017, 03:06 AM)keithmclaughlin Wrote:
(04-25-2017, 04:57 PM)marksman Wrote: I use PHPOffice's PHPExcel and now it is called PHPSpreadsheet

PHPExcel: https://github.com/PHPOffice/PHPExcel
PHPSpreadsheet: https://github.com/PHPOffice/PhpSpreadsheet

I can easily read and write CSV, Excel, OpenOffice Spreadsheets, etc.

I do plan on providing an excel file upload in the future but I felt a csv file was easier to start with.

I use to work with CSV before (back to php4 days) manually implode / explode comas into text and set headers to export then boom! its CSV. I also think that way is easier but when I start to use PHPExcel I found it much easier! just like working with real spreadsheet but programatically implemented.
God Bless CI Contributors Smile
Reply




Theme © iAndrew 2016 - Forum software by © MyBB