Inserting large dataset from CSV file with CI Model
#1
Question 

I have a CSV file with 5 million rows. I am using the CSV library from thephpleague.com. It uses generators, so it reads the file line by line; reading does not cause any problem.

Yet I want to save those rows to the DB using a CI model, because I need to make sure data integrity is okay. So I want to use the model's validators.

Here is my code:

PHP Code:
namespace App\Util;

use App\Models\TestModel;
use League\Csv\Reader;

class CSV
{
    public function readData(string $filePath, array $headers = [])
    {
        $filePath = ROOTPATH . 'test.5mn.csv';

        $maxExecutionTime = ini_get("max_execution_time");
        ini_set('max_execution_time', 300);

        $reader = Reader::createFromPath($filePath, 'r');
        $reader->setHeaderOffset(0); //set the CSV header offset
        $reader->skipEmptyRecords();

        if (true === empty($headers)) {
            $headers = $reader->getHeader();
        }

        $records = $reader->getRecords($headers);

        $isGoodToGo = true;

        foreach ($records as $record) {
            if ($this->isEmptyWithNullValues($record)) {
                continue;
            }

            if ($this->isEmptyWithBlankString($record)) {
                continue;
            }

            $model = model(TestModel::class);
            $model->save([
                'item_type'     => $record["Item Type"],
                'sales_channel' => $record["Sales Channel"],
                'total_profit'  => $record["Total Profit"],
            ]);
            log_message('error', '#' . $model->getInsertID() . ' saved');
            $model = null;
        }

        ini_set('max_execution_time', $maxExecutionTime);

        return $isGoodToGo;
    }
    ....



In the controller I am calling this:

PHP Code:
public function index()
{
    $test = new CSV();
    $test->readData('path', []);
    dump('EXIT');
}


But it gives a PHP memory error after a certain number of records. In the dev environment it saves only 123456 records before throwing a memory error. In production it saves 253321 records and then throws a memory error.
There is overhead somewhere, I presume, but I am not an expert. Is there a method on the model, on the connection, or anywhere else with which I can `flush` the memory? I think CI is keeping some data around even in the production environment. How should I proceed?
#2

One of these links should help you get the job done.

MySQLi - 13.2.7 LOAD DATA Statement

Insert a large data from csv file into database

How to Batch Import Huge CSV Fast using PHP (Million Records in Seconds)
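Something along these lines is the idea behind the LOAD DATA route. This is only a sketch: the table name, column list and file path are placeholders, and `local_infile` has to be enabled on both the MySQL server and the PHP driver for it to work.

PHP Code:
// Hypothetical sketch: bulk-load the CSV straight into MySQL, bypassing the Model layer.
$db = \Config\Database::connect();

$sql = "LOAD DATA LOCAL INFILE " . $db->escape(ROOTPATH . 'test.5mn.csv') . "
        INTO TABLE test_import
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        (item_type, sales_channel, total_profit)";

$db->query($sql);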
#3

(01-06-2022, 02:59 AM)InsiteFX Wrote: One of these links should help you get the job done.

MySQLi - 13.2.7 LOAD DATA Statement

Insert a large data from csv file into database

How to Batch Import Huge CSV Fast using PHP (Million Records in Seconds)
Thank you, all of them are helpful indeed, but I want to process the data through my model, for validation, beforeInsert callbacks, etc.

When I set the model variable to null, the garbage collector should kick in and free the memory. If the overhead were freed after every insert, the only remaining issue would be the timeout. But something is not freeing the memory, I think.
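One thing I will try is creating the model once outside the loop. As far as I can tell the model() helper returns a shared instance by default, so nulling the local variable probably does not release anything anyway, and the per-row log_message call may also be worth dropping since log collectors in debug mode can keep messages in memory. A rough sketch of what I mean, not a guaranteed fix:

PHP Code:
// Sketch only: keep per-row validation/callbacks via save(), but create the
// model once and give the garbage collector a nudge every so often.
$model = new TestModel();
$count = 0;

foreach ($records as $record) {
    $model->save([
        'item_type'     => $record["Item Type"],
        'sales_channel' => $record["Sales Channel"],
        'total_profit'  => $record["Total Profit"],
    ]);

    if (++$count % 10000 === 0) {
        gc_collect_cycles(); // force a collection cycle periodically
    }
}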
#4

As I read the code, you save line by line. What if you first create a huge array or query to insert this with one request?
#5

(01-06-2022, 03:49 AM)sprhld Wrote: As I read the code, you save line by line. What if you first create a huge array or query to insert this with one request?
A huge array does not work either, but I am now trying to insert 10000 rows per batch. Still, it will only work if I am lucky. The main thing is to find out what fills up the memory on the CI side.
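Roughly what I am trying now with the Model's insertBatch(), flushing the buffer as I go. The chunk size and column list are just placeholders, and batch inserts may not fire the same per-row callbacks as save() depending on the CI4 version, so that still needs checking.

PHP Code:
// Sketch: buffer rows and flush them in batches so only one chunk
// lives in memory at a time.
$model  = new TestModel();
$buffer = [];

foreach ($records as $record) {
    $buffer[] = [
        'item_type'     => $record["Item Type"],
        'sales_channel' => $record["Sales Channel"],
        'total_profit'  => $record["Total Profit"],
    ];

    if (count($buffer) >= 10000) {
        $model->insertBatch($buffer); // validation rules are applied per row
        $buffer = [];                 // release the chunk
    }
}

if ($buffer !== []) {
    $model->insertBatch($buffer); // flush the remainder
}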
#6

Found this while searching for something else minutes ago.

https://codeigniter4.github.io/CodeIgnit...ts-of-data
#7

I have had to do this in the past. The way I handled it there was to use the LOAD DATA command to load the CSV into a new, temporary table. Then I would loop over the temp table, doing validation and normalization of the data before inserting it into the correct table(s).

And, yes, chunk() is a perfect use case for that once you have it in the temp table.
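Roughly, the flow looked like this. The model and column names below are made up for illustration; the real mapping and validation rules belong to your own schema.

PHP Code:
// Sketch: read the bulk-loaded temp table in chunks and funnel each row
// through TestModel so its validation and callbacks still apply.
$tempModel = model(TempImportModel::class); // hypothetical model on the temp table
$testModel = model(TestModel::class);

$tempModel->chunk(1000, static function ($row) use ($testModel) {
    // $row's shape depends on the temp model's $returnType (array here)
    $testModel->save([
        'item_type'     => $row['item_type'],
        'sales_channel' => $row['sales_channel'],
        'total_profit'  => $row['total_profit'],
    ]);
});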