Welcome Guest, Not a member yet? Register   Sign In
Insert_Batch(), ON DUPLICATE KEY UPDATE hack
#1

[eluser]a tired machine[/eluser]
I'm often running into the issue where I have a dataset that I need to sync into my db. For example, 50 results from a web services API call. With the current active record class it would seem that we have two options:
1. Mass delete followed by a batch insert, but then we lose any other field data we may have in the table (such as a 'created' timestamp).
2. Loop through each result, querying the database to see if the entry already exists, and then either inserting or updating accordingly, which in my example grows to be 100 database calls!

The MySQL "ON DUPLICATE KEY UPDATE" clause is an elegant solution. Yet, due to portability concerns the functionality of the active record class doesn't contain functions such as this one. So here's how I hacked it into the code.

In mysql_driver.php I added this function:
Code:
/**
  * Insert_on_duplicate_update_batch statement
  *
  * Generates a platform-specific insert string from the supplied data
  * MODIFIED to include ON DUPLICATE UPDATE
  *
  * @access public
  * @param string the table name
  * @param array the insert keys
  * @param array the insert values
  * @return string
  */
function _insert_on_duplicate_update_batch($table, $keys, $values)
{
  foreach($keys as $key)
   $update_fields[] = $key.'=VALUES('.$key.')';
  
  return "INSERT INTO ".$table." (".implode(', ', $keys).") VALUES ".implode(', ', $values)." ON DUPLICATE KEY UPDATE ".implode(', ', $update_fields);
}

You can see I've simply copied the _Insert_Batch() function and appended the ON DUPLICATE UPDATE claus onto the end of the MySQL query it generates.

Then I added this function to DB_active_rec.php, which again is simply a copy of the Insert_Bach() function modified to call the above modified function.
Code:
/**
  * Insert_On_Duplicate_Update_Batch
  *
  * Compiles batch insert strings and runs the queries
  * MODIFIED to do a MySQL 'ON DUPLICATE KEY UPDATE'
  *
  * @access public
  * @param string the table to retrieve the results from
  * @param array an associative array of insert values
  * @return object
  */
function insert_on_duplicate_update_batch($table = '', $set = NULL)
{
  if ( ! is_null($set))
  {
   $this->set_insert_batch($set);
  }

  if (count($this->ar_set) == 0)
  {
   if ($this->db_debug)
   {
    //No valid data array.  Folds in cases where keys and values did not match up
    return $this->display_error('db_must_use_set');
   }
   return FALSE;
  }

  if ($table == '')
  {
   if ( ! isset($this->ar_from[0]))
   {
    if ($this->db_debug)
    {
     return $this->display_error('db_must_set_table');
    }
    return FALSE;
   }

   $table = $this->ar_from[0];
  }

  // Batch this baby
  for ($i = 0, $total = count($this->ar_set); $i < $total; $i = $i + 100)
  {

   $sql = $this->_insert_on_duplicate_update_batch($this->_protect_identifiers($table, TRUE, NULL, FALSE), $this->ar_keys, array_slice($this->ar_set, $i, 100));

   // echo $sql;
  
   $this->query($sql);
  }

  $this->_reset_write();


  return TRUE;
}

Presto, syncing data becomes as easy as one line of code in your model:
Code:
$this->db->insert_on_duplicate_update_batch('table',$data);
This hack took me all of 2 minutes, copying and pasting existing functions and adding three lines of code. It was quick and dirty. Realizing that if I ever want to update my CodeIgniter core I'll have to re-copy these changes back in, I didn't want to do anything too involved.

It would be probably more elegant and provide more flexability if it could be called in this way:
Code:
$this->db
     ->on_duplicate_update($fields_to_update)
     ->insert_batch('table',$data)
$fields_to_update would be an optional way of defining a subset of fields to update (ie, sometimes you know certain fields won't change but others will).

I don't see the harm in including functions into the active record class that are not universally supported by every database. (CodeIgniter requires certain system configurations such mod_rewrite, GD, etc. for many of its features). Considering how immensely useful while simple to implement, I would love to see this feature included into the CodeIgniter core. It would of course be possible to code the (inefficient) equivalents for other database drivers. But in general it would be nice to have database-specific functions available.


Messages In This Thread
Insert_Batch(), ON DUPLICATE KEY UPDATE hack - by El Forum - 11-04-2011, 07:36 AM
Insert_Batch(), ON DUPLICATE KEY UPDATE hack - by El Forum - 02-23-2012, 10:06 AM
Insert_Batch(), ON DUPLICATE KEY UPDATE hack - by El Forum - 11-21-2012, 04:36 AM



Theme © iAndrew 2016 - Forum software by © MyBB