• 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Insert_Batch(), ON DUPLICATE KEY UPDATE hack

#1
[eluser]a tired machine[/eluser]
I'm often running into the issue where I have a dataset that I need to sync into my db. For example, 50 results from a web services API call. With the current active record class it would seem that we have two options:
1. Mass delete followed by a batch insert, but then we lose any other field data we may have in the table (such as a 'created' timestamp).
2. Loop through each result, querying the database to see if the entry already exists, and then either inserting or updating accordingly, which in my example grows to be 100 database calls!

The MySQL "ON DUPLICATE KEY UPDATE" clause is an elegant solution. Yet, due to portability concerns the functionality of the active record class doesn't contain functions such as this one. So here's how I hacked it into the code.

In mysql_driver.php I added this function:
Code:
/**
  * Insert_on_duplicate_update_batch statement
  *
  * Generates a platform-specific insert string from the supplied data
  * MODIFIED to include ON DUPLICATE UPDATE
  *
  * @access public
  * @param string the table name
  * @param array the insert keys
  * @param array the insert values
  * @return string
  */
function _insert_on_duplicate_update_batch($table, $keys, $values)
{
  foreach($keys as $key)
   $update_fields[] = $key.'=VALUES('.$key.')';
  
  return "INSERT INTO ".$table." (".implode(', ', $keys).") VALUES ".implode(', ', $values)." ON DUPLICATE KEY UPDATE ".implode(', ', $update_fields);
}

You can see I've simply copied the _Insert_Batch() function and appended the ON DUPLICATE UPDATE claus onto the end of the MySQL query it generates.

Then I added this function to DB_active_rec.php, which again is simply a copy of the Insert_Bach() function modified to call the above modified function.
Code:
/**
  * Insert_On_Duplicate_Update_Batch
  *
  * Compiles batch insert strings and runs the queries
  * MODIFIED to do a MySQL 'ON DUPLICATE KEY UPDATE'
  *
  * @access public
  * @param string the table to retrieve the results from
  * @param array an associative array of insert values
  * @return object
  */
function insert_on_duplicate_update_batch($table = '', $set = NULL)
{
  if ( ! is_null($set))
  {
   $this->set_insert_batch($set);
  }

  if (count($this->ar_set) == 0)
  {
   if ($this->db_debug)
   {
    //No valid data array.  Folds in cases where keys and values did not match up
    return $this->display_error('db_must_use_set');
   }
   return FALSE;
  }

  if ($table == '')
  {
   if ( ! isset($this->ar_from[0]))
   {
    if ($this->db_debug)
    {
     return $this->display_error('db_must_set_table');
    }
    return FALSE;
   }

   $table = $this->ar_from[0];
  }

  // Batch this baby
  for ($i = 0, $total = count($this->ar_set); $i < $total; $i = $i + 100)
  {

   $sql = $this->_insert_on_duplicate_update_batch($this->_protect_identifiers($table, TRUE, NULL, FALSE), $this->ar_keys, array_slice($this->ar_set, $i, 100));

   // echo $sql;
  
   $this->query($sql);
  }

  $this->_reset_write();


  return TRUE;
}

Presto, syncing data becomes as easy as one line of code in your model:
Code:
$this->db->insert_on_duplicate_update_batch('table',$data);
This hack took me all of 2 minutes, copying and pasting existing functions and adding three lines of code. It was quick and dirty. Realizing that if I ever want to update my CodeIgniter core I'll have to re-copy these changes back in, I didn't want to do anything too involved.

It would be probably more elegant and provide more flexability if it could be called in this way:
Code:
$this->db
     ->on_duplicate_update($fields_to_update)
     ->insert_batch('table',$data)
$fields_to_update would be an optional way of defining a subset of fields to update (ie, sometimes you know certain fields won't change but others will).

I don't see the harm in including functions into the active record class that are not universally supported by every database. (CodeIgniter requires certain system configurations such mod_rewrite, GD, etc. for many of its features). Considering how immensely useful while simple to implement, I would love to see this feature included into the CodeIgniter core. It would of course be possible to code the (inefficient) equivalents for other database drivers. But in general it would be nice to have database-specific functions available.

#2
[eluser]Unknown[/eluser]
My blog post should do this for you,

http://nitmedia.tumblr.com/post/18132504...b-function

#3
[eluser]Unknown[/eluser]
Awesome, thank you for your solution.

But instead of hacking your way at the system code, extend it. It's the proper way to do it.
Tested on CodeIgniter 2.1.3:

1. Create MY_Loader.php in application/core/
Code:
&lt;?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');

class MY_Loader extends CI_Loader {

public function database($params = '', $return = FALSE, $active_record = NULL)
{
  // Grab the super object
  $CI =& get_instance();

  // Do we even need to load the database class?
  if (class_exists('CI_DB') AND $return == FALSE AND $active_record == NULL AND isset($CI->db) AND is_object($CI->db))
  {
   return FALSE;
  }

  require_once(BASEPATH.'database/DB.php');

  $db = DB($params, $active_record);

  // Load extended DB driver
  $custom_db_driver = config_item('subclass_prefix').'DB_'.$db->dbdriver.'_driver';
  $custom_db_driver_file = APPPATH.'core/'.$custom_db_driver.'.php';

  if (file_exists($custom_db_driver_file))
  {
   require_once($custom_db_driver_file);

   $db = new $custom_db_driver(get_object_vars($db));
  }

  // Return DB instance
  if ($return === TRUE)
  {
   return $db;
  }

  // Initialize the db variable. Needed to prevent reference errors with some configurations
  $CI->db = '';
  $CI->db =& $db;
}
}

/* End of file MY_Loader.php */
/* Location: ./application/core/MY_Loader.php */


2. Create MY_DB_mysql_driver.php in application/core/ (or mysqli, or whatever you use)
Code:
&lt;?php
class MY_DB_mysql_driver extends CI_DB_mysql_driver {

   final public function __construct($params)
{
  parent::__construct($params);
}

  /**
   * Insert_On_Duplicate_Update_Batch
   *
   * Compiles batch insert strings and runs the queries
   * MODIFIED to do a MySQL 'ON DUPLICATE KEY UPDATE'
   *
   * @access public
   * @param string the table to retrieve the results from
   * @param array an associative array of insert values
   * @return object
   */
  function insert_on_duplicate_update_batch($table = '', $set = NULL)
  {
    if ( ! is_null($set))
    {
     $this->set_insert_batch($set);
    }

    if (count($this->ar_set) == 0)
    {
     if ($this->db_debug)
     {
      //No valid data array.  Folds in cases where keys and values did not match up
      return $this->display_error('db_must_use_set');
     }
     return FALSE;
    }

    if ($table == '')
    {
     if ( ! isset($this->ar_from[0]))
     {
      if ($this->db_debug)
      {
       return $this->display_error('db_must_set_table');
      }
      return FALSE;
     }

     $table = $this->ar_from[0];
    }

    // Batch this baby
    for ($i = 0, $total = count($this->ar_set); $i < $total; $i = $i + 100)
    {

     $sql = $this->_insert_on_duplicate_update_batch($this->_protect_identifiers($table, TRUE, NULL, FALSE), $this->ar_keys, array_slice($this->ar_set, $i, 100));

     // echo $sql;

     $this->query($sql);
    }

    $this->_reset_write();


    return TRUE;
  }

  /**
   * Insert_on_duplicate_update_batch statement
   *
   * Generates a platform-specific insert string from the supplied data
   * MODIFIED to include ON DUPLICATE UPDATE
   *
   * @access public
   * @param string the table name
   * @param array the insert keys
   * @param array the insert values
   * @return string
   */
  private function _insert_on_duplicate_update_batch($table, $keys, $values)
  {
    foreach($keys as $key)
     $update_fields[] = $key.'=VALUES('.$key.')';

    return "INSERT INTO ".$table." (".implode(', ', $keys).") VALUES ".implode(', ', $values)." ON DUPLICATE KEY UPDATE ".implode(', ', $update_fields);
  }

}
?&gt;

That way you can also update the system folder when needed. Hope it helps someone.


Digg   Delicious   Reddit   Facebook   Twitter   StumbleUpon  


  Theme © 2014 iAndrew  
Powered By MyBB, © 2002-2021 MyBB Group.