
Recently however I had a problem that was bothering me, and I thought some sort of set functionality would do the job. (Please tell me if this functionality has already been written).
Here's the problem:
I have a table that I download periodically and I want to update it into my database. The problem is that I have custom ID's for each row of my current table that are indexed throughout the rest of the database. I've found that I need to update existing records, delete any records that are not in my downloaded table, and I need to add new records that don't exist in my current table.
Here's an example:
On a regular basis, you get a list of 5000 products that your online store promotes from your affiliate. Once a week you need to download all of the products and update your products table. A standard drop and re-insert will not work, because you use custom id's in your products table that are referenced in your customers_products table and others. How do you update these 5000 records?

In the diagram, Set A is my current table that I want to update. Set B is the new table. Everything that is in Set A and not in Set B, I want to delete (Yellow). Everything in Set B not in Set A, I want to insert (Dark Green). And Everything that exists in both I want to update (Light Green).
Breaking the problem into a sets will allow you to realize what you need to do. One way of looking at a possible solution is to find what you need. From the diagram, all you need is everything in green (light or dark). In our product example all you need is a set of new products while maintaining the light green Primary Keys. (the dark green will generate new id's).
Implementation:
function updateProducts() {
$products= $this->Product->query("select distinct(prodid) as `match_id` from import_products as `Product`");
// Make the products array look like cake arrays...
foreach($products as $i => $product) {
foreach($product as $model=>$this_data) {
unset($products[$i][$model]);
$products[$i]['Product'] = $this_data;
}
}
$current_products = $this->Product->find('all',array('recursive'=>-1));
$res = $this->overlay($current_products, $products, "{n}.product.match_id");
$this->Product->query("truncate table products");
$this->Product->saveAll($res);
}
// function overlay:
// This function returns the $to array with merged $from elements.
// $from - array[0..x] of cake based array structure
// $to - array[0..x] of cake based array structure
// Path field for the overlay using set::extract's strucure
// eg. $path = "{n}.TableName.fieldname"
// * the Path must exist in each element in the $to and $from array
function overlay($from, $to, $path) {
$from_ary = Set::extract($from, $path);
$to_ary = Set::extract($to, $path);
if (count($to_ary) != count($to)) return $to;
if (count($from_ary) != count($from)) return $to;
$flipped_from = array_flip($from_ary);
$cnt = 0;
$results = array();
foreach($to as $key => $val) {
if (!empty($flipped_from[$to_ary[$cnt]])) {
$fkey = $flipped_from[$to_ary[$cnt]];
$results[$cnt] = Set::merge($from[$fkey], $to[$cnt]);
} else {
$results[$cnt] = $to[$cnt];
}
$cnt++;
}
return $results;
}
What does all this do?
Put simply, I gather my SET B (the new updated products that I downloaded), and then my SET B (my current set of products). I then call my function overlay(). Overlay simply returns SET B with elements from SET A merged with it based on the Path field. What is nice about this concept is that it allows you to simply pass the results overlay() directly to a cake saveAll() function.
I would not suggest using this code for giant tables. I did it on a zip code table with 42000 records, and the saveAll took a few minutes to rebuild the table. This may not be the most efficient solution, but it's clean and useful for "once-in-a-while" updates on tables.