Optimising Datafeedr Bulk Image Import

I’d covered this a while ago, back on the AWD site, but it’s worth covering again.

When Datafeedr Product Sets load products into your website, they don’t load images at that point in time. Instead, images are downloaded on-the-fly which means users will see slow performance if they are the first to view this particular set of products.

There is a tool in the Product Sets plugin to run a bulk image import, but if you’re at the size where you’re needing to run such a thing, you’ll be finding the bulk image import doesn’t run very quickly.

If you want to optimise it, find datafeedr-product-sets/functions/ajax.php and find the line that looks like this:

$id = $wpdb->get_var( "
 SELECT pm1.post_id AS post_id
 FROM $wpdb->postmeta AS pm1
 JOIN $wpdb->posts AS p
 ON p.ID = pm1.post_id
 LEFT JOIN $wpdb->postmeta AS pm2
 ON p.ID = pm2.post_id
 AND pm2.meta_key = '_thumbnail_id'
 LEFT JOIN $wpdb->postmeta AS pm3
 ON p.ID = pm3.post_id
 WHERE pm1.meta_key = '_dfrps_product_check_image'
 AND pm1.meta_value = '1'
 AND pm2.post_id IS NULL
 AND pm3.meta_key = '_dfrps_product_set_id'
 AND p.post_status = 'publish'
 ORDER BY post_id ASC
 " );

That’s the line that gets the next product id to grab an image for.

Alter that query to the following, very similar functionality:

$id = $wpdb->get_var( "
SELECT pm1.post_id AS post_id
FROM wp_postmeta AS pm1
WHERE pm1.meta_key = '_dfrps_product_check_image'
AND pm1.meta_value = '1'
and exists (select * from wp_posts where id = pm1.post_id and post_status = 'publish')
and exists (select * from wp_postmeta pm3 where meta_key = '_dfrps_product_set_id' and pm3.post_id = pm1.post_id)
and not exists (select * from wp_postmeta pm2 where meta_key = '_thumbnail_id' and pm2.post_id = pm1.post_id)
limit 1;
 " );

The difference in speed is fairly decent – on the example site I was using, there are 125,000 products and the original query takes about 1.7s whereas the replacement takes about 0.15s. That’s huge considering this is the overhead per-image to be imported.

The original query has a few failings – firstly, it queries the whole set, rather than LIMIT 1. I think it’s possible get_var might append LIMIT 1 on the users behalf but I’m not 100% certain of this. The 2nd thing is it sorts needlessly. Does it matter in which order the images are bulk imported? We just want them imported ASAP and with as low overhead as possible. Finally, there are LEFT JOIN’s used when they’re not needed – you should only join to a table if you intend to return columns from that table. If you’re only joining to a table in order to filter your result set, use EXISTS or NOT EXISTS instead – they are far faster.