Speeding up WP All Import imports using Scalability Pro

A customer asked me for helping speeding up their imports. They are using WP All Import to import their products and keep the prices and other details updated, and while the imports were working fast for them at the start, the imports gradually slowed down to the point where they kept failing.

Scalability Pro is the plugin I built to help optimise things like this – it adds missing indexes to key WordPress tables and even rewrites SQL queries where required to help them use indexes on their tables.

This case study shows the before and after speed boost from using Scalability Pro and an additional free plugin to eliminate some image sizes.

Import speed is really affected by 3 key things:

  1. Checking if an update or insert should happen (or if it should skip completely). Also includes selectively choosing which data to update.
  2. Performing the update/insert
  3. Importing images

Establishing a baseline and gathering metrics

Before running the case study, I need to establish a baseline of speed to compare before and after. While running this baseline, I can also gather slow queries in the mysql slow query log. I’ve gone into detail about configuring the slow query log to capture slow queries before, so check that article out if you plan to try this yourself.

image.png
Add query log parameters to mysql to gather slow queries for your import jobs

With the slow query log in place, and still without Scalability Pro installed or activated, I started up one of the imports this customer had been complaining about. Their main issue is they want to keep rerunning a very large import that normally will not need to update products, but may do so – that means there’s a heavy focus on the first part of any import – checking if the product exists or if the data has changed.

They provided a large 225,000 product import job for me to optimise. I know from experience that the ‘Time Elapsed’ shown from WP All Import is regularly wrong – for example, 20 minutes had passed since I started the import below yet it claims only 10 minutes have passed.

Establishing a baseline for WP All Import using a stopwatch – ignore their Time Elapsed, it’s not accurate

So, we have a baseline of 1100 updates in 20 minutes. Let’s install Scalability Pro and improve this import speed.

Speeding up imports with Scalability Pro

I installed and activated Scalability Pro and then on the settings page I created the indexes it uses. I set one option in Scalability Pro to ‘Defer term counting to nightly’.

I ran the import again, again timing it for 20 minutes since I can’t trust what WP All Import says in its elapsed timer.

Scalability Pro attempting to speed up imports

With Scalability 4.52 installed, defer term counting active, the indexes created, we’re seeing 1300 records rather than 1100 records being processed in 20 minutes. That’s pretty poor so clearly something changed recently.

Examining the Query Log

It’s been a while since I optimised Scalability Pro specifically for imports, so now I’ll investigate the query log to see if there are further speed boosts which can be added.

The query log is a list of slow running queries that exceeded whatever time limit you set in your mysql configuration file. In our case, we set 0.1 as the limit to be considered a slow query, or 100 milliseconds.

What I’m looking for first is query patterns – that is, queries that are effectively the same type but just the data changes.

WP All Import has added a new table that I need to optimise

The first things I saw in the slow query log were that WP All Import has new reference tables that it queries. These tables can become large with large imports, so indexes are required in order to keep good speed.

image.png
The wp_pmxi_hash table has a primary key index only

All the slow queries involving wp_pmxi_hash are using a post_id and an import_id. I investigated both of these and found the post_id to be the most selective, i.e. that’s the one that should get the index so that when it uses the index is gets quickly to the correct rows rather than having to scan the table.

Overall, I found three new cases where indexes can speed up imports with WP All Import:

create index wpallimport_boost1 on wp_wc_product_meta_lookup (sku);
create index wpallimport_boost2 on wp_pmxi_posts(unique_key(10));
create index wpallimport_boost3 on wp_pmxi_hash(post_id);

All three query types were simple lookups, all roughly 0.15 seconds to perform which was reduced to insignificant (< 0.001s) with the indexes c reated.

With those indexes added, I cleared the slow.log file and re-ran the import to gather more info.

So – we’re up to 2600 rows processed in this time compared to 1100 before so we have just over a 2-fold speed boost on imports from Scalability Pro with these 3 new indexes added. Still – I consider this slow, so something is going wrong with the optimisations. I cleared the slow.log file and re-ran the import again and found this:

Slow query patterns from a long import job with WP All Import

There were technically 3 SQL patterns (different types of SQL queries) I saw from the fresh import. However, 2 of these only occurred at the beginning when WP All Import is setting up the import to run – adding indexes adds storage requirements and can slow down imports and updates so I try to minimise how many are added. In this case, I’m only going to need to optimise one query.

SELECT COUNT( DISTINCT ID ) FROM wp_posts p
LEFT JOIN (
SELECT object_id
FROM wp_term_relationships
WHERE term_taxonomy_id IN ( 23838 )
) AS exclude_join
ON exclude_join.object_id = p.ID
INNER JOIN (
SELECT object_id FROM wp_term_relationships
INNER JOIN wp_term_taxonomy using( term_taxonomy_id )
WHERE term_id IN ( 23846 )
) AS include_join ON include_join.object_id = p.ID
WHERE 1=1
AND p.post_status = 'publish'
AND p.post_type = 'product'
AND exclude_join.object_id IS NULL;

What I can see from the slow.log is that this query above is repeated for every row imported. I also know from my benchmarks that we’re getting just over 3 items imported per second. This query is taking about 0.23 seconds, so if there are 3 of them happening inside a second that’s 0.7 seconds of time just for this one repeated query.

That means, if we can optimise this query to run in 0.01 seconds or something like that then we’ll shave off 0.7 seconds from every second and theoretically we’ll see imports running 3 times faster (0.3 seconds for 3 rows instead of 1 second = 3x faster). Given that Scalability Pro already boosted speed by just over double, that would result in 6-fold speed boost for imports with WP All Import.

The offending slow query is checking for products that are visible

I can also tell from the query that this query is counting visible items in the ‘Uncategorized’ category, i.e. it is trying to count the terms. There is a WordPress function that normally controls this – wp_defer_term_counting – but it looks like this is not being used by a particular process. The exclude_join table alias certainly doesn’t look like normally WordPress aliases that are generated in WP_Query. So – I searched the codebase for the exclude_join statement:

WooCommerce is bypassing the WordPress term counting code and running their own term counts

Optimising the WooCommerce Import with WP All Import

Investigating the wc-term-functions.php file, searching for exclude_join, I found the _wc_term_recount function which is bypassing the default WordPress rules for term recounts. However, I did find a useful filter inside the function:

Filter inside WooCommerce that can be used to speed up imports

To explain: counting terms means counting all the products per category in order to update the stored term counts in the database. This operation is expensive – the 0.23 seconds we saw earlier – and given that we are running an import, it makes no sense at all to run this count on every row when the count is going to change in literally less than a seconds time.

So – to optimise this I needed to alter the code to only run the term recounts after imports are completed.

Altering Scalability Pro to avoid term recounts every row on import

I altered Scalability Pro in two key ways here – firstly, if the user has the option of ‘defer term counts to nightly’ enabled, then all WooCommerce import tools will be speeded up by this functionality – however, term counts will be inaccurate until 2am the next day.

Secondly – if you are using WP All Import (the only importing tool I recommend – it’s awesome), then regardless of whether you have set defer term counts it will switch off term recounting at the start of the import batch and run the term counts at the end of the import batch.

If you are using a different import tool, let me know in the comments below and maybe it’ll be possible for me to add some code specific to that importing plugin to speed it up too – like I said, all import tools will be speeded up if you enabled ‘Defer Term Counts’ but the term counts will be inaccurate until they are recounted at 2am. If your import plugin provides before-import and after-import actions then I can add specific code for those actions to speed up that import tool.

Speeding up Image Import

Image importing is one of the slowest parts of importing products using any importing tool, not just WP All Import. Importing images can be speeded up by reducing how many images are imported, avoiding importing them altogether, importing them to a different server or speeding up your disks.

WordPress stores images in multiple sizes, so over the years of installing plugins you may end up with dozens of image sizes. These all need to be created for each imported item which uses a lot of CPU and storage.

Control over these image sizes is coming soon to Scalability Pro – in particular, we’re adding a feature to restrict image sizes only when importing specific types of post types (e.g. only use WooCommerce image sizes when importing products). In the meantime, there are two plugins that can help – our own External Images plugin completely eliminates the need to import images at all, so is by far the fastest. This means there must be a third-party location in your import files where the images will reside, e.g. an affiliate network CDN or some other third party server where the image is accessible.

Often, shop owners will want to have images on their own server, perhaps for SEO reasons, so in that case the best approach is to reduce how many image sizes are created.

In the case of this customer, update speed is more important than insert speed since they’re focusing on rerunning a particular import job, but I still wanted them to have fast initial imports so I installed the Stop Generating Unnecessary Thumbnails plugin and deactivated thumbnails they didn’t need.

image.png
Disabling unused image sizes will have a massive impact on your import speed for new items

Summary

I re-ran the main 225,000 product import to measure import speed after all my changes. In my original benchmark, 1100 records were being processed inside 20 minutes. Now, with the new additions to Scalability Pro to eliminate unnecessary term counting from WooCommerce, 10,900 records are being processed inside 20 minutes – that’s an increase in speed of 9.9x! Remember – eliminating load like this doesn’t just speed up the imports – it’s also going to make your website far more usable, snappy and quick while imports are occuring.

I achieved a 9.9x import speed improvement with Scalability Pro optimisations

If you have small import files, you won’t notice this kind of 10-fold speed boost quite so much, but if you have even larger import files then the impact will be even more significant. I would guess if you had 1 million products in your import file you would see something in the region of a 100x or possibly more speed boost on your imports.

Out of interest, the entire 225,000 weekly product import now completes in just over 6 hours whereas previously it kept stalling and failing and had to be restarted every day and ultimately it wasn’t even completing inside the week.

I’ve added these 3 new indexes, the WP All Import code and the updated ‘Defer Term Count’ code to Scalability Pro 4.63. If you have a different import tool, you will also see the speed boost but your term counts will be inaccurate after import until 2am. If you are using a different import tool, let me know in the comments below and I’ll investigate their code and see if I can add similar functionality to boost their import speed and still keep term counts always accurate.

Talk to me