Performance Optimisation for various XML Sitemap plugins

If you have a massive website and you’re using any sitemap plugin, you’ve probably found that generating the sitemap kills the performance of your website.

In this article, I compare and optimise 3 different XML sitemap plugins. The plugins optimised in this article include:

  • Yoast SEO (the XML sitemap functionality part)
  • Google XML Sitemaps
  • Better WordPress Google XML Sitemaps

I didn’t bother with the others, largely because they look like they don’t even attempt to be usable on larger websites.

What a difference an optimisation makes

These optimisations were performed for a client as part of our Performance Analysis service. This client has about 5 million posts in their wp_posts table. About 1.4 million products and about 4.5 million product variations.

As way of an example of the kind of boosts you can expect, or the kind of slowness you can expect, depending on how you look at it, the Yoast SEO plugin, in order to grab just 100 items for the sitemap (it builds it in pieces) takes 4 minutes and 13 seconds on this site.

4 minutes 13 seconds can be described in seconds as 253 seconds. Once my optimisations were implemented, this time dropped to 0.89 seconds which is a 284-fold speed boost.

You might notice above a missing statement – part 1 of my optimisation involves removing the != (not equals) check against the post_date.

The fact is, the query checks the post is published. If a post is published, it has a date. The only posts which have a date of 00-00-00 are drafts. Using a negative check like not equals prevents the full use of indexes. Indexes are required for large sites to avoid table scans. Table scans involve reading all data in the table or index and are the reason why 250 seconds becomes < 1 second for the same exact query. That’s a table scan versus an index seek.

Yoast SEO XML Sitemaps performance

The Yoast SEO plugin is probably the best SEO plugin available, but the XML sitemap functionality is severely lacking both in terms of performance and in terms of configuration options. Some examples of slowness are above which I’ve optimised below.

The first part of the optimisation involved finding this check for post_date != ’00-00-00 00:00:00′. I logged into the clients site and grepped inside the Yoast plugin to find that the file I needed to fix was wp-content/plugins/wordpress-seo-premium/inc/sitemaps/class-post-type-sitemap-provider.php

I was really hoping that Yoast might have added a filter for the where clause, but even though there is an inkling of hope at the top of the get_posts function inside that same file, it doesn’t let you filter out this post_date clause.

That leaves no option other than to either delete line 580 manually to remove this line:

  AND {$wpdb->posts}.post_date != '0000-00-00 00:00:00'  

Or, alternatively, a filter can be written on WPDB->query which I’ve done and added directly into our Scalability Pro plugin. If you understand SQL indexes, you will probably be able to figure out a suitable index to support the resulting query yourself, but here’s the one I added:

create index wpi_scalability_pro_sitemaps on wp_posts (post_status, post_password, post_type, post_modified)

The index above is also added to the latest version of Scalability Pro. Now that I’ve optimised the SQL to fetch the items for use in the sitemap, I can switch the sitemap functionality back on.

Where previously the sitemap was stuck, working through 100 items every 5 minutes and killing performance and using up all the RAM, now it’s powering through the sitemap generation:

You can see in the image above that the OFFSET is 909600, then when I ran the SHOW FULL PROCESSLIST command a minute later, I see the OFFSET has increased to 1287000.

Yoast XML sitemaps functionality is fundamentally difficult to optimise

Even after optimising the previous queries, we still have 3 types of slow queries, none of which are trivial to optimise.

Summary for Yoast

I’ve managed to make Yoast faster through the use of a single index and a query filter (both added to my Scalability Pro plugin), but I still wouldn’t recommend the Yoast plugin for your XML sitemaps simply because the other queries are impossible to optimise without fundamentally altering how the plugin code works and they provide no customisation or configuration options for your sitemaps.

Note: I *do* recommend Yoast SEO for the other SEO functionality but not for their sitemap functionality, although with my index and query rewrite it DOES become fast enough to be usable on medium-sized websites (but not large > 100,000 item websites).

Google XML Sitemaps plugin performance

The Google XML Sitemaps plugin is worse performance-wise than Yoast.

But these queries are actually easier to optimise than the Yoast ones since they’re not using != or other index-breaking statements. No alterations to code are required, so I don’t need to hunt for hooks and filters here, I can just add some indexes. This index should help BOTH the slow queries in the Google XML Sitemaps plugin:

create index wpi_google_xml_sitemaps1 on wp_posts (post_status, post_type, post_modified_gmt desc)

The first query drops from 5 minutes 44 seconds (344 seconds) to 5 seconds which is a 68-fold speed boost.

The 2nd query becomes so fast that the query speed can’t be measured with only 2 decimal places. So it gets reported as 0.00 seconds. If we call it 0.01 seconds, this is a 19500-fold speed-increase. That’s no joke. Indexes, and hitting those indexes, is a really important thing if you want to scale.

That leaves a final 3rd query for the Google XML Sitemaps plugin that is slow.

Unfortunately, since that query is counting posts per month, it needs to read the entire table. There’s nothing I can do to optimise this without thoroughly re-writing the code. On top of this, once a user (or googlebot) clicks on the link for December 2017 above, this plugin will try to display 1.1 million URLs on a single sitemap page. That’s craziness.

As a result, I cannot really recommend the Google XML Sitemaps plugin. Many of my clients will have 100s of thousands, if not millions of posts inside one month and this plugin will simply crash – even if you increase your timeout, it will run out of RAM.

Summary for Google XML Sitemaps

Whilst some of the underlying queries are easier to optimise than those in Yoast, the fact that they do not provide pagination for sitemaps is a killer. I cannot recommend this plugin at all for large sites because of this. The idea of loading 1 million+ URLs on a single sitemap is stupid. Clearly the authors did not consider large websites at all when they wrote this code.

Better WordPress Google XML Sitemaps performance

The Better WordPress Google XML Sitemaps plugin has not been updated in 3 years, but sitemap functionality has not changed at all in that time, so this is not necessarily a red flag.

The promising aspect is that this plugin promises to work on sites of unlimited size, so at least the plugin developers have considered those of us with large websites. It seems the main approach they have is to break down the sitemap into a maximum number of URLs per sitemap.

Firstly, when I install & activate this plugin, then view the sitemap, it’s the ONLY plugin that actually manages to generate the sitemap without any optimisations by myself. That’s very promising. Still – the sitemap index took 205 seconds, or 3 minutes 20 seconds.

The way the Better XML Sitemaps plugin works is it has an index page which links to all the other pages in the sitemap. It needs to know the count of items per post_type because it will pagination these sub-sitemaps. Here’s the slow query to generate this index page:

Since this query is counting everything in wp_posts, you might think adding an index will do nothing to help. However, if we create a covering index, we can make mysql read from the index rather than from the full table. Since this index will be smaller, it will involve far fewer disk reads and should also wipe less cache from memory. Here’s the ideal covering index for this query (note ID is the primary key, so is automatically included in every index, thus it’s not needed in the index):

create index wpi_better_wordpress_google_xml_sitemaps on wp_posts(post_status, post_password, post_type); 

After adding this index, the same query completes in 8.6 seconds rather than 200 seconds, so that’s a decent 25-fold speed increase just from this tiny covering index. That makes it possible to load this index page in a decent amount of time (given the size of this site) without having to cache the index page. It also means that the disk is in use for only 8 seconds instead of 200 seconds which means other users won’t notice the impact of your sitemap generation plugin.

The rest of the pages in the Better WordPress Google XML Sitemaps use these simple type of SQL queries. These are trivial to optimise:

SELECT p.*
FROM wp_posts p
WHERE p.post_status = 'publish'
AND p.post_password = ''
AND p.post_type = 'product'
ORDER BY p.post_modified DESC LIMIT 0,1000

There are no IN statements, no OR statements and no != statements. So the above can be made to be blindingly fast, regardless of how large your website ever becomes. That’s no understatement. This type of query, with a supporting index, will be FAST even if you have TRILLIONS of rows in your wp_posts table. There is no supporting index included in the plugin, but the index we need is very, very close to the one we already added – we just need one extra column (post_modified) to be added.

Here is the raw performance without a supporting index:

Instead of creating an additional index, I’ll replace the index I previously created:

drop index wpi_better_wordpress_google_xml_sitemaps on wp_posts;
create index wpi_better_wordpress_google_xml_sitemaps on wp_posts(post_status, post_password, post_type, post_modified);

Because this index will be larger, I’d expect the index query to be slightly slower, so I tested that first – it came back in 8.7 seconds, so pretty much identical. This is to be expected since a datetime field only adds 8 bytes per row in the index. Here is the massive performance boost for the actual sitemap pages:

0.69 seconds compared to the previous speed of 2 minutes 54 seconds is a 254-fold speed increase. The speed boost you see on your own site will depend on how large your wp_posts table is. If your wp_posts table is larger than this reference site then you will see a bigger speed boost whereas if you have fewer rows in wp_posts, you’ll see less impact.

Still – the use of the index means that regardless of how large your site becomes, it will not slow down due to your sitemap generation.

There is, however, one final issue – even with the Better WP Google XML Sitemaps plugin. For some dumb reason, the MySQL query optimizer cannot figure out that it does not need to read all of the data when it’s grabbing rows 600,000 -> 650,000. This causes queries for pages higher than page 1 to become slower and slower. This is fixable by rewriting the query. Instead of:

SELECT p.*
FROM wp_posts p
WHERE p.post_status = 'publish'
AND p.post_password = ''
AND p.post_type = 'product'
ORDER BY p.post_modified DESC LIMIT 600000,50000

We can rewrite the above as the following, which fools the stupid MySQL query optimiser into using the index to get the 50,000 items and only reading all the data at the end.

select p.*
from
(
SELECT p.id
FROM wp_posts p
WHERE p.post_status = 'publish'
AND p.post_password = ''
AND p.post_type = 'product'
ORDER BY p.post_modified DESC LIMIT 600000,50000
) smallset
join wp_posts p
on smallset.id = p.id

The performance difference, as always, is staggering. When grabbing 5,000 rows with an offset of 600,000, the speed changes from 312 seconds to 4.97 seconds which is a 62-fold speed boost.

To fix this and get this better performance, I need to filter the query using the filter available inside wpdb->query. Query is called for get_results which is what BWP Google XML Sitemaps is using.

The WPDB query function includes a filter so we can modify this sub-optimal query

I’ve coded up this optimisation as a filter in the updated version of our Scalability Pro plugin. Scalability Pro includes many other optimisations too to help my customers and clients who have decent sized websites (> 5000 posts/products, often hundreds of thousands and even millions).

If you’d like to code this up yourself, you can simply modify the BWP Google XML sitemaps plugin and change the SQL code it generates. The function that generates the SQL is called get_results and is inside the class-bwp-gxs-module.php file.

With these optimisations in place, no matter which sitemap page I load, it loads in about 10 seconds. That’s not superfast, but remember it is grabbing 5,000 items, so there’s a network latency effect. Also, the other plugins fail to load sitemaps AT ALL on this large website, even when optimised as far as possible.

All sitemap pages now generate in under 10 seconds, including time to download
This is page 192, which is still as fast as page 52.


Additional useful options in the Better WP Google XML Sitemaps plugin

In addition to being easy to optimise, this sitemaps plugin also has some great performance-related features for large site owners.

  • You can cache the XML sitemaps for whatever duration you like. That means they can be generated once then re-used by any search bots without hurting your server or other traffic using your server
  • You can prime the cache – that means that the sitemaps will always be quick for the robots and robots won’t abandon indexing your site just because of slow XML sitemaps
  • You can alter the size of the index pages. If doing this, you should also change the Advanced Options to allow the same number of rows to be returned as pages in each sitemap index.

Summary

In conclusion, do not use the Google XML Sitemaps plugin because its queries are not optimisable, and their code loads ALL posts from a specific month. That’s sloppy coding because in one month you might perform an import of millions of products and then your sitemaps are broken.

You *can* use the Yoast SEO plugin, but only with the addition of the index specified above and a code alteration (or use our Scalability Pro plugin which includes these edits), and only then on medium-sized websites.

Overall, for performance and functionality, the Better WP Google XML Sitemaps plugin wins our performance and scalability award. There’s still an index required to make it fast, as well as a code alteration – both of these are also included in the latest version of our Scalability Pro plugin.

It may be old, it might not have been updated in 3 years, but it’s the only sitemap plugin that can be made fast enough for large and huge websites.

Don’t hammer your server just to generate your XML sitemaps. Regardless of the size of your WordPress website, it’s possible and fairly easy to keep everything running fast and lightweight.

If you’re interested in having these optimisations yourself, you can create the indexes and alter the code as described above, or you’ll find them conveniently inside our Scalability Pro plugin.

Dave Hilditch

I'm the owner of WP Intense. I code plugins to help with performance and automation and I write stack-building guides.

Chat to me directly through our on-site chat bubble.