Skip to content

11 Comments

  1. Luigi Nica
    February 4, 2019 @ 3:27 pm

    Hello Dave, thanks for the insights. Would be a good idea for you to look into the Seopress (seopress.org)?. Thanks, Luigi

    Reply

    • Dave H.
      February 4, 2019 @ 7:11 pm

      Thanks for the suggestion. I didn’t check it, because they don’t have anywhere close to the options in BWP and because of this I presume they don’t paginate their results, or optimise their queries. I might be wrong however. Are you using it on large sites?

      Reply

  2. Marius
    February 5, 2019 @ 11:15 am

    Great article, Dave.

    I feel like one of the biggest reasons to use the same website for SEO and XML Sitemaps is that they will be in sync when it comes to noindexed URLs.

    When using different plugins, noindexing URLs (or entire post types, taxonomies, etc.) requires changing the settings twice: once in the SEO plugin, the other in the XML sitemaps plugin. It’s easy to make mistakes and forget noindexed URLs in the sitemaps or having URLs which should be indexable that are missing from the sitemap.

    Reply

    • Dave H.
      February 5, 2019 @ 11:30 am

      I would recommend you leave your XML sitemaps to include all your URLs, and use your SEO plugin to control noindex.

      Imagine you have a page which is currently in your sitemap and is currently set to indexable. Image you change it to noindex. Because it is in your sitemap, googlebot will find it more quickly and will spot the meta noindex tag and will thus quickly remove it from their index.

      Reply

      • Marius
        February 5, 2019 @ 4:57 pm

        I’m not sure how that would impact the crawl budget. I don’t think search engines are too interested to crawl URLs which are not meant to be indexed in the first place. Often times, these URLs are not linked from anywhere else besides the sitemap.

        Many plugins (such as page builders) create post types for their templates which are not actually public. XML Sitemaps plugins will usually pick them up and list them in the sitemap.

        In the end, there is a solution to each problem. I prefer using the same plugin for convenience purposes.

        Reply

        • Dave H.
          February 5, 2019 @ 9:13 pm

          Sure – when you set up your sitemap at first, you should ensure that you don’t include anything in it that you don’t want indexed, simply because it will use up your crawl budget. But – if there IS an item in your sitemap index which is subsequently set in your normal SEO plugin to noindex, then having this in the sitemap will help the bots index the fact that this page is now no-indexed immediately and can help you remove the item very quickly from the google/search engine index. If you don’t have the item in your sitemap, how can google then read the fact that this page is now set to noindex? Especially if it’s not linked elsewhere on your site anymore?

          So, what I’m saying, is you should be able to set up your sitemaps nicely once, and maybe revisit if you add further custom post types, but other than that, you can have your seo plugin manage the noindex, and leave these noindex items in the sitemap. They have a ‘lastmod’ property on them so when they get set to noindex, they will notify bots, they’ll see the changed date, they’ll read noindex, they’ll remove the item from the cache, they won’t index it again until the lastmod property changes again.

          Reply

  3. Rarst
    February 5, 2019 @ 12:17 pm

    The query you claim to be slow for Yoast SEO (SELECT wp_posts.ID…) is not the actual SQL the plugin executes.

    It’s a part of larger query, optimized with join to work around natural MySQL limitations with large offsets. I am not sure why would you cut it out and run separately, that’s not what plugin does.

    I had personally worked on refactoring Yoast SEO sitemaps module in the past and closely monitored feedback from people with 100Ks of posts scale sites (it’s in the open on their issue tracker). It _should_ be able handle it fine and, importantly, with native WP database indexes.

    If you experience performance issues with it I would recommend to report it on tracker.

    Reply

    • Dave H.
      February 5, 2019 @ 1:23 pm

      “It _should_ be able to handle it fine” – Yes, it should, but it doesn’t, not on large sites.

      You are correct, the part I optimised is the sub-query of a larger query. Here is the larger query from directly inside their code:

      $sql = "
          SELECT l.ID, post_title, post_content, post_name, post_parent, post_author, post_modified_gmt, post_date, post_date_gmt
          FROM (
              SELECT {$wpdb->posts}.ID
              FROM {$wpdb->posts}
              {$join_filter}
              {$where}
                  {$where_filter}
              ORDER BY {$wpdb->posts}.post_modified ASC LIMIT %d OFFSET %d
          )
          o JOIN {$wpdb->posts} l ON l.ID = o.ID
      ";
      

      You can also see this full query including the sub-query with late row lookups in the screenshots I make after optimisation.

      From the above code, you can see they have implemented an old optimisation which I also implement for the BWP plugin as part of my optimisations in this article – namely, late row lookup. Here are two reference URLs – the first, is specifically from someone in Germany to get Yoast to optimise their previous query (when they just did select p.* … order by limit…):

      https://wordpress.org/support/topic/plugin-wordpress-seo-by-yoast-performance-suggestion/

      And here’s a more in-depth article explaining why this optimisation helps:

      https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/

      BUT – since Yoast originally wrote that optimisation, someone else on their team has modified their code to include this redundant check:

      post_date != '00-00-00 00:00:00'

      Now – if the index was only required for seeking against post_type, post_password, post_status and post_date then this would be the end of the matter. But there is an ORDER BY statement inside the sub-query on post_modified. This post_date != part of the query, even if we have an index on post_type, post_password, post_status, post_date, post_modified prevents use of the index PAST the post_date. Because we did a != instead of =, we cannot guarantee that the remainder of the items in the index are in order. So the query optimiser is forced to perform a sort on the entire remainder of the index. In this case, that’s 1.4 million rows.

      By removing the post_date != ’00-00-00 00:00:00′ I allow this index to be used and I eliminate the sorting operation. That’s where this speed boost comes from.

      As for why I cut out the sub-query to analyse it separately, I did that because this is the part of the overall query that was slow. This is the part that needed optimisation. Like I said, you can see the full query in the screenshots with the optimised part in the subquery, and the outer query of select p.* from (select p.id …) x join wp_posts p on x.id = p.id …

      If you doubt that this index-breaking post_date clause is part of Yoast, you can check it here:

      https://github.com/Yoast/wordpress-seo/blob/a1628d1570a8e8cca896312509ac986182836c30/inc/sitemaps/class-post-type-sitemap-provider.php

      In the above, you’ll find this line:

      AND {$wpdb->posts}.post_date != '0000-00-00 00:00:00'

      As for WHY this code got altered, I cannot find the answer. Here’s the blame link:

      https://github.com/Yoast/wordpress-seo/blame/a1628d1570a8e8cca896312509ac986182836c30/inc/sitemaps/class-post-type-sitemap-provider.php

      In there, there’s one red herring, which is really just someone mostly refactoring, but is not when this != was added:

      https://github.com/Yoast/wordpress-seo/commit/0aaf711796d62656ff78d14ec53309014742d077

      Anyway, as per your suggestion, I’ve submitted a bug to Yoast’s bug tracker. This is the 2nd bug I’ve submitted to them (although the first was over twitter) and the last time they were very quick at implementing the suggestions.

      https://github.com/Yoast/wordpress-seo/issues/12161

      Reply

  4. cory
    February 14, 2019 @ 4:30 am

    Did you need to change anything in the .conf file to make this work? I get a 404 error when I try to generate a sitemap.

    Reply

    • Dave H.
      February 14, 2019 @ 10:38 pm

      Can you visit Settings -> Permalinks and hit save and try again?

      Reply

  5. Pound Natel
    September 16, 2019 @ 11:14 pm

    Hi Dave, thanks for the artiicle. I wanted to ask whether it is helpful or harmful to have two xml sitemaps for one site…say both yoast and Google xml sitemaps together?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Join our Discord community and read the rules for a 20% discount coupon.