Scraping using Regex

With Price Comparison Pro, you can scrape using CSS, XPath and Regex. In this KB article, you’ll see how to scrape using Regex.

Many websites now embed schema.org JSON objects in their raw HTML, which you can scrape with Price Comparison Pro.

Here is an example schema.org JSON object with matching regex:

<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "Product",
    "description": "Cumpara Monitor LED IPS Samsung 24'', Full HD, 75Hz, 5ms, AMD FreeSync, VGA, HDMI, LF24T356FHRXEN de la eMAG! Ai libertatea sa platesti in rate, Beneficiezi de promotiile zilei, deschiderea coletului la livrare, easybox, retur gratuit in 30 de zile si Instant Money Back.",
    "name": "Monitor LED IPS Samsung 24'', Full HD, 75Hz, 5ms, AMD FreeSync, VGA, HDMI, LF24T356FHRXEN",
            "image": "https://s13emagst.akamaized.net/products/40151/40150570/images/res_b90acf7a211d87ff21dd5d3675533a6a.png?width=80&amp;height=80&amp;hash=319436B497B491188B43A379CC64481E",
        "url":"https://www.emag.ro/monitor-led-ips-samsung-24-full-hd-75hz-5ms-amd-freesync-vga-hdmi-lf24t356fhrxen/pd/DLWZ9PMBM/",
    "sku":"DLWZ9PMBM",
    "brand": {
        "@type": "Brand",
        "name": "Samsung"
    },
            "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": "4.77",
            "reviewCount": "26",
            "bestRating": "5",
            "worstRating": "1"
        },
        "offers": {
        "@type": "Offer",
        "seller": "eMAG",
        "availability": "",
        "price": "649.99",
        "priceCurrency": "RON"
    }
}
</script>

To grab the price from this, the regex would be:

/"price": "([^"]+)"/

  1. First, the wrapping / characters mark the start and end of the expression.
  2. Next comes the identifying marker before the price value. In this case it’s "price": " (including the space after the colon, which has to match the page source exactly).
  3. Then comes the capturing area inside ( parentheses ). In this capturing area we have [^"]+. Square brackets [] mean any one of the characters inside them. A ^ at the start of the brackets inverts that, so it means any character except the ones listed. So [^"] means match any character except a double quote, because reaching a double quote means the price value has ended. Finally, the + after the square brackets says match one or more of these characters.
  4. Last, we have the closing " symbol and finally the closing /
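
If you want to check an expression before entering it in the plugin, here is a minimal sketch in Python of the same pattern run against a trimmed copy of the example above. Python is used purely for illustration and is not how Price Comparison Pro runs the expression internally. The names html, extract_price and LOOSE_PATTERN are hypothetical, and the second pattern (which tolerates optional whitespace around the colon) is our own assumption for sites that print "price":"649.99" without a space, not something taken from the plugin.

```python
import re

# Trimmed copy of the "offers" part of the JSON-LD block shown above.
html = '''<script type="application/ld+json">
{
    "offers": {
        "@type": "Offer",
        "seller": "eMAG",
        "availability": "",
        "price": "649.99",
        "priceCurrency": "RON"
    }
}
</script>'''

# Same pattern as in the article, minus the wrapping / delimiters,
# which Python's re module does not use.
PRICE_PATTERN = re.compile(r'"price": "([^"]+)"')

# Assumption: a looser variant that allows any amount of whitespace
# (including none) on either side of the colon.
LOOSE_PATTERN = re.compile(r'"price"\s*:\s*"([^"]+)"')


def extract_price(page_source, pattern=PRICE_PATTERN):
    """Return the first captured price as a string, or None if no match."""
    match = pattern.search(page_source)
    return match.group(1) if match else None


print(extract_price(html))  # prints 649.99
```

Note that the marker "price" includes its closing double quote, so the expression does not accidentally match the "priceCurrency" line.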

You configure Regex in Price Comparison Pro the same way you configure CSS or XPath: visit Settings > Price Comparison Pro > Price Comparison and choose an ‘Expression Type’ of RegEXP.
