1 vote

Scrape multiple variations

On a page that lists multiple variations of a product, where the CSS selector returns the correct price in response to a user selection, Price Comparison Pro cannot currently extract all of the variations from that single page to populate the database.

So this feature request is for PCP to be able to extract multiple variations from a single page to populate the database.

e.g. https://www.animeddirect.co.uk/metacam-1-5mg-ml-oral-suspension-for-dogs
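To make the request concrete, here is a minimal sketch of what extracting every variation from one page could look like. It assumes, purely for illustration, that each variation is an `<option>` element carrying its price in a `data-price` attribute. The markup, the attribute name, the prices and the `extract_variations` helper are all hypothetical. They are not taken from the example page or from PCP.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class VariationParser(HTMLParser):
    """Collects (label, price) pairs from <option data-price="..."> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.variations: List[Tuple[str, float]] = []
        self._price: Optional[str] = None  # price of the option being read
        self._label: List[str] = []        # text fragments of that option

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # Options without a data-price (e.g. a placeholder) yield None.
            self._price = dict(attrs).get("data-price")
            self._label = []

    def handle_data(self, data):
        if self._price is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._price is not None:
            label = "".join(self._label).strip()
            self.variations.append((label, float(self._price)))
            self._price = None


def extract_variations(html: str) -> List[Tuple[str, float]]:
    """Return every priced variation found in the page, in document order."""
    parser = VariationParser()
    parser.feed(html)
    return parser.variations


# Hypothetical markup standing in for a real product page.
SAMPLE_HTML = """
<select name="size">
  <option value="">Choose a size</option>
  <option value="10" data-price="9.99">10ml</option>
  <option value="32" data-price="19.49">32ml</option>
</select>
"""

variations = extract_variations(SAMPLE_HTML)
```

Each returned pair could then become its own row in the database instead of the single price captured today. Sites that load variation prices via JavaScript would need a different approach.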

4
votes

Cloudflare security bypass for Price Comparison Pro

A customer has found a site with Cloudflare’s ‘I’m Under Attack’ mode enabled.

We should still be able to scrape these pages, except when a captcha is requested. Even then, it would be useful to display the captcha to the admin and ask them to solve it, so that scraping from their server is re-enabled.

The example site here is:

https://www.size.co.uk/product/white-puma-x-ralph-sampson-low-og/140715/

The library to use to perform the bypass would be this:

https://github.com/Anorov/cloudflare-scrape

The bypass will have to be enabled through our own PhantomJS scraping service.

An additional check should be added at the start of every scraping session to detect Cloudflare protection and, if it is present, use the library.

The process is roughly:

1. Check whether Cloudflare protection is present.
2. If it is, run the Cloudflare bypass library.
3. Store the resulting cookies for this target site and this source IP.
4. On subsequent scrapes, append those cookies to the cURL request (along with any other cookies the user has specified in the normal way) so that all requests bypass the protection page.
5. If the cookies fail, re-run the Cloudflare bypass library to get a fresh set of cookies for this site and server IP.
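
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not PCP's actual implementation. The `fetch` and `solve` callables are hypothetical stand-ins for the real cURL request and for the bypass library (for example a wrapper around cloudflare-scrape), and the challenge-detection markers (HTTP 503, a `Server: cloudflare` header, `jschl` in the body) are assumed heuristics that may not match every challenge page.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple
from urllib.parse import urlsplit

Cookies = Dict[str, str]
# (status code, response headers, body) as returned by a fetch.
Response = Tuple[int, Mapping[str, str], str]


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Step 1: heuristic check for a challenge page (assumed markers)."""
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    return status == 503 and server.startswith("cloudflare") and "jschl" in body


class CloudflareCookieCache:
    """Keeps bypass cookies per (target host, source IP) and reuses them."""

    def __init__(
        self,
        fetch: Callable[[str, Cookies], Response],  # hypothetical: does the request
        solve: Callable[[str], Cookies],            # hypothetical: runs the bypass
    ) -> None:
        self._fetch = fetch
        self._solve = solve
        self._cookies: Dict[Tuple[Optional[str], str], Cookies] = {}

    def scrape(
        self, url: str, source_ip: str, user_cookies: Optional[Cookies] = None
    ) -> Response:
        key = (urlsplit(url).hostname, source_ip)
        own = dict(user_cookies or {})
        # Step 4: send user cookies plus any stored bypass cookies.
        response = self._fetch(url, {**own, **self._cookies.get(key, {})})
        # Steps 1 and 5: a challenge means no cookies yet, or stale ones.
        if looks_like_cloudflare_challenge(*response):
            # Steps 2 and 3: run the bypass and store the fresh cookies.
            self._cookies[key] = self._solve(url)
            response = self._fetch(url, {**own, **self._cookies[key]})
        return response
```

Keying the cache on both host and source IP mirrors steps 3 and 5: cookies obtained from one server are not assumed to work from another. If the second fetch is still challenged (for instance because a captcha is required), the sketch simply returns that response so the caller can decide what to do.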