4 votes

Cloudflare security bypass for Price Comparison Pro

Dave Hilditch shared this feature

July 4, 2019

A customer has found a site with Cloudflare's ‘I'm Under Attack’ mode enabled.

We should still be able to scrape these pages, except when a captcha is requested. Even then, it would be useful to display the captcha to the admin and ask them to solve it, so that scraping from their server is re-enabled.

The example site here is:

https://www.size.co.uk/product/white-puma-x-ralph-sampson-low-og/140715/

The library to use to perform the bypass would be this:

https://github.com/Anorov/cloudflare-scrape

This will have to be enabled through our own PhantomJS scraping service.

An additional check should be added at the start of every scraping session to detect Cloudflare protection and, if it is present, use the library.
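As a rough illustration of that up-front check, here is a minimal Python sketch. The function name is hypothetical, and the markers it looks for (a 503 or 429 status, a `Server: cloudflare` header, and the `jschl_vc` / `jschl_answer` form fields) are an assumption based on how the JavaScript challenge page looked in 2019. Cloudflare can change these at any time, and a captcha page would need its own separate check.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic: does this response look like a Cloudflare JS challenge page?

    The markers below are assumptions about the 2019-era challenge page,
    not a documented contract.
    """
    # Header names are case-insensitive, so normalise before comparing.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()

    if status_code not in (503, 429):
        return False
    if not server.startswith("cloudflare"):
        return False
    # The challenge form carries these two hidden field names.
    return "jschl_vc" in body and "jschl_answer" in body
```

A normal page served through Cloudflare still carries the `Server: cloudflare` header, which is why the status code and body markers are checked as well.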

The process is roughly:

1. Check whether Cloudflare protection is present.
2. If so, run the cloudflare-scrape library to pass the challenge.
3. Store the resulting cookies for this target site and this source IP.
4. On subsequent scrapes, append those cookies to the cURL request (along with any other cookies the user has specified in the normal way) so that all requests bypass the protection page.
5. If the cookies stop working, re-run the cloudflare-scrape library to get a fresh set of cookies for this site and server IP.
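The five steps above could be sketched in Python roughly as follows. This is only an outline under stated assumptions, not the plugin's actual code: `scrape`, `cookie_header`, and the `fetch`, `solve`, and `is_challenge` callables are all hypothetical names. In a real service `solve` might wrap `cfscrape.get_tokens(url)`, and `fetch` would be the existing cURL request.

```python
from urllib.parse import urlsplit


def cookie_header(*cookie_dicts):
    """Merge cookie dicts (later ones win) into a single Cookie header value."""
    merged = {}
    for cookies in cookie_dicts:
        merged.update(cookies)
    return "; ".join(f"{name}={value}" for name, value in merged.items())


def scrape(url, source_ip, cache, fetch, solve, is_challenge, user_cookies=None):
    """Fetch url, reusing or refreshing bypass cookies as needed.

    cache        -- dict mapping (host, source_ip) to a cookie dict (step 3)
    fetch        -- hypothetical callable: fetch(url, cookie_header) -> response
    solve        -- hypothetical callable: solve(url) -> cookie dict (steps 2 and 5)
    is_challenge -- hypothetical callable: is_challenge(response) -> bool (step 1)
    """
    user_cookies = user_cookies or {}
    key = (urlsplit(url).hostname, source_ip)

    # Step 4: send any stored bypass cookies along with the user's own cookies.
    stored = cache.get(key, {})
    response = fetch(url, cookie_header(user_cookies, stored))

    # Step 1: if the response is not a challenge page, nothing more to do.
    if not is_challenge(response):
        return response

    # Steps 2 and 5: no cookies yet, or the stored ones failed, so get a fresh set.
    fresh = solve(url)
    cache[key] = fresh  # Step 3: remember them for this site and source IP.
    return fetch(url, cookie_header(user_cookies, fresh))
```

One detail this sketch leaves out: the cloudflare-scrape README notes that the cookies are only accepted when sent with the same User-Agent that obtained them, so a real implementation would probably store the User-Agent next to the cookies.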

2 Comments

  1. Dave H.
    July 12, 2019 @ 3:41 pm

    Just an update on this so far:

    I’ve created a test service that uses the Python cfscrape (Cloudflare security bypass) library listed above.

    It’s working for some Cloudflare-protected sites, but not yet all.

    I’m still working on this. For example, this URL currently times out:

    http://scraper.wpintense.com:6543/?url=https://www.size.co.uk/product/white-puma-x-ralph-sampson-low-og/140715/

    I’m still debugging the HTTP errors this generates to figure out the bypass.

  2. Dave H.
    July 26, 2019 @ 11:16 am
