Preventing site scraping?

#1
[eluser]cwscribner[/eluser]
Hi all.

Just out of curiosity, is there a mechanism in CI to prevent site scraping? I've been seeing a lot of buzz around people's sites being scraped and wanted to know if there was a protection method.

#2
[eluser]Ayeyermaw[/eluser]
As far as I know there is no effective way to stop people scraping your site.

Have a look at this: http://blockscraping.com/prevent-scraping.html

#3
[eluser]Jaketoolson[/eluser]
A lot of my job requires me to scrape public data made available on public websites. Every so often I come across a site that is 'nearly' impossible to scrape. This is usually when the data I want is 'appended' or 'inserted' into the DOM using JavaScript after the page has loaded, so it is not present in the HTML the server originally sends.
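To illustrate why script-inserted data is harder to reach, here is a rough Python sketch (not CodeIgniter code) of what a naive scraper sees when it only parses the HTML the server returns. The page markup and the `/prices.json` endpoint are made up for the example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping <script> bodies, as a naive scraper would."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that is not JavaScript source.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: the data container is empty until the script runs in a browser.
PAGE = """<html><body>
<h1>Prices</h1>
<div id="data"></div>
<script>
fetch('/prices.json').then(r => r.json()).then(rows => {
  document.getElementById('data').textContent = rows.join(', ');
});
</script>
</body></html>"""


def visible_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(PAGE))  # only the heading text; the price data never appears
```

This only raises the bar. A scraper that drives a real browser, or that requests the JSON endpoint directly, would still get the data.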

A lot of sites use forms to display the requested data, but when a hash or token is required for the form to be changed, this usually requires an additional step or two on the scraper's part.
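As a rough idea of what such a token check could look like, here is a minimal Python sketch using an HMAC signature tied to the visitor's session and a timestamp. The function names, the `SECRET_KEY`, and the ten-minute lifetime are assumptions for illustration, not something taken from CodeIgniter or from any particular site.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side secret; a real application would load this from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(session_id: str, ts: str) -> str:
    message = f"{session_id}:{ts}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(session_id: str, now: Optional[float] = None) -> str:
    """Return 'timestamp:signature', to be embedded in a hidden form field."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(session_id, ts)}"


def verify_token(session_id: str, token: str, max_age: int = 600,
                 now: Optional[float] = None) -> bool:
    """Accept the form only if the token matches this session and is recent."""
    try:
        ts, sig = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    fresh = 0 <= current - issued <= max_age
    # compare_digest avoids leaking information through comparison timing.
    return fresh and hmac.compare_digest(sig.encode(), _sign(session_id, ts).encode())
```

A scraper can still load the form first and replay the token, which is the "additional step or two" mentioned above. The check slows scraping down but does not stop it.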

It is pretty difficult to prevent, though. Not only do I run daily scrapes, I also manage a website which we don't want scraped! Sometimes I simply output charts and raw data to the page in image form using a PHP image class.
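The post does not say which PHP image class was used, so the following is only a sketch of the general idea, written in Python with no external libraries: the value is drawn as pixels instead of being written into the page as text. The tiny 3x5 `FONT` and the `render_pbm` function are invented for this example. In PHP, the GD extension would normally do the drawing.

```python
# 3x5 bitmap glyphs for digits, a dot, and a space (a made-up mini font).
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
    ".": ["000", "000", "000", "000", "010"],
    " ": ["000", "000", "000", "000", "000"],
}


def render_pbm(text: str) -> str:
    """Render text as a plain-text PBM (P1) bitmap, where 1 is a black pixel."""
    rows = ["" for _ in range(5)]
    for i, ch in enumerate(text):
        glyph = FONT[ch]  # raises KeyError for characters outside the mini font
        for r in range(5):
            if i:
                rows[r] += "0"  # one blank column between glyphs
            rows[r] += glyph[r]
    width = len(rows[0])
    body = "\n".join(" ".join(row) for row in rows)
    return f"P1\n{width} 5\n{body}\n"


# The string "42" itself never appears in the output, only its pixels.
image = render_pbm("42")
print(image)
```

Anyone with OCR can still read the numbers back, and image-only data is invisible to screen readers and search engines, so this trades accessibility for a modest obstacle.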

#4
[eluser]xjohnson[/eluser]
Hi, Jaketoolson -

You said:
Quote: I simply output charts and raw data to the page in an image form using a php image class.

Can you share with me how to do the same thing?



Warm Regards,

xjohnson

