Preventing site scraping?
#3

[eluser]Jaketoolson[/eluser]
A lot of my job requires me to scrape public data from public websites. Every so often I come across a site that is nearly impossible to scrape. This is usually because the data I want is appended or inserted into the DOM by JavaScript <em>after</em> the page has loaded, so it is not in the HTML the server returns.
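
To illustrate why such pages resist a plain HTTP scrape, here is a minimal Python sketch. The sample markup, the `results` id, and the `TextById` helper are all made up for illustration. It only shows that a static parser finds an empty container when the content is added by a script at runtime.

```python
from html.parser import HTMLParser

# Hypothetical page as a scraper receives it: the container is empty and
# a script fills it in only once a browser actually executes it.
RAW_HTML = """
<html><body>
<div id="results"></div>
<script>
document.getElementById("results").innerHTML = "<p>42 widgets</p>";
</script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found inside the element with a given id.

    Sketch only: unclosed void tags such as <br> inside the target
    would throw off the depth counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, target_id):
    """Return the text a non-JavaScript scraper sees inside target_id."""
    parser = TextById(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# The script body is never executed, so the container stays empty.
print(repr(static_text(RAW_HTML, "results")))
```

Getting at the data in a case like this generally means either finding the request the script makes and calling it directly, or driving a real browser.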

A lot of sites display the requested data through forms, but when a hash or token is required for the form to be submitted, this usually means an additional step or two on the scraper's part.
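
The extra step might look roughly like the Python sketch below: read the hidden token out of the form page first, then send it back along with your own fields. The form markup, the `csrf_token` field name, and the `build_post_body` helper are assumptions for illustration, not any particular site's layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical form page, as fetched in the first request.
SAMPLE_FORM = """
<form method="post" action="/search">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <input type="text" name="query">
</form>
"""


class HiddenInputs(HTMLParser):
    """Collect name/value pairs of every hidden <input> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_post_body(form_html, user_fields):
    """Merge the page's hidden fields with the scraper's own values."""
    parser = HiddenInputs()
    parser.feed(form_html)
    payload = dict(parser.fields)   # token(s) issued by the site
    payload.update(user_fields)     # values the scraper wants to submit
    return urlencode(payload)


print(build_post_body(SAMPLE_FORM, {"query": "widgets"}))
# csrf_token=a1b2c3&query=widgets
```

In a real scrape the first step is a GET of the form page and the second is a POST with this body. Tokens like this are usually tied to a session, so the same cookies have to be sent on both requests.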

It is pretty difficult to prevent, though. Not only do I run daily scrapes, I also manage a website which we don't want scraped! Sometimes I simply output charts and raw data to the page as images using a PHP image class.


Messages In This Thread
Preventing site scraping? - by El Forum - 08-06-2011, 12:31 AM
Preventing site scraping? - by El Forum - 08-06-2011, 12:54 AM
Preventing site scraping? - by El Forum - 08-06-2011, 12:32 PM
Preventing site scraping? - by El Forum - 08-09-2011, 01:39 PM


