Scraping sites for more info (on the fly)

zimco
I don't know whether they will help in your situation, but I also used a couple of really basic scraping and parsing classes:

Http.php, written by Troy Wolf: a screen-scraping class with caching. It includes an image_cache.php companion script and static methods to extract data out of HTML tables into arrays or XML. It now also supports sending XML requests and custom verbs, so it can make WebDAV requests to Microsoft Exchange Server.
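Http.php's own table-extraction methods aren't reproduced here. As a rough illustration of the idea, this sketch does the same kind of job with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`); `html_table_to_array` is a hypothetical name, not part of Http.php.

```php
<?php
// Hypothetical helper (not part of Http.php): returns the rows of the first
// <table> in an HTML string as an array of arrays of cell text.
function html_table_to_array(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Rows of the first table only; rows of nested tables would be included too.
    foreach ($xpath->query('(//table)[1]//tr') as $tr) {
        $cells = [];
        // Header and data cells of this row, in document order.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$html = '<table><tr><th>Name</th><th>Price</th></tr>'
      . '<tr><td>Widget</td><td>9.99</td></tr></table>';
print_r(html_table_to_array($html));
```

Using a real DOM parser for tables avoids having to regex-match nested `<tr>`/`<td>` markup by hand.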

Parser.php: a parsing class with various functions that "help" parse an HTML file for data:
- Remove forbidden HTML tags using PHP's strip_tags function
- Remove unwanted attributes from the HTML source using PHP's preg_replace function
- Reformat an HTML document: this removes HTML tags, JavaScript sections and whitespace, and converts some common HTML entities to their text equivalents
- Split the page HTML
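The first three steps listed above map onto PHP built-ins. Here is a minimal sketch of what such helpers might look like; the function names and the default tag and attribute lists are hypothetical, not taken from Parser.php.

```php
<?php
// Hypothetical helpers sketching Parser.php-style cleanup with built-ins.

// Keep only the allowed tags; strip_tags() drops all others (their text stays).
function remove_forbidden_tags(string $html, string $allowed = '<p><a><b><i>'): string
{
    return strip_tags($html, $allowed);
}

// Remove the listed attributes (quoted or unquoted values) with preg_replace().
// Regex-based, so it can misfire on unusual markup or on matching body text.
function remove_attributes(string $html, array $attrs = ['style', 'onclick']): string
{
    $names = implode('|', array_map(fn ($a) => preg_quote($a, '/'), $attrs));
    $pattern = '/\s+(?:' . $names . ')\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i';
    return preg_replace($pattern, '', $html);
}

// Plain-text version of a page: drop <script>/<style> blocks, strip the
// remaining tags, decode entities and collapse runs of whitespace.
function html_to_text(string $html): string
{
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('/\s+/u', ' ', $text));
}

$page = '<div style="color:red"><p>Hello &amp; welcome</p><script>track();</script></div>';
echo remove_attributes($page), "\n";
echo html_to_text($page), "\n";
```

Entities are decoded only after the tags are stripped, so an escaped `&lt;b&gt;` in the page text survives as literal text instead of being treated as a tag. Note that strip_tags() does not insert spaces, so text from adjacent block elements can run together.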

But I found that writing the parsing part myself with regexes was easier than trying to figure out how to make somebody else's parser fit my needs.
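A hand-rolled regex extractor of the kind described above could look like the following. It is only a sketch: the function name and the sample markup are made up, and a pattern like this holds up only on pages whose markup you already know.

```php
<?php
// Hypothetical example: pull every link (href + text) out of a page
// with a single preg_match_all() call.
function extract_links(string $html): array
{
    preg_match_all(
        '#<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>#is',
        $html,
        $matches,
        PREG_SET_ORDER
    );
    $links = [];
    foreach ($matches as $m) {
        // $m[1] is the href, $m[2] the (possibly tag-containing) link text.
        $links[] = ['href' => $m[1], 'text' => trim(strip_tags($m[2]))];
    }
    return $links;
}

$html = '<ul><li><a href="/a.html">First</a></li>'
      . '<li><a class="x" href=\'/b.html\'><b>Second</b></a></li></ul>';
print_r(extract_links($html));
```

The trade-off is that when the target site changes its markup, the pattern has to change with it.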


Messages In This Thread
Scraping sites for more info (on the fly) - by El Forum - 11-08-2008, 03:05 PM
Scraping sites for more info (on the fly) - by El Forum - 11-08-2008, 05:50 PM
Scraping sites for more info (on the fly) - by El Forum - 11-09-2008, 01:38 AM
Scraping sites for more info (on the fly) - by El Forum - 11-09-2008, 10:25 AM


