How do you parse a file and pull a block of markup?

[eluser]Yorick Peterse[/eluser]
[quote author="nubianxp" date="1241481200"]@yorick: thanks for the code man, nice to have something to start with... one question, is it possible to maybe parse a remote file? e.g.
$remote = 'http://example.com/data.html';
$DOM = new DOMDocument();
$content = $DOM->getElementById('somediv');
and do either
echo $content; OR

i tested the above, but i get some parsing errors when using $dom->load():
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: error parsing attribute name in http://example.com/forecast.html, line: 19 in E:\localweb\projects\test\getfile.php on line 5

and a blank page if i use $dom->loadHTML()... anyway, appreciate the help. :lol:

@dam1an: 5,344?! and all of those are built-in?! :wow:[/quote]

I'm not really sure if you can parse a remote file using the DOM directly. However, you could use file_get_contents() to fetch whatever is in that file and then parse it afterwards.

It would look like the following:

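// Get the content from the remote file
$remote = file_get_contents('your_remote_file.html');

// Load the DOM
$DOM = new DOMDocument();

// Open the $remote variable using the DOM; loadHTML() takes the markup as a string
$DOM->loadHTML($remote);

// Parse it ('somediv' is the id from your example)
$content = $DOM->getElementById('somediv');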
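
A minimal sketch of the fetch-then-parse idea above, wrapped in a function so it can be tried on any HTML string. The helper name extract_block_by_id() is hypothetical, and 'somediv' is just the id from the quoted post. It assumes PHP 5.3.6 or later, because saveHTML() only accepts a node argument from that version on. libxml_use_internal_errors() collects warnings such as the "error parsing attribute name" one quoted above instead of printing them.

```php
<?php
// Hypothetical helper: returns the markup of the element with the given id,
// or null when the input is empty or the id is not found.
function extract_block_by_id($html, $id)
{
    if ($html === '' || $html === false) {
        return null;
    }

    // Collect libxml parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById($id);

    // saveHTML() with a node argument serialises just that element
    return $node === null ? null : $dom->saveHTML($node);
}
```

For a remote page, the string would come from file_get_contents('http://example.com/data.html'), which needs allow_url_fopen enabled and returns false on failure; the result is then passed in as extract_block_by_id($remote, 'somediv').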

[quote author="Dam1an" date="1241453054"]Don't worry, it doesn't make you a n00b lol
With 5344 documented PHP functions no one is expected to know them all

(and no, I didn't count them all manually :P)[/quote]

I just counted them and it's 5343.

Just kidding. :D

[eluser]Sen Hu[/eluser]
Good question. Excellent responses so far.

Here is an alternative. You want to extract block-i-need-to-get from

...bunch of markups...
<div id="block-i-need-to-get">
<div>some content</div>
<div>some more content</div>
...more bunch of markups...

in file "someremotefile" ? (BTW: Thanks for accurately posting your requirements.)

Here is a small script.

var str content ; cat "someremotefile" > $content
stex -c "^<div id=\"^]" $content > null
stex -c "[^\"^" $content > null
# What's remaining in $content is block-i-need-to-get . Print it.
echo $content

This script is written in biterscripting ( http://www.biterscripting.com/install.html ) . You can transform it to php.
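
One possible PHP translation, as a rough sketch. It rests on a reading of the two stex commands that has not been checked against the biterscripting documentation: the first drops everything up to and including the first <div id=", the second drops everything from the next double quote onward. The -c flag is not modelled here, and the function name text_between() is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text between the first occurrence of
// $start and the next occurrence of $end, or null if either is missing.
function text_between($content, $start, $end)
{
    $from = strpos($content, $start);
    if ($from === false) {
        return null;
    }

    // Skip past the start marker itself
    $from += strlen($start);

    $to = strpos($content, $end, $from);
    if ($to === false) {
        return null;
    }

    return substr($content, $from, $to - $from);
}
```

Called as text_between($content, '<div id="', '"') on the sample markup above, this returns the string block-i-need-to-get, that is the id value and not the markup inside the div.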


[eluser]Evil Wizard[/eluser]
[quote author="Dam1an" date="1241481571"]@nubianxp: Yeah they're all the built in PHP functions, then there's another million extra user created functions :P
PHP has grown to be huge!!![/quote]

DOMDocument and DOMXPath are not built-in functions of PHP; they are classes from the DOM extension, so you do need to make sure your installation has been compiled with the appropriate libraries (libxml). As of PHP 5, though, the extension is pretty much standard.

Use DOMDocument to parse the HTML file, and then you can use DOMXPath to get at the elements.
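
A short sketch of that combination, assuming the HTML is already in a string and that PHP 5.3.6 or later is available (for saveHTML() with a node argument). The helper name find_divs_by_id() is hypothetical.

```php
<?php
// Hypothetical helper: returns the markup of every <div> whose id attribute
// equals $id, as a plain array of strings.
function find_divs_by_id($html, $id)
{
    // The id is pasted into the XPath expression below, so this sketch
    // simply refuses ids that contain a double quote.
    if ($html === '' || strpos($id, '"') !== false) {
        return array();
    }

    // Keep libxml warnings about sloppy markup from being printed
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Query the parsed document for matching div elements
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');

    $blocks = array();
    foreach ($nodes as $node) {
        $blocks[] = $dom->saveHTML($node);
    }

    return $blocks;
}
```

The XPath expression can be changed to match on a class or any other attribute, which is the main advantage over getElementById().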

I've used Simple HTML DOM (a library for PHP) to parse HTML before and it works great.
I wrote an article on how I used it to scrape artist tour dates off of MySpace. Hope this helps.


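In Simple HTML DOM you could do:
include 'simple_html_dom.php'; // path to the library file, adjust as needed
$dom_obj = new simple_html_dom();
$dom_obj->load_file('someremotefile'); // file name taken from the example earlier in the thread

// Where some-id is the id of the div element you want to find.
$return = $dom_obj->find('div[id=some-id]');

print_r($return); // an array of matching element objects (empty if nothing matches)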

