
05-04-2009, 03:13 PM

Yorick Peterse
[quote author="nubianxp" date="1241481200"]@yorick: thanks for the code, man, nice to have something to start with... One question: is it possible to parse a remote file? E.g.

Code:
$remote = 'http://example.com/data.html';
$DOM = new DOMDocument();
$DOM->loadHTMLFile($remote);
$content = $DOM->getElementById('somediv');

and then do either

Code:
echo $content;
// OR
print_r($content);

I tested the above, but I get some parsing errors when using $DOM->loadHTMLFile():

Code:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: error parsing attribute name in http://example.com/forecast.html, line: 19 in E:\localweb\projects\test\getfile.php on line 5

and a blank page if I use $DOM->loadHTML()... anyway, appreciate the help. :lol:

@dam1an: 5,344?! and all of those are built-in?! :wow:[/quote]

I'm not really sure whether you can parse a remote file directly using the DOM. However, you could use file_get_contents() to fetch the file's contents and then parse them afterwards.

It would look like the following:

Code:
<?php

// Get the content from the remote file
$remote = file_get_contents('your_remote_file.html');

// Load the DOM
$DOM = new DOMDocument();

// $remote holds a string of markup, so use loadHTML();
// load() expects a file name and parses it as XML
$DOM->loadHTML($remote);

// Parse it
// .......
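For what it's worth, DOMDocument::loadHTMLFile() can usually read a URL directly as long as allow_url_fopen is enabled, and the "error parsing attribute name" warning in the question is libxml complaining about malformed markup in the page rather than a failure to fetch it. A minimal sketch under those assumptions (PHP 7.1+ for the type declarations; the helper name fetchElementHtml is hypothetical): it collects libxml's parse errors instead of printing them, then serializes the matching element with saveHTML(), since a DOMElement can't simply be echoed.

```php
<?php

/**
 * Hypothetical helper: returns the outer HTML of the element with the given
 * id, or null if the page can't be loaded or the element is absent.
 * $source is a URL or local path; loading a URL assumes allow_url_fopen is on.
 */
function fetchElementHtml(string $source, string $id): ?string
{
    // Collect libxml parse errors (e.g. malformed attributes) instead of
    // emitting a PHP warning for each one
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTMLFile($source);

    // Discard the collected errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // In documents parsed as HTML, "id" attributes are treated as IDs
    $element = $dom->getElementById($id);
    if ($element === null) {
        return null;
    }

    // Echoing a DOMElement fails; saveHTML($node) serializes just that node
    return $dom->saveHTML($element);
}

// Usage with the URL and id from the question:
// echo fetchElementHtml('http://example.com/data.html', 'somediv');
```

If allow_url_fopen is disabled, the page would have to be fetched some other way (cURL, for instance) and the resulting string passed to loadHTML() instead.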

05-04-2009, 07:19 PM

TheFuzzy0ne
[quote author="Dam1an" date="1241453054"]Don't worry, it doesn't make you a n00b lol
With 5344 documented PHP functions, no one is expected to know them all

(and no, I didn't count them all manually :P)[/quote]

I just counted them and it's 5343.

Just kidding. :D

05-27-2009, 09:38 AM

Sen Hu
Good question. Excellent responses so far.

Here is an alternative. You want to extract block-i-need-to-get from

Quote:
...bunch of markup...
<div id="block-i-need-to-get">
<div>some content</div>
<div>some more content</div>
</div>
...more markup...

in file "someremotefile"? (BTW, thanks for posting your requirements so accurately.)

Here is a small script.

Code:
var str content ; cat "someremotefile" > $content
stex -c "^<div id=\"^]" $content > null
stex -c "[^\"^" $content > null
# What's remaining in $content is block-i-need-to-get. Print it.
echo $content

This script is written in biterscripting (http://www.biterscripting.com/install.html). You can translate it into PHP.
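A rough PHP translation of the same string-slicing idea, offered as a minimal sketch rather than a line-by-line port of the biterscripting commands: it finds the opening `<div id="...">` by plain string search, like the script does, and then counts nested `<div` / `</div>` tags so the whole block is returned up to its matching close. It assumes lowercase, well-formed div tags with `id` as the first attribute and no `<div` inside comments or scripts; the function name sliceDivById is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns the full <div id="..."> ... </div> block
 * from $html, or null if it isn't found or the divs are unbalanced.
 * Assumes lowercase, well-formed div tags with id as the first attribute.
 */
function sliceDivById(string $html, string $id): ?string
{
    $start = strpos($html, '<div id="' . $id . '"');
    if ($start === false) {
        return null;
    }

    $depth = 0;
    $pos = $start;
    while (true) {
        $nextOpen = strpos($html, '<div', $pos);
        $nextClose = strpos($html, '</div>', $pos);
        if ($nextClose === false) {
            return null; // ran out of closing tags: unbalanced markup
        }
        if ($nextOpen !== false && $nextOpen < $nextClose) {
            // An opening div comes first: go one level deeper
            $depth++;
            $pos = $nextOpen + strlen('<div');
        } else {
            // A closing div comes first: come back up one level
            $depth--;
            $pos = $nextClose + strlen('</div>');
            if ($depth === 0) {
                // Back at the outer div's level: slice out the whole block
                return substr($html, $start, $pos - $start);
            }
        }
    }
}

// Usage with the file name from the post:
// echo sliceDivById(file_get_contents('someremotefile'), 'block-i-need-to-get');
```

Counting nesting depth is what keeps the inner `</div>` tags from ending the match early; a parser such as DOMDocument avoids these assumptions entirely.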

Sen

05-27-2009, 09:57 AM

Evil Wizard
[quote author="Dam1an" date="1241481571"]@nubianxp: Yeah, they're all built-in PHP functions, and then there are another million user-created functions :P

PHP has grown to be huge!!![/quote]

DOMDocument and DOMXPath are not built-in functions of PHP; they are classes from the DOM extension. You do need to make sure your installation has been compiled with the appropriate libraries (libxml), but as of PHP 5 the extension is enabled by default.

Use DOMDocument to parse the HTML file, and then use DOMXPath to get at the elements.
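A minimal DOMDocument + DOMXPath sketch of that approach, using the block-i-need-to-get markup from earlier in the thread as an assumed target. It parses from a string (fetch it with file_get_contents(), or call loadHTMLFile() on the URL instead) and selects the child divs with an XPath expression; the helper name childDivTexts is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns the text of each <div> directly inside
 * the div with the given id. Assumes the id contains no double quote.
 */
function childDivTexts(string $html, string $id): array
{
    // Keep libxml quiet about sloppy real-world markup
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select the <div> children of the <div> whose id attribute matches $id
    $nodes = $xpath->query('//div[@id="' . $id . '"]/div');

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Usage, with the URL from the question as a placeholder:
// print_r(childDivTexts(file_get_contents('http://example.com/data.html'), 'block-i-need-to-get'));
```

XPath makes it easy to go beyond lookup by id, e.g. `//div[@class="event"]` or `//table//tr[2]`, which getElementById() alone can't express.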

05-27-2009, 10:01 AM

tekhneek
I've used Simple HTML DOM (a library for PHP) to parse HTML before and it works great.
I wrote an article on how I used it to scrape artist tour dates off MySpace. Hope this helps.

http://www.crainbandy.com/programming/us...ff-myspace
http://simplehtmldom.sourceforge.net/

05-27-2009, 10:15 AM

tekhneek
In Simple HTML DOM you could do something like:

Code:
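// Simple HTML DOM ships as a single file, simple_html_dom.php
include 'simple_html_dom.php';

$dom_obj = new simple_html_dom();
$dom_obj->load_file('http://example.com/data.html');

/**
 * 'div[id=somediv]' matches <div> elements whose id is "somediv";
 * a bare 'div[id]' would match every div that has any id attribute.
 */
$return = $dom_obj->find('div[id=somediv]');

// find() returns an array of element objects; echo each element's markup
foreach ($return as $element) {
    echo $element->outertext;
}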