How do you parse a file and pull a block of markup?

#11
[eluser]Yorick Peterse[/eluser]
[quote author="nubianxp" date="1241481200"]@yorick: thanks for the code man, nice to have something to start with... one question, is it possible to maybe parse a remote file? e.g.
Code:
$remote = 'http://example.com/data.html';
$DOM = new DOMDocument();
$DOM->load($remote);
$content = $DOM->getElementById('somediv');
and do either
Code:
echo $content; OR
print_r($content);

i tested the above, but i get some parsing errors when using $dom->load():
Code:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: error parsing attribute name in http://example.com/forecast.html, line: 19 in E:\localweb\projects\test\getfile.php on line 5

and a blank page if i use $dom->loadHTML()... anyway, appreciate the help. :lol:

@dam1an: 5,344?! and all of those are built-in?! :wow:[/quote]

I'm not really sure if you can parse a remote file directly using the DOM. However, you could use file_get_contents() to fetch whatever is in that file and then parse it afterwards.

It would look like the following:

Code:
<?php
// Get the content from the remote file
$remote = file_get_contents('your_remote_file.html');

// Load the DOM
$DOM = new DOMDocument();

// $remote now holds a string of markup, so use loadHTML(); load() expects a path to an XML file
$DOM->loadHTML($remote);

// Parse it
.......
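
To fill in the "parse it" step, here is a minimal sketch of how it might look. It assumes the markup is ordinary HTML and uses an inline sample string in place of the result of file_get_contents(), so the ids and content are made up for illustration. libxml_use_internal_errors() is there because sloppy real-world markup tends to produce warnings like the one quoted above.

```php
<?php
// Stand-in for the markup you would get back from file_get_contents($remote).
// The ids and text here are invented for the example.
$html = <<<HTML
<html><body>
<div id="header">ignore me</div>
<div id="somediv"><div>some content</div><div>some more content</div></div>
</body></html>
HTML;

// Collect parser complaints internally instead of printing them as warnings.
libxml_use_internal_errors(true);

$DOM = new DOMDocument();
$DOM->loadHTML($html); // loadHTML() takes a string of markup
libxml_clear_errors();

$content = $DOM->getElementById('somediv');

// $content is a DOMElement (or null if nothing matched), so it cannot be
// echoed directly; serialise the node back to markup first.
$block = $content !== null ? $DOM->saveHTML($content) : '';
echo $block, "\n";
```

Passing the node to saveHTML() gives you just that element and its children rather than the whole document.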

#12
[eluser]TheFuzzy0ne[/eluser]
[quote author="Dam1an" date="1241453054"]Don't worry, it doesn't make you a n00b lol
With 5344 documented PHP functions, no one is expected to know them all

(and no, I didn't count them all manually :P)[/quote]

I just counted them and it's 5343.

Just kidding. :D

#13
[eluser]Sen Hu[/eluser]
Good question. Excellent responses so far.

Here is an alternative. You want to extract block-i-need-to-get from

Quote:
...bunch of markup...
<div id="block-i-need-to-get">
<div>some content</div>
<div>some more content</div>
</div>
...more bunch of markups...

in file "someremotefile"? (BTW: Thanks for accurately posting your requirements.)

Here is a small script.

Code:
var str content ; cat "someremotefile" > $content
stex -c "^<div id=\"^]" $content > null
stex -c "[^\"^" $content > null
# What's remaining in $content is block-i-need-to-get . Print it.
echo $content

This script is written in biterscripting (http://www.biterscripting.com/install.html). You can translate it to PHP.

Sen

#14
[eluser]Evil Wizard[/eluser]
[quote author="Dam1an" date="1241481571"]@nubianxp: Yeah they're all the built in PHP functions, then there's another million extra user created functions :P
PHP has grown to be huge!!![/quote]

DOMDocument and DOMXPath are not built-in functions of PHP; they are classes from the DOM extension, so you do need to make sure your installation has been compiled with the appropriate libraries (libxml). As of PHP 5, though, that is pretty much standard.

Use DOMDocument to parse the HTML file, and then you can use DOMXPath to get at the elements.
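
Something along these lines, as a rough sketch. The sample markup is inlined so it runs on its own, and the id is borrowed from the example earlier in the thread; swap in your own file contents and id. The extension_loaded() check is only there to show how you could confirm the DOM extension is present.

```php
<?php
// The DOM classes come from the "dom" extension; stop early if it is missing.
if (!extension_loaded('dom')) {
    exit("The DOM extension is not available in this PHP build.\n");
}

// Sample markup standing in for the real file contents.
$html = '<html><body><p>intro</p>'
      . '<div id="block-i-need-to-get"><div>some content</div><div>some more content</div></div>'
      . '<p>outro</p></body></html>';

// Keep parser warnings from being printed for imperfect markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Query for the div by its id attribute.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="block-i-need-to-get"]');

// Serialise every matching node (there should be one) back to markup.
$block = '';
foreach ($nodes as $node) {
    $block .= $doc->saveHTML($node);
}
echo $block, "\n";
```

XPath is handy when you need more than an id lookup, for example matching on a class or on position within the parent.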

#15
[eluser]tekhneek[/eluser]
I've used Simple HTML DOM (a library for PHP) to parse HTML before and it works great.
I wrote an article on how I used it to scrape artist tour dates off MySpace. Hope this helps.

http://www.crainbandy.com/programming/us...ff-myspace
http://simplehtmldom.sourceforge.net/

#16
[eluser]tekhneek[/eluser]
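In Simple HTML DOM you could do
Code:
// Assumes simple_html_dom.php (from the SourceForge link in my previous post) is on your include path.
include 'simple_html_dom.php';

$dom_obj = new simple_html_dom();
$dom_obj->load_file('your_remote_file.html'); // placeholder name; load the markup before searching

/**
* Where some-id is the id of the div element you want to find.
*/
$return = $dom_obj->find('div[id=some-id]');

print_r($return); // an array of matching element objects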

