regular expression help

#1
megabyte
I have a ton of static link pages that I want to put into a database, and I'm trying to figure out the regular expressions to do it.

Each link has three parts: the title, the URL, and the description.

Every link is formatted in the page like this:

Code:
<div align="Left"> <b> <font face="Verdana" size="3"> <a href="http://www.air-quality-eng.com" target="new"> Air
Cleaners-Air Quality Engineering </a> </font> </b> <font face="Verdana" size="1">
<!--;;lineage;;-->
</font>
<p style="margin-left: 0; margin-top: 0; margin-bottom: 10"> <font color="#000000" face="Verdana" size="2"> Manufactures
and distributes air cleaners and filtration systems for commercial, home and
industrial use.</font> </p>
</div>

So I'm supposing I need preg_match_all, but I'm having no luck.

Can someone who is a regex wizard give me a hand? It would save me time I don't really have.
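For comparison, here is a minimal sketch of a different technique from the regex approach discussed in this thread: parsing the markup with PHP's DOM extension (DOMDocument and DOMXPath). It assumes the dom extension is enabled and that every link block looks like the sample above, i.e. a <div align="Left"> holding one <a href> for the title and URL and one <p> for the description. The $links array layout is only an illustration.

```php
<?php
// Sample markup from the post above (a single link block).
$html = <<<'HTML'
<div align="Left"> <b> <font face="Verdana" size="3"> <a href="http://www.air-quality-eng.com" target="new"> Air
Cleaners-Air Quality Engineering </a> </font> </b> <font face="Verdana" size="1">
<!--;;lineage;;-->
</font>
<p style="margin-left: 0; margin-top: 0; margin-bottom: 10"> <font color="#000000" face="Verdana" size="2"> Manufactures
and distributes air cleaners and filtration systems for commercial, home and
industrial use.</font> </p>
</div>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings about the legacy markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collapse runs of whitespace (including line breaks) into single spaces.
$clean = fn(string $s): string => trim(preg_replace('/\s+/', ' ', $s));

$links = [];
// Assumption: one <a href> and one <p> per <div align="Left"> block.
foreach ($xpath->query('//div[@align="Left"]') as $div) {
    $a = $xpath->query('.//a[@href]', $div)->item(0);
    $p = $xpath->query('.//p', $div)->item(0);
    if ($a === null || $p === null) {
        continue; // skip blocks that don't follow the expected layout
    }
    $links[] = [
        'url'         => $a->getAttribute('href'),
        'title'       => $clean($a->textContent),
        'description' => $clean($p->textContent),
    ];
}
print_r($links);
```

A DOM parser copes with line breaks, attribute order and extra whitespace inside tags without any pattern tweaking, which is exactly where the regex attempts later in the thread keep tripping up.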

#2
Crafter
How about
Code:
// single quotes, so the " inside the pattern need no escaping
preg_match_all('/<a href="(.*?)"/', $html, $matches);
print_r($matches[1]); // the captured URLs

#3
Flayra
To get the link, title and description, try something like this:
Code:
$pattern = "/<a href=\"(.*?)\".*?&gt;(.*?)</a>.*?<p.*?&gt;(.*?)</font>/i";
$html = str_replace("\r\n","", $html);
preg_match_all($pattern, $html, $matches);
print_r($matches);
I'm a bit rusty and couldn't remember how to make . match across line breaks (it doesn't by default), so I just removed all the line endings and made every quantifier non-greedy.

It should work, but I have only tested it on text in the e editor.
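A minimal sketch of the alternative to stripping line endings: PCRE's s (dotall) modifier makes . match newlines too. The two-line sample string is made up for illustration. It also shows that str_replace("\r\n", "", ...) leaves a bare "\n" untouched, so a file saved with Unix line endings still defeats a pattern without s.

```php
<?php
// Made-up sample: a description that spans two lines, as in the real page.
$html = "<p><font size=\"2\"> Manufactures\nand distributes air cleaners.</font></p>";

// Without s, . stops at the newline, so the lazy group never reaches </font>.
$without = preg_match('!<font size="2">(.*?)</font>!', $html);

// Removing only "\r\n" pairs does not help when the line ending is a bare "\n".
$stripped = str_replace("\r\n", "", $html);
$afterStrip = preg_match('!<font size="2">(.*?)</font>!', $stripped);

// With the s (dotall) modifier, . also matches newlines.
$with = preg_match('!<font size="2">(.*?)</font>!s', $html, $m2);

var_dump($without, $afterStrip, $with); // prints int(0), int(0), int(1)
echo trim($m2[1]), "\n";
```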

#4
megabyte
Thanks to both of you.

Flayra, I'm getting this error from your code:

Warning: preg_match_all(): Unknown modifier 'a'

#5
sophistry
The pattern has to be changed from this:
Code:
$pattern = "/<a href=\"(.*?)\".*?&gt;(.*?)</a>.*?<p.*?&gt;(.*?)</font>/i";
to this:
Code:
$pattern = "!<a href=\"(.*?)\".*?&gt;(.*?)</a>.*?<p.*?&gt;(.*?)</font>!i";

I replaced the forward-slash pattern delimiter with !. The delimiter can be any non-alphanumeric, non-backslash, non-whitespace character, but with a forward slash the PCRE parser thought the pattern ended at the slash inside the embedded </a> tag, and then treated the trailing a as an (unknown) modifier.

Cheers.
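A small sketch of the usual ways around that delimiter clash, run against a shortened, made-up version of the sample link: pick a delimiter that does not occur in the pattern, escape the literal slash, or let preg_quote() do the escaping for a given delimiter.

```php
<?php
$html = '<a href="http://www.air-quality-eng.com" target="new"> Air Cleaners </a>';

// Option 1: use a delimiter that does not appear in the pattern.
$p1 = '!<a href="(.*?)".*?>(.*?)</a>!i';

// Option 2: keep "/" as the delimiter but escape every literal slash.
$p2 = '/<a href="(.*?)".*?>(.*?)<\/a>/i';

// Option 3: let preg_quote() escape a literal fragment for the "/" delimiter.
$p3 = '/<a href="(.*?)".*?>(.*?)' . preg_quote('</a>', '/') . '/i';

foreach ([$p1, $p2, $p3] as $p) {
    preg_match($p, $html, $m);
    echo $m[1], ' | ', trim($m[2]), "\n";
}
```

All three patterns should print the same URL and title; option 1 is usually the most readable when the pattern contains HTML closing tags.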

#6
megabyte
Well, they both only get the links and not the descriptions. Plus, you have to remove the </font> tag from the pattern or it doesn't grab anything.
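One likely reason the description is missed (an assumption, since the real page source isn't shown): the description spans several lines, and if the file uses bare "\n" line endings, str_replace("\r\n", "", $html) removes nothing, so a . without the s modifier never reaches </font>. Below is a minimal sketch using the sample markup from the first post, with s added and the description anchored on the <p> and <font> tags. The $links layout and the whitespace cleanup are illustrative choices, not part of the code posted above.

```php
<?php
// Sample markup copied from the first post (one link block).
$html = <<<'HTML'
<div align="Left"> <b> <font face="Verdana" size="3"> <a href="http://www.air-quality-eng.com" target="new"> Air
Cleaners-Air Quality Engineering </a> </font> </b> <font face="Verdana" size="1">
<!--;;lineage;;-->
</font>
<p style="margin-left: 0; margin-top: 0; margin-bottom: 10"> <font color="#000000" face="Verdana" size="2"> Manufactures
and distributes air cleaners and filtration systems for commercial, home and
industrial use.</font> </p>
</div>
HTML;

// ! as delimiter, so </a> and </font> need no escaping.
// s lets . cross line breaks; i ignores the case of tag names.
$pattern = '!<a href="(.*?)".*?>(.*?)</a>.*?<p[^>]*>\s*<font[^>]*>(.*?)</font>!si';

// PREG_SET_ORDER groups each link's captures together: $m[1], $m[2], $m[3].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Collapse runs of whitespace (including line breaks) into single spaces.
$clean = fn(string $s): string => trim(preg_replace('/\s+/', ' ', $s));

$links = [];
foreach ($matches as $m) {
    $links[] = [
        'url'         => $m[1],
        'title'       => $clean($m[2]),
        'description' => $clean($m[3]),
    ];
}
print_r($links);
```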

I think I have a better understanding though, and will work with what you guys have helped me with.

I appreciate it and will keep you posted.

This forum rocks for help.

#7
megabyte
I'm looking for help getting the link description that sits inside a font tag. There are many links per page, so I'm doing the following. It obviously doesn't work: all I get is the very last match found, not all of them. Can someone please help?

Code:
//html to search for:
//<font color="#000000" face="Verdana" size="2"> link descriptions go here</font>
<?php
// no & before $matches: preg_match_all already takes it by reference
preg_match_all("|size=\"2\">(.*?)</font>|", $var, $matches);

$matches = $matches[0];
$list = array();

foreach ($matches as $var)
{
    print($var . "<br>");
}
?>

There are other font tags in the pages, so I need to match just this specific one.

I'm having so many issues wrapping my head around this stuff.
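A minimal sketch of one way to fix the code above, assuming the description tags always carry exactly the attributes shown in the comment. $matches[0] holds the whole matched tags, while the text captured by (.*?) lands in $matches[1]; the s modifier lets a description span line breaks; and spelling out all three attributes keeps other size="2" font tags from matching. The $var fragment is made up for illustration.

```php
<?php
// Made-up page fragment: two descriptions plus an unrelated size="2" font tag.
$var = '<font color="#000000" face="Verdana" size="2"> First
description.</font>
<font face="Arial" size="2">Not a description</font>
<font color="#000000" face="Verdana" size="2"> Second description.</font>';

// Match only the description tag by spelling out all three attributes.
// s lets (.*?) span line breaks; the lazy quantifier stops at the first </font>.
$pattern = '!<font color="#000000" face="Verdana" size="2">(.*?)</font>!s';

// preg_match_all fills $matches by reference itself; writing &$matches at the
// call site is rejected since PHP 5.4. It returns the number of matches.
$count = preg_match_all($pattern, $var, $matches);

// $matches[0] holds the whole matched tags, $matches[1] only the captured text.
foreach ($matches[1] as $description) {
    print(trim($description) . "<br>\n");
}
```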
