php grab http or www links from text and return full url
#11

[eluser]new_igniter[/eluser]
OK, I have been working hard on this, but I'm down to guessing now and would love your help to finish it.

This is the scenario. I have a text blog, and am looking through it for things that start with http or www and then grabbing the matches. I can get it to return the domain, but I need the full URL, i.e. everything after the .com, .net, etc. Is there any way to adjust this regex so that it looks for anything that starts with http or www and grabs everything up to the next space?

Code:
if(preg_match("/(http:\/\/)((www\.)|([a-z0-9A-Z]\.)+)?(([a-z0-9A-Z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(:[0-9]+)?)){2,}\.([a-zA-Z]){2,3}\/?/",$text,$hmatches))
{
    echo $hmatches[0];
}
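
For reference, the behaviour asked for here (start at http or www, keep going until the next space) can be sketched with `\S+`, which matches a run of non-whitespace characters. This is only a minimal sketch: the function name `grab_prefixed_urls` and the sample sentence are made up for illustration, and accepting https as well as http is an assumption.

```php
<?php
// Hypothetical helper (not from the thread): returns every run of
// non-whitespace characters that starts with http://, https:// or www.
function grab_prefixed_urls(string $text): array
{
    // \b        : start at a word boundary, so "awww.x" is not picked up
    // https?:// : an explicit scheme, or
    // www\.     : a bare www. prefix
    // \S+       : then everything up to the next whitespace character
    preg_match_all('~\b(?:https?://|www\.)\S+~i', $text, $matches);

    // $matches[0] holds the full matches (an empty array if none).
    return $matches[0];
}

print_r(grab_prefixed_urls('See http://example.com/a?b=1 and www.example.org/page now'));
```

Using `~` as the delimiter avoids having to escape every `/` in the pattern. Note that punctuation glued to the end of a link, such as a closing comma, would be captured too.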
#12

[eluser]Randy Casburn[/eluser]
Yes,

Try this one...

Code:
((http|https|mailto)+://\3)|(([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?)

The alternation up front says: "find a match with one of these (http, https or mailto) followed by ://" plus a URL; if you don't find that, find just a URL. I still don't think this is perfect, but we're getting closer.

Randy
#13

[eluser]Randy Casburn[/eluser]
So here's the improvement...

Code:
$text = "This www.firsttest.com?id=123456789 is a test of the JavaScript http://www.testdomain.com?id=3254234 RegExp mydomain.com/text/test?id=444444444  object";
  
  if(preg_match_all("/((http|https|mailto)+:\/\/\\3)|(([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?)/",$text,$hmatches))
    {
        echo '<pre>';
        print_r($hmatches[0]);
    }

Note I changed this to preg_match_all since I have three URLs in the string.

the output...
Code:
Array
        (
            [0] => This
            [1] => www.firsttest.com?id=123456789
            [2] => is
            [3] => a
            [4] => test
            [5] => of
            [6] => the
            [7] => JavaScript
            [8] => http://www.testdomain.com?id=3254234
            [9] => RegExp
            [10] => mydomain.com/text/test?id=444444444
            [11] => object
        )
So now we need to limit the results to the full strings in the match that include at least one period (.). Once that's figured out, you'll have your result: you can search an entire block of text and return every URL in one giant array.
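
One possible way to do that last step, as a rough sketch rather than a finished answer: split the text on whitespace and keep only the tokens that have a period between two word characters. This swaps the big pattern above for a plain split-and-filter; the function name `tokens_with_period` is made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): keeps only the
// whitespace-separated tokens that look like they contain a domain.
function tokens_with_period(string $text): array
{
    // Split on runs of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep tokens with a period between two word characters, so a
    // sentence-ending "test." is not mistaken for a URL.
    $urls = array_filter($tokens, function ($token) {
        return preg_match('/\w\.\w/', $token) === 1;
    });

    // Re-index from 0 so the result is one plain list.
    return array_values($urls);
}

$text = "This www.firsttest.com?id=123456789 is a test of the JavaScript http://www.testdomain.com?id=3254234 RegExp mydomain.com/text/test?id=444444444  object";
print_r(tokens_with_period($text));
```

For the sample string this should leave just the three URLs. It is deliberately loose: anything with an embedded period, such as 3.14 or file.txt, would get through as well, so a stricter check may still be needed.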

Randy
#14

[eluser]new_igniter[/eluser]
thanks so much for your help, Randy!



