Check a URL is valid
#1

[eluser]TheFuzzy0ne[/eluser]
Hi everyone. Thought I'd share this function as I've found it quite useful. I've extended the URL helper with a function that actually requests the URL. If the page exists, it returns TRUE; if it doesn't (due to a 404 or any other kind of error), it returns FALSE.

./system/application/helpers/MY_url_helper.php
Code:
<?php

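function url_is_working($url=FALSE)
{
    if ( ! $url) { return FALSE; }
    
    // Add http:// if the scheme is missing (CodeIgniter URL helper).
    $url = prep_url($url);
    
    // Request the URL; get_headers() returns FALSE if the request fails.
    $header_arr = get_headers($url);
    
    if ( ! is_array($header_arr)) { return FALSE; }
    
    // TRUE only if one of the header lines is exactly 'HTTP/1.0 200 OK'.
    return in_array('HTTP/1.0 200 OK', $header_arr);
}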

// End of file: MY_url_helper.php
// Location: ./system/application/helpers/MY_url_helper.php

Just call it using the URL as a parameter and if the link is good, TRUE is returned.

I tend to use it with the form validation library, to check if a URL is indeed valid.

Hope this helps someone.

EDIT: I'd be interested to know if this works with PHP Safe Mode enabled.
#2

[eluser]tonydewan[/eluser]
This is nice. It looks like you have a typo, though.

Code:
return in_array('HTTP/1.0 200 OK'. $header_arr);

I think this should be:

Code:
return in_array('HTTP/1.0 200 OK', $header_arr);
#3

[eluser]TheFuzzy0ne[/eluser]
You're right. Well spotted. Thanks.
#4

[eluser]sophistry[/eluser]
nice, thanks.

a few things:

1) this is PHP5 only - get_headers() is not available for PHP4 but this php.net manual page has some ideas for writing your own: http://www.php.net/manual/en/function.get-headers.php

2) could tighten the check to only look at the 0 index rather than the whole array - that's where the response code is.

3) check only for the response code rather than the full HTTP/1.0 string, because not all servers answer with the same protocol version (some send HTTP/1.1), but any working page should return 200.

4) the code thinks google.com is not a valid URL because it returns a 301 redirect code. you only get a 200 if you use the www. hostname.
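
For point 3, a minimal sketch of a version-agnostic check could look like the following. The function name status_code_from_line() is hypothetical (it is not part of CodeIgniter or of the helper posted above), and it only parses a status line string, so it makes no network request.

```php
<?php

/**
 * Extract the numeric status code from an HTTP status line.
 * Hypothetical helper, not part of CodeIgniter or the original post.
 * Returns the code as an integer, or FALSE if the line is not a status line.
 */
function status_code_from_line($line)
{
    // Matches 'HTTP/1.0 200 OK', 'HTTP/1.1 404 Not Found' and similar,
    // whatever the protocol version happens to be.
    if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $matches))
    {
        return (int) $matches[1];
    }

    return FALSE;
}
```

Comparing the returned integer against 200 avoids depending on whether the server answered with HTTP/1.0 or HTTP/1.1.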

i've shown the response headers below running php from the CLI
Code:
php -r 'print_r(get_headers("http://google.com"));'
Array
(
    [0] => HTTP/1.0 301 Moved Permanently
    [1] => Location: http://www.google.com/
    [2] => Content-Type: text/html; charset=UTF-8
    [3] => Date: Mon, 16 Feb 2009 22:25:06 GMT
    [4] => Expires: Wed, 18 Mar 2009 22:25:06 GMT
    [5] => Cache-Control: public, max-age=2592000
    [6] => Server: gws
    [7] => Content-Length: 219
    [8] => Connection: Close
    [9] => HTTP/1.0 200 OK
    [10] => Cache-Control: private, max-age=0
    [11] => Date: Mon, 16 Feb 2009 22:25:07 GMT
    [12] => Expires: -1
    [13] => Content-Type: text/html; charset=ISO-8859-1
    [14] => Set-Cookie: PREF=ID=:TM=:LM=:S=; expires=Wed, 16-Feb-2011 22:25:07 GMT; path=/; domain=.google.com
    [15] => Server: gws
)

these issues may or may not be important to your application...

cheers.

EDIT: i just realized the array from google.com does contain the 200 code, but later on in the array. i suppose it is doing redirects... and you can scratch #2 from the list above. ;-)
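
Since the array can hold one status line per response when redirects are followed, one possible approach is to take the last status line rather than index 0. The sketch below assumes a header array shaped like the get_headers() output above; final_status_code() and $sample are hypothetical names used only for illustration.

```php
<?php

/**
 * Return the status code of the last response in a get_headers() style array.
 * Hypothetical helper: each redirect adds another status line to the array,
 * so the last status line found belongs to the final page.
 * Returns an integer code, or FALSE if no status line is present.
 */
function final_status_code($headers)
{
    if ( ! is_array($headers)) { return FALSE; }

    $code = FALSE;

    foreach ($headers as $line)
    {
        // Later status lines overwrite earlier ones (301, then 200, ...).
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m))
        {
            $code = (int) $m[1];
        }
    }

    return $code;
}

// Trimmed copy of the google.com headers shown above.
$sample = array(
    'HTTP/1.0 301 Moved Permanently',
    'Location: http://www.google.com/',
    'HTTP/1.0 200 OK',
    'Server: gws',
);
```

With something like this, the helper could test final_status_code(get_headers($url)) === 200 instead of searching for an exact 'HTTP/1.0 200 OK' string, though whether a redirect should count as "working" depends on the application.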



