08-29-2008, 09:58 AM

[eluser]sophistry[/eluser]
ha ha! wow, thanks for the link to the "impenetrable RFC 2822 email address parser." so, that's how you'd do it if you wanted to be completely impractical and gosh-darned complex! ;-)

the set of atext "visible chars" i suggested above could become the CI-approved email address standard which would cover 99% of email addresses anyone would ever really use. the gargantuan email address parser is really dealing mostly with edge cases and quoted literals (which would be a real bugbear in an autolink function).

maybe the accepted email chars should go into a config setting like the permitted uri setting? in fact, shouldn't the whole "email detector" regex go into its own function to make it more adaptable/extendable?
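a rough sketch of what that could look like, assuming a made-up config key (`permitted_email_chars`) and a made-up helper name (`detect_emails`); neither exists in CodeIgniter, and the pattern is a simplified stand-in rather than the current detector:

```php
<?php
// Stand-in for a config item; 'permitted_email_chars' is a made-up name,
// not an existing CodeIgniter setting. Written as the body of a regex
// character class, dot excluded so the local part cannot start with one.
$config = array();
$config['permitted_email_chars'] = '-a-z0-9#!$%&*+/\'=?^_`{}|~';

// Hypothetical helper, not part of CodeIgniter: finds email-like strings
// in $str and returns the same three keys for every hit.
function detect_emails($str, $local_chars)
{
    // Simplified pattern (an assumption, not the current detector):
    // group 1 = local part, group 2 = everything after the @.
    $pattern = ';([' . $local_chars . '][' . $local_chars . '.]*)'
             . '@((?:[-a-z0-9]+\.)+[a-z0-9][-a-z0-9]*);i';

    preg_match_all($pattern, $str, $sets, PREG_SET_ORDER);

    $found = array();
    foreach ($sets as $m)
    {
        $found[] = array('address' => $m[0], 'local' => $m[1], 'domain' => $m[2]);
    }
    return $found;
}

$text = 'write to john.doe@example.com or a+b@mail.example.org today';
print_r(detect_emails($text, $config['permitted_email_chars']));
```

because the character set is passed in, an autolink function could be pointed at a stricter or looser set without touching the regex itself.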

BTW, there is another problem with the current email detector: when the address has a hostname in front of the domain ( e.g., [email protected] ), it puts the hostname in $matches[2] and the domain and tld in $matches[3]. but when the address has no hostname, $matches[2] holds the domain alone and $matches[3] holds the tld alone.

i know you can easily explode('@',$matches[0]); to get the data, but the function should standardize what it captures in the sub-patterns so that $matches always contains the same set of captured data.

here's some new test code that standardizes $matches:

Code:

<?php

class Test extends Controller {

    function Test()
    {
        parent::Controller();
    }

    function index()
    {
        // candidate "atext" set, written as the body of a regex character class
        $chars = '-.a-zA-Z0-9#!$%&*+/\'=?^_`{}|~';
        $len = strlen($chars);
        $i = 0;
        print_r($chars); echo '<br>';

        // sanity check: each character of the set should match the class built from it
        while ($i < $len)
        {
            preg_match(":[$chars]:", $chars[$i], $matches);
            $i++;
            print_r($matches[0]);
        }

        // test on "real" addresses
        $strs = array(
                    "back|to=school~w0w.does+this^[email protected]",
                    "back{to}school-does+this^[email protected]",
                    'back{to}school#[email protected]',
                    '[email protected]',
                    '[email protected]',
                    '[email protected]',
                    '[email protected]',
                    '[email protected]',
                    'h#[email protected]',
                    '[email protected]',
                    'h$[email protected]',
                    'h%[email protected]',
                    'h&r@example.com',
                    'h*[email protected]',
                    '[email protected]',
                    'h/[email protected]',
                    "h'[email protected]",
                    '[email protected]',
                    '[email protected]',
                    'h^[email protected]',
                    '[email protected]',
                    'h`[email protected]',
                    'h{[email protected]',
                    'h}[email protected]',
                    'h|[email protected]',
                    '[email protected]',
                    '[email protected]',
                    );

        // same set minus the dot, so the local part cannot start with a dot
        $chars_not_dot = '-a-z0-9#!$%&*+/\'=?^_`{}|~';
        foreach ($strs as $s)
        {
            // $matches[1] = local part, $matches[2] = everything after the @
            // (hostname, domain and tld together)
            preg_match_all(";([$chars_not_dot][$chars_not_dot.]+)@((?:[-a-z0-9]+)\.(?:[-.a-z0-9]*));i", $s, $matches);
            // p() is not defined in this post; presumably a print_r-style debug helper
            p($matches);
        }
    }

}

/* End of file test.php */
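
to check the standardized captures outside a controller, something like the following could be used. it reuses the character set and pattern from the test code above; the helper name `split_address` and the three sample addresses are made up for illustration:

```php
<?php
// Local-part character set and pattern copied from the test controller above.
$chars_not_dot = '-a-z0-9#!$%&*+/\'=?^_`{}|~';
$pattern = ';([' . $chars_not_dot . '][' . $chars_not_dot . '.]+)'
         . '@((?:[-a-z0-9]+)\.(?:[-.a-z0-9]*));i';

// Hypothetical helper: returns array(local, domain) for the first address
// found in $str, or FALSE when the pattern finds nothing.
function split_address($str, $pattern)
{
    if ( ! preg_match($pattern, $str, $m))
    {
        return FALSE;
    }
    return array($m[1], $m[2]);
}

// Made-up addresses, one without and one with a hostname.
print_r(split_address('user@example.com', $pattern));
print_r(split_address('user@mail.example.com', $pattern));

// A single-character local part, to see how the "+" quantifier treats it.
var_dump(split_address('a@example.com', $pattern));
```

with or without a hostname, the local part lands in the first capture and everything after the @ in the second. one side effect of the `+` after the second character class: the local part needs at least two characters, so `a@example.com` is not matched. switching that `+` to `*` should accept it, if single-character local parts matter.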