preg_match, UTF-8 and accent characters
[eluser]Rob Gordijn[/eluser]
OK, I've got a problem that is driving me NUTS. I need to validate some input (not via CI), and accented characters (like é) are causing a MAJOR headache.

Here we go:
- htaccess: AddDefaultCharset UTF-8
- doctype: 4.01 Transitional
- html-head: Content-Type utf-8
- error_reporting: E_ALL, E_STRICT

I'm using this regular expression:
Code:
$regex = '/^[ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËéèêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ]*$/';

Make an educated guess at the output of the following code.
Code:
var_dump(preg_match($regex, 'é'));

Correct! So the é does match the regex.

Now we take a pretty standard form, fill it with é and submit it.
Code:
<?php

Yes, that is 'int(1)'. How cute, but it's WRONG! It actually prints 'int(0)', so the é does not match the regex.

So without posting it works fine, but when the value is posted things go bad. What is bugging me? Some sort of config in PHP? My web server? My code itself?

TIA.
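One way to narrow down a mismatch like this is to compare the raw bytes of the literal in the script with the bytes that arrive from the form. The sketch below assumes the difference is an encoding one (for example, UTF-8 on one side and ISO-8859-1 on the other), which the post itself does not confirm. `describe_bytes` and `is_valid_utf8` are hypothetical helper names, not part of the original code.

```php
<?php
// Hypothetical helper: report the length and hex bytes of a string.
function describe_bytes(string $s): string
{
    return strlen($s) . ' byte(s): ' . bin2hex($s);
}

// Hypothetical helper: an empty pattern with the `u` modifier only succeeds
// when the subject is well-formed UTF-8 (preg_match returns false otherwise).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8   = "\xC3\xA9"; // é encoded as UTF-8
$latin1 = "\xE9";     // é encoded as ISO-8859-1

echo describe_bytes($utf8), "\n";   // 2 byte(s): c3a9
echo describe_bytes($latin1), "\n"; // 1 byte(s): e9

var_dump(is_valid_utf8($utf8));     // bool(true)
var_dump(is_valid_utf8($latin1));   // bool(false)
```

Running the same helpers on the posted value (e.g. an entry from `$_POST`) and on the 'é' literal from the script would show whether both sides really carry the same bytes.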
[eluser]Rob Gordijn[/eluser]
OK, after some hours of searching, reading, testing and reading some more... I got it.

http://www.php.net/manual/en/regexp.refe...nicode.php

\p{L} does the job. My regex now looks like this:
Code:
$regex_text = "/^([\p{L}a-zA-Z0-9]*)$/i";

Thanks for reading.
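For input that is known to be UTF-8, a variant of the pattern above adds the `u` modifier, which makes PCRE treat both pattern and subject as UTF-8 so that \p{L} is tested against whole characters rather than single bytes. This is a minimal sketch and my own variation, not the pattern from the post: it drops `a-zA-Z` and `/i` because \p{L} already covers letters of either case, and `is_text` is a hypothetical helper name.

```php
<?php
// Variation on the thread's pattern (assumption: input is UTF-8).
// \p{L} = any Unicode letter; the `u` modifier enables UTF-8 mode.
$regex_text = '/^[\p{L}0-9]*$/u';

// Hypothetical helper: true only on a real match. preg_match returns 0 for
// "no match" and false on error (e.g. a subject that is not valid UTF-8).
function is_text(string $input, string $pattern): bool
{
    return preg_match($pattern, $input) === 1;
}

var_dump(is_text("\xC3\xA9", $regex_text));       // é as UTF-8: bool(true)
var_dump(is_text("Zo\xC3\xAB2010", $regex_text)); // Zoë2010:    bool(true)
var_dump(is_text("a b", $regex_text));            // has a space: bool(false)
var_dump(is_text("\xE9", $regex_text));           // not UTF-8:   bool(false)
```

With `u` set, a subject that is not well-formed UTF-8 is rejected outright, so a Latin-1 encoded é fails the check instead of being compared byte by byte.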
[eluser]Unknown[/eluser]
Thank you, I had to create an account to say that it did the trick for me.