preg_match, utf-8 and accent characters
#1

Rob Gordijn
OK, I've got a problem that is driving me NUTS.

I need to validate some input (not via CodeIgniter),
and accented characters (like é) are causing a MAJOR headache.

Here's my setup:

- .htaccess: AddDefaultCharset UTF-8
- doctype: HTML 4.01 Transitional
- HTML head: Content-Type meta tag with charset=utf-8
- error_reporting: E_ALL | E_STRICT

I'm using this regular expression:
Code:
$regex = '/^[ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËéèêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ]*$/';

Make an educated guess about the output of the following code.
Code:
var_dump(preg_match($regex, 'é'));
Yes, that's int(1).
Correct! So the é does match the regex.
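Why does the literal class match at all? Without the `u` modifier, PCRE treats both the pattern and the subject as plain byte strings. Assuming the script is saved as UTF-8, every accented letter in the class is stored as two bytes (é is 0xC3 0xA9), so the class is really a set of bytes, and the `*` lets those bytes match one at a time. A minimal sketch of the difference, with the bytes written as escapes so the file encoding does not matter:

```php
<?php
$e = "\xC3\xA9"; // é encoded as UTF-8: two bytes

var_dump(strlen($e)); // int(2): strlen() counts bytes, not characters

// Without /u the class [é] is the byte set {0xC3, 0xA9} and matches ONE byte,
// so a pattern expecting exactly one "character" fails on the two-byte subject.
var_dump(preg_match("/^[\xC3\xA9]$/", $e)); // int(0)

// With * the two bytes are matched one at a time, which is why the
// original accented-character pattern appears to work.
var_dump(preg_match("/^[\xC3\xA9]*$/", $e)); // int(1)

// With /u the pattern and subject are decoded as UTF-8, so [é] is one character.
var_dump(preg_match("/^[\xC3\xA9]$/u", $e)); // int(1)
```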

Now we take a pretty standard form, fill it with é, and submit it.
Code:
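<?php
// $regex is the accented-character pattern defined above
$string = isset($_POST['string']) ? $_POST['string'] : null;
?>
<form action="test.php" method="POST" accept-charset="utf-8">
<input type="text" name="string" value="<?php echo htmlspecialchars((string) $string, ENT_QUOTES, 'UTF-8'); ?>">
<input type="submit" value="gooo">
</form>
<?php
// Only test once the form has actually been submitted
if ($string !== null) {
    var_dump(preg_match($regex, $string));
}
?>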
And take another educated guess at the output.
Yes, that's int(1).
How cute, but it's WRONG! It actually prints int(0), so the é does not match the regex.

So without posting it works fine, but once the value is posted, it fails.
What is causing this? Some PHP setting? My web server? My code itself?
TIA.
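The thread never pins down the cause, but one common explanation (an assumption here, not confirmed by the poster) is an encoding mismatch: if test.php itself is saved as ISO-8859-1, the é in the class is the single byte 0xE9, while a browser that received the page as UTF-8 posts é back as 0xC3 0xA9. The literal test still passes because the pattern and the string come from the same file. The sketch below reproduces that situation and shows how to inspect what actually arrived; `is_valid_utf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: preg_match() with the u modifier returns false
// when the subject is not valid UTF-8, so an empty pattern works as a check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$latin1_e = "\xE9";     // é as ISO-8859-1 (what a Latin-1 source file contains)
$utf8_e   = "\xC3\xA9"; // é as UTF-8 (what a UTF-8 page posts back)

// The class as it would look in a Latin-1 file: a set of single bytes.
$latin1_regex = "/^[\xE9]*$/";

var_dump(preg_match($latin1_regex, $latin1_e)); // int(1): literal from the same file
var_dump(preg_match($latin1_regex, $utf8_e));   // int(0): posted UTF-8 bytes

// Dumping the raw bytes of the posted value shows which encoding arrived.
echo bin2hex($utf8_e), "\n";        // c3a9
var_dump(is_valid_utf8($latin1_e)); // bool(false)
var_dump(is_valid_utf8($utf8_e));   // bool(true)
```

In the real form handler, `bin2hex($_POST['string'])` gives the same kind of byte dump for the submitted value.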
#2

Rob Gordijn
OK, after some hours of searching, reading, testing and reading some more...
I got it :) It was just so simple :(

http://www.php.net/manual/en/regexp.refe...nicode.php

\p{L} (any character with the Unicode Letter property) does the job.

My regex now looks like this:
Code:
// \p{L} = any Unicode letter (covers a-z and A-Z too); u = treat pattern and subject as UTF-8
$regex_text = '/^[\p{L}0-9]*$/u';
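A small sketch of how the \p{L} approach behaves with the `u` modifier set, so PCRE decodes the input as UTF-8 (`is_allowed_text` is a hypothetical helper name, and the file is assumed to be saved as UTF-8). One optional extension, not from the thread: a decomposed é (plain e followed by the combining accent U+0301) is a letter plus a mark, so `\p{M}` is added to the class here to accept that form too.

```php
<?php
// Hypothetical helper: letters (any script), combining marks and ASCII digits.
// The u modifier makes PCRE treat pattern and subject as UTF-8;
// preg_match() returns false (not 0) if $s is not valid UTF-8.
function is_allowed_text(string $s): bool
{
    return preg_match('/^[\p{L}\p{M}0-9]*$/u', $s) === 1;
}

var_dump(is_allowed_text('Café123'));   // bool(true)
var_dump(is_allowed_text("e\u{0301}")); // bool(true): decomposed é
var_dump(is_allowed_text('Ωμέγα'));     // bool(true): Greek letters
var_dump(is_allowed_text('é!'));        // bool(false): punctuation
var_dump(is_allowed_text("\xE9"));      // bool(false): Latin-1 byte, invalid UTF-8
```

Dropping `\p{M}` gives the stricter letters-and-digits check; the decomposed é would then be rejected.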

Thanks for reading.
#3

Unknown
Thank you, I had to create an account to say that it did the trick for me.



