CodeIgniter Forums
preg_match, utf-8 and accent characters




preg_match, utf-8 and accent characters - El Forum - 02-03-2010

Rob Gordijn
OK, I've got a problem that is driving me NUTS.

I need to validate some input (not via CI), and accented characters (like é) are causing a MAJOR headache.

Here we go:

- htaccess: AddDefaultCharset UTF-8
- doctype: HTML 4.01 Transitional
- HTML head: Content-Type meta tag with charset=utf-8
- error_reporting: E_ALL | E_STRICT

I'm using this regular expression:
Code:
$regex = '/^[ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËéèêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ]*$/';

Make an educated guess on the output of the following code.
Code:
var_dump(preg_match($regex, 'é'));
Yes, that's 'int(1)'.
Correct! So the é does match the regex.
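Why does the direct test pass? Without the `u` modifier PCRE matches bytes, not characters. In a UTF-8 source file é is the two bytes 0xC3 0xA9, and the character class above is really just a set of bytes that happens to contain both of them. A minimal sketch, assuming the regex file is saved as UTF-8 (the byte values are spelled out with `\x` escapes so the demo does not depend on how this file itself is saved, and a shortened class `[Ãé]` stands in for the full one):

```php
<?php
// "é" in UTF-8 is the two bytes 0xC3 0xA9.
$e = "\xC3\xA9";

// Without u, "." matches one byte; with u, one UTF-8 character.
$bytes = preg_match_all('/./', $e);  // 2
$chars = preg_match_all('/./u', $e); // 1

// "[Ãé]" saved as UTF-8 is the bytes C3 83 C3 A9. Without u this class is
// just the byte set {0xC3, 0x83, 0xA9}, so each byte of "é" matches alone.
$class = "/^[\xC3\x83\xC3\xA9]*$/";
$byteMatch = preg_match($class, $e);       // 1
$charMatch = preg_match($class . 'u', $e); // 1, now as one character

// The byte-level class also accepts a string that is not UTF-8 at all;
// with u, preg_match rejects the invalid subject and returns false.
$garbage  = preg_match($class, "\xA9\xC3");       // 1
$garbageU = preg_match($class . 'u', "\xA9\xC3"); // false

var_dump($bytes, $chars, $byteMatch, $charMatch, $garbage, $garbageU);
```

So the first test only works because the bytes line up, not because PCRE understands é as one character.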

Now, we take a pretty standard form, fill it with é and submit it.
Code:
<?php
$string = isset($_POST['string']) ? $_POST['string'] : null;
?>
<form action="test.php" method="POST" acceptcharset="utf-8">
<input type="text" name="string" value="<?php echo htmlspecialchars((string) $string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'); ?>">
<input type="submit" value="gooo">
</form>
<?php
var_dump(preg_match($regex, $string));
?>
And take another educated guess at the output.
Yes, that's 'int(1)'...
How cute, but it's WRONG! It actually prints 'int(0)', so the é does not match the regex.

So it works fine without posting, but once the form is posted things go bad.
What is causing this? Some PHP configuration? My web server? My code itself?
Thanks in advance.
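A way to narrow this down is to look at the raw bytes PHP actually receives instead of what the browser displays. One explanation that fits everything in this thread, though the poster never confirms it, is that the browser submitted the form as ISO-8859-1, sending é as the single byte 0xE9, while the script (and therefore the regex) is saved as UTF-8. Note also that `acceptcharset` is not a valid attribute; the HTML name is `accept-charset`, so the browser ignores it. A minimal sketch; `describe_bytes` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: show the raw bytes of a string and whether they
// form valid UTF-8 (an empty pattern with /u fails on invalid UTF-8).
function describe_bytes(string $s): string
{
    $valid = preg_match('//u', $s) === 1 ? 'valid UTF-8' : 'not UTF-8';
    return bin2hex($s) . ' (' . $valid . ')';
}

// "é" as the literal in a UTF-8 source file, and "é" as an ISO-8859-1
// form submission would send it (an assumption, see above).
$literal = "\xC3\xA9";
$posted  = "\xE9";

echo describe_bytes($literal), "\n"; // c3a9 (valid UTF-8)
echo describe_bytes($posted), "\n";  // e9 (not UTF-8)

// The explicit class from a UTF-8 file, reduced to "[é]": the bytes C3 A9.
$regex = "/^[\xC3\xA9]*$/";
var_dump(preg_match($regex, $literal)); // int(1)
var_dump(preg_match($regex, $posted));  // int(0): byte 0xE9 is not in the class
```

Calling `describe_bytes($string)` on the real posted value shows which case applies: `c3a9` means the browser sent UTF-8, `e9` means it sent ISO-8859-1 and the fix belongs in how the page and form declare their charset.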


preg_match, utf-8 and accent characters - El Forum - 02-05-2010

Rob Gordijn
OK, after some hours of searching, reading, testing and reading some more...
I got it :) It was just so simple :(

http://www.php.net/manual/en/regexp.reference.unicode.php

\p{L} does the job.

My regex now looks like this:
Code:
$regex_text = "/^([\p{L}a-zA-Z0-9]*)$/i";
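For comparison, a minimal sketch of the same check with the `u` modifier, which makes PCRE read both the pattern and the subject as UTF-8 so that `\p{L}` applies to whole characters. `is_valid_text` is a hypothetical name. With `u`, `a-zA-Z` and the `i` flag are redundant because `\p{L}` already covers ASCII letters in both cases, and a subject that is not valid UTF-8 makes `preg_match` return `false`. Without `u`, `\p{L}` is tested byte by byte against Latin-1 code points, which is why it accepts a lone 0xE9 byte (é in ISO-8859-1) but rejects the UTF-8 pair 0xC3 0xA9.

```php
<?php
// Hypothetical validator: letters from any script plus ASCII digits.
// The u modifier is what makes \p{L} apply to whole UTF-8 characters
// rather than to individual bytes.
function is_valid_text(string $input): bool
{
    // preg_match returns 1 (match), 0 (no match) or false (error, e.g. the
    // input is not valid UTF-8); only 1 counts as valid.
    return preg_match('/^[\p{L}0-9]*$/u', $input) === 1;
}

var_dump(is_valid_text("caf\xC3\xA9"));               // bool(true): "café"
var_dump(is_valid_text("\xC5\x81\xC3\xB3d\xC5\xBA")); // bool(true): "Łódź"
var_dump(is_valid_text('abc 123'));                   // bool(false): space not allowed
var_dump(is_valid_text("\xE9"));                      // bool(false): Latin-1 byte, not UTF-8

// Without u the pattern tests each byte as a Latin-1 code point:
// 0xA9 (second byte of UTF-8 "é") is "©", not a letter; 0xE9 alone is "é".
var_dump(preg_match('/^[\p{L}0-9]*$/', "caf\xC3\xA9")); // int(0)
var_dump(preg_match('/^[\p{L}0-9]*$/', "\xE9"));        // int(1)
```

Which variant is right depends on what the form really sends; with a page and form consistently in UTF-8, the `u` version is the one that treats é as a single letter.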

Thanks for reading.


preg_match, utf-8 and accent characters - El Forum - 01-14-2013

Unknown
Thank you! I created an account just to say that it did the trick for me.