Directions: how to make your CI app UTF-8 compliant
j2more
Hi all,

I have an international audience for my CI app. Here are some steps that somebody might find useful for making sure a CI PHP app speaks UTF-8. You might know that PHP strings are plain byte strings with no native notion of encoding (native Unicode support was planned for PHP 6, which was never released), and CI is not consistent in this area either.

Rgds,
Ben

1. Set up PHP
================
Make sure mbstring is enabled in php.ini and configured as follows, so that it overrides PHP's multi-byte-unsafe string functions. Those functions assume one byte per character (so at most 256 distinct characters) and therefore miscount or cut UTF-8 strings in which a character takes more than one byte:

extension=php_mbstring.dll

[mbstring]
mbstring.language = Neutral ; Set default language to Neutral (UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default
mbstring.func_overload = 7

php_mbstring.dll is the Windows extension name; on other platforms mbstring is usually compiled in or loaded as mbstring.so. func_overload = 7 (1 + 2 + 4) replaces the mail, string and regex functions with their mb_* counterparts; this setting was deprecated in PHP 7.2 and removed in PHP 8.0.

2. Set up the database
=======================
Make sure the database encoding is UTF-8.

3. Set up CI
================
Make sure the charset is defined correctly in the CI config (some methods use it):

$config['charset'] = "UTF-8";

4. Change the CI core (1.7.1) to support UTF-8
===================================
Note: here I hardcoded UTF-8. Taking the encoding from the config instead would be the better way.
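A small CLI sketch of the two problems above: byte-based string functions (what step 1 works around) and escaping without an explicit charset (what step 4 fixes). It assumes the mbstring extension is loaded, mbstring.func_overload is off, and the file is saved as UTF-8. escape_html() is a hypothetical helper, not part of CI; passing it config_item('charset') from inside CI is one assumed way to follow the note above instead of hardcoding UTF-8.

```php
<?php
// Assumes: mbstring is loaded, mbstring.func_overload is off,
// and this file is saved as UTF-8.
$name = "Zoë"; // 3 characters, but "ë" (U+00EB) takes 2 bytes in UTF-8

// Byte-based vs. character-based length.
$bytes = strlen($name);              // 4
$chars = mb_strlen($name, 'UTF-8');  // 3

// substr() counts bytes and can split "ë" in half; mb_substr() counts characters.
$cutBytes = substr($name, 0, 3);             // "Zo" + first byte of "ë" (invalid UTF-8)
$cutChars = mb_substr($name, 0, 3, 'UTF-8'); // "Zoë"

// Hypothetical helper: always pass the charset explicitly instead of relying
// on PHP's default (ISO-8859-1 before PHP 5.4). Inside CI the charset could
// come from the config, e.g. escape_html($str, config_item('charset')).
function escape_html($str, $charset = 'UTF-8')
{
    return htmlentities($str, ENT_QUOTES, $charset);
}

echo $bytes, ' ', $chars, "\n";               // 4 3
echo escape_html("Zoë & <Ben>"), "\n";        // Zo&euml; &amp; &lt;Ben&gt;
// Treating the UTF-8 bytes of "ë" as ISO-8859-1 yields two wrong entities:
echo escape_html("Zoë", 'ISO-8859-1'), "\n";  // Zo&Atilde;&laquo;
```

The last line is the kind of garbling the unpatched htmlentities()/htmlspecialchars() calls in the CI core can produce on older PHP versions.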
Force browsers to submit UTF-8 by changing line 54 of system/helpers/form_helper.php to:

$form = '<form action="'.$action.'" accept-charset="utf-8"';

Adjust the PHP functions that are potentially UTF-8 unsafe:

htmlentities: change htmlentities($var) to htmlentities($var, ENT_QUOTES, 'UTF-8') in
- line 443 of system/libraries/Xmlrpc.php

htmlspecialchars: change htmlspecialchars($val) to htmlspecialchars($val, ENT_COMPAT, 'UTF-8') (see http://us3.php.net/manual/en/function.ht...lchars.php: "The default character set is ISO-8859-1." This was the default before PHP 5.4) in
- line 579 of system/helpers/form_helper.php
- line 1900 of system/libraries/Email.php
- lines 674, 794 and 1362 of system/libraries/Xmlrpc.php

5. Set up your CI app
=======================
In all view files, send the Content-Type header (before any output) and add the matching meta tag:

<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
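For step 5, a minimal sketch of a hypothetical render_page() wrapper that sends the header only while no output has been sent yet and emits the meta tag with the attribute browsers actually read, content. In a real CI app the header would more likely go in a controller (CI's Output class has set_header()) and the meta tag in a shared header view; this standalone version only illustrates the pieces.

```php
<?php
// Hypothetical layout wrapper illustrating step 5; not part of CI.
function render_page($title, $bodyHtml)
{
    // header() only works before any output has been sent
    // (under the CLI it is silently ignored).
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    // Escape the title with an explicit charset (see step 4).
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    // $bodyHtml is assumed to be already-escaped HTML.
    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
        . '<title>' . $safeTitle . '</title>'
        . '</head><body>' . $bodyHtml . '</body></html>';
}

echo render_page('Zoë <3', '<p>Grüße</p>'), "\n";
```

Sending both the header and the meta tag keeps pages correct even when they are saved to disk and reopened without HTTP headers.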
j2more
Addition: most of CI's validation rules are not UTF-8 aware either. The alpha* rules only accept basic English (ASCII) letters:

function alpha($str)
{
    return ( ! preg_match("/^([a-z])+$/i", $str)) ? FALSE : TRUE;
}

You could replace the alpha* functions with versions that list the Unicode code points you need. Check http://unicode.org/cldr/utility/list-unicodeset.jsp for the character sets you are interested in. The version below accepts English letters, the apostrophe, whitespace, the Latin-1 Supplement letters (U+00C0 to U+00FF, leaving out the non-letters × U+00D7 and ÷ U+00F7), a few Latin Extended-A characters (U+0131 to U+0132, plus Œ U+0152 and Ÿ U+0178 for French) and the basic, unaccented Greek alphabet. The /u modifier is what makes PCRE treat the pattern and the input as UTF-8. My application uses it to validate first and last names.

function alpha($str)
{
    return ( ! preg_match("/^([\x{0041}-\x{005a}\x{0061}-\x{007a}\x{0027}\s\x{00c0}-\x{00d6}\x{00d8}-\x{00f6}\x{00f8}-\x{00ff}\x{0131}-\x{0132}\x{0152}\x{0178}\x{0391}-\x{03a9}\x{03b1}-\x{03c9}])+$/u", $str)) ? FALSE : TRUE;
}
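For comparison, a standalone sketch using a different technique than the hand-listed ranges (a swap, not the approach from the post): Unicode property classes. \p{L} matches a letter in any script and \p{M} a combining mark, so accented Greek (with tonos) and decomposed accents also pass. alpha_ascii() and alpha_unicode() are hypothetical names, not CI's Form_validation API.

```php
<?php
// For comparison: the stock CI 1.7 rule, which only accepts ASCII letters.
function alpha_ascii($str)
{
    return (bool) preg_match("/^([a-z])+$/i", $str);
}

// Hypothetical alternative using Unicode properties instead of code point ranges:
// \p{L} = any letter, \p{M} = combining mark, plus apostrophe and whitespace
// (as in the name check above). The /u modifier makes PCRE read pattern and
// subject as UTF-8; invalid UTF-8 input makes preg_match() return false.
function alpha_unicode($str)
{
    return (bool) preg_match("/^[\p{L}\p{M}'\s]+$/u", $str);
}

var_dump(alpha_ascii("José"));    // bool(false)
var_dump(alpha_unicode("José"));  // bool(true)
var_dump(alpha_unicode("Ελένη")); // bool(true): "έ" (U+03AD) is outside the hand-listed Greek ranges
var_dump(alpha_unicode("a×b"));   // bool(false): "×" is a math symbol, not a letter
```

The trade-off: \p{L} accepts letters from every script (Cyrillic, CJK and so on), which may be broader than you want, while explicit ranges give tighter control over exactly which characters a name may contain.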