Directions: how to make your CI app UTF-8 compliant
#1

j2more
Hi all,

I have an international audience for my CI app. Here are some steps you might find useful to make sure your CI PHP app speaks UTF-8. PHP's core string functions work on bytes and know nothing about character encodings (the native Unicode support planned for PHP 6 never shipped), and CI is not consistent in this area either.

Rgds Ben

1. Set up PHP
================
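Make sure the mbstring extension is enabled in php.ini and configured as follows, so that mbstring overrides PHP's multi-byte-unsafe string functions. Those functions treat every byte as one character (a byte has only 256 possible values), so they miscount, and can split, UTF-8 characters that take more than one byte: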
extension=php_mbstring.dll
[mbstring]
mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default
mbstring.func_overload = 7 ; Overload mail (1), string (2) and regex (4) functions; deprecated in PHP 7.2, removed in PHP 8.0
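To see what the byte-versus-character problem looks like, here is a small standalone sketch (not part of CI) comparing PHP's byte-based functions with their mbstring counterparts. It assumes PHP 7 or later with mbstring loaded and func_overload off, so strlen() and substr() keep their byte semantics:

```php
<?php
// "Grüße" written with escapes so the example does not depend on the
// encoding this file is saved in (the "\u{...}" syntax needs PHP 7+).
$word = "Gr\u{00FC}\u{00DF}e";

// Byte-oriented core functions (assuming mbstring.func_overload is off):
// ü and ß take 2 bytes each in UTF-8, so 5 characters occupy 7 bytes.
$bytes = strlen($word);
$chars = mb_strlen($word, 'UTF-8');

// substr() counts bytes and cuts ü in half, leaving invalid UTF-8.
$cut   = substr($word, 0, 3);
$valid = mb_check_encoding($cut, 'UTF-8');

// mb_substr() counts characters and keeps ü intact.
$safe  = mb_substr($word, 0, 3, 'UTF-8');

echo "$bytes bytes, $chars characters, mb_substr gives: $safe\n";
```

With func_overload = 7, strlen() and substr() would silently behave like mb_strlen() and mb_substr(); calling the mb_* functions explicitly gives the same result without relying on that setting.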

2. Set up the database
=======================
Make sure the database, its tables and the client connection all use UTF-8 (utf8 in MySQL).
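For MySQL, the connection between CI and the server also has to use UTF-8, otherwise text can be converted on the way in or out. A sketch of the relevant part of CI's database config, assuming the mysql driver and a CI version whose database.php supports the char_set and dbcollat keys:

```php
<?php
// Excerpt from CI's config/database.php; the other connection keys are omitted.
// Assumption: MySQL driver and a CI version that supports char_set / dbcollat.
$db['default']['dbdriver'] = 'mysql';
$db['default']['char_set'] = 'utf8';            // connection character set
$db['default']['dbcollat'] = 'utf8_general_ci'; // connection collation
```

The database and tables themselves still need a UTF-8 character set (for example CREATE DATABASE name CHARACTER SET utf8). MySQL's utf8 stores at most 3 bytes per character; if you need characters outside the Basic Multilingual Plane, such as emoji, utf8mb4 with a matching utf8mb4 collation is required.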

3. Set up CI
================
Make sure the charset is set correctly in the CI config: $config['charset'] = "UTF-8"; (some methods use it)

4. Change CI core (1.7.1) to support UTF-8
===================================
Note: here I hardcoded UTF-8. Taking the encoding from the config instead would be the better way.

Make browsers submit UTF-8: change line 54 of system/helpers/form_helper.php to: $form = '<form action="'.$action.'" accept-charset="utf-8"';
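Following the note about taking the encoding from the config, the opening tag could be built from the configured charset instead of a hardcoded string. A sketch with a hypothetical helper name, form_open_charset(); inside CI 1.7 you would presumably pass config_item('charset') as the second argument, assuming that function is available in your version's Common.php:

```php
<?php
// Hypothetical helper: builds a <form> opening tag whose accept-charset
// follows the application's charset instead of a hardcoded "utf-8".
// In CI you might call it as form_open_charset($action, config_item('charset')).
function form_open_charset($action, $charset = 'UTF-8')
{
    // Escape both values for use inside double-quoted HTML attributes,
    // telling htmlspecialchars() which charset the input is in.
    $attr_action  = htmlspecialchars($action, ENT_QUOTES, $charset);
    $attr_charset = htmlspecialchars(strtolower($charset), ENT_QUOTES, $charset);

    return '<form action="' . $attr_action . '" accept-charset="' . $attr_charset . '" method="post">';
}

echo form_open_charset('http://example.com/index.php/user/save', 'UTF-8'), "\n";
```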
Adjust PHP functions that are potentially UTF-8 unsafe because they fall back to a default character set:
htmlentities: change htmlentities($var) to htmlentities($var, ENT_QUOTES, 'UTF-8')
  line 443 of system/libraries/Xmlrpc.php
htmlspecialchars: change htmlspecialchars($val) to htmlspecialchars($val, ENT_COMPAT, 'UTF-8')
  (see http://us3.php.net/manual/en/function.ht...lchars.php: "The default character set is ISO-8859-1." PHP 5.4 changed the default of both functions to UTF-8, and PHP 5.6 and later use the default_charset setting.)
  line 579 of system/helpers/form_helper.php
  line 1900 of system/libraries/email.php
  lines 674, 794 and 1362 of system/libraries/Xmlrpc.php
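Why the explicit 'UTF-8' argument matters: a small sketch comparing htmlentities() told that UTF-8 input is ISO-8859-1 (the old default) with the same call given 'UTF-8'. The word "café" is written with an escape so the example does not depend on the file's encoding (PHP 7+):

```php
<?php
// "café" as UTF-8 bytes: é is the two bytes 0xC3 0xA9.
$word = "caf\u{00E9}";

// Told the input is ISO-8859-1, htmlentities() encodes each byte of the
// UTF-8 sequence on its own: 0xC3 becomes Ã and 0xA9 becomes ©.
$wrong = htmlentities($word, ENT_QUOTES, 'ISO-8859-1');

// With the charset given explicitly, é maps to a single entity.
$right = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n", $right, "\n";
```

The first call produces the typical mojibake seen on pages where UTF-8 text was escaped as Latin-1; the second keeps the character intact.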

5. Set up your CI app
=======================
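In all view files, declare UTF-8 both in the HTTP header and in a meta tag (header() has to be called before anything is sent to the browser):
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />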
#2

j2more
Addition: the character-class validation rules in CI's validation library (alpha, alpha_numeric, alpha_dash) are not UTF-8 compliant either. They only accept the unaccented ASCII letters of basic English:

function alpha($str)
{
    return ( ! preg_match("/^([a-z])+$/i", $str)) ? FALSE : TRUE;
}

You might replace all the alpha* rules with versions that list the relevant Unicode code points instead; the /u modifier makes PCRE treat both the pattern and the input as UTF-8. Check http://unicode.org/cldr/utility/list-unicodeset.jsp for the character sets you are interested in. The version below accepts basic Latin letters (English), the apostrophe, whitespace, the Latin-1 Supplement range U+00C0 to U+00FF (mostly accented letters, though it also contains × and ÷), a few Latin Extended-A characters used in French and other languages, and the unaccented Greek capital and small letters.

In my application I use it to validate first and last names.


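function alpha($str)
{
    // A-Z, a-z, apostrophe, whitespace, Latin-1 Supplement U+00C0-U+00FF,
    // ı Ĳ Œ œ Ÿ (Latin Extended-A) and unaccented Greek capital/small letters.
    return ( ! preg_match("/^([\x{0041}-\x{005a}\x{0061}-\x{007a}\x{0027}\s\x{00c0}-\x{00ff}\x{0131}-\x{0132}\x{0152}-\x{0153}\x{0178}\x{0391}-\x{03a9}\x{03b1}-\x{03c9}])+$/u", $str)) ? FALSE : TRUE;
}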
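An alternative to listing code point ranges by hand is PCRE's Unicode property classes, which PHP supports together with the /u modifier. A sketch with hypothetical function names (alpha_unicode, alpha_ascii), where \p{L} matches a letter in any script and \p{M} a combining mark:

```php
<?php
// Hypothetical replacement for the alpha rule using Unicode properties:
//   \p{L}  any letter in any script (Latin, Greek, Cyrillic, ...)
//   \p{M}  combining marks, e.g. an accent stored as a separate code point
//   '      apostrophe, \s whitespace (as in the name rule above)
// With /u, invalid UTF-8 input makes preg_match() return false, so it is rejected.
function alpha_unicode($str)
{
    return preg_match('/^[\p{L}\p{M}\'\s]+$/u', $str) === 1;
}

// The stock CodeIgniter rule, for comparison.
function alpha_ascii($str)
{
    return preg_match('/^([a-z])+$/i', $str) === 1;
}

// "José", "O'Brien", Greek "Ελένη" (note the accented έ), and "Bob1".
$names = array("Jos\u{00E9}", "O'Brien", "\u{0395}\u{03BB}\u{03AD}\u{03BD}\u{03B7}", "Bob1");
foreach ($names as $name) {
    printf("%-12s ascii:%d unicode:%d\n", $name, alpha_ascii($name), alpha_unicode($name));
}
```

This is broader than hand-picked ranges: it also accepts accented Greek and scripts such as Cyrillic, Arabic or CJK, which may or may not be what you want for a name field.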



