CodeIgniter Forums
CI 2.0 and UTF-8 strings


CI 2.0 and UTF-8 strings - El Forum - 03-16-2011

[eluser]Unknown[/eluser]
Hi, I ran into a problem when validating text fields in a form: the submitted values simply disappeared. The log message was:
Quote:ERROR - 2011-03-16 22:03:48 --> Severity: Notice --> iconv() [...] : Wrong charset, conversion from `UTF-8' to `UTF-8//IGNORE' is not allowed <path>/system/core/Utf8.php 89
So I checked Utf8.php and found this:
Code:
function clean_string($str)
{
    if ($this->_is_ascii($str) === FALSE)
    {
        $str = @iconv('UTF-8', 'UTF-8//IGNORE', $str);
    }

    return $str;
}
Can someone tell me the purpose of crippling a UTF-8 string? I chose UTF-8 because I need to handle characters like "ąęźćż", which are far beyond \x7F (the _is_ascii($str) check), so they always reach the iconv() call. For now I have commented out the check so that clean_string() just returns $str, and the problem is gone. Sorry for any mistakes; English isn't my native language.
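
To illustrate why strings like "ąęźćż" end up in the iconv() branch, here is a minimal standalone sketch of an ASCII check. The function name is_ascii_only() is hypothetical and only stands in for the framework's private _is_ascii() method; the actual implementation in Utf8.php may differ.

```php
<?php
// Hypothetical stand-in for the framework's private _is_ascii() check.
// Returns true only when every byte of $str is in the range 0x00-0x7F.
function is_ascii_only($str)
{
    // preg_match() returns 0 when no byte outside the ASCII range is found.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(is_ascii_only('abc'));    // bool(true): clean_string() would return it untouched
var_dump(is_ascii_only('ąęźćż'));  // bool(false): clean_string() would pass it to iconv()
```

Any string that contains at least one multibyte UTF-8 character fails this kind of check, so a faulty iconv() affects all non-English input rather than only malformed strings.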


CI 2.0 and UTF-8 strings - El Forum - 04-07-2011

[eluser]Arjen van Bochoven[/eluser]
I had the same problem and narrowed it down to iconv not working correctly. Passing a valid UTF-8 string into
Code:
iconv('UTF-8', 'UTF-8//IGNORE', $str)
results in an empty string.

I use MAMP 1.9.5 and have filed a bug report. I can confirm the issue is not present in MAMP 1.7.1.
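
A quick way to check whether a given PHP installation is affected might look like the sketch below. The helper name iconv_ignore_works() is made up for this example; it only tests that a known-valid UTF-8 string survives the same iconv() call that clean_string() makes.

```php
<?php
// Hypothetical helper: reports whether iconv() with //IGNORE returns a
// valid UTF-8 string unchanged on this installation.
function iconv_ignore_works()
{
    if ( ! function_exists('iconv'))
    {
        return FALSE; // iconv extension not available at all
    }

    // "ąęźćż" written as explicit UTF-8 bytes, so the source file's
    // own encoding does not matter.
    $sample = "\xC4\x85\xC4\x99\xC5\xBA\xC4\x87\xC5\xBC";

    // Same call as in clean_string(); @ suppresses the notice seen in the log.
    $result = @iconv('UTF-8', 'UTF-8//IGNORE', $sample);

    // An affected build returns FALSE or an empty string here.
    return $result === $sample;
}

echo iconv_ignore_works() ? "iconv OK\n" : "iconv broken or missing\n";
```

If this prints "iconv broken or missing", the problem lies in the PHP build rather than in the application code.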

The clean way to work around this is to extend the core class with your own, as described in "Creating Core System Classes":

* Create a new file: application/core/MY_Utf8.php
* Copy/paste the code below

Code:
<?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');

class MY_Utf8 extends CI_Utf8 {
    
    // --------------------------------------------------------------------

    /**
     * Overload for clean string for environments with
     * an incorrect iconv
     *
     * @access    public
     * @param    string
     * @return    string
     */
    function clean_string($str)
    {
        return $str;
    }

}



CI 2.0 and UTF-8 strings - El Forum - 04-07-2011

[eluser]Unknown[/eluser]
I used mb_convert_encoding instead of iconv. It works perfectly.
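
The post does not show the code that was used, so the following is only a guess at what such a replacement could look like. It assumes the mbstring extension is loaded, and the function name clean_utf8_with_mbstring() is hypothetical. The same body could be placed in an overridden clean_string() method in application/core/MY_Utf8.php.

```php
<?php
// Hypothetical replacement for the iconv() call in clean_string(),
// assuming the mbstring extension is available.
function clean_utf8_with_mbstring($str)
{
    // Pure ASCII needs no cleaning (mirrors the original _is_ascii() shortcut).
    if (preg_match('/[^\x00-\x7F]/', $str) === 0)
    {
        return $str;
    }

    // Without mbstring, fall back to returning the string unchanged.
    if ( ! function_exists('mb_convert_encoding'))
    {
        return $str;
    }

    // Drop invalid byte sequences instead of replacing them with "?",
    // then restore the previous setting so other code is not affected.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $clean = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    return $clean;
}

var_dump(clean_utf8_with_mbstring('ąęźćż') === 'ąęźćż'); // bool(true)
```

Converting from UTF-8 to UTF-8 leaves valid input as it is, which is the behaviour the original iconv() call was meant to have.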


CI 2.0 and UTF-8 strings - El Forum - 04-07-2011

[eluser]InsiteFX[/eluser]
This is from PHP.net

To strip bogus characters from your input (such as data from an unsanitized or other source which you can't trust to necessarily give you strings encoded according to their advertised encoding set), use the same character set as both the input and the output, with //IGNORE on the output character set.
Code:
<?php
// assuming '†' is actually UTF8, htmlentities will assume it's iso-8859  
// since we did not specify in the 3rd argument of htmlentities.
// This generates "&acirc;[bad utf-8 character]"
// If passed to any libxml, it will generate a fatal error.
$badUTF8 = htmlentities('†');

// iconv() can ignore characters which cannot be encoded in the target character set
$goodUTF8 = iconv("utf-8", "utf-8//IGNORE", $badUTF8);
?>

The result of the example does not give you back the dagger character that was the original input. It was lost when htmlentities was misused to encode it incorrectly, a common mistake among people not accustomed to dealing with extended character sets. It does, however, at least give you data which is sane in your target character set.

InsiteFX