CI 2.0 and UTF-8 strings
#4

[eluser]InsiteFX[/eluser]
This is from PHP.net

To strip bogus characters from your input (such as data from an unsanitized source, or any other source you can't trust to deliver strings encoded in their advertised character set), use the same character set for both the input and the output, with //IGNORE appended to the output character set.
Code:
<?php
// Assuming '†' is actually UTF-8, htmlentities() will treat it as ISO-8859-1
// because no charset was given as the 3rd argument (ISO-8859-1 was the
// default before PHP 5.4).
// This generates "â[bad utf-8 character]".
// If passed to anything built on libxml, it will generate a fatal error.
$badUTF8 = htmlentities('†');

// With //IGNORE, iconv() drops characters that cannot be represented in the target character set.
$goodUTF8 = iconv("utf-8", "utf-8//IGNORE", $badUTF8);
?>

The result of the example does not give you back the dagger character that was the original input. It was lost when htmlentities was misused and encoded it incorrectly, a common mistake for people not used to dealing with extended character sets. It does at least give you data that is sane in your target character set.
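If the mbstring extension is available, something along these lines might also work. It is only a rough sketch: strip_invalid_utf8() is a made-up helper name (not part of PHP or CodeIgniter), and it uses mb_convert_encoding() instead of iconv() to drop the bad bytes. It also shows the other half of the fix, which is telling htmlentities the charset explicitly so the dagger is not mangled in the first place.

```php
<?php
// Hypothetical helper, assuming the mbstring extension is loaded.
function strip_invalid_utf8($input)
{
    // Valid UTF-8 is returned untouched.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Drop invalid byte sequences instead of replacing them with "?".
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous); // restore the previous setting

    return $clean;
}

$dagger = "\xE2\x80\xA0";          // the dagger character, encoded as UTF-8
$broken = "abc\xE2\x80" . "def";   // a multibyte sequence cut off after two bytes

// Passing the charset explicitly keeps the dagger intact.
echo htmlentities($dagger, ENT_QUOTES, 'UTF-8'), "\n"; // &dagger;

// Whatever survives the cleanup is valid UTF-8.
var_dump(mb_check_encoding(strip_invalid_utf8($broken), 'UTF-8')); // bool(true)
?>
```

Exactly which bytes around the broken sequence get dropped can differ between PHP versions, so check the cleaned string against your own data before relying on it.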

InsiteFX




