Handling UTF-8 character encoding across your app
#1

Matthew Pennell
I've been having ongoing problems with character encodings on a site I recently developed, mostly because UTF-8 wasn't applied consistently across every part of the application. So today I decided to get to the root of the problem, and with a little help from this article, this is what I ended up doing.

Amended database/drivers/mysql/mysql_driver.php:

Code:
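/**
* Non-persistent database connection
*
* @access    private called by the base class
* @return    resource
*/
function db_connect()
{
    $link = @mysql_connect($this->hostname, $this->username, $this->password, TRUE);

    // If the connection failed, return FALSE and let the base class report it
    if ($link)
    {
        // SET NAMES 'utf8' sets character_set_client, character_set_connection
        // and character_set_results for this connection. A following
        // SET CHARACTER SET would reset character_set_connection to the
        // database's default charset, so it is left out.
        mysql_query("SET NAMES 'utf8'", $link);
    }

    return $link;
}

// --------------------------------------------------------------------

/**
* Persistent database connection
*
* @access    private called by the base class
* @return    resource
*/
function db_pconnect()
{
    $link = @mysql_pconnect($this->hostname, $this->username, $this->password);

    // A persistent link may be reused from an earlier request, so set the
    // connection charset every time instead of relying on its old state
    if ($link)
    {
        mysql_query("SET NAMES 'utf8'", $link);
    }

    return $link;
}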

Added these lines to the top of index.php:

Code:
# Set PHP's internal character encoding to UTF-8
mb_internal_encoding('UTF-8');

# Set the character encoding to UTF-8 for all page output
header('Content-type: text/html; charset=UTF-8');

It seems to have done the trick, but obviously I'm not comfortable with hacking the core database driver files. Does anyone else have any suggestions for how to do the same thing in a more unobtrusive manner (maybe using Hooks)?
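One way to keep these changes out of the core files is a pair of hooks. The sketch below is an illustration under stated assumptions, not a tested 1.5.2 recipe: the class names `Utf8_setup` and `Utf8_db`, their method names and their file names are hypothetical; it assumes hooks are enabled with `$config['enable_hooks'] = TRUE` in the application's config.php, and that the database library is already loaded (for example, autoloaded) when `post_controller_constructor` fires.

```php
<?php
// Hypothetical hook classes. In a real app each would live in its own
// file inside the hooks/ directory of the application folder.

class Utf8_setup
{
    // pre_system hook: applies the settings that were added to index.php
    function init()
    {
        // mb_* functions default to UTF-8 when no encoding is passed
        mb_internal_encoding('UTF-8');

        // header() cannot change headers once output has started
        if (!headers_sent())
        {
            header('Content-Type: text/html; charset=UTF-8');
        }
    }
}

class Utf8_db
{
    // post_controller_constructor hook: stands in for the driver edit,
    // ASSUMING the database library was loaded before this point
    function set_names()
    {
        $CI =& get_instance(); // CodeIgniter's controller instance

        if (isset($CI->db))
        {
            $CI->db->query("SET NAMES 'utf8'");
        }
    }
}

// Registration, as it would appear in config/hooks.php
$hook['pre_system'] = array(
    'class'    => 'Utf8_setup',
    'function' => 'init',
    'filename' => 'Utf8_setup.php',
    'filepath' => 'hooks',
);

$hook['post_controller_constructor'] = array(
    'class'    => 'Utf8_db',
    'function' => 'set_names',
    'filename' => 'Utf8_db.php',
    'filepath' => 'hooks',
);
```

If the database is only loaded later, inside a controller method, the `Utf8_db` hook would run too early and skip the query, so this only replaces the driver edit for autoloaded or constructor-loaded databases.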
#2

Derek Jones
What version of CodeIgniter are you using, Buddy? 1.6.x sets the client database character set for you, so you needn't hack the driver. And I personally wouldn't send the content type in a server header; I'd include it in the <head> of your document. String handling within your application is another matter altogether. Setting the internal encoding only benefits you if you consistently use the multi-byte string functions throughout your application (which may require extending parts of the core) or have the standard string functions overloaded with their multi-byte counterparts, and if the server's internal encoding for the multi-byte functions isn't already UTF-8. It won't hurt to set it explicitly, but it's not a magic bullet.
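To illustrate why `mb_internal_encoding()` is not a magic bullet, here is a small standalone sketch (it assumes the mbstring extension is loaded). The plain string functions keep counting bytes no matter what the internal encoding is; only the `mb_*` functions pick it up. The overloading mentioned above was the `mbstring.func_overload` ini setting, which was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// The multi-byte functions use the internal encoding when no encoding
// argument is passed; the plain string functions ignore it entirely.
mb_internal_encoding('UTF-8');

$word = "caf\xC3\xA9"; // "café" in UTF-8: 4 characters, 5 bytes

$bytes = strlen($word);          // 5: strlen() counts bytes
$chars = mb_strlen($word);       // 4: counts UTF-8 characters

$cut   = substr($word, 0, 4);    // "caf\xC3": splits the é, leaving invalid UTF-8
$whole = mb_substr($word, 0, 4); // "café": cuts on character boundaries

$upper = mb_strtoupper($word);   // "CAFÉ": the é is upper-cased as well

echo "$bytes bytes, $chars characters\n";
echo mb_check_encoding($cut, 'UTF-8') ? "cut is valid\n" : "cut is invalid UTF-8\n";
```

Any code path that still calls `substr()` or `strtoupper()` on user text can therefore corrupt UTF-8 data even with the internal encoding set correctly.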
#3

Matthew Pennell
I'm using 1.5.2 - I didn't know 1.6 added the character set config item, thanks (and does that also address whatever the 'SET NAMES' query is doing?)

I'm setting server headers in case the server is sending any default Content-Type, as that would override anything I set in the <head> with a meta tag.
#4

Derek Jones
Quote (Buddy Bradley, 2 March 2008): I'm using 1.5.2 - I didn't know 1.6 added the character set config item, thanks (and does that also address whatever the 'SET NAMES' query is doing?)

Yes, it does.
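On 1.6.x the equivalent settings live in the database config rather than in the driver. A minimal sketch of the relevant lines, assuming the `default` connection group; the `char_set` and `dbcollat` keys are, to my knowledge, the ones CodeIgniter 1.6 introduced, so check them against the config/database.php shipped with your version:

```php
<?php
// config/database.php (CodeIgniter 1.6.x and later): connection charset
// and collation, applied by the driver when it connects
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
```

With these values the MySQL driver is expected to issue `SET NAMES` with the given character set and collation right after connecting, which is the job the hand-edited `db_connect()` was doing.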

Quote: I'm setting server headers in case the server is sending any default Content-Type, as that would override anything I set in the <head> with a meta tag.

Are you at least also setting the meta tag? Some browsers will ignore what the server sends, and without the presence of this tag might fall back to the wrong character set.
#5

Matthew Pennell
Quote (Derek Jones, 2 March 2008): Are you at least also setting the meta tag? Some browsers will ignore what the server sends, and without the presence of this tag might fall back to the wrong character set.
Yes, I always set both. I'm not aware of any browsers that ignore server headers, but there's always the possibility that someone saves a page and views it later, offline, from their local disk.
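Setting both can be kept in one place. A minimal sketch using a hypothetical helper, `charset_meta()` (not a CodeIgniter function), that sends the header when no output has gone out yet and returns the matching meta tag for the view's `<head>`:

```php
<?php
// Hypothetical helper: sends the charset as an HTTP header (if still
// possible) and returns the equivalent <meta> tag for the markup.
function charset_meta($charset = 'UTF-8')
{
    // When the page is served over HTTP, a charset in the header
    // takes precedence over the meta tag
    if (!headers_sent())
    {
        header('Content-Type: text/html; charset=' . $charset);
    }

    // The meta tag stays with the markup, so a page saved to disk and
    // opened locally still declares its encoding
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset) . '" />';
}
```

In a view, `<?php echo charset_meta(); ?>` would go inside `<head>`, and has to be called before any other output for the header part to take effect.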



