Welcome Guest, Not a member yet? Register   Sign In
Displaying UTF-8 strings
#1

[eluser]Kenneth Allen[/eluser]
I am converting an existing web site to a new site using CI and adding dynamic database support (previously they generated static pages from an Access database). Part of the process is to convert the contents of the Access database to MYSQL. I wrote some C# code ona windows VM to read the Access database and convert the text to UTF8 and write SQL INSERT statements to text file. I then transfer the tex file to my iMac and execute it against a schema I defined there.

I have confirmed that the SQL file contains all UTF characters and that the special characters, especially the ellipses, appear properly in a text editor as well as on the command line.

I also confirmed that the tables are all defined as shown below.

Code:
CREATE TABLE `GraveNotes` (
    `id`                    INTEGER            NOT NULL AUTO_INCREMENT,
    `GraveID`                INTEGER            NOT NULL,
    `CreateDate`            DATETIME        NOT NULL,
    `NoteText`                TEXT            NOT NULL,
    CONSTRAINT `PK_GraveNotes` PRIMARY KEY(`id`),
    CONSTRAINT `FK_GraveNotes_GraveID_Graves` FOREIGN KEY(`GraveID`) REFERENCES `Graves`(`id`)
) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_unicode_ci;

From the command line this looks OK:

Code:
mysql> select * from GraveNotes WHERE GraveID=685;
+-----+---------+---------------------+---------------------------------------------------------------------------------+
| id  | GraveID | CreateDate          | NoteText                                                                        |
+-----+---------+---------------------+---------------------------------------------------------------------------------+
| 204 |     685 | 2010-08-24 00:00:00 | Obit in 'Calgary Herald' but he is not listed in any of 8 Calgary cemeteries… |
+-----+---------+---------------------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

When I generate a web page that demonstrates some of the database content, however, the ellipses — as an example — are not being rendered properly: see the lines that have 685 and 692 in the second column below.

Code:
202    668    Presunka, Vernon Ivan Dennis    Tue 24 Aug    Died on holidays:Was living in Devon, AB
203    682    Cameron, James Campbell    Tue 24 Aug    Exact DOD not known
204    685    De Grace, Mark Cletus    Tue 24 Aug    Obit in 'Calgary Herald' but he is not listed in any of 8 Calgary cemeteries…
205    688    Denham, Ronald Arthur    Tue 24 Aug    RCMP upright granite
206    692    Fontana, Ronald James    Tue 24 Aug    Obit in 'Calgary Herald' but he is not listed in any of 8 Calgary cemeteries…
207    697    Guyer, Charles Clarence Stephen    Tue 24 Aug    Brass marker at ground level in cement. Small crest of RCMP center top. Good condition

I have also verified that the we page has the proper setting in the header, as shown below.

Code:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />

I even went into the CI config. php file and added the line shown below, all to no effect.

Code:
@setlocale(LC_CTYPE, array("en_CA.UTF-8", "en_US.UTF-8"));

I am at a loss to explain why the text from the database is not being rendered properly. Does anyone know if there is some step that I have missed or omitted?
#2

[eluser]mddd[/eluser]
I have found that in some cases I needed to tell the database explicitly to send the data back to me in UTF8 format. Check out SET CHARACTER and SET NAMES in the Mysql manual.
#3

[eluser]WanWizard[/eluser]
Also check in your browser if you are really receiving utf8. I've had cases where on the CI and database side everything was setup properly, but when I checked in firefox (view, character encoding), it still said "ISO 5589-1".
This was caused by a config parameter in apache's httpd.conf (AddDefaultCharset iso5589-1), which overrides document encoding if no explicit http header is set (which often isn't the case).

Either remove this directive, or change it to 'AddDefaultCharset utf8".

Alternatively, you can make sure your CI output is utf8 by adding:
Code:
// every thing we do here is text/html, in UTF-8!
$this->output->set_header('Content-Type:text/html; charset=UTF-8');
to your (MY_)Controller.
#4

[eluser]Kenneth Allen[/eluser]
[quote author="mddd" date="1282732411"]I have found that in some cases I needed to tell the database explicitly to send the data back to me in UTF8 format. Check out SET CHARACTER and SET NAMES in the Mysql manual.[/quote]

But MYSQL seems to be returning the correct information, as shown in my original email; that is, it is working from the command line without setting any options. It is only when it is rendered via CI to a browser page that it does not look right( and running entirely local to my iMac).

Here are my settings in MYSQL initially:
Code:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+--------------------------------------------------------------+
| Variable_name            | Value                                                        |
+--------------------------+--------------------------------------------------------------+
| character_set_client     | latin1                                                       |
| character_set_connection | latin1                                                       |
| character_set_database   | latin1                                                       |
| character_set_filesystem | binary                                                       |
| character_set_results    | latin1                                                       |
| character_set_server     | latin1                                                       |
| character_set_system     | utf8                                                         |
| character_sets_dir       | /usr/local/mysql-5.0.85-osx10.5-x86_64/share/mysql/charsets/ |
+--------------------------+--------------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM GraveNotes WHERE GraveID=685;
+-----+---------+---------------------+---------------------------------------------------------------------------------+
| id  | GraveID | CreateDate          | NoteText                                                                        |
+-----+---------+---------------------+---------------------------------------------------------------------------------+
| 204 |     685 | 2010-08-24 00:00:00 | Obit in 'Calgary Herald' but he is not listed in any of 8 Calgary cemeteries… |
+-----+---------+---------------------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

And with use of SET NAMES:

Code:
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)


mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+--------------------------------------------------------------+
| Variable_name            | Value                                                        |
+--------------------------+--------------------------------------------------------------+
| character_set_client     | utf8                                                         |
| character_set_connection | utf8                                                         |
| character_set_database   | latin1                                                       |
| character_set_filesystem | binary                                                       |
| character_set_results    | utf8                                                         |
| character_set_server     | latin1                                                       |
| character_set_system     | utf8                                                         |
| character_sets_dir       | /usr/local/mysql-5.0.85-osx10.5-x86_64/share/mysql/charsets/ |
+--------------------------+--------------------------------------------------------------+
8 rows in set (0.00 sec)


mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)


mysql> SELECT * FROM GraveNotes WHERE GraveID=685;
+-----+---------+---------------------+-------------------------------------------------------------------------------------+
| id  | GraveID | CreateDate          | NoteText                                                                            |
+-----+---------+---------------------+-------------------------------------------------------------------------------------+
| 204 |     685 | 2010-08-24 00:00:00 | Obit in 'Calgary Herald' but he is not listed in any of 8 Calgary cemeteries… |
+-----+---------+---------------------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Now the MYSQL command line has the same error as the web page!

When I added the SET NAMES 'utf8'; command in the script that loads the data into the database, all seems to work well from that point on for the web site, but I then need that command in my MYSQL command line as well. Too bad there was no way to have the command line pick that attribute up from the database itself.
#5

[eluser]stevezissou[/eluser]
Are you using any character escaping functions anywhere in your server-side code? For instance, are you using htmlentities() and html_entity_decode()? If so, you should explicitly set the charset for these functions:

Code:
htmlentities( $myVar, ENT_QUOTES, 'UTF-8' );
html_entity_decode( $myVar, ENT_QUOTES, 'UTF-8' );
#6

[eluser]Georgi Budinov[/eluser]
Check this one here http://dev.mysql.com/doc/refman/5.0/en/c...abase.html

You can set the default character set connection internal variable to UTF8.

Edit: Also setup your client that you use for the connection to the mysql to show the results as utf8
#7

[eluser]Kenneth Allen[/eluser]
[quote author="stevezissou" date="1282757469"]Are you using any character escaping functions anywhere in your server-side code? For instance, are you using htmlentities() and html_entity_decode()? If so, you should explicitly set the charset for these functions:

Code:
htmlentities( $myVar, ENT_QUOTES, 'UTF-8' );
html_entity_decode( $myVar, ENT_QUOTES, 'UTF-8' );
[/quote]

Nt currently, but I shall keep these in mind.
#8

[eluser]WanWizard[/eluser]
The issue here is that your database (and/or tables) are setup to use latin1. This doesn't say anything about the data in the records, as far as MySQL is concerned that's just a row of bytes.

The data in the records however isn't latin1, it's utf8. This is no problem as long as you connect to the database using the same character set as defined for that database. In that case MySQL knows that no translation is needed, so it sends you the data back as-is, which, as it is utf8, works fine.

Now you connect to the database using CI. CI expects the database to use a utf8 characterset, and configures it as such (in database.php). Now you're telling MySQL that the client connecting uses a different characterset than used to store the data, and it starts converting between the charactersets. Which is the cause of your problem, as converting utf8 text using latin1-to-utf8 conversion rules creates a mess. You get the same from the commandline if you force the connection to be anything else than the database default.

Two possible solutions:
- have CI connect to the database using the characterset the database was setup with, and live with the fact that certain features don't work as expected (for example everything related to collating sequence).
- dump the database, create a new database in utf8, and load the data again. There are several articles on the internet on how to do this.




Theme © iAndrew 2016 - Forum software by © MyBB