Problem retrieving words with accents when they come from MySQL
#1

caperquy
Hello,
I am implementing a CodeIgniter application in which I have to retrieve from a MySQL database all documents that contain a specific word, regardless of its spelling. In my example I am looking for "Eglise", which can be written Eglise, église or Église.
If I run the following code:

Code:
$pattern = "/(e|è|é|ê|ë)gl(i|ì|í|î|ï)s(e|è|é|ê|ë)/i";
$texte = "xxxx églises yyyy Eglise zzzz Église tttt";
// With PREG_OFFSET_CAPTURE each match is stored as [matched text, byte offset]
$nb = preg_match_all($pattern, $texte, $matches, PREG_OFFSET_CAPTURE);
echo "Matches found: $nb <br />";
for ($i = 0; $i < $nb; $i++) {
    echo "Matches[0][$i][0] = " . $matches[0][$i][0] . "<br />";
}

Three matches are found, which means that all three spellings are detected.

However, if the text comes from the MySQL database, the spelling "église" is not found. What could explain this difference?
Many thanks to anyone who can give me a clue.
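A minimal sketch, assuming the PHP source file and the text coming from MySQL are both UTF-8: the `u` modifier makes PCRE read pattern and subject as UTF-8 characters instead of bytes, which lets a character class such as `[eèéêë]` work and lets `/i` fold `É` onto `é`. Without `u`, each accented letter in UTF-8 is a two-byte sequence that PCRE handles byte by byte, so the result depends on which encoding the file and the data happen to use. The function name `count_eglise` is hypothetical.

```php
<?php
// Hypothetical helper: counts spellings of "eglise" in a UTF-8 string.
// The "u" modifier makes PCRE work on UTF-8 characters instead of bytes;
// without it, the multi-byte letters inside [...] would be split into bytes.
function count_eglise(string $texte): int
{
    $pattern = '/[eèéêë]gl[iìíîï]s[eèéêë]/iu';
    $nb = preg_match_all($pattern, $texte, $matches);
    // preg_match_all() returns false on error, e.g. when the subject
    // is not valid UTF-8 while the "u" modifier is set.
    if ($nb === false) {
        throw new RuntimeException('Subject is not valid UTF-8 (or the pattern failed).');
    }
    return $nb;
}

echo count_eglise("xxxx églises yyyy Eglise zzzz Église tttt"), "\n";
```

If the database text is not UTF-8, this version reports an error instead of silently missing matches, which at least makes the encoding mismatch visible.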

CapErquy
#2

xatrix
http://www.phpbuilder.com/board/showthre...t=10344217
#3

caperquy
I looked at the document you mentioned. It seems to me that utf8_general_ci would be appropriate.
Does that mean that in config.php I should write:

Code:
$config['charset'] = "utf8_general_ci";

instead of

Code:
$config['charset'] = "UTF-8";

CapErquy
#4

rogerwaldrup
Check this out: http://forums.mysql.com/read.php?103,392215,392215
#5

caperquy
I checked what you said. I then changed all the fields in the MySQL table I am using, first to utf8_general_ci and then to utf8_unicode_ci.
In both cases it still does not work.
I really do not know what to do.
CapErquy
#6

xatrix
No, no. When you create your database or table (in phpMyAdmin or another tool), you should specify your preferred collation (e.g. utf8_general_ci).
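On the config.php question: as far as I know, in CodeIgniter 2.x `$config['charset']` expects an encoding name such as `UTF-8`, so `utf8_general_ci` (a MySQL collation) does not belong there. The collation goes in application/config/database.php, whose `char_set` and `dbcollat` keys the MySQL driver uses to set the connection character set. A sketch, assuming the stock `default` connection group (yours may be named differently):

```php
<?php
// application/config/config.php: output encoding, given as an encoding name
$config['charset'] = 'UTF-8';

// application/config/database.php: charset and collation of the MySQL connection
// (assumes the stock 'default' connection group)
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
```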

What is the result of the query whose output you're passing to preg_match_all?
#7

Kobus M
One more thing you need to consider is that your document's character set should be exactly the same as your database's character set. In HTML this is declared with a <meta> tag inside the <head></head> section; in PHP it is sent with a call to the header() function.

In HTML it would be something like this:

Code:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

In PHP it would look like this (header() must be called before any output is sent to the browser):

Code:
header('Content-Type: text/html; charset=utf-8');

If the charset of your markup or script does not match your database charset, you will run into problems.

Kobus
#8

caperquy
To answer xatrix's question, this is the text that I pass to the preg_match_all function:

Chaque étape est montrée dans des églises de différents coins de France

The word église I am looking for is really there.
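To see what MySQL actually hands to preg_match_all, it can help to inspect the raw bytes: in UTF-8, `é` is the two bytes `c3 a9`, while in ISO-8859-1 (Latin-1) it is the single byte `e9`. If the string looks right on screen but the pattern still misses it, the two sides may be using different encodings. The sketch below uses a hypothetical helper `to_utf8`, assumes that anything which is not valid UTF-8 is Latin-1, and converts it with iconv() before matching. It works around the mismatch rather than fixing the database or connection charset.

```php
<?php
// Hypothetical helper: returns $s as UTF-8. Assumes that any string which
// is not valid UTF-8 was stored as ISO-8859-1 (Latin-1).
function to_utf8(string $s): string
{
    // An empty pattern with the "u" modifier only matches valid UTF-8;
    // preg_match() returns false for invalid input.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $s);
    return $converted === false ? $s : $converted;
}

$fromDb = "des \xE9glises";           // simulated Latin-1 bytes for "des églises"
echo bin2hex("\xE9"), "\n";           // e9   (Latin-1: one byte)
echo bin2hex(to_utf8("\xE9")), "\n";  // c3a9 (UTF-8: two bytes)
echo preg_match('/[eèéêë]gl[iìíîï]s[eèéêë]/iu', to_utf8($fromDb)), "\n"; // 1
```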

On the other hand, I did what Kobus said.
I recreated my database using the following command:

Code:
CREATE DATABASE `ci_docavy` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

I also set up my HTML code like this:

Code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<?php
header('Content-Type: text/html; charset=utf-8');
?>

Unfortunately, nothing changed.

CapErquy
#9

Kobus M

There are a few final things I can think of for you to try:

1. Check the character set settings in your server configuration files (php.ini, my.cnf, etc.). They should all be set to UTF-8 as well.
2. Even when you recreate your database with UTF-8, individual tables or fields can still be set to ISO-8859-1 or something else. Make sure all of your data is stored as UTF-8 too.
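For point 2, MySQL can convert an existing table, including every text column in it, with a single `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` statement, and `SHOW FULL COLUMNS FROM` a table lists each column's current collation. Below is a minimal sketch of a hypothetical helper that builds the conversion statement; the table name `documents` is only an example. Back up first: if UTF-8 bytes were already stored in latin1 columns, this conversion encodes them a second time.

```php
<?php
// Hypothetical helper: builds the MySQL statement that converts a table and
// every text column in it to the given charset and collation.
function convert_table_sql(string $table, string $charset = 'utf8',
                           string $collation = 'utf8_general_ci'): string
{
    // Backtick-quote the identifier, doubling any embedded backticks.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

echo convert_table_sql('documents'), ";\n";
```

Generating the statement instead of executing it lets you review it in phpMyAdmin or the mysql client before running it.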

If this does not help, I am sorry; I am out of ideas. Having struggled with this myself, I solved my own issues by doing the things I suggested.

Kobus



