Experiencing a problem with UTF-8 and diacritics (characters with accents, etc.)
#1

[eluser]tinawina[/eluser]
Ok -

I have a form that has worked flawlessly for years. Now we are allowing people to add text to our database that might include characters with accents (e.g., ñ instead of n; ó instead of o). In the end, what we want to do is translate these diacritics into HTML character entities -- so, for example, ü gets changed to &uuml;.

For some reason (help) I can't get the input into the system without a weird character translation happening. Here's an example:

Input entered into the form (just words with diacritics included):

Code:
años son sobresalientes sólo existía un puñado

Echoing this form POST data, I get this weird garbled stuff back:

Code:
años son sobresalientes sólo existía un puñado

Here's some info about my form file:
Quote:The header includes
Code:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

The form tag looks like this:
Code:
<form id="add_title" action="/searchTitles" method="post" accept-charset="utf-8">

The input element:
Code:
<input type="text" name="title" id="title" accept-charset="utf-8" size="75" value="" />

Ultimately I would use a small script that checks for diacritics and changes them on the fly -- what shows below has been tested and works great.

Code:
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ");
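$replace = explode(",","&Agrave;,&Egrave;,&Igrave;,&Ograve;,&Ugrave;,&agrave;,&egrave;,&igrave;,&ograve;,&ugrave;,&Aacute;,&Eacute;,&Iacute;,&Oacute;,&Uacute;,&Yacute;,&aacute;,&eacute;,&iacute;,&oacute;,&uacute;,&yacute;,&Acirc;,&Ecirc;,&Icirc;,&Ocirc;,&Ucirc;,&acirc;,&ecirc;,&icirc;,&ocirc;,&ucirc;,&Atilde;,&Ntilde;,&Otilde;,&atilde;,&ntilde;,&otilde;,&Auml;,&Euml;,&Iuml;,&Ouml;,&Uuml;,&Yuml;,&auml;,&euml;,&iuml;,&ouml;,&uuml;,&yuml;");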
$new_input = str_replace($search, $replace, $string);

But my point is something is happening before I can even get my "hands" on this form input to change an accented letter because it is coming to me in the POST data as garbled up.

Any help, insight, ideas - appreciated!
#2

[eluser]Rok Biderman[/eluser]
You didn't post your whole view, so it's impossible to know exactly what could be affecting your encoding. It might only be your browser. When I tried this with your input, the echo displayed perfectly.

Code:
public function index()
{
  $this->load->helper('form');
  if ($this->input->post()) {
   echo $this->input->post('something');
  }
  $this->load->view('test');
}

and the view test.php

Code:
<!DOCTYPE HTML>
<html lang="es-ES">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
<?php echo form_open(''); ?>
<?php echo form_input('something'); ?>
<?php echo form_submit('submit', 'Submit'); ?>
<?php echo form_close(); ?>
</body>
</html>
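
To tell whether the POST data is already damaged or is only being displayed with the wrong charset, it can help to look at the raw bytes rather than at the rendered text. The following is a minimal sketch, not something from the posts above: `describe_bytes()` is a hypothetical helper name, and it assumes the mbstring extension is available. In the controller it would be called with the POST value, for example `describe_bytes($this->input->post('something'))`.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): report the raw bytes of a string.
function describe_bytes(string $value): array
{
    return [
        'hex'        => bin2hex($value),                    // raw bytes, two hex digits each
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'), // true if the bytes form valid UTF-8
    ];
}

// "año" encoded as UTF-8: ñ is the two bytes C3 B1.
print_r(describe_bytes("a\xC3\xB1o"));

// "año" encoded as ISO-8859-1: ñ is the single byte F1, which is not valid UTF-8 here.
print_r(describe_bytes("a\xF1o"));
```

If the hex dump shows `c3b1` for every ñ, the input arrived as intact UTF-8 and the problem is most likely on the output side.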
#3

[eluser]tinawina[/eluser]
Hi and thanks for your reply. I've continued to try to fix this problem. To simplify things I created a single text input form that you can see here: http://www.issuelab.org/lb_test. Here's the contents of the controller:

This sets up the form:

Code:
function index()
{
  $rules['title']  = "required";
  $fields['title']  = 'Title';

  $this->validation->set_rules($rules);
  $this->validation->set_fields($fields);

  $this->validation->set_error_delimiters('<span class="error">', '</span>');

  if ($this->validation->run() == false)
  {
   echo $this->the_form();
  }
  else
  {
   echo $this->input->post('title');
  }
}

This sets up the view which I simply echo to the screen:

Code:
function the_form()
{
  $string = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
  <form action="/lb_test/index" method="post" accept-charset="utf-8">
  <p>
   Title: <input type="text" name="title" id="title" accept-charset="utf-8" size="75" value="" />
  </p>
  <p>
   <input type="submit" name="submit" value="Search for duplicates" />
  </p>
  </form>
  </body>
  </html>';
  return $string;
}

When I input "años son sobresalientes sólo existía un puñado" into the input box and click submit, I get back "años son sobresalientes sólo existía un puñado".

Any ideas about why this is - most appreciated!
#4

[eluser]Rok Biderman[/eluser]
You're declaring the charset only when validation fails. If you check the source of the page you get after a successful submit, you'll find it contains only the echoed string, so no encoding is set.

This works, but is fugly:

Code:
function index()
{
  $this->load->library('form_validation');
  $this->form_validation->set_rules('title', 'Title', 'required');

  if ($this->form_validation->run() == false)
  {
    echo $this->the_form(null);
  }
  else
  {
   echo $this->the_form($this->input->post('title'));
  }
}

function the_form($var = null)
{
  if (!$var) {
      $var = '<form action="krneki" method="post" accept-charset="utf-8"><p>Title: <input type="text" name="title" id="title" accept-charset="utf-8" size="75" value="" /></p><p><input type="submit" name="submit" value="Search for duplicates" /></p></form>';
  }
  $string = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html>
   <head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   </head>
   <body>'.$var.
   '</body>
   </html>';
   return $string;
}

Just so you know, a kitten dies every time you write your html in an echo. Says so here. Put it in your view instead.
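
The point about the charset being declared on only one code path can also be handled by routing every response through one place that declares UTF-8. This is a rough sketch under stated assumptions, not code from the posts: `render_page()` is a hypothetical helper, the literal `$title` stands in for `$this->input->post('title')`, and the plain PHP `header()` call is one possible way to declare the charset at the HTTP level in addition to the meta tag.

```php
<?php
// Hypothetical helper: wrap a fragment in a minimal page that declares UTF-8.
function render_page(string $body): string
{
    return '<!DOCTYPE html>' . "\n"
        . '<html><head><meta charset="utf-8"><title>Test</title></head>' . "\n"
        . '<body>' . $body . '</body></html>';
}

// Declare the charset in the HTTP header as well, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// htmlspecialchars() escapes <, >, & and quotes but leaves accented letters untouched.
$title = "a\xC3\xB1os"; // stand-in for $this->input->post('title'); "años" as UTF-8 bytes
echo render_page(htmlspecialchars($title, ENT_QUOTES, 'UTF-8'));
```

With this shape, both the empty form and the echoed result would go through `render_page()`, so neither can be sent without a charset declaration.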
#5

[eluser]tinawina[/eluser]
Hi Coccodrillo - thanks for responding again. I see your point and changed my code so that the form input echoes to the screen with the appropriate character set info. (I'm only echoing from the controller because I'm testing, never in the actual production site -- no kittens are harmed during testing!) A breakdown of how this panned out in my testing is below. First - the code change:

Code:
echo
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
' . $this->input->post('title') . '
</body>
</html>';

I am checking my output in Chrome, Firefox, and IE, on Windows XP and Ubuntu Linux. Here's a breakdown of my testing -- an "ok" means the input echoed to the screen with diacritics in place; "no" means I got something garbled. I don't have IE on my Linux machine so no testing for that. Here are the results:

Code:
------------With proper HTML---|---Without proper HTML------
             Windows | Linux   |     Windows | Linux
                     |         |             |
Chrome           ok  |  ok     |        no   |   no
Firefox          ok  |  ok     |        ok   |   ok
IE               ok  |  --     |        no   |   --

So this change does what I want -- input goes in with diacritics and comes out with diacritics intact. Perfect. HOWEVER: When I try to do something with this input other than echo it to the screen, I'm back to square one.

I just tried to translate the diacritics in my string to HTML characters, and then echo to screen. Here's the code:

Code:
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ");
$replace = explode(",","&Agrave;,&Egrave;,&Igrave;,&Ograve;,&Ugrave;,&agrave;,&egrave;,&igrave;,&ograve;,&ugrave;,&Aacute;,&Eacute;,&Iacute;,&Oacute;,&Uacute;,&Yacute;,&aacute;,&eacute;,&iacute;,&oacute;,&uacute;,&yacute;,&Acirc;,&Ecirc;,&Icirc;,&Ocirc;,&Ucirc;,&acirc;,&ecirc;,&icirc;,&ocirc;,&ucirc;,&Atilde;,&Ntilde;,&Otilde;,&atilde;,&ntilde;,&otilde;,&Auml;,&Euml;,&Iuml;,&Ouml;,&Uuml;,&Yuml;,&auml;,&euml;,&iuml;,&ouml;,&uuml;,&yuml;");
$new_input = str_replace($search, $replace, $this->input->post('title'));
  
echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
' . $new_input . '
</body>
</html>';

I entered "años son sobresalientes sólo existía un puñado" into the form and this is what I see on my screen:

Code:
a�os son sobresalientes s�lo exist�a un pu�ado

If I simply echo $new_input to the screen without any HTML directives I get the same as I would with directives in place -- garbled text.

When all of this is said and done, what I need to be able to do is 1) accept a text string that might include diacritics, 2) translate any diacritics in the string into HTML entities, and 3) store the string in my database. I don't get why echoing to the screen gives me the right output, but the same form input can't be manipulated.

Any other ideas you might have are appreciated! Thanks for reading all of this!
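
For step 2 in the list above (translating diacritics into HTML entities), PHP's built-in `htmlentities()` can do the conversion when it is told the input is UTF-8, without a hand-maintained search/replace list. This is only a sketch of that alternative, not what the poster used: `to_html_entities()` is a hypothetical wrapper name, and the test string is written with byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// Hypothetical wrapper: convert every character that has a named HTML entity,
// treating the input as UTF-8.
function to_html_entities(string $utf8): string
{
    return htmlentities($utf8, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// "años son sobresalientes sólo existía un puñado" as UTF-8 bytes.
$input = "a\xC3\xB1os son sobresalientes s\xC3\xB3lo exist\xC3\xADa un pu\xC3\xB1ado";

echo to_html_entities($input), "\n";
// prints: a&ntilde;os son sobresalientes s&oacute;lo exist&iacute;a un pu&ntilde;ado
```

Note that `htmlentities()` also escapes `<`, `>`, `&` and quotes, so it is not a drop-in replacement if the stored text is meant to contain markup.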

#6

[eluser]tinawina[/eluser]
Hold the phone - I got it to work. I ran utf8_decode() on the input and then ran it through my diacritics clean up script and it's doing what I need it to do:

Code:
$input = utf8_decode($this->input->post('title')); // Not sure why I would need to decode here, but this fixes it!
  
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ");
$replace = explode(",","&Agrave;,&Egrave;,&Igrave;,&Ograve;,&Ugrave;,&agrave;,&egrave;,&igrave;,&ograve;,&ugrave;,&Aacute;,&Eacute;,&Iacute;,&Oacute;,&Uacute;,&Yacute;,&aacute;,&eacute;,&iacute;,&oacute;,&uacute;,&yacute;,&Acirc;,&Ecirc;,&Icirc;,&Ocirc;,&Ucirc;,&acirc;,&ecirc;,&icirc;,&ocirc;,&ucirc;,&Atilde;,&Ntilde;,&Otilde;,&atilde;,&ntilde;,&otilde;,&Auml;,&Euml;,&Iuml;,&Ouml;,&Uuml;,&Yuml;,&auml;,&euml;,&iuml;,&ouml;,&uuml;,&yuml;");
$new_input = str_replace($search, $replace, $input);
  
echo  $new_input; // $this->input->post('title')

I input "años son sobresalientes sólo existía un puñado" -- IN IE -- and got back this as the source code, which is correct:

Code:
a&ntilde;os son sobresalientes s&oacute;lo exist&iacute;a un pu&ntilde;ado

So - I guess that does it. Hopefully!
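
One possible explanation for why `utf8_decode()` helps, offered as an assumption rather than a confirmed diagnosis: if the PHP file holding the `$search` list is saved as ISO-8859-1, each accented letter in it is a single byte, while the form submits UTF-8, where the same letter is two bytes. `str_replace()` compares bytes, so nothing lines up until the input is converted to the file's encoding. The sketch below uses byte escapes to make that visible. It assumes the mbstring extension and uses `mb_convert_encoding()` because `utf8_decode()` is deprecated as of PHP 8.2.

```php
<?php
$utf8Input = "a\xC3\xB1os"; // "años" as UTF-8 bytes, as a UTF-8 form would submit it

// Assumption: the script file is saved as ISO-8859-1, so a literal ñ in it is the single byte F1.
$search  = ["\xF1"];
$replace = ['&ntilde;'];

// No match: the input has the bytes C3 B1 where the search list has F1.
$unchanged = str_replace($search, $replace, $utf8Input);

// Converting the input to ISO-8859-1 first makes the bytes line up,
// which is the effect utf8_decode() has in the post above.
$latin1   = mb_convert_encoding($utf8Input, 'ISO-8859-1', 'UTF-8');
$replaced = str_replace($search, $replace, $latin1);

echo bin2hex($unchanged), "\n"; // same bytes as the input
echo $replaced, "\n";           // entity form of the string
```

A caveat with this route: ISO-8859-1 has no Ÿ, so that character cannot survive the conversion. Saving the script file as UTF-8 so the literals in `$search` match the input directly would avoid the conversion step altogether.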



