Experiencing a problem with UTF-8 and diacritics (characters with accents, etc.)
tinawina
OK, I have a form that has worked flawlessly for years. Now we are allowing people to add text to our database that might include characters with accents (e.g., ñ instead of n; ó instead of o). In the end, what we want to do is translate these diacritics into HTML entities, so that, for example, ü gets changed to &uuml;. For some reason (help!) I can't get the input into the system without a weird character translation happening. Here's an example.

Input entered into the form (just words with diacritics included):

Code:
años son sobresalientes sólo existía un puñado

Echoing this form POST data, I get this weird garbled stuff back:

Code:
aÃ±os son sobresalientes sÃ³lo existÃa un puÃ±ado

Here's some info about my form file:

Quote: The header includes

Ultimately I would use a small script that checks for diacritics and changes them on the fly; what's shown below has been tested and works great.

Code:
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ");

But my point is that something is happening before I can even get my "hands" on this form input to change an accented letter, because it is already garbled when it arrives in the POST data. Any help, insight, or ideas appreciated!
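The garbled output is the classic signature of UTF-8 bytes being read with a single-byte encoding such as ISO-8859-1 (Latin-1). A short Python sketch (not PHP or CodeIgniter; the helper name `as_misread_utf8` is hypothetical) reproduces the effect:

```python
def as_misread_utf8(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

original = "años son sobresalientes sólo existía un puñado"

# Each accented letter becomes two bytes in UTF-8...
print("ñ".encode("utf-8"))  # b'\xc3\xb1'

# ...and a Latin-1 reader turns those two bytes into two characters.
garbled = as_misread_utf8(original)
print(garbled)

# The damage is reversible as long as no bytes were altered in between.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == original)  # True
```

"í" is the bytes 0xC3 0xAD, and 0xAD is an invisible soft hyphen in Latin-1, which is why "existía" shows up as "existÃa" with no visible second character.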
Rok Biderman
You didn't post your whole view, so it's impossible to know exactly what could be affecting your encoding. It might just be your browser. When I tried this with your input, the echo displayed perfectly.

Code:
public function index()

and the view test.php:

Code:
<!DOCTYPE HTML>
tinawina
Hi, and thanks for your reply. I've continued trying to fix this problem. To simplify things, I created a single text-input form that you can see here: http://www.issuelab.org/lb_test. Here are the contents of the controller.

This sets up the form:

Code:
function index()

This sets up the view, which I simply echo to the screen:

Code:
function the_form()

When I input "años son sobresalientes sólo existía un puñado" into the input box and click submit, I get back "aÃ±os son sobresalientes sÃ³lo existÃa un puÃ±ado". Any ideas about why this is? Most appreciated!
Rok Biderman
You're only presenting the charset when validation == FALSE. If you check the page source after submitting, you'll find you only get the echoed string; no encoding is set. This works, but is fugly:

Code:
function index()

Just so you know, a kitten dies every time you write your HTML in an echo. Says so here. Put it in your view instead.
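The point is that the charset declaration has to reach the browser on every code path, including the one that echoes the submitted value. Sketched in Python rather than CodeIgniter (the names `render_page`, `handle_request`, and `FORM_HTML` are hypothetical), the idea is to route both branches through one page wrapper that declares UTF-8 in the HTTP header and in a `<meta>` tag:

```python
import html
from typing import Dict, Optional, Tuple

Response = Tuple[Dict[str, str], bytes]

def render_page(body_html: str) -> Response:
    """Wrap a body fragment in a full page that always declares UTF-8."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Test</title></head>\n'
        f"<body>{body_html}</body></html>\n"
    )
    # Declare the charset in the HTTP header too, so the browser never guesses.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, page.encode("utf-8")

FORM_HTML = (
    '<form method="post">'
    '<input type="text" name="title">'
    '<input type="submit" value="Submit">'
    "</form>"
)

def handle_request(submitted: Optional[str]) -> Response:
    """Hypothetical controller: both branches go through render_page."""
    if submitted is None:
        # First visit or failed validation: show the form.
        return render_page(FORM_HTML)
    # Validation passed: echo the (escaped) input inside the same wrapper.
    return render_page(f"<p>{html.escape(submitted)}</p>")

headers, body = handle_request("años")
print(headers["Content-Type"])  # text/html; charset=utf-8
```

Keeping the markup in one wrapper, rather than echoing it from each branch, is also what moving it into the view achieves.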
tinawina
Hi Coccodrillo, thanks for responding again. I see your point and changed my code so that the form input echoes to the screen with the appropriate character set info. (I'm only echoing from the controller because I'm testing, never on the actual production site; no kittens are harmed during testing!) I have a breakdown of how this panned out in my testing below. First, the code change:

Code:
echo

I am checking my output in Chrome, Firefox, and IE, on Windows XP and Ubuntu Linux. In the breakdown below, "ok" means the input echoed to the screen with diacritics in place; "no" means I got something garbled. I don't have IE on my Linux machine, so there's no testing for that. Here are the results:

Code:
------------With proper HTML---|---Without proper HTML------

So this change does what I want: input goes in with diacritics and comes out with diacritics intact. Perfect.

HOWEVER: when I try to do something with this input other than echo it to the screen, I'm back to square one. I just tried to translate the diacritics in my string to HTML entities and then echo the result to the screen. Here's the code:

Code:
$search = explode(",","À,È,Ì,Ò,Ù,à,è,ì,ò,ù,Á,É,Í,Ó,Ú,Ý,á,é,í,ó,ú,ý,Â,Ê,Î,Ô,Û,â,ê,î,ô,û,Ã,Ñ,Õ,ã,ñ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,ä,ë,ï,ö,ü,ÿ");

I entered "años son sobresalientes sólo existía un puñado" into the form, and this is what I see on my screen:

Code:
aÃ�os son sobresalientes sÃ�lo existÃ�a un puÃ�ado

If I simply echo $new_input to the screen without any HTML directives, I get the same result as with the directives in place: garbled text.

When all of this is said and done, what I need to be able to do is 1) accept a text string that might include diacritics, 2) translate any diacritics in the string into HTML entities, and 3) store the string in my database. I don't get why echoing to the screen gives me the right output, but the same form input can't be manipulated. Any other ideas you might have are appreciated! Thanks for reading all of this!
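One plausible explanation, and it is only an assumption since the thread never says how the controller file was saved, is that the file holding `$search` was saved as ISO-8859-1 rather than UTF-8. In that case the search string "Ã" is the single byte 0xC3, which is also the lead byte of every UTF-8 accented letter in the input. PHP's `str_replace` works on bytes, so it swaps that lead byte for an entity and strands the second byte, which a UTF-8 page renders as �. This Python sketch models that byte-wise replacement (`SEARCH`, `REPLACE`, and `byte_replace` are hypothetical, and the lists are a small subset of the original):

```python
import html

# A few pairs from the original search list, plus their intended entities.
SEARCH = ["Ã", "ñ", "ó", "í"]
REPLACE = ["&Atilde;", "&ntilde;", "&oacute;", "&iacute;"]

def byte_replace(subject: bytes, search_enc: str) -> bytes:
    """Apply each search -> replace pair in order, byte-wise, like
    str_replace with arrays, encoding the search strings with search_enc."""
    for s, r in zip(SEARCH, REPLACE):
        subject = subject.replace(s.encode(search_enc), r.encode("ascii"))
    return subject

# What the browser posts for "años": b'a\xc3\xb1os'.
utf8_input = "años".encode("utf-8")

# Latin-1 search strings: "Ã" (0xC3) matches the lead byte of "ñ".
broken = byte_replace(utf8_input, "latin-1")
print(broken)  # b'a&Atilde;\xb1os'

# A browser expands the entity and shows the stray 0xB1 byte as U+FFFD.
print(html.unescape(broken.decode("utf-8", errors="replace")))  # aÃ�os

# Converting the input to Latin-1 first (what utf8_decode() does) lines up
# the byte patterns, so "ñ" (0xF1) is replaced as intended.
fixed = byte_replace("años".encode("latin-1"), "latin-1")
print(fixed)  # b'a&ntilde;os'
```

This would also explain why plain echoing looked fine: no bytes were touched until the replacement ran.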
tinawina
Hold the phone, I got it to work. I ran utf8_decode() on the input, then ran it through my diacritics clean-up script, and it's doing what I need it to do:

Code:
$input = utf8_decode($this->input->post('title')); // Not sure why I would need to decode here, but this fixes it!

I input "años son sobresalientes sólo existía un puñado" (in IE) and got back this as the source code, which is correct:

Code:
a&ntilde;os son sobresalientes s&oacute;lo exist&iacute;a un pu&ntilde;ado

So I guess that does it. Hopefully!
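utf8_decode() works here because it converts the UTF-8 input to ISO-8859-1, presumably the same single-byte form the search strings are in. It has limits, though: characters outside Latin-1 come back as "?", and the function is deprecated as of PHP 8.2. An alternative that sidesteps source-file encodings is to decode the input to text once and convert code points, not bytes, into entities. A minimal Python sketch (the helper `to_html_entities` is hypothetical) using the standard-library `html.entities.codepoint2name` table:

```python
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Named entities (e.g. &ntilde;) are used where HTML 4 defines one;
    anything else falls back to a numeric reference such as &#337;.
    ASCII, including &, < and >, is left untouched.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")
        else:
            parts.append(f"&#{cp};")
    return "".join(parts)

# Decode the raw POST bytes once, then work on characters rather than bytes.
raw_post = "años son sobresalientes sólo existía un puñado".encode("utf-8")
print(to_html_entities(raw_post.decode("utf-8")))
# a&ntilde;os son sobresalientes s&oacute;lo exist&iacute;a un pu&ntilde;ado
```

In PHP itself, htmlentities() with its encoding argument set to 'UTF-8' performs a comparable conversion on the undecoded input, though it also escapes &, <, > and, depending on the flags, quotes.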