CodeIgniter Forums
Tutorial - encoding issue - Printable Version




Tutorial - encoding issue - Mokhona - 10-24-2017

Hello

I'm new to CodeIgniter and I worked through the tutorial section on creating a news application.
I ran into an issue at the end with the slug encoding.
If I write words with accents (e.g. é à è), the accents are replaced with ??? in the slug field in my database, whereas they are displayed correctly in the other fields (title and text). I looked at url_title but it has no encoding parameter.

[Image: 1508848542-issue.jpg]

In application/config/database.php, char_set is utf8 and dbcollat is utf8_general_ci


Thanks for your help :)


RE: Tutorial - encoding issue - ivantcholakov - 10-24-2017

I think this line is the cause: https://github.com/bcit-ci/CodeIgniter/blob/3.1.6/system/helpers/url_helper.php#L508


RE: Tutorial - encoding issue - PaulD - 10-24-2017

I think ivantcholakov is right. A quick look at php.net suggests this:

PHP Code:
$lower_case_str = mb_strtolower($str, 'UTF-8');

http://php.net/manual/en/function.mb-strtolower.php
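
As a minimal sketch of that suggestion (assuming the file is saved as UTF-8 and the mbstring extension is loaded; `demo_lower` is a hypothetical helper name, not something from CodeIgniter), the multibyte-aware call lowercases accented capitals and still returns valid UTF-8:

```php
<?php
// Sketch only: demo_lower() is a hypothetical wrapper, not a CodeIgniter function.
// Assumes the mbstring extension is available and the input is UTF-8.
function demo_lower($str)
{
    return mb_strtolower($str, 'UTF-8');
}

$lower = demo_lower('ÉTÉ À PARIS');

echo $lower, "\n";                              // été à paris
var_dump(mb_check_encoding($lower, 'UTF-8'));   // bool(true)
```

What plain strtolower does with the same bytes depends on the PHP version and locale, which is why it is not a safe choice for UTF-8 input.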

There are also other approaches in the strtolower user notes, such as this one:

PHP Code:
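<?php
// Lowercases via a round trip through ISO-8859-1.
// Note: utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.
function strtolower_utf8($inputString) {
    $outputString = utf8_decode($inputString);   // UTF-8 -> ISO-8859-1
    $outputString = strtolower($outputString);
    $outputString = utf8_encode($outputString);  // ISO-8859-1 -> UTF-8
    return $outputString;
}
?>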

Paul.


RE: Tutorial - encoding issue - jreklund - 10-24-2017

Personally, I run the string through convert_accented_characters before passing it to url_title, so that you get e a e instead of é à è, which gives cleaner URLs. The previous posts are correct: strtolower isn't UTF-8 safe.

You can add this to application/helpers/url_helper.php; I have replaced the call for you.
https://pastebin.com/3dhY5SmY
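
To illustrate the idea without the framework, here is a rough standalone sketch. `demo_strip_accents` and `demo_slug` are hypothetical names, and the tiny map only stands in for the much larger table that convert_accented_characters uses. Inside CodeIgniter 3 the equivalent call would be something like `url_title(convert_accented_characters($title), 'dash', TRUE)` with the text and url helpers loaded.

```php
<?php
// Hypothetical stand-in for convert_accented_characters(): a tiny map for the demo,
// not CodeIgniter's actual transliteration table.
function demo_strip_accents($str)
{
    $map = array(
        'é' => 'e', 'è' => 'e', 'à' => 'a',
        'É' => 'E', 'È' => 'E', 'À' => 'A',
    );
    return strtr($str, $map);
}

// Hypothetical slug builder: transliterate first, then lowercase and filter.
function demo_slug($title)
{
    $ascii = demo_strip_accents($title);
    $slug  = strtolower($ascii);                         // input is plain ASCII at this point
    $slug  = preg_replace('/[^a-z0-9 _-]/', '', $slug);  // drop anything else
    $slug  = preg_replace('/\s+/', '-', trim($slug));    // whitespace runs become one dash
    return trim($slug, '-');
}

echo demo_slug('Été à la mer'), "\n"; // ete-a-la-mer
```

Because the accents are converted before lowercasing, strtolower only ever sees ASCII and the encoding problem does not come up.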

@PaulD: Your second snippet will replace every character that does not exist in ISO-8859-1 with ?, so I wouldn't recommend it.
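
A small sketch of that data loss, using mb_convert_encoding as a stand-in for utf8_decode/utf8_encode (my substitution, since those two are deprecated in recent PHP versions). `demo_latin1_round_trip` is a hypothetical name, and the ? result assumes the default mbstring substitute character:

```php
<?php
// Sketch: UTF-8 -> ISO-8859-1 -> UTF-8 round trip.
// demo_latin1_round_trip() is a hypothetical helper; mb_convert_encoding()
// is used here in place of utf8_decode()/utf8_encode().
function demo_latin1_round_trip($str)
{
    $latin1 = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

echo demo_latin1_round_trip('é'), "\n"; // é  (exists in ISO-8859-1, so it survives)
echo demo_latin1_round_trip('€'), "\n"; // ?  (not in ISO-8859-1, replaced on the way out)
```

So French accents would survive this approach, but anything outside Latin-1 (the euro sign, Greek, Cyrillic, and so on) would not.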

I just took a quick look at CI 2.2.6 and compared it with 3.1.6. In the old version all illegal characters were removed, but they are now kept, so it now generates illegal URLs, in my opinion.

PHP Code:
// CodeIgniter 2.2.6
$trans = array(
    '&.+?;'                 => '',
    '[^a-z0-9 _-]'          => '',
    '\s+'                   => $separator,
    '('.$q_separator.')+'   => $separator
);

PHP Code:
// CodeIgniter 3.1.6
$trans = array(
    '&.+?;'                  => '',
    '[^\w\d _-]'             => '',
    '\s+'                    => $separator,
    '('.$q_separator.')+'    => $separator
);

https://github.com/bcit-ci/CodeIgniter/commit/6f371aeb25ad3b8b2934401661632aec468540f1

This is referenced here:
https://github.com/bcit-ci/CodeIgniter/issues/4993

Personally, I think it should be changed back to the following, if everything other than a-z, 0-9, space, underscore and hyphen is considered illegal. Any thoughts on this?
PHP Code:
'[^a-z0-9 _-]'          => ''
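
To make the difference between the two character classes concrete, here is a minimal sketch that applies each one on its own. It assumes the pattern runs with the i and u modifiers (as I believe url_title does when UTF-8 is enabled) and that PCRE was built with Unicode property support. `demo_filter` is a hypothetical helper, not part of CodeIgniter.

```php
<?php
// Sketch: strip everything matched by a character class, using the i and u modifiers.
// demo_filter() is a hypothetical helper for this comparison only.
function demo_filter($class, $str)
{
    return preg_replace('#' . $class . '#iu', '', $str);
}

$title = 'café au lait';

echo demo_filter('[^a-z0-9 _-]', $title), "\n"; // caf au lait   (é removed)
echo demo_filter('[^\w\d _-]', $title), "\n";   // café au lait  (é kept, \w is Unicode-aware here)
```

With the older class the accented letter is simply dropped; with the newer one it stays in the string and then reaches strtolower.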



RE: Tutorial - encoding issue - Mokhona - 10-25-2017

Hello,

Thanks for your replies.
I will use convert_accented_characters before calling url_title, to get cleaner URLs without accents.


RE: Tutorial - encoding issue - InsiteFX - 10-25-2017

Read this article on supporting the full Unicode character set.

How to support full Unicode in MySQL databases
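
In CodeIgniter terms, following that advice would roughly mean changing the two settings mentioned in the first post. This is only a sketch of the config fragment: it assumes a MySQL driver, a server version that supports utf8mb4 (5.5.3 or later), and tables and columns that have already been converted. It would not by itself fix the slug problem discussed above.

```php
<?php
// application/config/database.php (fragment; only the two relevant keys are shown)
// Assumes the database, tables and columns already use utf8mb4.
$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';
```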