CodeIgniter Forums
utf-8 character count help - Printable Version


utf-8 character count help - El Forum - 02-28-2009

[eluser]Bahodir[/eluser]
Hi,

Is there a built-in function for counting UTF-8 characters?
I want to limit long lines to a specific length using character_limiter(), but PHP is not giving me the proper length of a string.
For example, the following code should output 'при...', but it's echoing 'привет...'
Code:
<?php
$str = "привет";
echo character_limiter($str, 3);

Any help would be awesome.


utf-8 character count help - El Forum - 02-28-2009

[eluser]pistolPete[/eluser]
Try the mb_strlen function:

http://www.php.net/mb_strlen


utf-8 character count help - El Forum - 02-28-2009

[eluser]Bahodir[/eluser]
I tried. It doesn't work.


utf-8 character count help - El Forum - 02-28-2009

[eluser]pistolPete[/eluser]
What's your PHP internal encoding setting?
Try:
Code:
mb_strlen($string,'UTF-8');

If that doesn't work, try this:
Code:
$strlen = preg_match_all("/.{1}/us",$utf8string,$dummy);
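
To see why the byte count and the character count differ, here is a minimal sketch comparing plain strlen() with the two approaches above. The helper name count_three_ways() is hypothetical and only for illustration; the sketch assumes the mbstring extension is loaded and that the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or CodeIgniter.
function count_three_ways($str)
{
    return array(
        // strlen() counts bytes, so each Cyrillic letter counts twice
        'bytes' => strlen($str),
        // mb_strlen() with an explicit encoding counts characters
        'mb'    => mb_strlen($str, 'UTF-8'),
        // with the "u" modifier, "." matches one whole UTF-8 character
        'preg'  => preg_match_all('/./us', $str, $dummy),
    );
}

print_r(count_three_ways("привет"));
// bytes => 12, mb => 6, preg => 6
```

Passing 'UTF-8' explicitly to mb_strlen() avoids depending on whatever the internal encoding happens to be on the server.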



utf-8 character count help - El Forum - 02-28-2009

[eluser]Bahodir[/eluser]
pistolPete,
thank you for your help.
This code you gave me correctly counts the number of characters.
Code:
mb_strlen($string,'UTF-8');

But how do I trim my string using character_limiter()?
I tried this:
Code:
$str = "привет";
echo character_limiter(utf8_decode($str), 3);

I think it is giving me the correct length, except the decoded characters show up as ????...

Now how can I show the correct characters?


utf-8 character count help - El Forum - 02-28-2009

[eluser]Bahodir[/eluser]
[quote author="pistolPete" date="1235886037"]What's your php internal encoding setting?
[/quote]

Oh, I think it is utf-8


utf-8 character count help - El Forum - 02-28-2009

[eluser]pistolPete[/eluser]
[quote author="Bahodir" date="1235891451"]Oh, I think it is utf-8[/quote]

You can check it using this function:
Code:
/* Display current internal character encoding */
echo mb_internal_encoding();

I modified the helper to work with utf8 strings:
Code:
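function character_limiter($str, $n = 500, $end_char = '…')
{
    // Make the mb_* functions below treat strings as UTF-8.
    mb_internal_encoding('UTF-8');
    mb_regex_encoding('UTF-8');

    // Short enough already: return unchanged.
    if (mb_strlen($str) < $n)
    {
        return $str;
    }

    // Collapse line breaks and whitespace runs into single spaces.
    // The "u" modifier makes PCRE read the subject as UTF-8.
    $str = preg_replace("/\s+/u", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
    if (mb_strlen($str) <= $n)
    {
        return $str;
    }

    // Append whole words until the limit is reached.
    $out = '';
    $split_str = mb_split(' ', trim($str));

    foreach ($split_str as $val)
    {
        $out .= $val.' ';

        if (mb_strlen($out) >= $n)
        {
            $out = trim($out);
            // Add the end character only if something was actually cut off.
            return (mb_strlen($out) == mb_strlen($str)) ? $out : $out.$end_char;
        }
    }

    // Not normally reached; guarantees a string is always returned.
    return trim($out);
}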

Have a look at "Extending" Helpers.
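
One thing to keep in mind: character_limiter() keeps whole words, so a single word like "привет" comes back uncut whatever $n is. Also, utf8_decode() converts to ISO-8859-1, which has no Cyrillic letters, and that is why they showed up as "?". If a hard cut at exactly $n characters is what's wanted, something along these lines might do. This is only a sketch: the function name utf8_truncate() is made up for this example, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper; utf8_truncate() is not a CodeIgniter or PHP function.
function utf8_truncate($str, $n, $end_char = '...')
{
    // Nothing to cut if the string already fits.
    if (mb_strlen($str, 'UTF-8') <= $n)
    {
        return $str;
    }

    // mb_substr() counts characters, not bytes, so no letter gets split in half.
    return mb_substr($str, 0, $n, 'UTF-8').$end_char;
}

echo utf8_truncate("привет", 3); // при...
```

Unlike character_limiter(), this cuts in the middle of words, so it fits best for short labels rather than running text.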


utf-8 character count help - El Forum - 03-03-2009

[eluser]Bahodir[/eluser]
Thank you once more.

I haven't checked it yet, but I hope it works.