
09-14-2010, 11:44 PM

[eluser]DimCI[/eluser]
I found this issue a day ago and couldn't find any information about it on this forum.

I have a field in one of my MySQL tables that holds a web page's TITLE element, for example. Its length is 70 chars. The website under construction is in Russian.

I found that when a page's title element contains only Cyrillic letters, half of them are lost after submitting the add/edit form. If we enter only Latin letters, it's OK. If we use both, the field content is shortened by half of the Cyrillic letters, e.g. with 20 Latin chars and 20 Cyrillic chars, the result is 30 chars.

I found that this is due to the db_clean() call in the corresponding model. When I removed db_clean() (using plain $_POST['title'] instead of db_clean($_POST['title'],70)), the bug disappeared. Setting the global xss_clean to true or false doesn't seem to have any influence on this issue.

Maybe this has to do with the xss_clean regular expressions, imho. I guess some locale settings need to be corrected?

09-15-2010, 12:35 AM

[eluser]WanWizard[/eluser]
Looks like your PHP installation doesn't have multibyte support installed, needed for proper utf-8 handling. Make sure the php package mbstring is installed.

09-15-2010, 01:44 AM

[eluser]DimCI[/eluser]
[quote author="WanWizard" date="1284550540"]Looks like your PHP installation doesn't have multibyte support installed, needed for proper utf-8 handling. Make sure the php package mbstring is installed.[/quote]

Thank you, but no: mbstring is enabled.

And there are some differences between my home server and the ISP one: the ISP installation accepts Ukrainian text, but the home PC server doesn't accept either language, neither Russian nor Ukrainian.

-----------------
home: PHP Version 5.3.2-1ubuntu4.2

mbstring
Multibyte Support enabled
Multibyte string engine libmbfl
HTTP input encoding translation disabled

mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.

Multibyte (japanese) regex support enabled
Multibyte regex (oniguruma) backtrack check On
Multibyte regex (oniguruma) version 4.7.1

Directive Local Value Master Value
mbstring.detect_order no value no value
mbstring.encoding_translation Off Off
mbstring.func_overload 0 0
mbstring.http_input pass pass
mbstring.http_output pass pass
mbstring.http_output_conv_mimetypes ^(text/|application/xhtml\+xml) ^(text/|application/xhtml\+xml)
mbstring.internal_encoding no value no value
mbstring.language neutral neutral
mbstring.strict_detection Off Off
mbstring.substitute_character no value no value

-----------

ISP: PHP Version 5.2.14

mbstring
Multibyte Support enabled
Multibyte string engine libmbfl
Multibyte (japanese) regex support enabled
Multibyte regex (oniguruma) version 4.4.4
Multibyte regex (oniguruma) backtrack check On

mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.

Directive Local Value Master Value
mbstring.detect_order no value no value
mbstring.encoding_translation Off Off
mbstring.func_overload 0 0
mbstring.http_input pass pass
mbstring.http_output pass pass
mbstring.internal_encoding no value no value
mbstring.language neutral neutral
mbstring.strict_detection Off Off
mbstring.substitute_character no value no value

----------------

In UTF-8 one Cyrillic character takes 2 bytes, not 1 as with ordinary (Latin) characters. The question is how to handle this while keeping the UTF-8 settings.
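
To illustrate the byte/character difference, here is a minimal sketch. It assumes (the source of db_clean() is not shown in this thread) that db_clean() shortens the string with the byte-oriented substr(). The helper name truncate_utf8() is hypothetical and not part of CodeIgniter; it only shows a character-based cut with mb_substr().

```php
<?php
// "\xD1\x8F" is the UTF-8 encoding of the Cyrillic letter "ya" (2 bytes).
$title = str_repeat("\xD1\x8F", 70);      // 70 Cyrillic characters

echo strlen($title), "\n";                // counts bytes
echo mb_strlen($title, 'UTF-8'), "\n";    // counts characters

// Byte-based cut at 70: only half of the characters survive.
$byte_cut = substr($title, 0, 70);
echo mb_strlen($byte_cut, 'UTF-8'), "\n";

// Hypothetical helper: cut by characters instead of bytes.
// Any cleaning that db_clean() performs would still have to be applied.
function truncate_utf8($string, $size = 255)
{
    return mb_substr($string, 0, $size, 'UTF-8');
}

echo mb_strlen(truncate_utf8($title, 70), 'UTF-8'), "\n";
```

If this assumption holds, a title of 70 Cyrillic characters is 140 bytes long, so a 70-byte cut keeps only the first 35 characters, which matches the "half of the letters are lost" symptom. Passing the encoding explicitly to the mb_* functions avoids depending on mbstring.internal_encoding, which is unset on both servers above.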