XSS validation and foreign language characters like ëäï
#1

moonbeetle
I'm building some interfaces for populating a database table with data.
The form I set up uses some <textarea> fields, among other form elements.
All input is XSS-filtered.

For example:
Code:
$description = $this->input->post('description', TRUE);

So far so good, but my users speak Dutch, and Dutch has some characters that don't exist in English, like the "ë" used in words such as "zeeën" (seas) and "ideeën" (ideas).

For some reason, when I submitted my form, every field containing such a word was caught by the XSS filter, and no data was entered in the database for those fields.
So I have two questions:

1. Is it possible to allow certain characters through the XSS filter?
2. What's the best way to work around this problem and still keep it safe?
#2

Derek Jones
Which version of CodeIgniter are you running, and are you auto-sanitizing input with XSS clean?
#3

moonbeetle
I'm running 1.5.4 and I'm not auto-sanitizing ($config['global_xss_filtering'] = FALSE;),
which is why I call it explicitly with the TRUE argument.

The charset of the view page is set to ISO-8859-1, which should be OK. This is also the charset used by the security helper that ships with CI.
#4

Derek Jones
My initial hunch would be something involving the entity standardization and those high-ASCII characters coming through with ISO-8859-1 encoding. You really should consider serving your site as UTF-8, but failing that, I would simply step through the xss_clean() function with var_dump($str); until you see where the problem is (make sure you do it from an ISO-8859-1 form). There's also the possibility that the PHP/PCRE version on the server causes one of the preg_replace() calls to return NULL instead of a string. The var_dump() will catch that as well. Give that a go, and let me know what you find.
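A minimal sketch of that stepping approach in modern PHP (7.1+), assuming you copy the relevant xss_clean() patterns into a list by hand. The helper name trace_pcre_steps() and the line-ending step are hypothetical; only the second pattern comes from this thread (the input.php line moonbeetle identifies later). Instead of eyeballing var_dump() output, it stops at the first preg_replace() that returns NULL and records preg_last_error().

```php
<?php
// Hypothetical debugging helper (not part of CodeIgniter): apply a list of
// [pattern, replacement] steps in order and stop at the first preg_replace()
// that returns NULL, recording preg_last_error() and the last good string.
function trace_pcre_steps(string $str, array $steps): array
{
    foreach ($steps as $i => [$pattern, $replacement]) {
        $result = preg_replace($pattern, $replacement, $str);
        if ($result === null) {
            return [
                'failed_step' => $i,
                'error'       => preg_last_error(), // e.g. PREG_BAD_UTF8_ERROR
                'last_good'   => $str,
            ];
        }
        $str = $result;
    }
    return ['failed_step' => null, 'error' => PREG_NO_ERROR, 'last_good' => $str];
}

$steps = [
    ['#\r\n?#', "\n"],                      // hypothetical step: normalize line endings
    ['#(&\#*\w+)[\x00-\x20]+;#u', '\\1;'],  // the input.php line quoted in this thread
];

// "ideeën" as submitted by an ISO-8859-1 form: "ë" is the single byte 0xEB.
var_dump(trace_pcre_steps("idee\xEBn", $steps));
```

Checking for === null after every call is the point: if a NULL is passed straight into the next preg_replace(), that call treats it as an empty string (with a deprecation notice on PHP 8.1+), so the real failure point is lost and a final dump may show an empty string instead of NULL.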
#5

moonbeetle
Derek, that's some very good advice. I did the var_dump() test.

With the view served as ISO-8859-1:
The result of var_dump(): string(0) ""

However, if I do what you suggested and serve the view page as UTF-8:
Code:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The result of var_dump(): string(7) "ideeën"

So I guess I only have to make sure I serve my web pages as UTF-8.
Hmm... if UTF-8 is such a standard and a solution to the different charsets, then why is ISO-8859-1 still so widely used?
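The string(7) in the UTF-8 result is a byte count: PHP strings are byte arrays, and "ë" (U+00EB) takes two bytes in UTF-8 but one in ISO-8859-1. A small sketch (PHP 7.1+) that shows this; latin1_to_utf8() and is_valid_utf8() are hypothetical helper names, not CodeIgniter functions. If the mbstring extension is available, mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') performs the same conversion; the hand-written loop only keeps the example free of extensions.

```php
<?php
// PHP strings are bytes, so strlen() counts bytes, not characters.
$utf8   = "idee\xC3\xABn"; // "ideeën" in UTF-8: "ë" is 0xC3 0xAB
$latin1 = "idee\xEBn";     // "ideeën" in ISO-8859-1: "ë" is 0xEB

// Hypothetical helper: ISO-8859-1 -> UTF-8 by hand. Each Latin-1 byte maps
// to the code point with the same value; values >= 0x80 need two UTF-8 bytes.
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                                         // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // 2-byte sequence
        }
    }
    return $out;
}

// Hypothetical helper using a common idiom: an empty pattern with the 'u'
// modifier makes preg_match() return false when the subject is not valid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(strlen($utf8));                     // int(7)
var_dump(strlen($latin1));                   // int(6)
var_dump(is_valid_utf8($latin1));            // bool(false)
var_dump(latin1_to_utf8($latin1) === $utf8); // bool(true)
```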
#6

Derek Jones
Out of curiosity, following which preg_replace() was the string being emptied?

Quote: Hmm... if UTF-8 is such a standard and a solution to the different charsets, then why is ISO-8859-1 still so widely used?

I really couldn't say; I've never understood why people choose ISO-8859-1 over UTF-8, even for English-speaking / Western European sites. UTF-8 is certainly more convenient in my eyes. Perhaps people are using legacy applications that only support (or default to) ISO-8859-1.
#7

moonbeetle
Line 482 in /system/libraries/input.php:

Code:
$str = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u',"\\1;",$str);
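For reference, a small sketch of what that line does on well-formed UTF-8 input: it removes spaces and control characters (bytes 0x00-0x20) sitting between an entity name such as &lt or &#60 and its terminating semicolon, so "&lt   ;" is normalized to "&lt;". The sample inputs are made up for illustration.

```php
<?php
// The pattern quoted above: group 1 captures "&", optional "#"s and word
// characters; then one or more bytes 0x00-0x20 before ';' are dropped.
$pattern = '#(&\#*\w+)[\x00-\x20]+;#u';

var_dump(preg_replace($pattern, '\\1;', "&lt   ;script")); // string(10) "&lt;script"
var_dump(preg_replace($pattern, '\\1;', "&#60\t;"));       // string(5) "&#60;"

// Valid UTF-8 without entities passes through unchanged.
var_dump(preg_replace($pattern, '\\1;', "idee\xC3\xABn")); // string(7) "ideeën"
```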

Ah, apparently there was already a similar thread:
http://ellislab.com/forums/viewthread/48460/

My apologies for starting a new thread on this topic.
#8

Derek Jones
That's not the current version either, joris. The xss_clean() in 1.5.4 does not use the 'u' modifier on any PCRE functions. That modifier will result in NULL (not an empty string) in PHP environments that do not support Unicode PCRE patterns.
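Derek's point concerns servers whose PCRE lacks Unicode support. A related failure mode, and one plausible reading of the symptoms in this thread (not confirmed here), is that with the 'u' modifier PCRE also rejects any subject that is not valid UTF-8, which is exactly what an ISO-8859-1 form submits for "ë". A minimal sketch comparing the quoted pattern with and without 'u' on such input:

```php
<?php
// "ideeën" as submitted from an ISO-8859-1 form: "ë" is the single byte 0xEB.
$latin1 = "idee\xEBn";

// Without 'u', PCRE works byte by byte, so the Latin-1 byte simply passes through.
$bytewise = preg_replace('#(&\#*\w+)[\x00-\x20]+;#', '\\1;', $latin1);

// With 'u', the subject must be valid UTF-8. 0xEB followed by 'n' is not,
// so preg_replace() returns NULL instead of a string.
$unicode = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', '\\1;', $latin1);

var_dump($bytewise === $latin1);                     // bool(true)
var_dump($unicode);                                  // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

This matches what the thread reports: the same word fails from an ISO-8859-1 page and survives from a UTF-8 page.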



