Filtering out non-Latin-1 characters
#1

[eluser]cairo140[/eluser]
I'm having a problem filtering out non-Latin-1 (ISO/IEC 8859-1) characters from a textarea POST submission.

It's just a run-of-the-mill textarea with no JS. I fetch its contents with $this->input->post. At this point, the text includes non-Latin-1 characters: curly quotes and the em dash. An example sentence:

Code:
I know you’ve seen the video.

The apostrophe is a curly one (U+2019, the right single quotation mark).

I run this:

Code:
$output = preg_replace('/[^\x20-\x7e^\x0a^\x0d^\xa0-\xff]+/','FOO', $output);

But all that gives me is this:

Code:
I know you?FOOve seen the video.

Kudos to the regex for finding the invalid character, but it doesn't replace the whole character: a stray ? is left in front of FOO. Any ideas? Is there something about CI's preprocessing of encoded characters that I'm missing?
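
One plausible explanation, sketched below, is that the submitted text is UTF-8 and the pattern is being matched byte by byte. This is an assumption, since the post does not state the form's encoding. In UTF-8, U+2019 is the three bytes E2 80 99: the first byte falls inside \xa0-\xff and survives, while the other two are replaced, which would account for the leftover "?". The variable names $input, $byteMode and $utf8Mode are illustrative only. The second pattern adds the /u modifier so PCRE works on code points, and drops the extra ^ signs, because only the first ^ in a character class negates it.

```php
<?php
// Assumption: the submitted text is UTF-8 encoded (not confirmed in the post).
// U+2019 is the byte sequence E2 80 99 in UTF-8.
$input = "I know you\xE2\x80\x99ve seen the video.";

// Original pattern, byte mode: 0xE2 lies inside \xa0-\xff and is kept,
// while 0x80 and 0x99 lie outside every range and are replaced as one run.
$byteMode = preg_replace('/[^\x20-\x7e^\x0a^\x0d^\xa0-\xff]+/', 'FOO', $input);

// Same character class with the /u modifier: matching is per code point,
// so the whole three-byte character is replaced at once.
$utf8Mode = preg_replace('/[^\x20-\x7e\x0a\x0d\x{a0}-\x{ff}]+/u', 'FOO', $input);

// Byte 10 of the byte-mode result is the orphaned lead byte.
echo bin2hex(substr($byteMode, 10, 1)), "\n"; // e2
echo $utf8Mode, "\n";                         // I know youFOOve seen the video.
```

Note that with /u, preg_replace returns NULL if the subject is not valid UTF-8, so it is worth checking the return value. The filtered string is still UTF-8; if actual Latin-1 bytes are needed afterwards, something like mb_convert_encoding($utf8Mode, 'ISO-8859-1', 'UTF-8') could be applied as a separate step.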



