sanitize textarea input for database insertion - best practice

I thought I'd start this thread after a pretty thorough effort of searching / reading.

My Situation
I have a simple textarea that users are copy/pasting blocks of "text" (mostly from MS Word) into.
This text can contain quite a lot of extended characters, such as é (eacute), right-slanted apostrophes, opening and closing double quotes, semicolons, etc.

This is admin protected and not public facing.

Right now I am simply:
1.) Applying form validation (using a custom alpha callback that allows these certain characters)
2.) Reading the value through the input class
3.) Saving it with CI's insert.

I attempted to use CI's $this->db->escape(), but the values ended up double escaped: for example, a literal "\r\n" showed up in the textarea fields (when updating), and the text gained leading and trailing single quotes.

Anyway, what is 'Best Practice' for this type of situation?  Should I be doing any more sanitizing beyond my 3 steps above?



Cutting and pasting from MS Word can be a pain to deal with, and what you describe is about what I would expect. Have you thought about filtering out the MS Word special characters, or about using a WYSIWYG editor instead of a regular textarea? Also, CI has built-in escaping, which could cut down on the issues you are having. Are you willing to post the code that you are using to insert the copied text into the database?
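
On the double escaping mentioned in the first post: as far as I know, $this->db->escape() escapes a string and wraps it in single quotes, and the query builder's insert() / update() escape values again on their own. Feeding one into the other would give exactly the symptoms described. Below is a minimal sketch of that effect in plain PHP. fakeEscape() is a hypothetical stand-in that only imitates MySQL-style escaping; it is not CodeIgniter's code.

```php
<?php
// Hypothetical stand-in for a MySQL-style escaper: escape special
// characters, then wrap the value in single quotes.
function fakeEscape(string $value): string
{
    $escaped = strtr($value, [
        "\\" => "\\\\",
        "'"  => "\\'",
        "\r" => "\\r",
        "\n" => "\\n",
    ]);

    return "'" . $escaped . "'";
}

$input = "It's line one\r\nline two";

// Escaped once: a valid SQL string literal for the original text.
$once = fakeEscape($input);

// Escaped twice: what happens when an already escaped value is handed to
// a method that escapes on its own.
$twice = fakeEscape($once);

echo $once, PHP_EOL;
echo $twice, PHP_EOL;
```

Once the database parses the second value, the stored text keeps the surrounding single quotes and the characters \r\n as literal text. The usual way out is to pass the raw value to insert() and let it do the escaping once.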

To filter out the MS Word characters you can try this function; there are others out there on the web.

PHP Code:
function cleanText(string $string) : string
{
    // Typographic characters MS Word inserts, as UTF-8 byte sequences.
    $search = [
        "\xC2\xAB",     // « (U+00AB) in UTF-8
        "\xC2\xBB",     // » (U+00BB) in UTF-8
        "\xE2\x80\x98", // ‘ (U+2018) in UTF-8
        "\xE2\x80\x99", // ’ (U+2019) in UTF-8
        "\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
        "\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
        "\xE2\x80\x9C", // “ (U+201C) in UTF-8
        "\xE2\x80\x9D", // ” (U+201D) in UTF-8
        "\xE2\x80\x9E", // „ (U+201E) in UTF-8
        "\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
        "\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
        "\xE2\x80\xBA", // › (U+203A) in UTF-8
        "\xE2\x80\x93", // – (U+2013) in UTF-8
        "\xE2\x80\x94", // — (U+2014) in UTF-8
        "\xE2\x80\xA6", // … (U+2026) in UTF-8
    ];

    // ASCII stand-ins, in the same order as $search
    // (assumed mapping; adjust to taste).
    $replacements = [
        '<<', '>>',             // « »
        "'", "'", "'", "'",     // ‘ ’ ‚ ‛
        '"', '"', '"', '"',     // “ ” „ ‟
        '<', '>',               // ‹ ›
        '-', '-',               // – —
        '...',                  // …
    ];

    return str_replace($search, $replacements, $string);
}
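
A variation on the same idea, in case you would rather keep each character next to its replacement so the two lists cannot drift out of order: strtr() accepts a single map. This is only a sketch. The function name normalizeWordPunctuation() and the ASCII choices are my own assumptions, and the map is shortened to the most common characters.

```php
<?php
// Hypothetical helper: maps typographic characters (UTF-8 bytes) to
// plain ASCII stand-ins. The ASCII choices are assumptions.
function normalizeWordPunctuation(string $text): string
{
    static $map = [
        "\xC2\xAB"     => '<<',   // « U+00AB
        "\xC2\xBB"     => '>>',   // » U+00BB
        "\xE2\x80\x98" => "'",    // ‘ U+2018
        "\xE2\x80\x99" => "'",    // ’ U+2019
        "\xE2\x80\x9C" => '"',    // “ U+201C
        "\xE2\x80\x9D" => '"',    // ” U+201D
        "\xE2\x80\x93" => '-',    // – U+2013
        "\xE2\x80\x94" => '-',    // — U+2014
        "\xE2\x80\xA6" => '...',  // … U+2026
    ];

    // strtr() with an array replaces every key in a single pass and never
    // rescans text it has already replaced.
    return strtr($text, $map);
}

// Curly quotes, an apostrophe, an en dash and an ellipsis; the é is not in
// the map and passes through untouched.
echo normalizeWordPunctuation(
    "\xE2\x80\x9CIt\xE2\x80\x99s done\xE2\x80\x9D \xE2\x80\x93 caf\xC3\xA9\xE2\x80\xA6"
), PHP_EOL;
```

Characters that are not in the map, such as é, are left alone, so accented text survives the cleanup.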


Thx for the feedback. Below seems to work fairly well.

@InsiteFX.. I threw your code into my helper file. Thx.

So here is what I have:
- Validate:
$this->form_validation->set_rules('f_text', 'Review Text', 'required|trim|callback_customAlphaTwo');
 - the callback
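   public function customAlphaTwo($str)
   {
       // Whitelist: ASCII letters and digits, common punctuation, the MS Word
       // typographic characters, and line breaks. \A and \z anchor the whole
       // string (with the /m flag a single clean line was enough to pass), and
       // /u makes PCRE treat pattern and input as UTF-8 characters, not bytes.
       return preg_match('/\A[a-zA-Z0-9 .,\-\'&–—«»…$\[\]\/()“”‘’!;:é\r\n]*\z/u', $str) === 1;
   }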
- clean Text
$tCleanText = cleanText($this->input->post('f_text'));

- Save Data
$aDataToSave = array(
 'far_text' => $tCleanText,
 // more stuff
);
$tinsert = $this->admindata->savedata($aDataToSave);

Note: cleanText() is a helper

- Insert
   // Save Data
   public function savedata($tdata)
   {
       // insert() escapes the values itself, so $tdata is passed in raw.
       return $this->db->insert('db_table', $tdata);
   }
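
One thing worth checking in a whitelist callback like customAlphaTwo is how the pattern is anchored. With the /m flag, ^ and $ match at every line, so preg_match() succeeds as soon as any single line is clean. Anchoring with \A and \z and adding /u checks the whole string as UTF-8 instead. A minimal sketch of the difference follows; the function names are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Character whitelist used by both variants (file must be UTF-8).
const ALLOWED = 'a-zA-Z0-9 .,\-\'&–—«»…$\[\]\/()“”‘’!;:é';

// Variant with the /m flag: ^ and $ match at every line boundary, so the
// search succeeds if any one line consists only of allowed characters.
function passesMultiline(string $text): bool
{
    return preg_match('/^[' . ALLOWED . ']*$/m', $text) === 1;
}

// Variant anchored to the whole string: \A and \z match only at the very
// start and end, /u enables UTF-8 mode, and CR/LF are allowed explicitly.
function passesWholeString(string $text): bool
{
    return preg_match('/\A[' . ALLOWED . '\r\n]*\z/u', $text) === 1;
}

$bad  = "<script>alert(1)</script>\r\nLast line is fine";
$good = "Caf\xC3\xA9 \xE2\x80\x9Cquoted\xE2\x80\x9D\r\nnext line";

var_dump(passesMultiline($bad));    // the clean last line lets it through
var_dump(passesWholeString($bad));  // rejected: "<" is not whitelisted
var_dump(passesWholeString($good)); // accepted: every character is allowed
```

Since the form is admin-only this is more about catching stray characters than about security, but the anchored version at least validates what it appears to validate.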
