Sanitizing user input for blog commenting
#1

[eluser]mikegioia[/eluser]
Hi guys -

I'm in the process of writing a script to handle user comments for a blog. Right now I'm stripping out unwanted tags, then running the comment through CI's XSS filter and the typography helper to format it nicely.

The problem I'm having is that malformed or unclosed tags like '<em>Text</e>' break the page. I'd like the script to render such markup as literal text instead of outputting it as HTML and having it break.

Another problem is that I'm stripping out unwanted tags like '<div>'. I'd like it to just display the text '<div>' instead of stripping that text out completely.

Are there any libraries anyone's used that can do this for me? I've looked around and I can't find anything.

Thanks for any of your help,
Mike
#2

[eluser]crikey[/eluser]
Hi Mike

Perhaps look into the htmlentities or htmlspecialchars functions. They convert certain characters (such as < and >) to their HTML entity equivalents, so the browser displays them as text.
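As a rough illustration of that suggestion, the sketch below runs the two problem strings from the first post through htmlspecialchars. The helper name comment_as_text is hypothetical; only htmlspecialchars itself is standard PHP.

```php
<?php
// Hypothetical helper name; htmlspecialchars() is the standard PHP function.
function comment_as_text($comment)
{
    // Convert &, <, >, " and ' to entities so the browser shows them literally.
    return htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
}

echo comment_as_text('<em>Text</e> and <div>'), "\n";
// prints: &lt;em&gt;Text&lt;/e&gt; and &lt;div&gt;
```

This escapes every tag, including ones you might want to allow, which is the limitation the later replies address.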

Hope that helps.
#3

[eluser]Aea[/eluser]
htmlentities and you're usually done, unless you need to allow tags like <b></b>. Then it's a bit more challenging, depending on your implementation. The easiest approach is probably to search for the htmlentities result of <b> and convert it back to HTML.
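A minimal sketch of that escape-then-restore idea follows. The function name escape_with_whitelist and the default tag list are assumptions for illustration. It only restores bare tags without attributes, and it does not balance unclosed tags.

```php
<?php
// Hypothetical helper: escape everything, then restore a small whitelist of bare tags.
function escape_with_whitelist($text, array $allowed = array('b', 'i', 'em', 'strong'))
{
    // Step 1: turn all markup into entities.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: convert the escaped form of each allowed tag back to real HTML.
    foreach ($allowed as $tag) {
        $escaped = str_replace(
            array('&lt;' . $tag . '&gt;', '&lt;/' . $tag . '&gt;'),
            array('<' . $tag . '>', '</' . $tag . '>'),
            $escaped
        );
    }
    return $escaped;
}

echo escape_with_whitelist('<b>hi</b> <div>x</div>'), "\n";
// prints: <b>hi</b> &lt;div&gt;x&lt;/div&gt;
```

Because a tag with attributes (for example <b onclick="...">) never matches the exact escaped string, it stays escaped, which keeps the whitelist conservative.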
#4

[eluser]onejaguar[/eluser]
I use HTMLPurifier for this sort of thing. It strips any tags you don't want (leaving the text intact), removes XSS attacks, AND has the bonus of validating the HTML so you never have to worry about unclosed tags.


Go to http://htmlpurifier.org/download.html and download the "Lite Distribution". Place it in your "application/libraries" folder so the directory structure looks like:

application/libraries/HTMLPurifier/
(.auto.php, .func.php, etc)
application/libraries/HTMLPurifier/HTMLPurifier/
(many more .phps and subdirectories)

Then create a new file, application/libraries/html_clean.php :

Code:
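<?php  if (!defined('BASEPATH')) exit('No direct script access allowed');

// Thin CodeIgniter wrapper around the HTMLPurifier library.
class Html_clean {

  // The underlying HTMLPurifier instance, exposed for direct access.
  public $purifier;

  public function __construct() {
    // Load HTMLPurifier relative to the application folder, not the include path.
    require_once(APPPATH.'libraries/HTMLPurifier/HTMLPurifier.auto.php');
    $config = HTMLPurifier_Config::createDefault();

    // Set all your custom config options here.
    // HTMLPurifier 4.x takes 'Namespace.Directive' as a single key.
    $config->set('Core.Encoding', 'utf-8');
    $config->set('HTML.Doctype', 'HTML 4.01 Strict');

    $this->purifier = new HTMLPurifier($config);
  }

  // Returns the cleaned, well-formed version of $html.
  public function purify($html) {
    return $this->purifier->purify($html);
  }

}

?>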


And do this to process something in your controller:

Code:
$this->load->library('html_clean');
$their_post = $this->html_clean->purify($their_post);

Season to taste. Note I had to call the CodeIgniter "wrapper" library something other than HTMLPurifier because the script itself defines a class with that name. I chose html_clean but you can call it whatever you want. You could have CodeIgniter load the HTMLPurifier object directly, but that requires making a small change to its files; with my wrapper I can just copy and paste over the HTMLPurifier directory when it's updated for the latest XSS attacks. If you want to use additional HTMLPurifier functionality, you can add methods for it to the CodeIgniter wrapper or call $this->html_clean->purifier-> directly.
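For the original question (allow a few tags, show everything else as text), two HTMLPurifier directives look relevant: HTML.Allowed and Core.EscapeInvalidTags. The following is a sketch under assumptions rather than a tested setup. The require path and the tag list are placeholders, and it uses the dotted directive names of HTMLPurifier 4.x; older releases take the namespace and the directive as separate arguments.

```php
<?php
// Assumed path: adjust to wherever HTMLPurifier.auto.php lives in your install.
require_once 'HTMLPurifier/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'utf-8');

// Whitelist of elements (and attributes) commenters may use; an example list only.
$config->set('HTML.Allowed', 'p,br,b,i,em,strong,a[href]');

// Write disallowed tags back as escaped text instead of silently dropping them.
$config->set('Core.EscapeInvalidTags', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<em>Text</e> and <div>more</div>');
```

In the wrapper, the two extra set() calls would go in the constructor next to the other config options.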
#5

[eluser]mikegioia[/eluser]
I appreciate all the suggestions. I had tried the htmlentities approach but I think I'd just end up writing a super elaborate parser.

Thanks for that library and help, onejaguar! I think that's exactly what I was looking for. I'll definitely get that up and running today.

Mike



