Input Class & HTML Entities
#1

[eluser]Madalina.C[/eluser]
---------------------------------------------------------------------------------------------
PLEASE EXCUSE THE IMPROPER HTML ENTITY NOTATION.
I assume this forum uses the same engine, which has the same issue (I can inject HTML entities into this page).

I did some searching around and couldn't find the exact topic I'm discussing here

Hello

I found a bit of redundancy regarding the input class and properly (and safely) displaying user input on views using HTML Entities.


Essentially, to properly sanitize input from the user (let's say we're talking about GET requests) we use the Input Class.


Data such as <html or <html> or <body> (etc) is being transformed to html_entities almost immediately. I am left with
Code:
& lt;html
which, to the browser, looks the same as
Code:
<html

Now this doesn't prevent ALL html injections.

Tags such as <b>, <i>, etc. are not being sanitized, since I assume they don't pose a threat. But they still need to be escaped.

So now I'm running htmlentities on input that has already been partly converted to HTML entities (only a few elements are affected).

Therefore a user's input of
<html>
Gets turned into
Code:
& lt;html& gt;

Which furthermore gets html entitied (by myself) and becomes
Code:
& amp;lt;html& amp;gt;
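
The chain above can be reproduced in plain PHP. This is only a sketch: it assumes the value coming out of the input filter already has its angle brackets turned into entities, and it uses PHP's built-in htmlspecialchars() for the manual escaping step. The variable names are made up for illustration.

```php
<?php
// Assumption: this is what the input filter hands back for a user who typed <html>.
$filtered = '&lt;html&gt;';

// Escaping it again for the view re-encodes the leading & of every entity.
$escaped = htmlspecialchars($filtered, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n"; // prints &amp;lt;html&amp;gt;
```

Viewed in a browser, that output shows the entity text itself instead of the original angle brackets.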

So here are the solutions I thought of, maybe you can provide some input

1. Add all tags to the list that get escaped. Not very smart since there may be tags that I miss. Plus then I'm modifying the core CI library which is unwise.

2. Extend the class. But then I would still have to spend a huge amount of time tracking what gets escaped and what doesn't get escaped by the Input Class

3. Not use the class. But the class does more than just HTML entities so I would rather keep it.

Thanks for any input.
#2

[eluser]pickupman[/eluser]
You basically don't want to trust user input. So if the field won't be numeric or boolean, you will probably want to run the input through xss_clean from the Security class. This is also easily done if you are using the Form Validation class:
Code:
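// As a form validation rule:
$this->form_validation->set_rules('field', 'field name', 'trim|xss_clean');
// or via the second parameter of the Input class:
$this->input->post('field', TRUE);
// or directly (CodeIgniter 2.x Security class); xss_clean() takes the value, not the field name:
$this->security->xss_clean($this->input->post('field'));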

Using xss_clean will strip out unwanted HTML tags. I would recommend encoding the input on its way into the DB and decoding the output on its way to the browser. This is how most WYSIWYG editors, like FCKeditor or TinyMCE, send their input. If characters are double encoded, the user will need to be responsible for using the correct tags.

For example, if you post code here on the boards without using the code button and then copy the code, you will get the HTML entities. If it is in a code block, it will copy correctly. The same goes if you try to edit a message: the carets (< and >) will be converted if they are outside of a code tag.
#3

[eluser]Madalina.C[/eluser]
Hi
Thanks for the reply

I think you misunderstood what I'm getting at

If you type
Code:
< html
the default CI escapes the < and leaves it as
Code:
& lt;

So, as you said, when I display it on the page (either from the DB or right away after page load), I obviously run an htmlentities call myself to prevent any HTML injection.

So what is happening is the
Code:
< html
[which became
Code:
& lt;
] now becomes
Code:
& amp; lt;

xss_clean doesn't just strip unwanted HTML tags. It escapes the tags on SOME occasions.

I can probably trick this forum (assuming it's using the same library) into doing the same thing I'm mentioning
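
One possible way around the double encoding described in this post, offered as a sketch and not as a CodeIgniter feature: PHP's htmlspecialchars() takes a fourth parameter, double_encode, and passing false tells it to leave existing entities alone. The helper name escape_once is hypothetical.

```php
<?php
// Hypothetical helper: escape a value for output without re-encoding existing entities.
function escape_once($value)
{
    // The fourth argument (double_encode = false) skips entities that are already present.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8', false);
}

// The already-escaped bracket is left alone; the raw <b> tag is still escaped.
echo escape_once('&lt; html <b>bold</b>'), "\n"; // prints &lt; html &lt;b&gt;bold&lt;/b&gt;
```

The trade-off is that a user who really did type an entity as literal text will see it rendered as the character it stands for, so this only fits where that is acceptable.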
#4

[eluser]pickupman[/eluser]
Sounds like you may want to run a regex to strip the space inside the <>. If you are encoding and decoding properly, this should be fine.
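
A rough sketch of what such a regex could look like, assuming the goal is only to remove whitespace directly after a < and directly before a >. The function name collapse_tag_spaces is made up for illustration.

```php
<?php
// Hypothetical helper: remove whitespace right after "<" and right before ">".
function collapse_tag_spaces($input)
{
    // "< html" becomes "<html"
    $input = preg_replace('/<\s+/', '<', $input);
    // "html >" becomes "html>"
    return preg_replace('/\s+>/', '>', $input);
}

echo collapse_tag_spaces('< html >'), "\n"; // prints <html>
```

Note that this also rewrites ordinary text such as "1 < 2" to "1 <2", so it would need to run only on fields where that is harmless.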

The xss clean function does strip dangerous tags like iframe, script, html, and a few other tags allowing 3rd party site access.



