Are you sure about handling SECURITY?!
#1

Hi to all developers...

NOTICE! Dear developer... If you're tired or don't have enough time to read this, please skip it for now and come back later! I would like this post to become a good and complete reference on a recurring and never fully answered issue... input and output security.

I know there have been some posts about this issue, and I have read almost all of them. But I think there is still some misunderstanding about security and how to handle it.

SCENARIO 1
Suppose I prepare a form for collecting info about users, such as their name, job, state, username, password, etc. Let's consider one field:

PHP Code:
<?php echo form_open('user/complete_info'); ?>

   <input type="text" name="first_name" value="<?= set_value('first_name'); ?>" placeholder="enter your first name"/>
   
   <!-- other inputs -->

   <button type="submit" name="complete_info">Send</button>
   
<?php echo form_close(); ?>

Now assume the user enters something in this input and submits it. Then, in the controller:
    1. I filter or validate the input (as @mwhitney frequently says: filter (validate) input, escape output).
   
PHP Code:
$this->form_validation->set_rules('first_name', 'name', 'required|trim|regex_match[/^[^0-9,#@%&~_\!\?\$\*\+\=\(\)\|\/\\\>\<\.\:\-\^]+$/]');

As you can see, I validate the input with a regex, intending to allow only a-z and A-Z characters. (For now the user can still enter [ and ], and I have not been able to handle that!)

    2. If the input is valid, I store it in the database.

Controller:
PHP Code:
if ($this->form_validation->run() === FALSE) {

   // some code, then return to the form

} else {

   $info = array(
      'first_name' => $this->input->post('first_name'),
      // other inputs
   );

   $db_result = $this->User_model->insert_user_info($info);

   // some code
}


Model:
PHP Code:
public function insert_user_info($info) {

   $this->db->insert('students', $info);

   // TRUE when at least one row was written, FALSE otherwise
   return $this->db->affected_rows() > 0;
}

In the existing posts and their replies, some developers suggested:
  • Use PHP's built-in filter functions
  • Escape via html_escape() or htmlspecialchars()
  • Use the HTML Purifier classes
  • Escape queries before inserting them into the database
  • ...

Q1. Given that I already filter inputs via regex (for strings, numbers, and so on) and accept only defined characters, e.g. letters for a name, digits for a zip code, letters and digits for a username, should I still use the first three options above for filtering and encoding inputs?

Q2. Given that I use the CI 3 Query Builder, should I still escape my queries?

After I handle these, the user can see the completed info in his/her panel. First I must retrieve the info from the database; that info is now input (as @mwhitney mentioned here), and the output is the HTML sent to the user's browser for the user panel.

Q3. How should I handle this properly? Just retrieve the values, run them through xss_clean() and pass the result to the view file? Or use htmlspecialchars()? Or some other approach?
Q4. As @mwhitney said, the data read from the db is now the input, so how can I validate it as input?

SCENARIO 1'
Suppose I provide a simple WYSIWYG editor for users and accept some basic tags. What should I do in this situation? As you know, some of these editors (like CKEditor) encode some characters, but I know I must still validate the content on the server side.

Q5. How should I do this? Just use htmlspecialchars() or xss_clean(), or is it better to use the HTML Purifier classes? Or something else?


SCENARIO 2
I, as an admin, prepare some posts with a world of tags and store them in the database so that users can read them. What about this situation?

SCENARIO 3
Some developers have mentioned filtering and validating HTTP requests, URLs that are redirected to, and other foreign input. Is it possible to discuss these and give a practical example?

* As I said, I use the latest version of CI.
* I have CSRF protection enabled too.
* I know about OWASP; I have looked at it quickly and will read more.

Thanks to all experts who made it to this line :) and want to share info. Please explain in as much detail as you can, so that other questions can be referred to this post in the future.
#2

(This post was last modified: 07-15-2016, 07:25 AM by PaulD.)

Your regex is overkill as form validation has alpha, alphanumeric, alpha spaces and much more already available.

Here are some quick answers - hope they help.

Q1 Makes no sense. "Because I validate input do I still have to validate input"?

Q2 If you use query builder it will attempt to escape everything for you. But see note later about depends...

Q3 xss_clean is better than html special chars as it does a bit more in addition. However, what you clean depends on the situation and levels of trust and where the content is going (js, html, headers, email content etc). See depends later. CI advice is to always clean output.

Q4 You do not have to validate db output, except I suppose in that if it is null or not, or if it is set or not, or is it a valid value or not etc (I suppose that is validation but I do not think of it like that). But you may have to clean it still as it may have unwanted content. ie if you put user content into your database then you need to clean it before outputting it again to the screen.

Q5 Wysiwyg is a difficult one. Personally I use HTML purifier to prevent user input breaking screen layouts etc. You cannot htmlspecialchars it or xss clean it as it won't come out as html, as is the whole point of it in the first place.

Scenario 2
This is about trust, do you trust yourself or admins not to post dangerous stuff or not? Of course you do although validation is still needed on input, as mistakes still get made.

Scenario 3
Not sure what you mean.

I think you might get better or more in depth answers if you stuck to one thing at a time. You cannot really expect people to write a massive essay in response. And you do not need to add pseudo code just to describe posting data and writing it to a database.

I also think the right place to go for better advice is here: http://www.codeigniter.com/user_guide/ge...urity.html and that is the right place for a complete guide to security.

Paul.
#3

Q1: At every step of the process, filter/validate input. The closer you get to the output steps, the more you need to be sure that your data is properly filtered/validated.

This is something I can't stress enough, in part because I am as guilty as anyone of making bad assumptions about how my data is being treated somewhere else in the process. For example, I perform validation and filter data in my controller, because it's often easier to give the user more direct feedback about what's wrong with the input if I need to reject it. I usually store validation rules in my model so I can apply them consistently if multiple controllers can modify the same data (retrieving them from the model to use in the controller). So, I retrieve the validation rules from the model and validate the data in the controller, then pass it to the model, so I'm safe to escape the data for the database and go, right?

NO.
I need to validate/filter the data passed to the model. Why? Within the model, I have no way of verifying that the data has been filtered/validated except to validate it. Maybe there's some other controller out there which didn't validate the data, or created its own validation rules and didn't retrieve the model's rules. Maybe I was under a lot of pressure one day to pull off some massive form which passed data to multiple models and had a ton of complex relationships to manage and left it up to the model to reject my data.

It doesn't matter whether I use the Form_validation library or some other method to validate/filter my input, it only matters that I do it, often (and) repeatedly.
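
To make the point above concrete, here is a minimal sketch in plain PHP (not CodeIgniter's API) of a model that keeps its own rules and re-validates whatever it is handed. The class name, the rule format and the 50-character limit are all assumptions made up for illustration.

```php
<?php
// Hypothetical model: class, rule names and limits are illustrative only.
class StudentModel
{
    /** Rules live in the model so that every caller shares them. */
    public function rules(): array
    {
        return ['first_name' => ['required' => true, 'max_length' => 50]];
    }

    /** Returns a list of error messages; an empty list means the data passed. */
    public function validate(array $data): array
    {
        $errors = [];
        foreach ($this->rules() as $field => $rule) {
            $value = isset($data[$field]) && is_string($data[$field]) ? trim($data[$field]) : '';
            if (!empty($rule['required']) && $value === '') {
                $errors[] = "$field is required";
            } elseif (strlen($value) > $rule['max_length']) {
                // strlen() counts bytes, which is enough for a rough upper bound.
                $errors[] = "$field is too long";
            }
        }
        return $errors;
    }

    /** Re-validates even if a controller claims it already did. */
    public function insert(array $data): bool
    {
        if ($this->validate($data) !== []) {
            return false;
        }
        // ... the actual database insert would go here ...
        return true;
    }
}
```

A controller can still call rules() to give the user friendly feedback, but insert() no longer depends on the controller having done so.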

With your specific scenario, as PaulD pointed out, the regex is problematic. One of the reasons I try to avoid using regex validation (when possible) is that I'm not one of those people who can instantly recognize whether anything but the simplest regex is correct. One of the common problems with using them for names is that people tend to exclude characters outside the a-z/A-Z range, but people don't usually like being forced to substitute characters in their names.
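
If a regex is used for names anyway, one possible way to avoid excluding non-ASCII letters is to match Unicode letter classes instead of a-z/A-Z. This is only a sketch: the helper name, the allowed punctuation (space, apostrophe, hyphen) and the 100-character limit are assumptions, not a rule from CodeIgniter.

```php
<?php
// Hypothetical helper: the allowed character set and length are assumptions.
function is_plausible_name(string $name): bool
{
    $name = trim($name);
    // \p{L} = any letter, \p{M} = combining mark; the /u flag enables UTF-8 mode.
    // The first character must be a letter or mark; up to 99 more may follow.
    return preg_match('/\A[\p{L}\p{M}][\p{L}\p{M} \'-]{0,99}\z/u', $name) === 1;
}
```

Invalid UTF-8 makes preg_match() return false, so such input is rejected as well.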

Q2: The built-in escaping in Query Builder works fairly well, but is not a perfect solution. Unfortunately, what makes it not a perfect solution is the same reason that you're not going to get a lot of specific answers on security questions which can be applied to every circumstance. Any generalized solution can potentially be attacked under specific conditions, and you need to be wary of preventing those conditions. If you've filtered and validated your input properly, the escaping done by Query Builder will probably be all you need. If you need to disable it for a particular query, you're going to have to do it yourself. If you're not careful about how you construct your queries, especially in combination with data received from other parts of the system, there's always going to be an attack vector somewhere.
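
For cases where the values have to be handled by hand, a parameterized query is one common option. The sketch below deliberately uses plain PDO with an in-memory SQLite database as a stand-in for CodeIgniter's database layer, so that it can run on its own; it assumes the pdo_sqlite extension is available, and the function name is hypothetical. The table name follows the 'students' example from the first post.

```php
<?php
// Sketch only: plain PDO stands in for CodeIgniter's database layer here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT)');

function insert_student(PDO $pdo, string $first_name): bool
{
    // The value travels separately from the SQL text, so it cannot
    // change the structure of the statement.
    $stmt = $pdo->prepare('INSERT INTO students (first_name) VALUES (:first_name)');
    $stmt->execute([':first_name' => $first_name]);
    return $stmt->rowCount() > 0;
}

// A value that would break a query built by string concatenation.
$hostile = "Robert'); DROP TABLE students;--";
$inserted = insert_student($pdo, $hostile);

// The text is stored verbatim and the table is still there.
$stored = $pdo->query('SELECT first_name FROM students')->fetchColumn();
$count  = (int) $pdo->query('SELECT COUNT(*) FROM students')->fetchColumn();
```

Note that this only makes the value safe for the database; it still has to be validated before and escaped again when it is later written into HTML.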

Q3: PaulD's answer has a pretty good handle on this, though I don't agree with the level of trust he places on other parts of the system. One of the benefits of HTML Purifier is that you can control what is allowed, and it is a whitelist-based system (which is generally the better option for security). Additionally, whenever I come across a situation in which I think I need to let the user input HTML to be saved and retrieved later, I set that assumption aside and re-evaluate it a few times before implementation. Sometimes there are better options, but it's very much a situational consideration.

Q4: Validate data from the database just as you would data from anywhere else in the system, or as you would data from the user. The biggest difference here is that most of the functionality provided by the Form_validation library is not useful here. You can still use it to process rules, if you wish, but you do know that the data at least meets the requirements to be in the database in the first place, which reduces some of the types of validation you need to perform. The main point here is to make sure you are validating/filtering the data for the current use. Most of us do this at a high level without much consideration:
- If no data was returned, I might display a different view, or my view might display a message to the user that no data was found.
- If data is missing from a specific field, I might need to replace it with a specific value, like an empty string or 0.

Think about what you would do in a public function/method if you received the data and had no idea where it came from. In fact, I've gotten in the habit of isolating my database/model calls in my controllers to such an extent that I often don't know whether my data came from a database, or even if I have created a model for the data (I can find out when I'm writing the code, but that's not the point).
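
As a rough illustration of validating data "for the current use" without caring where it came from, a small normalizing function might look like the sketch below. The function name, the field names ('title', 'seats') and the chosen defaults are assumptions for the example only.

```php
<?php
// Hypothetical helper: field names and defaults are assumptions.
function normalize_event_row($row): ?array
{
    if (!is_array($row)) {
        return null; // nothing usable; the caller can show a "no data found" view
    }
    // Missing or non-string titles fall back to an empty string.
    $title = isset($row['title']) && is_string($row['title']) ? trim($row['title']) : '';
    // Anything that is not a non-negative integer falls back to 0.
    $seats = filter_var($row['seats'] ?? null, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
    return [
        'title' => $title,
        'seats' => $seats === false ? 0 : $seats,
    ];
}
```

The same function works whether the row came from a model, an API call or a test fixture. The title is still raw text at this point and has to be escaped when it is output.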

Q5: I would generally say the same thing I've always said, filter/validate the input, escape on output. The best tool to use to escape the output is probably going to be either xss_clean() or HTML Purifier. HTML Purifier has some documentation related to improving performance for output. One thing to remember, though, is that the document is prefaced with the idea that you've already tried using HTML Purifier as you output the data to the page, and found it too slow, so they give you a couple of alternatives and detail some of the problems which may arise from using them.

Finally, one of my main points when I harp on filter/validate input everywhere is that you may know the source of the data now, but you won't always know the data source later. Did I get this from the database, or did someone pass it straight from a form? Sometimes you can save yourself massive amounts of effort in the long term by taking a little extra effort in the short term. If I always treat my inputs as suspect, it's a lot easier (and safer) to re-use my functions/methods for uses for which they may not originally have been designed. 

For example, I have a page which displays a public list of events which have been approved by an administrator, so I could probably make an assumption that those events are pretty safe, but I didn't. So, the other day I had a situation come up in which the administrator wanted to preview events before approval, and all I had to do was wire up the existing list to show the event(s) he wanted to preview. The data comes from a different source, and is not as safe, but I didn't have to worry about it.

The more common scenario, though, is that someone manages to get past your security somewhere and stores something dangerous in your database. If you assume the data coming out of your database is safe, you're going to have to track it down and fix it.

Another scenario which is becoming more common is that, somewhere down the road, a public API is opened up which eventually uses some code which you originally designed to be used internally, and now you have security issues because there were mixed assumptions about where the data was going to be filtered, and it didn't actually happen.
#4

Thanks for your attention...

About the regex that you called overkill: I think regex is very powerful, and I use it. In this case maybe you're right, but in other cases it is useful and lets me validate input better. Suppose users must fill in an input like this: 2016-11-05. With regex I have more control over the format and also over the entered values, which must be integers.

Here is my understanding (please confirm if it is correct):
  • I can validate inputs (SCENARIO 1) with the form validation library when accepting data from a form submission. If they are valid, I insert them into the database. When I read these values back to send as HTML output, I must validate them with PHP built-in functions like intval, filter_var, etc. Once they pass validation, I escape them with xss_clean or HTML Purifier, and finally echo them in the view file and send them to the user's browser, right?
  • In SCENARIO 2 (where I have HTML tags), the validation is not simple, so I use HTML Purifier and then store the content in the database. In this case, when I want to output it as HTML, should I use HTML Purifier again? Or should I store it without HTML Purifier and use it only when I output it?
  • Neither of the users who replied to this question said clearly whether xss_clean or HTML Purifier (as a third party) is better! When I use xss_clean and there is a <script> tag, it prints [removed], while HTML Purifier removes it.
  • One more thing: I, as admin, write posts full of tags. In this case I use <script> tags, and they must be there. How can I handle this? When I use xss_clean or HTML Purifier, they remove <script>! What should I do?
Thanks.
#5

Quote:About the regex that you called overkill: I think regex is very powerful, and I use it. In this case maybe you're right, but in other cases it is useful and lets me validate input better. Suppose users must fill in an input like this: 2016-11-05. With regex I have more control over the format and also over the entered values, which must be integers.

Well, fair enough, regex is very powerful. But your example of a date is a bad one. There is nothing more frustrating than a form that expects a date in a format you do not normally use. Forcing a user to input dates in a certain format can kill the user experience. I would use a datepicker. Validate the length, and if it is not a valid input it would fail the date conversion and an appropriate message would appear.
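
A minimal sketch of the "fail the date conversion" idea, in plain PHP rather than anything CodeIgniter-specific. The function name is hypothetical, and the Y-m-d format is simply the one from the 2016-11-05 example; a datepicker would submit whatever format it is configured for.

```php
<?php
// Hypothetical helper: returns a date object, or null when the input is not a real date.
function parse_date(string $input): ?DateTimeImmutable
{
    // The leading "!" resets the time fields instead of filling in the current time.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $input);
    // Round-tripping catches values PHP would otherwise roll over
    // (for example 2016-02-31 becoming 2016-03-02).
    if ($date === false || $date->format('Y-m-d') !== $input) {
        return null;
    }
    return $date;
}
```

When parse_date() returns null, the form can be shown again with a message, and no regex is needed for the calendar logic.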

Quote:I can validate inputs (SCENARIO 1) with the form validation library when accepting data from a form submission. If they are valid, I insert them into the database. When I read these values back to send as HTML output, I must validate them with PHP built-in functions like intval, filter_var, etc. Once they pass validation, I escape them with xss_clean or HTML Purifier, and finally echo them in the view file and send them to the user's browser, right?
Input:
When you insert them into your database you need to escape the values. (Or use query builder that will attempt to escape them for you). So yes, validate (is it the type of data you were expecting) then escape (make it safe for the database).
Output:
I would say that it depends on the usage. Let's say a user is inputting a comment. Your app generates the date. When you read it from the database you do not need to validate that it is a date. Suppose the date needs to be a future date, then you would validate that it is a future date. If it is user generated text, you may need to validate that it exists, or is not null. That is validation. You would then need to filter on output, so xss_clean it, if that is appropriate in your scenario.

Quote:In SCENARIO 2 (where I have HTML tags), the validation is not simple, so I use HTML Purifier and then store the content in the database. In this case, when I want to output it as HTML, should I use HTML Purifier again? Or should I store it without HTML Purifier and use it only when I output it?
I purify on input, but that is not right. You should validate (is it set, the right length, etc) and accept the user's input (fully escaped) for the database. It is on output that you need to use purifier in this example, as you want the tags intact, but scripts and potentially dangerous content removed. You can white-list the allowed tags. But if you purify on input you are changing the user's input in an irreversible way, which might be problematic in certain usages. You only need to purify it once.

Quote:Neither of the users who replied to this question said clearly whether xss_clean or HTML Purifier (as a third party) is better! When I use xss_clean and there is a <script> tag, it prints [removed], while HTML Purifier removes it.
Is an apple better than a pear? They are different tools that do different things.

Quote:One more thing: I, as admin, write posts full of tags. In this case I use <script> tags, and they must be there. How can I handle this? When I use xss_clean or HTML Purifier, they remove <script>! What should I do?
I would address why you are putting scripts into posts. If you must, you could write an exception somewhere that says 'if this post was written by an admin do not purify it' or similar, but I think putting scripts into posts is not a good idea in the first place. Your posts output should treat all posts as though they were from un-trusted sources. Any js should really not be in the middle of your page, so you need a new system to add javascript files or inline to your page, dependent on the post. Personally I do not ever put js into a database.

Having said all that, I think MWhitney's answer was very thorough, and he is a far more experienced and better coder than I am, so I would reread and listen to what he said more than me. As he said:
Quote:you're not going to get a lot of specific answers on security questions which can be applied to every circumstance.

Best wishes,

Paul.
#6

(In reply to PaulD's post of 07-17-2016, 01:22 PM.)

Thanks again for your attention...

You said:
Quote:Is an apple better than a pear? They are different tools that do different things.

I asked that question because @mwhitney said:

Quote:The best tool to use to escape the output is probably going to be either xss_clean() or HTML Purifier.

If escaping output can be done with either xss_clean() or HTML Purifier, where the first is built into CI and the second is a third-party library, that is why I asked the question! Otherwise, I do know about those fruits...!

About putting js in the post, and especially in the middle of the page... You're right, and I agree with this approach. I will find another solution that does not rely on js.

Thanks again...
#7

Hi again,

HTML Purifier: (The apple)
Quote:HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
http://htmlpurifier.org/

XSS_clean: (The pear)
Quote:CodeIgniter comes with a Cross Site Scripting prevention filter, which looks for commonly used techniques to trigger JavaScript or other types of code that attempt to hijack cookies or do other malicious things. If anything disallowed is encountered it is rendered safe by converting the data to character entities.
http://www.codeigniter.com/user_guide/li...-filtering

Using HTML purifier when you are not cleaning HTML input is like using a laser precision micro digital measuring device to find out if you need a haircut :-)

HTML Purifier does some very clever stuff, but in so doing it is a resource heavy and relatively time consuming operation. I do not think anyone would propose using it on a name field for a form.

Hope that helps,

Paul.
#8

(In reply to PaulD's post of 07-17-2016, 04:00 PM.)

Greetings to you...

You mean that choosing between HTML Purifier and xss_clean is a trade-off, right? The first is better, but it is third party and its operation is time consuming.

About xss_clean and "If anything disallowed is encountered it is rendered safe by converting the data to character entities.": when I tested it, as I said before, a <script> element in the field (a post) was not printed like &lt;script&gt;; instead, [removed] was printed!

Quote:I do not think anyone would propose using it on a name field for a form.

My main problem is with posts that will be shown to users and are full of tags, not a name field. As @mwhitney told another user, we cannot rely on database data; it is untrusted.

Thanks a lot.
#9

(07-17-2016, 12:48 PM)pb.sajjad Wrote:
  • I can validate inputs (SCENARIO 1) with the form validation library when accepting data from a form submission. If they are valid, I insert them into the database. When I read these values back to send as HTML output, I must validate them with PHP built-in functions like intval, filter_var, etc. Once they pass validation, I escape them with xss_clean or HTML Purifier, and finally echo them in the view file and send them to the user's browser, right?

You can use the form validation library in both directions if you wish, it's just more limited in its usefulness (and requires more work) when you're using it without a form. You can also use built-in and custom functions in combination with the form validation library in either situation. Remember that in both cases, you need to also escape your output for the appropriate context (escape it for insertion into the database and escape it for display as HTML). Escaping for the database is usually much simpler than escaping for display as HTML, but the results of doing so incorrectly (or not doing so) can be devastating.

Quote:
  • In SCENARIO 2 (where I have HTML tags), the validation is not simple, so I use HTML Purifier and then store the content in the database. In this case, when I want to output it as HTML, should I use HTML Purifier again? Or should I store it without HTML Purifier and use it only when I output it?

HTML Purifier is not used for validation, and using it on data that is to be stored in the database is usually not a good idea. I previously linked to a page in HTML Purifier's documentation which describes the way this should be done, if at all, and the limitations of doing so. If you do run the content through HTML Purifier before storing it, you shouldn't be running it through HTML Purifier again, but you should store the original input, as well, and have some method of re-processing the original input as needed (for example, in case a bug is found in HTML Purifier and you need to make sure your content is safe).

If you are pre-processing the data, you should look into methods of validating that the data you retrieved from the database is the same data you stored after processing. Otherwise, you can't trust the data and will need to run it through HTML Purifier, which defeats the purpose of pre-processing it.
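
One possible way to check that pre-processed content has not changed since it was stored is to keep a keyed hash (HMAC) next to it and verify that hash on retrieval. This is only a sketch of that idea, not something HTML Purifier provides: the function names are hypothetical, and $secret is assumed to be an application key kept outside the database.

```php
<?php
// Sketch under assumptions: $secret lives in application config, not in the database.
function sign_content(string $processed_html, string $secret): string
{
    return hash_hmac('sha256', $processed_html, $secret);
}

function is_untampered(string $processed_html, string $stored_signature, string $secret): bool
{
    // hash_equals() compares the two strings in constant time.
    return hash_equals(sign_content($processed_html, $secret), $stored_signature);
}

// Usage: store $signature in its own column next to the purified HTML.
$secret    = 'example-key-for-illustration-only';
$purified  = '<p>Hello <strong>world</strong></p>';
$signature = sign_content($purified, $secret);
```

If is_untampered() returns false on retrieval, the stored copy cannot be trusted and the original input should be re-processed.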

Also, I would never assume that I know where the data came from. I don't know if it came from the database, was generated by the database/model, etc. Maybe I knew these things when I initially wrote the code, but someone could pass input to the same code from a different source, or someone could modify the database to corrupt the data which the database/model initially generated. By not making an assumption about where the data comes from, you allow yourself to not only write safer code, but also more flexible code, because you can safely pass data to it from other sources. Just thinking about that possibility can also make it easier to make good design decisions about your code earlier in the process.

Quote:
  • Neither of the users who replied to this question said clearly whether xss_clean or HTML Purifier (as a third party) is better! When I use xss_clean and there is a <script> tag, it prints [removed], while HTML Purifier removes it.

I would usually use HTML Purifier only when I expect HTML, especially in a relatively large field in the data, but it's still acceptable to use xss_clean() in that case. I would not use HTML Purifier on any field in which I do not expect HTML, but I would probably still use xss_clean(), unless I had a specific reason not to do so. In cases where I use neither, I would probably still use html_escape() or some other appropriate method of escaping the field(s) before output.

If I am not outputting the data in the body of an HTML page, I would never use xss_clean(), and whether I would use HTML Purifier would depend on the configuration and the destination of the data.
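
To illustrate what "escaping for the destination" can look like, here is a small sketch in plain PHP with one helper per output context. The helper names are made up for the example; only the underlying PHP functions (htmlspecialchars, json_encode, rawurlencode) are real.

```php
<?php
// Hypothetical helpers; the names are illustrative only.
function esc_html(string $value): string
{
    // ENT_QUOTES covers both quote styles, so this also suits quoted attribute values.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function esc_js($value): string
{
    // The JSON_HEX_* flags keep <, >, &, ' and " out of the literal, so the
    // value cannot close a surrounding <script> block or attribute.
    $json = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    if ($json === false) {
        throw new InvalidArgumentException('Value cannot be encoded as JSON');
    }
    return $json;
}

function esc_url_param(string $value): string
{
    // For a single query-string value or path segment, not for a whole URL.
    return rawurlencode($value);
}
```

The same stored string therefore ends up encoded differently depending on whether it lands in the HTML body, inside a script, or in a URL.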

Quote:
  • One more thing: I, as admin, write posts full of tags. In this case I use <script> tags, and they must be there. How can I handle this? When I use xss_clean or HTML Purifier, they remove <script>! What should I do?

Don't do that. It's a bad answer, but what you're doing is a bad practice. Look at what many CSS/JavaScript frameworks have been doing for some time now. You can make your scripts target classes and data attributes in your markup. Then your scripts are included in your site's footers and you just add the appropriate attributes/values to your markup. If you have some bigger scripts which are used less often, you can add fields to your data to indicate whether those scripts should be included.
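
As a rough sketch of the last suggestion, a post could carry a list of script keys in one of its fields, and the page footer would only emit tags for keys found on a fixed whitelist. Everything specific here is an assumption for illustration: the function name, the keys and the file paths.

```php
<?php
// Hypothetical: the whitelist keys and paths are made up for illustration.
function script_tags_for(array $requested_keys): string
{
    $known = [
        'gallery' => '/assets/js/gallery.js',
        'charts'  => '/assets/js/charts.js',
    ];
    $tags = '';
    // Looping over the whitelist (not the request) ignores unknown keys
    // and emits each script at most once.
    foreach ($known as $key => $path) {
        if (in_array($key, $requested_keys, true)) {
            $src = htmlspecialchars($path, ENT_QUOTES, 'UTF-8');
            $tags .= '<script src="' . $src . '"></script>' . "\n";
        }
    }
    return $tags;
}
```

The post body itself never contains a script tag, so it can be purified like any other untrusted content.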