
mwhitney · 07-15-2016, 10:19 AM

Q1: At every step of the process, filter/validate input. The closer you get to the output steps, the more you need to be sure that your data is properly filtered/validated.

This is something I can't stress enough, in part because I am as guilty as anyone of making bad assumptions about how my data is being treated somewhere else in the process. For example, I validate and filter data in my controller, because it's often easier to give the user direct feedback about what's wrong with the input if I need to reject it. I usually store the validation rules in my model, so they are applied consistently when multiple controllers can modify the same data. So, I retrieve the validation rules from the model, validate the data in the controller, then pass it to the model, so I'm safe to escape the data for the database and go, right?

NO.
I need to validate/filter the data passed to the model. Why? Within the model, I have no way of verifying that the data has been filtered/validated except to validate it. Maybe there's some other controller out there which didn't validate the data, or created its own validation rules and didn't retrieve the model's rules. Maybe I was under a lot of pressure one day to pull off some massive form which passed data to multiple models and had a ton of complex relationships to manage and left it up to the model to reject my data.
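
A minimal sketch of that idea, written in plain Python rather than CodeIgniter so it runs on its own. The class, field names, and limits (EventModel, title, seats) are all hypothetical. The point is only that save() checks the data itself, whether or not a controller already did:

```python
class ValidationError(ValueError):
    """Raised when data handed to the model fails the model's own rules."""


class EventModel:
    # Hypothetical rules. A controller may also read these to give the
    # user feedback, but the model never relies on that having happened.
    RULES = {
        "title": lambda v: isinstance(v, str) and 0 < len(v.strip()) <= 100,
        "seats": lambda v: isinstance(v, int)
        and not isinstance(v, bool)
        and 0 <= v <= 500,
    }

    def __init__(self):
        self.rows = []  # stand-in for a database table

    @classmethod
    def errors(cls, data):
        """Return the names of fields that are missing or fail their rule."""
        return [
            field
            for field, is_valid in cls.RULES.items()
            if field not in data or not is_valid(data[field])
        ]

    def save(self, data):
        # Validate again here: the model cannot know what the caller checked.
        bad = self.errors(data)
        if bad:
            raise ValidationError("invalid fields: " + ", ".join(bad))
        # Keep only the known fields so unexpected keys never reach storage.
        row = {field: data[field] for field in self.RULES}
        self.rows.append(row)
        return row
```

A controller can call EventModel.errors() first to build its messages, and a second controller that forgets to do so still cannot get bad data past save().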

It doesn't matter whether I use the Form_validation library or some other method to validate/filter my input; it only matters that I do it, often and repeatedly.

With your specific scenario, as PaulD pointed out, the regex is problematic. One of the reasons I try to avoid using regex validation (when possible) is that I'm not one of those people who can instantly recognize whether anything but the simplest regex is correct. One of the common problems with using them for names is that people tend to exclude characters outside the a-z/A-Z range, but people don't usually like being forced to substitute characters in their names.
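
If a pattern is used at all, one way to avoid the a-z/A-Z trap is to match "any letter" instead of an ASCII range. The following is only a sketch in Python, not the regex from this thread. The function name and the set of allowed separators (space, apostrophe, hyphen) are assumptions, and real names can still fall outside it:

```python
import re
import unicodedata

# In Python 3, [^\W\d_] matches a word character that is neither a digit
# nor an underscore, which in practice means a letter in any script.
# Letters may be joined by a single space, apostrophe, or hyphen.
NAME_RE = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")


def is_plausible_name(value, max_len=100):
    """Loose check for a personal name; hypothetical helper for illustration."""
    if not isinstance(value, str):
        return False
    # Compose accents first, so "e" + combining diaeresis becomes one letter.
    value = unicodedata.normalize("NFC", value).strip()
    return 0 < len(value) <= max_len and NAME_RE.fullmatch(value) is not None
```

This accepts names such as "José Núñez" or "O'Brien" and rejects input containing angle brackets or semicolons, but it is deliberately loose and should be treated as one filter among several, not as the only line of defence.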

Q2: The built-in escaping in Query Builder works fairly well, but is not a perfect solution. Unfortunately, the reason it's not perfect is the same reason you're not going to get many specific answers to security questions that apply to every circumstance: any generalized solution can potentially be attacked under specific conditions, and you need to take care to prevent those conditions. If you've filtered and validated your input properly, the escaping done by Query Builder will probably be all you need. If you need to disable it for a particular query, you're going to have to escape the data yourself. If you're not careful about how you construct your queries, especially in combination with data received from other parts of the system, there's always going to be an attack vector somewhere.
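
To illustrate why value escaping alone doesn't cover every query, here is a sketch using Python's sqlite3 module with bound parameters. This is a different mechanism from Query Builder's escaping, shown only because it runs on its own. The table, the column whitelist, and the function names are hypothetical:

```python
import sqlite3

# Identifiers (column names) cannot be bound as parameters, so they are
# checked against a fixed whitelist instead. Hypothetical column names.
ALLOWED_SORT = {"title", "starts_on"}


def find_events(conn, title, sort_by="title"):
    """Look up events by exact title, ordered by a whitelisted column."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError("unsupported sort column")
    # The value is bound with "?", so it is never parsed as SQL.
    # Only the whitelisted identifier is concatenated into the statement.
    sql = "SELECT title, starts_on FROM events WHERE title = ? ORDER BY " + sort_by
    return conn.execute(sql, (title,)).fetchall()


def demo():
    """Build a small in-memory database for trying the function out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (title TEXT, starts_on TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("Picnic", "2016-07-20"), ("Gala", "2016-08-01")],
    )
    return conn
```

The value and the identifier need different treatment: the title is handled by the database driver, while the sort column is a case where the generalized protection does not apply and the check has to be written by hand.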

Q3: PaulD's answer has a pretty good handle on this, though I don't agree with the level of trust he places in other parts of the system. One of the benefits of HTML Purifier is that you can control what is allowed, and it is a whitelist-based system (which is generally the better option for security). Additionally, whenever I come across a situation in which I think I need to let the user input HTML to be saved and retrieved later, I set that assumption aside and re-evaluate it a few times before implementation. Sometimes there are better options, but it's very much a situational consideration.

Q4: Validate data from the database just as you would data from anywhere else in the system, or as you would data from the user. The biggest difference here is that most of the functionality provided by the Form_validation library is not useful here. You can still use it to process rules, if you wish, but you do know that the data at least meets the requirements to be in the database in the first place, which reduces some of the types of validation you need to perform. The main point here is to make sure you are validating/filtering the data for the current use. Most of us do this at a high level without much consideration:
- If no data was returned, I might display a different view, or my view might display a message to the user that no data was found.
- If data is missing from a specific field, I might need to replace it with a specific value, like an empty string or 0.
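
Those two checks might look something like the following sketch (plain Python, with hypothetical view names and fields). It treats rows from the database like data from any other source and prepares them for one specific use, the list view:

```python
def prepare_event_rows(rows):
    """Return (view_name, cleaned_rows) for rows of unknown origin.

    Hypothetical helper: "events/empty" and "events/list" are made-up view
    names, and "title"/"seats" are made-up fields.
    """
    # No data returned: pick a different view instead of rendering nothing.
    if not rows:
        return "events/empty", []

    cleaned = []
    for row in rows:
        title = row.get("title")
        seats = row.get("seats")
        cleaned.append(
            {
                # Missing or wrongly typed values get a safe default.
                "title": title if isinstance(title, str) else "",
                "seats": seats
                if isinstance(seats, int)
                and not isinstance(seats, bool)
                and seats >= 0
                else 0,
            }
        )
    return "events/list", cleaned
```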

Think about what you would do in a public function/method if you received the data and had no idea where it came from. In fact, I've gotten in the habit of isolating my database/model calls in my controllers to such an extent that I often don't know whether my data came from a database, or even if I have created a model for the data (I can find out when I'm writing the code, but that's not the point).

Q5: I would generally say the same thing I've always said: filter/validate the input, escape on output. The best tool to use to escape the output is probably going to be either xss_clean() or HTML Purifier. HTML Purifier has some documentation related to improving performance for output. One thing to remember, though, is that the document is prefaced with the idea that you've already tried using HTML Purifier as you output the data to the page, and found it too slow, so they give you a couple of alternatives and detail some of the problems which may arise from using them.
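
For values that should never contain markup at all, the simplest form of "escape on output" is entity encoding at the point where the value is written into the page. The sketch below uses Python's html.escape to show the principle. It is not xss_clean() or HTML Purifier, and the function name is hypothetical:

```python
from html import escape


def render_event_item(title):
    """Return one list item with the title encoded for an HTML context."""
    # Encode at output time, whatever the source of the title was.
    # quote=True also encodes single and double quotes.
    return "<li>" + escape(str(title), quote=True) + "</li>"
```

Because the encoding happens in the rendering step, the same function is safe to reuse when the title comes from a form, an API, or the database.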

Finally, one of my main points when I harp on filter/validate input everywhere is that you may know the source of the data now, but you won't always know the data source later. Did I get this from the database, or did someone pass it straight from a form? Sometimes you can save yourself massive amounts of effort in the long term by taking a little extra effort in the short term. If I always treat my inputs as suspect, it's a lot easier (and safer) to re-use my functions/methods for uses for which they may not originally have been designed.

For example, I have a page which displays a public list of events which have been approved by an administrator, so I could probably make an assumption that those events are pretty safe, but I didn't. So, the other day I had a situation come up in which the administrator wanted to preview events before approval, and all I had to do was wire up the existing list to show the event(s) he wanted to preview. The data comes from a different source, and is not as safe, but I didn't have to worry about it.

The more common scenario, though, is that someone manages to get past your security somewhere and stores something dangerous in your database. If you assume the data coming out of your database is safe, you're going to have to track it down and fix it.

Another scenario which is becoming more common is that, somewhere down the road, a public API is opened up that ends up calling code you originally designed for internal use. Now you have security issues, because there were mixed assumptions about where the data was going to be filtered, and the filtering never actually happened.
