Welcome Guest, Not a member yet? Register   Sign In
xss_clean adds semicolon to anything with an &
#11

[eluser]Skippy[/eluser]
I do. Smile I'm seeing the same problem. There doesn't seem to be any improvement in 1.6.3. I've activated global XSS filtering in my config, the idea being to have all _POST data automatically secured all the time. But I'm having the same issue: stuff like "A&B" gets transformed into "A&B;", basically there's an extra semicolon added. And another case: "A&B C" becomes "A&B;C" (...so not only it appended a semicolon, it also ate up the space before C).

I'm surprised this issue hasn't surfaced before, since it's xss_clean() that does it. Don't people enter stuff like this into forms?

BTW, I'm not seeing the issue with this forum, so if it's built on CI I'd appreciate telling me what you did right to fix it. Smile

PS: Ah, there it is. Not happening anymore using Input.php from SVN. Any idea about how safe it would be to use that file with CI 1.6.2?
#12

[eluser]Skippy[/eluser]
Still no go, I'm afraid. I've upgraded to 1.6.3 and Input.php from SVN. A&B comes out fine now, but I'm still seeing this:

Code:
sales&marketing;,

Notice the stubborn extra semicolon (I end the string with g and comma, I get a semicolon shoved between them). Smile

Should I post a bug about this?
#13

[eluser]reconstrukt[/eluser]
I'm afraid so.

The scenarios outlined in my original post were essentially key value pairs delimited by ampersands (i.e. in URLs, etc.) Upgrading to 1.6.3 solved these issues, however, I too have seen the "semicolon insertion" issue on simple text entry as you like you mention above.

The semicolon is inserted because of the regular expression that xss_clean() uses. But the underlying problem is part of a larger design flaw - xss_clean looks for what it *thinks* to be mis-typed ASCII entities, and tries to automagically correct them for you. If/when it finds such a "typo", it inserts a semicolon. I'm not sure why this auto-correction is done, but I'm sure there's a reason - the CI guys wouldn't bake that in for nothing.

Maybe a better solution is to enumerate the ASCII characters to be converted via get_html_translation_table(), and using more precise regex's based on this explicit list. ?

In any case, your "sales&marketing;" should be left untouched (as there's no ASCII entity called "&marketing;"). So yes, this is definitely a bug...
#14

[eluser]reconstrukt[/eluser]
Note that this isn't fixed on the boards...

See above in the last paragraph, a semicolon was inserted into the first instance of "sales&marketing;"

(there, it did it again)

Looks like we can get this to happen if we type some pattern like:

[alphanumeric][ampersand][alphanumeric]

OK, let's try it. Any semicolons you may see below have been inserted by these boards, and were not typed by me.

1. sales&marketing;2. something&1something;3. a&b&c
4. 1&2&3
5. something& spacesomethingelse& anotherspacesomethingelse&
6. one1&two2;&three;&

...ok, clicking submit...
#15

[eluser]reconstrukt[/eluser]
Results:

A semicolon was inserted on tests 1, 2, 6.

3, 4, 5 were left untouched.

Also note: the line break was gobbled up after #1 and #2. So, another bug - when a semicolon is inserted, any trailing newline characters are also replaced.
#16

[eluser]Skippy[/eluser]
Not just newline, also space and possibly other types of whitespace.

Doesn't completing supposedly incomplete HTML entities go a bit beyond the call of duty, so to speak?

Speaking of which, why is a good old htmlentities() not enough to prevent XSS? XSS could be (simply) defined as remote-supplied JavaScript executed when it should not be. I would think that any browser that manages to somehow execute JavaScript in a piece of text that passed through htmlentities(), output on a document with Content-Type "text/html", to be seriously flawed. Are there such browsers? Is that why just htmlentities() is not enough?
#17

[eluser]Pascal Kriete[/eluser]
If you plan on allowing absolutely no tags, htmlentities will do the trick [EDIT: .. in most cases].

Believe it or not, almost all browsers out there today are "seriously flawed". They make a lot of assumptions to render sloppy code.
So once you start allowing some tags (or modify to create tags - i.e. bbcode), you run into the lax browser rendering engines. Under the right conditions, browsers will accept entities without semicolons. Also, many accept the javascript directive in places where javascript simply does not belong.

A positive security policy is always the best approach (specify what is allowed). Always strip as much as you can and validate where possible. That means that a url field should be checked to contain a url. (! and if someone tells you it doesn't, for god's sake fix it !)

The xss_clean function primarily uses a very strict negative security policy (recognize script elements and other attack vectors and remove them). In some cases, there is a little bit of whitelisting, such as validating image and anchor tags. Since browser rendering is so weak and unstandardized, and blacklisting is such a tough thing to do, the only way for any of this to work effectively is to be very broad. That does create some false positive we just have to live with.

As for those test cases, the SVN version allows one character behind the &, because there is no entity that looks like that. I would agree though, that the whitespace character should be reinserted after the tag (at least \09, \10, and \13 - the other invisibles are stripped anyways).
#18

[eluser]yxxm[/eluser]
Don't mean to dig this one up, but I'm still having a problem with this:

As per the changelog, this was fixed after the last post in this thread (1.7 was released 10/08, the last post here was from 09/08):

Quote:Modified XSS sanitization to no longer add semicolons after &[single letter], such as in M&M's, B&B, etc.

However, I have global_xss_filtering ON, and I am now using 1.7.2, and I can verify that this problem does (may) still exist. Perhaps what I'm looking at is slightly different. My (POST'd) string is as follows:

Code:
item[]=22&item;[]=18&item;[]=19&item;[]=20&item;[]=21

However, when accessing it, I get:

Code:
item[]=22&item;[]=18&item;[]=19&item;[]=20&item;[]=21

Unless, of course, this is correct behavior. If it is, could someone explain why?
#19

[eluser]Unknown[/eluser]
Nice one Jerry, works for me. For clarification of the process...

Create a file named MY_Controller.php in your [codeigniter]/application/core folder,

then place the following content into it...

Code:
<?php

/**
* MY_Security Class - extends CI standard Security controller
*
* This update removes the adding of semi-colons to submitted POST
* data if the data contains an ampersand & and the word after the
* ampersand doesn't match a known HTML entity
*
* @package  CodeIgniter
*/
class MY_Security extends CI_Security {

public function __construct()
{
  parent::__construct();
}

/**
  * Validate URL entities
  *
  * Called by xss_clean()
  *
  * @param  string
  * @return  string
  */
protected function _validate_entities($str)
{
  /*
   * Protect GET variables in URLs
   */

   // 901119URL5918AMP18930PROTECT8198

  $str = preg_replace('|\&([a-z\_0-9\-]+)\=([a-z\_0-9\-]+)|i', $this->xss_hash()."\\1=\\2", $str);

  /*
   * Validate standard character entities
   *
   * Add a semicolon if missing.  We do this to enable
   * the conversion of entities to ASCII later.
   *
   */
//  $str = preg_replace('#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i', "\\1;\\2", $str);

  // hArpanet.com: new bit coming up (in place of above line)...
  // this code prevents a ; being added to string after an & unless that string matches a known HTML entity
  // code taken from: http://ellislab.com/forums/viewreply/954778/
  $matched = preg_match_all('#(&\#?[0-9a-z]{2,})([\x00-\x20])*;?#i', $str, $matches, PREG_OFFSET_CAPTURE);
  if ($matched > 0)
  {
   foreach($matches[0] as $match)
   {
    $test_str = strtolower($match[0].';');
    foreach (get_html_translation_table(HTML_ENTITIES) as $entity)
    {
     if ($test_str == strtolower($entity))
     $str = substr_replace($str, $entity, $match[1], strlen($match[0]));
    }
   }
  }
  // hArpanet.com: end of new bit

  /*
   * Validate UTF16 two byte encoding (x00)
   *
   * Just as above, adds a semicolon if missing.
   *
   */
  $str = preg_replace('#(&\#x?)([0-9A-F]+);?#i',"\\1\\2;",$str);

  /*
   * Un-Protect GET variables in URLs
   */
  $str = str_replace($this->xss_hash(), '&', $str);

  return $str;
}

}

NOTE: As you can see, this only updates the checks on non-UTF16 values. If you want to disable ; addition on UTF16 values then apply Jerry's code to that string also.




Theme © iAndrew 2016 - Forum software by © MyBB