Disallowed characters in URI
#1

I am working on a search bar. There is a form; on submit, the text from the input is encoded with JavaScript's encodeURIComponent function and the browser is redirected with

Code:
// yourTextHere holds the raw text from the search input
window.location.href = '/search/' + encodeURIComponent(yourTextHere);

This is routed to a controller with

Code:
$routes->get('search/(:segment)', 'Search::word/$1');

If I search for special characters, for example: []{}, I get the "The URI you submitted has disallowed characters" error message.
If I copy the url from the address bar, the characters are properly encoded to %5B%5D%7B%7D. All of these characters match the default regular expression in the App.php at $permittedURIChars.
When I copy just the []{} characters from the browser address bar, they stay as is, not encoded.

I tested:

Code:
preg_match('/\A[a-z 0-9~%.:_\-]+\z/iu', '%5B%5D%7B%7D');
preg_match('/\A[a-z 0-9~%.:_\-]+\z/iu', '[]{}'); 
They return 1 and 0, as expected.


I tried typing in the full URL with the encoded string, with the same result.
I made a link on a test page where the []{} was encoded with PHP's urlencode, rawurlencode and http_build_query, with the same result.
If I set $permittedURIChars = ''; it works but it is not recommended. With this setting, if I dd(site_url(uri_string())); the url, it is properly encoded.

I use 
CodeIgniter 4.5.1
MacOS 14.5
It was tested on Firefox 126, Firefox Developer Edition 127, Safari 17.5, Safari Technology Preview 17.4 with the same result.

I assumed, maybe it is an OS/browser thing, because when I copied only the []{} characters from the url, they stayed not encoded.
But when I used dd to print out the url, I saw that the characters are encoded as they should be.

My question is, am I doing something wrong or is this a bug in CodeIgniter?
#2

Am I the only one who is having this issue?

$permittedURIChars debuted in 4.4.7.
I have had this issue since I updated from a version older than 4.4.7 to 4.5.1.

I checked the system files.
In /system/Router/Router.php, in my opinion, $uri = urldecode($uri); is either unnecessary or in the wrong place.
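
To illustrate why the order of decoding and checking matters, here is a minimal JavaScript sketch. It is not CodeIgniter's actual code: the pattern is a JavaScript translation of the regular expression tested in post #1, and checkRaw and checkDecoded are hypothetical helper names used only for this illustration.

```javascript
// Sketch only: JavaScript translation of the pattern from post #1
// (\A and \z become ^ and $, since JavaScript has no \A or \z).
const PERMITTED = /^[a-z 0-9~%.:_-]+$/i;

// Hypothetical helper: checks the segment as it arrives, still percent-encoded.
function checkRaw(segment) {
  return PERMITTED.test(segment);
}

// Hypothetical helper: decodes first, then checks, which is the order
// the urldecode() call in the router suggests.
function checkDecoded(segment) {
  return PERMITTED.test(decodeURIComponent(segment));
}

console.log(checkRaw('%5B%5D%7B%7D'));     // true
console.log(checkDecoded('%5B%5D%7B%7D')); // false
```

The encoded form passes only because % and hex digits are permitted; once decoded, the brackets and braces are outside the permitted set.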

How am I supposed to handle this case?

If someone mistypes something and sends some punctuation marks for example, the whole site will crash.

Whoops!
We seem to have hit a snag. Please try again later...

It would be nice to handle these cases in a more user friendly way.

I could sanitize the string with JavaScript before redirect but I would rather do that on the server side with PHP.
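
A rough sketch of what such client-side sanitizing could look like, assuming the default permitted character set from post #1. The function name sanitizeSearchTerm is hypothetical, and silently stripping characters (rather than rejecting the input) is a choice made only for this sketch.

```javascript
// Hypothetical helper: remove every character outside the default
// permitted set before building the /search/... URL.
function sanitizeSearchTerm(term) {
  return term
    .replace(/[^a-z 0-9~%.:_-]/gi, '') // drop characters that are not permitted
    .replace(/\s+/g, ' ')              // collapse runs of spaces left behind
    .trim();
}

console.log(sanitizeSearchTerm('foo []{} bar')); // "foo bar"
```

Note that this also removes legitimate non-ASCII input such as CJK text, so it only avoids the error page; it does not make those searches work.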
#3

Read https://codeigniter.com/user_guide/gener...characters
#4

(05-29-2024, 06:18 PM)kenjis Wrote: Read https://codeigniter.com/user_guide/gener...characters

That was the first thing I did.

According to that, there are two options.

Option 1. I can turn off the whole check by setting it to '' which is not safe for the whole project.
Option 2. I can add all the characters one can type on the keyboard, which is basically the same thing as option 1.

There should be a middle ground, where it is possible to disable this for some controllers/methods.
It worked before 4.4.7. Or was there no filtering before that? Or has this bug been there for years?
CodeIgniter 3 has permitted_uri_chars too, which is the same thing, so I assume there was filtering before 4.4.7 and it was not a bug that it worked.

How am I supposed to know all the most common possible characters? For example, I switch between Chinese, Japanese, Korean keyboard layouts all the time. Sometimes I type something, press enter and realize, I did not switch. And these are just the CJK characters.
If someone types a wrong character into a search field, not maliciously, the whole site will crash -> Whoops! page. That is a joke to me. There should be a way to handle this properly.

As I said before, in my opinion the URL should not be decoded before checkDisallowedChars.
#5

(This post was last modified: 05-29-2024, 08:53 PM by kenjis.)

Before 4.4.7 there was no filtering at all. It was a bug in CI4.
#6

(05-29-2024, 07:59 PM)loxia Wrote: According to that, there are two options.

Option 1. I can turn off the whole check by setting it to '' which is not safe for the whole project.
Option 2. I can add all the characters one can type on the keyboard, which is basically the same thing as option 1.

Option 1 and 2 are completely different.
Option 1 does not check anything.
Option 2 checks the URI string.
URI strings may contain invisible characters or non-character binary data (or invalid data that is corrupted as a character).
#7

(05-29-2024, 07:59 PM)loxia Wrote: As I said before, in my opinion the URL should not be decoded before checkDisallowedChars.

That is wrong, because it is the decoded data that is processed by the framework and your app code.

URL encoding can encode any binary data.
Do you want to accept binary data as a URI path?
In almost all cases, it would be no.
#8

For such search keywords, I recommend using a query string (e.g., ?q=keyword).
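
As a sketch of the client side of this suggestion, the redirect from post #1 could build a query string instead of a path segment. The function name buildSearchUrl is hypothetical, and the /search path and q parameter are taken from the examples in this thread; the route and controller would need to be adjusted to read q from the query string.

```javascript
// Hypothetical helper: carry the keyword in ?q= instead of a URI segment.
// URLSearchParams takes care of percent-encoding the value.
function buildSearchUrl(keyword) {
  const params = new URLSearchParams({ q: keyword });
  return '/search?' + params.toString();
}

console.log(buildSearchUrl('[]{}')); // "/search?q=%5B%5D%7B%7D"

// In the browser, the redirect would then be:
// window.location.href = buildSearchUrl(inputValue);
```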
#9

(05-29-2024, 09:18 PM)kenjis Wrote: For such search keywords, I recommend using a query string (e.g., ?q=keyword).

I would have preferred the segment-based approach that CodeIgniter uses, but query string it is, then.
Thank you!