A question about validating alphabetical data
plainas wrote:
Hey all, charsets and that kind of stuff are one of the most confusing subjects for me. I need to validate data that will become part of a URL. Users will be able to enter a fragment of the URL for the pages they create. For example, a user will be able to create a URL like this: http://mydomain.com/myapp/username/user_created_data

To be safe, I was thinking of allowing only the characters a-zA-Z0-9\-_, but many of my users will create content in several languages. My biggest problem is this: with UTF-8 encoding, how do I tell which of the thousands of possible characters are only letters or digits? I don't want to allow punctuation and similar characters, but how do I sort that out for Chinese, Korean, Bangla, etc.?

Another question: how do I safely insert UTF-8 characters into a URL? Is urlencode() enough?
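One way to approach both questions is to classify each character by its Unicode general category instead of listing scripts by hand. Letters fall in the L* categories (CJK ideographs and Hangul syllables are Lo), decimal digits are Nd, and punctuation is P*, including CJK punctuation such as the ideographic full stop "。". Combining marks (M*) also have to be allowed, because scripts such as Bengali write many vowel signs as separate combining characters. The sketch below is in Python rather than PHP and is only an illustration under stated assumptions: the names `is_valid_slug`, `normalize_slug` and `build_url`, the extra characters `-` and `_`, and the NFC normalization step are hypothetical choices, not CodeIgniter APIs. For the encoding question, it percent-encodes the UTF-8 bytes of each path segment, which is also what PHP's `rawurlencode()` does for a UTF-8 string. `urlencode()` differs in that it turns spaces into `+`, the form-encoding convention intended for query strings rather than path segments.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical choice: characters allowed besides letters, digits and marks.
EXTRA_ALLOWED = {"-", "_"}


def normalize_slug(text: str) -> str:
    """Return the NFC form so that canonically equivalent input maps to one slug."""
    return unicodedata.normalize("NFC", text)


def is_valid_slug(text: str) -> bool:
    """Accept only letters (L*), combining marks (M*), decimal digits (Nd), '-' and '_'."""
    if not text:
        return False
    for ch in normalize_slug(text):
        if ch in EXTRA_ALLOWED:
            continue
        category = unicodedata.category(ch)
        # Marks (Mn/Mc) are needed for scripts such as Bengali, whose vowel
        # signs are combining characters rather than standalone letters.
        if category[0] in ("L", "M") or category == "Nd":
            continue
        # Anything else (spaces, P* punctuation, S* symbols, controls) is rejected.
        return False
    return True


def build_url(base: str, username: str, segment: str) -> str:
    """Validate the user-supplied segment, then percent-encode each path part as UTF-8."""
    if not is_valid_slug(segment):
        raise ValueError(f"invalid URL segment: {segment!r}")
    # quote() encodes str as UTF-8 by default; safe="" makes it encode "/" too.
    parts = [quote(normalize_slug(p), safe="") for p in (username, segment)]
    return base.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    for candidate in ["my-page_2", "我的页面", "내페이지", "বাংলা", "hello world", "你好。"]:
        print(candidate, is_valid_slug(candidate))
    print(build_url("http://mydomain.com/myapp", "username", "我的"))
```

In PHP, roughly the same filter could be written with a PCRE Unicode property class, for example `preg_match('/^[\p{L}\p{M}\p{Nd}_-]+$/u', $segment)`, followed by `rawurlencode($segment)` on the UTF-8 string. Note that a category-based filter still accepts lookalike letters from different scripts, so two visually identical slugs can be distinct strings.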
plainas wrote:
I realise I posted this in the wrong section. Could a moderator please move it to the proper section? Thank you.