[1.6.3] Add Accept-Charset to Form Helper Class

Randy Casburn
Good day. I have a proposal:
Why/Problem Statement: CI constructs a valid XHTML document containing an HTML form. A user submits the form with ISO-8859-1 (Latin-1) encoded data in one of its fields. What happens to that text? If the browser is in Strict mode, the text is likely to be mangled, since an XML (XHTML) document without an explicit encoding declaration must be encoded in UTF-8 or UTF-16. The proposal below relies on the fact that the CI developer knows in advance that they are building an XHTML Strict document, and therefore expects UTF-8 encoded input. The developer believes this is already handled because they set char_set = "utf8"; in the database.php configuration file, yet the text stored in the database may still be mangled.
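To make the failure concrete, here is a small sketch in Python (used only for illustration, since CI itself is PHP). The function name `submit_and_read` is hypothetical: it encodes a string the way a browser might submit it, then decodes the bytes the way the server assumes they were encoded.

```python
# Illustrates how form text gets mangled when the encoding the browser uses
# to submit data differs from the encoding the server assumes.

def submit_and_read(text: str, browser_encoding: str, server_encoding: str) -> str:
    """Encode as the browser would, then decode as the server assumes."""
    raw_bytes = text.encode(browser_encoding)
    # errors="replace" substitutes U+FFFD for byte sequences that are invalid
    # in the server's assumed encoding instead of raising an exception.
    return raw_bytes.decode(server_encoding, errors="replace")


if __name__ == "__main__":
    sample = "café"
    # Latin-1 bytes read as UTF-8: the lone 0xE9 byte is not valid UTF-8.
    print(submit_and_read(sample, "iso-8859-1", "utf-8"))  # 'caf\ufffd'
    # UTF-8 bytes read as Latin-1: the two-byte sequence becomes two characters.
    print(submit_and_read(sample, "utf-8", "iso-8859-1"))  # 'cafÃ©'
    # Matching encodings round-trip cleanly.
    print(submit_and_read(sample, "utf-8", "utf-8"))  # 'café'
```

Either mismatch corrupts the data silently unless the decoder is told to fail loudly, which is part of why these problems are hard to trace.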
Proposal: Add the accept-charset attribute to the <form> tag creation methods of the Form Helper class. Use the DB config setting 'char_set' to set the attribute value, so that it is consistent with the CI DB configuration. This tells the browser which character encodings the server will accept.
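A rough sketch of what the proposal could look like, again in Python purely to show the logic. `form_open_with_charset` and `MYSQL_TO_IANA` are hypothetical names, not part of CI's Form Helper. One wrinkle worth noting: MySQL charset names such as utf8 are not the IANA names browsers expect in accept-charset, so the DB setting probably needs translating before it is written into the attribute.

```python
from html import escape

# Hypothetical mapping from MySQL charset names (as used for 'char_set' in
# database.php) to IANA charset names that browsers understand.
MYSQL_TO_IANA = {
    "utf8": "UTF-8",
    "utf8mb4": "UTF-8",
    # MySQL documents its latin1 as being the Windows cp1252 character set.
    "latin1": "windows-1252",
}


def form_open_with_charset(action: str, db_char_set: str, method: str = "post") -> str:
    """Build an opening <form> tag whose accept-charset matches the DB charset.

    Charset names not found in MYSQL_TO_IANA are passed through unchanged.
    """
    charset = MYSQL_TO_IANA.get(db_char_set.lower(), db_char_set)
    # Escape attribute values so quotes or ampersands cannot break the markup.
    return (
        f'<form action="{escape(action)}" method="{escape(method)}" '
        f'accept-charset="{escape(charset)}">'
    )


if __name__ == "__main__":
    print(form_open_with_charset("/blog/comment", "utf8"))
    # <form action="/blog/comment" method="post" accept-charset="UTF-8">
```

Reading the value from the same setting the DB class uses keeps a single source of truth, which is the consistency argument made in this post.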
References:
W3C Reference (HTML 4.01, FORM element): http://www.w3.org/TR/html401/interact/fo...f-FORM
Tutorial on Charsets/Char Codes: http://www.cs.tut.fi/~jkorpela/chars.html
Rationale: Many community members (roughly 200 forum posts) have run into some form of char_set difficulty. These issues are particularly hard to fault-isolate, and every step CI can take to eliminate faulty links in the chain helps. Presumably the same rationale led to the earlier configuration and DB code changes for this issue.
Benefit to CI: A significant portion of the data entered into our web site databases comes from web-based forms. By using a consistent char_set, CI would reduce the number of charset-related problems.
Benefit to the Community: Charset-related issues will not go away, but fault isolation will be much easier if we can rely on the Form Helper Class to handle this, just as it already escapes our strings for us automatically.
Discussion:
This is not the end-game, fix-it-all solution for char_set problems. Please consider all the variations on this theme...

Default Charsets:
-Apache - ISO-8859-1
-PHP - ISO-8859-1
-MySQL - ISO-8859-1
-XML (XHTML) - UTF-8 (or 16)
-Windows - Windows-1252 (Code Pages)
-Mac - ISO-8859 variants (depending on country)
-Linux, Unix - UTF (it depends)
-Other Charsets - BIG5, ISO-2022, etc.
-Browsers - Which mode? (quirks, strict, stupid)
-I don't mean to leave anyone out...

Every one of these platforms' charsets is configurable. That makes it nearly impossible to determine what is coming at us unless we test every string (yuck). Simply forcing a web form to submit text in a predefined character encoding is not going to solve every problem either. For example, if a specific character does not exist in the chosen char_set, you are still out of luck and will likely mangle your text anyway (e.g., BIG5 <-> UTF-8).
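A small Python sketch of that last point; the helper name `force_into_charset` is hypothetical. It round-trips text through a target charset and shows that characters the charset cannot represent are lost, however consistently the form is configured.

```python
def force_into_charset(text: str, charset: str) -> str:
    """Round-trip text through `charset`, replacing unrepresentable characters.

    With errors="replace", str.encode writes "?" for any character the
    charset cannot encode, so that information is gone for good.
    """
    return text.encode(charset, errors="replace").decode(charset)


if __name__ == "__main__":
    print(force_into_charset("café", "iso-8859-1"))     # café
    print(force_into_charset("中文 text", "iso-8859-1"))  # ?? text
    print(force_into_charset("中文 text", "big5"))        # 中文 text
```

Declaring accept-charset only tells the browser what to send; it cannot add characters that the declared charset lacks.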

This fix would do for forms exactly what the corresponding fix in the DB class does: create consistency within the CI platform. Think of it as a continuation of applying one char_set to all of CI's interfaces (in the OOP sense). Consistency.
---
Thoughts and comments?



