[eluser]Kinsbane[/eluser]
So, I've had this problem for quite a while now, and after extensively searching Google I haven't found anyone else with this problem who has posted a solution.
I'm trying to make a valid RSS feed for my company's different types of press releases. When I look at the raw RSS feed in Firefox, the press releases don't have any line breaks, unlike how they appear on the normal webpage.
I also ran the RSS feed through the validator and got numerous errors, most of which pertain to illegal characters or entities, like this:
Quote: 'utf8' codec can't decode byte 0x84 in position 25415: unexpected code byte (maybe a high-bit character?)
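For reference, here is a minimal Python sketch of what I think that error means. It assumes the Word text arrives as Windows-1252 bytes, which I have not confirmed. The sample bytes are made up for illustration. In Windows-1252, 0x84 is a low double quotation mark, but in UTF-8 that byte can only continue a multi-byte character, so a UTF-8 decoder rejects it.

```python
# Hypothetical sample: Word-style quotes stored as Windows-1252 bytes.
raw = b"He said \x84hello\x93"

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the decoder rejected.
    decoded_ok = False
    error_byte = raw[exc.start]

# Decoding with the encoding the bytes were (presumably) written in works,
# and the result can then be re-encoded as real UTF-8 for the feed.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")
```

If that assumption holds, the feed is declaring UTF-8 while actually containing Windows-1252 bytes.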
What functions are available to me to fix this? Keep in mind, these PRs are copy/pasted directly from Word files into the webpage form and then saved to the database. I have asked and asked and asked and asked that our PR firm NOT do this when posting PRs to the website, but my requests get ignored - I need to be able to do this automatically. What encoding should the database table fields use so that character encoding is handled consistently at every level?
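To make the question concrete, this is roughly the kind of automatic cleanup I have in mind, sketched in Python. The function names `to_text` and `to_rss_text` are hypothetical, and the fallback to Windows-1252 is an assumption about what Word pastes produce. It tries UTF-8 first, falls back to Windows-1252, drops control characters that XML 1.0 does not allow, and escapes the result for use inside an RSS element.

```python
from xml.sax.saxutils import escape


def to_text(raw: bytes) -> str:
    """Decode stored bytes, guessing Windows-1252 when they are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 came from Word as cp1252.
        # Bytes undefined in cp1252 become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")


def to_rss_text(raw: bytes) -> str:
    """Return text that is safe to place inside an XML element."""
    text = to_text(raw)
    # Keep tab, newline, and carriage return; drop other control characters below 0x20.
    text = "".join(ch for ch in text if ch in "\t\n\r" or ord(ch) >= 0x20)
    # escape() replaces &, <, and > with XML entities.
    return escape(text)


# Hypothetical input: curly quotes and an en dash as Windows-1252 bytes.
cleaned = to_rss_text(b"AT&T \x93Q3\x94 results \x96 up")
```

The guess is not foolproof: a short run of Windows-1252 bytes can happen to be valid UTF-8, so converting once at save time and storing UTF-8 everywhere would probably be more reliable than guessing on every read.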
Thus far I have not been able to find anything on the web that explains how to deal with text copy/pasted from Word, or how to go about making sure my feed validates. It's as if everyone's got inside knowledge of this except for me, and I honestly don't know where to begin looking for answers.
What kind of solutions has everyone else developed? How have you handled character sets and encodings? Do you use UTF-8? ISO-8859-1? Thanks in advance for any advice.