Sanitising user-submitted HTML
#11

TheFuzzy0ne
I was thinking about that, but the trouble is I'm not sure how to go about it. I've seen lots of libraries out there, but they only seem to support specific versions of Word documents. Do you have any suggestions? I'd like to avoid writing a library myself if I can.
#12

xwero
I haven't done a thorough search, but it seems hard to find a library that supports all Word versions, unless you run your site on IIS with Office installed.

I found a command-line program, wvWare, that is essentially a headless AbiWord. It still seems to be under development, but their site doesn't mention support for anything newer than Word 2000, and the SourceForge news section lists XP as the latest version supported by wv2-0.0.8. So you'll have to try it yourself to make sure it can convert .docx to HTML.

The flow for adding .doc(x) content to the editor would be:
- upload doc
- let wvWare generate a html file from it
- read the html content
- delete the doc and html file
- display the content in the editor field
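The five steps above can be sketched in Python roughly as follows. This is a minimal sketch, not xwero's code: it assumes the converter is invoked as `<converter> INPUT OUTPUT` and writes OUTPUT relative to its working directory, which is how wvWare's `wvHtml` is usually run, but check the usage of your own build. The function name `doc_to_html` and the temporary file names are hypothetical.

```python
import os
import subprocess
import tempfile


def doc_to_html(doc_bytes, converter=("wvHtml",)):
    """Upload -> convert -> read -> delete, for one Word document.

    ASSUMPTION: the converter is called as `<converter> INPUT OUTPUT` and
    writes OUTPUT relative to its working directory (typical wvHtml usage).
    """
    # A private temporary directory holds both files; it is deleted,
    # together with its contents, when the with-block exits.
    with tempfile.TemporaryDirectory() as workdir:
        # Step 1: store the uploaded document.
        with open(os.path.join(workdir, "upload.doc"), "wb") as fh:
            fh.write(doc_bytes)
        # Step 2: let the converter generate an HTML file from it.
        # check=True raises CalledProcessError if the converter fails.
        subprocess.run([*converter, "upload.doc", "upload.html"],
                       cwd=workdir, check=True)
        # Step 3: read the generated HTML back in.
        with open(os.path.join(workdir, "upload.html"),
                  encoding="utf-8", errors="replace") as fh:
            html = fh.read()
    # Step 4 happened on leaving the with-block (both files removed).
    # Step 5: the caller puts `html` into the editor field, ideally after
    # running it through an HTML sanitiser.
    return html
```

Using a temporary directory rather than fixed paths means concurrent uploads cannot overwrite each other's files, and any images the converter writes alongside the HTML are cleaned up too.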
#13

TheFuzzy0ne
Darn it... I grabbed wv2, which was last updated some time this year, but it needs to be compiled on the server, and I don't have SSH access, only FTP access to the web root. :(

Thanks for the suggestion though, that'll no doubt be useful for future projects.
#14

TheFuzzy0ne
[quote author="bargainph" date="1243575235"]
I'd recommend HTMLPurifier. I've used and tested it, coupled with an XSS filter.[/quote]

Sorry, I only just noticed your message. I was looking at HTMLPurifier and came to the conclusion that it's big. Quite possibly even bigger than Michael Jackson's latest tour. How would you rate its performance? I have to be careful, as one of the rules on the end user's server is that we cannot consume more than 20% of the processing power. It sounds unrealistic to me, but I've still got to try to abide by that rule.

SafeHTMLChecker also looks pretty good, but it doesn't allow inline styles, so I might end up taking it and expanding it a bit.
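One way to extend a checker that rejects inline styles is to keep the `style` attribute but filter its declarations against a whitelist. The Python sketch below only illustrates that idea; it is not SafeHTMLChecker's code or API, and both the property list `ALLOWED_CSS` and the value pattern `SAFE_VALUE` are hypothetical choices, deliberately strict.

```python
import re

# HYPOTHETICAL whitelist: the CSS properties a user may set inline.
ALLOWED_CSS = {
    "color", "background-color", "font-weight", "font-style",
    "text-align", "text-decoration",
}

# Deliberately strict value check: letters, digits, '#', '%', '.', space
# and '-' only. No parentheses, quotes, colons or backslashes, so url(...),
# expression(...) and CSS escape tricks are rejected (and so is rgb(...)).
SAFE_VALUE = re.compile(r"[#a-zA-Z0-9%. -]+")


def filter_style(style: str) -> str:
    """Return only the whitelisted declarations from a style="" value."""
    kept = []
    for decl in style.split(";"):
        prop, sep, value = decl.partition(":")
        if not sep:
            continue  # empty chunk or not a "property: value" pair
        prop = prop.strip().lower()
        value = value.strip()
        if prop in ALLOWED_CSS and SAFE_VALUE.fullmatch(value):
            kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

For example, `filter_style("color: red; position: absolute")` keeps only `color: red`. The extended checker would then write the filtered string back as the attribute value (HTML-escaped), or drop the attribute when nothing survives.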
#15

meigwilym
In case you haven't already seen it, HTMLPurifier is in the wiki:

http://codeigniter.com/wiki/htmlpurifier/

Mei
#16

TheFuzzy0ne
Nice one. Thanks a lot. That will save me a headache if I am convinced that HTMLPurifier is the way to go.
#17

Myles Wakeham
I know this is an old thread, and I've tried to make sense of HTMLPurifier's docs and forums, but I have a very elementary question for anyone with CI and HTMLPurifier in place...

I am using the default configuration settings and simply trying to purify content entered into a field. If a user enters some HTML code in a field (which I'm allowing) that contains &quot; or some other HTML-encoded value (i.e. &amp;, etc.), it seems that HTMLPurifier is just stripping off the & and ; characters.

How do you set it not to do that? I know it is probably freaking out because it thinks it's JavaScript, but in this case it's just basic HTML.

Anyone run up against this before?

Myles
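For comparison, here is a minimal sketch of a tag-whitelist sanitiser that passes entity references such as &quot; and &amp; through untouched. It is written in Python with the standard library's `html.parser`, a swapped-in technique: it is not how HTMLPurifier works and not a fix for its configuration. The key assumption is `convert_charrefs=False`, which makes the parser report `&name;` and `&#NN;` through `handle_entityref` and `handle_charref` so they can be re-emitted verbatim. The tag lists are hypothetical, and a real sanitiser would also need attribute and URL checks.

```python
from html import escape
from html.parser import HTMLParser

# HYPOTHETICAL whitelist for the sketch: simple formatting tags, no attributes.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong"}
VOID_TAGS = {"br"}
# Elements whose text content is dropped instead of shown as text.
DROP_CONTENT = {"script", "style"}


class EntityPreservingSanitiser(HTMLParser):
    def __init__(self):
        # convert_charrefs=False: &name; and &#NN; arrive via
        # handle_entityref / handle_charref instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.dropping = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.dropping += 1
        elif tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # all attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.dropping = max(0, self.dropping - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data, quote=False))  # re-escape raw < > &

    def handle_entityref(self, name):
        if not self.dropping:
            self.out.append(f"&{name};")  # keep &quot;, &amp; ... as written

    def handle_charref(self, name):
        if not self.dropping:
            self.out.append(f"&#{name};")  # keep &#169; ... as written


def sanitise(fragment: str) -> str:
    parser = EntityPreservingSanitiser()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

With this sketch, `sanitise('<p>Tom &amp; Jerry say &quot;hi&quot;</p>')` comes back unchanged, while a `<script>` element and its contents are removed entirely.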
#18

TheFuzzy0ne
Are you using a doctype?
#19

Myles Wakeham
[quote author="TheFuzzy0ne" date="1244078064"]Are you using a doctype?[/quote]

No, in this case it's just HTML being allowed into a field, where it will then be stored and added to other content, much in the same way that HTML can be entered into a text area like the ones on these forums.

Myles
#20

TheFuzzy0ne
Does the same thing happen on the HTMLPurifier demo page?