NicEdit and Word documents
#1

[eluser]TheFuzzy0ne[/eluser]
Hi everyone.

I'm having some problems when I copy and paste Word documents into NicEdit. The pasted content includes some invalid data that I need to strip out, but I'm not entirely sure how to do this, especially since I don't have Microsoft Office.

Here's an example of some of the metadata:
Code:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="ProgId" content="Word.Document">
<meta name="Generator" content="Microsoft Word 11">
<meta name="Originator" content="Microsoft Word 11">
<link rel="File-List" href="file:///C:UsersMikeAppDataLocalTempmsohtml11clip_filelist.xml">
<o:smarttagtype namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="City">
<!--[if gte mso 9]>
    <xml>  Normal0falsefalsefalseMicrosoftInternetExplorer4 </xml>
<![endif]-->
<!--[if gte mso 9]>
    <xml> </xml>
<![endif]-->

Has anyone come up against this problem before? If so, how did you get around it?

Many thanks in advance.
#2

[eluser]Dam1an[/eluser]
I've gotten similar things when I just copy and paste from Word into WordPress: there are several thousand lines' worth of junk. (It doesn't get displayed in the actual post, but it's redundant crap I'd rather not have.)

I just discovered they have a button on the WYSIWYG toolbar for pasting content from a Word file. I would assume this strips out the stuff you're talking about, although I've not tried it yet.

Oh, and good luck actually finding the code that does that in the mess that is the WP source code lol
(I might have a try when I get home tonight if you haven't found it/an alternative solution)
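
For what it's worth, the kind of cleanup a paste-from-Word button performs could look roughly like the sketch below. This is not the WordPress or TinyMCE code; `stripWordMarkup` is a hypothetical helper name, and the patterns only target the kinds of tags shown in the example from the first post (conditional comments, `<xml>` blocks, `<meta>`/`<link>` tags and Office-namespaced tags such as `<o:smarttagtype>`).

```javascript
// Hypothetical helper (not from NicEdit, TinyMCE or WordPress):
// strips common Word paste debris from an HTML string.
function stripWordMarkup(html) {
  return html
    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    .replace(/<!--\[if[\s\S]*?<!\[endif\]-->/gi, '')
    // Any remaining ordinary HTML comments
    .replace(/<!--[\s\S]*?-->/g, '')
    // <xml> ... </xml> blocks that were not wrapped in a comment
    .replace(/<xml\b[^>]*>[\s\S]*?<\/xml>/gi, '')
    // Head-only tags that have no place in a pasted body fragment
    .replace(/<\/?(meta|link)\b[^>]*>/gi, '')
    // Office-namespaced tags such as <o:smarttagtype ...>
    .replace(/<\/?[a-z]+:[^>]*>/gi, '')
    .trim();
}

// Usage: run the pasted HTML through the helper before saving it.
const pasted =
  '<meta name="ProgId" content="Word.Document">' +
  '<o:smarttagtype name="City">' +
  '<!--[if gte mso 9]><xml> </xml><![endif]-->' +
  '<p>Hello <b>world</b></p>';
console.log(stripWordMarkup(pasted)); // → <p>Hello <b>world</b></p>
```

Regular expressions are not an HTML parser, so something like this would only be a first pass on the client; the server-side filtering should stay in place.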
#3

[eluser]TheFuzzy0ne[/eluser]
In my case, the data IS displayed, since anything that's not valid HTML is turned into HTML entities. This ensures that my page is always valid. The problem is that the tags don't show within the editor. They are parsed as HTML, and so are not visible or editable.
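
To illustrate why escaped tags end up visible on the page, here is a minimal sketch of entity escaping. It is only an illustration of the general technique; `escapeHtml` is a hypothetical name, and this is not the code my filter actually runs.

```javascript
// Hypothetical illustration of entity escaping (not the real filter's code).
// Once "<" and ">" become entities, the browser shows the tag as literal
// text instead of parsing it as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeHtml('<meta name="ProgId" content="Word.Document">'));
// → &lt;meta name=&quot;ProgId&quot; content=&quot;Word.Document&quot;&gt;
```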

Thanks for the tip though. I'll have to take a look at WP's WYSIWYG editor.
#4

[eluser]Dam1an[/eluser]
A little bit of poking around, and it appears to be a TinyMCE plugin.
If you look under wordpress/wp-includes/js/tinymce/plugins/paste/pasteword.html, that's the actual page that displays when you paste in Word stuff, and then there's some JS in there, and then I start to get confused after that lol
#5

[eluser]TheFuzzy0ne[/eluser]
Aw, nuts! Well I'm using NicEdit at the moment, based purely on how light-weight it is. I can't remember now if it's TinyMCE or FCKEditor that doesn't degrade gracefully.
#6

[eluser]Dam1an[/eluser]
I think it's FCK, at least that's the one I hear people complaining about more

And even so, you can still use the TinyMCE code, it just won't be a simple drop-in (might need a few more JS files from TinyMCE)

Edit: And you should really actually test if that function does what I think/hope, otherwise it's wasted effort lol
#7

[eluser]TheFuzzy0ne[/eluser]
I think that for the interim, I can just have HTML Purifier drop invalid tags instead of escaping them. Then I should hopefully be able to crack on with other stuff.
#8

[eluser]TheFuzzy0ne[/eluser]
I've just discovered that NicEdit, TinyMCE AND FCKEditor don't seem to work at all when you copy and paste from a Word (or in this case, OpenOffice Writer) document in Google's Chrome browser. Bummer...
#9

[eluser]TheFuzzy0ne[/eluser]
OK, I've found my problem, but I'm not sure how to fix it. For some reason NicEdit is escaping the data detailed in my original post, so HTML Purifier is not parsing it as (X)HTML. I posted on the NicEdit forums, and couldn't help noticing:

a) The fact that the copyright notice doesn't seem to cover this year.
b) That a lot of help requests had not been replied to.
c) The tumbleweed blowing from one side of the NicEdit forum to the other.

Clearly NicEdit isn't as popular as I'd hoped, so I thought I should see if I can get better support for NicEdit in the CodeIgniter forums than I could in the NicEdit forums. :P

Has anyone who's used NicEdit had to deal with this problem before? All of the actual HTML is returned correctly; it's just the comments and the stuff that would usually appear in the <head> of the document that are converted to entities. I'm starting to consider moving over to a different WYSIWYG editor, but NicEdit is so lightweight and simple that I'm trying to resist the urge.
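
One stopgap I can think of is stripping the already-escaped Word tags from the submitted string. The sketch below assumes the escaped data uses the standard `&lt;` and `&gt;` entities (I haven't confirmed exactly what arrives at the server), and `stripEscapedWordMarkup` is a hypothetical name.

```javascript
// Hypothetical workaround: remove Word head tags and comments that have
// ALREADY been converted to entities. Assumes "<" and ">" were escaped
// as &lt; and &gt;.
function stripEscapedWordMarkup(html) {
  return html
    // Escaped conditional comments: &lt;!--[if ...]&gt; ... &lt;![endif]--&gt;
    .replace(/&lt;!--\[if[\s\S]*?&lt;!\[endif\]--&gt;/gi, '')
    // Escaped ordinary comments
    .replace(/&lt;!--[\s\S]*?--&gt;/g, '')
    // Escaped <title> ... </title>, including its text
    .replace(/&lt;title\b[\s\S]*?&lt;\/title&gt;/gi, '')
    // Escaped <meta ...> and <link ...> tags
    .replace(/&lt;\/?(meta|link)\b[\s\S]*?&gt;/gi, '');
}

const submitted =
  '&lt;meta name=&quot;ProgId&quot; content=&quot;Word.Document&quot;&gt;' +
  '&lt;!--[if gte mso 9]&gt;&lt;xml&gt; &lt;/xml&gt;&lt;![endif]--&gt;' +
  '<p>Real content</p>';
console.log(stripEscapedWordMarkup(submitted)); // → <p>Real content</p>
```

The obvious downside is that it would also eat escaped tags a user typed on purpose (someone writing about `<meta>` tags, say), so it treats the symptom rather than the cause.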
#10

[eluser]TheFuzzy0ne[/eluser]
I've done some further investigating, and it seems that it's the global XSS filtering that's converting meta and title tags to entities. Since I have XSS filtering enabled globally, I need to find a way to disable it just for this request. Does anyone have any suggestions as to how I might go about this? I can't think of anything off the top of my head, but I'm about to delve into the code now.

Thanks in advance.



