Indexing HTML with Zend_Search_Lucene

Hi everyone. I'm looking for some input from anyone who uses, or has used, Zend_Search_Lucene. It can index HTML files and strings, but as I understand it, each file or string needs to be a complete HTML document.

How do I go about indexing an HTML fragment rather than a full document, for example some HTML that's been submitted by a user? Do I need to strip the tags from the string first, or is there a better way to do this? The HTML will be purified first, so it should be valid, with all tags closed properly. I've found strip_tags() to be quite buggy, so I'm not sure whether I should be doing this some other way.

Any input appreciated as always.
