CodeIgniter Forums

Indexing HTML with Zend_Search_Lucene - El Forum - 05-30-2009

Hi everyone. I'm looking for some input from anyone who uses, or has used, Zend_Search_Lucene. It lets you index HTML files and strings, but as I understand it, those files or strings need to be complete documents.

How do I go about indexing an HTML fragment that isn't a complete document, for example, some HTML that's been submitted by a user? Do I need to strip the tags from the string first, or is there a better way to do this? The HTML will be purified first, so it should be valid, and all tags will be closed properly. strip_tags() is quite buggy, so I'm not sure whether stripping the tags is the right approach or whether I should be doing this some other way.

Any input appreciated as always.