CodeIgniter Forums
Indexing HTML with Zend_Search_Lucene

Indexing HTML with Zend_Search_Lucene - El Forum - 05-30-2009

[eluser]TheFuzzy0ne[/eluser]
Hi everyone. I'm looking for some input from anyone who uses, or has used, Zend_Search_Lucene. It lets you index HTML files and strings but, as I understand it, each of those files/strings needs to be a complete document.

How do I go about indexing HTML that isn't a complete document, for example a fragment of HTML submitted by a user? Do I need to strip the tags from the string first, or is there a better way to do this? The HTML will be purified first, so it should be valid, with all tags closed properly. strip_tags() is quite buggy, so I'm not sure whether I should be doing this some other way.
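
To make the question concrete, here is a rough sketch of one option: wrap the purified fragment in a minimal complete document before handing it to Zend_Search_Lucene_Document_Html::loadHTML(). The helper name wrap_fragment_as_document() and the sample strings are made up for illustration, and the sketch assumes the fragment is UTF-8 and already purified. It is not a confirmed recommendation, just one way the idea might look.

```php
<?php
// Hypothetical helper (not part of Zend Framework): wrap an HTML fragment
// in a minimal complete document so an HTML-document parser will accept it.
// Assumes $fragment is already purified and UTF-8 encoded.
function wrap_fragment_as_document($fragment, $title = '')
{
    return '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
         . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</title>'
         . '</head><body>' . $fragment . '</body></html>';
}

// Sample user-submitted fragment (illustrative only).
$fragment = '<p>Hello <strong>world</strong></p>';
$html     = wrap_fragment_as_document($fragment, 'User post');

// Only attempt the Zend call when the library is actually loaded.
if (class_exists('Zend_Search_Lucene_Document_Html')) {
    $doc = Zend_Search_Lucene_Document_Html::loadHTML($html);
    // $index->addDocument($doc); // $index: a Zend_Search_Lucene index opened elsewhere
}
```

The title is escaped because it is plain text, whereas the fragment is inserted as-is on the assumption that the purifier has already made it safe and well-formed.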

Any input is appreciated, as always.