Indexing HTML with Zend_Search_Lucene
Indexing HTML with Zend_Search_Lucene - El Forum - 05-30-2009

TheFuzzy0ne

Hi everyone. I'm looking for some input from anyone who uses, or has used, Zend_Search_Lucene.

Zend_Search_Lucene allows you to index HTML files and strings, but as I understand it, each file or string needs to be a complete document. How do I go about indexing HTML that isn't a complete document, for example, an HTML fragment submitted by a user? Do I need to strip the tags from the string first, or is there a better way to do this?

The HTML will be purified first, so it should be valid, and all tags will be closed properly. strip_tags() is quite buggy, so I'm not sure whether I should be doing this some other way.

Any input is appreciated, as always.
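To make the question concrete, here is a minimal sketch of two ways the fragment could be prepared, not a definitive answer. It assumes the fragment has already been purified and is UTF-8. The helper names `wrapFragment()` and `fragmentToText()` are hypothetical and exist only for illustration. The first wraps the fragment in a bare document shell so it can be handed to `Zend_Search_Lucene_Document_Html::loadHTML()`. The second extracts plain text with PHP's DOM extension instead of strip_tags().

```php
<?php
// Hypothetical helper: wrap a purified HTML fragment in a minimal document
// shell so that it can be parsed as a complete HTML document.
function wrapFragment($fragment, $title = '')
{
    return '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
         . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</title>'
         . '</head><body>' . $fragment . '</body></html>';
}

// Hypothetical helper: pull the plain text out of a fragment using the DOM
// extension rather than strip_tags().
function fragmentToText($fragment)
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom->loadHTML(wrapFragment($fragment));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Join text nodes with a space so adjacent block elements do not run
    // together. Trade-off: a word split by inline markup (wor<b>ld</b>)
    // becomes two tokens.
    $xpath = new DOMXPath($dom);
    $parts = array();
    foreach ($xpath->query('//body//text()') as $node) {
        $parts[] = $node->nodeValue;
    }
    return trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));
}

$fragment = '<p>Hello <b>world</b></p><p>Second paragraph</p>';
echo fragmentToText($fragment), "\n";

// Only runs when Zend Framework 1 is installed and autoloadable.
if (class_exists('Zend_Search_Lucene_Document_Html')) {
    $doc = Zend_Search_Lucene_Document_Html::loadHTML(
        wrapFragment($fragment, 'Example title')
    );
    // $doc could then be passed to addDocument() on an open index.
}
```

Which route fits better probably depends on whether the index should see the fragment as an HTML document (title, body, and links handled by the HTML document class) or just as a plain text field.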