MP_Search: Website indexing search for CodeIgniter
#1

[eluser]Jelmer[/eluser]
I went looking for an alternative to searching the database with LIKE and MATCH statements, because they don't seem very efficient when searching multiple rows across multiple tables.
The solution I liked most is to index all the words on a page: remove the common words, run a stemming algorithm on the rest and give each word a score. Searching such an index takes only a simple WHERE statement, and the scoring is done with the MySQL function SUM().
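To make the idea concrete, here is a minimal sketch of such an index and the WHERE/SUM() query. It is not the library's code: it uses Python with an in-memory SQLite database instead of PHP and MySQL, and the table name `search_index`, its columns and the function names are all hypothetical.

```python
import sqlite3

def build_index(conn, rows):
    """Create a hypothetical word index and fill it with (page_id, word, score) rows."""
    conn.execute(
        "CREATE TABLE search_index (page_id INTEGER, word TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO search_index VALUES (?, ?, ?)", rows)

def search(conn, words, match_all=False):
    """Return (page_id, total_score) pairs, highest total first."""
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = (
        "SELECT page_id, SUM(score) AS total FROM search_index "
        f"WHERE word IN ({placeholders}) GROUP BY page_id "
    )
    params = list(words)
    if match_all:
        # "All words" mode: keep only pages that contain every distinct search word.
        sql += "HAVING COUNT(DISTINCT word) = ? "
        params.append(len(set(words)))
    sql += "ORDER BY total DESC, page_id ASC"
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
build_index(conn, [
    (1, "search", 5), (1, "index", 2),
    (2, "search", 1),
    (3, "index", 4), (3, "stem", 3),
])
any_results = search(conn, ["search", "index"])
all_results = search(conn, ["search", "index"], match_all=True)
```

With this sample data, page 1 ranks first in "any words" mode because both of its words match and their scores are summed. In "all words" mode only page 1 is returned.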

Currently the search library offers these features:
- Search for all or any words in the search string
- Results are scored by how many times a word is found, with some areas of your choosing getting more weight than others
- Keywords and search string are stemmed to improve results (for example "parties" should become "party")
- Common words are stripped from keywords and search string
- You can feed it HTML and it will only index the words; images will be replaced by the contents of their title or alt attributes
- You can give it a join table and the results will be joined using the page_id on the other table, allowing you to fetch titles, urls and descriptions along with the search results
- Only alphanumeric strings are indexed; accented characters like "á" are replaced by their plain versions ("a"). Other special characters are stripped
- You can load multiple languages (currently only Dutch & English)
- Versions for CodeIgniter Active Record and RapidDataMapper
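
A rough sketch of the indexing steps from the list above (HTML in, image title/alt text kept, accents folded, common words stripped, weighted scores out) could look like this. It is an illustration in Python rather than the library's PHP code. The stop-word list, the function names and the weights are assumptions, and stemming is left out to keep it short.

```python
import re
import unicodedata
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "to", "in"}

class TextExtractor(HTMLParser):
    """Collect text content; replace <img> tags by their title or alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            self.parts.append(attrs.get("title") or attrs.get("alt") or "")

    def handle_data(self, data):
        self.parts.append(data)

def fold_accents(text):
    """Turn accented letters into plain ones, e.g. "á" into "a"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def keywords(html):
    """Return the lowercase alphanumeric words of an HTML fragment, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = fold_accents(" ".join(parser.parts)).lower()
    # Anything that is not a-z or 0-9 acts as a separator and is dropped.
    words = re.findall(r"[a-z0-9]+", text)
    return [w for w in words if w not in STOP_WORDS]

def score(areas):
    """areas is a list of (html, weight) pairs; each occurrence adds its area's weight."""
    totals = Counter()
    for html, weight in areas:
        for word in keywords(html):
            totals[word] += weight
    return dict(totals)

page_scores = score([
    ("<h1>Search</h1>", 3),               # title area, weighted more heavily
    ("<p>Search the index</p>", 1),       # body area
])
```

Here a word in the title counts three times as much as the same word in the body. The resulting word/score pairs are what would be stored per page in the index table.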

It's still early in development, and I implemented the English language version without testing it (this only affects the stemmer, not the other features). It should work without problems, but please let me know if you run into errors.

You can find the code, an example controller and the Dutch/English language files at http://mpsimple.mijnpraktijk.com/mp_search.htm

Sources for development:
- http://www.symfony-project.org/askeet/1_0/en/21 (most of the basics for this implementation, also the 'stop words'/'noise words' for the English version are from there)
- http://www.phpbuilder.com/columns/clay19990421.php3 (although quite old, another source of inspiration)
- http://drupal.org/project/dutchstemmer (the Dutch stemming functions, which I modified somewhat)
- http://tartarus.org/~martin/PorterStemmer/php.txt (English stemmer Class)



