Making full-text search smarter
#1

brian88
I've implemented full-text search on my site. It works fairly well, but I want to make it smarter.

Here's my search query:
Code:
SELECT *,
MATCH(content) AGAINST ('{$keyword}' IN BOOLEAN MODE) AS score
FROM products
WHERE MATCH(content)
AGAINST('{$keyword}' IN BOOLEAN MODE)
ORDER BY score DESC
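One thing worth checking before tuning relevance: if $keyword comes straight from user input, interpolating '{$keyword}' into the SQL means a quote in the search text breaks the query and opens it to SQL injection. A minimal sketch, assuming PDO with a MySQL connection (buildSearchQuery() is a hypothetical helper name, and $pdo is assumed to exist), keeps the SQL fixed and binds the search text as parameters:

```php
<?php
// Hypothetical helper: returns the full-text search SQL plus its bound parameters.
// Two distinct placeholder names are used because PDO does not allow the same
// named placeholder to appear twice unless emulated prepares are enabled.
function buildSearchQuery(string $keyword): array
{
    $sql = 'SELECT *, MATCH(content) AGAINST (:kw_score IN BOOLEAN MODE) AS score'
         . ' FROM products'
         . ' WHERE MATCH(content) AGAINST (:kw_where IN BOOLEAN MODE)'
         . ' ORDER BY score DESC';

    return [$sql, [':kw_score' => $keyword, ':kw_where' => $keyword]];
}

// Usage with an existing PDO connection ($pdo is assumed, not created here):
// [$sql, $params] = buildSearchQuery($keyword);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

The search text never becomes part of the SQL string itself, so quotes in it cannot change the query's structure.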

I want to make it smarter in these ways:

1. When searching for "beaver" or "beavers", I get different results. I want it to ignore the trailing "s" and combine the two.

2. I have hundreds of Chuck Norris jokes. When searching for "Chuck norris runs a marathon joke", the first 3 results are fine, since they relate to him running a marathon, but the remaining 200+ results are just random Chuck Norris jokes. I only want to return the jokes about him running.

Also, it searches my joke database for the word "joke" inside the jokes themselves. I want to ignore that word, since no one actually wants results just because they contain the word "joke".

3. Is there an easy way to include a spellcheck? Maybe a library I can drop in?
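For the "joke" problem in point 2, one option is to strip your own list of ignored words from the search text before it reaches MATCH() ... AGAINST(). A minimal sketch; removeIgnoredWords() and the default word list are hypothetical:

```php
<?php
// Hypothetical helper: drops site-specific "noise" words (case-insensitively)
// from the search text before it is used in MATCH() ... AGAINST().
function removeIgnoredWords(string $text, array $ignored = ['joke', 'jokes']): string
{
    $ignored = array_map('strtolower', $ignored);
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter($words, function ($word) use ($ignored) {
        return !in_array(strtolower($word), $ignored, true);
    });
    return implode(' ', $kept);
}
```

For example, removeIgnoredWords("Chuck norris runs a marathon joke") gives "Chuck norris runs a marathon".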

I'm open to any other ways to make full-text search smarter. Does anyone else have feedback or help?
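On the spellcheck question in point 3: PHP's optional pspell extension (a wrapper around GNU Aspell) is one library route. If installing an extension isn't practical, a rough "did you mean" can be built with the core levenshtein() function against words you already have, such as words collected from the jokes table. A minimal sketch; suggestWord(), the vocabulary array, and the default distance threshold of 2 are all assumptions:

```php
<?php
// Hypothetical helper: returns the closest word from $vocabulary if it is within
// $maxDistance single-character edits of $word, otherwise null.
// The comparison is case-insensitive.
function suggestWord(string $word, array $vocabulary, int $maxDistance = 2): ?string
{
    $word = strtolower($word);
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($vocabulary as $candidate) {
        $distance = levenshtein($word, strtolower($candidate));
        if ($distance === 0) {
            return $candidate; // exact match: nothing to correct
        }
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

With ['marathon', 'running', 'beaver'] as the vocabulary, suggestWord('marathn', ...) returns 'marathon'. This loops over every word, so it suits a modest vocabulary rather than a large dictionary.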
#2

mikerh
When you do full-text search with MATCH() in BOOLEAN MODE, you can use the + and - operators. + means a word must be present in every row returned, and - means a word must not be present in any row returned.

For example, AGAINST ('+chuck +norris +marathon' IN BOOLEAN MODE) should only return rows that contain all three words. So you could write a wrapper for the posted search text that adds a + to the front of each word.
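A minimal sketch of such a wrapper, assuming words are separated by whitespace (requireAllWords() is a hypothetical name):

```php
<?php
// Hypothetical wrapper: turns "chuck norris marathon" into "+chuck +norris +marathon"
// so that MATCH() ... AGAINST (... IN BOOLEAN MODE) only returns rows containing every word.
function requireAllWords(string $text): string
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(function ($word) {
        return '+' . $word;
    }, $words));
}
```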

To match both beaver and beavers, you can use the * (truncation) operator: beaver* matches both.
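Combining the +, - and * operators from this post, a minimal sketch of a search-string builder; buildBooleanQuery() and its parameters are hypothetical:

```php
<?php
// Hypothetical builder for a BOOLEAN MODE search string:
//   $required words get a leading +, $excluded words get a leading -,
//   and when $truncate is true each required word also gets a trailing *.
function buildBooleanQuery(array $required, array $excluded = [], bool $truncate = false): string
{
    $terms = [];
    foreach ($required as $word) {
        $terms[] = '+' . $word . ($truncate ? '*' : '');
    }
    foreach ($excluded as $word) {
        $terms[] = '-' . $word;
    }
    return implode(' ', $terms);
}
```

For example, buildBooleanQuery(['beaver'], [], true) gives '+beaver*'.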

To make it really smart, you could provide options in the search form like "Match only these words" or "Match this word and others like it". That way you can build your search logic and query based on the constraints the user picks.
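As a sketch of how such a form option could drive the query, assuming the form posts a hypothetical mode value ('all', 'any' or 'similar'; these names are made up for illustration):

```php
<?php
// Hypothetical mapping from a search-form option to a BOOLEAN MODE string.
//   'all'     -> every word required:                +chuck +norris
//   'any'     -> no operators, any word can match:   chuck norris
//   'similar' -> every word required, as a prefix:   +chuck* +norris*
function searchStringForMode(string $text, string $mode): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    switch ($mode) {
        case 'all':
            $words = array_map(function ($w) { return '+' . $w; }, $words);
            break;
        case 'similar':
            $words = array_map(function ($w) { return '+' . $w . '*'; }, $words);
            break;
        case 'any':
            break;
        default:
            throw new InvalidArgumentException("Unknown search mode: $mode");
    }
    return implode(' ', $words);
}
```

Rejecting unknown modes keeps arbitrary form input from silently producing an unexpected query.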

Here is the BOOLEAN MODE reference: https://dev.mysql.com/doc/refman/5.5/en/...olean.html
#3

brian88
Thanks for the tips. Is there a way to make beavers* return both as well?
#4

mikerh
I think you would need some string processing, for example a regular expression: check whether each word ends in "s" and, if so, replace that "s" with *. Most words that end in "s" are a variation on the base word.

So you could do something like this:

[pseudo code]

Take the posted phrase
Create an array using space as the delimiter
Walk through the array looking for words that end in "s" and replace the "s" with *
Add + to the front of each word
Output the query param
[/pseudo code]

For example, if someone searches for "furry beavers", the param output would become '+furry +beaver*'
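The pseudo code above could be sketched in PHP with regular expressions as follows; pluralToWildcard() is a hypothetical name, and like the pseudo code it treats any trailing s (upper or lower case) as a plural:

```php
<?php
// Hypothetical implementation of the pseudo code:
// split on whitespace, turn a trailing s/S into *, then prefix each word with +.
function pluralToWildcard(string $phrase): string
{
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $terms = array_map(function ($word) {
        return '+' . preg_replace('/s$/i', '*', $word);
    }, $words);
    return implode(' ', $terms);
}
```

pluralToWildcard('furry beavers') gives '+furry +beaver*'.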
#5

brian88
I think I can manage that.

It wouldn't affect results if the word actually needs to end in "s", would it? For example, "chuck norris" would turn into "chuck norri*".
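On the Norris question: because * is a prefix (truncation) operator, norri* still matches "Norris"; the trade-off is that it can also match other words beginning with "norri". If that matters for particular words, one option is a list of words that should never be truncated. A minimal sketch; wildcardPlurals() and the $keepAsIs list are hypothetical:

```php
<?php
// Hypothetical variant: words listed in $keepAsIs (compared case-insensitively)
// keep their trailing s; other words ending in s get it replaced by *.
// Every word is then prefixed with +.
function wildcardPlurals(string $phrase, array $keepAsIs = ['norris']): string
{
    $keepAsIs = array_map('strtolower', $keepAsIs);
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $terms = array_map(function ($word) use ($keepAsIs) {
        if (!in_array(strtolower($word), $keepAsIs, true)
            && strtolower(substr($word, -1)) === 's') {
            $word = substr($word, 0, -1) . '*';
        }
        return '+' . $word;
    }, $words);
    return implode(' ', $terms);
}
```

wildcardPlurals('Chuck Norris fights beavers') gives '+Chuck +Norris +fight* +beaver*'.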
#6

mikerh
Here is some very quick code that I'm sure could be vastly improved:

Code:
<?php
/*
 * Helper for array_walk(): adds + to the front of a word and replaces a trailing s with *.
 * Declared at the top level (not inside matchParams()) so that calling matchParams()
 * more than once does not trigger "Cannot redeclare replaceText()".
 */
function replaceText(&$a)
{
    if (substr(strtolower($a),-1) == 's')
    {
        $a = substr_replace($a,'*',-1);
    }
    $a = "+".$a;
}

function matchParams($text)
{
    /*
     * Function to create a match-exact param string for full-text search that also looks for words with a possible s
     * suffix
     */
    $param_string = '';
    /*
     * Working from inside out.  explode(' ',$str) will create an array of the text string
     * array_unique will remove any duplicate words
     * array_filter('strlen') will remove empty array values.  For example if the string is "furry    beavers   "
     * the explode() function will create empty-string array values...array_filter() will remove them
     */
    $arr = array_filter(array_unique(explode(' ',$text)),'strlen');
    /*
     * Now add the + and replace the s
     */
    array_walk($arr,'replaceText');
    /*
     * Now create the param string
     */
    return $param_string = implode(' ',$arr);
}

$string = "Chuck  Norris fights  furry   beavers";
$param_string = matchParams($string);
echo $param_string;

/*
* Output
* +Chuck +Norri* +fight* +furry +beaver*
*/
?>
#7

brian88
Thanks for the code and documentation! Works perfectly.

After I test this out, I might be back to see if there's anything else we can do to make it smarter.

You're #1.

Oh yeah, what does the "&" do?
Code:
function replaceText(&$a)
#8

mikerh
I just realized I didn't need to set and return a variable. Here is the new "minified" version.

Code:
<?php
// Declared at the top level so matchParams() can be called more than once
function replaceText(&$a)
{
    if (substr(strtolower($a),-1) == 's')
    {
        $a = substr_replace($a,'*',-1);
    }
    $a = "+".$a;
}

function matchParams($text)
{
    $arr = array_filter(array_unique(explode(' ',$text)),'strlen');
    array_walk($arr,'replaceText');
    return implode(' ',$arr);
}

// Usage
$string = "Chuck  Norris fights  furry   beavers";
$param_string = matchParams($string);
echo $param_string;

/*
* Output
* +Chuck +Norri* +fight* +furry +beaver*
*/
?>
#9

mikerh
[quote author="brian88" date="1346263963"]

oh ya, what does "&" do?
Code:
function replaceText(&$a)
[/quote]

It makes the parameter pass-by-reference: array_walk() hands each array element to replaceText() by reference, so the function modifies the element in the array itself rather than a copy.

http://nl3.php.net/manual/en/language.re...s.pass.php
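To see the difference, here is a small illustration with two hypothetical callbacks: without the &, array_walk() passes each element by value, so the change is lost; with it, the change is written back into the array.

```php
<?php
// Receives a copy of each element: the change is discarded when the function returns.
function addPlusByValue($word)
{
    $word = '+' . $word;
}

// Receives a reference to each element: the change is written back into the array.
function addPlusByReference(&$word)
{
    $word = '+' . $word;
}

$byValue = ['chuck', 'norris'];
array_walk($byValue, 'addPlusByValue');
// $byValue is still ['chuck', 'norris']

$byReference = ['chuck', 'norris'];
array_walk($byReference, 'addPlusByReference');
// $byReference is now ['+chuck', '+norris']
```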
#10

brian88
Ah, OK. Thanks.

How can I add "+" only to words with at least 3 characters? I don't want terms like "+a" or "+it".

This would be easy if it were just a string (if (strlen($a) >= 3)), but the words are in an array...
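Since array_walk() hands the callback one word at a time, the length check can go inside the callback, where $a is a plain string. A sketch; the threshold of 3 comes from the question, the functions are renamed replaceTextMinLength() and matchParamsMinLength() only to keep them separate from the originals, and leaving short words completely untouched (no + and no s replacement) is an assumption:

```php
<?php
// Variant of replaceText(): words shorter than 3 characters are left untouched
// (no + and no * replacement); longer words get the original treatment.
function replaceTextMinLength(&$a)
{
    if (strlen($a) < 3)
    {
        return;
    }
    if (substr(strtolower($a),-1) == 's')
    {
        $a = substr_replace($a,'*',-1);
    }
    $a = "+".$a;
}

function matchParamsMinLength($text)
{
    $arr = array_filter(array_unique(explode(' ',$text)),'strlen');
    array_walk($arr,'replaceTextMinLength');
    return implode(' ',$arr);
}
```

With this, matchParamsMinLength("Chuck Norris is a beaver fan") gives "+Chuck +Norri* is a +beaver +fan".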



