making full text search smarter
#11

[eluser]CroNiX[/eluser]
I'd look into Apache Lucene from the Apache Software Foundation. It's a full-fledged search system with a lot of options, like returning results ordered by relevance and word proximity.

Zend Framework has a Lucene library, which CI can use. There are tutorials showing how to use Zend libraries within CI. It's great and can be dropped into any project with little modification. You just need to build the index.

http://framework.zend.com/manual/en/zend...cene.html/
#12

[eluser]mikerh[/eluser]
[quote author="brian88" date="1346268361"]ahh ok. thanks.

How can I add "+" only to words longer than 3 characters? I don't want words like "+a" or "+it".

This would be easy if it were just a string (if (strlen($a) >= 3)), but the words are in an array...[/quote]

Yep, just change all the logic to this.

Code:
<?php
function matchParams($string)
{
    $wordLength = 4;
    $stopWords = array(",", "'", '"', "*"); // add words or characters to this array if you want to remove them
    $keepWords = array("norris"); // add known words here so they don't get modified, i.e. norris doesn't become norri*

    $string = str_replace($stopWords, '', $string); // here is where we remove the stuff from $stopWords

    $arr = explode(' ', $string); // make the array

    /*
     * All the array work is now in a for each loop
     */
    foreach ($arr as $key => &$value) {
        $value = strtolower($value); // take away case

        if (strlen($value) < $wordLength) {
            unset($arr[$key]); // remove smaller than your defined word length
        } else {
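            // Check every keep word first, so a word is only stripped
            // when none of them match (matters once you add a second keep word)
            $keep = false;
            foreach ($keepWords as $keepWord) {
                if (preg_match('/' . preg_quote($keepWord, '/') . '/', $value)) {
                    $keep = true; // matches a keep word...don't touch
                    break;
                }
            }
            if (!$keep && substr($value, -1) == 's') {
                $value = substr_replace($value, '*', -1); // replace trailing s with * on the other words
            }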
        }
        $value = "+" . $value;
    }
    return implode(' ', $arr);
}

// Usage
$string = 'Chuck Norris fights\' furry, beaver\'s "and" the honey badger* on a hill';
$string = matchParams($string);
echo $string;

/*
* Output
* +chuck +norris +fight* +furry +beaver* +honey +badger +hill
*/
?>
#13

[eluser]brian88[/eluser]
Thanks for the link, CroNiX.

Hey mikerh,
When I use your stop words, it's actually replacing them inside of words I want.

Example:

Code:
$stopWords = array("a", "an", 'to', "or");

original string
chuck norris ran a marathon

returns
+chuck +nris +mrthon
#14

[eluser]mikerh[/eluser]
Oops, sorry, I didn't think of that. Keep in mind that MySQL will take care of common words when you build a full-text index, since its full-text search skips the words on its stopword list.

Anyway, to remove words yourself, just use regular expressions and modify them as you see fit. Just make sure to surround whole words with \b. For example, if you want to remove the word cheese, the pattern would be \bcheese\b
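
To make the difference concrete, here is a small sketch using brian88's example string. It is only an illustration: the variable names are made up for this snippet, and the trailing whitespace cleanup is an extra step that is not part of the function in this thread.

```php
<?php
$input = 'chuck norris ran a marathon';
$words = array('a', 'an', 'to', 'or'); // stop words from brian88's post

// str_replace() matches substrings, so letters inside other words get removed too.
$naive = str_replace($words, '', $input);

// Build one whole-word pattern per stop word; preg_quote() escapes regex metacharacters.
$patterns = array();
foreach ($words as $word) {
    $patterns[] = '/\b' . preg_quote($word, '/') . '\b/i';
}
$bounded = preg_replace($patterns, '', $input);

// Collapse the double spaces left behind where a word was removed.
$bounded = trim(preg_replace('/\s+/', ' ', $bounded));

echo $naive, "\n";   // chuck nris rn  mrthon
echo $bounded, "\n"; // chuck norris ran marathon
```

The /i flag is an assumption on my part so that "Or" and "or" are treated the same; drop it if you want case-sensitive matching.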

Anyway...new code!

Code:
<?php
function matchParams($string)
{
    $wordLength = 4;
    $stopWords = array("/[^a-zA-Z0-9\s]/" , "/\bor\b/", "/\bdribble\b/"); // strips special chars plus the words "or" and "dribble"; surround whole words with \b
    $keepWords = array("norris"); // add known words here so they don't get modified, i.e. norris doesn't become norri*

    $string = preg_replace($stopWords, '', $string); // here is where we remove the stuff from $stopWords

    $arr = explode(' ', $string); // make the array

    /*
     * All the array work is now in a for each loop
     */
    foreach ($arr as $key => &$value) {
        $value = strtolower($value); // take away case

        if (strlen($value) < $wordLength) {
            unset($arr[$key]); // remove smaller than your defined word length
        } else {
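            // Check every keep word first, so a word is only stripped
            // when none of them match (matters once you add a second keep word)
            $keep = false;
            foreach ($keepWords as $keepWord) {
                if (preg_match('/' . preg_quote($keepWord, '/') . '/', $value)) {
                    $keep = true; // matches a keep word...don't touch
                    break;
                }
            }
            if (!$keep && substr($value, -1) == 's') {
                $value = substr_replace($value, '*', -1); // replace trailing s with * on the other words
            }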
        }
        $value = "+" . $value;
    }
    return implode(' ', $arr);
}

// Usage
$string = "Quote'd EndingInS or but not Norris dribble stars";
$string = matchParams($string);
echo $string;

/*
* Output
* +quoted +endingin* +norris +star*
*/
?>
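
For completeness, a rough sketch of how a string like that could be handed to a MySQL boolean-mode full-text query. The table name (articles) and the columns (id, title, body) are hypothetical, so swap in your own schema; the query also assumes a FULLTEXT index exists on those columns.

```php
<?php
// Boolean-mode search string, e.g. what matchParams() returns.
$terms = '+chuck +norris +fight*';

// Hypothetical table and column names. The same MATCH ... AGAINST expression
// is used twice: once to filter the rows and once as a relevance score.
$sql = 'SELECT id, title, MATCH(title, body) AGAINST(? IN BOOLEAN MODE) AS score
        FROM articles
        WHERE MATCH(title, body) AGAINST(? IN BOOLEAN MODE)
        ORDER BY score DESC';

// One bound value per ? placeholder, in order.
$params = array($terms, $terms);

// Inside a CodeIgniter model this would be run with query bindings:
// $query = $this->db->query($sql, $params);
```

Binding the search string instead of concatenating it into the SQL keeps user input from breaking the query.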