Welcome Guest, Not a member yet? Register   Sign In
MySQL Search Query Library v0.1
#1

[eluser]trice22[/eluser]
Hello,

I wrote a library which produces MySQL-queries like you need them for a search field, since I needed it for a project and I thought someone might maybe have use for it and/or you could atleast have a look at it (this is the first actual class I wrote - so please have patience).

WHAT DOES IT DO?
Actually not much: It takes a search string (e.g. from an input field) and forms a MySQL-query for you, using "LIKE".

WHY WOULD I NEED IT THEN?
Well, it does a few things for you, that might come handy - it:
- splits the search-string, but keeps expressions between double-quotes intact
- removes a list of too common symbols (except from the expressions)
- removes stopwords (words, that are too common) depending on a external stopwords file
(so far I've includes stopwords for English, German, French and Finnish) - not the expressions
- reduces words to it's stem (using the Porter Stemmer algorithm - so far only for English) - and again leaves the expressions untouched
- will allow the usage of the most basic BOOLEANS like "AND or +", "NOT or -" and "OR"
- furthermore you have the possibility to set a few options, often used in "advances serach modi" - like "search for ALL words"


HOW TO USE IT?
Download the zip and extract it into your library folder.
Then put into your Model (or Controller):
Code:
$this->load->library('Emq');
// Set config-variables - details below:
$config = array(
    "table" => "rt_articles",
    "returnColumns" => "id"
);
$this->emq->setConfig($config);        
// get query according to your $searchstring        
$myquery = $this->emq->find($searchstring);
You'll have to set atleast the variables "table" and "returnColumns", here's the full list:
Code:
$returnColumns;     // What comes back
$table;             // From which table
$searchColumns;     // Where to look
$groupByColumn;     // GROUP BY
$orderByColumn;     // ORDER BY
$order;             // DESC or ASC
$limit;             // LIMIT - "10" becomes "0, 10"
$stemmer;           // use Porter Stemmer algorithm (true or false)
$lang;              // for stopwords ("en","de","fr",fi")
$stopWords;         // Has to contain all words
$searchAll;         // Search for all words (true or false)
//------------------------
// EXAMPLE:
$config = array(
    "returnColumns"  = "id",
    "table"          = "blog_articles",        
    "searchColumns"  = "title, subtitle, text",    
    "groupByColumn"  = "title",            
    "orderByColumn"  = "date",            
    "order"          = "DESC",            
    "limit"          = "0, 30",            
    "stemmer"        = true,                
    "lang"           = "en",                
    "searchAll"      = false                
);

You can also set every option separately:
Code:
$this->emq->setReturnColumns("returnColumns");
$this->emq->setTable("blog_articles");
$this->emq->setSearchColumns("title, subtitle, text");
$this->emq->setGroupBy("title");
$this->emq->setOrderBy("date");
$this->emq->setOrder("DESC");
$this->emq->setLimit("0,30");
$this->emq->setStemmer(true);
$this->emq->setLanguage("en");
$this->emq->searchAll();
Furthermore:
Code:
$this->emq->disableStemmer(); // will disable the Porter Stemmer Algorithm
$myStopWords = array("word1","word2");
$this->emq->appStopWords($myStopWords); // extends the stopwords list

WHAT IT DOESN'T DO:
- No security features are integrated - CI comes with great security helpers (like xss-cleaning etc.) - just use them before the EMQ.

EXAMPLE 1:
Code:
// The search string
$searchstring = 'I never had "sex with this" NOT woman';

// will produce this MySQL-query:
SELECT id
FROM articles
WHERE (
(
title LIKE '&#xse;x with this%'
AND title NOT LIKE '&#xma;n%'
)
OR (
text LIKE '&#xse;x with this%'
AND text NOT LIKE '&#xwo;man%'
)
OR (
tags LIKE '&#xse;x with this%'
AND tags NOT LIKE '&#xwo;man%'
)
)
GROUP BY column2
ORDER BY column1 ASC
LIMIT 0 , 10

EXAMPLE 2:
Code:
// The search string
$searchstring = 'birds -bees';

// will produce this MySQL-query:
SELECT id
FROM articles
WHERE (
(
title LIKE 'i;rd%'
AND title NOT LIKE '¾e%'
)
OR (
text LIKE 'i;rd%'
AND text NOT LIKE '¾e%'
)
OR (
tags LIKE 'i;rd%'
AND tags NOT LIKE '¾e%'
)
)
GROUP BY column2
ORDER BY column1 ASC
LIMIT 0 , 10

WHAT I WOULD BE REALLY GRATEFUL FOR:
It's the first version, man! Go - find a bug, there gotta be a lot ;D .
I don't consider myself to be a programmer nor a very logical thinking person.
I write the code I need (if I can) but I usually have to realize later, that there'd be 10.000 better and easier ways.
If you've some time, please read through it and if you've any advice about structure and solutions but also coding-style and possibilities to simplify - please let me know.

Otherwise, please be my guest and test it, if you have some use for it.

THANKS:
This class is inspired by the search-class of Stephen Bartholomew
-> (http://aspn.activestate.com/ASPN/Cookboo...ipe/125901)
and the PHP search engine from roscripts.com
-> (http://www.roscripts.com/PHP_search_engine-119.html)
It uses the Porter Stemmer algorithm for English by Jon Abernathy <[email protected]>

Cheers,
trice

//EDIT: Apparently I can't submit a query like "% man %" (without the spaces - I guess you know how the MySQL-query would look like anyways.
#2

[eluser]trice22[/eluser]
Sorry for pushing - but is this idea that useless or the class so badly done? Even that would interest me... anyone?
BTW: If there's already something else (and probably better) existing, would somebody point me to it?
#3

[eluser]CI miller[/eluser]
It's not bad but very few users actually find the time to "learn" how a search-engine works (for a site). For the end-user dosn't bring so much value. More important for search results is to be ordered by relevance and for that you need some indexing (your pages or your database). Also the queries that contain "LIKE" are not very fast. To summarize
1. If you have lots of content you need indexing capabilities
2. If you don't have too much content then using LIKE is enough.
#4

[eluser]trice22[/eluser]
THANKS, that's something I can work with.
But you're talking about working with, what come back - right?
Would something like a listing do the trick?

Something like this?
1. result: contains all words in the right order
2. result: contains all the words
3. result: contains some of the words
4. ...

AND: How would you replace LIKE (besides indexing the DB) - with MATCH? - or am I mixing things up now?

Sorry for bugging, but this is really something I'd like to learn, since you need a good search for almost every site (and I'm really fed up with searches, that don't work at all).
#5

[eluser]Cybolic[/eluser]
I think CI miller means that the library is fine, but a tad slow, and that it's not advanced enough for a large site, and too advanced for a small one.

Personally I think it's a great addition to the CI Universe, but it's not fast enough for my site (that would be the large one ;-)).

Thanks for sharing it!
#6

[eluser]trice22[/eluser]
Thanks, Cybolic - now I'm starting to get it.
My goal is it, to understand, how a good search-function works and to write a class/library I could re-use.
After what I already know, it's important to balance the work between PHP on one side and the DB on the other.
I'm not sure how to achieve this, how to structure the results in a good way or how to speed it up (so it would work for larger projects, too. I guess, once it's sufficient for bigger sites, I'll be able to break into moduls - to build a system that suits every need.).
The reason why I've posted this here is to get some feedback in which direction I should continue (besides maybe able to give something back). But it seems that I hit just a spot in the middle.
If somebody has some opinions what makes a search function good and faster, please share it - and I wouldn't mind if somebody would have some examples.
I guess many of you had to deal with this problem, earlier and many might be interested in an easy/easier solution.

Thanks a lot,
-trice
#7

[eluser]magiccyril[/eluser]
I'm really intersted in this library, but the download link doesn't work (404 error). Could anyone give me a link, I would like to test this library.

Thanks
#8

[eluser]deadelvis[/eluser]
Me too. Can someone please repost/fix the link? I havent found any other CI search libraries yet.
#9

[eluser]trice22[/eluser]
I don’t actively support this library anymore, so be ready to fix things as you go.
And just to clarify this again: This is not a search library. It's — like CI's Active Record Class— simply a query builder. If you're looking for a full blown search library, you might want to check out something like Sphinx.

If you're still interested, drop me a PM and I'll send you a the files.

Cheers and thanks for the interest,
—trice




Theme © iAndrew 2016 - Forum software by © MyBB