Login

01-27-2008, 07:30 PM

[eluser]lifewithryan[/eluser]
Performance will do you no good if your data is messed up Wink

I see what you're saying about that query, however I still think querying a reference table is cleaner than stuffing a bunch of ID's in one field. How do you index something like that? I think most DBA's would be ill if they uncovered a schema like that. But as I always say...rules can be broken if it means the jobs gonna get done. (I just try not to break them often).

01-27-2008, 09:16 PM

[eluser]wiredesignz[/eluser]
Can you think of any reason to index that field?

I can assure you that this method was an order of magnitude faster than using a link table, and even moreso when my main table grew to over 25000 records.

I don't see bloat of any kind as an advantage.

EDIT: In fact MYSQL uses this technique itself for its User Privileges Table.

01-28-2008, 08:24 AM

[eluser]lifewithryan[/eluser]
cool, glad its working for you (and them).

Like I said, do what gets the job done right? All I'm saying is, I cringe when I get pulled onto a jobsite and see one field referring to multiple other fields...but that's just a matter of preference I suppose. I'd really be curious to see how much slower/faster it is. Perhaps I'm just used to using bleeding-edge hardware.

Where I'm currently working, we have several tables with hundreds of thousands of records each. We don't see any slowdowns at all using reference tables like this.

Anyway...do what works for ya right?

01-28-2008, 05:37 PM

[eluser]wiredesignz[/eluser]
Consider this scenario:

You have a `features` Lookup table with 80 rows, it will rarely get data added.

You have a `cars` table that grows and shrinks daily as clients upload their data (so far 25,000 rows). Each `car` can only have a maximum of say 40 features stored in the `cars.features` column.

You load the entire `features` lookup table into memory and use script to match the `cars.features`.

In this situation you can see memory transactions will be much faster than querying a Pivot table to find features for each car.
(80 rows searched in memory against 1,000,000 rows searched in a pivot table, to find 40 matches, there is no comparison.)

The motto here is don't do things simply because you read it somewhere or because someone tells you its the correct way. They aren't always right.

This is probably the same setup awpti will use in his application.

BTW, Them is your MYSQL database too.

01-29-2008, 09:05 AM

[eluser]lifewithryan[/eluser]
Oh I had a bunch of stuff to say on here but you definitely have me thinking...I'm going to test some stuff out here on my end just to get some benchmarks...

I can see that there'd be a huge amount of records to search through for SQL and using a reference table, and can see that loading the features table into some sort of array in memory should be fast.

Unless I'm not following you correctly though, its the 25000 queries that PHP will be running on the back end to compare the array contents to each car's feature column.

do you have any code I can look at to how you are managing this?

so if I want to list all cars and their features in some sort of report I'd first have to grab every car (25,000) go through each car.features field and lookup each id in that column and go grab the name of the feature from the features table where that I'd matches? So basically I read the features table into a key:value pair type array, then start at car0, read its feature list and retrieve that key:value pair from the features array. Hmmm....i want to play with this some.

01-29-2008, 10:40 AM

[eluser]lifewithryan[/eluser]
Just messing around with this here, (because I'm a glutton for punishment, and bored), but just messing with that "LOCATE" query. The only thing I've noticed is that if you have to be careful on your filter string.

I created a table with 25000 cars and 40 random features (out of 80) and ran that query looking for every car that had a feature with an ID of 1 in its value string. The problem was, I searched wrong in that I did "... LOCATE('1', features). However this returned every car that had a 1 in that string. Meaning it saw 31 and since that contains a '1' it returned that car. Needless to say the query returned 24,998 records (all but TWO, that can't be right...).

I fixed it, briefly, by doing "...LOCATE(',1,', features), and that was much better finding only 9,260 records. Much better...however...what if feature 1 was at the "end" of the list? This query would skip any record that happened to have the number one at the END of the feature list since it wouldn't have a trailing comma. (We're starting to see that we're going to have to program around 'dirty data' issues in this case by making sure that our features list gets written to the DB in a very specific manner.)

I will say that MySQL is reporting rather fast response times. So now I'm going to try things via SQL and see how much of a time difference there is.

Here's a breakdown of records to queries:
LOCATE('1', features) -- 24, 998
LOCATE(',1,', features) -- 9,260
LOCATE(',1', features) -- 24,921
LOCATE('1,', features) -- 24,582

So which one of these queries tells us how many cars actually have feature 1?? I don't think ANY of them are correct, but the closest would have to be the second one. However, as stated above, that doesn't cover feature 1 being at the end of the list. Looks like we've got some cleanup to do there. So much for our Data being King...who can trust that?

Still more testing to do on my end...this is getting interesting...

Next I gotta mess with the mapping/reference/pivot table solution.

01-29-2008, 12:06 PM

[eluser]lifewithryan[/eluser]
Status so far, not including PHP views but just backend processing time:

Using a join table to find only those cars with feature 1:

Code:
SELECT c.name, f.name from carfeatures cf LEFT OUTER JOIN carssql c on c.id = cf.carId LEFT OUTER JOIN features f on f.id = cf.featureId where cf.featureId = 1;  (took .59 seconds)

This yields 9890 records which is quite close to our "not quite right query" using the LOCATE functionality...for review the closest we got was:

Code:
select * from carsphp where LOCATE(',1,', features);

Yielding 9,260 records...however I've found that the following three queries add up the correct number:

Code:
select count(*) from carsphp where features like '%,1,%'; -- 9,260

select count(*) from carsphp where features like '%,1';   --  330

select count(*) from carsphp where features like '1,%';   -- 300

Those three queries give me the count of cars I'm looking for 9,890 cars with feature 1. It took me three queries to sort through our 'bad data' and only one to sort through the mapping table. The 'messy' way cost me .4 seconds but gave me the wrong record count forcing me to run it two more times with different parameters, equalling 1.2 seconds.

It took me one query to use the mapping table, i got the right number of recordds AND it took me .59 seconds.

The final test would be to pass the result sets to the views and see how long it takes to render the given list. But at this point, that's overkill because you want to be sure your data is correct. Data is KING.

Who do you want to buy a car from, the guy that may take .5 seconds (given the benefit of the doubt) to get back with you but forgets your power windows or the guy who may take .6 seconds to get back with you and remembers you wanted the power windows as well? Perhaps he'll even offer you a cup of coffee...

I've got the source code if anyone wants to see it, and if necessary, i can dump the the tables so people can do their own investigation. For me...I'm going to continue using the mapping table, since it will give me the subset of data I'm interested in and all PHP has to do is render it...no filters, no looping through arrays, etc.

*phew*

01-29-2008, 02:19 PM

[eluser]awpti[/eluser]
This discussion and my own experimentation has come to the same general conclusion.

Using the 'has and belongs to many' methodology makes more sense in this situation and my own. After some thought, I realized I'd need to search through people based upon a given skillset. I really don't care to pull that much data from a DB when I can use a quick JOIN query to match a given set of requested results.

Thank you all for your input, this turned out to be quite an interesting discussion. Every method has it's merits. Some just put undue strain on the development process of a fairly small application that will never likely never see use outside of this small, 15-man company.

01-29-2008, 03:00 PM

[eluser]codex[/eluser]
I thought the title read 'database design injury' :lol:

01-29-2008, 03:07 PM

[eluser]wiredesignz[/eluser]
@ryan: You sure did a lot of work there.

Each feature obviously has to be uinque, if you had used a two character feature id (ie '01') instead of simple numerics you would have achieved better results. (no point in having flawed logic)

However, in this system we use alpha id's (ie: 'PW' = Power Windows, 'AC' = Air Conditioning) so a typical car.features column would contain something like: 'AC PW PS EFI CC CL ABS ... ' etc. This column is populated by the Car Dealer from his own inventory management software.

To make it more confusing different Dealer systems provide different abbreviations for the same car feature. ie: 'EW' or 'PW' = Electric Windows or Power Windows. So the lookup table compensates by having both 'EW' and 'PW' description as 'Power Windows' etc.

I'm glad you enjoyed playing with this, But we'll stick with the lookup system as it is far more versatile.