Login

03-30-2009, 04:24 PM

[eluser]peterbz[/eluser]
I have a car site that allows users to post different information about used cars, wheels, auto parts, etc.

Now each of these categories also posses sub categories. For example, used cars have different makes such as Honda, GM, etc. Wheels are divided into its sizes, etc.

However, each of the categories (used cars, wheels, auto parts, etc) have different attributes. For example, used cars will have kilometers driven, price, transmission, etc.

Now the question is, should i divide used cars, wheels, and auto parts into separate databases or should they all be on the same database with a category id? The only problem is that each of them have their own unique attributes (used cars have transmission, wheels have size, etc)

Version 1

TABLE: Used Cars

TABLE: Wheels

TABLE: Auto Parts

TABLE: UsedCarCategories

TABLE: WheelsCategories

TABLE: AutoPartsCategories

Version 2

TABLE: Listings

TABLE: Categories

Which version is better? The reason for Version 2 is because more different categories may be moved around or added in the future.

03-30-2009, 04:41 PM

[eluser]jedd[/eluser]
My feeling is that, especially if you have a known number of categories, that you do one category per table, all in one db of course.

Quote:Now each of these categories also posses sub categories. For example, used cars have different makes such as Honda, GM, etc. Wheels are divided into its sizes, etc.

I'd say that manufacturer is just another attribute. Similar, size is an attribute of wheel.

Quote:However, each of the categories (used cars, wheels, auto parts, etc) have different attributes. For example, used cars will have kilometers driven, price, transmission, etc.

An attribute table will list every type of attribute possible.

An attribute_item table will correlate the items with an attribute. There'd probably be a column for value in there.

Doing sanity checking on this (mileage can't be < 0, wheels can't have a transmission type, etc) will present some challenges, but they'd exist no matter what your schema.

03-30-2009, 05:05 PM

[eluser]TheFuzzy0ne[/eluser]
If it were me, I'd go with version 2. It should also make it easier for you to search the database without having to query each table.

03-30-2009, 07:16 PM

[eluser]peterbz[/eluser]

Quote:An attribute_item table will correlate the items with an attribute. There’d probably be a column for value in there.

Doing sanity checking on this (mileage can’t be < 0, wheels can’t have a transmission type, etc) will present some challenges, but they’d exist no matter what your schema.

So are you saying something like this?

ATTRIBUTES

Code:
attr_id  |  attr

1           mileage

2           transmission

3           size

ATTRIBUTE ITEM

Code:
attr_id  |  listing_id  |  value

1           43             3000km

1           107            2911km

2           43             auto

2           107            auto

3           92             17 inch

But the downside of having an attribute table and putting all items into one single table is that each select statement will take very long since the ONE table will contain so many rows

03-30-2009, 07:32 PM

[eluser]jedd[/eluser]
You mean a SELECT * WHERE listing_id=$x .. ? I'd certainly be INDEXing all the columns there (not sure how useful that is on the attribute column, but nonetheless). I think the database will be pretty snappy with such requests. You can crank up some test data if you are concerned, though.

03-30-2009, 07:34 PM

[eluser]peterbz[/eluser]
Have you ever done something like this using this table design? How would you do this if you were to program this project?

03-30-2009, 08:35 PM

[eluser]slowgary[/eluser]
Jedd is a normalization junkie, I would take his opinion very seriously. Wink

It makes sense to me to do this:

Code:
items

-----------------------------

| item_id | item_name       |

-----------------------------

| 1       | 96 honda        |

| 2       | goodyear tires  |

| 3       | mustang fender  |

-----------------------------

attributes

-----------------------------

| attr_id | attr_name       |

-----------------------------

| 1       | mileage         |

| 2       | size            |

| 3       | color           |

-----------------------------

values

-------------------------------------

| item_id | attr_id | value         |

-------------------------------------

| 1       | 1       | 74962         |

| 1       | 3       | red           |

| 2       | 2       | 16 inches     |

-------------------------------------

It's the only way you'll really be able to expand the items that you can store without having to add database tables and write new code. Also some of the attributes will surely be cross-item, like weight for instance. So you could search by a similar attribute like "SELECT * WHERE `weight` < '50lbs'" and you wouldn't have to write code to figure out if the item has a weight field or not. It'd be a nightmare any other way.

The frontend would be more consistent this way too, as the user would always use the same open ended interface...maybe just a dropdown of attributes next to a blank input and an 'add' button. You could use JavaScript to build a list (each with an 'x' button for deletion), then when the user is satisfied with the info they'd save the item.

03-31-2009, 05:24 AM

[eluser]jedd[/eluser]
Fan ... rather than junkie.. Smile

I used to have a bunch of very bad habits - but people much smarter than me have (mostly) talked me out of them. One of the big mantras I get is 'aim for tall, thin tables - lots of them'. Which pretty much sums up the goal/effect of normalisation.

If I were to do this .. ?

Your and slowgarry's table structure is what I would lean towards. I'd have an id column on every table. I've been pondering how you'd try to force some sanity into the data on the way in - an attribute_attribute table, say - but can't come up with an elegant approach. I'd contemplate separating attributes into categories, specifically numeric versus everything else, just to make searches and calculations easier - but this could get messy. You'd need two columns in your attributes table to refer to the different attributes_items/values tables, for starters.

I like the idea of abstraction, but as I said earlier on, if you have a fixed number of categories, and those categories are substantially different as far as attributes go, and that number of categories will never change, and that number is small (anything up to 10, say) -- then I'd probably look at keeping them separate. What I'd watch out for, if you follow that design, is if you are ever tempted to make a 'misc' category .. then you've probably failed, as you now have a hybrid system, and you might as well have everything abstract (ie. everything is implicitly in a 'misc' category - as per the current discussion).

As for quelling your concerns on performance - really, put some test data together and crank up the CI Profiling library, and run some tests. We tend to underestimate the speed of databases, and consequently do some silly things with that expectation in mind.

Consider this forum page, as an example - and the number of DB queries required to generate it. 15 people per page - so that's 15 lookups to the user table (which is around 70,000 rows at the moment) to grab username, create_date, avatar and signature. It also grabs total-post count for each user - and if they've done it properly this is 15 x count() calls against the presumably quite massive message table (upwards of 110,000, judging by my current URL). There's a check/update in there to see/set if you've read the latest message. And that's just the body - the header has a few more DB calls of its own. Point being, this page loads pretty snappily, despite doing upwards of 35 db calls against two 70,000+ row tables.

03-31-2009, 07:53 AM

[eluser]peterbz[/eluser]
I just did some research and found out this style is called EAV (Entity Attribute Value). Apparently, there are 2 schools of thoughts: one that frowns upon it since it is an anti-pattern, and others who use it because their requirements are unclear or are always changing.

What I'm deciding to do in the end is adapting the inheritance model.

Since all 3 of these "Categories" all are listings, I can essentially create an class Listing and the 3 Categories (auto parts, cars, wheels) can extend from this class. In essence, I can create a table Listing that has all the similar attributes (title, user_posted, etc) and create 3 other tables wheels, cars, auto parts (each with their respective attributes PLUS a listing_id to link back to the listing table). In this way, if I need other foreign keys such as table Images that needs to be linked to the original listing, I can just link it with listing_id instead of have 3 different foreign keys or 3 different tables for Images.

03-31-2009, 08:04 AM

[eluser]jedd[/eluser]
I like the sound of this approach!

I haven't pondered it deeply, but I would consider having the ID field of wheels/cars/autoparts as the same as the listing.ID. Given everything appears in listing, then each item's listing.ID is going to be unique.

Having wheels.ID (etc) as the PK would enforce uniqueness, and you'd wrap some code around inserts to make sure there were no attempts to put in duplicates, but this should be a modest effort.

Note that wheels could technically be seen as a part, so think carefully about whether you want to separate them.