Login

01-25-2008, 10:09 PM

[eluser]awpti[/eluser]
Here's the overall situation.

I've been doing hobbby-development for a number of years (almost 10!) and have finally accepted an actual job developing a sort of Project Management tool (with a very narrowly defined set of features).

I have no problem putting the app together along with a database, but I've never been a -great- database designer. I'd love some input on this and any pointers to improve my own skills as much as possible.

This is the current design.

Code:
CREATE TABLE s3pm_employees (

 `id` int not null primary key auto_increment,

 `role_id` tinyint,

 `username` char(32),

 `password` char(40),

 `first_name` char(32),

 `last_name` char(32),

 `email` char(128),

 `cell_number` char(30),

 `home_number` char(30),

 `total_completed` int,

 `skillsets` tinytext  # comma seperated list of skillsets (id)? SELECT... 

                       #s3pm_skillsets.. WHERE s3pm_skills.id IN ($meh->skillsets?)

);

CREATE TABLE s3pm_skillsets (

 `id` int not null primary key auto_increment,

 `skill_name` char(32) # Just what it sounds like, text representation of a skill

); 

CREATE TABLE s3pm_tickets (

 `id` int not null primary key auto_increment,

 `creator_id` int, #s3pm_clients.id

 `employee_id` int default 0, #s3pm_employees.id

 `request_type` char(32), # This may be an ever-changing. Another related table?

 `files_attached` int, # 

 `iterations` int default 10, # we subtract from this after each employee response

 `status` enum('Closed', 'Open', 'Admin-Review')

);

CREATE TABLE s3pm_files (

 `id` int not null primary key auto_increment,

 `ticket_id` int, #s3mp_tickets.id

 `file_type` char(3), #Probably only *imagetypes*, .txt, .doc, .pdf - ENUM?

);

CREATE TABLE s3pm_responses (

 `id` int not null primary key auto_increment,

 `ticket_id` int, #s3pm_tickets.id

 `response` text

);

CREATE TABLE s3pm_clients (

 `id` int not null primary key auto_increment,

 `username` char(32),

 `password` char(40),

 `first_name` char(32),

 `last_name` char(32),

 `email` char(128),

 `cell_number` char(30),

 `home_number` char(30)

);

Have I done anything blatantly stupid?

Am I looking at any huge headaches if I eventually need to extend this in some small manner? (Probably never happen, but I may take this codebase and build another app out of it)

I appreciate any thoughts on this. Thankfully, the client gave me a nicely detailed feature list, including general associations.

01-25-2008, 10:47 PM

[eluser]Michael Wales[/eluser]
I didn't read through everything but I did notice this:

Code:
CREATE TABLE s3pm_employees (

`id` int not null primary key auto_increment,

`role_id` tinyint,

`username` char(32),

`password` char(40),

`first_name` char(32),

`last_name` char(32),

`email` char(128),

`cell_number` char(30),

`home_number` char(30),

`total_completed` int,

`skillsets` tinytext  # comma seperated list of skillsets (id)? SELECT...

                       #s3pm_skillsets.. WHERE s3pm_skills.id IN ($meh->skillsets?)

);

CREATE TABLE s3pm_skillsets (

`id` int not null primary key auto_increment,

`skill_name` char(32) # Just what it sounds like, text representation of a skill

);

The comma-seperated list is one way to attack it, but the most common method - and one that has been popularized by Ruby on Rail's "has and belongs to many" relationship is a table dedicated to cross-referencing between 2 tables.

For example:

Code:
CREATE TABLE s3pm_employees_skillsets (

employee_id INT NOT NULL,

skillset_id INT NOT NULL

);

You would then use a JOIN query to return a result-set for a particular employee, joining the skillsets table, through the employees_skillsets table. It makes things a bit more difficult on the querying process (not on the engine, just on wrapping your brain around everything). But, it is far more extensible and easier on the server's processor (not having to explode() a string into an array, which is what I imagine you are currently doing, either that or searching via regex or str_* functions).

01-25-2008, 10:59 PM

[eluser]Armchair Samurai[/eluser]
Just from a cursory glance, may I ask why you're using CHAR instead of VARCHAR for all fields? If I had to hazard a guess, from the looks of the lengths of a lot of them (32 and 40), I assume you're either hashing them with md5 and sha1.

If you want to normalize the DB, I'd get rid of the skillsets column in the employees table and perhaps use join table.

EDIT
... which I just noticed Michael mentioned above. Oops.

01-25-2008, 11:08 PM

[eluser]wiredesignz[/eluser]
I tend to disagree with Micheal's suggestion even though his method is what the theorists recommend.

In the real world having a link table for that type of data is a real bottleneck. Imagine a table containing 5000 employees each with 10 skillsets, that creates 50,000 entries in the link table.

In my case I had table of 25,000 cars each with 40 features, a link table for that would contain 1,000,000 entries.

The comma delinited field will be faster to use when finding skillsets in your case and was when I looked up car features.

01-25-2008, 11:12 PM

[eluser]awpti[/eluser]
Interesting. I knew about that concept but never really understood it until now.

It makes a lot more sense, and I have a fairly strong understanding of throwing around JOIN queries.

There isn't any code behind this yet - my 'logic' behind this idea was;

$Skill_sets = '1,2,3,6,7,9'

SELECT skillsets FROM s3pm_skillsets WHERE id IN ($Skill_sets)

Only time i'd really need to imp/explode it is when a modification is made (removal of or addition to said list).

I look into it a bit more as I approach the point of actually putting code on paper. It's relatively simple to modify table structure and logic in such a narrowly defined area.

I'll be developing this application steadily, and piece by piece in the hopes of minimizing massive changes due to flow issues.

EDIT:

Wow, I'm slow. Had a buncha pullaways.

Anyway, I did some studying up on and it and do believe a comma seperated list would work better in the long run.

I just drew up the content there with little real thought to structure.

The only fields that will be hashed in any way is the password field (sha1). As far as the length values on most fields, I like dealing with 1/2/4/8/16/32... (you get the idea) lengths.

Is there a better way to handle field lengths for arbitrary, unknown things like names and email addresses?

01-25-2008, 11:32 PM

[eluser]wiredesignz[/eluser]
Even though I do like that query you have (I never knew about using `IN`) I would just pull the entire skillsets table into an array and use PHP array_keys to find the matching skillsets. (if the table isnt too big) Especially if you need to do repeated lookups.

01-25-2008, 11:34 PM

[eluser]Armchair Samurai[/eluser]
Not really in regards to unknown field lengths, I was just wondering.

But I am curious as to why you chose CHAR over VARCHAR. If I'm remembering correctly, VARCHAR carries a 1 byte overhead but is generally smaller in the long run that using char for everything.

For example: let's say a user enters 'John' for the firstname column, and 'Doe' for the lastname. If you use VARCHAR, the entries will use 4 (John) + 3 (Doe) + 2 (overhead) = 9 bytes. If you use CHAR(32), you'll end up with 64 bytes. I'd stick with CHAR for the password hash, but use VARCHAR if it's an unknown length.

01-25-2008, 11:49 PM

[eluser]awpti[/eluser]
As said, just created this in a rush. Almost all the fields will be flipped to varchar (that are not a fixed-length).

Me + rushing to create the actual layout (from a, literal, on-paper visual design Smile

)

01-27-2008, 04:08 PM

[eluser]lifewithryan[/eluser]
if it matters, I back Michaels solution...this is the way I've seen most people attack this problem. Aside from that, exporting the data to some other format may be sort of tricky as well since there are going to be numerous commas in that list. If someone was requesting a CSV dump, now your stuck figuring out the best way to do that and not mess up that field. (Granted you can try PIPE delimited files etc but you always run into issues no matter what you pick).

Having that data in a reference table is cleaner and more manageable. Large tables aren't a big deal anymore with the speed of today's hardware. On top of that good indexing, and more specific queries should cut down on any overhead.

For me, data is king...keep your data clean and you'll have few problems in the long run.

...one more thing too: having all those values in one field in 5000 records means you have to go and look through all 5000 records, inspect all the data in that field and then filter out the data you are looking for where as if you ran the query on the reference table, (indexed by skillset ID), asking it for the skillsets you are looking for and joining in the employee table, you'll only be searching through a small subset of the employees table, rather than searching a VARCHAR field for an occurrence of some skillset id. You could filter all that with PHP but why not let SQL do what it was made to do and do it for you?

01-27-2008, 04:19 PM

[eluser]wiredesignz[/eluser]
I thought performance was king. Wink

EDIT: None of the above diatribe means it can't be done in a single SQL query.

Code:
SELECT * FROM employees WHERE LOCATE( $skillset_id , skillsets )