Login

09-26-2009, 08:51 AM

[eluser]jedd[/eluser]
Having pondered this as a background task today (two hours on a ferry certainly allows for some mind-wandering) here are the things that worry me about your question. Don't take offence at these - you may well have all the following points covered - but based on what you've revealed, this explains why I'm thinking what I'm thinking.

You've hinted (I concede it may be a typo, but nonetheless ...) that you believe 1 x 1,000 == 1,000 x 10. This suggests a lack of consideration of the metrics of your designs.

You've not mentioned anywhere what database you're using. Most of us around here would probably assume MySQL, but it's odd that when asking about edge cases of a particularly database design, you don't actually say. Or what operating system (or indeed what file system). Some file systems choke when a directory ends up with thousands of directory entries under it (as you'd get with MySQL and the 'thousands of databases' approach).

A corollary - you are asking a fairly substantial design question about database capabilities on a forum for a PHP framework. This is disturbing - in my order of preference - because a) your DBA(s) should have come up with a third design that's better than your stated two, b) your DBA should know the answer to the question you asked (comparing the two choices) based upon the SQL engine you're using, c) your DBA should be able to rattle up some test database instances with mongs of random data in order to answer your question definitively and also set up a properly designed schema to answer your most recent question at the same time, d) your DBA should, in any basic test suite, also be counting the hours it taskes for a MySQL server to flush out all the data (ie. how long it takes to shut the instance down) and how long it takes a corruption test / fix run to complete after a power or disk failure, backup options and limitations for this many tables or databases - the kinds of thing that are good to know before you commit to a major design decision, e) your DBA should be pursuing this question on the forums / mailing lists for the database product, f) it implies (quite strongly) that this is a greenfield scenario and/or you have nothing comparable in play already that you can learn from, so all the previous concerns are now amplified by the incongruity of the original question.

EDIT: Oh, forgot one other thing I wanted to say here. You've not been very explicit with your rationale for discounting the 'proper schema' approach out of hand. There's only two reasons that you've mentioned - first, that you don't want WHERE clauses on your SELECT statements (though it's not clear why you don't want to use WHEREs, or how you think you'll write a system of this scale without the occasional WHERE slipping in), and something about account holders buying a 'namespace' and wanting to have them not share tables with other users. With the former, well, it's just a bit silly really, so let's ignore that one. With the latter, it's unclear if you are concerned about security or performance (or perhaps something else). If it's security, then I can almost understand why it'd be a nice marketing move to run separate databases - it implies (while not actually committing you to) improved security. Of course whether it's a real or imagined improvement depends entirely on the quality of your developers.

09-26-2009, 09:18 AM

[eluser]Spicer[/eluser]
Thanks for all the extra feedback. I get it. So assuming we cannot change the way we are doing that. Assuming we can not work with the DBA to make changes.

You all keep saying I am shooting myself in the foot. Why?? Maintaining the data, Maintaining the schemes are not an issue. Database is mysql.

-> Is there a performance issue to doing this? If so explain with details.

-> This is what we have not tested well and why I am asking here. Lets say we get 1000 different account requests per hour and each account makes 100 requests per hour. So that would be 1000 (database or unique table prefix requests) x 100. Assuming they are all coming in at random times so some could be concurrent or coming in while the other requests are still being process.

I am looking for answers like; "Do the table prefix, different databases load tons of extra overhead into memory"; "Do the different database idea because that will give you cacheing per database so you will have better performance".....

EDIT: There is no duplicate data across the databases or table prefixes. For example, there is no "states" table that holds all 50 states and is copied over and over again to each database or table prefix.

Also, we are using where clauses. I was just using that as an example to explain what the different results would be.

09-26-2009, 09:48 AM

[eluser]Spicer[/eluser]
Here are some good post outlining my issues and thoughts. Not many people talking about my table prefix idea, but people seem to be split on many databases or one database. No one is really talking about performance issue. Mainly maintenance issue (which I am not worried about). We have always experienced better performance with many databases, but wondering if there is something we are over looking.

http://discuss.joelonsoftware.com/defaul....610178.21

09-26-2009, 10:26 AM

[eluser]jedd[/eluser]
[quote author="Spicer" date="1253996333"]
-> Is there a performance issue to doing this? If so explain with details.
[/quote]

My reaction to this .. is that it's precisely the kind of thing you should be explaining, with details, to yourself.

An organisation with a commercial imperative and the resources to do its own testing - it seems way more appropriate than asking on a PHP framework forum and then arguing with the opinions you receive back.

It'd be nice if there was even a hint of PHP or CI stuff somewhere in your questions.

That, and the claim that you are 'not worried about' maintenance issues, encourages me to seek entertainment elsewhere. Good luck with it all.

09-26-2009, 10:55 AM

[eluser]InsiteFX[/eluser]
Like jedd has said above!

Have you check the memory issues?
How much memory does the server have?
What size will the tables and indexes be?
Is this MySQL Free or enterprize?

You do not state what kind of server or OS this is going run on.

These are factors to good database design.

Enjoy
InsiteFX

09-26-2009, 11:01 AM

[eluser]Spicer[/eluser]
Thanks for the reply.

-> Ubuntu 9.04
-> VPS so we can scale up on the memory. (we keep an eye on memory using 'free') If we start to swap we bump up the memory.
-> Mysql Free 5.1
-> tables and indexes are based on how much the account loads up. One account might have very little data and another account might have over a gigabyte of data.

09-26-2009, 11:08 AM

[eluser]InsiteFX[/eluser]
Then I think you shoud goto MySQL and read the limits of tables etc. for the free version.

Enjoy
InsiteFX

09-26-2009, 11:24 AM

[eluser]Spicer[/eluser]
yeah. I will not be hitting those limits. It seems most limits are bound by the filesystem anyway.

09-26-2009, 12:28 PM

[eluser]bretticus[/eluser]
I sympathize with you. I was handed a MySQL client database with 3 million orders in one table. A report query, that an IT guy there had made, was taking about 20 seconds to return a relatively small bit of data because that order table was being referenced. An index helped a lot LOL. Big Grin

However, the shear size of the table and the organizations unwillingness to throw more resources at the problem was frustrating. In the end, it was poor database design. However, I have often thought that it wouldn't have hurt to archive that data elsewhere. I think they actually did that since they didn't need to reference orders from 6 years ago. Smile

Honesty, I would like to hear from the other posters their suggestions about how to handle tables with millions of records through better schema design and coding practices. Not a challenge in any way. I am sincerely interested on how you might handle that scenario.

Thanks! And good luck Spicer.

09-26-2009, 01:15 PM

[eluser]Wuushu[/eluser]
What about the partitioning feature new in MYSQL 5.1? (all in one table)