MySQL queries and general program speed.
#1

[eluser]SpooF[/eluser]
This has nothing to do with CodeIgniter.

One of the projects I'm currently working on requires a massive amount of data collection. Our goal is to reach 20,000 requests an hour, which works out to about 333 requests per minute, or roughly 5.6 per second. On each request we download ~37 KB of data (XML). Each request contains 20 independent blocks, and we extract the parts we want from each block. Each block is a dataset, so we do 20 inserts into the database per request. That's roughly 110 inserts per second (if we actually hit 20,000 an hour). At a rate like this, would it be faster to combine those 20 calls into a single query?
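For reference, by combining them I mean one multi-row INSERT instead of 20 separate queries, something like this (untested sketch; the blocks table, its columns, and the shape of the parsed data are just placeholders for whatever the real schema ends up being):

Code:
<?php
// One multi-row INSERT per request instead of 20 single-row queries.
// Assumes a hypothetical table: blocks (request_id INT, name VARCHAR, value TEXT).
$mysqli = new mysqli('localhost', 'user', 'pass', 'collector');

$values = array();
foreach ($blocks as $b) {   // $blocks = the 20 parsed blocks from one request
    $values[] = sprintf("(%d, '%s', '%s')",
        (int) $b['request_id'],
        $mysqli->real_escape_string($b['name']),
        $mysqli->real_escape_string($b['value'])
    );
}

// One round trip to the server instead of 20.
$mysqli->query('INSERT INTO blocks (request_id, name, value) VALUES ' . implode(', ', $values));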

To get to 20,000 requests an hour, I'm thinking of running this as a long-running process. That way, as soon as one request completes the next one starts, and there's no downtime waiting for cron to come along and kick the PHP script off again (something besides cron really should be doing this kind of work). It would also keep us from opening a new database connection every time we make a request; instead there would be a single connection.
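Something like this is the shape of the worker I have in mind (just a sketch; fetch_next_request() and insert_blocks() are hypothetical stand-ins for the download and insert steps):

Code:
<?php
// Long-running worker: one DB connection opened once, then reused for every request.
set_time_limit(0);                 // don't let PHP kill the script after 30 seconds
$mysqli = new mysqli('localhost', 'user', 'pass', 'collector');

while (true) {
    $xml = fetch_next_request();   // hypothetical: download the next ~37 KB XML document
    if ($xml === false) {
        usleep(250000);            // nothing queued right now, back off for 250 ms
        continue;
    }
    insert_blocks($mysqli, $xml);  // hypothetical: parse the 20 blocks and do the batched insert
}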

I'm thinking of writing the long-running process in Python. Do you think that would be a good choice, or should I do it in C/C++?

Also, this is just our data collection; it doesn't include post-processing or client requests to the database.

I'm not even sure it's possible to do this from a single server at the moment. I might have to set up a few and split the work between them. I've been running some tests using PHP, and just downloading the content and parsing the XML into an object is taking quite a bit of time. I haven't done any benchmarks with Python yet; I'm working on that right now.

[EDIT] Yeah, there's no way to get 5.6 calls a second; there's just no way to download the information fast enough, at least from a single process. Next I'm going to try running a few processes. 200-some KB/s is not a lot at all for a server with a 10 Mbps connection, so I have a feeling running a few processes is entirely possible.
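Something along these lines is what I mean by a few processes (rough sketch, assuming the pcntl extension and running from the CLI; run_worker() is a hypothetical function that handles one slice of the URL list):

Code:
<?php
// Split the work across N child processes so the downloads overlap.
$workers = 4;
for ($i = 0; $i < $workers; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child: fetch/parse/insert its share of the URLs (hypothetical helper).
        run_worker($i, $workers);
        exit(0);
    }
}
// Parent: wait for every child to finish.
while (pcntl_waitpid(-1, $status) > 0) {
    // nothing to do, just reap children
}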
#2

[eluser]gigas10[/eluser]
What could you possibly be doing that requires that many inserts per second? Maybe you should just export all this data to a text file and do one bulk insert from that file at the end of the day?
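If you went that route, MySQL's LOAD DATA INFILE would handle the end-of-day import in one shot, roughly like this (file path, table, and columns are just placeholders; the file has to be readable by the MySQL server and you need the FILE privilege):

Code:
<?php
// Collector side: append one tab-delimited line per block instead of inserting right away.
$line = implode("\t", array($requestId, $name, $value)) . "\n";
file_put_contents('/var/spool/collector/blocks.txt', $line, FILE_APPEND);

// End of day: bulk-load the whole file in a single statement.
$mysqli = new mysqli('localhost', 'user', 'pass', 'collector');
$mysqli->query("LOAD DATA INFILE '/var/spool/collector/blocks.txt'
                INTO TABLE blocks
                FIELDS TERMINATED BY '\\t'
                (request_id, name, value)");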
#3

[eluser]dmorin[/eluser]
It sounds like you're writing an app that will pull all of this information rather than having it pushed to you like a normal website? If that's the case, where/how do you get the URLs that you're supposed to request the information from?

If you need lots of requests happening per second, then you'll need to run them in parallel. You can do this either by forking your PHP code and running multiple copies at the same time, or by using multi-curl to make multiple requests simultaneously (see http://www.jaisenmathai.com/blog/2008/05...ulti_curl/). If all you're really doing is downloading a file, though, you may just want to have it run wget. Depending on how you call it, you can make a system call without waiting for it to finish, which would let you spawn them very quickly, but you wouldn't know whether they completed successfully.
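The core of the multi-curl approach looks roughly like this (untested sketch of the curl_multi_* loop; $urls is whatever batch of request URLs you're working through):

Code:
<?php
// Fire off a batch of requests at once instead of one at a time.
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Run all the transfers until none are still active.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);        // block briefly so we don't spin the CPU
} while ($running > 0);

// Collect the XML bodies and clean up.
$xml = array();
foreach ($handles as $i => $ch) {
    $xml[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);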

I would take a multi-tier approach, so this first portion only downloads and saves the XML files. Then have another process parse the XML files in parallel. All of this can be done in PHP, but that doesn't mean it's necessarily the best choice either; it really just depends on what you're comfortable with.
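Roughly, the split would look like this (paths and the block element name are just placeholders): the downloader only writes raw files into a spool directory, and a separate parser process works through whatever it finds there.

Code:
<?php
// Stage 1 (downloader): just save the raw response, no parsing here.
file_put_contents('/var/spool/collector/incoming/' . uniqid('req_') . '.xml', $body);

// Stage 2 (parser, separate process): work through whatever has been downloaded.
foreach (glob('/var/spool/collector/incoming/*.xml') as $file) {
    $doc = simplexml_load_file($file);       // parse the saved XML
    foreach ($doc->block as $block) {         // hypothetical <block> elements
        // extract the fields you care about and do the (batched) insert here
    }
    rename($file, str_replace('/incoming/', '/done/', $file));  // mark it processed
}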

Good luck.



