02-19-2009, 06:52 AM

[eluser]ArcticZero[/eluser]
Alright, I will do that. :)

By the way, here's another issue I noticed. With the sample data in that dump I uploaded, it hits the memory limit much faster than when I send data generated by the test data populator script.

The sample dataset only has 970+ rows, while I've tried testing up to 10000 sales rows (including 100000 sales items) generated by the script. But again, the sample data seems to consume a lot more memory and at a more rapid pace, as shown by an open task manager window. Any ideas on this?

02-19-2009, 07:13 AM

[eluser]TheFuzzy0ne[/eluser]
From what I can see, the data in your table columns for the sales table is on average almost twice the size (in bytes) of that generated by your populator script. That's what I meant about variable data. What I meant to say was "variable-length" data.

02-19-2009, 07:24 AM

[eluser]ArcticZero[/eluser]
I see. I suppose most of the processing lies in the actual sending of data. I mean, even disabling all server-side processing in the submit_transactions function does little to drop the memory usage. I guess it's just impractical to send a massive amount of data at once. That's some great input. I'll be working on a more scalable solution right away. :)

02-19-2009, 07:52 AM

[eluser]TheFuzzy0ne[/eluser]
One more thing I should point out, which may make your job more difficult but should not be ignored, is race conditions.

If the client is grabbing various pages and more data is inserted into the database during this process, it may mean that the client gets one or more rows of duplicate data.

It could also work the other way if a row were deleted, in which case the client may skip one or more rows of data.

I'm not sure if the above really matters that much, but if it does, there are some solutions. Using hashes to compare data would probably work. The client could send each row's ID along with a hash of its data; the server could then pull out all of those entries and return the ones that have changed based on the supplied hash, along with information about any rows that have been deleted. Sorry for complicating everything.
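To make the hash idea concrete, here is a minimal sketch in Python (the thread itself is about PHP, so treat this as language-neutral pseudocode that happens to run). The names `row_hash` and `diff_against_server`, the SHA-1 choice, and the dictionary layout are all assumptions made for illustration; nothing here comes from the actual application.

```python
import hashlib
import json


def row_hash(row):
    """Return a stable hash of a row's contents (hypothetical helper)."""
    # sort_keys makes the hash independent of key order in the dict.
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def diff_against_server(client_hashes, server_rows):
    """Compare the client's {id: hash} map with the server's {id: row} map.

    Returns (changed, deleted): rows whose current hash differs from the
    one the client sent, and IDs the server no longer has.
    """
    changed = {}
    deleted = []
    for row_id, client_hash in client_hashes.items():
        row = server_rows.get(row_id)
        if row is None:
            deleted.append(row_id)      # row was removed on the server
        elif row_hash(row) != client_hash:
            changed[row_id] = row       # row was modified on the server
    return changed, sorted(deleted)
```

Rows that exist only on the server are not reported by this sketch; they would still have to be picked up by the normal paging requests.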

Another way to get around this (just to complicate things further) is to keep your script hard-limited, as you are coding it now, so that it only gives out x number of results, but instead of the client requesting a page number, it requests an ID.

So starting from the top, let's assume that the client requests ID 1 (which would essentially be the first page). The server would check, and grab the first x number of results where the ID is greater than or equal to 1 (ordered by ID, of course). The server passes them back to the client, the client does whatever it needs to do, and then makes another request to the server, for ID (insert last ID received here) + 1. Using this method will ensure that the order of the returned dataset won't change. Any products added during the process will of course be picked up on the last page, because if you're using auto increment in your column, new rows will always be inserted at the end.
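A rough sketch of that ID-based paging, written in Python purely for illustration. An in-memory list stands in for the database table, and `fetch_page` and `fetch_all` are hypothetical names; on a real server the lookup would presumably be a query along the lines of `SELECT ... WHERE id >= ? ORDER BY id LIMIT ?`.

```python
def fetch_page(rows, start_id, limit):
    """Server side: return up to `limit` rows with id >= start_id, ordered by id."""
    matching = sorted(
        (row for row in rows if row["id"] >= start_id),
        key=lambda row: row["id"],
    )
    return matching[:limit]


def fetch_all(rows, limit):
    """Client side: keep requesting pages until the server returns nothing."""
    collected = []
    next_id = 1
    while True:
        page = fetch_page(rows, next_id, limit)
        if not page:
            break
        collected.extend(page)
        # Ask for the row after the last one received, not for "page N + 1".
        next_id = page[-1]["id"] + 1
    return collected
```

Because each request is anchored to the last ID received rather than to an offset, gaps left by deleted rows do not cause rows to be skipped or repeated.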

Hope this helps.

02-19-2009, 08:30 AM

[eluser]ArcticZero[/eluser]
An excellent suggestion. However, this script is only run once per day, after closing hours. This is when there is zero activity coming from the client (no rows are being added), so it's fine.

Right now, I'm setting up a multi-part request in which the server sends a callback after it receives a batch of data, asking for the next set. This goes on until no more rows are available, and the client sends an "All OK" message, and the process is finished.

02-19-2009, 08:36 AM

[eluser]TheFuzzy0ne[/eluser]
Phew! I was starting to get worried that you might be implementing a solution that wouldn't work effectively. Fortunately for me though, RPC calls are controlled. Bonus!

Good luck with your project, and I hope you get back on target soon. :)

02-20-2009, 12:27 AM

[eluser]ArcticZero[/eluser]
Just wanted to post back and tell you that I managed to solve the problem. Basically, it works something like pages on a multi-part list, as you originally suggested. The number of items per "page" is user-settable. The client gathers rows up to the limit, and sends the data to be processed.

Once a response is received, the client selects the same number of rows, starting from the point where it left off, and repeats the process. Once no more sales rows can be retrieved, it moves on to the other processes, and the server reacts accordingly based on the command sent. Using this method, 25 rows per batch, and the old sample dataset, my memory usage never went above 99 MB. :)
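For anyone reading later, the batching loop described above can be sketched roughly like this. It is written in Python for illustration only and is not the actual client code; `send_in_batches` and the `send` callback are hypothetical names, and a real client would select each batch from the database instead of slicing a list that is already in memory.

```python
def send_in_batches(rows, batch_size, send):
    """Send `rows` in consecutive batches and return the number of batches sent.

    `send` is any callable that transmits one batch and returns a truthy
    value once the server has acknowledged it.
    """
    offset = 0
    batches = 0
    while offset < len(rows):
        batch = rows[offset:offset + batch_size]
        if not send(batch):
            raise RuntimeError("batch starting at offset %d was not acknowledged" % offset)
        # Continue from the point where the previous batch left off.
        offset += len(batch)
        batches += 1
    return batches
```

Only one batch is handed to `send` at a time, which is what keeps the size of each request bounded regardless of how many rows there are in total.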

The only drawback is that the script takes ages to complete, since it keeps bouncing back data. But with the execution time limit disabled, that is a non-issue. Again, I'd like to thank you for all your help. I would probably have taken ages to solve this problem without your advice. Many thanks! Support forums need more people like you, definitely. :)

02-20-2009, 03:31 AM

[eluser]TheFuzzy0ne[/eluser]
The only other thing I can suggest you do (if and when you want, of course) is to have your script dump the output to a file, which will be used as a cache. At the end of the day, when the client checks, the controller will simply serve up the entire static page.

Thanks for your comments. I really appreciate them. :)

It's always a pleasure helping someone who appreciates it.

10-13-2009, 02:36 PM

[eluser]Unknown[/eluser]
I had a huge memory issue when using the XML-RPC library. I decided to ditch it and realized how easy it is to just write a simple client. Here's a great example...

http://techchorus.net/xml-rpc-client-php...technorati

01-25-2010, 08:07 AM

[eluser]Unknown[/eluser]
See this thread:
Memory leak in XML-RPC SOLVED