odd memory_usage() results
#1

[eluser]WanWizard[/eluser]
I hope someone can explain this issue I have with memory_usage(), as displayed by the profiler.
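
As far as I can tell, the profiler's memory figure is simply what PHP's memory_get_usage() returns at output time. For reference, a minimal sketch of the kind of test controller we use (the class name is just an example, and log_message() only does something if the log threshold is configured):

Code:
<?php
// Minimal CI 1.x style test controller: show the profiler and log the raw
// memory_get_usage() figure so it can be compared between machines.
class Memcheck extends Controller {

    function index()
    {
        // Adds the profiler block (benchmarks, queries, memory usage) to the page
        $this->output->enable_profiler(TRUE);

        // Log the same figure PHP reports directly, for later comparison
        log_message('debug', 'memory_get_usage(): ' . memory_get_usage() . ' bytes');

        $this->load->view('welcome_message');
    }
}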

We have a small team developing an application. Although the actual hardware used differs from developer to developer (all laptops though), they all use an up-to-yum version of Fedora 11 to develop on.

I have checked, checked, and double checked, but the Apache, MySQL and PHP versions are identical, and so are the php.ini, httpd.conf and the my.cnf files.

Source is checked out of our subversion server, so apart from the bit they're working on, code in all development environments is identical as well. Same goes for the contents of the development database (we use a central test set to load it).

Now, the issue is that on some machines, the memory usage reported is 2 MB to 3 MB higher than on others when requesting exactly the same URL (a controller that nobody is working on). Percentage-wise, we're talking about 25-40% more memory used.

Who has a clue as to what's going on here, or how to determine what's going on?
#2

[eluser]jedd[/eluser]
Hey WanWizard,

Some questions for you, to help hunt down the differences you're seeing.

Quote:Although the actual hardware used differs from developer to developer (all laptops though), they all use an up-to-yum version of Fedora 11 to develop on.

It'd be interesting to know how the hardware differs. Specifically if there's any correlation to machines that have more memory, and instances where more memory is seen to be used by the process. (Some applications have a habit of grabbing more memory, if it's available.)

Quote:Now, the issue is that on some machines, the memory usage reported is 2 MB to 3 MB higher than on others when requesting exactly the same URL (a controller that nobody is working on). Percentage-wise, we're talking about 25-40% more memory used.

So, you take two of the machines that consistently (?) give different results.

You make sure their local DB contains the same test data.

You shut them down, power them back up, and do exactly the same sequence of steps on both - that is, loading the URL in question with whatever input data you require to get you to that point.

And under those circumstances there's a consistent difference in the results?
#3

[eluser]WanWizard[/eluser]
Hey jedd,

Quote:It'd be interesting to know how the hardware differs. Specifically if there's any correlation to machines that have more memory, and instances where more memory is seen to be used by the process.
All machines are Dells, ranging from an Inspiron 8600 with 1 GB of memory to a Precision M6300 with 4 GB. The figures I mentioned in my post are from these two extremes. If this is relevant, it worries me, as the application, when finished, has to run on a beefy server with tons of memory. I'd like to be able to predict performance and the number of concurrent sessions, which is the reason for this question in the first place.

Quote:So, you take two of the machines that consistently (?) give different results.
Yes.

Quote:You make sure their local DB contains the same test data.
As part of our development code we have an initdb controller that deletes all tables and restores them to a known state, with a defined test set of data.
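
Roughly, it looks like the sketch below (the table handling and the dump file name are placeholders and simplifications, not our actual code):

Code:
<?php
// Rough sketch of an initdb-style controller. The dump file name and the
// naive splitting on ';' are placeholders / simplifications.
class Initdb extends Controller {

    function index()
    {
        $this->load->database();

        // Drop every table in the development database
        foreach ($this->db->list_tables() as $table)
        {
            $this->db->query('DROP TABLE IF EXISTS ' . $table);
        }

        // Reload the central test set (testset.sql is a placeholder name)
        $dump = file_get_contents(APPPATH . 'sql/testset.sql');
        foreach (explode(';', $dump) as $statement)
        {
            if (trim($statement) !== '')
            {
                $this->db->query($statement);
            }
        }

        echo 'Database restored to the known test state';
    }
}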

Quote:You shut them down, power them back up, and do exactly the same sequence of steps on both - that is, loading the URL in question with whatever input data you require to get you to that point.

And under those circumstances there's a consistent difference in the results?
Yes. All machines are restarted regularly (daily); our developers travel a lot or work from home, which is why they have their development environment locally. Fedora + Dell + Suspend = ! Good_Idea. So we never use suspend, we always shut down.
#4

[eluser]jedd[/eluser]
[quote author="WanWizard" date="1252615221"]
All machines are Dells, ranging from an Inspiron 8600 with 1 GB of memory to a Precision M6300 with 4 GB. The figures I mentioned in my post are from these two extremes. If this is relevant, it worries me, as the application, when finished, has to run on a beefy server with tons of memory. I'd like to be able to predict performance and the number of concurrent sessions, which is the reason for this question in the first place.
[/quote]

I think you'll be just fine. Have you spec'd out what happens, however, if the worst-case results from your roaming laptops are the best-case scenario for your server metrics?

You didn't answer if there's a correlation between laptops with more memory, and larger memory footprint.

Quote:
Quote:You shut them down, power them back up, and do exactly the same sequence of steps on both - that is, loading the URL in question with whatever input data you require to get you to that point.

And under those circumstances there's a consistent difference in the results?
Yes. All machines are restarted regularly (daily); our developers travel a lot or work from home, which is why they have their development environment locally. Fedora + Dell + Suspend = ! Good_Idea. So we never use suspend, we always shut down.

This is not what I said.

You need to run through the exact same procedure on the two test machines, using the same software base.

You can't boot up in the morning sometime, do some work, do some other work, fill up your file system cache, your MySQL cache, and your CI cache, and then start comparing performance results after lunch. If you do that you'll likely find really bizarre and varying results that make no sense. Oh, wait, what?


( Btw, if your programmers are working from all over the place, you should check out (npi) git - much better for roaming developers than svn. )
#5

[eluser]WanWizard[/eluser]
I wasn't talking about performance, but memory usage. I know you can't make any statements about performance if you don't use a controlled environment.

Tests have been conducted with identical code and database. Apache and MySQL have been restarted prior to the tests. On a side note: even if we don't do that and repeat the tests after a day of hard work, the figures are the same as with the first test of the day. There is hardly any variation (just bytes) in the memory usage a given machine reports, no matter when you run the tests or what you have done prior to testing.

The machine with the lowest specs always reports the lowest memory consumption.

I can't really accept that this happens just because of the difference in specs. That would mean that PHP's memory allocation algorithm is seriously flawed: why would it report a 2 MB difference in memory consumption when processing exactly the same code, running the same queries and producing the same results?

The specs call for a PHP memory limit of max. 32M, so there is no immediate issue. I'm going to try to find a machine with specs similar to what the production server is going to have, and see what happens there. I'm not too worried at the moment, but when I see something I can't explain, I start looking for an explanation. ;-)
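
For completeness, a quick sanity check to confirm that the configured limit and the actual figures line up on every machine could be as simple as this in the test controller:

Code:
<?php
// Print the configured limit and the actual usage, to compare machine by machine.
echo 'memory_limit:            ' . ini_get('memory_limit') . "\n";
echo 'memory_get_usage():      ' . memory_get_usage() . " bytes\n";
echo 'memory_get_peak_usage(): ' . memory_get_peak_usage() . " bytes\n";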

As to your BTW: our developers roam, the repository doesn't. Even if they're on the road or at home, they're always online (thanks to flat-fee 3G or broadband), so no issues there. We might switch to git eventually (but that's a business decision which means repo conversions, different toolsets, etc), but currently I don't see the benefits...
#6

[eluser]jedd[/eluser]
Entirely concur wrt wanting to explain weird behaviour.

If you're running a GNU/Linux distro, pick one of the larger-memory laptops and force the kernel to utilise less memory (mem= at boot). If you're using MS - no idea if you can fake it similarly. There might be a correlation with virtual memory, not just physical RAM, so you might try bringing a smaller-memory box up with a hideous amount of swap, and vice versa.

Note that memory usage influences performance - especially when number of concurrent processes x footprint per process approaches available memory.


Quote:I can't really accept that this happens just because of the difference in specs. That would mean that PHP's memory allocation algorithm is seriously flawed: why would it report a 2 MB difference in memory consumption when processing exactly the same code, running the same queries and producing the same results?

I don't know if this is the case .. but from an academic point of view, it's actually pretty smart - to use more resources if they're available.

Does the elapsed_time inversely correlate with the memory usage figures you're getting out of the profiler?
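
Something like the snippet below at the end of the controller method would log both figures together per request, so you can see whether they track each other (it assumes the stock 'total_execution_time_start' benchmark mark set by CI's front controller, and a log threshold high enough to record debug messages):

Code:
<?php
// Log elapsed time and memory usage in one line, per request.
// 'request_end' is just an example mark name.
$this->benchmark->mark('request_end');
log_message('debug', sprintf(
    'elapsed: %s s, memory: %d bytes',
    $this->benchmark->elapsed_time('total_execution_time_start', 'request_end'),
    memory_get_usage()
));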
#7

[eluser]WanWizard[/eluser]
[quote author="jedd" date="1252630314"]If you're running a GNU/Linux distro, pick one of the larger-memory laptops and force the kernel to utilise less memory (mem= at boot). If you're using MS - no idea if you can fake it similarly. There might be a correlation with virtual memory, not just physical RAM, so you might try bringing a smaller-memory box up with a hideous amount of swap, and vice versa.[/quote]
No MS in this shop. :-) I'll see if I can produce some results this way.

Quote:I don't know if this is the case .. but from an academic point of view, it's actually pretty smart - to use more resources if they're available.
According to the manual, PHP reports the memory allocated by malloc(). The same code operating on the same data should allocate the same amount of memory, regardless of what is available. Why would it allocate more? The only small fluctuation that I see can be explained by the actions of the sessions library: the extra memory is allocated when the library renews the session (I see the extra query to update the session ID in the profiler output).
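
One thing worth noting: since PHP 5.2, memory_get_usage() takes an optional flag. Without it (the default) it reports what PHP's own allocator has handed out; with TRUE it reports what has actually been requested from the operating system, which can be noticeably higher. It may be worth checking which of the two figures differs between the machines:

Code:
<?php
// The two views PHP offers on its own memory use (PHP 5.2+).
echo 'emalloc usage: ' . memory_get_usage(FALSE) . " bytes\n";   // default
echo 'real usage:    ' . memory_get_usage(TRUE) . " bytes\n";    // allocated from the OS
echo 'peak (real):   ' . memory_get_peak_usage(TRUE) . " bytes\n";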

I found a possible explanation in the Zend documentation, which states that when plenty of memory is available, the memory manager doesn't perform cleanup (e.g. of memory allocated for variables that are no longer in scope and not explicitly unset) for performance reasons. Instead, all cleanup is performed when the script finishes (as part of the final cleanup). Since memory_get_usage() is executed before that cleanup, it would include all garbage still in memory, since it is still allocated. This also means that if I reduce memory on both test systems to the same amount, I should see more equal figures.
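
A small probe of that theory (gc_collect_cycles() needs PHP 5.3+, and it only reclaims reference cycles, so this tests just one flavour of deferred cleanup):

Code:
<?php
// Create garbage that is only reclaimed by the cycle collector, then compare
// the reported usage before and after an explicit collection.
function make_garbage()
{
    $a = new stdClass();
    $b = new stdClass();
    $a->b = $b;   // reference cycle: neither object is freed when the
    $b->a = $a;   // function returns, they wait for the collector
}

echo 'before:           ' . memory_get_usage() . " bytes\n";

for ($i = 0; $i < 1000; $i++)
{
    make_garbage();
}
echo 'after allocation: ' . memory_get_usage() . " bytes\n";

gc_collect_cycles();
echo 'after collection: ' . memory_get_usage() . " bytes\n";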

Quote:Does the elapsed_time inversely correlate with the memory usage figures you're getting out of the profiler?
Difficult to say. Processing the page typically takes 0.06 seconds on both machines, although occasionally it peaks at 0.11. These figures are roughly the same for both machines. Since it's not a controlled environment (the machines are 'desktops' and typically run desktop applications - browser, email, etc. - at the same time), this doesn't say much...
#8

[eluser]n0xie[/eluser]
May I inquire why you are so worried about the memory footprint? Don't forget that premature optimization is the root of all evil. We have several high-traffic websites running on CI and so far memory has never been an issue. Maybe this shouldn't be a concern until it turns into a problem, at which point you should be thinking about scaling - which is a whole different way of looking at your application, and memory usage still won't be the deciding factor there.
#9

[eluser]WanWizard[/eluser]
It's quite simple.

Total available memory divided by the memory footprint per process gives the maximum number of concurrent processes. Apache needs to be tuned accordingly, to make sure the performance requirements are met. If you don't, you might end up with processes in the queue waiting for memory to become available (or worse, crashing processes), which negatively affects the performance of the application.
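
With some assumed numbers (not our actual server specs), the arithmetic looks like this:

Code:
<?php
// Worked example with assumed numbers: memory left for Apache after the OS,
// MySQL, etc. have taken their share, divided by the per-process design limit.
$available_for_apache  = 2048;  // MB, assumed
$footprint_per_process = 32;    // MB, the design limit per PHP process

$max_concurrent = floor($available_for_apache / $footprint_per_process);

echo $max_concurrent . " concurrent processes\n";   // 2048 / 32 = 64
// Apache's MaxClients would be capped at roughly this number.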

The design specs call for a maximum footprint of 32M (which we have set as memory limit in CI to make sure we never exceed it), so we have to make sure we stay within this limit. Given the fact that CI can use tons of memory (depending on how you use its functionality, especially if you are a heavy ORM / Active Record user), I want to be cautious from day one, to prevent having to make major structural changes in the future.
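
(One habit that seems to help keep the per-request footprint down, for what it's worth: release large query results as soon as you're done with them, instead of letting them sit in memory until the end of the request. The table name below is just a placeholder.)

Code:
<?php
// Free large result sets explicitly once they have been processed.
$query = $this->db->get('some_large_table');   // placeholder table name

foreach ($query->result() as $row)
{
    // ... process $row ...
}

// Releases the memory held by the result set right away
$query->free_result();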

If there is a variation of 25%++ in memory consumption, I have to take worst-case into account, and set the 'design' limit to 32M minus 25% to be on the safe side. Which is quite a big chunk of memory.

I agree with you that there are lots of other (and better) solutions to deal with scalability, but unfortunately I didn't set the design specs, I just have to make sure that they are met... ;-)
#10

[eluser]n0xie[/eluser]
I understand completely.

Maybe this helps?



