Update/insert in chunks?
#11

[eluser]TheFuzzy0ne[/eluser]
[quote author="codex" date="1234686787"]Hmm, please elaborate.[/quote]

An MD5 hash is a great way to compare two sets of data to see if they are identical. Obviously you cache the file somewhere, so ideally, you should be able to make an MD5 hash of the old file, and MD5 hash of the new one, and compare the two strings. If they are the same, then the feeds are identical, and therefore there's no need to update the tables. If they are different, then it's time to process the data and get it into the database.

Does that make sense?

I know this hasn't fully answered your question, but I believe this may be part of the solution. I'm having difficulty making sense of your code. It's nothing to do with your code, but rather how my brain works. I get the impression that you put the newest feed into the temporary table regardless of whether the products table actually needs updating. By comparing two files with an MD5 hash, you can save that step if it's not necessary.
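
For what it's worth, a minimal sketch of that comparison might look like the following. The function name and both paths are made up for illustration (they are not from the original code), and it assumes PHP 7 or later. It uses md5_file(), which hashes a file's contents without first reading the whole file into a PHP string.

```php
<?php
// Hypothetical helper: returns true when the new feed differs from the cached copy.
// $cachedPath and $newPath are assumed file locations, not names from the original code.
function feed_has_changed(string $cachedPath, string $newPath): bool
{
    // No cached copy yet means nothing has been imported, so treat the feed as changed.
    if (!is_file($cachedPath)) {
        return true;
    }

    // Hash both files; md5_file() returns false if a file cannot be read.
    $oldHash = md5_file($cachedPath);
    $newHash = md5_file($newPath);

    // If either hash failed, err on the side of re-importing.
    if ($oldHash === false || $newHash === false) {
        return true;
    }

    // Identical hashes mean identical feeds, so the tables can be left alone.
    return $oldHash !== $newHash;
}
```

The idea would be to call this before touching the temporary table, and to overwrite the cached copy with the new feed only after a successful import.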
#12

[eluser]TheFuzzy0ne[/eluser]
Also, what is the error you're receiving? Is it a timeout, or is it that you've run out of available memory? I'd imagine it's a timeout because the insert is taking too long.
#13

[eluser]codex[/eluser]
[quote author="TheFuzzy0ne" date="1234687446"][quote author="codex" date="1234686787"]Hmm, please elaborate.[/quote]

An MD5 hash is a great way to compare two sets of data to see if they are identical. Obviously you cache the file somewhere, so ideally, you should be able to make an MD5 hash of the old file, and MD5 hash of the new one, and compare the two strings. If they are the same, then the feeds are identical, and therefore there's no need to update the tables. If they are different, then it's time to process the data and get it into the database.

Does that make sense?[/quote]

Yes, this is a nice solution. BUT it only solves the question of whether to do the update or not. It doesn't solve the running-out-of-memory issue. I think that may have something to do with the inefficiency of my code.

[quote author="TheFuzzy0ne"]I know this hasn't fully answered your question, but I believe this may be part of the solution. I'm having difficulty making sense of your code. It's nothing to do with your code, but rather how my brain works. I get the impression that you put the newest feed into the temporary table regardless of whether the products table actually needs updating. By comparing two files with an MD5 hash, you can save that step if it's not necessary.[/quote]

Yes, the new feed goes into the temp regardless. And yes, MD5ing it can eliminate that step. But see answer above ;-)
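
On the memory side, one possible approach (only a sketch, not the code discussed in this thread) is to read the CSV one line at a time and hand the rows over in fixed-size batches, so that only one batch sits in memory at once. The function name, the default batch size of 500 and the comma/double-quote CSV settings are all assumptions, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: yields the rows of a CSV file in batches of $chunkSize,
// so only one batch is held in memory at a time.
function read_csv_in_chunks(string $path, int $chunkSize = 500): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        $chunk = [];
        // fgetcsv() parses one line per call and returns false at end of file.
        // Delimiter, enclosure and escape are spelled out; adjust them per feed.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            // A blank line comes back as an array holding a single null; skip it.
            if ($row === [null]) {
                continue;
            }
            $chunk[] = $row;
            if (count($chunk) === $chunkSize) {
                yield $chunk;
                $chunk = [];
            }
        }
        // Hand over whatever is left in the final, partial batch.
        if ($chunk !== []) {
            yield $chunk;
        }
    } finally {
        // Runs when the loop finishes or the generator is destroyed early.
        fclose($handle);
    }
}
```

Each batch from a foreach over read_csv_in_chunks() could then be written to the temp table with one multi-row INSERT before the next batch is read.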
#14

[eluser]codex[/eluser]
[quote author="TheFuzzy0ne" date="1234687551"]Also, what is the error you're receiving? Is it a timeout, or is it that you've run out of available memory? I'd imagine it's a timeout because the insert is taking too long.[/quote]

At first I got the "Maximum execution time of 30 seconds exceeded" error. But since I made a slight modification, I just get a blank page.
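
A blank page in PHP often means a fatal error occurred while error display is switched off. A rough sketch for making the error visible while debugging could look like this. The memory_report() helper is a made-up name, the settings are meant for a development machine only, and it assumes PHP 7 or later.

```php
<?php
// Debugging only: report every error and print it to the page.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Remove the execution time limit for this request (0 means no limit).
set_time_limit(0);

// Hypothetical helper: summarises peak memory use against the configured limit.
function memory_report(): string
{
    return sprintf(
        'Peak memory: %.1f MB (memory_limit: %s)',
        memory_get_peak_usage(true) / 1048576,
        ini_get('memory_limit')
    );
}

// ... run the import here, then:
echo memory_report(), "\n";
```

If the script dies with an "Allowed memory size ... exhausted" message, the problem is memory; if it only finishes once the time limit is lifted, the problem is the timeout.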
#15

[eluser]TheFuzzy0ne[/eluser]
Yes, I know this only solves part of the problem, but it is a good start. Give me an hour to study your code in a bit more detail, and I will see if I can come up with the second part of the solution. In the meantime, if you can work on implementing the MD5 hash and compare, you might find that the time saved there is enough for the time being.
#16

[eluser]codex[/eluser]
[quote author="TheFuzzy0ne" date="1234687968"]Yes, I know this only solves part of the problem, but it is a good start. Give me an hour to study your code in a bit more detail, and I will see if I can come up with the second part of the solution. In the meantime, if you can work on implementing the MD5 hash and compare, you might find that the time saved there is enough for the time being.[/quote]

Cool, thanks! But how do you MD5 a CSV file? Gotta think about that one...
#17

[eluser]TheFuzzy0ne[/eluser]
OK, before I get back to thinking, is the data feed ordered in any way? Are all the new entries at the top, or at the bottom only? Or are those entries scattered throughout the file?
#18

[eluser]TheFuzzy0ne[/eluser]
[quote author="codex" date="1234688037"]Cool, thanks! But how do you MD5 a CSV file? Gotta think about that one...[/quote]

Something like:

Code:
$hash = md5(file_get_contents('filename'));

should do the trick.
#19

[eluser]codex[/eluser]
[quote author="TheFuzzy0ne" date="1234688147"]OK, before I get back to thinking, is the data feed ordered in any way? Are all the new entries at the top, or at the bottom only? Or are those entries scattered throughout the file?[/quote]

I don't think there's a specific order. Also, I think every merchant has their own method of ordering, so you can't rely on the order of the data in the file itself. The correct ordering and filtering is done when inserting the data into the temp table. That's basically what it's for.
#20

[eluser]codex[/eluser]
[quote author="TheFuzzy0ne" date="1234688311"][quote author="codex" date="1234688037"]Cool, thanks! But how do you MD5 a CSV file? Gotta think about that one...[/quote]

Something like:

Code:
$hash = md5(file_get_contents('filename'));

should do the trick.[/quote]

Jeez, could it be any simpler?;-) Thanks!!



