19,000+ file move

#1
[eluser]carvingCode[/eluser]
I have a web directory with 19K+ files in it that I need to move. Basically I'll be creating directories based upon a year/month and using 'rename()' to move the appropriate files into the directory. Currently, all files are stored in one directory and I want to optimize things a bit.

I don't have a problem with the coding (yet), but am wondering if I should expect any problems with moving (rename()-ing) this many files at once?

Thoughts?

TIA

#2
[eluser]pbarney[/eluser]
Moving that many files will probably take some time to do, so you will likely run up against the PHP max_execution_time (in php.ini).

So you'll want to move only a hundred files at a time or so (this is a guess... the actual number will likely be different), and then re-run the script. Use a header('location:') to automate that.

One warning though, if it's a shared web host, I would throttle it so you don't spike your disk usage.

Add a sleep(1) or something between every 10 file moves. That'll effectively add about 31 minutes to the total move, but it may prevent your process from being dumped by a watchdog process.
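
The batching and throttling described above could look roughly like the following. This is a minimal sketch under stated assumptions, not something tested on a shared host: the function name move_batch(), its parameters, and the default sizes are hypothetical, and it moves every regular file it finds rather than sorting by year/month.

```php
<?php
/**
 * Move at most $batchSize regular files from $srcDir to $dstDir,
 * pausing $pauseSeconds after every $pauseEvery successful moves.
 * Returns the number of files moved in this run.
 * (Hypothetical helper; names and defaults are assumptions.)
 */
function move_batch($srcDir, $dstDir, $batchSize = 100, $pauseEvery = 10, $pauseSeconds = 1)
{
    $dh = opendir($srcDir);
    if ($dh === false) {
        return 0;
    }

    // Collect the batch first so the directory is not changed while it is being read.
    $batch = array();
    while (count($batch) < $batchSize && ($file = readdir($dh)) !== false) {
        if (is_file($srcDir . DIRECTORY_SEPARATOR . $file)) {
            $batch[] = $file; // ".", ".." and sub-directories are skipped
        }
    }
    closedir($dh);

    $moved = 0;
    foreach ($batch as $file) {
        $from = $srcDir . DIRECTORY_SEPARATOR . $file;
        $to   = $dstDir . DIRECTORY_SEPARATOR . $file;
        if (rename($from, $to)) {
            $moved++;
            if ($pauseSeconds > 0 && $moved % $pauseEvery === 0) {
                sleep($pauseSeconds); // throttle disk activity
            }
        }
    }
    return $moved;
}
```

A calling script could keep re-requesting itself with header('Location: ...') for as long as move_batch() returns a full batch, and stop once it returns fewer files than the batch size.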

Hope this helps!

#3
[eluser]carvingCode[/eluser]
Thanks. You homed in on just the type of problems I assumed might occur. Will heed your suggestions.

#4
[eluser]mddd[/eluser]
Before doing extra work, I would do a quick test: why not move 100 files to see how long that takes.
Moving files within the same volume is usually super fast because the files don't actually have to be moved. The disk index is just changed.
Try it on your own computer... I just moved 6000 items, totalling 17 GB from one folder to another in 3 seconds.

Bottom line: it may be far easier and faster than you think. Don't do a lot of work before you've tried it.
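
A quick trial like the one suggested here might be sketched as follows. The helper name time_trial_move() and its parameters are made up for illustration, and it should be pointed at a test copy of the data, since it really does move the files.

```php
<?php
/**
 * Rename up to $limit regular files from $srcDir into $dstDir and
 * return array(number of files moved, elapsed seconds).
 * (Hypothetical helper for a trial run; names are assumptions.)
 */
function time_trial_move($srcDir, $dstDir, $limit = 100)
{
    $entries = scandir($srcDir);
    if ($entries === false) {
        return array(0, 0.0);
    }

    // Pick the first $limit regular files; ".", ".." and sub-directories are skipped.
    $names = array();
    foreach ($entries as $name) {
        if (count($names) >= $limit) {
            break;
        }
        if (is_file($srcDir . DIRECTORY_SEPARATOR . $name)) {
            $names[] = $name;
        }
    }

    // Time only the rename() calls, not the directory listing.
    $start = microtime(true);
    $moved = 0;
    foreach ($names as $name) {
        if (rename($srcDir . DIRECTORY_SEPARATOR . $name, $dstDir . DIRECTORY_SEPARATOR . $name)) {
            $moved++;
        }
    }
    return array($moved, microtime(true) - $start);
}
```

Scaling the elapsed time by 19000 / $moved gives a rough estimate for the full set, which shows whether batching is needed at all.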

#5
[eluser]carvingCode[/eluser]
I've got the basic code worked out. Here's the working localhost version:

Code:
<?php

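// single quotes: inside double quotes "\e" is an escape sequence (PHP 5.4+) and would corrupt "\eh"
$existing_dir = 'C:\wamp\www\data\eh\insp_rpts\\';
$new_path = 'C:\wamp\www\data\eh\ir_pdf\\';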


echo "Starting...<br />";
echo "---------------------<br />";

// open a known directory, and proceed to read its contents
if (is_dir($existing_dir)) {
    if ($dh = opendir($existing_dir)) {
        while (($file = readdir($dh)) !== false) {
            if ($file != "EHPBCONT.pdf") {          // special file - leave alone
                if (substr($file, -4) == ".pdf") {  // process only .pdf files

                    $ym = substr($file, -12, -6);   // extract YYYYMM from filename
                    $y = substr($ym, 0, 4);         // extract YYYY
                    $m = substr($ym, -2);           // extract MM
                    $new_dir = $y . "\\" . $m;      // make directory string

                    // mkdir

                    if (!file_exists($new_path . $new_dir)) {
                        mkdir($new_path . $new_dir, 0755, TRUE);    // 0755
                        echo $y . " - " . $m . "<br />";
                    } else {
                        echo $new_dir . " exists<br />";
                    }

                    // copy files
                    
                    if (!file_exists($new_path . $new_dir . "\\" . $file)) {
                        if (!copy($existing_dir . $file, $new_path . $new_dir . "\\" . $file)) {
                            echo "failed to copy $file...<br />";
                        }
                    } else {
                        echo "file exists: $file...<br />";
                    }
                }
            }
        }
        closedir($dh);
    }
}

My purpose has me extracting a year/month from the pdf's filename and using that, in YYYY/MM format, to create the new directory structure. mkdir's recursive option makes creating sub-directories a breeze.

To use this within my application, I decided it best to copy() rather than rename() as I'll need to do some modification to my application code to integrate the new directory structure. After I get everything worked out, I'll delete the old directory.
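
The year/month extraction in the script above could also be pulled into a small function that rejects names which don't fit the pattern. This is only a sketch: the name year_month_subdir() is hypothetical, and the assumed filename layout (YYYYMM, then two more characters, then .pdf) is inferred from the substr() offsets rather than stated anywhere in the thread.

```php
<?php
/**
 * Return "YYYY/MM" for a filename ending in YYYYMM + 2 characters + ".pdf",
 * or null if the name does not fit that layout.
 * (Hypothetical helper; the filename layout is an assumption.)
 */
function year_month_subdir($file)
{
    // Same positions as substr($file, -12, -6), but digits are required.
    if (!preg_match('/(\d{4})(\d{2}).{2}\.pdf$/', $file, $m)) {
        return null;
    }
    $month = (int) $m[2];
    if ($month < 1 || $month > 12) {
        return null; // six digits, but not a plausible month
    }
    // PHP's file functions accept forward slashes on Windows as well.
    return $m[1] . '/' . $m[2];
}
```

With this check a name such as EHPBCONT.pdf returns null on its own, so the special file would not need a separate test, and a made-up name like RPT20100315.pdf maps to 2010/03.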

#6
[eluser]n0xie[/eluser]
One note though:
'Copy' is much, much slower than 'Rename', since copy has to do an actual disk copy. Rename, within the same volume, just alters the filesystem index and leaves the file data alone.

#7
[eluser]mddd[/eluser]
That's what I said before :)

It might be a good idea to copy the entire directory first, and then run the script to move the files into the right place.
That will make the script run a lot quicker and prevent possible problems!

#8
[eluser]SpooF[/eluser]
Is there any way to run a shell script on your host server?

#9
[eluser]carvingCode[/eluser]
I'm in the process of trying to duplicate the dir (copy its contents to a new dir) using cPanel's file manager. But it keeps ending before it's complete. It copied over 11K of the files, but there's another 10K+ to go.

Hope I don't have to FTP them down and back up....

Spoof: What shell script do you have in mind?

