Best strategies for storing images on a file server
#1

E1M2
I'm building a community site that will depend heavily on image uploads. Initially my thought was to store everything in one directory, using a SHA-2 hash for the filename, since MD5 and SHA-1 have known collision weaknesses; I don't think even the government actively uses them anymore. My concern is that I don't know at what point the single-directory approach becomes more of a hindrance than a benefit as the image count piles up.
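For reference, the single-directory idea might look roughly like this. It is a sketch under assumptions: the post doesn't say what gets hashed, so hashing the file's contents is a guess, and `single_dir_target()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper for the single-directory idea. Hashing the file's
// contents (rather than, say, its ID) is an assumption; the post doesn't say.
function single_dir_target(string $tmp_file, string $img_dir, string $ext): string
{
    $hash = hash_file('sha256', $tmp_file); // SHA-256 is a SHA-2 variant
    if ($hash === false) {
        throw new RuntimeException("Cannot read $tmp_file");
    }
    // Every image ends up side by side in $img_dir.
    return rtrim($img_dir, '/') . '/' . $hash . $ext;
}
```

A side effect of hashing contents is that identical uploads map to the same name, which acts as deduplication but also means the name alone can't tell two users' copies apart.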

It's been asked before: 'why don't we just slap the file in a DB blob field?' I'm not a DBA, and I have thought about that myself, since it would be a straight shot to the DB with no filesystem access afterwards. But I'm very wary of the day the DB has to be swapped for another one, and I've heard horror stories about migrating from one blob type to another. So I'm aiming to keep the images on the file server.

I've searched the forum as well as the interweb. Here is the strategy I've decided to run with; maybe someone else will find it useful, and maybe other useful strategies will surface.


Option #1 - Month & Year
IMG_FOLDER/YYYY/MM/ID_FILENAME.EXT

While this looks like a good candidate, it doesn't spread files out much: every upload from a busy month still lands in the same directory, so directories can still end up with a massive number of images, which is something I'm trying my best to avoid.
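A minimal sketch of how the Option #1 path could be built. The helper name and its parameters are hypothetical; `gmdate()` is used so the folder doesn't depend on the server's timezone setting, which is a design choice, not something from the post.

```php
<?php
// Hypothetical helper for Option #1: IMG_FOLDER/YYYY/MM/ID_FILENAME.EXT.
// gmdate() keeps the folder independent of the server's timezone setting.
function month_year_path(string $img_folder, int $id, string $filename, int $uploaded_at): string
{
    return rtrim($img_folder, '/') . '/' . gmdate('Y/m', $uploaded_at) . '/' . $id . '_' . $filename;
}

echo month_year_path('images', 42, 'cat.jpg', gmmktime(0, 0, 0, 3, 15, 2010)), PHP_EOL;
// images/2010/03/42_cat.jpg
```

Every upload with the same month lands in the same folder, which is exactly the drawback described above.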


Option #2 - Hashed
IMG_FOLDER/[a-f0-9]{3}/[a-f0-9]{3}/[a-f0-9]{3}/ID_FULLHASH.EXT

Use the MD5 hash of the image ID, and substr() the first nine characters of the hash into three 3-character subfolder names. For the filename, rename the temp file to the ID plus the full hash to thwart a collision.

Here is an example:

Code:
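// Assumed values; define these wherever your app keeps its config
define('IMG_DIR', 'images/');
define('IMG_EXT', '.JPG');

$id = 10;
$img_hash = md5((string) $id);        // 32 hex chars derived from the ID
$img_node1 = substr($img_hash, 0, 3); // first-level folder
$img_node2 = substr($img_hash, 3, 3); // second-level folder
$img_node3 = substr($img_hash, 6, 3); // third-level folder

$img_ref = IMG_DIR . $img_node1 . '/' . $img_node2 . '/' . $img_node3 . '/' . $id . '_' . $img_hash . IMG_EXT;

// result: images/d3d/944/680/10_d3d9446802a44259755d38e6d163e820.JPG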

Conclusion
Definitely going with a flavor of Option #2. Not only does it spread files across directories but, best of all, since the ID is the base for the hash, there is no need to store directory metadata in the DB: we can always rebuild the path from md5($id).
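Wrapped into reusable functions, Option #2 might look like the sketch below. The helper names, the `images/` base and the `.JPG` default are assumptions. `store_image()` uses `rename()` so it works on any file; for a fresh upload you would use `move_uploaded_file()` instead.

```php
<?php
// Hypothetical helpers for Option #2; the base folder and extension defaults
// are assumptions, not part of the original post.
function img_path_for_id(int $id, string $base = 'images/', string $ext = '.JPG'): string
{
    $hash = md5((string) $id);
    // First 9 hex chars -> three 3-char folder names (16^3 = 4096 per level).
    $dirs = implode('/', str_split(substr($hash, 0, 9), 3));
    return $base . $dirs . '/' . $id . '_' . $hash . $ext;
}

// Moves a file into its hashed location, creating missing folders first.
// For a fresh upload you would call move_uploaded_file() instead of rename().
function store_image(int $id, string $src_file, string $base = 'images/'): string
{
    $target = img_path_for_id($id, $base);
    $dir = dirname($target);
    // Re-check is_dir() in case another request created the folder meanwhile.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create $dir");
    }
    if (!rename($src_file, $target)) {
        throw new RuntimeException("Could not move $src_file to $target");
    }
    return $target;
}

echo img_path_for_id(10), PHP_EOL;
// images/d3d/944/680/10_d3d9446802a44259755d38e6d163e820.JPG
```

Because the path is a pure function of the ID, reading an image back needs only `img_path_for_id($id)`, with no lookup of folder data in the DB.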
#2

llbbl
Putting the image data in a blob field is a bad idea.

Having the images all in one directory isn't as bad as you might think; it all depends on how you access the images. If you are going to split them up, just increment folder numbers: when you reach about 5,000 images in a directory, move on to the next number.

Store everything in the database: which folder the image is in, the randomly generated filename, everything. Also keep an img_counter table in the database that tracks the number of images in each folder, so you know when to create the next folder.
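A rough sketch of the numbered-folder scheme with an img_counter table. Only the img_counter name comes from the post; the `images` table, column names and helper names are hypothetical. SQLite in memory keeps it self-contained; on MySQL with concurrent uploads you would also lock the counter row (for example with `SELECT ... FOR UPDATE`) so two requests can't both claim the last slot in a folder.

```php
<?php
// Hypothetical schema: img_counter tracks how many images each folder holds;
// images records where each file lives.
function setup_image_tables(PDO $db): void
{
    $db->exec('CREATE TABLE img_counter (folder INTEGER PRIMARY KEY, image_count INTEGER NOT NULL)');
    $db->exec('CREATE TABLE images (id INTEGER PRIMARY KEY AUTOINCREMENT, folder INTEGER NOT NULL, filename TEXT NOT NULL)');
    $db->exec('INSERT INTO img_counter (folder, image_count) VALUES (1, 0)');
}

// Reserves a slot for one new image and returns [folder, filename].
function register_image(PDO $db, string $ext, int $limit = 5000): array
{
    $db->beginTransaction();
    try {
        $row = $db->query('SELECT folder, image_count FROM img_counter ORDER BY folder DESC LIMIT 1')
                  ->fetch(PDO::FETCH_ASSOC);
        $folder = (int) $row['folder'];
        if ((int) $row['image_count'] >= $limit) {
            // Current folder is full: start the next numbered one.
            $folder++;
            $db->prepare('INSERT INTO img_counter (folder, image_count) VALUES (?, 0)')->execute([$folder]);
        }
        $filename = bin2hex(random_bytes(16)) . $ext; // randomly generated name
        $db->prepare('UPDATE img_counter SET image_count = image_count + 1 WHERE folder = ?')->execute([$folder]);
        $db->prepare('INSERT INTO images (folder, filename) VALUES (?, ?)')->execute([$folder, $filename]);
        $db->commit();
        return [$folder, $filename];
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
setup_image_tables($db);
[$folder, $filename] = register_image($db, '.jpg');
echo $folder, '/', $filename, PHP_EOL; // folder 1, random name
```

Doing the counter update and the image insert in one transaction is what keeps the stored folder locations trustworthy, which is the integrity concern raised in the reply below.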
#3

E1M2
Great strategy, similar to how iPods store music files in numbered folders, starting with 'F00' and building up from there.

What's sweet
- Better management of filesystem resources: folders aren't randomly generated.
- Relocating a set of data to an archive server would be quite easy.
- Doing backups is a breeze.

Overall this solution would be more work up front than Option #2, since the folder location stored in the DB has to be kept accurate at all times. But one can't overlook the benefits outlined above.

Good points, llbbl. I've been sold.
#4

llbbl
np, yw