Does anyone have a good approach for indexing alot of files? - Printable Version

+- CodeIgniter Forums (https://forum.codeigniter.com)
+-- Forum: Archived Discussions (https://forum.codeigniter.com/forumdisplay.php?fid=20)
+--- Forum: Archived Development & Programming (https://forum.codeigniter.com/forumdisplay.php?fid=23)
+--- Thread: Does anyone have a good approach for indexing alot of files? (/showthread.php?tid=30622)
Does anyone have a good approach for indexing alot of files? - El Forum - 05-20-2010

[eluser]esset[/eluser]
I'm facing a problem I'm unsure how to approach.

I have a folder structure with subfolders and files, A LOT of files (up to 300,000). The files and folders are constantly growing and being moved around.

What I want to do is index these files in a database (path and filename).

I ran some tests on the directory_map() function, and fetching around 10,000 files at a time wasn't a big load.

My idea was to have a cron job set to run every minute, check for files, and index the ones that are new.

The questions I'm facing right now are:
- Can I somehow reduce the number of files being indexed, to make the search a bit smaller?
- Is this a good approach, or can I do it differently with PHP/MySQL?
- Should I run one cron job on a function that just indexes the files, and a separate one that afterwards checks that they haven't been moved from their location on disk?

Any tips are very welcome.

Thanks all. CI rocks.

El Forum - 05-20-2010

[eluser]WanWizard[/eluser]
For searching through large volumes, Sphinx is the way to go: http://www.sphinxsearch.com/

El Forum - 05-20-2010

[eluser]esset[/eluser]
Sphinx is for database searches though, right? I need something that's good for reading stuff off disks, I guess.

El Forum - 05-21-2010

[eluser]WanWizard[/eluser]
[quote author="esset" date="1274408312"]Sphinx is for database searches though, right? I need something that's good for reading stuff off disks, I guess.[/quote]

From the Sphinx docs:
Quote: The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

El Forum - 05-21-2010

[eluser]esset[/eluser]
Dude, rock on. I'll start reading through the documentation then.

Thanks for the tips.
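
The cron-based idea from the first post (index new files, then check whether indexed files have been moved) could be sketched roughly as follows. This is only an illustration under several assumptions, not anything posted in the thread: it uses Python with SQLite rather than PHP/MySQL so that it is self-contained, and the function name sync_index and the table name files are hypothetical. It also assumes that holding the full list of (path, filename) pairs in memory is acceptable at the sizes mentioned. The same compare-two-sets pattern should carry over to PHP and MySQL.

```python
import os
import sqlite3


def sync_index(conn, root):
    """Bring the hypothetical `files` table in line with the tree under `root`.

    Returns a tuple (added, removed) with the number of rows changed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " PRIMARY KEY (path, name))"
    )

    # One walk over the tree collects every (directory, filename) pair on disk.
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            on_disk.add((dirpath, name))

    # Everything the index currently believes exists.
    indexed = set(conn.execute("SELECT path, name FROM files"))

    added = on_disk - indexed      # new files, or the new location of moved files
    removed = indexed - on_disk    # deleted files, or the old location of moved files

    # Apply both changes in a single transaction so the index is never half-updated.
    with conn:
        conn.executemany("INSERT INTO files (path, name) VALUES (?, ?)", added)
        conn.executemany(
            "DELETE FROM files WHERE path = ? AND name = ?", removed
        )

    return len(added), len(removed)
```

A scheduled job would then only need to call something like sync_index(sqlite3.connect("index.db"), "/path/to/files"), where both the database file and the directory are placeholders. Because a moved file shows up as one removal plus one addition, a single pass covers both indexing and the "has it been moved?" check, so a separate second job may not be needed.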