Does anyone have a good approach for indexing a lot of files?
[eluser]esset[/eluser]
I'm facing a problem I'm unsure how to approach. I have a folder structure with subfolders and files, a LOT of files (up to 300,000). These files and folders are constantly growing and being moved around. What I want to do is index these files into a database (path and filename). I ran some tests on the directory_map() function, and fetching around 10,000 files at a time wasn't a big load. My idea is to have a cron job run every minute, check for files, and index the ones that are new. The problems I'm facing right now:

- Can I narrow down which files get indexed somehow, to make each scan a bit smaller?
- Is this a good approach, or can I do it differently with PHP/MySQL?
- Should I run one cron job on a function that just indexes the files, and a separate one that afterwards checks they haven't been moved from their location on disk?

ANY tips are very welcome. Thanks all. CI rocks.
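A minimal sketch of that cron-driven approach, assuming a MySQL table called file_index with path and filename columns and a UNIQUE key on the pair (those names are illustrative, not from the thread), files living under /data/files/, and a CodeIgniter 2.x controller (on 1.x, extend Controller instead of CI_Controller):

[code]
<?php
// Minimal sketch of the cron-driven indexer described above.
// Assumptions: table `file_index` (path, filename) with a UNIQUE key,
// source directory /data/files/, CodeIgniter 2.x.
class Indexer extends CI_Controller {

    // Run from cron, e.g. every minute via CI's CLI routing.
    public function run()
    {
        $this->load->helper('directory');
        $this->load->database();

        // directory_map() returns a nested array; flatten it into
        // "relative/path/file.ext" strings.
        $files = $this->flatten(directory_map('/data/files/'));

        foreach ($files as $file)
        {
            $row = array(
                'path'     => dirname($file),
                'filename' => basename($file),
            );

            // Insert only files we have not seen yet.
            $this->db->where($row);
            if ($this->db->count_all_results('file_index') == 0)
            {
                $this->db->insert('file_index', $row);
            }
        }
    }

    private function flatten($map, $prefix = '')
    {
        $out = array();
        foreach ($map as $key => $value)
        {
            if (is_array($value))
            {
                // $key is a sub-directory name.
                $out = array_merge($out, $this->flatten($value, $prefix.rtrim($key, '/').'/'));
            }
            else
            {
                $out[] = $prefix.$value;
            }
        }
        return $out;
    }
}
[/code]

Note that running a COUNT query per file against 300,000 rows every minute will get expensive; with the UNIQUE key in place, an INSERT IGNORE (or comparing file modification times against the last run) would cut the query count considerably.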
[eluser]WanWizard[/eluser]
For searching through large volumes, Sphinx is the way to go: http://www.sphinxsearch.com/
[eluser]esset[/eluser]
Sphinx is for database searches though, right? I need something that's good for reading stuff off disks, I guess.
[eluser]WanWizard[/eluser]
[quote author="esset" date="1274408312"]Sphinx is for database searches though, right? I need something that's good for reading stuff off disks, I guess.[/quote] From the Sphinx docs:
Quote: The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
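If the paths end up in MySQL anyway, one way to query the Sphinx index from PHP is over SphinxQL, which speaks the MySQL wire protocol (default port 9306). A rough sketch, assuming an index named files is already configured in sphinx.conf (the index name and search term are illustrative):

[code]
<?php
// Query a Sphinx index over SphinxQL (MySQL wire protocol, default
// listen port 9306). The index name `files` is an assumption; it has
// to match whatever is defined in sphinx.conf.
$sphinx = mysqli_connect('127.0.0.1', '', '', '', 9306);

$term   = mysqli_real_escape_string($sphinx, 'annual report');
$result = mysqli_query($sphinx, "SELECT * FROM files WHERE MATCH('$term') LIMIT 20");

while ($row = mysqli_fetch_assoc($result))
{
    // Sphinx returns the document id (plus any configured attributes);
    // map the id back to path/filename in MySQL.
    echo $row['id'], "\n";
}
[/code]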
[eluser]esset[/eluser]
Dude, rock on. I'll start reading through the documentation then. Thanks for the tips.