Does anyone have a good approach for indexing a lot of files?
#1

[eluser]esset[/eluser]
I'm facing a problem that I'm unsure how to approach.

I have a folder structure with subfolders and files, a LOT of files (up to 300,000). These files and folders are constantly growing and being moved around.

What I want to do is index these files into a database (path and filename).

I ran some tests on the directory_map() function, and fetching around 10,000 files at a time wasn't a big load. My idea was to have a cron job run every minute, check for files, and index the ones that are new.
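
Roughly what I had in mind for the indexing job, as a sketch (the `files` table and its `path`/`filename` columns are just placeholder names I made up):

[code]
<?php
// A rough sketch of the indexing job, run from cron.
// The `files` table and its `path`/`filename` columns are placeholders.
class Indexer extends CI_Controller {   // `Controller` on older CI versions

    public function run($base = './storage/')   // $base is just an example dir
    {
        $this->load->helper('directory');
        $this->load->database();

        // directory_map() returns a nested array: files are values,
        // subdirectories are keys whose values are arrays.
        $map = directory_map($base);

        foreach ($this->_flatten($map, $base) as $path) {
            // Only insert paths that haven't been indexed yet.
            $this->db->where('path', $path);
            if ($this->db->count_all_results('files') == 0) {
                $this->db->insert('files', array(
                    'path'     => $path,
                    'filename' => basename($path),
                ));
            }
        }
    }

    // Flatten directory_map()'s nested array into a list of full paths.
    private function _flatten($map, $prefix)
    {
        $paths = array();
        foreach ($map as $key => $value) {
            if (is_array($value)) {
                $sub   = $prefix.rtrim($key, '/\\').'/';
                $paths = array_merge($paths, $this->_flatten($value, $sub));
            } else {
                $paths[] = $prefix.$value;
            }
        }
        return $paths;
    }
}
[/code]

I'd kick that off from cron by hitting the controller URL with wget/curl (or via the CLI, depending on the CI version).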


The problems I'm facing right now are:

- Can I somehow narrow down which files get indexed, to keep the search a bit smaller?

- Is this a good approach, or can I do it differently with PHP/MySQL?

- Should I run one cron job on a function that just indexes the files, and a separate one that afterwards checks that they haven't been moved from their location on disk? (Something like the sketch after this list.)
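
For that second job, this is roughly what I mean (again just a sketch, with the same placeholder table/column names plus an `id` primary key):

[code]
<?php
// Rough sketch of the second cron job: drop rows whose file is no longer
// where the database says it is, so the indexing job can re-add it
// under its new path.
class Indexer_cleanup extends CI_Controller {

    public function run()
    {
        $this->load->database();

        // With ~300,000 rows you'd want to page through this in chunks,
        // but this shows the idea.
        $query = $this->db->get('files');
        foreach ($query->result() as $row) {
            if ( ! file_exists($row->path)) {
                $this->db->where('id', $row->id);
                $this->db->delete('files');
            }
        }
    }
}
[/code]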



Any tips are very welcome. Thanks all. CI rocks.
#2

[eluser]WanWizard[/eluser]
For searching through large volumes, Sphinx is the way to go: http://www.sphinxsearch.com/
#3

[eluser]esset[/eluser]
Sphinx is for database searches though, right?

I need something that's good for reading stuff off disk, I guess.
#4

[eluser]WanWizard[/eluser]
[quote author="esset" date="1274408312"]Sphinx is for database searches though, right?
I need something thats good for reading stuff of disks, I guess.[/quote]

From the Sphinx docs:
Quote: The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
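
Once your cron job has the paths in a MySQL table, you point a Sphinx SQL source at that table, run the indexer, and query it from PHP with the client that ships with Sphinx (api/sphinxapi.php). A rough sketch, where the index name ('files_idx') and port are whatever you've configured in sphinx.conf:

[code]
<?php
// Rough sketch: querying a Sphinx index built from the files table.
// 'files_idx' and the port are whatever you set up in sphinx.conf.
require_once 'sphinxapi.php';   // ships with the Sphinx distribution

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);   // searchd host/port
$cl->SetMatchMode(SPH_MATCH_ANY);    // match any of the query words
$cl->SetLimits(0, 20);               // first 20 hits

$result = $cl->Query('annual report', 'files_idx');

if ($result === false) {
    echo 'Search failed: '.$cl->GetLastError();
} elseif ( ! empty($result['matches'])) {
    // Each match id is the document id you fed Sphinx, i.e. the row id
    // in your files table; use it to pull path/filename back from MySQL.
    foreach ($result['matches'] as $id => $match) {
        echo $id."\n";
    }
}
[/code]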
#5

[eluser]esset[/eluser]
Dude, rock on. I'll start reading through the documentation then.

Thanks for the tips.



