CodeIgniter Forums

Full Version: Building a Read vs Unread Status for a Forum

El Forum

[eluser]vokic[/eluser]
Well, I see a few possible solutions here:

1. Use an additional field in the db for each thread that holds the ids of users who have read the thread, and when you display the list just check the user id against that field. It might get big, but not if you make it so that every time a new response is posted the field is erased and the thread is marked as new again... Something like this:

12 456 45 88 etc..

2. Use a text file for the same purpose :) If you think about it, you will most likely show between 10 and 20 threads per page, which means 10-20 text files, which is really not that much.. Unless of course those threads will be seen 1 million times each hour or so.. In that case, throw in the AdSense and pay for decent hardware :)

Forgot to mention about the first one: I think that is how vBulletin actually works, or some kind of hack for it, and I've read about it in the past... If I dig up a link I'll post it...

El Forum

[eluser]andrewtheandroid[/eluser]
hi jedd first of all thanks for taking the time to read my post.

Quote:I'm not sure if your code sample and subsequent explanation was an adjustment of my idea, or an extrapolation of your original ..

I think I may not have worded my reply as well as I hoped. When I went and edited it I took out the "I like the idea of.." that indicated it was not novel. Yeah, as I was reading I was thinking about which of the ideas I would try to use for my project, not my own "new" solution to the problem. Sorry.

Quote:Lets say we are only really interested in the last (say 30) topics.

As soon as you set arbitrary limits for things like this, you strike a whole raft of new problems I think.

I see how a busy forum with many new topics each minute would be a big problem, but then again I set the arbitrary number because I think we need to draw a line somewhere so it won't cause a problem for performance? Also, if there were 100 new topics that were 5 pages deep, I think anything past 2 pages is not as 'new'. It is true I haven't read them, but if I don't visit the forum for a month I don't want to be flooded with 100 new topics marked as unread. I think anything older than an arbitrary figure, whether it be a few months or x number of new topics, could stop counting as new?

I only did a database unit in uni so I have no real experience when it comes to real world performance of something like that. I'm genuinely interested to know what kind of problems that would cause. Would it be a performance issue? or a maintenance issue? etc..

Quote:In any case, I can't see any advantage to tracking via time rather than message number.
That's true about the difference in time and that message id would be the same..
So if I were to use message id in place of dates in my description, it could work?

Quote: There's probably a few things that will be hard for me to implement later, but I think I got most of the interesting functionality sorted (or at least sortable).

I'm not going to be too ambitious for this project since the maximum number of people that can join is 1000, seeing as it's an employee website. I doubt that they will all join; I think the max I would see would be 400 or so. If that's the case, would performance be a big issue with my implementation?

Also, is this read vs. unread feature a must or a "nice to have"? Just want to know how much attention to detail I should give this for the project.

Thanks again for feedback.

El Forum

[eluser]andrewtheandroid[/eluser]
Quote:Well, I see a few possible solutions here:

1. Use an additional field in the db for each thread that holds the ids of users who have read the thread, and when you display the list just check the user id against that field. It might get big, but not if you make it so that every time a new response is posted the field is erased and the thread is marked as new again... Something like this:

12 456 45 88 etc..

Would this be ok for a group of around 500 users with light to casual forum usage in terms of performance?

Quote:Forgot to mention about the first one: I think that is how vBulletin actually works, or some kind of hack for it, and I've read about it in the past... If I dig up a link I'll post it...

Keen to be updated on this. :) I noticed on some of the vBulletin-powered forums I use that if I go to a thread I'm then "subscribed" to that thread, and any other threads I haven't opened count as ones I haven't "read". So does vBulletin simply check my list of subscribed threads?

El Forum

[eluser]jedd[/eluser]
[quote author="andrewtheandroid" date="1257521730"]
I see how a busy forum with many new topics each minute would be a big problem, but then again I set the arbitrary number because I think we need to draw a line somewhere so it won't cause a problem for performance?
[/quote]

The idea of an arbitrary limit came up in that earlier thread I pointed you at. I pondered it for a while, but I decided against it.

While you might save some space, you end up having to write more code (exception handling, count checks) and I suspect performance actually degrades.


Quote:Also, if there were 100 new topics that were 5 pages deep, I think anything past 2 pages is not as 'new'. It is true I haven't read them, but if I don't visit the forum for a month I don't want to be flooded with 100 new topics marked as unread.

All true - however any thread (or topic, if you prefer that word) you haven't visited is marked as unvisited - there's no record of your state within that thread. To really spell this out - it means you don't store any data for that user / thread combination. You can deduce programmatically - from this absence - that the thread is unread (and, of course, that the user has never visited the thread).

Anything beyond 2 pages is, again, an arbitrary limit you're placing on every user for every situation - at best it's going to inconvenience a handful of people, at worst it's cheating. ;) And, as I say, I don't see significant benefits from saying 'okay, more than 2 pages of new messages, I'm going to stop tracking this page for this user, but for that user, who was here yesterday, I'll keep tracking it for them .. until another five messages come in ..' yada yada.

Quote:I only did a database unit in uni so I have no real experience when it comes to real world performance of something like that. I'm genuinely interested to know what kind of problems that would cause. Would it be a performance issue? or a maintenance issue? etc..

You can easily do the math, if you want.

You need to know the number of active(!) forum users, the number of new threads popping up a day, and how long threads may remain open without activity - this will give you worst case (size-wise) figures. Obviously not every user hits every thread across every forum, so the real-world results will be better (smaller data set, better performance).

In your case - 400 users multiplied by whatever number of threads you're expecting to start each day, multiplied by the number of days threads are active before being automatically locked .. will give you a rough table size. My gut feel is that the result could be termed 'trivial', both in terms of space and subsequently data access times.
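
For what it's worth, that math can be sketched in a few lines of PHP. Only the 400 users comes from this thread; the threads-per-day figure, the lock period and the bytes-per-row figure are placeholder guesses, and `worst_case_rows()` is just a name made up for the example.

```php
<?php
// Worst-case row count for a user/thread tracking table:
// every active user has a row for every thread that is still open.
function worst_case_rows($active_users, $new_threads_per_day, $days_until_lock)
{
    return $active_users * $new_threads_per_day * $days_until_lock;
}

// 400 users is from the discussion; 10 threads/day and 90 days are assumptions.
$rows = worst_case_rows(400, 10, 90);

// Assume roughly 16 bytes per row before indexes (three integers and a flag).
$bytes = $rows * 16;

printf("%d rows, about %.1f MB\n", $rows, $bytes / 1048576);
```

With those guesses the table tops out at 360,000 rows, which is only a few megabytes, and real usage would sit well below that ceiling.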


I think there's a few other factors to consider - you've hinted at one (the 'older than two pages' thing), which is the presentation aspect. The way the CI forums do it .. well, I haven't quite worked it out. They seem to cheat a bit, and occasionally I lose state of my threads in the two forums I regularly haunt. Any system I write would have to be far more consistent.

A corollary is your question about whether this feature is necessary or just nice - I think it's necessary, as reading forums without some way of being able to identify which threads contain new messages would just be a nightmare. I really like the way that these forums work - emboldify the subject text, and provide a link to the newest message in the thread.

There's also the question of whether you're going to auto-lock your old threads - if you do, then you have a) reduced the number of threads you need to track, and b) an ideal hook location for auto-cleaning dead thread/user/post table entries.
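
A rough sketch of that locking hook, using plain PHP arrays as stand-ins for the tables. The field names (`last_post_at`, `locked`) and the function name are assumptions for illustration; in a real schema this would be an UPDATE on the threads table followed by a DELETE on the tracking table.

```php
<?php
// Locks threads that have been idle for longer than $max_idle seconds
// and drops the per-user tracking rows that belong to them.
function lock_stale_threads(array $threads, array $tracking, $now, $max_idle)
{
    $locked_ids = array();
    foreach ($threads as $id => $thread) {
        if ($now - $thread['last_post_at'] > $max_idle) {
            $threads[$id]['locked'] = true;
            $locked_ids[$id] = true;
        }
    }

    // The cleanup hook: a locked thread needs no per-user state any more.
    $kept = array();
    foreach ($tracking as $row) {
        if (!isset($locked_ids[$row['thread_id']])) {
            $kept[] = $row;
        }
    }

    return array($threads, $kept);
}

$threads = array(
    1 => array('last_post_at' => 1000, 'locked' => false),
    2 => array('last_post_at' => 9000, 'locked' => false),
);
$tracking = array(
    array('user_id' => 5, 'thread_id' => 1, 'message_id' => 40),
    array('user_id' => 5, 'thread_id' => 2, 'message_id' => 77),
);

// With "now" at 10000 and a 5000 second limit, only thread 1 is stale.
list($threads, $tracking) = lock_stale_threads($threads, $tracking, 10000, 5000);
```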

El Forum

[eluser]vokic[/eluser]
[quote author="andrewtheandroid" date="1257522042"]Would this be ok for a group of around 500 users with light to casual forum usage in terms of performance?[/quote]

Well, for that group I think you shouldn't even see any difference at all..
Just make an additional field for the thread in the db, like 'read', and when someone reads it add his user id to that field... When someone adds a reply, just completely empty that field and start again.. :) And when you do the display just compare:

Code:
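// Bold marks a thread this user has not read; match whole ids so 12 does not match 112.
$read_ids = explode(' ', trim($thread['read']));
if (!in_array((string) $user_id, $read_ids, true)) echo '<a href="thread_url"><b>Thread title</b></a>';
else echo '<a href="thread_url">Thread title</a>';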

or something similar :)

When someone goes to view the thread, just add his user id if it isn't already there:

Code:
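// Append the user id only if it is not already present as a whole id.
$read_ids = explode(' ', trim($thread['read']));
if (!in_array((string) $user_id, $read_ids, true)) {
  $thread['read'] = trim($thread['read'].' '.$user_id);
  // $thread_id is the id of the thread being viewed.
  $this->db->update('threads', $thread, array('id' => $thread_id));
}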

That's actually all there is to it... But the troubles might start when you need to select only threads read by the user or threads subscribed..
There is that solution with LIKE: "SELECT * FROM threads WHERE thread_read LIKE '%$user_id%'" and we all know it will be slow for large tables.. Maybe full text search would help in that situation... Then there are more solutions like: "SELECT * FROM threads WHERE LOCATE('$user_id', thread_read) > 0" or something similar which might be faster than LIKE etc.. It has flaws but hey, we are chatting about it :)

I would prefer a thread-based solution over a user-based solution because it is easier to maintain... Operations are done only on that one field, while with the user approach every time you update a thread (read/noread/delete) you would have to update each user record..

Maybe for subscriptions you can even go with the standard thread_id, user_id table, because there will be far fewer of those than read/noread records..
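
One flaw in the LIKE and LOCATE lookups is that a plain substring match on user id 12 also hits 112 or 120. A possible workaround, sketched here under the assumption that the column is called thread_read, holds space-separated ids and lives in MySQL (CONCAT is MySQL syntax), is to pad both sides with spaces. The helper name is made up for the example.

```php
<?php
// Builds a query that finds threads whose space-separated thread_read column
// contains $user_id as a whole id, plus the value to bind to the placeholder.
function threads_read_by_query($user_id)
{
    $sql = "SELECT * FROM threads WHERE CONCAT(' ', thread_read, ' ') LIKE ?";
    $pattern = '% ' . (int) $user_id . ' %';
    return array($sql, array($pattern));
}

list($sql, $params) = threads_read_by_query(12);

// With CodeIgniter's query bindings this could then be run as:
// $query = $this->db->query($sql, $params);
```

This still cannot use an index, so it stays slow on big tables; it only fixes the false matches.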

El Forum

[eluser]vokic[/eluser]
And yes, it seems like the CI Forums are based on the time approach.. So my guess is that they are using a classic thread_id / user_id relation table which is updated regularly through a cron job or something similar.. Maybe even logging in triggers some action.. An insight into this from Derek or someone from the staff would be great... :)

El Forum

[eluser]jedd[/eluser]
[quote author="vokic" date="1257541931"]
I would prefer a thread-based solution over a user-based solution because it is easier to maintain... Operations are done only on that one field, while with the user approach every time you update a thread (read/noread/delete) you would have to update each user record..
[/quote]

I don't understand this bit. Any record of the state of a user's position in a thread has to include a unique identifier for the user, and the thread, as well as the position (time OR message number) in that thread.

Could you explain your take on the user-centric versus the thread-centric approach in more detail?

El Forum

[eluser]vokic[/eluser]
Hmmm, I didn't realize we were talking about user positions in a thread. What does that have to do with a thread being already read or not? Are we talking about replies to a thread, or just whether it was read or not?

El Forum

[eluser]jedd[/eluser]
Bugger. I wrote a lengthy, insightful, cogent, polite spiel about this .. and then it got lost somewhere between pressing Submit and the 'oh, you want to write a new empty message in this thread, eh?' pages. Dangnabit.

With that in mind, what follows is a precis of what would have been the most enlightening message you read all year.

I think that perhaps you are suggesting an approach where you have a table that tracks threads, and records for each user whether or not that user is up to date in that thread or not. And my 'user centric' question would involve inverting that - a table that tracks users, and records for each thread whether the user is up to date in that thread or not. Any new activity in a thread involves scanning the table and modifying the records for each user (or each thread, depending). Is that right? If it is, then I think this is a particularly limiting approach to tracking things.

For any given thread there are several states for a given user:
a) the user has never seen the thread
b) the user has read the thread - including the most recent message - they are up to date
c) the user has read the thread - but has not seen 1 or more of the most recent messages
d) the thread is locked (it has exceeded the length of time a thread may be inactive)

On top of this you have the following variations:
i) a user may wish to unsubscribe from a thread. Thread tracking may still occur, and be shown in the forum's thread list, but notifications stop
ii) a user may wish to ignore a thread, seeing no reference to it in the forum's thread list (I don't know any forums that do this)

For me, the original four states (above) involve:
a) no state information is held at this point
b) the user_thread_message table contains a row with user id, thread id, and the id of the most recent message in that thread
c) the user_thread_message table contains a row with user id, thread id, and the id of the message the user most recently read in that thread
d) no state information is held for the user / thread combination

For (c) and (d), when a user looks at the list of threads in a forum, for each thread we compare the highest numbered message_id in that thread versus that recorded in the relevant row of the user_thread_message table. Visually this means we, for example, set the subject to be bold, like the CI forums approach.
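
The four states and the comparison above could be sketched like this. It is only an illustration: the function name and the returned labels are made up, and `$row` stands for whatever the lookup in user_thread_message returns (null when there is no row).

```php
<?php
// $row is the user's user_thread_message row for this thread, or null if none exists.
// $latest_message_id is the highest message_id in the thread.
function thread_state($row, $latest_message_id, $locked = false)
{
    if ($locked) {
        return 'locked';        // (d) no per-user state is kept
    }
    if ($row === null) {
        return 'never_seen';    // (a) absence of a row means unvisited
    }
    if ($row['message_id'] >= $latest_message_id) {
        return 'up_to_date';    // (b) last read message is the newest one
    }
    return 'has_new';           // (c) show the subject in bold
}

// Example: the user last read message 70, the thread is now at message 77.
$row = array('user_id' => 5, 'thread_id' => 2, 'message_id' => 70);
$state = thread_state($row, 77);
```

Note that states (b) and (c) use the same row layout; only the comparison against the newest message id tells them apart.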

For the complications:
i) I have a BOOL 'watched' column in my user_thread_message table. On a new message being posted in a thread, this table is searched and the relevant watching users are notified.
ii) I haven't bothered with implementing this (yet).
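
A small sketch of the 'watched' lookup in (i), with an array standing in for the user_thread_message table. Skipping the person who wrote the new message is an extra assumption, not something described above, and the function name is hypothetical.

```php
<?php
// Returns the ids of users to notify about a new post in $thread_id.
// $rows stand in for user_thread_message rows; 'watched' is the BOOL column.
function users_to_notify(array $rows, $thread_id, $poster_id)
{
    $user_ids = array();
    foreach ($rows as $row) {
        $is_watching = $row['thread_id'] === $thread_id && $row['watched'];
        // Assumption: the poster does not need a notification about their own post.
        if ($is_watching && $row['user_id'] !== $poster_id) {
            $user_ids[] = $row['user_id'];
        }
    }
    return $user_ids;
}

$rows = array(
    array('user_id' => 1, 'thread_id' => 9, 'message_id' => 30, 'watched' => true),
    array('user_id' => 2, 'thread_id' => 9, 'message_id' => 28, 'watched' => false),
    array('user_id' => 3, 'thread_id' => 9, 'message_id' => 31, 'watched' => true),
    array('user_id' => 4, 'thread_id' => 8, 'message_id' => 12, 'watched' => true),
);

// User 3 posts in thread 9, so only user 1 is left to notify.
$notify = users_to_notify($rows, 9, 3);
```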

El Forum

[eluser]AgentPhoenix[/eluser]
[quote author="jedd" date="1257458301"]Yeah - who'd think to search forums for messages about writing forums, eh?

Anyhoo, the method I describe in there is pretty much done - a few things I need to sort out (locking threads on reads if they're a certain age) and so on - but the design seems to work a treat. Entirely unsure how well it would scale across thousands of threads, thousands of users - but I speculate reasonably well (because of the fact that locking a thread deletes all latest-read info about it for all users) and besides, disk is cheap.[/quote]

Weird, I had the box checked to notify me of responses to the thread and it didn't.

Anyway, for the part of this that checks for unread topics within a given forum, it shouldn't be too much of an issue because you'll only ever be checking a certain number of threads anyway, right? For example, if you have 50 threads to a page, then you'll only have to check those 50 threads for unread state, not all 800,000 that exist in that forum (unless I'm not thinking of something). Of course, when checking the general read/unread state of a forum, I can see where it might get hairy.
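
The per-page check could look roughly like this, with arrays standing in for the two query results. The function name and array shapes are assumptions; in SQL the second array would come from one lookup restricted to the thread ids on the page.

```php
<?php
// $page_threads: threads shown on the current page (thread_id => latest message id).
// $user_rows: this user's tracking rows for those threads (thread_id => last read message id).
// Returns thread_id => true when the thread has something the user has not read.
function unread_on_page(array $page_threads, array $user_rows)
{
    $unread = array();
    foreach ($page_threads as $thread_id => $latest_message_id) {
        // No row means the user never opened the thread, so treat it as unread.
        $last_read = isset($user_rows[$thread_id]) ? $user_rows[$thread_id] : 0;
        $unread[$thread_id] = $last_read < $latest_message_id;
    }
    return $unread;
}

$page_threads = array(10 => 500, 11 => 498, 12 => 470);
$user_rows = array(10 => 500, 11 => 450);

$unread = unread_on_page($page_threads, $user_rows);
```

The cost depends on the page size (50 threads, say), not on how many threads exist in the forum.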