Welcome Guest, Not a member yet? Register   Sign In
Pagination Algorithms
#1

[eluser]kurucu[/eluser]
I have written a simple forum in codeigniter and successfully implemented pagination.

The query just shows the next x replies after offset y (sticking to MySQL Limit command, as it allows different users to specify their own page sizes.

My problem is finding out which page a post would lie on, for example for linking quotes or redirection after editing.

Does anyone know what sort of query or algorithm could be used to directly find the page for a post, or at least find the relative position of a post in a sequential list (i.e. from the MySQL query).
#2

[eluser]jedd[/eluser]
Hi kurucu. Curiously, I'm looking at something similar for my own rinky dinky forum system at the moment.

I haven't written anything yet, as I've been hoping for a great epiphany after a few weeks of 'background pondering'. Alas, no epiphany has been forthcoming.

My rough thoughts so far:

First, do a count() query for all relevant (ie. = thread_id, etc) messages where the id is less than or equal to the-desired-message-id.

Mod the result against your posts-per-page variable, then subtract same from earlier count() in order to get the CI pagination offset number.
#3

[eluser]alboyd[/eluser]
[quote author="jedd" date="1251827400"]Hi kurucu. Curiously, I'm looking at something similar for my own rinky dinky forum system at the moment.

I haven't written anything yet, as I've been hoping for a great epiphany after a few weeks of 'background pondering'. Alas, no epiphany has been forthcoming.

My rough thoughts so far:

First, do a count() query for all relevant (ie. = thread_id, etc) messages where the id is less than or equal to the-desired-message-id.

Mod the result against your posts-per-page variable, then subtract same from earlier count() in order to get the CI pagination offset number.[/quote]

Would it just be

ceil or floor(count of messages up to desired message id / number of messages)
#4

[eluser]jedd[/eluser]
Yeah, if you want to take the fun out of it ...

page-you-want = floor ( count-of-total-posts-up-to-selected / posts-per-page )

I'd watch out (but I'm always paranoid about them) off-by-one errors, and also that the pagination system copes with 0 (I think it does - but I'd probably just strip it myself to be neat).
#5

[eluser]kurucu[/eluser]
Thanks for all the responses so far. I can see that the functions for total pages and the page you want given a particular post are similar.

However, my problem I think really lies within the fact that it is hard to find the number of posts up to the one you want.

The ID is not useful because posts may be moved between threads for moderation. Is there a method to find the row number and select based on that?

Come to think of it... is it as simple as selecting all posts in the thread before the given post date?

So, something based on this method would be:

Code:
Run count(*) queries to find total_post_count and posts_before_current (based on post dates)

total_pages = ceil( total_post_count / posts_per_page )
desired_page = floor( posts_before_current / posts_per_page)

Select first posts_per_page starting at desired_page

Would that be quite a quick set of queries to perform regularly?

-----

As a bit of background, I settled for a threaded system that specifies parent_id to give hierarchy. I've read about other tree methods that use incremental boundary numbers but, while slick, looked impossible to maintain. I only provide one view at present: linear chronological; but I'll keep my mind open so that a threaded output can be shown if I don't have to use recursive queries.

I have a thread table, rather than relying on selecting the ultimate parent from the post table, just for speed when viewing/searching, and to store one-off thread meta data.

It took me a lot of pondering on how to do it efficiently while still offering reasonable features! I think this model works well. I left-join user information to the posts result to show signatures, usernames etc. I considered loading a user cache for each thread, but decided this was more efficient. I may be wrong!
#6

[eluser]jedd[/eluser]
I tend to like my caches lower down the stack, too. And I rather foolishly hadn't entertained that you might be running a schema that didn't look roughly like mine, consequently didn't have the same functionality.

I think I can follow what you're doing in your database, but for comparison here's mine. Two caveats. First, tabs are set to 4 spaces but I think EE mungs that round a bit (certainly its font selection isn't monospace AFAICT). Second, actual content is stored in a message_content table where its ID matches that of message - I keep it separate as want to keep message slim and fast, and know I'll be doing some funky index/searching on the content proper.

Code:
CREATE TABLE forum  (
        id                      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name            CHAR (100) UNIQUE NOT NULL,             # Eg. 'Seed saving'
        description     CHAR (250),                                             # Eg. 'To discuss the saving of seeds'
        PRIMARY KEY (id),
        INDEX (name)
        );

CREATE TABLE thread  (
        id                      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        forum_id        INT UNSIGNED NOT NULL,                  # pseudo-FK to forum.id  *** AND *** 0 = MAIL
        sticky          BOOL DEFAULT FALSE,                             # To show at the top of every forum, or not
        locked          BOOL DEFAULT FALSE,                             # Can we post new messages to this thread
        view_count      SMALLINT DEFAULT 0,                             # We don't care about tracking > 65k views
        subject         CHAR (99) NOT NULL,                             # Eg. 'Problem with drying out tomato seeds'
        PRIMARY KEY (id),
        INDEX (forum_id),
        INDEX (subject)
        );


CREATE TABLE message  (
        id                      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        thread_id       INT UNSIGNED DEFAULT 0,                 # FK to thread.id (every message is part of a thread)
        author          INT UNSIGNED DEFAULT 0,                 # FK to member.id (every message has an author)
        msgdate         DATETIME,                                               # Date of message creation
        PRIMARY KEY (id),
        INDEX (thread_id),
        INDEX (author)
        );

I hate sticky threads (they were deprecated the first time someone invented a wiki, IMO) but have the feature there JIC someone talks me into using them. I'm also sticking with forward chronology only, as per your approach - but reason it should be straightforward enough to invert that order. User / message-#-in-post is tracked elsewhere, as is whether you're watching a particular thread.

Anyway, to your problem - you suggest you do track by thread ID, as the parent - but then you say that you are having difficulty counting based on the number of messages in a thread - surely that's a 'where parent = x'?
#7

[eluser]kurucu[/eluser]
As it turns out, our designs are very similar except for the message indexing (or in fact any indexing). With your message subtable, how do you propose to collect all the messages for displaying each thread?

I was actually thinking of doing something similar so that post editing history could be reviewed.

My problem was finding out where the post in question lies, rather than the total number of posts. I think I'll go with counting all posts with the wanted post's date or earlier for now. I wondered if there was a faster way to do it than perform several queries before the page generation query, or at least minimising them.
#8

[eluser]jedd[/eluser]
[quote author="kurucu" date="1251897795"]
With your message subtable, how do you propose to collect all the messages for displaying each thread?
[/quote]

Propose? Most of my basic core functionality is in play already.

Note that I share thread & message with my private-mail system (which is a kind of whispered thread variation to the whole forum system).

Here's my query to grab the messages for a given thread (it includes pagination support - ie. the LIMIT)
Code:
function  get_list_of_messages ( $thread_id = 0 , $offset = 0 , $count = 10 )  {
    $query_string = "SELECT
                        message.id,
                        message.msgdate,
                        member.handle       AS  author_name,
                        member.id               AS  author_id,
                        member.created          AS  author_created,
                        message_content.actual  AS  content
                    FROM
                        message
                    LEFT JOIN
                        member  ON  message.author=member.id
                    LEFT JOIN
                        message_content  ON  message.id=message_content.id
                    WHERE
                        message.thread_id=". $thread_id ."
                    ORDER BY
                        msgdate ASC
                    LIMIT ". $offset ." , ". $count ;

    $results = $this->my_get_rows ($query_string);

    // It'd be nice to do this in the main SQL call above, but can't work out how ...
    if ($results)
        foreach ($results as $result=>$message)
            $results[$result]['author_posts'] = $this->get_number_of_posts_by_author($message['author_id']);

    return $results;
    }  //  end-method  get_list_of_messages ()

I'm also not too sure about the last bit there - getting post-count-by-author. At the moment I'm running a pretty-much 3NF schema, but realise I may have to denormalise. The function cited there (get number of posts by author) actually does a count() against messages at the moment, but can easily swing back to a lookup to the member table.

Quote:I was actually thinking of doing something similar so that post editing history could be reviewed.

I'm not sure what to do with this.

I have my activity log, of course, and was wondering about just having a category in there for message edits .. but then it really blows out the (var) width of that table compared with the original intent of the log.

I thought about never deleting posts, just flagging them as previous instances and effectively hiding them - but then I'd end up with quite a messy two-way linked list within my messages structure, chronology would no longer reliably map to ID order, and I'd still have to work out a way to identify deltas. I toyed with having a separate log just for messages - so a snapshot after each edit gets dumped in there - and I have a joiner table to link message.id with message_snapshot.id say.

At the moment I'm thinking I might just flag that a message *has* been edited .. and leave it at that.

What's your current thinking on this one?


Quote:My problem was finding out where the post in question lies, rather than the total number of posts. I think I'll go with counting all posts with the wanted post's date or earlier for now. I wondered if there was a faster way to do it than perform several queries before the page generation query, or at least minimising them.

While I'm all for reducing the number of queries .. remember these are pretty simple and cheap ones, especially if you've indexed a few of the key (ha ha) fields. And these types of queries are perfect cache candidates.

Are your message numbers going to skew from date order? If they're not, it remains very simple. If they do, then the only obvious complication there is sorting by your date field instead .. yeah?
#9

[eluser]kurucu[/eluser]
Nice work.

You've done pagination support and thread information collection just about the same way as I plan to, except I do not have a message_content table - my messages are in the message table (I call mine post).

The reason I was asking about the message content query was that I thought you'd have more than one message_content per message (for tracking history, as I intended to do). Can you explain how separating these makes indexing better/possible? To be honest, I may do away with an edit history as I can only see rationale for keeping it for auditing and not so much for daily use (e.g. in the extreme event that the thread becomes subject to a legal argument.... I said extreme).

I don't know what version of MySQL you're using, or whether this affects 3NF, but could you not use a subquery to get the user's post count? Given your description, it sounds possible, but perhaps I'm missing something. However, as user posting is far less frequent than viewing, I decided to keep a post_count for each user in the user table. This is then quickly collected much like the other user details (in both our cases), but for me requires a quick call to a function to increment post user count for each post event.

As for your final comments on my problem - agreed. I wasn't sure how heavy these queries might be, but accept that a count() or two won't slow things down, and the final select only gets what's going on the page anyway.

Thanks for your time and thoughts! This has been a very helpful discussion for me so far!
#10

[eluser]jedd[/eluser]
Ah, yeah, that'd be an interesting approach (the message_content : message being n:1). At the moment one has an autoinc id, the other's PK is 'id' but that's inserted on creation and taken from the first's id. If you see what I mean.

I think .. I may end up with a message_changes table [ id, message_id, who, when, content ] ... and see how that goes.

I separated them out as I knew that messages table would get a beating, and wanted it as slim as possible - certainly fixed length / short rows. I don't know if it'll make indexing better, but I know that the whole thing of indexing large text fields is ... somewhat of a black art. I'm not sure at this stage if I'll use a different storage engine for the message content, even. I reason it's easy to migrate it back into a single 'message' table if I want - but it'd be messy to revisit a single table approach later and split into multiple tables.

I agree about the post_count in the user table - I've got that field in there now, but haven't coded anything for it yet .. but yeah .. pretty easy to update on posting. I'll feel a bit dirty de-normalising my schema, though. And you're right - a sub-query would work well, but I prefer the PHP iteration there, at least for the time, mostly because as I mentioned, I thought I may change the mechanism for determining post counts. When I swing over to keeping that in the user table it'll be easy to remove that bit entirely.

Btw, are you likely to be publishing your code once you're done?




Theme © iAndrew 2016 - Forum software by © MyBB