Welcome Guest, Not a member yet? Register   Sign In
Hierarchical Pages: Storage & Access
#1

[eluser]Teks[/eluser]
I saw that there has been much discussion in these forums about the different strategies and models used for storing hierarchical data in SQL databases. From these discussions, it seemed to me that there are many programmers out there who are unaware of the many different models available - we most often hear about 1 or 2 only, and even then there seems to be sometimes an irrational favouritism towards a specific model over another, regardless of the specific case requirements. This, again, seems to me to indicate a lack of knowledge, and true understanding of the advantages and disadvantages inherent in each model, in each design pattern. I have often found, that in specific situations we have to change, adapt, combine and evolve the 'standard' solutions, in order to get a result that is a *truly* effective answer to our current requirements.

I am in the middle of designing a new CMS, which hopefully will be based on CodeIgniter, and therefore have just gone through the process of researching the current design patterns and trends to help in some of our problems - one of them being, the storage of an arbitrarily large and complex hierarchy of 'pages', which every CMS should be capable of handling. The analysis process itself was quite interesting, and I thought that perhaps someone in the CodeIgniter community might benefit from having access to it. So, here is our final analysis and recommendation, for your critique and enjoyment. I hope someone will find this useful.
#2

[eluser]Teks[/eluser]
There seem to be currently 4 main techniques for storage and manipulation of hierarchical information in SQL databases. These are: 'adjacency list model', 'nested set model', 'node enumeration model', and 'edge enumeration model'.

We approach the 4 different models with the intention of creating a hierarchy of nested pages for an online content-management system. It is assumed that the content of all pages is going to be stored in a 'pages' table, in an online database which will most likely be MySQL or PostgreSQL. A php script will receive the request, retrieve the pages from the database, and display them.

Creating and managing page hierarchy in this context poses specific problems, which other types of nested hierarchies may not experience:

* Each page is accessed via a unique URL, which somehow must be used by the php script in order to find the page in the database. This often means that the URL has to be broken down, and each component of the URL - representing a 'node' in the hierarchy - has to be verified against the db. For instance: a request for a page at address "mydomain.com/catalog/category/subcategory/product" may mean that we need to ensure that a 'catalog' pages exists, and that it has a child named 'category'. We then have to check that 'category' has a child 'subcategory', and so on. This may translates into multiple db requests for each single page access - very inefficient, and able to cause many problems on sites with large traffic.
* A request for a full, hierarchical list of all pages in the database - which is often required when users are adding, removing or editing pages in a content management system - is often translated into a recursive request, which can create multiple - and often numerous - requests to the db.
* The order of pages - as it appears in the full pages list to the user - is arbitrary. In content management systems, it is vital that the user be able to reorder pages at will, as this order usually translates into the final navigation menus of the user's site. Again, retrieving the list, while obeying the user-defined order of pages, is often implemented via recursive algorithms, which cause multiple queries to the db.
* Parent/Child information, in relation to a specific page, is often needed - for instance, when creating 'breadcrumbs' for a page, or when generating dynamic sub-menus via javascript/ajax on-the-fly. Getting a page's 'lineage' is, again, often done via recursive algorithms which generate numerous db queries.

After extensive research on the internet, we managed to find over 10 workable design patterns, which offer solutions for storage of hierarchical data in sql databases. We found that although there is a large number of design patterns in use, most seem to be variation on 1 of 4 common storage models:

1) Adjacency List: http://dev.mysql.com/tech-resources/arti...-data.html
2) Nested Set: http://dev.mysql.com/tech-resources/arti...-data.html
3) Node Enumeration: http://www.onlamp.com/pub/a/onlamp/2004/...l_sql.html
4) Edge Enumeration: http://www.onlamp.com/pub/a/onlamp/2004/...l_sql.html

Both "node enumeration" and "edge enumeration" models are actually merely specialised variants of the "Path Enumeration" model - but the variations are particularly relevant to our application here, so they are listed separately.

Each of the models analysed here presupposes a certain structure in the data table - that is, that certain fields will be available in the table to store a certain type of information that is needed for that model. The required fields are listed below.

Items which are seen as negatively impacting on our choice are marked with ">>".
#3

[eluser]Teks[/eluser]
--------------------------------------------
ADJACENCY LIST MODEL:
--------------------------------------------
Data table needs fields:
`parent_id` int(10) unsigned NOT NULL default '0'
`order` int(10) unsigned NOT NULL default '0'

When creating a new page:
-- set parent_id
-- set own order
-- set order of pages under same parent where order >= my order

When deleting a page:
-- delete all children (pages where parent_id = my id)
-- delete self

When moving a page within the hierarchy:
-- set parent_id
-- set own order
-- set order of pages under same parent where order >= my order

When retrieving a single page:
>> convert URL path to page id (possibly using recursion and multiple db queries)
-- if page does not exist, show 404

When retrieving complete hierarchical list of pages:
>> possibly using recursion, and multiple db queries

When retrieving breadcrumbs for a page:
>> possibly using recursion, and multiple db queries

When retrieving the children of a page:
-- single query: select pages where parent = my id
#4

[eluser]Teks[/eluser]
--------------------------------------------
NESTED SET MODEL:
--------------------------------------------
Data table needs fields:
`lft` int(10) unsigned NOT NULL default '0'
`rgt` int(10) unsigned NOT NULL default '0'

When creating a new page:
-- set own LFT number (based on the RGT of either previous sibling, or parent)
-- set own RGT number
-- update LFT and RGT of all pages where LFT > my own LFT

When deleting a page:
-- delete all children (pages where LFT > my own LFT AND RGT < my own RGT)
-- delete self
-- update LFT and RGT of all pages where LFT > my own LFT

When moving a page within the hierarchy:
-- set own LFT number (based on the RGT of either new previous sibling, or new parent)
-- set own RGT number
-- update LFT and RGT of all pages where LFT > my own (new) LFT

When retrieving a single page:
>> convert path to page id (possibly using recursion, and multiple db queries)
-- if page does not exist, show 404

When retrieving complete hierarchical list of pages:
-- single query: select all pages ordered by LFT

When retrieving breadcrumbs for a page:
-- select all pages where LFT < my own LFT AND RGT > my own RGT, ordered by LFT

When retrieving the children of a page:
-- single query: select all pages where LFT > my own LFT AND RGT < my own RGT, ordered by LFT
#5

[eluser]Teks[/eluser]
--------------------------------------------
NODE ENUMERATION MODEL:
--------------------------------------------
Data table needs fields:
`keypath` varchar(200) default NULL
`order` int(10) unsigned NOT NULL default '0'

When creating a new page:
-- set own keypath (based on parent's keypath + url-fied title of page)
-- set own order
-- set order of pages under same parent where order >= my order

When deleting a page:
-- delete all children (= pages where keypath starts with my keypath)
-- delete self

>> When editing a page:
-- if title changed, then:
-- set own keypath
-- set keypath of all children (= pages where keypath starts with my keypath)

When moving a page within the hierarchy:
-- set order of pages under same parent where order > my order
-- set own order
-- if moved to a different parent:
-- set own keypath
-- set keypath of all children (= pages where keypath starts with my keypath)

When retrieving a single page:
-- simple, single query: select page where keypath = url
-- if page id does not exist, show 404

When retrieving complete hierarchical list of pages:
>> retrieve all pages then order results (by recursion) within parents

When retrieving breadcrumbs:
-- keypath IS the breadcrumb

When retrieving the children of a page:
-- possibly a single query (depending on db): select all pages where keypath starts with my keypath
#6

[eluser]Teks[/eluser]
--------------------------------------------
EDGE ENUMERATION MODEL:
--------------------------------------------
Data table needs fields:
`keypath` varchar(200) default NULL

When creating a new page:
-- set own keypath (based on parent's keypath + my order)
>> set keypath of pages under same parent where last item of keypath (= order) >= my order

When deleting a page:
-- delete all children (= pages where keypath starts with my keypath)
-- delete self

When moving a page within the hierarchy:
>> set keypath of pages under same parent where last item of keypath (= order) >= my order
-- set own keypath (based on new parent's keypath + my new order)
-- if moved to a new parent:
-- set keypath of all children (= pages where keypath starts with my keypath)

When retrieving a single page:
>> convert path to page id (possibly using recursion, and multiple db queries)
-- if page does not exist, show 404

When retrieving complete hierarchical list of pages:
-- single query: select all pages ordered by keypath

When retrieving breadcrumbs for a page:
>> possibly using recursion, and multiple db queries

When retrieving the children of a page:
-- possibly a single query (depending on db): select all pages where keypath starts with my keypath
#7

[eluser]Teks[/eluser]
--------------------------------------------
CONCLUSION
-------------------------------------------
The 'nested set model' would be the ideal candidate for our requirements, if it were not for the single problem of having to convert the page's url - possibly recursively - when retrieving a single page from the database. Unfortunately, retrieving a single page is a very common and essential task, so a more optimal solution must be found. In fact, all of the models studied have this same problem, except for one.

The 'node enumeration model' - and its variants - seems to be the only one that can efficiently deal with conversion of the page's URL into a single db query. Therefore, in order to overcome this problem in the 'nested set model', we suggest that we combine some of the functionality from the 'node enumeration model', and end up with our own variant - which we may call an "enumerated nested set model":

ENUMERATED NESTED SET MODEL:
Data table needs fields:
`keypath` varchar(200) default NULL
`lft` int(10) unsigned NOT NULL default '0'
`rgt` int(10) unsigned NOT NULL default '0'

The keypath field will store the page's URL, and therefore we will be able to retrieve a single page simply by looking up its URL against the keypaths.
The ordering and nesting of the pages, however, is not handled by the 'keypath' field at all, but rather by the 'lft' and 'rgt' fields, as in a 'normal' nested set model.

When creating a new page:
-- set own keypath (based on parent's keypath + url-fied title of page)
-- set own LFT number (based on the RGT of either previous sibling, or parent)
-- set own RGT number
-- update LFT and RGT of all pages where LFT > my own LFT

When deleting a page:
-- delete all children (= pages where LFT > my own LFT AND RGT < my own RGT)
-- delete self
-- update LFT and RGT of all pages where LFT > my own LFT

-- When editing a page:
-- if title changed, then:
-- set own keypath
-- set keypath of all children (= pages where LFT > my own LFT AND RGT < my own RGT)

When moving a page within the hierarchy:
-- set own LFT number (based on the RGT of either new previous sibling, or new parent)
-- set own RGT number
-- update LFT and RGT of all pages where LFT > my own (new) LFT
-- if moved to a different parent:
-- set own keypath
-- set keypath of all children (= pages where LFT > my own LFT AND RGT < my own RGT)

When retrieving a single page:
-- simple, single query: select page where keypath = url
-- if page id does not exist, show 404

When retrieving complete hierarchical list of pages:
-- single query: select all pages ordered by LFT

When retrieving breadcrumbs:
-- keypath IS the breadcrumb, OR
-- single query: select all pages where LFT < my own LFT AND RGT > my own RGT, ordered by LFT

When retrieving the children of a page:
-- single query: select all pages where LFT > my own LFT AND RGT < my own RGT, ordered by LFT

This solution seems to combine some of the advantages of both models, allowing for quick retrieval of single pages, as well as for efficient retrieval of complete hierarchical lists of pages.

It is important to note, however, that it also combines some of the disadvantages of both models. Some extra code has to be written to manage the integrity of the 'keypath' properties of pages, whenever the user changes the title of a page as the keypath is based on the title. Also, whenever a page is created, moved or deleted within the hierarchy, extra code has to be written to manage the integrity of 'lft' and 'rgt' fields throughout the database. Nevertheless, as these updates can be easily made to the database with a single query, this still seems to be the most efficient solution to date.
#8

[eluser]tmkajk[/eluser]
Teks,
I realize it has been a couple of months since you post, but I was wondering if you made any progress with your model for managing hierarchical pages? I find myself in pretty much the same situation and, while a variation of the node enumeration model has worked so far, now need to develop a way to manage a completely hierarchical set of pages easily. That is, I need to develop a way to allow users to manage not only page CRUD, but where the page is on the map. So, again, did you make any progress with this as actual code or is it still proof of concept? Could you use a second brain on the project?


~ Todd




Theme © iAndrew 2016 - Forum software by © MyBB