Home What are the options for storing hierarchical data in a relational database?
Reply: 0

What are the options for storing hierarchical data in a relational database?

user3776
1#
user3776 Published in June 25, 2018, 4:02 am

Good Overviews

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

  • One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
  • Models for hierarchical data: slides with good explanations of tradeoffs and example usage
  • Representing hierarchies in MySQL: very good overview of Nested Set in particular
  • Hierarchical data in RDBMSs: most comprehensive and well-organized set of links I've seen, but not much in the way of explanation

Options

Ones I am aware of and general features:

  1. Adjacency List:
    • Columns: ID, ParentID
    • Easy to implement.
    • Cheap node moves, inserts, and deletes.
    • Expensive to find the level, ancestry & descendants, path
    • Avoid N+1 via Common Table Expressions in databases that support them
  2. Nested Set (a.k.a Modified Preorder Tree Traversal)
    • Columns: Left, Right
    • Cheap ancestry, descendants
    • Very expensive O(n/2) moves, inserts, deletes due to volatile encoding
  3. Bridge Table (a.k.a. Closure Table /w triggers)
    • Uses separate join table with: ancestor, descendant, depth (optional)
    • Cheap ancestry and descendants
    • Writes costs O(log n) (size of subtree) for insert, updates, deletes
    • Normalized encoding: good for RDBMS statistics & query planner in joins
    • Requires multiple rows per node
  4. Lineage Column (a.k.a. Materialized Path, Path Enumeration)
    • Column: lineage (e.g. /parent/child/grandchild/etc...)
    • Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
    • Writes costs O(log n) (size of subtree) for insert, updates, deletes
    • Non-relational: relies on Array datatype or serialized string format
  5. Nested Intervals
    • Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
    • Has real/float/decimal representation/precision issues
    • Complex matrix encoding variant adds ancestor encoding (materialized path) for "free"
  6. Flat Table
    • A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
    • Cheap to iterate/paginate over
    • Expensive move and delete
    • Good Use: threaded discussion - forums / blog comments
  7. Multiple lineage columns
    • Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
    • Cheap ancestors, descendants, level
    • Cheap insert, delete, move of the leaves
    • Expensive insert, delete, move of the internal nodes
    • Hard limit to how deep the hierarchy can be

Database Specific Notes

MySQL

  • Use session variables for Adjacency List

Oracle

  • Use CONNECT BY to traverse Adjacency Lists

PostgreSQL

  • ltree datatype for Materialized Path

SQL Server

  • General summary
  • 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.
You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.419019 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO