While dedicated graph databases like Neo4j and ArangoDB have gained popularity for working with interconnected data, PostgreSQL—a traditional relational database—offers powerful capabilities for handling hierarchical and graph-like data structures. At the heart of this functionality is PostgreSQL’s implementation of Common Table Expressions (CTEs) and recursive queries, which allow you to traverse hierarchical relationships efficiently without specialized graph database software.
Understanding Hierarchical Data
Hierarchical data appears in countless scenarios:
- Organization charts with employees and managers
- Category trees in e-commerce platforms
- File system directories
- Comment threads with parent-child relationships
- Network topologies
- Family trees
These structures share a common pattern: nodes connected to other nodes through parent-child or similar relationships.
Setting Up a Basic Hierarchical Structure
Let’s start with a simple example: an employee hierarchy where each employee has a manager (except the CEO, who has no manager).
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INTEGER REFERENCES employees(id),
department VARCHAR(100),
title VARCHAR(100)
);
-- Insert sample data
INSERT INTO employees (id, name, manager_id, department, title) VALUES
(1, 'John Smith', NULL, 'Executive', 'CEO'),
(2, 'Maria Garcia', 1, 'Sales', 'Sales Director'),
(3, 'Robert Johnson', 1, 'Engineering', 'CTO'),
(4, 'Lisa Wong', 2, 'Sales', 'Sales Manager'),
(5, 'David Kim', 3, 'Engineering', 'Lead Developer'),
(6, 'Jennifer Lee', 3, 'Engineering', 'QA Manager'),
(7, 'Michael Chen', 4, 'Sales', 'Sales Representative'),
(8, 'Sarah Brown', 4, 'Sales', 'Sales Representative'),
(9, 'James Wilson', 5, 'Engineering', 'Backend Developer'),
(10, 'Emily Davis', 5, 'Engineering', 'Frontend Developer'),
(11, 'Daniel Martin', 6, 'Engineering', 'QA Engineer'),
(12, 'Sophia Martinez', 7, 'Sales', 'Junior Sales Rep');
Recursive Queries with Common Table Expressions (CTEs)
PostgreSQL implements SQL:1999 standard recursive queries using the WITH RECURSIVE
syntax. This powerful feature allows you to start with an initial query (the anchor) and repeatedly apply another query (the recursive part) to the results until no new rows are returned.
Example 1: Finding All Subordinates of a Manager
Let’s find all employees who report directly or indirectly to the CTO (Robert Johnson, ID 3):
WITH RECURSIVE subordinates AS (
-- Anchor: start with the CTO
SELECT id, name, manager_id, department, title, 1 AS level
FROM employees
WHERE id = 3
UNION ALL
-- Recursive part: join with employees who have the current employee as manager
SELECT e.id, e.name, e.manager_id, e.department, e.title, s.level + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT id, name, department, title, level
FROM subordinates
ORDER BY level, name;
This query returns the CTO and all employees who report to the CTO, either directly or through intermediate managers, along with their level in the hierarchy.
Example 2: Finding the Path to the Top
Now, let’s find the management chain from a specific employee up to the CEO:
WITH RECURSIVE management_chain AS (
-- Anchor: start with a specific employee (e.g., Daniel Martin, ID 11)
SELECT id, name, manager_id, title, ARRAY[id] AS path
FROM employees
WHERE id = 11
UNION ALL
-- Recursive part: join with the manager of the current employee
SELECT e.id, e.name, e.manager_id, e.title, m.path || e.id
FROM employees e
JOIN management_chain m ON e.id = m.manager_id
)
SELECT id, name, title, array_length(path, 1) AS level
FROM management_chain
ORDER BY array_length(path, 1) DESC;
This query returns the complete chain of management from Daniel Martin up to the CEO, with the path array tracking the sequence of IDs in the hierarchy.
Advanced Graph Techniques in PostgreSQL
Cycles Detection
Real-world hierarchies sometimes have cycles (A reports to B, B reports to C, C reports to A). Recursive queries can detect these cycles:
WITH RECURSIVE hierarchy_check AS (
SELECT id, name, manager_id, ARRAY[id] AS path, false AS cycle
FROM employees
WHERE id = 5 -- Start with some employee
UNION ALL
SELECT e.id, e.name, e.manager_id, h.path || e.id, e.id = ANY(h.path)
FROM employees e
JOIN hierarchy_check h ON e.id = h.manager_id AND NOT h.cycle
)
SELECT * FROM hierarchy_check WHERE cycle;
If this query returns any rows, there’s a cycle in the hierarchy.
Breadth-First vs. Depth-First Search
PostgreSQL’s recursive queries normally use depth-first search. You can simulate breadth-first search by ordering the results:
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 1 AS level, ARRAY[id] AS path
FROM employees
WHERE id = 1 -- Start with the CEO
UNION ALL
SELECT e.id, e.name, e.manager_id, t.level + 1, t.path || e.id
FROM employees e
JOIN org_tree t ON e.manager_id = t.id
ORDER BY t.level -- This encourages breadth-first traversal
)
SELECT id, name, level, path
FROM org_tree
ORDER BY level, id;
Finding the Shortest Path Between Two Nodes
One classic graph problem is finding the shortest path between two nodes:
WITH RECURSIVE path_search AS (
-- Anchor: start with source employee
SELECT id, name, manager_id, ARRAY[id] AS path, 1 AS depth
FROM employees
WHERE id = 7 -- Source employee
UNION ALL
-- Recursive part: explore both up and down the hierarchy
SELECT e.id, e.name, e.manager_id, p.path || e.id, p.depth + 1
FROM employees e
JOIN path_search p ON (e.manager_id = p.id OR e.id = p.manager_id)
WHERE NOT e.id = ANY(p.path) -- Avoid cycles
AND p.depth < 10 -- Prevent infinite recursion
)
SELECT path
FROM path_search
WHERE id = 11 -- Target employee
ORDER BY array_length(path, 1)
LIMIT 1;
This query finds the shortest path between employee ID 7 and employee ID 11, exploring both upward and downward relationships.
Real-World Use Case: Category Hierarchies in E-commerce
Let’s look at a practical example of how to model and query product categories in an e-commerce system:
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INTEGER REFERENCES categories(id),
description TEXT,
slug VARCHAR(100) UNIQUE
);
INSERT INTO categories (id, name, parent_id, slug) VALUES
(1, 'Electronics', NULL, 'electronics'),
(2, 'Computers', 1, 'computers'),
(3, 'Laptops', 2, 'laptops'),
(4, 'Desktop PCs', 2, 'desktop-pcs'),
(5, 'Computer Components', 2, 'computer-components'),
(6, 'CPUs', 5, 'cpus'),
(7, 'Graphics Cards', 5, 'graphics-cards'),
(8, 'Monitors', 1, 'monitors'),
(9, 'Gaming Monitors', 8, 'gaming-monitors'),
(10, 'Office Monitors', 8, 'office-monitors'),
(11, 'Smartphones', 1, 'smartphones'),
(12, 'Android Phones', 11, 'android-phones'),
(13, 'iOS Phones', 11, 'ios-phones');
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
category_id INTEGER REFERENCES categories(id),
price DECIMAL(10, 2),
description TEXT
);
-- Insert some sample products
INSERT INTO products (name, category_id, price) VALUES
('Dell XPS 13', 3, 1299.99),
('Custom Gaming PC', 4, 1899.99),
('Intel Core i9-12900K', 6, 589.99),
('NVIDIA RTX 3080', 7, 699.99),
('Samsung Odyssey G7', 9, 649.99),
('Dell UltraSharp 27"', 10, 349.99),
('Samsung Galaxy S22', 12, 799.99),
('iPhone 14 Pro', 13, 999.99);
Query: Find All Products in a Category and Its Subcategories
A common requirement is to find all products in a category, including those in all subcategories. For example, all products in “Computers” and its subcategories:
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id
FROM categories
WHERE id = 2 -- "Computers" category
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT p.id, p.name, p.price, c.name AS category
FROM products p
JOIN category_tree c ON p.category_id = c.id
ORDER BY c.name, p.name;
Query: Build a Complete Category Tree with Products Count
This query builds a category tree with indentation to visualize the hierarchy and shows the count of products in each category:
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, name AS path, 1 AS level, ARRAY[id] AS path_ids
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id,
ct.path || ' > ' || c.name,
ct.level + 1,
ct.path_ids || c.id
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
),
category_product_counts AS (
SELECT c.id, COUNT(p.id) AS product_count
FROM category_tree c
LEFT JOIN products p ON p.category_id = c.id
GROUP BY c.id
)
SELECT
ct.id,
repeat(' ', ct.level - 1) || ct.name AS category,
ct.level,
cpc.product_count,
ct.path_ids
FROM category_tree ct
JOIN category_product_counts cpc ON ct.id = cpc.id
ORDER BY ct.path;
Advanced Techniques for Large Hierarchies
Materialized Path Pattern
For static hierarchies that don’t change often, you can use a materialized path pattern for faster queries:
ALTER TABLE categories ADD COLUMN path_ids INTEGER[];
-- Update the path_ids column
WITH RECURSIVE category_paths AS (
SELECT id, name, parent_id, ARRAY[id] AS path_ids
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, cp.path_ids || c.id
FROM categories c
JOIN category_paths cp ON c.parent_id = cp.id
)
UPDATE categories c
SET path_ids = cp.path_ids
FROM category_paths cp
WHERE c.id = cp.id;
-- Create an index for faster queries
CREATE INDEX idx_categories_path_ids ON categories USING GIN(path_ids);
Now you can find all descendants of a category without recursive queries:
SELECT * FROM categories
WHERE path_ids @> ARRAY[2]; -- All categories under "Computers" (ID 2)
Closure Table Pattern
Another approach for frequently queried hierarchies is the closure table pattern, which stores all paths in a separate table:
CREATE TABLE category_paths (
ancestor_id INTEGER REFERENCES categories(id),
descendant_id INTEGER REFERENCES categories(id),
path_length INTEGER,
PRIMARY KEY (ancestor_id, descendant_id)
);
-- Populate the closure table
WITH RECURSIVE paths AS (
-- Direct paths (length 0)
SELECT id AS ancestor_id, id AS descendant_id, 0 AS path_length
FROM categories
UNION ALL
-- Recursive paths
SELECT p.ancestor_id, c.id AS descendant_id, p.path_length + 1
FROM categories c
JOIN paths p ON c.parent_id = p.descendant_id
)
INSERT INTO category_paths
SELECT * FROM paths;
This allows for efficient queries of ancestors, descendants, and paths:
-- All ancestors of "CPUs" (ID 6)
SELECT c.*
FROM categories c
JOIN category_paths cp ON c.id = cp.ancestor_id
WHERE cp.descendant_id = 6
ORDER BY cp.path_length DESC;
-- All descendants of "Computers" (ID 2)
SELECT c.*, cp.path_length AS depth
FROM categories c
JOIN category_paths cp ON c.id = cp.descendant_id
WHERE cp.ancestor_id = 2 AND cp.path_length > 0
ORDER BY cp.path_length, c.name;
Performance Considerations
When working with hierarchical data in PostgreSQL, keep these tips in mind:
- Indexing: Create appropriate indexes on parent_id columns and any columns used for filtering in recursive queries.
- Limit Recursion Depth: Always include a depth limit in your recursive queries to prevent runaway queries in case of cycles.
- Materialization: For frequently accessed hierarchies that don’t change often, consider materializing the paths.
- Pagination: Hierarchical data can grow large; implement pagination when displaying results.
- Monitoring: Use
EXPLAIN ANALYZE
to understand query performance and optimize as needed.
Conclusion
PostgreSQL’s recursive query capabilities make it a powerful tool for working with hierarchical and graph-like data structures. While dedicated graph databases offer specialized features for complex graph operations, PostgreSQL provides an excellent solution for many hierarchical data problems without introducing another database system into your architecture.
Whether you’re modeling organizational hierarchies, category trees, or network relationships, PostgreSQL’s recursive CTEs offer a standardized, SQL-based approach to traversing and analyzing connected data. Combined with PostgreSQL’s robust transaction support, rich data types, and extensive indexing options, you get a versatile platform for handling both relational and hierarchical data in a single database system.
For many applications, this approach offers the perfect balance of power and simplicity, allowing you to leverage graph-like queries while maintaining the benefits of a mature relational database system.
Leave a Reply