Understanding Query Performance: A Deep Dive into EXPLAIN ANALYZE

When your database starts to slow down, diagnosing the exact source of performance issues can be challenging. Fortunately, PostgreSQL provides a powerful tool for this purpose: EXPLAIN ANALYZE. This command offers crucial insights into how PostgreSQL executes your queries, helping you identify bottlenecks and opportunities for optimization.

What is EXPLAIN ANALYZE?

The EXPLAIN ANALYZE command shows you the execution plan that PostgreSQL’s query planner generates for your SQL statements, along with actual runtime statistics. This provides a comprehensive view of:

  • How the database searches for your data
  • Which indexes (if any) are being used
  • How tables are being joined
  • The actual time spent on each operation
  • How many rows are processed at each step

Basic Usage

The syntax is straightforward:

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';

The output might look like this:

QUERY PLAN
-----------------------------------------------------------------
Index Scan using idx_user_email on users  (cost=0.29..8.31 rows=1 width=72) (actual time=0.044..0.046 rows=1 loops=1)
  Index Cond: (email = 'john@example.com'::text)
Planning Time: 0.092 ms
Execution Time: 0.075 ms

Understanding the Output

Let’s break down the components of an EXPLAIN ANALYZE output:

1. Operation Types

These are the fundamental operations PostgreSQL performs:

  • Seq Scan: Full table scan, reading every row sequentially
  • Index Scan: Using an index to find specific rows
  • Index Only Scan: Using only the index without accessing the table
  • Bitmap Heap Scan/Bitmap Index Scan: Two-phase scan that first builds a bitmap of matching row locations from an index, then fetches those rows from the table
  • Nested Loop/Hash/Merge Join: Different join algorithms
  • Sort: Sorting operations for ORDER BY clauses
  • Aggregate: Calculations for functions like COUNT, SUM, AVG
  • Limit: Restricting number of rows returned

2. Cost Estimates vs. Actual Times

(cost=0.29..8.31 rows=1 width=72) (actual time=0.044..0.046 rows=1 loops=1)

This shows:

  • Startup cost (0.29): Estimated cost before first row is returned
  • Total cost (8.31): Estimated cost to return all rows
  • Estimated rows (1): Expected number of rows
  • Width (72): Estimated average row width in bytes
  • Actual time (0.044..0.046): Actual startup and total time in milliseconds, averaged per loop
  • Actual rows (1): Actual number of rows returned, also averaged per loop
  • Loops (1): Number of times this operation was executed; multiply the per-loop figures by this to get totals
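These annotations are regular enough to pull apart programmatically. As a rough illustration, here is a hypothetical Python helper (not part of any PostgreSQL client library) that extracts the numeric fields from a single plan line:

```python
import re

# Matches the "(cost=...)" estimate and "(actual time=...)" runtime
# annotations that EXPLAIN ANALYZE appends to each plan node.
PLAN_STATS = re.compile(
    r"\(cost=(?P<startup_cost>[\d.]+)\.\.(?P<total_cost>[\d.]+) "
    r"rows=(?P<est_rows>\d+) width=(?P<width>\d+)\) "
    r"\(actual time=(?P<startup_ms>[\d.]+)\.\.(?P<total_ms>[\d.]+) "
    r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\)"
)

def parse_plan_line(line: str) -> dict:
    """Return the numeric fields of one plan node as floats."""
    match = PLAN_STATS.search(line)
    if match is None:
        raise ValueError("no statistics found in plan line")
    return {name: float(value) for name, value in match.groupdict().items()}
```

For the index scan shown earlier, this yields total_cost 8.31, actual_rows 1.0, and loops 1.0.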

3. Planning Time and Execution Time

Planning Time: 0.092 ms
Execution Time: 0.075 ms
  • Planning Time: How long PostgreSQL spent creating the execution plan
  • Execution Time: Actual time to execute the query

Reading Execution Plans: A Hierarchical Structure

Execution plans are read from the innermost (most indented) operations to the outermost ones. Each operation feeds its results to the operation above it.

Example:

QUERY PLAN
-----------------------------------------------------------------
Hash Join  (cost=10.57..21.14 rows=10 width=73) (actual time=0.136..0.138 rows=10 loops=1)
  Hash Cond: (orders.customer_id = customers.id)
  ->  Seq Scan on orders  (cost=0.00..8.50 rows=350 width=28) (actual time=0.011..0.013 rows=20 loops=1)
  ->  Hash  (cost=9.50..9.50 rows=10 width=45) (actual time=0.083..0.083 rows=10 loops=1)
        ->  Seq Scan on customers  (cost=0.00..9.50 rows=10 width=45) (actual time=0.008..0.009 rows=10 loops=1)
              Filter: (status = 'active'::text)

Reading order:

  1. Seq Scan on customers with a filter
  2. Hash creation from customers data
  3. Seq Scan on orders table
  4. Hash Join combining both using the specified condition

Common Performance Issues to Look For

1. Sequential Scans on Large Tables

Seq Scan on users (cost=0.00..458.00 rows=10000 width=182) (actual time=0.028..8.312 rows=10000 loops=1)

This indicates PostgreSQL is reading every row in the users table. For large tables, this suggests you might need an index.

2. Significantly Wrong Row Estimates

When the estimated row count differs wildly from the actual count, the planner is usually working from outdated statistics:

Index Scan using idx_email on users (cost=0.29..8.31 rows=1 width=72) (actual time=0.044..12.046 rows=10000 loops=1)

Solution: Run ANALYZE on your table to update statistics.
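One way to spot this automatically, sketched here as a hypothetical helper rather than any built-in feature, is to express the two row counts as a ratio and flag large discrepancies:

```python
def estimate_skew(estimated_rows: float, actual_rows: float) -> float:
    """How far off the planner's row estimate was, as a ratio >= 1.0.

    Values far above ~10 are a strong hint to re-run ANALYZE so the
    planner works from fresh statistics.
    """
    estimated = max(estimated_rows, 1.0)
    actual = max(actual_rows, 1.0)
    return max(estimated, actual) / min(estimated, actual)
```

For the plan above (rows=1 estimated, rows=10000 actual), the skew is 10000.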

3. Inefficient Join Methods

Hash joins are typically fast, but nested loops on large datasets can be problematic:

Nested Loop (cost=0.29..2844.29 rows=1000 width=68) (actual time=0.068..425.072 rows=10000 loops=1)

Consider adding indexes on the join columns or increasing the work_mem parameter.

4. High Planning Times

If planning time is unusually high, the planner may be wrestling with a complex query, a large number of joined tables, or many partitions and candidate indexes:

Planning Time: 230.092 ms

5. Multiple Passes (Loops > 1)

When loops are greater than 1, PostgreSQL is executing the same operation multiple times:

Index Scan using idx_order_id on order_items (cost=0.29..8.31 rows=5 width=72) (actual time=0.044..0.046 rows=5 loops=1000)

This might indicate opportunities for better join strategies or materialization.
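Because the reported actual times are averages per loop, the true cost of such a node is the per-loop time multiplied by the loop count; a one-line sketch of the arithmetic:

```python
def node_total_ms(per_loop_ms: float, loops: int) -> float:
    """Total time a plan node consumed: EXPLAIN ANALYZE reports actual
    time averaged per loop, so multiply by the loop count."""
    return per_loop_ms * loops
```

The index scan above therefore costs about 0.046 ms x 1000 loops = 46 ms overall, despite looking cheap at first glance.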

Advanced Features

Buffers Information

Add BUFFERS to see I/O statistics:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'john@example.com';

Output includes:

Buffers: shared hit=4 read=1

This shows how many buffer blocks were:

  • Hit in cache (faster)
  • Read from disk (slower)
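A simple derived metric (sketched here as an illustration, not a built-in) is the cache hit ratio for a node:

```python
def buffer_hit_ratio(shared_hit: int, shared_read: int) -> float:
    """Fraction of shared-buffer accesses served from PostgreSQL's
    cache rather than read from disk (or the OS page cache)."""
    total = shared_hit + shared_read
    return shared_hit / total if total else 1.0
```

For the output above (shared hit=4 read=1), 4 of 5 blocks, or 80%, came from cache.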

Format Options

For more readable output:

EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users WHERE email = 'john@example.com';

Available formats: TEXT (default), XML, JSON, YAML
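The JSON format is especially handy for tooling, since the plan can be consumed with an ordinary JSON parser. A sketch using a trimmed-down sample of the structure EXPLAIN (ANALYZE, FORMAT JSON) returns (an array with one object per statement; the real output carries many more keys per node):

```python
import json

# Trimmed-down sample of EXPLAIN (ANALYZE, FORMAT JSON) output.
sample = """[
  {
    "Plan": {"Node Type": "Index Scan", "Actual Total Time": 0.046},
    "Planning Time": 0.092,
    "Execution Time": 0.075
  }
]"""

plan = json.loads(sample)[0]   # one object per statement
root = plan["Plan"]            # the root plan node
```

From here, plan["Execution Time"] gives 0.075 and root["Node Type"] gives "Index Scan", with no text parsing required.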

Real-World Optimization Examples

Example 1: Adding Missing Index

Original query:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'john@example.com';

Seq Scan on orders (cost=0.00..2230.00 rows=42 width=120) (actual time=0.536..15.478 rows=38 loops=1)
  Filter: (customer_email = 'john@example.com'::text)
  Rows Removed by Filter: 49962
Planning Time: 0.088 ms
Execution Time: 15.502 ms
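The "Rows Removed by Filter" line is the key signal here: the scan read 50,000 rows to keep 38. As a quick illustrative sketch, the filter's selectivity can be computed from those two numbers:

```python
def filter_selectivity(rows_returned: int, rows_removed: int) -> float:
    """Fraction of scanned rows that survived the filter.  A tiny
    value on a sequential scan is a classic sign that an index on the
    filtered column would pay off."""
    return rows_returned / (rows_returned + rows_removed)
```

Here the selectivity is 38 / 50000 = 0.00076: only 0.076% of the table matched, which is ideal territory for an index.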

After adding index:

CREATE INDEX idx_orders_customer_email ON orders(customer_email);

New execution plan:

Index Scan using idx_orders_customer_email on orders (cost=0.29..8.45 rows=38 width=120) (actual time=0.028..0.152 rows=38 loops=1)
  Index Cond: (customer_email = 'john@example.com'::text)
Planning Time: 0.092 ms
Execution Time: 0.183 ms

Performance improved from 15.5ms to 0.18ms!

Example 2: Optimizing Joins

Original query with nested loop join:

EXPLAIN ANALYZE
SELECT o.*, c.name 
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed';

Nested Loop (cost=0.29..5286.29 rows=5000 width=124) (actual time=0.056..220.391 rows=5000 loops=1)
  ->  Seq Scan on orders o (cost=0.00..180.00 rows=5000 width=84) (actual time=0.014..12.014 rows=5000 loops=1)
        Filter: (status = 'completed'::text)
  ->  Index Scan using customers_pkey on customers c (cost=0.29..1.01 rows=1 width=40) (actual time=0.039..0.041 rows=1 loops=5000)
        Index Cond: (id = o.customer_id)
Planning Time: 0.354 ms
Execution Time: 220.638 ms

After increasing work_mem:

SET work_mem = '16MB';

New execution plan:

Hash Join (cost=140.00..298.00 rows=5000 width=124) (actual time=1.464..8.834 rows=5000 loops=1)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o (cost=0.00..180.00 rows=5000 width=84) (actual time=0.010..1.980 rows=5000 loops=1)
        Filter: (status = 'completed'::text)
  ->  Hash (cost=77.00..77.00 rows=5000 width=40) (actual time=1.436..1.436 rows=5000 loops=1)
        Buckets: 8192  Batches: 1  Memory Usage: 421kB
        ->  Seq Scan on customers c (cost=0.00..77.00 rows=5000 width=40) (actual time=0.008..0.701 rows=5000 loops=1)
Planning Time: 0.321 ms
Execution Time: 9.586 ms

Performance improved from 220ms to 9.5ms!

Best Practices for Using EXPLAIN ANALYZE

  1. Test on representative data: Query plans can change dramatically based on table size and data distribution
  2. Test multiple times: The first execution may load data into cache, affecting subsequent runs
  3. Be careful in production: EXPLAIN ANALYZE actually executes the query, so:
    • Wrap INSERT, UPDATE, or DELETE statements in a transaction you roll back (BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;)
    • Avoid running very slow queries that might impact production
  4. Compare plans before and after changes: Keep records of execution plans before making changes
  5. Focus on the highest-cost operations first: Don’t optimize operations that already run quickly

Conclusion

Mastering EXPLAIN ANALYZE is an essential skill for database performance tuning. By understanding execution plans, you can identify bottlenecks, validate optimization efforts, and ensure your database operates at peak efficiency.

Remember that optimization is iterative:

  1. Identify slow queries
  2. Analyze execution plans
  3. Make targeted improvements
  4. Measure results
  5. Repeat

While EXPLAIN ANALYZE provides valuable insights, it should be paired with other PostgreSQL performance tools like pg_stat_statements, auto_explain, and proper monitoring to build a comprehensive performance optimization strategy.

What slow queries are you currently struggling with? Try using EXPLAIN ANALYZE to uncover optimization opportunities, and you might be surprised by how much performance you can unlock with the right approach.

