PostgreSQL + Node.js: How We Achieved 50K Transactions per Second

Connection Pooling, Write-Ahead Log Tuning, and Avoiding N+1 Queries

When we started building our real-time analytics platform, performance was a critical requirement. Our customers needed to process tens of thousands of events per second without losing data or introducing latency. After months of optimization, we finally reached our goal of 50,000 transactions per second using PostgreSQL and Node.js. Here’s how we did it.

The Challenge

We had a seemingly simple requirement: store user events from multiple sources and make them queryable in real-time. But at scale, this became challenging:

  • Processing 50K+ events per second
  • Maintaining sub-100ms query response times
  • Ensuring data consistency and durability
  • Running on reasonable hardware (not an infinite budget)

Starting Point: The Naive Approach

Our initial implementation was straightforward:

// Naive approach: one explicit transaction per incoming event
// (assumes an Express app and a pg connection pool created elsewhere)
app.post('/event', async (req, res) => {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('INSERT INTO events(data) VALUES($1)', [req.body]);
    await client.query('COMMIT');
    res.status(200).send('Event recorded');
  } catch (e) {
    await client.query('ROLLBACK');
    res.status(500).send('Error recording event');
  } finally {
    client.release();
  }
});

This approach quickly hit a wall at around 2,000 transactions per second. Per-event transaction overhead (an explicit BEGIN/COMMIT around every single-row insert), connection overhead, and resource contention prevented us from scaling further.
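
As a quick sanity check on numbers like these, we found it useful to measure throughput inside the application itself. The counter below is a simplified illustration rather than the instrumentation we actually ran:

// Crude throughput gauge: increment after every successful insert,
// then log and reset the count once per second.
let committed = 0;
setInterval(() => {
  console.log(`${committed} TPS`);
  committed = 0;
}, 1000);

// ...inside the /event handler, right after the INSERT succeeds:
// committed += 1;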

Optimization #1: Connection Pooling Done Right

Our first major optimization was implementing proper connection pooling with pg-pool:

const { Pool } = require('pg');
const pool = new Pool({
  max: 100, // Carefully tuned to our CPU cores
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Reuse connections efficiently
app.post('/event', async (req, res) => {
  const client = await pool.connect();
  try {
    await client.query('INSERT INTO events(data) VALUES($1)', [req.body]);
    res.status(200).send('Event recorded');
  } catch (e) {
    console.error('Error recording event:', e);
    res.status(500).send('Error recording event');
  } finally {
    client.release();
  }
});

This change alone doubled our throughput to ~4,000 TPS. We found that our optimal pool size was around 100 connections for our 16-core server, which allowed for maximum concurrency without excessive context switching.
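
We arrived at that pool size empirically. One way to see whether a pool is sized sensibly (this monitoring snippet is our illustration, not part of the production code) is to sample the counters node-postgres exposes on the pool: a persistently high waitingCount suggests the pool is too small, while a mostly idle pool suggests it is larger than it needs to be.

// Sample pg-pool's built-in counters to observe pool behaviour under load.
setInterval(() => {
  console.log({
    total: pool.totalCount,     // connections currently open
    idle: pool.idleCount,       // open connections not in use
    waiting: pool.waitingCount, // callers queued waiting for a connection
  });
}, 5000);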

Optimization #2: Batch Processing

Next, we implemented batching to reduce the per-transaction overhead:

// Queue for batching events
let eventQueue = [];
const BATCH_SIZE = 1000;
const MAX_BATCH_WAIT_MS = 50;

// Queue the event
app.post('/event', (req, res) => {
  eventQueue.push(req.body);
  res.status(202).send('Event queued');
});

// Process queue in batches
async function processBatch() {
  if (eventQueue.length === 0) return;
  
  const batch = eventQueue.splice(0, BATCH_SIZE);
  const client = await pool.connect();
  
  try {
    await client.query('BEGIN');
    
    // Build one multi-row INSERT: VALUES ($1),($2),...,($N),
    // with one JSONB payload bound to each placeholder.
    const query =
      'INSERT INTO events(data) VALUES ' +
      batch.map((_, i) => `($${i + 1})`).join(',');

    await client.query(query, batch);
    await client.query('COMMIT');
  } catch (e) {
    await client.query('ROLLBACK');
    console.error('Batch processing error:', e);
  } finally {
    client.release();
  }
}

// Process batches on interval
setInterval(processBatch, MAX_BATCH_WAIT_MS);

This approach raised our throughput to around 15,000 TPS, as we were now making far fewer round trips to the database.
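
One refinement worth sketching (hypothetical, not part of the code above): flush as soon as a full batch has accumulated instead of always waiting for the next timer tick, so bursts of traffic are not delayed by the 50 ms interval. The /event handler below replaces the earlier one; flushIfFull is a helper name we made up for this sketch.

// Flush immediately once BATCH_SIZE events have accumulated, in addition
// to the timer-driven flush above.
let flushing = false;

async function flushIfFull() {
  if (flushing || eventQueue.length < BATCH_SIZE) return;
  flushing = true;
  try {
    while (eventQueue.length >= BATCH_SIZE) {
      await processBatch();
    }
  } finally {
    flushing = false;
  }
}

app.post('/event', (req, res) => {
  eventQueue.push(req.body);
  res.status(202).send('Event queued');
  flushIfFull().catch((e) => console.error('Flush error:', e));
});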

Optimization #3: Write-Ahead Log Tuning

PostgreSQL’s Write-Ahead Log (WAL) ensures durability but can be a bottleneck. We tuned the following parameters:

# postgresql.conf
wal_level = replica
fsync = on
synchronous_commit = off
wal_writer_delay = 10ms
wal_buffers = 16MB
checkpoint_timeout = 15min
max_wal_size = 4GB

Setting synchronous_commit = off was particularly impactful: PostgreSQL acknowledges a commit before its WAL records are flushed to disk, so clients no longer wait on an fsync for every transaction. This raised our throughput to ~25,000 TPS. The trade-off is a small window of potential data loss if the server crashes (the most recently committed transactions that had not yet been flushed), though not data corruption. For our use case this was acceptable, as we had replication for disaster recovery.
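
It is also worth noting that synchronous_commit does not have to be relaxed cluster-wide: it can be set per session or per transaction, so only the event-ingestion writes give up the fsync wait while everything else stays fully durable. A minimal sketch of what that could look like from Node (the helper name is ours; this is not the configuration we shipped):

// Relax durability only for this transaction; SET LOCAL reverts automatically
// at COMMIT or ROLLBACK, so other workloads keep synchronous_commit = on.
async function insertEventsRelaxed(client, query, params) {
  await client.query('BEGIN');
  try {
    await client.query('SET LOCAL synchronous_commit TO off');
    await client.query(query, params);
    await client.query('COMMIT');
  } catch (e) {
    await client.query('ROLLBACK');
    throw e;
  }
}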

Optimization #4: Partitioning for Query Performance

Query performance started to degrade as our event table grew. We implemented table partitioning by time:

CREATE TABLE events (
  id SERIAL,
  timestamp TIMESTAMPTZ DEFAULT NOW(),
  data JSONB
) PARTITION BY RANGE (timestamp);

-- Create daily partitions
CREATE TABLE events_y2023_m09_d01 PARTITION OF events
  FOR VALUES FROM ('2023-09-01') TO ('2023-09-02');

-- Create function to manage partitions automatically
CREATE OR REPLACE FUNCTION create_partition_if_not_exists() RETURNS trigger AS
$$
BEGIN
  -- Create partition for upcoming day if it doesn't exist
  -- Implementation details omitted for brevity
END;
$$
LANGUAGE plpgsql;

This change not only improved insert performance but dramatically reduced query times, as the database could skip scanning irrelevant partitions.
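
Partition pruning only helps when queries constrain the partition key, so our read paths always filter on timestamp. A hypothetical example of such a query from the Node side (the function and its parameters are ours, for illustration):

// Because the WHERE clause bounds the partition key (timestamp), the planner
// only scans the partitions that overlap the requested day.
async function getEventsForDay(day) { // e.g. '2023-09-01'
  const result = await pool.query(
    `SELECT id, timestamp, data
       FROM events
      WHERE timestamp >= $1::date
        AND timestamp <  $1::date + INTERVAL '1 day'
      ORDER BY timestamp`,
    [day]
  );
  return result.rows;
}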

Optimization #5: Avoiding N+1 Queries

Our API was suffering from the classic N+1 query problem when fetching events with associated metadata. We refactored our queries to use JOINs or JSON aggregation:

// Before: N+1 problem
async function getEventsWithMetadata(userId) {
  const events = await pool.query('SELECT * FROM events WHERE user_id = $1', [userId]);
  
  // Problem: One query per event!
  for (let event of events.rows) {
    const metadata = await pool.query('SELECT * FROM metadata WHERE event_id = $1', [event.id]);
    event.metadata = metadata.rows;
  }
  
  return events.rows;
}

// After: single-query solution using a LEFT JOIN + JSON aggregation
async function getEventsWithMetadata(userId) {
  const result = await pool.query(`
    SELECT e.*,
           -- FILTER + COALESCE so events without metadata get [] instead of [null]
           COALESCE(json_agg(m.*) FILTER (WHERE m.event_id IS NOT NULL), '[]') AS metadata
    FROM events e
    LEFT JOIN metadata m ON e.id = m.event_id
    WHERE e.user_id = $1
    GROUP BY e.id
  `, [userId]);

  return result.rows;
}

This change had a dramatic impact on our API response times, reducing them from seconds to milliseconds for complex queries.

Optimization #6: Hardware and OS Tuning

Finally, we optimized our Linux server configuration:

# /etc/sysctl.conf
fs.file-max = 100000
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 4096
net.ipv4.ip_local_port_range = 10000 65535

We also adjusted our PostgreSQL configuration to match our hardware:

# postgresql.conf
max_connections = 200
shared_buffers = 8GB              # 25% of RAM
effective_cache_size = 24GB       # 75% of RAM
maintenance_work_mem = 1GB
work_mem = 50MB

The Results

After all these optimizations, we finally achieved our target of 50,000 transactions per second on a single PostgreSQL instance with Node.js. The system remained stable under this load, with acceptable latency and reliability.

Here’s a summary of the impact of each optimization:

Optimization              Transactions Per Second
Initial Implementation    ~2,000 TPS
Connection Pooling        ~4,000 TPS
Batch Processing          ~15,000 TPS
WAL Tuning                ~25,000 TPS
Partitioning              ~35,000 TPS
Hardware/OS Tuning        ~50,000 TPS

Lessons Learned

  1. Connection management is critical: PostgreSQL has significant per-connection overhead. Keep connections alive and reuse them.
  2. Batching is your friend: Always batch multiple operations when possible to amortize network and transaction overhead.
  3. Understand your durability trade-offs: By relaxing synchronous_commit, we traded a small window of durability for throughput, which made sense for our use case.
  4. Partitioning pays dividends: Table partitioning not only improves write performance but dramatically speeds up queries.
  5. Avoid N+1 queries at all costs: They are silent performance killers. Use JOINs, JSON aggregation, or dataloader patterns (see the sketch after this list).
  6. Hardware matters: SSD storage, sufficient RAM, and proper OS tuning can make or break your performance.
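
For call sites where rewriting the SQL into one JOIN is not practical, the dataloader pattern mentioned in point 5 batches the per-event lookups automatically. A minimal sketch using the dataloader npm package against the metadata table from the example above (the loader itself is our illustration, not code from the original system):

const DataLoader = require('dataloader');

// Collects all metadata lookups made in the same tick into one query.
const metadataLoader = new DataLoader(async (eventIds) => {
  const result = await pool.query(
    'SELECT * FROM metadata WHERE event_id = ANY($1)',
    [eventIds]
  );
  // DataLoader expects results in the same order as the requested keys.
  const byEventId = new Map(eventIds.map((id) => [id, []]));
  for (const row of result.rows) {
    byEventId.get(row.event_id).push(row);
  }
  return eventIds.map((id) => byEventId.get(id));
});

// Usage: reads like N separate calls, executes as one query per batch.
// event.metadata = await metadataLoader.load(event.id);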

Conclusion

Achieving 50,000 transactions per second with PostgreSQL and Node.js required a holistic approach to optimization. No single change gave us the performance we needed; instead, it was the cumulative effect of many optimizations working together.

While your specific requirements might differ, these principles should help you get the most out of PostgreSQL in high-throughput applications. Remember that premature optimization is the root of all evil – measure first, then optimize where it matters.

Would you like to learn more about specific aspects of our optimization journey? Let us know in the comments below!

Read More at Medium https://medium.com/@rizqimulkisrc/postgresql-node-js-how-we-achieved-50k-transactions-per-second-ebf11c1768e4

