Introduction
Scaling a database system to handle more than one million transactions per second (TPS) presents significant challenges even for the most seasoned database architects. This case study examines how our team at GlobalTech Financial Services transformed our MariaDB implementation from a standard setup struggling with 10,000 TPS to a high-performance system capable of processing over 1 million transactions per second.
Background
The Business Challenge
GlobalTech Financial Services operates a payment processing platform that experienced exponential growth over 18 months, with our customer base expanding from regional markets to global operations. This growth brought a dramatic increase in transaction volume, threatening to overwhelm our existing database infrastructure.
Initial System Metrics:
- Average throughput: ~8,000 TPS
- Peak loads: ~12,000 TPS
- 99th percentile response time: 850ms (well above our 200ms SLA)
- Database size: 4TB
- Hardware: 8-node cluster with 16 cores and 128GB RAM per node
The Technical Challenge
Our analysis identified several bottlenecks:
- Connection management: High connection overhead with thousands of ephemeral connections
- Query efficiency: Suboptimal query patterns causing excessive disk I/O
- Schema design: Normalized schema requiring multiple joins for common operations
- Resource contention: Locks and waits causing processing delays
- Hardware limitations: Storage I/O becoming a significant bottleneck
The Solution Architecture
Phase 1: Foundation Optimization
Before making radical changes, we optimized the existing setup:
Connection Pooling Implementation
We implemented ProxySQL as a connection pooling layer:
# Sample ProxySQL configuration excerpt
mysql_servers =
(
    {
        address="mariadb-node1.internal"
        port=3306
        hostgroup=0
        max_connections=2000
        max_replication_lag=5
    },
    # Additional nodes configuration...
)
mysql_users =
(
    {
        username="app_user"
        password="****"
        default_hostgroup=0
        max_connections=1000
        transaction_persistent=1
    }
)
This dramatically reduced connection overhead: application servers held persistent connections to ProxySQL, which multiplexed them onto a much smaller pool of long-lived backend connections to MariaDB.
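To verify that the pooling layer was actually multiplexing traffic, ProxySQL's admin interface exposes per-backend pool statistics; a query along these lines (run against the admin port, typically 6032) shows connection reuse per hostgroup:
-- Connection reuse per backend, queried via the ProxySQL admin interface
SELECT hostgroup, srv_host, srv_port,
       ConnUsed, ConnFree, ConnOK, ConnERR, Queries
FROM stats.stats_mysql_connection_pool
ORDER BY hostgroup, srv_host;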
Query Optimization
We analyzed and optimized the top 20 most resource-intensive queries:
-- Before optimization
SELECT c.customer_id, c.name, c.email,
o.order_id, o.amount, o.status,
p.payment_method, p.transaction_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN payments p ON o.order_id = p.order_id
WHERE o.status = 'PROCESSING'
ORDER BY o.created_at DESC;
-- After optimization
SELECT c.customer_id, c.name, c.email,
o.order_id, o.amount, o.status,
p.payment_method, p.transaction_id
FROM orders o
FORCE INDEX (idx_status_created)
JOIN customers c ON o.customer_id = c.customer_id
JOIN payments p ON o.order_id = p.order_id
WHERE o.status = 'PROCESSING'
ORDER BY o.created_at DESC
LIMIT 1000;
Key optimizations included:
- Changing join order to start with the most filtered table
- Adding composite indexes to support common query patterns (see the sketch after this list)
- Using FORCE INDEX where necessary
- Implementing LIMIT clauses to prevent runaway result sets
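The idx_status_created index forced in the optimized query is not shown in the DDL above; based on the WHERE and ORDER BY clauses it supports, it would look roughly like this:
-- Composite index assumed from the status filter and created_at ordering
ALTER TABLE orders
ADD INDEX idx_status_created (status, created_at);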
Initial Schema Optimization
We implemented strategic denormalization for hot tables:
-- Adding frequently accessed columns directly to orders table
ALTER TABLE orders
ADD COLUMN customer_name VARCHAR(255) AFTER customer_id,
ADD COLUMN customer_email VARCHAR(255) AFTER customer_name;
-- Creating triggers to maintain data consistency
DELIMITER //
CREATE TRIGGER after_customer_update
AFTER UPDATE ON customers
FOR EACH ROW
BEGIN
IF OLD.name != NEW.name OR OLD.email != NEW.email THEN
UPDATE orders
SET customer_name = NEW.name, customer_email = NEW.email
WHERE customer_id = NEW.customer_id;
END IF;
END//
DELIMITER ;
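The trigger only keeps future changes in sync; rows that existed before the denormalization also need their new columns populated. A minimal sketch of that one-time backfill (in practice it would run in bounded batches):
-- One-time backfill of the denormalized columns for existing orders
UPDATE orders o
JOIN customers c ON c.customer_id = o.customer_id
SET o.customer_name = c.name,
    o.customer_email = c.email
WHERE o.customer_name IS NULL;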
Phase 2: Horizontal Scaling Implementation
With the foundation optimized, we moved to a horizontal scaling strategy:
Sharding Strategy
We implemented a hybrid sharding approach:
- Customer-based sharding for account data
- Transaction ID-based sharding for transactional data
Our sharding function hashed each key (customer ID or transaction ID) into one of 128 logical shards; logical shards were then mapped to physical nodes via a shard map, so data could be rebalanced without rehashing keys:
// Map a sharding key (customer ID or transaction ID) to one of 128 logical shards
function getShardId($key, $shardCount = 128) {
    // crc32() can return a negative value on 32-bit platforms, hence the abs()
    $crc = crc32((string) $key);
    $shardId = abs($crc % $shardCount);
    return $shardId;
}
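MariaDB's CRC32() uses the same CRC-32 polynomial as PHP's crc32(), so shard assignments can be cross-checked from SQL, assuming the key is hashed as the same plain string on both sides (the key below is just an example):
-- Cross-check the application-side shard calculation for a sample key
SELECT CRC32('1000042') % 128 AS shard_id;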
Schema for Sharded Environment
We created a shard management database and modified application logic to route queries:
CREATE TABLE shard_map (
shard_id INT NOT NULL,
db_host VARCHAR(255) NOT NULL,
db_port INT NOT NULL DEFAULT 3306,
db_name VARCHAR(64) NOT NULL,
shard_status ENUM('ACTIVE','MAINTENANCE','REBALANCING','INACTIVE') NOT NULL,
weight INT NOT NULL DEFAULT 100,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (shard_id)
);
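With this mapping in place, routing a request comes down to looking up the logical shard returned by getShardId(); in essence (application-side caching of the map is omitted):
-- Resolve the physical database for a given logical shard
SELECT db_host, db_port, db_name
FROM shard_map
WHERE shard_id = 57          -- value returned by getShardId()
  AND shard_status = 'ACTIVE';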
Data Distribution and Rebalancing
For initial data distribution, we developed a custom migration tool that:
- Created shard schemas on target servers
- Transferred data in batches based on sharding keys (sketched after this list)
- Verified data integrity post-migration
- Switched traffic gradually to the new sharded system
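The batch-transfer step can be illustrated in plain SQL. This is a simplified sketch, not the tool itself, which moved rows between servers; here both schemas are assumed reachable from one connection, and the shard number, key range, and schema names are placeholders:
-- Copy one batch of orders belonging to logical shard 57 to its target schema
INSERT INTO shard_57.orders
SELECT *
FROM legacy.orders
WHERE CRC32(customer_id) % 128 = 57      -- rows owned by this logical shard
  AND order_id BETWEEN 1 AND 100000;     -- batch boundary advanced by the tool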
Phase 3: Advanced Optimizations
With our sharded architecture in place, we implemented advanced optimizations:
Columnar Storage for Analytics
For reporting workloads, we implemented ColumnStore engine:
CREATE TABLE transaction_analytics (
transaction_id BIGINT NOT NULL,
customer_id BIGINT NOT NULL,
amount DECIMAL(20,2) NOT NULL,
currency VARCHAR(3) NOT NULL,
transaction_type VARCHAR(20) NOT NULL,
status VARCHAR(15) NOT NULL,
processing_time_ms INT NOT NULL,
transaction_date DATE NOT NULL,
created_at TIMESTAMP NOT NULL
) ENGINE=ColumnStore;
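Reporting queries against this table aggregate a handful of columns over large date ranges, which is exactly where the columnar layout pays off. A representative example (the status value is an assumption):
-- Daily volume and processing-time summary per currency, last 30 days
SELECT transaction_date,
       currency,
       COUNT(*)                AS txn_count,
       SUM(amount)             AS total_amount,
       AVG(processing_time_ms) AS avg_processing_ms
FROM transaction_analytics
WHERE transaction_date >= CURDATE() - INTERVAL 30 DAY
  AND status = 'COMPLETED'
GROUP BY transaction_date, currency
ORDER BY transaction_date, currency;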
Memory-Optimized Tables
For high-velocity data, we used memory tables with automated archiving:
CREATE TABLE active_sessions (
session_id VARCHAR(64) NOT NULL,
user_id BIGINT NOT NULL,
start_time TIMESTAMP NOT NULL,
last_activity TIMESTAMP NOT NULL,
-- MEMORY tables cannot store TEXT/JSON columns, so session data is kept in a bounded VARCHAR
session_data VARCHAR(2048),
PRIMARY KEY (session_id),
KEY (user_id),
-- BTREE so the archiving job can range-scan on idle time (MEMORY indexes default to HASH)
KEY (last_activity) USING BTREE
) ENGINE=MEMORY;
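One way to implement the automated archiving is a scheduled job; this is a minimal sketch using the MariaDB event scheduler and a hypothetical InnoDB table archived_sessions with the same columns (batching and retry handling omitted):
-- Requires event_scheduler=ON
DELIMITER //
CREATE EVENT archive_stale_sessions
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
-- Move sessions idle for more than 30 minutes out of the MEMORY table
INSERT INTO archived_sessions
SELECT * FROM active_sessions
WHERE last_activity < NOW() - INTERVAL 30 MINUTE;
DELETE FROM active_sessions
WHERE last_activity < NOW() - INTERVAL 30 MINUTE;
END//
DELIMITER ;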
Custom MariaDB Server Configuration
We fine-tuned MariaDB configuration for high throughput:
# Buffer settings
innodb_buffer_pool_size = 96G
innodb_log_buffer_size = 256M
# I/O optimization
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 20000
innodb_io_capacity_max = 40000
# Concurrency settings
innodb_thread_concurrency = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 16
max_connections = 5000
# Transaction isolation
transaction_isolation = READ-COMMITTED
# Table settings
table_open_cache = 16384
table_definition_cache = 8192
# Thread pooling
thread_handling = pool-of-threads
thread_pool_size = 32
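With thread_handling = pool-of-threads enabled, MariaDB exposes thread-pool status counters that make it easy to confirm the pool size is adequate, for example:
-- Worker and idle thread counts for the thread pool
SHOW GLOBAL STATUS LIKE 'Threadpool%';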
Hardware Optimization
We moved to an NVMe-based storage infrastructure:
- NVMe SSDs in RAID 10 configuration
- Separate volumes for data, logs, and temporary files
- Direct I/O enabled to bypass OS caching
Implementation Results
Performance Metrics After Optimization
After completing all three phases, our system achieved:
- Sustained throughput: 1.2M TPS
- Peak capacity: 1.8M TPS
- 99th percentile response time: 45ms
- Database size: 18TB distributed across shards
- Hardware: 32-node cluster with 32 cores and 256GB RAM per node
Scalability Tests
We conducted stress tests to verify scalability:
Concurrent Users | Transactions per Second | Avg. Response Time | 99th Percentile Response Time
---|---|---|---
100,000 | 350,000 | 18ms | 35ms
250,000 | 820,000 | 22ms | 42ms
500,000 | 1,250,000 | 28ms | 45ms
750,000 | 1,650,000 | 35ms | 52ms
1,000,000 | 1,820,000 | 38ms | 65ms
Cost-Benefit Analysis
The optimization project required:
- 8 months of engineering effort
- $1.2M in hardware investments
- 20% increase in monthly infrastructure costs
Benefits realized:
- ~95% reduction in 99th percentile response time (850ms down to 45ms)
- 150x increase in throughput capacity
- Supported 10x customer growth without performance degradation
- Eliminated performance-related customer complaints
- Extended hardware lifecycle by 2 years due to more efficient resource utilization
Key Lessons Learned
Technical Insights
- Connection management matters: Connection pooling provided immediate relief without schema changes.
- Sharding complexity: Choosing the right sharding key is critical; our initial attempt using transaction date caused hot spots and required reworking.
- Query patterns dictate indexes: Understanding query patterns is more important than generalized indexing strategies.
- Hardware isn’t always the answer: Software optimizations provided 70% of our gains; hardware accelerated already-optimized software.
- Monitoring is essential: Comprehensive monitoring allowed us to identify issues before they impacted customers:
-- Example of monitoring query we run every minute
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(host, ':', 1), '[', 1) AS client_ip,
COUNT(*) AS connection_count,
SUM(IF(command != 'Sleep', 1, 0)) AS active_count,
AVG(IF(command != 'Sleep', time, NULL)) AS avg_active_time
FROM information_schema.processlist
GROUP BY client_ip
ORDER BY active_count DESC;
Organizational Learnings
- DevOps collaboration: Cross-functional teams of database administrators and application developers produced solutions that addressed both database and application issues.
- Incremental migration: Our phased approach allowed business continuity while improving performance.
- Documentation matters: Detailed documentation of database design decisions streamlined onboarding and troubleshooting.
Conclusion
Scaling MariaDB to handle 1M+ transactions per second required a multifaceted approach combining:
- Foundation optimization of the existing system
- Horizontal scaling through strategic sharding
- Advanced optimizations at both hardware and software levels
- Continuous monitoring and performance tuning
The key to success was not just technical implementation but the methodical, phased approach that validated improvements at each step before moving forward. While not every organization requires this level of throughput, the principles and patterns we applied are valuable for any team looking to scale their database infrastructure efficiently.
Our journey demonstrates that MariaDB can indeed handle extreme transaction volumes when properly architected and optimized, making it a viable alternative to more complex and costly database solutions for high-throughput scenarios.