When our user base doubled in six months, our once-speedy MySQL database began to show signs of strain. Reports that had generated in seconds were taking minutes to complete, and our customer dashboard was becoming frustratingly slow. This post describes how we diagnosed the performance bottlenecks and implemented targeted optimizations that ultimately reduced our query times by over 90%.
The Initial Problem
Our application, a B2B analytics platform used by enterprise clients, relies heavily on MySQL to store and process transaction data. As our data grew to over 50 million rows in the main transactions table, we started receiving complaints about slow dashboard loading times and report generation.
Initial measurements confirmed the severity of the problem:
| Query Type | Before Optimization | SLA Target |
|---|---|---|
| Dashboard Summary | 8.2 seconds | < 1 second |
| Monthly Report | 45+ seconds | < 5 seconds |
| Search by Transaction ID | 3.7 seconds | < 0.5 seconds |
Our backend team was tasked with improving performance without making major architectural changes, as we needed a solution we could implement quickly.
Diagnosing the Issues
Step 1: Query Profiling
We began by identifying the slowest queries in our production environment using MySQL’s performance schema:
SELECT
    format_pico_time(SUM_TIMER_WAIT) AS total_latency,
    format_pico_time(AVG_TIMER_WAIT) AS avg_latency,
    COUNT_STAR AS exec_count,
    DIGEST_TEXT AS query_text
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
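On MySQL 8.0 the sys schema exposes the same digest data in a friendlier form. A roughly equivalent query against the raw sys view (assuming the default sys schema is installed; format_pico_time requires 8.0.16+) would be:
SELECT
    format_pico_time(total_latency) AS total_latency_pretty,
    format_pico_time(avg_latency) AS avg_latency_pretty,
    exec_count,
    query
FROM sys.x$statement_analysis
ORDER BY avg_latency DESC
LIMIT 10;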
This revealed that our dashboard query was particularly problematic:
SELECT
c.customer_name,
SUM(t.amount) as total_amount,
COUNT(t.id) as transaction_count,
AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC;
Step 2: EXPLAIN Analysis
We used MySQL’s EXPLAIN command to understand how the query was being executed:
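For reference, the vertical output below comes from running EXPLAIN against the dashboard query with the \G terminator in the mysql client:
EXPLAIN
SELECT
    c.customer_name,
    SUM(t.amount) as total_amount,
    COUNT(t.id) as transaction_count,
    AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC\G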
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ALL
possible_keys: customer_id_idx,transaction_date_idx
key: NULL
key_len: NULL
ref: NULL
rows: 32145678
filtered: 11.11
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: myapp.t.customer_id
rows: 1
filtered: 100.00
Extra: NULL
This revealed several issues:
- The query was performing a full table scan on the transactions table
- It wasn’t using any of the available indexes effectively
- It was using a temporary table and filesort operation for grouping and sorting
- It was processing over 32 million rows
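A quick way to double-check a full scan at runtime, independent of EXPLAIN, is to compare the session handler counters before and after running the query; a minimal sketch using standard status variables:
FLUSH STATUS;  -- reset session status counters
-- ... run the dashboard query here ...
SHOW SESSION STATUS LIKE 'Handler_read%';
-- A Handler_read_rnd_next value in the tens of millions, far above the number of
-- rows returned, confirms MySQL is walking the table sequentially.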
Step 3: Database Structure Analysis
We also examined our table structures and noticed suboptimal indexing strategies:
SHOW CREATE TABLE transactions;
CREATE TABLE `transactions` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) NOT NULL,
`amount` decimal(10,2) NOT NULL,
`transaction_date` date NOT NULL,
`payment_method` varchar(50) DEFAULT NULL,
`status` varchar(20) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `customer_id_idx` (`customer_id`),
KEY `transaction_date_idx` (`transaction_date`),
KEY `status_idx` (`status`)
) ENGINE=InnoDB;
Individual indexes on customer_id and transaction_date weren’t helping for queries that filtered on both columns.
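Index cardinality and usage can be inspected directly before deciding what to add or drop; a quick check (the unused-index view assumes the default sys schema, available since MySQL 5.7):
SHOW INDEX FROM transactions;

-- Indexes that have not been used since the server last started
SELECT * FROM sys.schema_unused_indexes
WHERE object_name = 'transactions';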
Our Optimization Strategy
Based on our findings, we developed a comprehensive optimization strategy:
1. Index Optimization
We created composite indexes that matched our most common query patterns:
-- Add composite index for customer + date range queries
ALTER TABLE transactions
ADD INDEX customer_date_idx (customer_id, transaction_date);
-- Add index for amount-based sorting
ALTER TABLE transactions
ADD INDEX amount_idx (amount);
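On a table of this size, index builds are best done without blocking writes. An equivalent form of the first ALTER with explicit InnoDB online-DDL options, followed by a statistics refresh so the optimizer costs the new index correctly, looks like this (a sketch, not our exact rollout commands):
ALTER TABLE transactions
    ADD INDEX customer_date_idx (customer_id, transaction_date),
    ALGORITHM=INPLACE, LOCK=NONE;

-- Refresh index statistics after the build
ANALYZE TABLE transactions;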
2. Query Rewriting
We rewrote our problematic dashboard query:
-- Before optimization
SELECT
c.customer_name,
SUM(t.amount) as total_amount,
COUNT(t.id) as transaction_count,
AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC;
-- After optimization
SELECT
c.customer_name,
SUM(t.amount) as total_amount,
COUNT(t.id) as transaction_count,
AVG(t.amount) as average_transaction
FROM transactions t
FORCE INDEX (customer_date_idx)
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.id, c.customer_name
ORDER BY total_amount DESC
LIMIT 100;
Key changes:
- Added FORCE INDEX to ensure the optimizer uses our new composite index
- Modified GROUP BY to include the primary key
- Added LIMIT clause to restrict results to only what’s needed
- Ensured consistent ordering of columns in WHERE clause and index definition
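To confirm that a rewrite changes the actual execution plan and not just the SQL text, MySQL 8.0.18+ can time each plan node with EXPLAIN ANALYZE (output format varies by version; shown here as a sketch against the optimized query):
EXPLAIN ANALYZE
SELECT
    c.customer_name,
    SUM(t.amount) as total_amount,
    COUNT(t.id) as transaction_count,
    AVG(t.amount) as average_transaction
FROM transactions t FORCE INDEX (customer_date_idx)
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.id, c.customer_name
ORDER BY total_amount DESC
LIMIT 100;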
3. Table Partitioning
For the transactions table, we implemented partitioning by date range to improve query performance on historical data:
-- The partitioning key must appear in every unique key on the table,
-- so the primary key is extended to include transaction_date first
ALTER TABLE transactions
DROP PRIMARY KEY,
ADD PRIMARY KEY (id, transaction_date);

-- Restructure the table with partitioning
ALTER TABLE transactions
PARTITION BY RANGE (YEAR(transaction_date) * 100 + MONTH(transaction_date)) (
PARTITION p_2023_01 VALUES LESS THAN (202302),
PARTITION p_2023_02 VALUES LESS THAN (202303),
-- Additional partitions for each month
PARTITION p_2024_03 VALUES LESS THAN (202404),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
This allowed MySQL to scan only relevant partitions for date-range queries.
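Partition pruning and partition growth both need periodic attention. The sketch below shows how to see the row distribution per partition and how to split a new month out of the catch-all partition (the April 2024 partition is an illustrative example):
-- Row distribution across partitions
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'transactions';

-- Add the next monthly partition by splitting p_future
ALTER TABLE transactions
REORGANIZE PARTITION p_future INTO (
PARTITION p_2024_04 VALUES LESS THAN (202405),
PARTITION p_future VALUES LESS THAN MAXVALUE
);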
4. Database Configuration Tuning
We optimized the MySQL configuration based on our workload:
# Increased InnoDB buffer pool for better caching
innodb_buffer_pool_size = 12G
# Improved sorting performance
sort_buffer_size = 8M
read_rnd_buffer_size = 8M
# Increased temp table size in memory to reduce disk writes
tmp_table_size = 64M
max_heap_table_size = 64M
# Query cache (only applicable on MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0)
query_cache_type = 1
query_cache_size = 128M
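After applying the new settings, the buffer pool sizing can be sanity-checked from the server status counters (standard InnoDB status variables):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- A low Innodb_buffer_pool_reads (reads from disk) relative to
-- Innodb_buffer_pool_read_requests (logical reads) means the working set fits in memory.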
5. Implementing Materialized Views
For complex reporting queries that were still slow, we created materialized views (implemented as tables that are periodically refreshed):
-- Create a summary table for dashboard data
CREATE TABLE dashboard_summary (
customer_id INT NOT NULL,
customer_name VARCHAR(255) NOT NULL,
report_date DATE NOT NULL,
total_amount DECIMAL(15,2) NOT NULL,
transaction_count INT NOT NULL,
average_transaction DECIMAL(10,2) NOT NULL,
PRIMARY KEY (customer_id, report_date)
);
-- Procedure to refresh the summary
DELIMITER $$
CREATE PROCEDURE refresh_dashboard_summary(in_date DATE)
BEGIN
DELETE FROM dashboard_summary WHERE report_date = in_date;
INSERT INTO dashboard_summary
SELECT
c.id as customer_id,
c.customer_name,
in_date as report_date,
SUM(t.amount) as total_amount,
COUNT(t.id) as transaction_count,
AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date = in_date
GROUP BY c.id, c.customer_name;
END$$
DELIMITER ;
-- Create event to refresh daily
CREATE EVENT refresh_dashboard_summary_daily
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 HOUR
DO
CALL refresh_dashboard_summary(CURRENT_DATE - INTERVAL 1 DAY);
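With the summary table and the refresh event in place (the event scheduler must be enabled, i.e. event_scheduler = ON), the dashboard no longer has to touch the transactions table at all. A hypothetical dashboard query against dashboard_summary for the same quarter would look something like this:
SELECT
    customer_name,
    SUM(total_amount) AS period_total,
    SUM(transaction_count) AS period_transactions,
    SUM(total_amount) / SUM(transaction_count) AS period_avg_transaction
FROM dashboard_summary
WHERE report_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY customer_id, customer_name
ORDER BY period_total DESC
LIMIT 100;
Note that the period average is recomputed from the sums rather than averaging the stored daily averages, which would weight days incorrectly.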
Results: 90%+ Reduction in Query Time
After implementing these optimizations, we saw dramatic improvements in performance:
| Query Type | Before | After | Improvement |
|---|---|---|---|
| Dashboard Summary | 8.2s | 0.3s | 96.3% reduction |
| Monthly Report | 45s | 3.8s | 91.6% reduction |
| Search by Transaction ID | 3.7s | 0.08s | 97.8% reduction |
The EXPLAIN output for our optimized dashboard query showed much more efficient execution:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: p_2024_01,p_2024_02,p_2024_03
type: range
possible_keys: customer_date_idx
key: customer_date_idx
key_len: 8
ref: NULL
rows: 1842
filtered: 100.00
Extra: Using index condition
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: myapp.t.customer_id
rows: 1
filtered: 100.00
Extra: NULL
Key improvements:
- The query now scans only relevant partitions
- It uses the composite index effectively
- Row examination reduced from 32 million to under 2,000
- No more temporary tables or filesort operations
Lessons Learned
1. Index Strategically, Not Just More
Rather than adding numerous single-column indexes, focus on creating composite indexes that match your query patterns. Pay attention to column order in the index definition.
2. Always EXPLAIN Your Queries
Make EXPLAIN a standard part of your development process. It reveals how MySQL is executing your queries and helps identify inefficiencies.
3. Use the Database Engine’s Features
Features like partitioning and materialized views are powerful tools for improving performance. Don’t be afraid to use advanced MySQL features when appropriate.
4. Monitor and Iterate
Set up continuous monitoring of query performance using tools like MySQL Performance Schema or Prometheus with mysqld_exporter. This allows you to identify regressions quickly.
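As a lightweight example, a Performance Schema query like the one below can run on a schedule and flag any statement digest whose average latency has crept above a threshold (the 1-second cutoff is arbitrary and would be tuned per SLA):
SELECT
    format_pico_time(AVG_TIMER_WAIT) AS avg_latency,
    COUNT_STAR AS exec_count,
    DIGEST_TEXT
FROM performance_schema.events_statements_summary_by_digest
WHERE AVG_TIMER_WAIT > 1000000000000  -- 1 second, expressed in picoseconds
ORDER BY AVG_TIMER_WAIT DESC;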
5. Consider Read/Write Patterns
For our reporting use case, it made sense to optimize for read performance even if it meant slightly slower writes. Understanding your application’s read/write patterns is crucial for effective optimization.
Conclusion
By methodically diagnosing and addressing our MySQL performance issues, we achieved our goal of 90%+ reduction in query times without major architectural changes. The key was addressing multiple aspects of database performance: indexing strategy, query structure, database configuration, and leveraging MySQL features like partitioning.
For other teams facing similar challenges, we recommend starting with thorough diagnostics using EXPLAIN and Performance Schema before making changes. Often, the most significant improvements come from targeted optimizations rather than hardware upgrades or complete rewrites.
Our next step is implementing a horizontal scaling strategy with read replicas to further improve performance as we continue to grow, but these optimizations have bought us significant time and dramatically improved our user experience in the meantime.