How We Reduced Query Time by 90% in Our MySQL-Powered App

When our user base doubled in six months, our once-speedy MySQL database began to show signs of strain. Reports that previously generated in seconds were taking minutes to complete, and our customer dashboard was becoming frustratingly slow. This post shares our journey of diagnosing performance bottlenecks and implementing targeted optimizations that ultimately reduced our query times by over 90%.

The Initial Problem

Our application, a B2B analytics platform used by enterprise clients, relies heavily on MySQL to store and process transaction data. As our data grew to over 50 million rows in the main transactions table, we started receiving complaints about slow dashboard loading times and report generation.

Initial measurements confirmed the severity of the problem:

Query Type                  Before Optimization    SLA Target
Dashboard Summary           8.2 seconds            < 1 second
Monthly Report              45+ seconds            < 5 seconds
Search by Transaction ID    3.7 seconds            < 0.5 seconds

Our backend team was tasked with improving performance without making major architectural changes, as we needed a solution we could implement quickly.

Diagnosing the Issues

Step 1: Query Profiling

We began by identifying the slowest queries in our production environment using MySQL’s performance schema:

-- Requires MySQL 8.0.16+ for FORMAT_PICO_TIME(); older versions can use sys.format_time()
SELECT
  format_pico_time(SUM_TIMER_WAIT) as total_latency,
  format_pico_time(AVG_TIMER_WAIT) as avg_latency,
  COUNT_STAR as exec_count,
  DIGEST_TEXT as query_text
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;

This revealed that our dashboard query was particularly problematic:

SELECT 
  c.customer_name,
  SUM(t.amount) as total_amount,
  COUNT(t.id) as transaction_count,
  AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC;

Step 2: EXPLAIN Analysis

We used MySQL’s EXPLAIN command to understand how the query was being executed:
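
In the mysql client, that means prefixing the query with EXPLAIN and terminating it with \G, which produces the vertical layout reproduced below:

EXPLAIN
SELECT 
  c.customer_name,
  SUM(t.amount) as total_amount,
  COUNT(t.id) as transaction_count,
  AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC\G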

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: ALL
possible_keys: customer_id_idx,transaction_date_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 32145678
     filtered: 11.11
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: myapp.t.customer_id
         rows: 1
     filtered: 100.00
        Extra: NULL

This revealed several issues:

  1. The query was performing a full table scan on the transactions table
  2. It wasn’t using any of the available indexes effectively
  3. It was using a temporary table and filesort operation for grouping and sorting
  4. It was processing over 32 million rows

Step 3: Database Structure Analysis

We also examined our table structures and noticed suboptimal indexing strategies:

SHOW CREATE TABLE transactions;
CREATE TABLE `transactions` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `amount` decimal(10,2) NOT NULL,
  `transaction_date` date NOT NULL,
  `payment_method` varchar(50) DEFAULT NULL,
  `status` varchar(20) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `customer_id_idx` (`customer_id`),
  KEY `transaction_date_idx` (`transaction_date`),
  KEY `status_idx` (`status`)
) ENGINE=InnoDB;

The individual indexes on customer_id and transaction_date could each serve a single predicate, but none of them matched our most common pattern: filtering by a date range while joining and grouping on customer.
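
A quick way to audit index sprawl like this, assuming the sys schema is installed (it ships by default with MySQL 5.7+), is to list indexes that have had no recorded reads since the server started:

-- Indexes with no recorded use since startup ('myapp' is our schema; adjust as needed)
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'myapp';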

Our Optimization Strategy

Based on our findings, we developed a comprehensive optimization strategy:

1. Index Optimization

We created composite indexes that matched our most common query patterns:

-- Add composite index for customer + date range queries
ALTER TABLE transactions 
ADD INDEX customer_date_idx (customer_id, transaction_date);

-- Add index for amount-based sorting
ALTER TABLE transactions 
ADD INDEX amount_idx (amount);
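
One operational note: on a 50-million-row table you generally want these builds to run online. InnoDB supports in-place index creation (MySQL 5.6+), and spelling out that expectation makes the DDL fail fast instead of falling back to a write-blocking table copy. A sketch of the same statement with explicit online-DDL clauses:

-- Request an online, non-locking index build; MySQL errors out rather than blocking writes
ALTER TABLE transactions 
ADD INDEX customer_date_idx (customer_id, transaction_date),
ALGORITHM=INPLACE, LOCK=NONE;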

2. Query Rewriting

We rewrote our problematic dashboard query:

-- Before optimization
SELECT 
  c.customer_name,
  SUM(t.amount) as total_amount,
  COUNT(t.id) as transaction_count,
  AVG(t.amount) as average_transaction
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.customer_name
ORDER BY total_amount DESC;

-- After optimization
SELECT 
  c.customer_name,
  SUM(t.amount) as total_amount,
  COUNT(t.id) as transaction_count,
  AVG(t.amount) as average_transaction
FROM transactions t
FORCE INDEX (customer_date_idx)
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.id, c.customer_name
ORDER BY total_amount DESC
LIMIT 100;

Key changes:

  • Added FORCE INDEX to ensure the optimizer uses our new composite index
  • Modified GROUP BY to include the primary key
  • Added LIMIT clause to restrict results to only what’s needed
  • Kept the columns used for joining and filtering in the same order as the composite index definition (customer first, then date range)

3. Table Partitioning

For the transactions table, we implemented partitioning by date range to improve query performance on historical data:

-- The partitioning key must be part of every unique key, including the
-- primary key, so we first widened the primary key
ALTER TABLE transactions
DROP PRIMARY KEY,
ADD PRIMARY KEY (id, transaction_date);

-- Restructure the table with monthly partitions
ALTER TABLE transactions
PARTITION BY RANGE (YEAR(transaction_date) * 100 + MONTH(transaction_date)) (
    PARTITION p_2023_01 VALUES LESS THAN (202302),
    PARTITION p_2023_02 VALUES LESS THAN (202303),
    -- Additional partitions for each month
    PARTITION p_2024_03 VALUES LESS THAN (202404),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

This allowed MySQL to scan only relevant partitions for date-range queries.
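
Whether pruning actually kicks in for a given query is easy to check: EXPLAIN includes a partitions column listing exactly which partitions will be read (you can see it in the optimized plan later in this post). A quick sanity check might look like this:

-- Sketch: see which partitions MySQL plans to touch for a Q1 2024 range
EXPLAIN
SELECT COUNT(*)
FROM transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-03-31'\G
-- The partitions field of the output lists the partitions that will be scanned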

4. Database Configuration Tuning

We optimized the MySQL configuration based on our workload:

# Increased InnoDB buffer pool for better caching
innodb_buffer_pool_size = 12G

# Improved sorting performance
sort_buffer_size = 8M
read_rnd_buffer_size = 8M

# Increased temp table size in memory to reduce disk writes
tmp_table_size = 64M
max_heap_table_size = 64M

# (No query cache settings: the query cache is deprecated in MySQL 5.7 and removed in 8.0)

5. Implementing Materialized Views

For complex reporting queries that were still slow, we created materialized views (implemented as tables that are periodically refreshed):

-- Create a summary table for dashboard data
CREATE TABLE dashboard_summary (
  customer_id INT NOT NULL,
  customer_name VARCHAR(255) NOT NULL,
  report_date DATE NOT NULL,
  total_amount DECIMAL(15,2) NOT NULL,
  transaction_count INT NOT NULL,
  average_transaction DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (customer_id, report_date)
);

-- Procedure to refresh the summary
DELIMITER $$
CREATE PROCEDURE refresh_dashboard_summary(in_date DATE)
BEGIN
  DELETE FROM dashboard_summary WHERE report_date = in_date;
  
  INSERT INTO dashboard_summary
  SELECT 
    c.id as customer_id,
    c.customer_name,
    in_date as report_date,
    SUM(t.amount) as total_amount,
    COUNT(t.id) as transaction_count,
    AVG(t.amount) as average_transaction
  FROM transactions t
  JOIN customers c ON t.customer_id = c.id
  WHERE t.transaction_date = in_date
  GROUP BY c.id, c.customer_name;
END$$
DELIMITER ;

-- Create event to refresh daily (the event scheduler must be enabled: SET GLOBAL event_scheduler = ON)
CREATE EVENT refresh_dashboard_summary_daily
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 HOUR
DO
  CALL refresh_dashboard_summary(CURRENT_DATE - INTERVAL 1 DAY);
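
With the summary in place, the dashboard reads from the pre-aggregated table instead of scanning raw transactions. The exact query depends on the application, but a sketch of a quarter view might look like this (note the weighted average, rather than averaging the daily averages):

-- Sketch: dashboard read served entirely from the summary table
SELECT
  customer_name,
  SUM(total_amount) as quarter_total,
  SUM(transaction_count) as quarter_transactions,
  SUM(total_amount) / SUM(transaction_count) as average_transaction
FROM dashboard_summary
WHERE report_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY customer_id, customer_name
ORDER BY quarter_total DESC
LIMIT 100;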

Results: 90%+ Reduction in Query Time

After implementing these optimizations, we saw dramatic improvements in performance:

Query Type                  Before    After    Improvement
Dashboard Summary           8.2s      0.3s     96.3% reduction
Monthly Report              45s       3.8s     91.6% reduction
Search by Transaction ID    3.7s      0.08s    97.8% reduction

The EXPLAIN output for our optimized dashboard query showed much more efficient execution:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: p_2024_01,p_2024_02,p_2024_03
         type: range
possible_keys: customer_date_idx
          key: customer_date_idx
      key_len: 8
          ref: NULL
         rows: 1842
     filtered: 100.00
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: myapp.t.customer_id
         rows: 1
     filtered: 100.00
        Extra: NULL

Key improvements:

  • The query now scans only relevant partitions
  • It uses the composite index effectively
  • Row examination reduced from 32 million to under 2,000
  • No more temporary tables or filesort operations

Lessons Learned

1. Index Strategically, Not Just More

Rather than adding numerous single-column indexes, focus on creating composite indexes that match your query patterns. Pay attention to column order in the index definition.
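
As a small illustration (hypothetical index names, same columns as our table): with an equality filter on customer and a range on date, the order of the two columns decides how much of the index the optimizer can actually use.

-- Query shape: WHERE customer_id = 42 AND transaction_date BETWEEN '2024-01-01' AND '2024-03-31'

-- (customer_id, transaction_date): the equality pins the first column, and the
-- date range then scans only that customer's slice of the index
ALTER TABLE transactions ADD INDEX idx_cust_date (customer_id, transaction_date);

-- (transaction_date, customer_id): the range consumes the leading column, so
-- customer_id can only filter rows after they are read, not narrow the scan
ALTER TABLE transactions ADD INDEX idx_date_cust (transaction_date, customer_id);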

2. Always EXPLAIN Your Queries

Make EXPLAIN a standard part of your development process. It reveals how MySQL is executing your queries and helps identify inefficiencies.
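
If you are on MySQL 8.0.18 or later, EXPLAIN ANALYZE goes one step further: it actually runs the statement and reports measured timings and row counts for each step of the plan, which makes regressions much easier to spot:

-- Executes the statement and annotates each plan node with actual time and rows
EXPLAIN ANALYZE
SELECT c.customer_name, SUM(t.amount) as total_amount
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY c.id, c.customer_name;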

3. Use the Database Engine’s Features

Features like partitioning and materialized views are powerful tools for improving performance. Don’t be afraid to use advanced MySQL features when appropriate.

4. Monitor and Iterate

Set up continuous monitoring of query performance using tools like MySQL Performance Schema or Prometheus with mysqld_exporter. This allows you to identify regressions quickly.
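
The slow query log is a low-effort complement to both. These settings are illustrative; the threshold should track your own SLAs:

# Illustrative my.cnf settings: capture anything slower than 1 second
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
log_queries_not_using_indexes = 1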

5. Consider Read/Write Patterns

For our reporting use case, it made sense to optimize for read performance even if it meant slightly slower writes. Understanding your application’s read/write patterns is crucial for effective optimization.

Conclusion

By methodically diagnosing and addressing our MySQL performance issues, we achieved our goal of 90%+ reduction in query times without major architectural changes. The key was addressing multiple aspects of database performance: indexing strategy, query structure, database configuration, and leveraging MySQL features like partitioning.

For other teams facing similar challenges, we recommend starting with thorough diagnostics using EXPLAIN and Performance Schema before making changes. Often, the most significant improvements come from targeted optimizations rather than hardware upgrades or complete rewrites.

Our next step is implementing a horizontal scaling strategy with read replicas to further improve performance as we continue to grow, but these optimizations have bought us significant time and dramatically improved our user experience in the meantime.

