From Monolith to Microservices: How We Split a 5-Year-Old Node.js App Without Losing Sleep

Domain-Driven Design, Contract Testing, and the Strangler Pattern

When our startup hit hypergrowth, our once-elegant monolithic Node.js application started showing signs of strain. The codebase that had served us well for five years was becoming increasingly difficult to maintain, scale, and deploy. Feature development slowed to a crawl, and each deployment became a nerve-wracking event.

Sound familiar? If you’re facing similar challenges with your monolith, this post outlines our journey of breaking down a complex Node.js application into microservices—without disrupting our business or losing sleep over production incidents.

The Breaking Point

Our monolith started as a simple Express application handling a few core features. Over five years, it had grown to encompass:

  • User authentication and management
  • Product catalog and inventory
  • Order processing and payments
  • Analytics and reporting
  • Notification systems
  • Integration with 15+ third-party services

With a team that had grown from 3 to 30 developers, the monolith became a bottleneck:

  • Deployment nightmares: A small change to the notification system required testing and deploying the entire application
  • Scaling issues: The order processing system needed more resources during sales events, but we couldn’t scale it independently
  • Development conflicts: Multiple teams working in the same codebase led to frequent merge conflicts
  • Onboarding challenges: New developers needed weeks to understand the entire system before making contributions

After a particularly painful Black Friday where the system struggled under load despite our best preparations, we knew it was time for a change.

Our Approach: The Three Pillars

Rather than going for a “big bang” rewrite (which almost always ends in disaster), we adopted a gradual approach based on three pillars:

  1. Domain-Driven Design to identify service boundaries
  2. Contract Testing to ensure services could evolve independently
  3. The Strangler Pattern to gradually migrate functionality

Let’s dive into each pillar and see how it contributed to our successful migration.

Pillar 1: Domain-Driven Design for Service Identification

The first challenge was deciding where to split the monolith. Cut along arbitrary lines and you get distributed chaos, not a distributed system. Domain-Driven Design (DDD) provided the framework we needed.

Step 1: Event Storming to Map Business Domains

We started with an event storming workshop involving developers, product managers, and domain experts. Using colored sticky notes on a wall (virtual in our remote environment), we mapped out:

  • Domain Events: Things that happen in the business (OrderPlaced, PaymentReceived, etc.)
  • Commands: User intentions that trigger events (PlaceOrder, CancelOrder)
  • Aggregates: Clusters of domain objects that change together
  • Bounded Contexts: Logical boundaries where certain terms and rules apply

This exercise revealed seven distinct bounded contexts:

  1. Identity and Access: User accounts, authentication, permissions
  2. Product Catalog: Products, categories, pricing
  3. Inventory: Stock levels, reservations, suppliers
  4. Order Management: Orders, returns, fulfillment
  5. Payments: Payment processing, refunds, invoicing
  6. Notifications: Email, SMS, push notifications
  7. Analytics: Reporting, dashboards, business intelligence

Step 2: Identifying Aggregates and Their Relationships

Within each bounded context, we identified key aggregates (clusters of entities that should be modified together) and their relationships:

// Example: Order aggregate in the Order Management context
const { v4: uuid } = require('uuid');

class Order {
  constructor(customerId, items, shippingAddress, billingAddress) {
    this.id = uuid();
    this.customerId = customerId;
    this.items = items;
    this.shippingAddress = shippingAddress;
    this.billingAddress = billingAddress;
    this.status = 'created';
    this.createdAt = new Date();
    this.events = [];
    
    // Register domain event
    this.events.push(new OrderCreatedEvent(this));
  }
  
  confirm() {
    if (this.status !== 'created') {
      throw new Error('Order can only be confirmed when in created status');
    }
    
    this.status = 'confirmed';
    this.confirmedAt = new Date();
    
    // Register domain event
    this.events.push(new OrderConfirmedEvent(this));
  }
  
  // Other methods...
}
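The aggregate above assumes OrderCreatedEvent and OrderConfirmedEvent classes. A minimal sketch of what those might look like (the base-class shape is our illustration, not production code):

```javascript
// Minimal sketch of the domain event classes the Order aggregate assumes.
// The event names come from the aggregate above; the base-class shape
// (type, aggregateId, occurredAt) is illustrative.
class DomainEvent {
  constructor(order) {
    this.type = this.constructor.name; // e.g. 'OrderCreatedEvent'
    this.aggregateId = order.id;
    this.occurredAt = new Date();
  }
}

class OrderCreatedEvent extends DomainEvent {}
class OrderConfirmedEvent extends DomainEvent {}

const event = new OrderCreatedEvent({ id: 'order-123' });
console.log(event.type);        // 'OrderCreatedEvent'
console.log(event.aggregateId); // 'order-123'
```

Keeping events as plain classes with a stable `type` string pays off later, when the same names travel over the event bus between services.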

Step 3: Context Mapping to Define Service Interactions

We created a context map showing how bounded contexts interact. This helped us understand dependencies and potential integration points:

  • Partnership: Inventory and Product Catalog (close collaboration needed)
  • Customer-Supplier: Order Management depends on Identity for user information
  • Conformist: Analytics consumes events from all other contexts
  • Anti-corruption Layer: Order Management to legacy payment processor

This map became our blueprint for service boundaries and interaction patterns.
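To make the anti-corruption layer concrete: its job is to translate the legacy payment processor's wire format into Order Management's own vocabulary, so legacy naming never leaks into the domain model. The field names on both sides below are hypothetical:

```javascript
// Hypothetical sketch of the anti-corruption layer between Order Management
// and the legacy payment processor. Both field vocabularies are invented
// for illustration; the point is the one-way translation at the boundary.
class LegacyPaymentAdapter {
  translate(legacyResponse) {
    return {
      paymentId: legacyResponse.TXN_REF,
      status: legacyResponse.RESP_CODE === '00' ? 'succeeded' : 'failed',
      amount: Number(legacyResponse.AMT) / 100 // legacy reports cents
    };
  }
}

const adapter = new LegacyPaymentAdapter();
const payment = adapter.translate({ TXN_REF: 'tx-789', RESP_CODE: '00', AMT: '9999' });
console.log(payment); // { paymentId: 'tx-789', status: 'succeeded', amount: 99.99 }
```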

Pillar 2: Contract Testing for Safe Evolution

With service boundaries defined, we needed to ensure they could evolve independently without breaking each other. Enter contract testing.

Step 1: Define Service Contracts

For each service-to-service interaction, we defined explicit contracts covering:

  1. API Endpoints: Routes, methods, headers
  2. Request/Response Formats: Required fields, data types, validations
  3. Error Handling: Status codes, error formats
  4. Event Schemas: For asynchronous communication

We used OpenAPI (Swagger) for REST APIs and AsyncAPI for event-driven interactions.
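To illustrate what such a contract pins down for an event (a type name plus required fields), here is a hand-rolled check; in practice we validated against the real AsyncAPI/JSON Schema documents, and the schema object below is only a sketch:

```javascript
// Hypothetical sketch of checking an event against the kind of contract an
// AsyncAPI document captures. Real validation used the actual schema files.
const orderConfirmedSchema = {
  type: 'OrderConfirmed',
  requiredFields: ['orderId', 'customerId', 'totalAmount', 'confirmedAt']
};

function conformsTo(schema, event) {
  return event.type === schema.type &&
    schema.requiredFields.every(field => field in event.data);
}

const sample = {
  type: 'OrderConfirmed',
  data: {
    orderId: 'o-1',
    customerId: 'c-1',
    totalAmount: 42,
    confirmedAt: '2024-01-01T00:00:00Z'
  }
};
console.log(conformsTo(orderConfirmedSchema, sample)); // true
```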

Step 2: Implement Consumer-Driven Contract Tests

Instead of relying on traditional end-to-end tests (which are slow and brittle), we implemented consumer-driven contract tests using Pact.js:

// Consumer side: Order service testing its interaction with the Payment service
const path = require('path');
const { Pact, Matchers } = require('@pact-foundation/pact');
const { like } = Matchers;

describe('Order Service - Payment Service Integration', () => {
  const pact = new Pact({
    consumer: 'OrderService',
    provider: 'PaymentService',
    port: 8888,
    log: path.resolve(process.cwd(), 'logs', 'pact.log')
  });

  before(() => pact.setup());
  after(() => pact.finalize());

  describe('Process Payment', () => {
    before(() => {
      return pact.addInteraction({
        state: 'Payment processor is available',
        uponReceiving: 'a request to process payment',
        withRequest: {
          method: 'POST',
          path: '/api/payments',
          headers: { 'Content-Type': 'application/json' },
          body: {
            orderId: like('order-123'),
            amount: like(99.99),
            currency: like('USD'),
            paymentMethod: {
              type: like('creditCard'),
              lastFour: like('4242'),
              expiryMonth: like(12),
              expiryYear: like(2025)
            }
          }
        },
        willRespondWith: {
          status: 201,
          headers: { 'Content-Type': 'application/json' },
          body: {
            id: like('payment-456'),
            status: like('processing'),
            transactionId: like('tx-789')
          }
        }
      });
    });

    it('processes a payment successfully', async () => {
      const paymentService = new PaymentService('http://localhost:8888');
      const result = await paymentService.processPayment({
        orderId: 'order-123',
        amount: 99.99,
        currency: 'USD',
        paymentMethod: {
          type: 'creditCard',
          lastFour: '4242',
          expiryMonth: 12,
          expiryYear: 2025
        }
      });
      
      expect(result).to.have.property('id');
      expect(result).to.have.property('status');
      expect(result).to.have.property('transactionId');
    });
  });
});

Step 3: Integrate Contract Tests into CI/CD

We integrated contract tests into our CI/CD pipeline:

  1. Consumer services pushed contracts to a Pact Broker
  2. Provider services verified they could fulfill these contracts
  3. Any breaking changes would fail the build before deployment

This approach allowed services to evolve independently as long as they maintained their contracts.

Pillar 3: The Strangler Pattern for Gradual Migration

With service boundaries defined and contract testing in place, we were ready to start the actual migration. Rather than a risky big-bang approach, we adopted the Strangler Pattern:

  1. Create new services alongside the monolith
  2. Gradually redirect traffic from the monolith to new services
  3. Eventually decommission the monolith code

Step 1: Implement an API Gateway

We introduced an API Gateway (using Kong) in front of our monolith:

// Kong configuration example
{
  "services": [
    {
      "name": "monolith-service",
      "url": "http://monolith-app:3000"
    },
    {
      "name": "identity-service",
      "url": "http://identity-service:3000"
    }
  ],
  "routes": [
    {
      "name": "all-traffic-to-monolith",
      "service": "monolith-service",
      "paths": ["/"]
    }
  ]
}

This gateway gave us the flexibility to route traffic without changing client applications.

Step 2: Extract Services One by One

We started with the simplest bounded context: Notifications. We:

  1. Created a new microservice with the same functionality
  2. Set up a database for the service
  3. Migrated necessary data
  4. Updated the API gateway to route notification requests to the new service

// Kong configuration update
{
  "routes": [
    {
      "name": "notifications-traffic",
      "service": "notification-service",
      "paths": ["/api/notifications", "/api/v1/notifications"]
    },
    {
      "name": "remaining-traffic-to-monolith",
      "service": "monolith-service",
      "paths": ["/"]
    }
  ]
}

Step 3: Use Feature Flags for Safe Transitions

For each migration, we implemented feature flags to enable gradual rollout and quick rollback:

// Using LaunchDarkly for feature flags
const LaunchDarkly = require('launchdarkly-node-server-sdk');
const ldClient = LaunchDarkly.init('YOUR_SDK_KEY');

app.post('/api/notifications', async (req, res) => {
  const user = req.user;
  
  // Check if this user should use the new service
  const useNewService = await ldClient.variation(
    'use-notification-service', 
    { key: user.id, email: user.email }, 
    false
  );
  
  if (useNewService) {
    // Forward to new service
    const response = await notificationServiceClient.sendNotification(req.body);
    return res.status(response.status).json(response.data);
  } else {
    // Use legacy implementation
    const notification = await legacyNotificationService.send(req.body);
    return res.status(201).json(notification);
  }
});

This allowed us to:

  • Start with 1% of traffic going to the new service
  • Gradually increase as we gained confidence
  • Quickly roll back if issues were detected

Step 4: Implement Event-Based Communication

For contexts that needed to share data, we implemented an event bus (using RabbitMQ) for asynchronous communication:

// Order service publishing events
class OrderService {
  async confirmOrder(orderId) {
    const order = await this.orderRepository.findById(orderId);
    order.confirm();
    await this.orderRepository.save(order);
    
    // Publish domain event
    await this.eventBus.publish('order-events', {
      type: 'OrderConfirmed',
      data: {
        orderId: order.id,
        customerId: order.customerId,
        totalAmount: order.totalAmount,
        confirmedAt: order.confirmedAt
      },
      metadata: {
        version: '1.0',
        timestamp: new Date().toISOString()
      }
    });
    
    return order;
  }
}

// Inventory service subscribing to events
class InventoryEventHandler {
  async initialize() {
    await this.eventBus.subscribe('order-events', async (event) => {
      if (event.type === 'OrderConfirmed') {
        await this.releaseReservation(event.data.orderId);
      }
    });
  }
  
  async releaseReservation(orderId) {
    const reservation = await this.reservationRepository.findByOrderId(orderId);
    reservation.release();
    await this.reservationRepository.save(reservation);
  }
}

This decoupled services and made the system more resilient to individual service failures.
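The eventBus the two services above depend on is just a publish/subscribe interface. A minimal in-memory stand-in (handy for unit-testing handlers; production used RabbitMQ) might look like:

```javascript
// Minimal in-memory stand-in for the eventBus interface used above
// (publish/subscribe by topic). Production used RabbitMQ; this stub only
// makes the shape of the contract concrete for unit tests.
class InMemoryEventBus {
  constructor() {
    this.handlers = new Map();
  }

  async subscribe(topic, handler) {
    const list = this.handlers.get(topic) || [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  async publish(topic, event) {
    for (const handler of this.handlers.get(topic) || []) {
      await handler(event);
    }
  }
}

// Usage: wire a handler the same way InventoryEventHandler does
(async () => {
  const bus = new InMemoryEventBus();
  const received = [];
  await bus.subscribe('order-events', async (event) => received.push(event.type));
  await bus.publish('order-events', { type: 'OrderConfirmed', data: { orderId: 'o-1' } });
  console.log(received); // [ 'OrderConfirmed' ]
})();
```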

Challenges and Solutions

Our journey wasn’t without challenges. Here are some we faced and how we overcame them:

Challenge 1: Shared Database Access

Multiple services initially needed access to the same database tables.

Solution: Database Views and Eventual Consistency

  1. Created read-only views for shared data
  2. Implemented a data replication service to synchronize data between databases
  3. Embraced eventual consistency where possible

// Data replication service
class DataReplicationService {
  async syncCustomers() {
    const lastSyncTimestamp = await this.getLastSyncTimestamp('customers');
    
    const newCustomers = await this.monolithDb.query(`
      SELECT * FROM customers 
      WHERE updated_at > $1
    `, [lastSyncTimestamp]);
    
    for (const customer of newCustomers) {
      await this.identityServiceDb.query(`
        INSERT INTO customers (id, email, name, phone, created_at, updated_at)
        VALUES ($1, $2, $3, $4, $5, $6)
        ON CONFLICT (id) DO UPDATE
        SET email = $2, name = $3, phone = $4, updated_at = $6
      `, [
        customer.id,
        customer.email,
        customer.name,
        customer.phone,
        customer.created_at,
        customer.updated_at
      ]);
    }
    
    await this.updateLastSyncTimestamp('customers', new Date());
  }
}

Challenge 2: Authentication Across Services

Each service needed to validate user identity and permissions.

Solution: JWT Tokens and a Central Identity Service

  1. Created a dedicated Identity Service
  2. Implemented JWT-based authentication
  3. Added middleware to validate tokens in each service

// JWT validation middleware
function authMiddleware(requiredScopes = []) {
  return async (req, res, next) => {
    try {
      const authHeader = req.headers.authorization;
      if (!authHeader || !authHeader.startsWith('Bearer ')) {
        return res.status(401).json({ error: 'Missing or invalid authorization header' });
      }
      
      const token = authHeader.split(' ')[1];
      const decoded = jwt.verify(token, process.env.JWT_SECRET);
      
      // Belt-and-braces: jwt.verify already rejects expired tokens
      if (decoded.exp < Date.now() / 1000) {
        return res.status(401).json({ error: 'Token expired' });
      }
      
      // Check required scopes
      if (requiredScopes.length > 0) {
        const hasAllScopes = requiredScopes.every(scope => 
          decoded.scopes && decoded.scopes.includes(scope)
        );
        
        if (!hasAllScopes) {
          return res.status(403).json({ error: 'Insufficient permissions' });
        }
      }
      
      // Attach user to request
      req.user = decoded;
      next();
    } catch (error) {
      return res.status(401).json({ error: 'Invalid token' });
    }
  };
}

// Usage in a service
app.get('/api/orders', 
  authMiddleware(['orders:read']),
  async (req, res) => {
    const orders = await orderService.findByCustomerId(req.user.sub);
    res.json(orders);
  }
);

Challenge 3: Distributed Transactions

Some operations needed to update data across multiple services.

Solution: Saga Pattern with Compensating Transactions

  1. Implemented the Saga pattern for distributed transactions
  2. Created compensating transactions for rollback scenarios
  3. Used a state machine to track transaction progress

// Order processing saga
class OrderProcessingSaga {
  async process(orderId) {
    const saga = await this.sagaRepository.create({
      id: uuid(),
      type: 'ORDER_PROCESSING',
      status: 'STARTED',
      payload: { orderId },
      steps: [
        { name: 'RESERVE_INVENTORY', status: 'PENDING' },
        { name: 'PROCESS_PAYMENT', status: 'PENDING' },
        { name: 'UPDATE_ORDER_STATUS', status: 'PENDING' },
        { name: 'SEND_CONFIRMATION', status: 'PENDING' }
      ]
    });
    
    try {
      // Step 1: Reserve inventory
      await this.updateStepStatus(saga.id, 'RESERVE_INVENTORY', 'IN_PROGRESS');
      const inventoryResult = await this.inventoryService.reserve(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'RESERVE_INVENTORY', 'COMPLETED', inventoryResult);
      
      // Step 2: Process payment
      await this.updateStepStatus(saga.id, 'PROCESS_PAYMENT', 'IN_PROGRESS');
      const paymentResult = await this.paymentService.process(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'PROCESS_PAYMENT', 'COMPLETED', paymentResult);
      
      // Step 3: Update order status
      await this.updateStepStatus(saga.id, 'UPDATE_ORDER_STATUS', 'IN_PROGRESS');
      const orderResult = await this.orderService.updateStatus(saga.payload.orderId, 'PAID');
      await this.updateStepStatus(saga.id, 'UPDATE_ORDER_STATUS', 'COMPLETED', orderResult);
      
      // Step 4: Send confirmation
      await this.updateStepStatus(saga.id, 'SEND_CONFIRMATION', 'IN_PROGRESS');
      const notificationResult = await this.notificationService.sendOrderConfirmation(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'SEND_CONFIRMATION', 'COMPLETED', notificationResult);
      
      // Complete saga
      await this.sagaRepository.update(saga.id, { status: 'COMPLETED' });
      
    } catch (error) {
      // Failure - need to compensate
      await this.sagaRepository.update(saga.id, { 
        status: 'FAILED',
        error: error.message
      });
      
      // Run compensating transactions in reverse order
      await this.compensate(saga);
    }
  }
  
  async compensate(saga) {
    // Undo completed steps in reverse order
    const completedSteps = saga.steps
      .filter(step => step.status === 'COMPLETED')
      .sort((a, b) => saga.steps.indexOf(b) - saga.steps.indexOf(a));
      
    for (const step of completedSteps) {
      switch (step.name) {
        case 'PROCESS_PAYMENT':
          await this.paymentService.refund(saga.payload.orderId);
          break;
        case 'RESERVE_INVENTORY':
          await this.inventoryService.releaseReservation(saga.payload.orderId);
          break;
        // Other compensating actions...
      }
    }
  }
}

Results: Six Months Later

Six months after beginning our journey, we had successfully migrated our monolith into seven microservices. The benefits were substantial:

Improved Scalability

  • Each service could scale independently based on demand
  • Black Friday traffic was handled smoothly with temporary scaling of just the Order and Payment services
  • 60% reduction in infrastructure costs through more efficient resource utilization

Accelerated Development

  • Teams could deploy independently without coordinating releases
  • Onboarding time for new developers decreased from weeks to days
  • Feature development velocity increased by approximately 35%

Enhanced Reliability

  • System availability improved from 99.9% to 99.99%
  • Failures were isolated to specific services instead of bringing down the entire system
  • Mean time to recovery (MTTR) decreased by 70%

Technical Metrics

  • Deployment frequency increased from twice weekly to multiple times daily
  • Lead time for changes decreased from days to hours
  • Mean time to recovery decreased from hours to minutes

Lessons Learned

Our journey taught us valuable lessons about microservice migrations:

  1. Start with boundaries, not services: Taking time to define proper domain boundaries using DDD avoided costly rework later.
  2. Contract testing is non-negotiable: Without rigorous contract testing, service dependencies would have created a distributed monolith.
  3. Data management is the hardest part: Shared data access and eventual consistency were our biggest challenges.
  4. Don’t rush the transition: The gradual Strangler Pattern approach minimized risk and allowed us to learn as we went.
  5. Monitoring is different for microservices: We had to invest in distributed tracing, centralized logging, and service-level alerting.

Conclusion

Breaking down a monolith is a challenging but rewarding journey. By following the three pillars—Domain-Driven Design, Contract Testing, and the Strangler Pattern—we successfully modernized our application architecture without disrupting our business.

Remember that microservices aren’t a goal in themselves but a means to address specific scaling and organizational challenges. Don’t break up your monolith because it’s trendy; do it because it solves real problems for your team and business.

Have you been through a similar migration? I’d love to hear about your experiences in the comments!


