From Monolith to Microservices: How We Split a 5-Year-Old Node.js App Without Losing Sleep

Domain-Driven Design, Contract Testing, and the Strangler Pattern

When our startup hit hypergrowth, our once-elegant monolithic Node.js application started showing signs of strain. The codebase that had served us well for five years was becoming increasingly difficult to maintain, scale, and deploy. Feature development slowed to a crawl, and each deployment became a nerve-wracking event.

Sound familiar? If you’re facing similar challenges with your monolith, this post outlines our journey of breaking down a complex Node.js application into microservices—without disrupting our business or losing sleep over production incidents.

The Breaking Point

Our monolith started as a simple Express application handling a few core features. Over five years, it had grown to encompass:

  • User authentication and management
  • Product catalog and inventory
  • Order processing and payments
  • Analytics and reporting
  • Notification systems
  • Integration with 15+ third-party services

With a team that had grown from 3 to 30 developers, the monolith became a bottleneck:

  • Deployment nightmares: A small change to the notification system required testing and deploying the entire application
  • Scaling issues: The order processing system needed more resources during sales events, but we couldn’t scale it independently
  • Development conflicts: Multiple teams working in the same codebase led to frequent merge conflicts
  • Onboarding challenges: New developers needed weeks to understand the entire system before making contributions

After a particularly painful Black Friday where the system struggled under load despite our best preparations, we knew it was time for a change.

Our Approach: The Three Pillars

Rather than going for a “big bang” rewrite (which almost always ends in disaster), we adopted a gradual approach based on three pillars:

  1. Domain-Driven Design to identify service boundaries
  2. Contract Testing to ensure services could evolve independently
  3. The Strangler Pattern to gradually migrate functionality

Let’s dive into each pillar and see how it contributed to our successful migration.

Pillar 1: Domain-Driven Design for Service Identification

The first challenge was deciding where to split the monolith. Cut along arbitrary lines and you get distributed chaos, not a distributed system. Domain-Driven Design (DDD) provided the framework we needed.

Step 1: Event Storming to Map Business Domains

We started with an event storming workshop involving developers, product managers, and domain experts. Using colored sticky notes on a wall (virtual in our remote environment), we mapped out:

  • Domain Events: Things that happen in the business (OrderPlaced, PaymentReceived, etc.)
  • Commands: User intentions that trigger events (PlaceOrder, CancelOrder)
  • Aggregates: Clusters of domain objects that change together
  • Bounded Contexts: Logical boundaries where certain terms and rules apply

This exercise revealed seven distinct bounded contexts:

  1. Identity and Access: User accounts, authentication, permissions
  2. Product Catalog: Products, categories, pricing
  3. Inventory: Stock levels, reservations, suppliers
  4. Order Management: Orders, returns, fulfillment
  5. Payments: Payment processing, refunds, invoicing
  6. Notifications: Email, SMS, push notifications
  7. Analytics: Reporting, dashboards, business intelligence

Step 2: Identifying Aggregates and Their Relationships

Within each bounded context, we identified key aggregates (clusters of entities that should be modified together) and their relationships:

// Example: Order aggregate in the Order Management context
const { v4: uuid } = require('uuid');

class Order {
  constructor(customerId, items, shippingAddress, billingAddress) {
    this.id = uuid();
    this.customerId = customerId;
    this.items = items;
    this.shippingAddress = shippingAddress;
    this.billingAddress = billingAddress;
    this.status = 'created';
    this.createdAt = new Date();
    this.events = [];
    
    // Register domain event
    this.events.push(new OrderCreatedEvent(this));
  }
  
  confirm() {
    if (this.status !== 'created') {
      throw new Error('Order can only be confirmed when in created status');
    }
    
    this.status = 'confirmed';
    this.confirmedAt = new Date();
    
    // Register domain event
    this.events.push(new OrderConfirmedEvent(this));
  }
  
  // Other methods...
}
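The aggregate above assumes OrderCreatedEvent and OrderConfirmedEvent classes. A minimal sketch of what those might look like (the base-class shape is our illustration, not production code):

```javascript
// Minimal sketch of the domain event classes the Order aggregate assumes.
// The event names come from the aggregate above; the base-class shape
// (type, aggregateId, occurredAt) is illustrative.
class DomainEvent {
  constructor(order) {
    this.type = this.constructor.name; // e.g. 'OrderCreatedEvent'
    this.aggregateId = order.id;
    this.occurredAt = new Date();
  }
}

class OrderCreatedEvent extends DomainEvent {}
class OrderConfirmedEvent extends DomainEvent {}

const event = new OrderCreatedEvent({ id: 'order-123' });
console.log(event.type);        // 'OrderCreatedEvent'
console.log(event.aggregateId); // 'order-123'
```

Keeping events as plain classes with a stable `type` string pays off later, when the same names travel over the event bus between services.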

Step 3: Context Mapping to Define Service Interactions

We created a context map showing how bounded contexts interact. This helped us understand dependencies and potential integration points:

  • Partnership: Inventory and Product Catalog (close collaboration needed)
  • Customer-Supplier: Order Management depends on Identity for user information
  • Conformist: Analytics consumes events from all other contexts
  • Anti-corruption Layer: Order Management to legacy payment processor

This map became our blueprint for service boundaries and interaction patterns.
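To make the anti-corruption layer concrete: its job is to translate the legacy payment processor's wire format into Order Management's own vocabulary, so legacy naming never leaks into the domain model. The field names on both sides below are hypothetical:

```javascript
// Hypothetical sketch of the anti-corruption layer between Order Management
// and the legacy payment processor. Both field vocabularies are invented
// for illustration; the point is the one-way translation at the boundary.
class LegacyPaymentAdapter {
  translate(legacyResponse) {
    return {
      paymentId: legacyResponse.TXN_REF,
      status: legacyResponse.RESP_CODE === '00' ? 'succeeded' : 'failed',
      amount: Number(legacyResponse.AMT) / 100 // legacy reports cents
    };
  }
}

const adapter = new LegacyPaymentAdapter();
const payment = adapter.translate({ TXN_REF: 'tx-789', RESP_CODE: '00', AMT: '9999' });
console.log(payment); // { paymentId: 'tx-789', status: 'succeeded', amount: 99.99 }
```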

Pillar 2: Contract Testing for Safe Evolution

With service boundaries defined, we needed to ensure they could evolve independently without breaking each other. Enter contract testing.

Step 1: Define Service Contracts

For each service-to-service interaction, we defined explicit contracts covering:

  1. API Endpoints: Routes, methods, headers
  2. Request/Response Formats: Required fields, data types, validations
  3. Error Handling: Status codes, error formats
  4. Event Schemas: For asynchronous communication

We used OpenAPI (Swagger) for REST APIs and AsyncAPI for event-driven interactions.
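To illustrate what such a contract pins down for an event (a type name plus required fields), here is a hand-rolled check; in practice we validated against the real AsyncAPI/JSON Schema documents, and the schema object below is only a sketch:

```javascript
// Hypothetical sketch of checking an event against the kind of contract an
// AsyncAPI document captures. Real validation used the actual schema files.
const orderConfirmedSchema = {
  type: 'OrderConfirmed',
  requiredFields: ['orderId', 'customerId', 'totalAmount', 'confirmedAt']
};

function conformsTo(schema, event) {
  return event.type === schema.type &&
    schema.requiredFields.every(field => field in event.data);
}

const sample = {
  type: 'OrderConfirmed',
  data: {
    orderId: 'o-1',
    customerId: 'c-1',
    totalAmount: 42,
    confirmedAt: '2024-01-01T00:00:00Z'
  }
};
console.log(conformsTo(orderConfirmedSchema, sample)); // true
```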

Step 2: Implement Consumer-Driven Contract Tests

Instead of relying on traditional end-to-end tests (which are slow and brittle), we implemented consumer-driven contract tests using Pact.js:

// Consumer side: Order service testing its interaction with the Payment service
const path = require('path');
const { Pact, Matchers } = require('@pact-foundation/pact');
const { like } = Matchers;

describe('Order Service - Payment Service Integration', () => {
  const pact = new Pact({
    consumer: 'OrderService',
    provider: 'PaymentService',
    port: 8888,
    log: path.resolve(process.cwd(), 'logs', 'pact.log')
  });

  before(() => pact.setup());
  after(() => pact.finalize());

  describe('Process Payment', () => {
    before(() => {
      return pact.addInteraction({
        state: 'Payment processor is available',
        uponReceiving: 'a request to process payment',
        withRequest: {
          method: 'POST',
          path: '/api/payments',
          headers: { 'Content-Type': 'application/json' },
          body: {
            orderId: like('order-123'),
            amount: like(99.99),
            currency: like('USD'),
            paymentMethod: {
              type: like('creditCard'),
              lastFour: like('4242'),
              expiryMonth: like(12),
              expiryYear: like(2025)
            }
          }
        },
        willRespondWith: {
          status: 201,
          headers: { 'Content-Type': 'application/json' },
          body: {
            id: like('payment-456'),
            status: like('processing'),
            transactionId: like('tx-789')
          }
        }
      });
    });

    it('processes a payment successfully', async () => {
      const paymentService = new PaymentService('http://localhost:8888');
      const result = await paymentService.processPayment({
        orderId: 'order-123',
        amount: 99.99,
        currency: 'USD',
        paymentMethod: {
          type: 'creditCard',
          lastFour: '4242',
          expiryMonth: 12,
          expiryYear: 2025
        }
      });
      
      expect(result).to.have.property('id');
      expect(result).to.have.property('status');
      expect(result).to.have.property('transactionId');
    });
  });
});

Step 3: Integrate Contract Tests into CI/CD

We integrated contract tests into our CI/CD pipeline:

  1. Consumer services pushed contracts to a Pact Broker
  2. Provider services verified they could fulfill these contracts
  3. Any breaking changes would fail the build before deployment

This approach allowed services to evolve independently as long as they maintained their contracts.

Pillar 3: The Strangler Pattern for Gradual Migration

With service boundaries defined and contract testing in place, we were ready to start the actual migration. Rather than a risky big-bang approach, we adopted the Strangler Pattern:

  1. Create new services alongside the monolith
  2. Gradually redirect traffic from the monolith to new services
  3. Eventually decommission the monolith code

Step 1: Implement an API Gateway

We introduced an API Gateway (using Kong) in front of our monolith:

// Kong configuration example
{
  "services": [
    {
      "name": "monolith-service",
      "url": "http://monolith-app:3000"
    },
    {
      "name": "identity-service",
      "url": "http://identity-service:3000"
    }
  ],
  "routes": [
    {
      "name": "all-traffic-to-monolith",
      "service": "monolith-service",
      "paths": ["/"]
    }
  ]
}

This gateway gave us the flexibility to route traffic without changing client applications.

Step 2: Extract Services One by One

We started with the simplest bounded context: Notifications. We:

  1. Created a new microservice with the same functionality
  2. Set up a database for the service
  3. Migrated necessary data
  4. Updated the API gateway to route notification requests to the new service

// Kong configuration update
{
  "routes": [
    {
      "name": "notifications-traffic",
      "service": "notification-service",
      "paths": ["/api/notifications", "/api/v1/notifications"]
    },
    {
      "name": "remaining-traffic-to-monolith",
      "service": "monolith-service",
      "paths": ["/"]
    }
  ]
}

Step 3: Use Feature Flags for Safe Transitions

For each migration, we implemented feature flags to enable gradual rollout and quick rollback:

// Using LaunchDarkly for feature flags
const LaunchDarkly = require('launchdarkly-node-server-sdk');
const ldClient = LaunchDarkly.init('YOUR_SDK_KEY');

app.post('/api/notifications', async (req, res) => {
  const user = req.user;
  
  // Check if this user should use the new service
  const useNewService = await ldClient.variation(
    'use-notification-service', 
    { key: user.id, email: user.email }, 
    false
  );
  
  if (useNewService) {
    // Forward to new service
    const response = await notificationServiceClient.sendNotification(req.body);
    return res.status(response.status).json(response.data);
  } else {
    // Use legacy implementation
    const notification = await legacyNotificationService.send(req.body);
    return res.status(201).json(notification);
  }
});

This allowed us to:

  • Start with 1% of traffic going to the new service
  • Gradually increase as we gained confidence
  • Quickly roll back if issues were detected

Step 4: Implement Event-Based Communication

For contexts that needed to share data, we implemented an event bus (using RabbitMQ) for asynchronous communication:

// Order service publishing events
class OrderService {
  async confirmOrder(orderId) {
    const order = await this.orderRepository.findById(orderId);
    order.confirm();
    await this.orderRepository.save(order);
    
    // Publish domain event
    await this.eventBus.publish('order-events', {
      type: 'OrderConfirmed',
      data: {
        orderId: order.id,
        customerId: order.customerId,
        totalAmount: order.totalAmount,
        confirmedAt: order.confirmedAt
      },
      metadata: {
        version: '1.0',
        timestamp: new Date().toISOString()
      }
    });
    
    return order;
  }
}

// Inventory service subscribing to events
class InventoryEventHandler {
  async initialize() {
    await this.eventBus.subscribe('order-events', async (event) => {
      if (event.type === 'OrderConfirmed') {
        await this.releaseReservation(event.data.orderId);
      }
    });
  }
  
  async releaseReservation(orderId) {
    const reservation = await this.reservationRepository.findByOrderId(orderId);
    reservation.release();
    await this.reservationRepository.save(reservation);
  }
}

This decoupled services and made the system more resilient to individual service failures.
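The eventBus the two services above depend on is just a publish/subscribe interface. A minimal in-memory stand-in (handy for unit-testing handlers; production used RabbitMQ) might look like:

```javascript
// Minimal in-memory stand-in for the eventBus interface used above
// (publish/subscribe by topic). Production used RabbitMQ; this stub only
// makes the shape of the contract concrete for unit tests.
class InMemoryEventBus {
  constructor() {
    this.handlers = new Map();
  }

  async subscribe(topic, handler) {
    const list = this.handlers.get(topic) || [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  async publish(topic, event) {
    for (const handler of this.handlers.get(topic) || []) {
      await handler(event);
    }
  }
}

// Usage: wire a handler the same way InventoryEventHandler does
(async () => {
  const bus = new InMemoryEventBus();
  const received = [];
  await bus.subscribe('order-events', async (event) => received.push(event.type));
  await bus.publish('order-events', { type: 'OrderConfirmed', data: { orderId: 'o-1' } });
  console.log(received); // [ 'OrderConfirmed' ]
})();
```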

Challenges and Solutions

Our journey wasn’t without challenges. Here are some we faced and how we overcame them:

Challenge 1: Shared Database Access

Multiple services initially needed access to the same database tables.

Solution: Database Views and Eventual Consistency

  1. Created read-only views for shared data
  2. Implemented a data replication service to synchronize data between databases
  3. Embraced eventual consistency where possible

// Data replication service
class DataReplicationService {
  async syncCustomers() {
    const lastSyncTimestamp = await this.getLastSyncTimestamp('customers');
    
    const newCustomers = await this.monolithDb.query(`
      SELECT * FROM customers 
      WHERE updated_at > $1
    `, [lastSyncTimestamp]);
    
    for (const customer of newCustomers) {
      await this.identityServiceDb.query(`
        INSERT INTO customers (id, email, name, phone, created_at, updated_at)
        VALUES ($1, $2, $3, $4, $5, $6)
        ON CONFLICT (id) DO UPDATE
        SET email = $2, name = $3, phone = $4, updated_at = $6
      `, [
        customer.id,
        customer.email,
        customer.name,
        customer.phone,
        customer.created_at,
        customer.updated_at
      ]);
    }
    
    await this.updateLastSyncTimestamp('customers', new Date());
  }
}

Challenge 2: Authentication Across Services

Each service needed to validate user identity and permissions.

Solution: JWT Tokens and a Central Identity Service

  1. Created a dedicated Identity Service
  2. Implemented JWT-based authentication
  3. Added middleware to validate tokens in each service

// JWT validation middleware
function authMiddleware(requiredScopes = []) {
  return async (req, res, next) => {
    try {
      const authHeader = req.headers.authorization;
      if (!authHeader || !authHeader.startsWith('Bearer ')) {
        return res.status(401).json({ error: 'Missing or invalid authorization header' });
      }
      
      const token = authHeader.split(' ')[1];
      const decoded = jwt.verify(token, process.env.JWT_SECRET);
      
      // Belt-and-braces: jwt.verify already rejects expired tokens
      if (decoded.exp < Date.now() / 1000) {
        return res.status(401).json({ error: 'Token expired' });
      }
      
      // Check required scopes
      if (requiredScopes.length > 0) {
        const hasAllScopes = requiredScopes.every(scope => 
          decoded.scopes && decoded.scopes.includes(scope)
        );
        
        if (!hasAllScopes) {
          return res.status(403).json({ error: 'Insufficient permissions' });
        }
      }
      
      // Attach user to request
      req.user = decoded;
      next();
    } catch (error) {
      return res.status(401).json({ error: 'Invalid token' });
    }
  };
}

// Usage in a service
app.get('/api/orders', 
  authMiddleware(['orders:read']),
  async (req, res) => {
    const orders = await orderService.findByCustomerId(req.user.sub);
    res.json(orders);
  }
);

Challenge 3: Distributed Transactions

Some operations needed to update data across multiple services.

Solution: Saga Pattern with Compensating Transactions

  1. Implemented the Saga pattern for distributed transactions
  2. Created compensating transactions for rollback scenarios
  3. Used a state machine to track transaction progress

// Order processing saga
class OrderProcessingSaga {
  async process(orderId) {
    const saga = await this.sagaRepository.create({
      id: uuid(),
      type: 'ORDER_PROCESSING',
      status: 'STARTED',
      payload: { orderId },
      steps: [
        { name: 'RESERVE_INVENTORY', status: 'PENDING' },
        { name: 'PROCESS_PAYMENT', status: 'PENDING' },
        { name: 'UPDATE_ORDER_STATUS', status: 'PENDING' },
        { name: 'SEND_CONFIRMATION', status: 'PENDING' }
      ]
    });
    
    try {
      // Step 1: Reserve inventory
      await this.updateStepStatus(saga.id, 'RESERVE_INVENTORY', 'IN_PROGRESS');
      const inventoryResult = await this.inventoryService.reserve(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'RESERVE_INVENTORY', 'COMPLETED', inventoryResult);
      
      // Step 2: Process payment
      await this.updateStepStatus(saga.id, 'PROCESS_PAYMENT', 'IN_PROGRESS');
      const paymentResult = await this.paymentService.process(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'PROCESS_PAYMENT', 'COMPLETED', paymentResult);
      
      // Step 3: Update order status
      await this.updateStepStatus(saga.id, 'UPDATE_ORDER_STATUS', 'IN_PROGRESS');
      const orderResult = await this.orderService.updateStatus(saga.payload.orderId, 'PAID');
      await this.updateStepStatus(saga.id, 'UPDATE_ORDER_STATUS', 'COMPLETED', orderResult);
      
      // Step 4: Send confirmation
      await this.updateStepStatus(saga.id, 'SEND_CONFIRMATION', 'IN_PROGRESS');
      const notificationResult = await this.notificationService.sendOrderConfirmation(saga.payload.orderId);
      await this.updateStepStatus(saga.id, 'SEND_CONFIRMATION', 'COMPLETED', notificationResult);
      
      // Complete saga
      await this.sagaRepository.update(saga.id, { status: 'COMPLETED' });
      
    } catch (error) {
      // Failure - need to compensate
      await this.sagaRepository.update(saga.id, { 
        status: 'FAILED',
        error: error.message
      });
      
      // Run compensating transactions in reverse order
      await this.compensate(saga);
    }
  }
  
  async compensate(saga) {
    // Undo completed steps in reverse order
    const completedSteps = saga.steps
      .filter(step => step.status === 'COMPLETED')
      .sort((a, b) => saga.steps.indexOf(b) - saga.steps.indexOf(a));
      
    for (const step of completedSteps) {
      switch (step.name) {
        case 'PROCESS_PAYMENT':
          await this.paymentService.refund(saga.payload.orderId);
          break;
        case 'RESERVE_INVENTORY':
          await this.inventoryService.releaseReservation(saga.payload.orderId);
          break;
        // Other compensating actions...
      }
    }
  }
}

Results: Six Months Later

Six months after beginning our journey, we had successfully migrated our monolith into seven microservices. The benefits were substantial:

Improved Scalability

  • Each service could scale independently based on demand
  • Black Friday traffic was handled smoothly with temporary scaling of just the Order and Payment services
  • 60% reduction in infrastructure costs through more efficient resource utilization

Accelerated Development

  • Teams could deploy independently without coordinating releases
  • Onboarding time for new developers decreased from weeks to days
  • Feature development velocity increased by approximately 35%

Enhanced Reliability

  • System availability improved from 99.9% to 99.99%
  • Failures were isolated to specific services instead of bringing down the entire system
  • Mean time to recovery (MTTR) decreased by 70%

Technical Metrics

  • Deployment frequency increased from twice weekly to multiple times daily
  • Lead time for changes decreased from days to hours
  • Mean time to recovery decreased from hours to minutes

Lessons Learned

Our journey taught us valuable lessons about microservice migrations:

  1. Start with boundaries, not services: Taking time to define proper domain boundaries using DDD avoided costly rework later.
  2. Contract testing is non-negotiable: Without rigorous contract testing, service dependencies would have created a distributed monolith.
  3. Data management is the hardest part: Shared data access and eventual consistency were our biggest challenges.
  4. Don’t rush the transition: The gradual Strangler Pattern approach minimized risk and allowed us to learn as we went.
  5. Monitoring is different for microservices: We had to invest in distributed tracing, centralized logging, and service-level alerting.

Conclusion

Breaking down a monolith is a challenging but rewarding journey. By following the three pillars—Domain-Driven Design, Contract Testing, and the Strangler Pattern—we successfully modernized our application architecture without disrupting our business.

Remember that microservices aren’t a goal in themselves but a means to address specific scaling and organizational challenges. Don’t break up your monolith because it’s trendy; do it because it solves real problems for your team and business.

Have you been through a similar migration? I’d love to hear about your experiences in the comments!


