Circuit Breaker Pattern Implementation

Overview

PadawanForge implements the circuit breaker pattern to contain failures in external services and prevent them from cascading through the system. The implementation provides automatic failure detection, automatic recovery, and configurable retries with exponential backoff.

Architecture

Circuit Breaker States

The circuit breaker operates in three states:

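  1. Closed: Normal operation - requests pass through and failures are counted
  2. Open: Failure threshold reached - requests are rejected immediately without calling the service
  3. Half-Open: Testing recovery after the recovery timeout - limited requests allowed; a success closes the circuit, a failure reopens it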
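
The transitions between the three states can be sketched as a small state machine. This is an illustrative sketch only, not PadawanForge's actual implementation: `MiniBreaker`, `canRequest`, and `record` are hypothetical names, and for brevity it counts failures since the last success instead of failures within a monitoring window.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical minimal breaker. The current time is passed in explicitly
// so the transitions are deterministic and easy to test.
class MiniBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly recoveryTimeout: number; // ms

  constructor(failureThreshold: number, recoveryTimeout: number) {
    this.failureThreshold = failureThreshold;
    this.recoveryTimeout = recoveryTimeout;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true if a request may pass at time `now` (ms).
  canRequest(now: number): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.recoveryTimeout) {
      this.state = 'half-open'; // recovery timeout elapsed: allow a trial request
    }
    return this.state !== 'open';
  }

  // Records the outcome of a request that was allowed through.
  record(success: boolean, now: number): void {
    if (success) {
      this.failures = 0;
      this.state = 'closed'; // any success, including a half-open trial, closes the circuit
      return;
    }
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'; // failed trial or threshold reached: block requests
      this.openedAt = now;
    }
  }
}
```

Passing `now` as an argument, instead of calling `Date.now()` internally, is a design choice of this sketch: it lets a test step through Closed, Open, and Half-Open without waiting for real timeouts.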

Key Features

  • Automatic Failure Detection: Monitors service health and failure rates
  • Configurable Thresholds: Adjustable failure and recovery parameters
  • Exponential Backoff: Retries with increasing, capped delays and optional jitter
  • Service Isolation: Independent circuit breakers per service
  • Health Monitoring: Real-time service health tracking

Implementation

CircuitBreaker Class

import { CircuitBreaker, CircuitBreakerManager } from '@/lib/utils/circuit-breaker';

// Create a circuit breaker for a specific service
const breaker = new CircuitBreaker('ai-service', {
  failureThreshold: 5,
  recoveryTimeout: 60000, // 1 minute
  monitoringPeriod: 300000, // 5 minutes
  maxRetries: 3,
  initialRetryDelay: 1000,
  maxRetryDelay: 30000,
  retryMultiplier: 2
});

Configuration Options

interface CircuitBreakerOptions {
  failureThreshold: number;    // Number of failures before opening
  recoveryTimeout: number;     // Time to wait before half-open (ms)
  monitoringPeriod: number;    // Time window for failure counting (ms)
  maxRetries: number;          // Maximum retry attempts
  initialRetryDelay: number;   // Initial delay between retries (ms)
  maxRetryDelay: number;       // Maximum delay between retries (ms)
  retryMultiplier: number;     // Multiplier for exponential backoff
}
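
Several examples in this document pass only some of these options, which suggests that omitted fields fall back to defaults. A minimal sketch of such a merge follows. `resolveOptions` and `DEFAULT_OPTIONS` are hypothetical names, and the default values are copied from the first example above; the library's real defaults may differ.

```typescript
interface CircuitBreakerOptions {
  failureThreshold: number;
  recoveryTimeout: number;   // ms
  monitoringPeriod: number;  // ms
  maxRetries: number;
  initialRetryDelay: number; // ms
  maxRetryDelay: number;     // ms
  retryMultiplier: number;
}

// Assumed defaults for illustration only.
const DEFAULT_OPTIONS: CircuitBreakerOptions = {
  failureThreshold: 5,
  recoveryTimeout: 60000,
  monitoringPeriod: 300000,
  maxRetries: 3,
  initialRetryDelay: 1000,
  maxRetryDelay: 30000,
  retryMultiplier: 2
};

// Merges caller overrides over the defaults and rejects inconsistent values.
function resolveOptions(
  overrides: Partial<CircuitBreakerOptions> = {}
): CircuitBreakerOptions {
  const merged: CircuitBreakerOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (merged.failureThreshold < 1) {
    throw new RangeError('failureThreshold must be at least 1');
  }
  if (merged.initialRetryDelay > merged.maxRetryDelay) {
    throw new RangeError('initialRetryDelay must not exceed maxRetryDelay');
  }
  return merged;
}
```

Validating at construction time surfaces a misconfigured breaker immediately, not during the first outage.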

CircuitBreakerManager

Centralized management of multiple circuit breakers:

// Get the global manager
const manager = CircuitBreakerManager.getInstance();

// Create or get existing circuit breaker
const aiBreaker = manager.getOrCreate('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 30000
});

// Get all circuit breakers
const allBreakers = manager.getAllBreakers();

// Reset all circuit breakers
manager.resetAll();
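
The name `getOrCreate` suggests registry semantics: the first call for a service name creates the breaker, and later calls return the same instance. The sketch below illustrates that behaviour under this assumption; `BreakerRegistry` and `NamedBreaker` are hypothetical stand-ins, not the real `CircuitBreakerManager`.

```typescript
// Hypothetical stand-in for a breaker; only the name and options matter here.
interface NamedBreaker {
  name: string;
  options: Record<string, number>;
}

class BreakerRegistry {
  private static instance: BreakerRegistry | undefined;
  private readonly breakers = new Map<string, NamedBreaker>();

  private constructor() {}

  // Lazily creates the single shared registry.
  static getInstance(): BreakerRegistry {
    if (!BreakerRegistry.instance) {
      BreakerRegistry.instance = new BreakerRegistry();
    }
    return BreakerRegistry.instance;
  }

  // Returns the existing breaker for `name`; options are used only on first creation.
  getOrCreate(name: string, options: Record<string, number> = {}): NamedBreaker {
    let breaker = this.breakers.get(name);
    if (!breaker) {
      breaker = { name, options };
      this.breakers.set(name, breaker);
    }
    return breaker;
  }

  count(): number {
    return this.breakers.size;
  }
}
```

If the real manager behaves this way, options passed to a second `getOrCreate` call for the same name are ignored. That matters wherever the same service is configured in more than one place: this document, for example, creates 'ai-service' with a `recoveryTimeout` of 30000 in one example and 60000 in another.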

Usage Examples

Basic Service Integration

// Wrap external service calls
const result = await breaker.execute(async () => {
  const response = await fetch('https://api.external-service.com/data');
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json();
});

With Custom Retry Options

const result = await breaker.execute(
  async () => {
    return await externalService.call();
  },
  {
    maxRetries: 5,
    initialDelay: 500,
    maxDelay: 10000,
    multiplier: 1.5,
    jitter: true
  }
);
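
To make the effect of these options concrete, the following sketch computes the delay before each retry. It assumes the usual formula, `initialDelay * multiplier^attempt` capped at `maxDelay`, and "full jitter" (a random value between 0 and the capped delay). The function names are hypothetical and the library's actual jitter strategy may differ.

```typescript
interface RetryOptions {
  maxRetries: number;
  initialDelay: number; // ms
  maxDelay: number;     // ms
  multiplier: number;
  jitter: boolean;
}

// Delay before retry number `attempt` (0-based).
// `random` is injectable so the jittered result can be tested deterministically.
function retryDelay(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random
): number {
  const uncapped = opts.initialDelay * Math.pow(opts.multiplier, attempt);
  const capped = Math.min(uncapped, opts.maxDelay);
  return opts.jitter ? random() * capped : capped;
}

// Full list of delays for one call, one entry per retry.
function retrySchedule(
  opts: RetryOptions,
  random: () => number = Math.random
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < opts.maxRetries; attempt++) {
    delays.push(retryDelay(attempt, opts, random));
  }
  return delays;
}
```

With the options shown above and jitter disabled, this formula gives delays of 500, 750, 1125, 1687.5, and 2531.25 ms, so the 10000 ms cap is never reached within five retries. Jitter spreads retries from many clients over time so they do not hit a recovering service in lockstep.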

Service Health Monitoring

// Check circuit breaker health
const health = breaker.getHealth();
console.log('Service Health:', {
  status: health.status,
  failureCount: health.details.failureCount,
  successCount: health.details.successCount,
  lastFailureTime: health.details.lastFailureTime
});

Multiple Service Management

// Create circuit breakers for different services
const manager = CircuitBreakerManager.getInstance();

const databaseBreaker = manager.getOrCreate('database', {
  failureThreshold: 10,
  recoveryTimeout: 120000
});

const aiBreaker = manager.getOrCreate('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 60000
});

const externalApiBreaker = manager.getOrCreate('external-api', {
  failureThreshold: 5,
  recoveryTimeout: 90000
});

// Get health summary for all services
const healthSummary = manager.getHealthSummary();

Advanced Patterns

Service Wrapper Pattern

class AIServiceWrapper {
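  private breaker: CircuitBreaker;
  // Underlying AI client. AIService and aiClient are placeholders for the
  // project's own client type and instance.
  private aiService: AIService = aiClient;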

  constructor() {
    this.breaker = new CircuitBreaker('ai-service', {
      failureThreshold: 3,
      recoveryTimeout: 60000
    });
  }

  async generateResponse(prompt: string): Promise<string> {
    return await this.breaker.execute(async () => {
      const response = await this.aiService.generate(prompt);
      return response.text;
    });
  }

  async testConnection(): Promise<boolean> {
    return await this.breaker.execute(async () => {
      const response = await this.aiService.ping();
      return response.status === 'ok';
    });
  }
}

Fallback Pattern

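// Create the breaker once and reuse it. Assuming each CircuitBreaker instance
// tracks its own state, a breaker created inside the function would start
// fresh on every call and could never open.
const primaryBreaker = new CircuitBreaker('primary-service');

async function getDataWithFallback(id: string) {
  try {
    return await primaryBreaker.execute(async () => {
      return await primaryService.getData(id);
    });
  } catch (error) {
    // Caught values are typed `unknown` under strict settings, so narrow first
    if (error instanceof Error && error.name === 'CircuitBreakerOpenError') {
      // Use fallback service
      return await fallbackService.getData(id);
    }
    throw error;
  }
}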
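
The same pattern can be factored into a reusable helper. The sketch below is an illustration only: `withFallback` is a hypothetical name, and it relies solely on the error's `name` property, as the example above does.

```typescript
// Runs `primary`; if it fails with an error whose name is listed in
// `fallbackOn`, runs `fallback` instead. All other errors propagate.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  fallbackOn: string[] = ['CircuitBreakerOpenError']
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    if (error instanceof Error && fallbackOn.indexOf(error.name) !== -1) {
      return fallback();
    }
    throw error;
  }
}
```

A call site would then read `withFallback(() => primaryBreaker.execute(loadPrimary), loadFallback)`, where `loadPrimary` and `loadFallback` stand for the two data sources.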

Monitoring Integration

// Integrate with monitoring system
const breaker = new CircuitBreaker('external-api');

breaker.onStateChange((oldState, newState) => {
  monitoring.trackEvent('circuit_breaker_state_change', {
    service: 'external-api',
    oldState,
    newState,
    timestamp: new Date().toISOString()
  });
});

breaker.onFailure((error) => {
  // requestId is assumed to be available from the surrounding request context
  monitoring.trackError(requestId, error, 'external_service');
});

Configuration Best Practices

Service-Specific Configuration

// Database service - higher threshold due to connection pooling
const dbBreaker = new CircuitBreaker('database', {
  failureThreshold: 15,
  recoveryTimeout: 30000,
  maxRetries: 2
});

// AI service - lower threshold due to external dependency
const aiBreaker = new CircuitBreaker('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 60000,
  maxRetries: 3
});

// External API - moderate threshold
const apiBreaker = new CircuitBreaker('external-api', {
  failureThreshold: 5,
  recoveryTimeout: 90000,
  maxRetries: 2
});

Environment-Based Configuration

const getBreakerConfig = (serviceName: string) => {
  const baseConfig = {
    failureThreshold: 5,
    recoveryTimeout: 60000,
    maxRetries: 3
  };

  // Adjust for production
  if (process.env.NODE_ENV === 'production') {
    return {
      ...baseConfig,
      failureThreshold: 3, // More sensitive in production
      recoveryTimeout: 120000 // Longer recovery time
    };
  }

  // Adjust for development
  if (process.env.NODE_ENV === 'development') {
    return {
      ...baseConfig,
      failureThreshold: 10, // Less sensitive in development
      recoveryTimeout: 30000 // Faster recovery for testing
    };
  }

  return baseConfig;
};

Error Handling

Circuit Breaker Errors

async function callWithErrorHandling() {
  try {
    return await breaker.execute(async () => {
      return await externalService.call();
    });
  } catch (error) {
    // Caught values are typed `unknown` under strict settings, so narrow first
    const name = error instanceof Error ? error.name : undefined;
    if (name === 'CircuitBreakerOpenError') {
      // Circuit breaker is open - the service is treated as unavailable
      console.log('Service temporarily unavailable');
      return await useFallbackService();
    } else if (name === 'CircuitBreakerTimeoutError') {
      // Request timed out
      console.log('Service request timed out');
      return await useCachedData();
    } else {
      // Other errors
      throw error;
    }
  }
}
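
Checks on `error.name` only work if the error classes set `name` explicitly, because a subclass of `Error` otherwise reports the name "Error". The sketch below shows how such errors are commonly defined and classified. The class bodies and the `recoveryAction` helper are assumptions for illustration, not the library's actual definitions.

```typescript
// Hypothetical stand-ins for the breaker's error types.
class CircuitBreakerOpenError extends Error {
  readonly service: string;

  constructor(service: string) {
    super('Circuit breaker for "' + service + '" is open');
    this.name = 'CircuitBreakerOpenError'; // without this, name would be "Error"
    this.service = service;
  }
}

class CircuitBreakerTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super('Request timed out after ' + timeoutMs + ' ms');
    this.name = 'CircuitBreakerTimeoutError';
  }
}

// Type guard: narrows an unknown caught value to an Error with the given name.
function isNamedError(error: unknown, name: string): error is Error {
  return error instanceof Error && error.name === name;
}

// Maps a caught value to the recovery strategy used in the example above.
function recoveryAction(error: unknown): 'fallback' | 'cached' | 'rethrow' {
  if (isNamedError(error, 'CircuitBreakerOpenError')) return 'fallback';
  if (isNamedError(error, 'CircuitBreakerTimeoutError')) return 'cached';
  return 'rethrow';
}
```

Matching on `name` is a little more forgiving than `instanceof` against the subclass: it still works when the class was loaded from a second copy of the module or compiled for a target where subclassing `Error` loses the prototype chain.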

Graceful Degradation

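// Create the breaker once; assuming each instance tracks its own state, a
// breaker constructed inside the function would lose its failure history on
// every call and never open.
const dataBreaker = new CircuitBreaker('data-service');

async function getDataWithGracefulDegradation(id: string) {
  try {
    return await dataBreaker.execute(async () => {
      return await dataService.getData(id);
    });
  } catch (error) {
    if (error instanceof Error && error.name === 'CircuitBreakerOpenError') {
      // Return cached data or default values
      return (await cacheService.getData(id)) || getDefaultData(id);
    }
    throw error;
  }
}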

Monitoring and Alerting

Health Dashboard

// Get comprehensive health information
const manager = CircuitBreakerManager.getInstance();
const healthSummary = manager.getHealthSummary();

console.log('Circuit Breaker Health Summary:', {
  totalServices: healthSummary.total,
  healthyServices: healthSummary.healthy,
  degradedServices: healthSummary.degraded,
  unhealthyServices: healthSummary.unhealthy,
  services: healthSummary.services
});
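
The counts in such a summary can be derived from the individual breaker states. The sketch below shows one way to do that. The mapping from breaker state to health status (closed to healthy, half-open to degraded, open to unhealthy) is an assumption, as are the `summarize` function and the shape of `services`.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ServiceHealth {
  name: string;
  status: HealthStatus;
}

interface HealthSummary {
  total: number;
  healthy: number;
  degraded: number;
  unhealthy: number;
  services: ServiceHealth[];
}

// Assumed mapping from breaker state to reported health.
const STATUS_BY_STATE: Record<BreakerState, HealthStatus> = {
  'closed': 'healthy',
  'half-open': 'degraded',
  'open': 'unhealthy'
};

// Counts services per health status and lists each service's status.
function summarize(breakers: { name: string; state: BreakerState }[]): HealthSummary {
  const summary: HealthSummary = {
    total: breakers.length,
    healthy: 0,
    degraded: 0,
    unhealthy: 0,
    services: []
  };
  for (const { name, state } of breakers) {
    const status = STATUS_BY_STATE[state];
    summary[status] += 1;
    summary.services.push({ name, status });
  }
  return summary;
}
```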

Alert Integration

// Set up alerts for circuit breaker state changes
const manager = CircuitBreakerManager.getInstance();

manager.onServiceUnhealthy((serviceName, health) => {
  alertService.send({
    type: 'circuit_breaker_unhealthy',
    service: serviceName,
    failureCount: health.details.failureCount,
    lastFailureTime: health.details.lastFailureTime
  });
});

manager.onServiceRecovered((serviceName, health) => {
  alertService.send({
    type: 'circuit_breaker_recovered',
    service: serviceName,
    recoveryTime: health.recoveryTime
  });
});

Testing

Unit Testing

describe('CircuitBreaker', () => {
  let breaker: CircuitBreaker;

  beforeEach(() => {
    breaker = new CircuitBreaker('test-service', {
      failureThreshold: 2,
      recoveryTimeout: 1000
    });
  });

  it('should open circuit after failure threshold', async () => {
    // Simulate failures
    for (let i = 0; i < 3; i++) {
      try {
        await breaker.execute(async () => {
          throw new Error('Service failure');
        });
      } catch (error) {
        // Expected
      }
    }

    // Circuit should be open
    expect(breaker.getState().status).toBe('open');
  });

  it('should allow requests in half-open state', async () => {
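    // Force the circuit open with an already-expired retry time.
    // This assumes getState() returns the breaker's live state object, not a copy.
    breaker.getState().status = 'open';
    breaker.getState().nextRetryTime = Date.now() - 1000;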

    // Should allow one request in half-open state
    const result = await breaker.execute(async () => {
      return 'success';
    });

    expect(result).toBe('success');
  });
});

Integration Testing

describe('CircuitBreaker Integration', () => {
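  it('should handle external service failures gracefully', async () => {
    const service = new AIServiceWrapper();

    // Mock the underlying client's generate() method to fail.
    // jest.spyOn needs a method, so the cast reaches the wrapper's internal client.
    jest.spyOn((service as any).aiService, 'generate')
      .mockRejectedValue(new Error('Service unavailable'));

    // Drive the breaker past its failure threshold (3 for AIServiceWrapper)
    for (let i = 0; i < 3; i++) {
      await expect(service.generateResponse('test')).rejects.toThrow();
    }

    // With the circuit open, further calls are rejected by the breaker itself
    await expect(service.generateResponse('test'))
      .rejects.toHaveProperty('name', 'CircuitBreakerOpenError');
  });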
});

Performance Considerations

Memory Management

// Clean up circuit breakers for unused services
const manager = CircuitBreakerManager.getInstance();

// Remove circuit breakers for inactive services
setInterval(() => {
  const activeServices = getActiveServices();
  const allBreakers = manager.getAllBreakers();
  
  for (const [serviceName] of allBreakers) {
    if (!activeServices.includes(serviceName)) {
      manager.removeBreaker(serviceName);
    }
  }
}, 300000); // Every 5 minutes

Configuration Optimization

// Optimize configuration based on service characteristics
const getOptimizedConfig = (serviceType: 'database' | 'api' | 'ai') => {
  switch (serviceType) {
    case 'database':
      return {
        failureThreshold: 10,
        recoveryTimeout: 30000,
        maxRetries: 1 // Database connections are expensive
      };
    case 'api':
      return {
        failureThreshold: 5,
        recoveryTimeout: 60000,
        maxRetries: 2
      };
    case 'ai':
      return {
        failureThreshold: 3,
        recoveryTimeout: 120000,
        maxRetries: 3 // AI services can be slow
      };
  }
};

With per-service breakers, retries with backoff, and fallbacks in place, failures in external services are contained and do not cascade through the rest of PadawanForge.

PadawanForge v1.4.1