# Circuit Breaker Pattern Implementation

## Overview
PadawanForge implements a robust circuit breaker pattern that detects failures in external services and prevents them from cascading through the system. It provides automatic failure detection, recovery mechanisms, and configurable retry logic with exponential backoff.
## Architecture

### Circuit Breaker States
The circuit breaker operates in three states:
- **Closed**: Normal operation - requests pass through
- **Open**: Failure threshold reached - requests are blocked
- **Half-Open**: Testing recovery - limited requests allowed
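These transitions can be sketched as a small state machine. This is a minimal illustration only, not the PadawanForge implementation; the `failureThreshold` and `recoveryTimeout` parameters mirror the configuration options documented below:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Minimal state machine illustrating the three-state transitions.
class BreakerStateMachine {
  state: BreakerState = 'closed';
  private failures = 0;

  constructor(
    private failureThreshold: number,
    private recoveryTimeout: number,
    private openedAt = 0,
  ) {}

  // Open circuits transition to half-open once the recovery timeout elapses.
  canRequest(now: number): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.recoveryTimeout) {
      this.state = 'half-open'; // probe the service with limited requests
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed'; // a successful probe closes the circuit
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // A failed probe in half-open, or too many failures while closed, opens it.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```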
### Key Features
- **Automatic Failure Detection**: Monitors service health and failure rates
- **Configurable Thresholds**: Adjustable failure and recovery parameters
- **Exponential Backoff**: Intelligent retry logic with jitter
- **Service Isolation**: Independent circuit breakers per service
- **Health Monitoring**: Real-time service health tracking
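The exponential backoff feature follows the usual formula `delay = min(maxRetryDelay, initialRetryDelay * retryMultiplier^attempt)`, optionally randomized with "full jitter" so that many clients do not retry in lockstep. A minimal sketch of the technique (an illustration, not the library's internal code):

```typescript
// Compute the retry delay for a given attempt (0-based), capped at maxDelay.
// With jitter enabled, a random delay in [0, capped] is drawn instead, which
// spreads retries out and avoids synchronized retry storms.
function retryDelay(
  attempt: number,
  initialDelay: number,
  multiplier: number,
  maxDelay: number,
  jitter = false,
): number {
  const capped = Math.min(maxDelay, initialDelay * Math.pow(multiplier, attempt));
  return jitter ? Math.random() * capped : capped;
}
```

With the defaults shown later (initial delay 1000 ms, multiplier 2, max 30000 ms), attempts 0 through 5 yield 1s, 2s, 4s, 8s, 16s, and 30s.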
## Implementation

### CircuitBreaker Class

```typescript
import { CircuitBreaker, CircuitBreakerManager } from '@/lib/utils/circuit-breaker';

// Create a circuit breaker for a specific service
const breaker = new CircuitBreaker('ai-service', {
  failureThreshold: 5,
  recoveryTimeout: 60000,   // 1 minute
  monitoringPeriod: 300000, // 5 minutes
  maxRetries: 3,
  initialRetryDelay: 1000,
  maxRetryDelay: 30000,
  retryMultiplier: 2
});
```
### Configuration Options

```typescript
interface CircuitBreakerOptions {
  failureThreshold: number;  // Number of failures before opening
  recoveryTimeout: number;   // Time to wait before half-open (ms)
  monitoringPeriod: number;  // Time window for failure counting (ms)
  maxRetries: number;        // Maximum retry attempts
  initialRetryDelay: number; // Initial delay between retries (ms)
  maxRetryDelay: number;     // Maximum delay between retries (ms)
  retryMultiplier: number;   // Multiplier for exponential backoff
}
```
### CircuitBreakerManager

Centralized management of multiple circuit breakers:

```typescript
// Get the global manager
const manager = CircuitBreakerManager.getInstance();

// Create or get an existing circuit breaker
const aiBreaker = manager.getOrCreate('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 30000
});

// Get all circuit breakers
const allBreakers = manager.getAllBreakers();

// Reset all circuit breakers
manager.resetAll();
```
## Usage Examples

### Basic Service Integration

```typescript
// Wrap external service calls
const result = await breaker.execute(async () => {
  const response = await fetch('https://api.external-service.com/data');
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json();
});
```
### With Custom Retry Options

```typescript
const result = await breaker.execute(
  async () => {
    return await externalService.call();
  },
  {
    maxRetries: 5,
    initialDelay: 500,
    maxDelay: 10000,
    multiplier: 1.5,
    jitter: true
  }
);
```
### Service Health Monitoring

```typescript
// Check circuit breaker health
const health = breaker.getHealth();

console.log('Service Health:', {
  status: health.status,
  failureCount: health.details.failureCount,
  successCount: health.details.successCount,
  lastFailureTime: health.details.lastFailureTime
});
```
### Multiple Service Management

```typescript
// Create circuit breakers for different services
const manager = CircuitBreakerManager.getInstance();

const databaseBreaker = manager.getOrCreate('database', {
  failureThreshold: 10,
  recoveryTimeout: 120000
});

const aiBreaker = manager.getOrCreate('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 60000
});

const externalApiBreaker = manager.getOrCreate('external-api', {
  failureThreshold: 5,
  recoveryTimeout: 90000
});

// Get health summary for all services
const healthSummary = manager.getHealthSummary();
```
## Advanced Patterns

### Service Wrapper Pattern

```typescript
class AIServiceWrapper {
  private breaker: CircuitBreaker;
  private aiService = new AIService(); // underlying AI client

  constructor() {
    this.breaker = new CircuitBreaker('ai-service', {
      failureThreshold: 3,
      recoveryTimeout: 60000
    });
  }

  async generateResponse(prompt: string): Promise<string> {
    return await this.breaker.execute(async () => {
      const response = await this.aiService.generate(prompt);
      return response.text;
    });
  }

  async testConnection(): Promise<boolean> {
    return await this.breaker.execute(async () => {
      const response = await this.aiService.ping();
      return response.status === 'ok';
    });
  }
}
```
### Fallback Pattern

```typescript
async function getDataWithFallback(id: string) {
  const breaker = new CircuitBreaker('primary-service');

  try {
    return await breaker.execute(async () => {
      return await primaryService.getData(id);
    });
  } catch (error) {
    if (error.name === 'CircuitBreakerOpenError') {
      // Use fallback service
      return await fallbackService.getData(id);
    }
    throw error;
  }
}
```
### Monitoring Integration

```typescript
// Integrate with the monitoring system
const breaker = new CircuitBreaker('external-api');

breaker.onStateChange((oldState, newState) => {
  monitoring.trackEvent('circuit_breaker_state_change', {
    service: 'external-api',
    oldState,
    newState,
    timestamp: new Date().toISOString()
  });
});

breaker.onFailure((error) => {
  monitoring.trackError(error, 'external_service');
});
```
## Configuration Best Practices

### Service-Specific Configuration

```typescript
// Database service - higher threshold due to connection pooling
const dbBreaker = new CircuitBreaker('database', {
  failureThreshold: 15,
  recoveryTimeout: 30000,
  maxRetries: 2
});

// AI service - lower threshold due to external dependency
const aiBreaker = new CircuitBreaker('ai-service', {
  failureThreshold: 3,
  recoveryTimeout: 60000,
  maxRetries: 3
});

// External API - moderate threshold
const apiBreaker = new CircuitBreaker('external-api', {
  failureThreshold: 5,
  recoveryTimeout: 90000,
  maxRetries: 2
});
```
### Environment-Based Configuration

```typescript
const getBreakerConfig = (serviceName: string) => {
  const baseConfig = {
    failureThreshold: 5,
    recoveryTimeout: 60000,
    maxRetries: 3
  };

  // Adjust for production
  if (process.env.NODE_ENV === 'production') {
    return {
      ...baseConfig,
      failureThreshold: 3,    // More sensitive in production
      recoveryTimeout: 120000 // Longer recovery time
    };
  }

  // Adjust for development
  if (process.env.NODE_ENV === 'development') {
    return {
      ...baseConfig,
      failureThreshold: 10,  // Less sensitive in development
      recoveryTimeout: 30000 // Faster recovery for testing
    };
  }

  return baseConfig;
};
```
## Error Handling

### Circuit Breaker Errors

```typescript
try {
  const result = await breaker.execute(async () => {
    return await externalService.call();
  });
} catch (error) {
  if (error.name === 'CircuitBreakerOpenError') {
    // Circuit breaker is open - service is down
    console.log('Service temporarily unavailable');
    return await useFallbackService();
  } else if (error.name === 'CircuitBreakerTimeoutError') {
    // Request timed out
    console.log('Service request timed out');
    return await useCachedData();
  } else {
    // Other errors
    throw error;
  }
}
```
### Graceful Degradation

```typescript
async function getDataWithGracefulDegradation(id: string) {
  const breaker = new CircuitBreaker('data-service');

  try {
    return await breaker.execute(async () => {
      return await dataService.getData(id);
    });
  } catch (error) {
    if (error.name === 'CircuitBreakerOpenError') {
      // Return cached data, falling back to defaults on a cache miss
      return (await cacheService.getData(id)) ?? getDefaultData(id);
    }
    throw error;
  }
}
```
## Monitoring and Alerting

### Health Dashboard

```typescript
// Get comprehensive health information
const manager = CircuitBreakerManager.getInstance();
const healthSummary = manager.getHealthSummary();

console.log('Circuit Breaker Health Summary:', {
  totalServices: healthSummary.total,
  healthyServices: healthSummary.healthy,
  degradedServices: healthSummary.degraded,
  unhealthyServices: healthSummary.unhealthy,
  services: healthSummary.services
});
```
### Alert Integration

```typescript
// Set up alerts for circuit breaker state changes
const manager = CircuitBreakerManager.getInstance();

manager.onServiceUnhealthy((serviceName, health) => {
  alertService.send({
    type: 'circuit_breaker_unhealthy',
    service: serviceName,
    failureCount: health.failureCount,
    lastFailureTime: health.lastFailureTime
  });
});

manager.onServiceRecovered((serviceName, health) => {
  alertService.send({
    type: 'circuit_breaker_recovered',
    service: serviceName,
    recoveryTime: health.recoveryTime
  });
});
```
## Testing

### Unit Testing

```typescript
describe('CircuitBreaker', () => {
  let breaker: CircuitBreaker;

  beforeEach(() => {
    breaker = new CircuitBreaker('test-service', {
      failureThreshold: 2,
      recoveryTimeout: 1000
    });
  });

  it('should open circuit after failure threshold', async () => {
    // Simulate failures
    for (let i = 0; i < 3; i++) {
      try {
        await breaker.execute(async () => {
          throw new Error('Service failure');
        });
      } catch (error) {
        // Expected
      }
    }

    // Circuit should be open
    expect(breaker.getState().status).toBe('open');
  });

  it('should allow requests in half-open state', async () => {
    // Open the circuit by exceeding the failure threshold
    for (let i = 0; i < 2; i++) {
      await breaker.execute(async () => {
        throw new Error('Service failure');
      }).catch(() => {});
    }

    // Wait for the recovery timeout so the breaker moves to half-open
    await new Promise((resolve) => setTimeout(resolve, 1100));

    // A single probe request should now be allowed through
    const result = await breaker.execute(async () => {
      return 'success';
    });
    expect(result).toBe('success');
  });
});
```
### Integration Testing

```typescript
describe('CircuitBreaker Integration', () => {
  it('should handle external service failures gracefully', async () => {
    const service = new AIServiceWrapper();

    // Mock the underlying AI client to always fail
    jest.spyOn(service['aiService'], 'generate')
      .mockRejectedValue(new Error('Service unavailable'));

    // Exhaust the failure threshold (3 for the AI service)
    for (let i = 0; i < 3; i++) {
      await service.generateResponse('test').catch(() => {});
    }

    // Once open, the breaker rejects calls without hitting the service
    await expect(service.generateResponse('test')).rejects.toThrow('CircuitBreakerOpenError');
  });
});
```
## Performance Considerations

### Memory Management

```typescript
// Clean up circuit breakers for unused services
const manager = CircuitBreakerManager.getInstance();

// Remove circuit breakers for inactive services
setInterval(() => {
  const activeServices = getActiveServices();
  const allBreakers = manager.getAllBreakers();

  for (const [serviceName] of allBreakers) {
    if (!activeServices.includes(serviceName)) {
      manager.removeBreaker(serviceName);
    }
  }
}, 300000); // Every 5 minutes
```
### Configuration Optimization

```typescript
// Optimize configuration based on service characteristics
const getOptimizedConfig = (serviceType: 'database' | 'api' | 'ai') => {
  switch (serviceType) {
    case 'database':
      return {
        failureThreshold: 10,
        recoveryTimeout: 30000,
        maxRetries: 1 // Database connections are expensive
      };
    case 'api':
      return {
        failureThreshold: 5,
        recoveryTimeout: 60000,
        maxRetries: 2
      };
    case 'ai':
      return {
        failureThreshold: 3,
        recoveryTimeout: 120000,
        maxRetries: 3 // AI services can be slow
      };
  }
};
```
This circuit breaker implementation provides robust failure handling and ensures system resilience in the face of external service failures.