Scaling Architecture

How to scale Bellamy Book for high traffic.

Scaling Strategies

Horizontal Scaling

Add more instances:

  • API — stateless; scale behind a load balancer
  • Frontend / Admin — static assets; scale or use CDN
  • Workers — scale per type: GraphWorker, ScoringWorker, TrendingWorker, HashtagWorker, ElasticsearchSyncWorker, BlogAutoGenerationWorker, MediaProcessingWorker, WebSocketWorker, WebPushNotificationWorker, ChatWorker (each consumes Kafka or runs as a hosted service)

Vertical Scaling

Increase the resources available to each instance:

  • CPU
  • Memory
  • Storage

Load Balancing

API Load Balancing

         ┌─────────────┐
         │    Load     │
         │  Balancer   │
         └──────┬──────┘
                │
    ┌───────────┼───────────┐
    │           │           │
┌───▼───┐   ┌───▼───┐   ┌───▼───┐
│ API 1 │   │ API 2 │   │ API 3 │
└───────┘   └───────┘   └───────┘

Load Balancing Algorithms

  • Round Robin
  • Least Connections
  • IP Hash
  • Weighted Round Robin
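
The four algorithms above can be sketched in a few lines each. This is an illustrative sketch only, not Bellamy Book's load balancer configuration: the class and function names and the backend labels (`api-1`, `api-2`, ...) are hypothetical, and the weighted variant uses the simplest possible approach (repeating a backend in the rotation) rather than a smooth weighted schedule.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes.
        self.active[backend] -= 1


def ip_hash(backends, client_ip):
    """Map a client IP to the same backend on every request."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def weighted_round_robin(weights):
    """Build a rotation in which each backend appears `weight` times."""
    return RoundRobin([b for b, w in weights.items() for _ in range(w)])
```

Because the API is stateless, any of these should work. IP Hash only keeps a client on the same instance while the backend list stays unchanged.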

Database Scaling

Read Replicas

Master (Write) ──┬──→ Replica 1 (Read)
                 ├──→ Replica 2 (Read)
                 └──→ Replica 3 (Read)
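
One way the application could split traffic in this topology is to send writes to the master and rotate reads across the replicas. The sketch below assumes a hypothetical `ReplicaRouter` class and placeholder connection names; it classifies statements by their first keyword, which is a simplification. The `force_primary` flag is also an assumption, added because replicas can lag behind the master and some reads need to see a write that has just been made.

```python
from itertools import cycle


class ReplicaRouter:
    """Route writes to the master and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(list(replicas)) if replicas else None

    def route(self, sql, force_primary=False):
        words = sql.split()
        verb = words[0].upper() if words else ""
        is_read = verb == "SELECT"
        if is_read and not force_primary and self._replicas is not None:
            return next(self._replicas)
        # Writes, forced reads, and anything unrecognised go to the master.
        return self.primary
```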

Sharding

Partition data across multiple databases:

  • User sharding
  • Post sharding
  • Geographic sharding
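
As a rough illustration of user sharding and geographic sharding, the sketch below picks a shard either by hashing a key or by looking up a region. The function names, the shard count, and the region map are hypothetical and are not taken from Bellamy Book's schema.

```python
import hashlib


def shard_for_key(key, shard_count):
    """Hash-based sharding: the same key always maps to the same shard."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_for_region(region, region_to_shard, default_shard=0):
    """Geographic sharding: look the shard up by region code."""
    return region_to_shard.get(region, default_shard)
```

Plain modulo hashing remaps most keys when `shard_count` changes, so a production setup would likely need a migration plan or a consistent-hashing scheme.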

Caching Strategy

Multi-Level Caching

1. Browser Cache

2. CDN Cache

3. Application Cache (Redis)

4. Database
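
The lookup order above can be sketched as a chain of tiers that is checked front to back, with the database consulted only when every tier misses. In practice the browser and CDN levels are usually driven by HTTP caching headers rather than application code, so this sketch covers only the application-side part. The function `tiered_get` is hypothetical, and plain dictionaries stand in for real caches such as Redis.

```python
def tiered_get(key, tiers, load_from_source):
    """Check each cache tier in order; on a full miss, load and backfill."""
    for index, tier in enumerate(tiers):
        if key in tier:
            value = tier[key]
            # Copy the value into the faster tiers that missed.
            for faster in tiers[:index]:
                faster[key] = value
            return value
    # Every tier missed: fall through to the source of truth.
    value = load_from_source(key)
    for tier in tiers:
        tier[key] = value
    return value
```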

Cache Invalidation

  • Time-based expiration
  • Event-based invalidation
  • Manual invalidation
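
The three invalidation styles map onto three small operations on a cache. A minimal sketch, assuming a hypothetical in-process `TTLCache` (a Redis-backed cache would use key expiry and delete commands for the same purpose): entries expire after a fixed time, `invalidate` is what an event handler would call when the underlying data changes, and `clear` covers manual invalidation.

```python
import time


class TTLCache:
    """In-memory cache with time-based expiry and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Time-based expiration: drop the stale entry on read.
            del self._entries[key]
            return default
        return value

    def invalidate(self, key):
        """Event-based invalidation: call when the source data changes."""
        self._entries.pop(key, None)

    def clear(self):
        """Manual invalidation: drop everything."""
        self._entries.clear()
```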

Worker Scaling

Auto-Scaling Workers

Based on queue length:

  • High queue → Scale up
  • Low queue → Scale down
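
A queue-length rule like this can be reduced to a small calculation: divide the backlog by what one worker is expected to handle, then clamp the result. The function name, parameters, and default bounds below are assumptions for illustration, not Bellamy Book settings.

```python
import math


def desired_workers(queue_length, per_worker_capacity,
                    min_workers=1, max_workers=10):
    """Return a worker count sized to the backlog, within fixed bounds."""
    needed = math.ceil(queue_length / per_worker_capacity)
    return max(min_workers, min(max_workers, needed))
```

For workers that consume Kafka in a consumer group, consumers beyond the topic's partition count sit idle, so `max_workers` is best capped at the number of partitions.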

Worker Types

  • Stateless workers (easy to scale)
  • Stateful workers (require coordination)

Performance Optimization

Database Optimization

  • Indexes
  • Query optimization
  • Connection pooling

Application Optimization

  • Code optimization
  • Caching
  • Async processing

Monitoring Scaling

Metrics to Monitor

  • Request rate
  • Response time
  • Error rate
  • Resource usage
  • Queue length

Auto-Scaling Rules

minReplicas: 2
maxReplicas: 10
targetCPUUtilization: 70
targetMemoryUtilization: 80
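
To show how rules like these turn into a replica count, the sketch below assumes they are evaluated the way the Kubernetes Horizontal Pod Autoscaler does: each metric proposes `ceil(current_replicas * current_value / target_value)`, the largest proposal wins, and the result is clamped to the min/max bounds. Whether Bellamy Book uses that autoscaler is an assumption, the function name is hypothetical, and the real autoscaler also applies a tolerance band that is omitted here.

```python
import math


def desired_replicas(current_replicas, cpu_percent, memory_percent,
                     target_cpu=70, target_memory=80,
                     min_replicas=2, max_replicas=10):
    """Propose a replica count from CPU and memory utilization."""
    by_cpu = math.ceil(current_replicas * cpu_percent / target_cpu)
    by_memory = math.ceil(current_replicas * memory_percent / target_memory)
    # The most demanding metric decides, within the configured bounds.
    proposal = max(by_cpu, by_memory)
    return max(min_replicas, min(max_replicas, proposal))
```

For example, four replicas running at 90% CPU and 40% memory give proposals of 6 and 2, so the result is 6.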

Next Steps