Scaling Architecture
How to scale Bellamy Book for high traffic.
Scaling Strategies
Horizontal Scaling
Add more instances:
- API — stateless; scale behind a load balancer
- Frontend / Admin — static assets; scale or use CDN
- Workers — scale per type: GraphWorker, ScoringWorker, TrendingWorker, HashtagWorker, ElasticsearchSyncWorker, BlogAutoGenerationWorker, MediaProcessingWorker, WebSocketWorker, WebPushNotificationWorker, ChatWorker (each consumes Kafka or runs as a hosted service)
Vertical Scaling
Increase resources:
- CPU
- Memory
- Storage
Load Balancing
API Load Balancing
             ┌─────────────┐
             │    Load     │
             │  Balancer   │
             └──────┬──────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
    ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
    │ API 1 │   │ API 2 │   │ API 3 │
    └───────┘   └───────┘   └───────┘
Load Balancing Algorithms
- Round Robin
- Least Connections
- IP Hash
- Weighted Round Robin
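To make the differences between these algorithms concrete, here is a minimal Python sketch of each selection rule. The class names, the `pick`/`release` methods, and the backend labels such as `api-1` are hypothetical and chosen for illustration only. A production load balancer implements these rules itself, so this is a model of the behaviour rather than something Bellamy Book is known to run.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # caller must release() when the request ends
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class IPHash:
    """Map a client IP to a stable backend so repeat requests land together."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        return self.backends[int.from_bytes(digest[:8], "big") % len(self.backends)]


def weighted_round_robin(weights):
    """Naive weighting: repeat each backend `weight` times in the rotation."""
    expanded = [b for b, weight in weights.items() for _ in range(weight)]
    return RoundRobin(expanded)
```

Round robin and its weighted variant ignore current load. Least connections adapts to slow requests. IP hash keeps a client on one instance, which matters less when the API is stateless.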
Database Scaling
Read Replicas
Master (Write) ──┐
                 │
                 ├──→ Replica 1 (Read)
                 ├──→ Replica 2 (Read)
                 └──→ Replica 3 (Read)
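The read/write split in the diagram can be sketched as a small router that sends `SELECT` statements to replicas in rotation and everything else to the master. `ReplicaRouter`, `target_for`, and the connection labels are hypothetical names. The statement classification is deliberately simplistic, and the real data-access layer may do this differently.

```python
import itertools


class ReplicaRouter:
    """Route reads to replicas (round robin) and writes to the master."""

    def __init__(self, master, replicas):
        self.master = master
        replicas = list(replicas)
        # With no replicas configured, every statement goes to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReplicaRouter("master", ["replica-1", "replica-2", "replica-3"])
read_target = router.target_for("SELECT * FROM posts WHERE id = 1")
write_target = router.target_for("INSERT INTO posts (title) VALUES ('hi')")
```

Because replicas typically lag the master slightly, a read that must see a write made moments earlier may need to go to the master instead.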
Sharding
Partition data across multiple databases:
- User sharding
- Post sharding
- Geographic sharding
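As an illustration of user sharding, the sketch below maps a user ID to a shard with a stable hash. The function name `shard_for`, the key format, and the shard count of 4 are assumptions for the example, not values taken from the Bellamy Book configuration.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Return a stable shard index in [0, shard_count) for the given key."""
    # A cryptographic hash is used only for its even, process-independent
    # distribution; Python's built-in hash() is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


user_shard = shard_for("user:12345", shard_count=4)
```

Plain modulo hashing moves most keys when `shard_count` changes, so a scheme that needs to add shards later would likely use consistent hashing or a lookup table instead.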
Caching Strategy
Multi-Level Caching
1. Browser Cache
↓
2. CDN Cache
↓
3. Application Cache (Redis)
↓
4. Database
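The step from level 3 to level 4 is often implemented with the cache-aside pattern: check the application cache first and fall back to the database only on a miss. The sketch below assumes that pattern and uses a plain dictionary as a stand-in for Redis. `CacheAside` and `load_post` are hypothetical names.

```python
class CacheAside:
    """Read through an application cache, loading from the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # stand-in for Redis: any dict-like store
        self.load_from_db = load_from_db  # callable: key -> value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: the database is not touched
        value = self.load_from_db(key)    # cache miss: fall through to the database
        self.cache[key] = value
        return value


db_calls = []


def load_post(key):
    """Hypothetical database loader that records each call."""
    db_calls.append(key)
    return {"id": key, "title": "Hello"}


posts = CacheAside(cache={}, load_from_db=load_post)
first = posts.get("post:1")   # miss: loads from the database
second = posts.get("post:1")  # hit: served from the cache
```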
Cache Invalidation
- Time-based expiration
- Event-based invalidation
- Manual invalidation
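The first two strategies can be combined in one small cache: entries expire after a fixed time, and an event handler removes an entry as soon as the underlying data changes. This is a minimal in-memory sketch. `TTLCache`, `on_post_updated`, and the `post:<id>` key format are hypothetical, and with Redis the expiry would normally be delegated to the server's own key TTL.

```python
import time


class TTLCache:
    """In-memory cache with time-based expiry and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]   # time-based expiration
            return None
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)  # manual invalidation; no error if absent


def on_post_updated(cache, post_id):
    """Event-based invalidation: drop the cached post when it changes."""
    cache.invalidate(f"post:{post_id}")
```

Calling `invalidate` directly from an admin tool or script corresponds to the manual case.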
Worker Scaling
Auto-Scaling Workers
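Scale each worker type on its queue length (for Kafka consumers, the consumer lag):
- Queue length above the scale-up threshold → Scale up
- Queue length below the scale-down threshold → Scale down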
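One simple way to turn queue length into a worker count is to divide the backlog by what a single worker can handle and clamp the result. The function name, the capacity figure, and the bounds below are illustrative assumptions rather than Bellamy Book settings.

```python
import math


def desired_workers(queue_length, per_worker_capacity, min_workers=1, max_workers=10):
    """Return how many workers are needed to cover the current backlog."""
    needed = math.ceil(queue_length / per_worker_capacity)
    # Clamp so the pool never drops below the floor or exceeds the ceiling.
    return max(min_workers, min(max_workers, needed))


idle = desired_workers(0, per_worker_capacity=100)
busy = desired_workers(450, per_worker_capacity=100)
flooded = desired_workers(5000, per_worker_capacity=100)
```

For workers that consume Kafka, adding instances beyond the number of partitions on a topic does not increase throughput within one consumer group, so `max_workers` would usually be capped accordingly.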
Worker Types
- Stateless workers (easy to scale)
- Stateful workers (require coordination)
Performance Optimization
Database Optimization
- Indexes
- Query optimization
- Connection pooling
Application Optimization
- Code optimization
- Caching
- Async processing
Monitoring Scaling
Metrics to Monitor
- Request rate
- Response time
- Error rate
- Resource usage
- Queue length
Auto-Scaling Rules
minReplicas: 2
maxReplicas: 10
targetCPUUtilization: 70
targetMemoryUtilization: 80
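To show how rules like these translate into a replica count, the sketch below assumes they are evaluated the way the Kubernetes Horizontal Pod Autoscaler documents its calculation: scale the current replica count by the ratio of observed to target utilization, take the larger result across metrics, and clamp to the configured bounds. Whether Bellamy Book uses that autoscaler is an assumption. The function and parameter names are hypothetical, and the defaults mirror the values above.

```python
import math


def desired_replicas(current_replicas, cpu_percent, memory_percent,
                     min_replicas=2, max_replicas=10,
                     target_cpu=70, target_memory=80):
    """Estimate the replica count from observed CPU and memory utilization."""
    by_cpu = math.ceil(current_replicas * cpu_percent / target_cpu)
    by_memory = math.ceil(current_replicas * memory_percent / target_memory)
    # The metric that demands the most replicas wins, within the bounds.
    return max(min_replicas, min(max_replicas, max(by_cpu, by_memory)))


scale_up = desired_replicas(4, cpu_percent=90, memory_percent=40)
scale_down = desired_replicas(2, cpu_percent=10, memory_percent=10)
capped = desired_replicas(8, cpu_percent=140, memory_percent=50)
```

The sketch omits the tolerance band and stabilization windows that a real autoscaler applies to avoid flapping.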