System Architecture Deep Dive
This document provides an accurate, detailed view of key subsystems in Bellamy Book: media processing, user behavior tracking and recommendations, web/Expo push, search, chat and video calls, and how the system scales. For the high-level picture, see Understanding the Architecture.
1. Media Processing System
Overview
Media processing is asynchronous and decoupled from the API. The API accepts uploads, stores originals in a temp bucket, and creates a MediaFile record with status Pending. A dedicated MediaProcessingWorker (polling PostgreSQL for pending files, not Kafka) processes them and writes results to MinIO and the database.
Architecture Highlights
| Aspect | Implementation |
|---|---|
| Trigger | Worker polls PostgreSQL for MediaFile records with Status = Pending (no Kafka for media). |
| Storage | MinIO (S3-compatible): temp bucket for originals, public bucket for processed assets. Paths vary by type (e.g. posts/{size}/{id}, chat/{size}/{id}, avatars). |
| Image pipeline | ImageSharp: resize with ResizeMode.Max (aspect ratio preserved). Multiple sizes: XS (32px), SM (64px), MD (128px), LG (256px), XL (512px), 2XL (1024px). Output formats: WebP, AVIF, JPEG for broad compatibility. |
| Video pipeline | FFMpegCore: transcode to H.264, generate HLS/DASH for adaptive streaming, multiple resolutions, thumbnails at key frames. Aspect ratio preserved (vertical/landscape). |
| Unified processor | MediaProcessor routes by MediaType: Avatar, Cover, Post, Story, Comment, Chat, Blog (featured/content/file), Ticket. Each type has defined sizes and storage paths. |
| Post-processing | On completion: update MediaFile (processed URLs, status Completed, ProcessedAt). For avatars, worker updates Neo4j UserNode with new avatar URL. Redis caches processed URLs for fast reads. |
| Notifications | Avatar/cover processing completion triggers a notification to the user (e.g. via RabbitMQ/MassTransit and real-time delivery). |
Strengths
- Single pipeline for all media types (post, story, comment, chat, avatar, cover, blog) with type-specific rules.
- Aspect-ratio preservation everywhere (no stretching or arbitrary cropping).
- Web-optimized images (WebP/AVIF + JPEG fallback) and adaptive video (HLS/DASH) for good performance and UX.
- Decoupled worker: API returns quickly; heavy work runs in a separate process, so upload API stays responsive.
- Observability: Health checks, metrics, and TempFileCleanupJob (Quartz) to remove old temp files.
Data Flow (Simplified)
API: Upload → MinIO (temp) + DB (MediaFile, Pending)
↓
MediaProcessingWorker: Poll DB → GetPendingFilesAsync
↓
Download from MinIO (temp) → Process (Image/Video) → Upload to MinIO (public)
↓
Update MediaFile (URLs, Completed), Neo4j (avatar), Redis (cache), optional notification
2. User Behavior Tracking and Post Recommendations
Overview
The system uses event-driven architecture to track user behavior and power personalized feeds (posts and stories). The API publishes domain events that are translated into Kafka messages. Four workers consume these events and maintain interaction summaries, trending content, personalized content scores, and social graph data.
Event Flow
| Topic | When published | Main consumers |
|---|---|---|
| user_interacted | View, like, love, comment, share, reaction removal | InteractionWorker, TrendingWorker, ScoringWorker, GraphWorker |
| content_created | New post, story, media | TrendingWorker, ScoringWorker |
| content_deleted | Delete, archive, expiry | InteractionWorker, TrendingWorker, ScoringWorker |
Events are produced by Kafka event handlers in the API (e.g. after MediatR domain events). Workers do not block the API; they run in separate processes.
Tracking Components
-
InteractionWorker
- Consumes
user_interacted,content_deleted. - Writes PostgreSQL (
InteractionSummaries: like_count, view_count, comment_count, etc.). - Writes Redis (
counts:story:{id},counts:post:{id}) for very fast reads. - Stories: view_count = unique viewers (one view per user). Posts: view_count = total views (each view counted).
- Consumes
-
TrendingWorker
- Consumes
user_interacted,content_created,content_deleted. - Computes engagement velocity (e.g. interactions per hour) and global_trend_factor.
- Writes PostgreSQL (
GlobalTrends). Used to inject trending posts/stories into feeds.
- Consumes
-
ScoringWorker
- Consumes same topics; uses Neo4j for relationship weights.
- Computes per-user, per-content scores and writes PostgreSQL (
ContentScores). - Score formula (conceptually):
Final = w1×Recency + w2×Interaction + w3×Relationship + w4×Popularity - Weights differ for stories vs posts (e.g. recency higher for stories).
-
GraphWorker
- Consumes
user_interacted. - Updates Neo4j
INTERACTED_WITHrelationship weights (view +0.05, like +0.10, comment +0.20, share +0.30, etc.). - Powers friend suggestions (“People You May Know”) and relationship-based scoring.
- Consumes
How the Feed Uses This
- Post feed (e.g.
GetFeedPostsQuery): Fetches top scored posts for the user fromContentScores, injects trending posts fromGlobalTrends, and merges with friends/following posts so connected users always appear. Result is ordered by score and recency. - Story feed (recommended endpoint): Fetches stories from friends + recommended list, looks up scores in
ContentScores, and returns stories ordered by recommendation score (with optional score details for debugging).
Strengths
- Real-time counts via Redis; personalized ranking via ContentScores and Neo4j.
- Clear separation: API only publishes events; workers own aggregation and scoring.
- Stories vs posts semantics (unique viewers vs total views) match product expectations.
3. Web Push and Expo Push System
Overview
Web Push (browser) and Expo Push (mobile) deliver chat messages and incoming call notifications when the user’s tab or app is in the background. Both use the same backend pipeline: the API (and ChatWorker) publish events to RabbitMQ; dedicated workers consume and send platform-specific push.
Architecture
| Channel | Worker | Target | Subscription storage |
|---|---|---|---|
| Web Push | WebPushNotificationWorker | Browsers (PWA/web) | MongoDB: push subscriptions (VAPID) |
| Expo Push | ExpoPushNotificationWorker | Expo / React Native app | MongoDB: Expo push tokens |
- API (and ChatWorker when a message is sent or a call is initiated) publishes to RabbitMQ (e.g.
ChatMessage,IncomingCallPushMessage). - WebPushNotificationWorker and ExpoPushNotificationWorker consume these messages and send push to the recipient’s registered endpoints/tokens.
Flow
-
Registration
- Web: User enables notifications in Settings; frontend uses Web Push API, gets subscription, sends it to
POST /api/push-subscriptions. - Mobile: App gets Expo push token and registers it with the API (e.g.
POST /api/push-subscriptions/expoor equivalent).
- Web: User enables notifications in Settings; frontend uses Web Push API, gets subscription, sends it to
-
Delivery
- User sends a chat message or starts a call → API/ChatWorker publishes to RabbitMQ → WebPush or Expo worker consumes → sends push to recipient’s device(s).
What Must Be Running
- RabbitMQ (message broker).
- API (publishes on message/call).
- MongoDB (subscriptions and tokens).
- WebPushNotificationWorker and/or ExpoPushNotificationWorker (depending on which clients you support).
Strengths
- Single event pipeline for both channels; one place to add new push triggers.
- Platform-appropriate: Web Push for browsers, Expo for iOS/Android without managing FCM/APNs directly.
4. Search System
Overview
Full-text search over users, posts, blogs, and hashtags is provided by Elasticsearch (or OpenSearch). The API queries Elasticsearch via the Search API; data is kept in sync by ElasticsearchSyncWorker and Quartz jobs (e.g. catch-up sync).
Components
- ElasticsearchSyncWorker: Consumes events (e.g. from Kafka) and/or runs sync jobs to index/update documents in Elasticsearch (posts, blogs, users, hashtags).
- SearchService (backend): Builds Elasticsearch queries (multi-match, filters, highlighting). Supports universal search, type-specific (posts, users, blogs, hashtags), autocomplete, trending, recommendations, history, and related content.
- CachedSearchService: Optional caching layer (e.g. Redis) for hot queries.
- Blocked users are excluded from search results where applicable.
Features
- Fuzziness and phrase prefix for “search-as-you-type” and typo tolerance.
- Boosting (e.g. caption, username) for relevance.
- Filters: post type, visibility, user, hashtags, date range.
- Search history and record endpoint for analytics and recommendations.
Strengths
- Dedicated search engine (Elasticsearch) for complex queries and scaling.
- Unified API (
/api/Search, autocomplete, recommendations, trending) with consistent auth and blocked-user handling.
5. Chat and Video Call System
Overview
- Chat: Real-time messaging via SignalR (ChatWorker hub at
/hubs/chat), with REST fallback. Messages and conversations are stored in MongoDB. - Calls: Two options — LiveKit (room-based WebRTC) and in-app WebRTC (signaling over SignalR). Incoming calls are signaled via SignalR and can trigger Web Push or Expo Push when the app is in the background.
Chat Highlights
- SignalR for real-time send/receive, typing indicators, presence (e.g.
/hubs/chat-online). - REST for history, conversations, uploads, read receipts.
- Media in chat: upload via API; can go through media pipeline (e.g. chat images in
chat/{size}/{id}). - Group chats: create, add/remove participants, roles.
- Push: When recipient is not connected, RabbitMQ → WebPush/Expo workers send notifications.
Video Call Highlights
- LiveKit: Backend issues tokens via
api/Call(e.g.POST /api/Call/token,POST /api/Call/initiate). Frontend joins LiveKit room for audio/video. Suited for multi-party and managed infrastructure. - WebRTC over SignalR: Peer-to-peer offer/answer and ICE candidates exchanged via SignalR (
SendWebRtcOffer,SendWebRtcAnswer,SendIceCandidate). Good for 1:1 without extra service; uses public STUN (e.g. Google). - Incoming call flow: Initiate → backend notifies callee via SignalR (and optionally RabbitMQ for push) → callee accepts/rejects → token/room and media flow.
Pain Points / Considerations
- WebRTC over SignalR: Relies on STUN only; NAT/firewall may require TURN for some networks. Adding TURN would improve connectivity.
- Dual call paths (LiveKit vs in-app WebRTC): Product and ops must decide when to use which; consistency in UX and monitoring helps.
- Scaling SignalR: ChatWorker is stateful (connections). Horizontal scaling requires a backplane (e.g. Redis) so all API/worker instances see the same connection map.
- Mobile: Expo push is needed for reliable call alerts when app is backgrounded; WebView-only push is not reliable.
6. How the Whole System Can Scale
API
- Stateless HTTP and SignalR (with backplane). Scale horizontally behind a load balancer.
- Rate limiting, caching (Redis), and async processing (Kafka/RabbitMQ) keep the API from becoming the bottleneck.
Workers
- Kafka consumers: InteractionWorker, TrendingWorker, ScoringWorker, GraphWorker, HashtagWorker, ElasticsearchSyncWorker, MediaProcessingWorker, WebSocketWorker. Scale by adding instances of each worker type; Kafka partitioning and consumer groups handle distribution.
- RabbitMQ consumers: WebPushNotificationWorker, ExpoPushNotificationWorker, BlogAutoGenerationWorker. Scale by adding consumer instances (and queues as needed).
- ChatWorker: SignalR hub; scale with Redis backplane so multiple instances share connection state.
Data Stores
- PostgreSQL: Read replicas for feed and read-heavy endpoints; connection pooling; tune indexes and queries.
- MongoDB: Replica set; shard collections (e.g. chat, notifications) if needed.
- Redis: Cluster or sentinel for cache and session; used by API, workers, and (if used) SignalR backplane.
- Neo4j: Causal cluster or read replicas for graph and scoring.
- Elasticsearch: Cluster with multiple nodes; index sharding and replication.
- MinIO: Distributed mode for object storage and high throughput.
Message Brokers
- Kafka: Add partitions and broker nodes; increase replication for durability.
- RabbitMQ: Cluster and mirror queues for availability; scale consumers per queue.
Media and Static Assets
- MediaProcessingWorker: Scale out worker instances; ensure MinIO and DB can handle the load.
- CDN: Put in front of MinIO (or S3) for public media URLs to reduce origin load and latency.
Caching and CDN
- Redis: Multi-level caching (counts, feed, sessions).
- CDN: For frontend/admin static assets and, where applicable, public media.
Summary Table
| Component | Scale strategy |
|---|---|
| API | Horizontal (stateless + load balancer) |
| SignalR (ChatWorker) | Horizontal + Redis backplane |
| Kafka workers | Add instances per consumer group |
| RabbitMQ workers | Add consumer instances |
| PostgreSQL | Read replicas, pooling, indexing |
| MongoDB | Replica set, sharding if needed |
| Redis | Cluster/sentinel |
| Neo4j | Causal cluster / read replicas |
| Elasticsearch | Multi-node cluster |
| MinIO | Distributed mode |
| Media processing | More MediaProcessingWorker instances |
| Push delivery | More WebPush/Expo worker instances |
Related Documentation
- Understanding the Architecture — High-level overview.
- System Overview — Component diagram and patterns.
- Workers — Per-worker responsibilities.
- Scaling — Load balancing and scaling strategies.
- Media — Media API and usage.
- Push Notifications — Web Push and Expo setup.
- Search — Search API and behavior.
- Messaging — Chat and real-time.
- Calls — Voice and video calls.