System Architecture Deep Dive

This document provides an accurate, detailed view of key subsystems in Bellamy Book: media processing, user behavior tracking and recommendations, web/Expo push, search, chat and video calls, and how the system scales. For the high-level picture, see Understanding the Architecture.

1. Media Processing System

Overview

Media processing is asynchronous and decoupled from the API. The API accepts uploads, stores originals in a temp bucket, and creates a MediaFile record with status Pending. A dedicated MediaProcessingWorker (polling PostgreSQL for pending files, not Kafka) processes them and writes results to MinIO and the database.

Architecture Highlights

Aspect	Implementation
Trigger	Worker polls PostgreSQL for `MediaFile` records with `Status = Pending` (no Kafka for media).
Storage	MinIO (S3-compatible): `temp` bucket for originals, `public` bucket for processed assets. Paths vary by type (e.g. `posts/{size}/{id}`, `chat/{size}/{id}`, avatars).
Image pipeline	ImageSharp: resize with ResizeMode.Max (aspect ratio preserved). Multiple sizes: XS (32px), SM (64px), MD (128px), LG (256px), XL (512px), 2XL (1024px). Output formats: WebP, AVIF, JPEG for broad compatibility.
Video pipeline	FFMpegCore: transcode to H.264, generate HLS/DASH for adaptive streaming, multiple resolutions, thumbnails at key frames. Aspect ratio preserved (vertical/landscape).
Unified processor	MediaProcessor routes by `MediaType`: Avatar, Cover, Post, Story, Comment, Chat, Blog (featured/content/file), Ticket. Each type has defined sizes and storage paths.
Post-processing	On completion: update MediaFile (processed URLs, status `Completed`, `ProcessedAt`). For avatars, worker updates Neo4j `UserNode` with new avatar URL. Redis caches processed URLs for fast reads.
Notifications	Avatar/cover processing completion triggers a notification to the user (e.g. via RabbitMQ/MassTransit and real-time delivery).

Strengths

Single pipeline for all media types (post, story, comment, chat, avatar, cover, blog) with type-specific rules.
Aspect-ratio preservation everywhere (no stretching or arbitrary cropping).
Web-optimized images (WebP/AVIF + JPEG fallback) and adaptive video (HLS/DASH) for good performance and UX.
Decoupled worker: API returns quickly; heavy work runs in a separate process, so upload API stays responsive.
Observability: Health checks, metrics, and TempFileCleanupJob (Quartz) to remove old temp files.

Data Flow (Simplified)

API: Upload → MinIO (temp) + DB (MediaFile, Pending)
       ↓
MediaProcessingWorker: Poll DB → GetPendingFilesAsync
       ↓
Download from MinIO (temp) → Process (Image/Video) → Upload to MinIO (public)
       ↓
Update MediaFile (URLs, Completed), Neo4j (avatar), Redis (cache), optional notification

2. User Behavior Tracking and Post Recommendations

Overview

The system uses event-driven architecture to track user behavior and power personalized feeds (posts and stories). The API publishes domain events that are translated into Kafka messages. Four workers consume these events and maintain interaction summaries, trending content, personalized content scores, and social graph data.

Event Flow

Topic	When published	Main consumers
user_interacted	View, like, love, comment, share, reaction removal	InteractionWorker, TrendingWorker, ScoringWorker, GraphWorker
content_created	New post, story, media	TrendingWorker, ScoringWorker
content_deleted	Delete, archive, expiry	InteractionWorker, TrendingWorker, ScoringWorker

Events are produced by Kafka event handlers in the API (e.g. after MediatR domain events). Workers do not block the API; they run in separate processes.

Tracking Components

InteractionWorker
- Consumes user_interacted, content_deleted.
- Writes PostgreSQL (InteractionSummaries: like_count, view_count, comment_count, etc.).
- Writes Redis (counts:story:{id}, counts:post:{id}) for very fast reads.
- Stories: view_count = unique viewers (one view per user). Posts: view_count = total views (each view counted).
TrendingWorker
- Consumes user_interacted, content_created, content_deleted.
- Computes engagement velocity (e.g. interactions per hour) and global_trend_factor.
- Writes PostgreSQL (GlobalTrends). Used to inject trending posts/stories into feeds.
ScoringWorker
- Consumes same topics; uses Neo4j for relationship weights.
- Computes per-user, per-content scores and writes PostgreSQL (ContentScores).
- Score formula (conceptually):
  Final = w1×Recency + w2×Interaction + w3×Relationship + w4×Popularity
- Weights differ for stories vs posts (e.g. recency higher for stories).
GraphWorker
- Consumes user_interacted.
- Updates Neo4j INTERACTED_WITH relationship weights (view +0.05, like +0.10, comment +0.20, share +0.30, etc.).
- Powers friend suggestions (“People You May Know”) and relationship-based scoring.

How the Feed Uses This

Post feed (e.g. GetFeedPostsQuery): Fetches top scored posts for the user from ContentScores, injects trending posts from GlobalTrends, and merges with friends/following posts so connected users always appear. Result is ordered by score and recency.
Story feed (recommended endpoint): Fetches stories from friends + recommended list, looks up scores in ContentScores, and returns stories ordered by recommendation score (with optional score details for debugging).

Strengths

Real-time counts via Redis; personalized ranking via ContentScores and Neo4j.
Clear separation: API only publishes events; workers own aggregation and scoring.
Stories vs posts semantics (unique viewers vs total views) match product expectations.

3. Web Push and Expo Push System

Overview

Web Push (browser) and Expo Push (mobile) deliver chat messages and incoming call notifications when the user’s tab or app is in the background. Both use the same backend pipeline: the API (and ChatWorker) publish events to RabbitMQ; dedicated workers consume and send platform-specific push.

Architecture

Channel	Worker	Target	Subscription storage
Web Push	WebPushNotificationWorker	Browsers (PWA/web)	MongoDB: push subscriptions (VAPID)
Expo Push	ExpoPushNotificationWorker	Expo / React Native app	MongoDB: Expo push tokens

API (and ChatWorker when a message is sent or a call is initiated) publishes to RabbitMQ (e.g. ChatMessage, IncomingCallPushMessage).
WebPushNotificationWorker and ExpoPushNotificationWorker consume these messages and send push to the recipient’s registered endpoints/tokens.

Flow

Registration
- Web: User enables notifications in Settings; frontend uses Web Push API, gets subscription, sends it to POST /api/push-subscriptions.
- Mobile: App gets Expo push token and registers it with the API (e.g. POST /api/push-subscriptions/expo or equivalent).
Delivery
- User sends a chat message or starts a call → API/ChatWorker publishes to RabbitMQ → WebPush or Expo worker consumes → sends push to recipient’s device(s).

What Must Be Running

RabbitMQ (message broker).
API (publishes on message/call).
MongoDB (subscriptions and tokens).
WebPushNotificationWorker and/or ExpoPushNotificationWorker (depending on which clients you support).

Strengths

Single event pipeline for both channels; one place to add new push triggers.
Platform-appropriate: Web Push for browsers, Expo for iOS/Android without managing FCM/APNs directly.

4. Search System

Overview

Full-text search over users, posts, blogs, and hashtags is provided by Elasticsearch (or OpenSearch). The API queries Elasticsearch via the Search API; data is kept in sync by ElasticsearchSyncWorker and Quartz jobs (e.g. catch-up sync).

Components

ElasticsearchSyncWorker: Consumes events (e.g. from Kafka) and/or runs sync jobs to index/update documents in Elasticsearch (posts, blogs, users, hashtags).
SearchService (backend): Builds Elasticsearch queries (multi-match, filters, highlighting). Supports universal search, type-specific (posts, users, blogs, hashtags), autocomplete, trending, recommendations, history, and related content.
CachedSearchService: Optional caching layer (e.g. Redis) for hot queries.
Blocked users are excluded from search results where applicable.

Features

Fuzziness and phrase prefix for “search-as-you-type” and typo tolerance.
Boosting (e.g. caption, username) for relevance.
Filters: post type, visibility, user, hashtags, date range.
Search history and record endpoint for analytics and recommendations.

Strengths

Dedicated search engine (Elasticsearch) for complex queries and scaling.
Unified API (/api/Search, autocomplete, recommendations, trending) with consistent auth and blocked-user handling.

5. Chat and Video Call System

Overview

Chat: Real-time messaging via SignalR (ChatWorker hub at /hubs/chat), with REST fallback. Messages and conversations are stored in MongoDB.
Calls: Two options — LiveKit (room-based WebRTC) and in-app WebRTC (signaling over SignalR). Incoming calls are signaled via SignalR and can trigger Web Push or Expo Push when the app is in the background.

Chat Highlights

SignalR for real-time send/receive, typing indicators, presence (e.g. /hubs/chat-online).
REST for history, conversations, uploads, read receipts.
Media in chat: upload via API; can go through media pipeline (e.g. chat images in chat/{size}/{id}).
Group chats: create, add/remove participants, roles.
Push: When recipient is not connected, RabbitMQ → WebPush/Expo workers send notifications.

Video Call Highlights

LiveKit: Backend issues tokens via api/Call (e.g. POST /api/Call/token, POST /api/Call/initiate). Frontend joins LiveKit room for audio/video. Suited for multi-party and managed infrastructure.
WebRTC over SignalR: Peer-to-peer offer/answer and ICE candidates exchanged via SignalR (SendWebRtcOffer, SendWebRtcAnswer, SendIceCandidate). Good for 1:1 without extra service; uses public STUN (e.g. Google).
Incoming call flow: Initiate → backend notifies callee via SignalR (and optionally RabbitMQ for push) → callee accepts/rejects → token/room and media flow.

Pain Points / Considerations

WebRTC over SignalR: Relies on STUN only; NAT/firewall may require TURN for some networks. Adding TURN would improve connectivity.
Dual call paths (LiveKit vs in-app WebRTC): Product and ops must decide when to use which; consistency in UX and monitoring helps.
Scaling SignalR: ChatWorker is stateful (connections). Horizontal scaling requires a backplane (e.g. Redis) so all API/worker instances see the same connection map.
Mobile: Expo push is needed for reliable call alerts when app is backgrounded; WebView-only push is not reliable.

6. How the Whole System Can Scale

API

Stateless HTTP and SignalR (with backplane). Scale horizontally behind a load balancer.
Rate limiting, caching (Redis), and async processing (Kafka/RabbitMQ) keep the API from becoming the bottleneck.

Workers

Kafka consumers: InteractionWorker, TrendingWorker, ScoringWorker, GraphWorker, HashtagWorker, ElasticsearchSyncWorker, MediaProcessingWorker, WebSocketWorker. Scale by adding instances of each worker type; Kafka partitioning and consumer groups handle distribution.
RabbitMQ consumers: WebPushNotificationWorker, ExpoPushNotificationWorker, BlogAutoGenerationWorker. Scale by adding consumer instances (and queues as needed).
ChatWorker: SignalR hub; scale with Redis backplane so multiple instances share connection state.

Data Stores

PostgreSQL: Read replicas for feed and read-heavy endpoints; connection pooling; tune indexes and queries.
MongoDB: Replica set; shard collections (e.g. chat, notifications) if needed.
Redis: Cluster or sentinel for cache and session; used by API, workers, and (if used) SignalR backplane.
Neo4j: Causal cluster or read replicas for graph and scoring.
Elasticsearch: Cluster with multiple nodes; index sharding and replication.
MinIO: Distributed mode for object storage and high throughput.

Message Brokers

Kafka: Add partitions and broker nodes; increase replication for durability.
RabbitMQ: Cluster and mirror queues for availability; scale consumers per queue.

Media and Static Assets

MediaProcessingWorker: Scale out worker instances; ensure MinIO and DB can handle the load.
CDN: Put in front of MinIO (or S3) for public media URLs to reduce origin load and latency.

Caching and CDN

Redis: Multi-level caching (counts, feed, sessions).
CDN: For frontend/admin static assets and, where applicable, public media.

Summary Table

Component	Scale strategy
API	Horizontal (stateless + load balancer)
SignalR (ChatWorker)	Horizontal + Redis backplane
Kafka workers	Add instances per consumer group
RabbitMQ workers	Add consumer instances
PostgreSQL	Read replicas, pooling, indexing
MongoDB	Replica set, sharding if needed
Redis	Cluster/sentinel
Neo4j	Causal cluster / read replicas
Elasticsearch	Multi-node cluster
MinIO	Distributed mode
Media processing	More MediaProcessingWorker instances
Push delivery	More WebPush/Expo worker instances

Understanding the Architecture — High-level overview.
System Overview — Component diagram and patterns.
Workers — Per-worker responsibilities.
Scaling — Load balancing and scaling strategies.
Media — Media API and usage.
Push Notifications — Web Push and Expo setup.
Search — Search API and behavior.
Messaging — Chat and real-time.
Calls — Voice and video calls.

1. Media Processing System​

Overview​

Architecture Highlights​

Strengths​

Data Flow (Simplified)​

2. User Behavior Tracking and Post Recommendations​

Overview​

Event Flow​

Tracking Components​

How the Feed Uses This​

Strengths​

3. Web Push and Expo Push System​

Overview​

Architecture​

Flow​

What Must Be Running​

Strengths​

4. Search System​

Overview​

Components​

Features​

Strengths​

5. Chat and Video Call System​

Overview​

Chat Highlights​

Video Call Highlights​

Pain Points / Considerations​

6. How the Whole System Can Scale​

API​

Workers​

Data Stores​

Message Brokers​

Media and Static Assets​

Caching and CDN​

Summary Table​

Related Documentation​

1. Media Processing System

Overview

Architecture Highlights

Strengths

Data Flow (Simplified)

2. User Behavior Tracking and Post Recommendations

Overview

Event Flow

Tracking Components

How the Feed Uses This

Strengths

3. Web Push and Expo Push System

Overview

Architecture

Flow

What Must Be Running

Strengths

4. Search System

Overview

Components

Features

Strengths

5. Chat and Video Call System

Overview

Chat Highlights

Video Call Highlights

Pain Points / Considerations

6. How the Whole System Can Scale

API

Workers

Data Stores

Message Brokers

Media and Static Assets

Caching and CDN

Summary Table

Related Documentation