
🧩 System Design Prompt #2: Design a Notification System
🎯 Prompt:
Design a notification system that alerts users when:
- They’re assigned a ticket
- A comment is added to a ticket they follow
- A ticket they’re watching changes status
The system should support:
- Multiple channels (in-app, email, Slack, etc.)
- High throughput (millions of notifications/day)
- Delivery guarantees (e.g., at-least-once)
- Retry on failure
- Preferences (e.g., opt out of email)
✅ Your goal:
Build an architecture that’s:
- Scalable: Should handle massive volume (Jira-scale)
- Reliable: No missed notifications
- Flexible: Easy to add new channels later
- Respectful of user preferences
🧠 Key Areas to Cover:
1. Clarify Requirements
Ask:
- Should notifications be real-time or batched?
- Do users receive a daily digest, or immediate updates?
- Is “read/unread” status needed for in-app?
2. Core Components
Split into 3 decoupled systems:
🔄 Trigger Service
- Listens to events (e.g., ticket assigned, status changed)
- Generates notification “intents” and pushes them to a queue
- Validates against user preferences (DB or cache)
📬 Notification Dispatcher
- Consumes from the queue
- Formats the message
- Sends to one or more channels via adapters (email, push, Slack); a sketch of this flow follows this section
🔔 Channel Services
- Email: SendGrid, SES
- In-app: Store in DB, deliver via WebSocket
- Slack/Webhook: Use retries and fallbacks
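To make the dispatcher/adapter split concrete, here is a minimal Python sketch. The `ChannelAdapter` interface, the adapter classes, and the intent shape are illustrative assumptions, not a prescribed API:

```python
from abc import ABC, abstractmethod

class ChannelAdapter(ABC):
    """One adapter per delivery channel keeps the dispatcher channel-agnostic."""

    @abstractmethod
    def send(self, user_id: str, message: str) -> None: ...

class EmailAdapter(ChannelAdapter):
    def send(self, user_id: str, message: str) -> None:
        # Hypothetical: hand off to SendGrid/SES here.
        print(f"email -> {user_id}: {message}")

class InAppAdapter(ChannelAdapter):
    def send(self, user_id: str, message: str) -> None:
        # Hypothetical: insert into the notifications table, push over WebSocket.
        print(f"in_app -> {user_id}: {message}")

ADAPTERS: dict[str, ChannelAdapter] = {"email": EmailAdapter(), "in_app": InAppAdapter()}

def format_message(intent: dict) -> str:
    # Placeholder formatting; the real layer is covered in the i18n answer below.
    return f"[{intent['type']}] {intent['payload']}"

def dispatch(intent: dict) -> None:
    """Consume one notification 'intent' from the queue and fan out to its channels."""
    message = format_message(intent)
    for channel in intent["channels"]:  # channels were already filtered by preferences
        ADAPTERS[channel].send(intent["user_id"], message)
```

Because the dispatcher only knows the `ChannelAdapter` interface, adding a new channel later is just one more class (see section 6).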
3. Data Model
```sql
notifications (
  notification_id UUID,
  user_id UUID,
  type ENUM(...),
  payload JSONB,
  channel ENUM('email', 'in_app', 'slack'),
  status ENUM('pending', 'sent', 'failed'),
  created_at,
  sent_at
)

user_preferences (
  user_id UUID,
  channel ENUM,
  enabled BOOLEAN
)
```
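As a concrete example of the preference check the Trigger Service performs against this model, a small Python sketch; the `psycopg2` usage and connection handling are assumptions, not part of the design:

```python
import psycopg2

def enabled_channels(conn, user_id: str) -> set[str]:
    """Return the channels a user has opted into, per user_preferences."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT channel FROM user_preferences "
            "WHERE user_id = %s AND enabled = TRUE",
            (user_id,),
        )
        return {row[0] for row in cur.fetchall()}

# Usage: drop any channel the user opted out of before enqueueing the intent.
# conn = psycopg2.connect(...)
# channels = {"email", "in_app"} & enabled_channels(conn, some_user_id)
```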
4. Scalability
- Use Kafka for high-throughput event queues
- Partition by `user_id` to balance load (see the producer sketch after this list)
- Bulk email sending via a 3rd-party provider
- Rate limit channels like Slack to avoid API bans
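The producer sketch referenced above, using `kafka-python`; the topic name and JSON serialization are assumptions:

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,  # the key bytes drive partition assignment
    value_serializer=lambda v: json.dumps(v).encode(),
)

def publish_intent(intent: dict) -> None:
    # Same key -> same partition -> per-user ordering for downstream consumers.
    producer.send("notification-intents", key=intent["user_id"], value=intent)
```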
5. Reliability
- Ensure at-least-once delivery via:
  - Kafka offset management (commit offsets only after a successful send)
  - Idempotent writes in dispatchers
- Implement DLQs (Dead Letter Queues) for failed notifications
- Add retries with exponential backoff (see the sketch below)
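A minimal sketch of the retry path; `send` and `publish_to_dlq` are hypothetical stand-ins for the real channel call and the DLQ producer:

```python
import random
import time

MAX_ATTEMPTS = 5

def send_with_retry(notification: dict, send, publish_to_dlq) -> None:
    """Retry transient failures with exponential backoff plus jitter, then DLQ."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            send(notification)
            return
        except Exception:
            # Sleep 1s, 2s, 4s, 8s... plus jitter to avoid thundering herds.
            time.sleep(2 ** attempt + random.random())
    publish_to_dlq(notification)  # park it for manual or automated replay
```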
6. Extensibility
- Use a plugin model for adding new channels (registration sketch below)
- Payload normalization layer before formatting
- Add hooks for A/B testing or analytics
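One possible shape for the plugin model: a decorator-based registry so new channels register themselves without touching the dispatcher. The decorator and adapter names here are illustrative assumptions:

```python
ADAPTER_REGISTRY: dict[str, type] = {}

def channel(name: str):
    """Class decorator: register an adapter class under its channel name."""
    def register(cls):
        ADAPTER_REGISTRY[name] = cls
        return cls
    return register

@channel("slack")
class SlackAdapter:
    def send(self, user_id: str, message: str) -> None:
        # Hypothetical: POST to a Slack webhook, respecting the rate limits above.
        print(f"slack -> {user_id}: {message}")

# The dispatcher resolves adapters by name, so a new channel is one new class:
# ADAPTER_REGISTRY["slack"]().send("u-123", "You were assigned to ticket PROJ-1")
```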
🧠 Advanced Discussion Prompts (Interviewer Follow-ups)
🎤 1. How would you guarantee in-order delivery for in-app notifications per user?
✅ Answer:
To guarantee per-user ordering of in-app notifications:
🧱 Option 1: Kafka Partitioning
- Partition the Kafka topic by `user_id`.
- This ensures all events for a single user are processed in order by a single consumer.
🧠 Consumer Strategy:
- Have multiple consumers for parallelism.
- But for any given partition (and therefore any given user), there’s only one active consumer.
✅ For in-app only:
- Persist in a SQL table with a `created_at` or `sequence_number` column.
- When displaying notifications, order by that field (see the sketch at the end of this answer).
🚨 Limitations:
- If a user has multiple streams (e.g., multiple orgs), you might shard by `user_id + org_id`.
🔁 For other channels (email, Slack):
- Ordering isn’t guaranteed or needed — they’re fire-and-forget.
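Tying the in-app pieces of this answer together, a sketch of what the partition’s single consumer might do. Keeping the counter in memory (rather than a transactional DB sequence) is a simplifying assumption:

```python
from collections import defaultdict

# Per-user counters; in production this would be a DB sequence or a column
# assigned transactionally, but an in-memory dict shows the idea for one
# partition's single consumer.
next_seq: dict[str, int] = defaultdict(int)

def persist_in_app(notification: dict) -> dict:
    """Runs in the partition's only consumer, so per-user order is preserved."""
    user_id = notification["user_id"]
    notification["sequence_number"] = next_seq[user_id]
    next_seq[user_id] += 1
    # INSERT INTO notifications (..., sequence_number) VALUES (...)
    return notification

# The UI then reads ORDER BY sequence_number (or created_at) per user.
```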
🎤 2. How would you support internationalization (i18n) of messages across different channels?
✅ Answer:
To support i18n across channels:
🔤 Translation Strategy:
- Store message templates in a template service or DB:
```json
{
  "en": "You were assigned to ticket {ticket_id}",
  "es": "Se le asignó el ticket {ticket_id}",
  ...
}
```
- Maintain the user locale in the user profile (e.g., `user.locale = 'es'`).
🏗️ Templating Engine:
- Use a centralized templating layer (sketched at the end of this answer) to:
  - Pick the correct message template based on `locale`
  - Render with proper substitutions (ticket ID, comment text, etc.)
  - Format correctly per channel (email vs. in-app vs. Slack)
🧪 Testing:
- Create snapshot tests for localized formats and encoding (e.g., RTL languages, emojis).
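A minimal sketch of that templating layer, reusing the template JSON above; the template key and the fallback-to-English rule are assumptions:

```python
TEMPLATES = {
    "ticket_assigned": {
        "en": "You were assigned to ticket {ticket_id}",
        "es": "Se le asignó el ticket {ticket_id}",
    },
}

def render(notification_type: str, locale: str, **params) -> str:
    """Pick the template for the user's locale (falling back to English) and fill it in."""
    locales = TEMPLATES[notification_type]
    template = locales.get(locale, locales["en"])
    return template.format(**params)

# render("ticket_assigned", "es", ticket_id="PROJ-123")
# -> "Se le asignó el ticket PROJ-123"
```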
🎤 3. How do you avoid notifying the same user multiple times for the same event (e.g., comment spam)?
✅ Answer:
This is a common deduplication problem.
🧠 Options:
a. Event Deduplication Layer:
- Use a deduplication key, e.g. `event_type + user_id + ticket_id` (optionally with a time bucket; a raw timestamp would make every event unique and defeat dedup).
- Store dedup keys in Redis with a TTL (e.g., 10 min).
- Skip sending if the dedup key already exists (see the sketch at the end of this answer).
b. Aggregate Events:
- Instead of sending individual comment notifications:
  - Buffer comment events for 2–5 minutes.
  - Send a single notification like: “5 new comments on ticket #123”.
c. UI Throttling:
- If in-app, mark duplicate events as read-only updates (no new toast/pop-up).
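A sketch of option (a) using `redis-py`: `SET` with `nx` and `ex` makes check-and-record a single atomic step. The key format and TTL follow the answer above; the connection defaults are assumptions:

```python
import redis

r = redis.Redis()  # assumes a local Redis; configure host/port in production
DEDUP_TTL_SECONDS = 600  # the 10-minute window from above

def should_notify(event_type: str, user_id: str, ticket_id: str) -> bool:
    """Atomically record the dedup key; returns False if we sent this recently."""
    key = f"dedup:{event_type}:{user_id}:{ticket_id}"
    # SET NX succeeds only for the first writer within the TTL window.
    return bool(r.set(key, 1, nx=True, ex=DEDUP_TTL_SECONDS))
```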
🎤 4. What’s your data retention and cleanup policy for notification history?
✅ Answer:
Retention policies vary by channel and use case.
🧱 For In-App Notifications:
- Store in a SQL DB (e.g., the `notifications` table).
- Retention options:
  - Keep the last 500 per user (ring buffer)
  - Or retain for 90 days, then archive to cold storage (e.g., S3, Glacier)
🧹 Cleanup Methods:
- Run a daily background job (sketched below) to:
  - Delete or archive expired rows
  - Reindex large tables if needed
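The daily job, sketched for the 90-day option with `psycopg2`. Deleting in batches (an assumption, not part of the prompt) keeps lock times short; archiving to S3 is elided:

```python
import psycopg2

BATCH_SIZE = 10_000

def purge_expired(conn) -> None:
    """Delete 90-day-old notifications in batches so the table stays responsive."""
    with conn.cursor() as cur:
        while True:
            cur.execute(
                "DELETE FROM notifications "
                "WHERE notification_id IN ("
                "  SELECT notification_id FROM notifications "
                "  WHERE created_at < now() - interval '90 days' "
                "  LIMIT %s)",
                (BATCH_SIZE,),
            )
            conn.commit()
            if cur.rowcount < BATCH_SIZE:
                break
```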
📥 For Email/Slack:
- No need to store full content long-term
- Just retain logs/metadata (`notification_id`, `channel`, `status`, `timestamp`) for auditing
🔐 Compliance:
- Respect GDPR/CCPA if required — provide delete/export functionality per user.