
🧩 System Design Prompt #4: Design a Task Automation Rule Engine
🎯 Prompt:
Design a rule engine that allows users to automate actions based on events in a task/project management system (like Jira Automation or Trello Butler).
Users can define rules like:
- “When a ticket is moved to Done, then notify the reporter.”
- “If a task is overdue, assign it to a backup user.”
- “When a comment is added, tag the assignee.”
✅ Core Requirements
👩💼 Functional:
- Users can create rules via a simple UI (trigger → condition → action).
- Rules must run reliably in response to system events.
- Should support custom conditions, delays, and retry logic.
🚀 Non-functional:
- Scalable to millions of events per day
- Rules should execute asynchronously but with low latency
- Fault-tolerant and idempotent
- Easy to extend with new triggers/actions
🧠 What You Should Cover
1. Rule Model / DSL
{
  "trigger": "ticket_status_changed",
  "condition": "status == 'Done'",
  "action": "notify_reporter"
}
Rules can be stored in JSON or a custom DSL. Eventually this can power both UI-driven rule builders and advanced scripting (like Jira’s automation rules).
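As a rough illustration of the JSON form above, here is a minimal Python sketch that loads a rule definition into a typed object and rejects incomplete ones. The `Rule` class and `parse_rule` helper are hypothetical names chosen for this example, not part of any real product.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One automation rule: trigger -> condition -> action."""
    trigger: str      # event type that activates the rule
    condition: str    # expression evaluated against the event payload
    action: str       # name of the action to run when the condition holds


REQUIRED_FIELDS = ("trigger", "condition", "action")


def parse_rule(raw: str) -> Rule:
    """Parse a JSON rule definition, rejecting missing or non-string fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("rule must be a JSON object")
    for name in REQUIRED_FIELDS:
        if not isinstance(data.get(name), str) or not data[name]:
            raise ValueError(f"rule field {name!r} must be a non-empty string")
    return Rule(**{name: data[name] for name in REQUIRED_FIELDS})


rule = parse_rule(
    '{"trigger": "ticket_status_changed",'
    ' "condition": "status == \'Done\'",'
    ' "action": "notify_reporter"}'
)
```

Validating at write time (in the Rule Service) keeps malformed rules out of the evaluation path.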
2. High-Level Architecture
🔁 Event Bus
- All domain services (ticket, comment, assignment) emit events (Kafka or Pub/Sub).
- Event format:
{event_id, event_type, entity_id, user_id, payload}
⚙️ Rule Engine
- Subscribes to event streams.
- For each event, checks if any user-defined rules are triggered.
- If yes, evaluates conditions and executes actions.
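The matching loop described above could look roughly like the sketch below. It assumes rules are indexed by trigger in memory and that a condition is already compiled to a Python callable; `Event`, `Rule`, and `RuleEngine` are illustrative names, and a real engine would consume from Kafka or Pub/Sub rather than take direct calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    event_type: str
    entity_id: str
    user_id: str
    payload: dict = field(default_factory=dict)


@dataclass
class Rule:
    rule_id: str
    trigger: str
    condition: Callable[[dict], bool]  # stand-in for a parsed DSL expression
    action: str


class RuleEngine:
    def __init__(self) -> None:
        # Index rules by trigger so each event only touches relevant rules.
        self._rules_by_trigger = defaultdict(list)

    def register(self, rule: Rule) -> None:
        self._rules_by_trigger[rule.trigger].append(rule)

    def handle(self, event: Event) -> list:
        """Return (rule_id, action) pairs whose condition matched the event."""
        matched = []
        for rule in self._rules_by_trigger.get(event.event_type, []):
            if rule.condition(event.payload):
                matched.append((rule.rule_id, rule.action))
        return matched


engine = RuleEngine()
engine.register(Rule("r1", "ticket_status_changed",
                     lambda p: p.get("status") == "Done", "notify_reporter"))
matches = engine.handle(
    Event("ticket_status_changed", "T-1", "u-7", {"status": "Done"})
)
```

Returning the matched actions instead of running them inline keeps evaluation separate from execution, which is the split the Action Executor relies on.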
🧩 Action Executor
- Executes action functions (send notification, assign user, add label).
- Supports retries, delay queues, and exponential backoff.
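A minimal sketch of retries with exponential backoff, assuming an action is a zero-argument callable that raises on failure. The function names, default delays, and the use of full jitter are assumptions made for the example.

```python
import random
import time
from typing import Callable


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0) -> list:
    """Upper bound on the wait before each retry: base * 2**n, capped."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def run_with_retries(action: Callable[[], None], max_attempts: int = 4,
                     base: float = 0.5, cap: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run action; on exception, wait and retry. Return False if all attempts fail."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            action()
            return True
        except Exception:
            if attempt == max_attempts - 1:
                return False  # the caller would hand the action to a failure queue
            # Full jitter spreads retries so failing workers do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    return False
```

Passing `sleep` as a parameter makes the loop testable without real waiting; a production executor would more likely re-enqueue the action on a delay queue than block a worker.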
3. Key Components
Component | Role |
---|---|
Rule Service | CRUD APIs for managing rules |
Event Listener | Kafka consumer or webhook handler |
Rule Evaluator | Parses trigger/condition, runs it safely |
Action Dispatcher | Executes or schedules the action |
Audit Logger | Stores logs of rule execution per user/task |
4. Scalability & Reliability
- Kafka topics per event type, partitioned by entity_id, for parallelism with per-entity ordering
- Redis/Memcached for rule caches to reduce DB lookups
- Dead Letter Queue (DLQ) for failed actions
- Use Lambda-like workers or containerized microservices to process each rule evaluation
- Store idempotency tokens to prevent duplicated executions
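To illustrate the rule cache idea, here is a small read-through cache with a TTL. It is an in-process stand-in for Redis or Memcached, and `RuleCache`, its `loader` callback, and the 60-second default are all assumptions for this sketch.

```python
import time
from typing import Callable


class RuleCache:
    """Read-through cache of rules per trigger with a fixed TTL."""

    def __init__(self, loader: Callable[[str], list], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # e.g. a database query for rules by trigger
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # trigger -> (expires_at, rules)

    def get(self, trigger: str) -> list:
        now = self._clock()
        hit = self._entries.get(trigger)
        if hit is not None and hit[0] > now:
            return hit[1]
        rules = self._loader(trigger)   # cache miss or expired entry
        self._entries[trigger] = (now + self._ttl, rules)
        return rules

    def invalidate(self, trigger: str) -> None:
        """Call when a rule for this trigger is created, edited, or deleted."""
        self._entries.pop(trigger, None)
```

The TTL bounds how long a stale rule can keep firing if an invalidation is missed, which is the usual trade-off between DB load and freshness.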
5. Security & Isolation
- Rules are scoped per user or project.
- Sandboxing: Use a safe, restricted DSL or expression evaluator (e.g., jexl, cel), with no raw JS.
- Rate limit rules per user/org to prevent abuse.
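To make the sandboxing point concrete, the sketch below evaluates conditions through a whitelist of syntax nodes instead of running arbitrary code. It is not jexl or cel; it is a stand-in built on Python's standard `ast` module, and the supported grammar (comparisons, `and`/`or`/`not`, field names, literals) is an assumption for the example.

```python
import ast
import operator

_COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expression: str, context: dict) -> bool:
    """Evaluate a restricted boolean expression against the given fields."""
    tree = ast.parse(expression, mode="eval")
    return bool(_eval(tree.body, context))


def _eval(node, context):
    if isinstance(node, ast.Constant) and isinstance(node.value, (str, int, float, bool)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in context:
            raise ValueError(f"unknown field: {node.id}")
        return context[node.id]
    if isinstance(node, ast.BoolOp):
        # Operands are all evaluated first (no short-circuiting) for simplicity.
        values = [_eval(v, context) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, context)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARATORS.get(type(node.ops[0]))
        if op is None:
            raise ValueError("unsupported comparison")
        return op(_eval(node.left, context), _eval(node.comparators[0], context))
    # Calls, attribute access, subscripts, etc. are rejected outright.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")
```

Because anything outside the whitelist raises, an expression such as `__import__('os').system('ls')` is refused before any part of it runs.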
6. Example Use Cases
Event | Condition | Action |
---|---|---|
ticket_status_changed | status == "Done" | send_email(to=reporter) |
due_date_passed | status != "Done" | assign(user_id=backup_user) |
comment_added | user != assignee | mention(assignee) |
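One way the actions in this table could be wired up is a registry that maps action names to handler functions, so new actions can be added without touching the dispatcher. The decorator, handler signatures, and return strings below are hypothetical; real handlers would call a mail or ticket service.

```python
from typing import Callable

ACTIONS = {}  # action name -> handler function


def action(name: str) -> Callable:
    """Decorator that registers a handler under an action name."""
    def decorator(fn):
        if name in ACTIONS:
            raise ValueError(f"duplicate action: {name}")
        ACTIONS[name] = fn
        return fn
    return decorator


@action("send_email")
def send_email(event: dict, to: str) -> str:
    # `to` names a field on the event (e.g. "reporter"), resolved at run time.
    return f"email to {event[to]} about {event['entity_id']}"


@action("assign")
def assign(event: dict, user_id: str) -> str:
    return f"assign {event['entity_id']} to {event[user_id]}"


def dispatch(name: str, event: dict, **params) -> str:
    handler = ACTIONS.get(name)
    if handler is None:
        raise KeyError(f"unknown action: {name}")
    return handler(event, **params)


result = dispatch(
    "send_email",
    {"entity_id": "T-1", "reporter": "alice@example.com"},
    to="reporter",
)
```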
7. Extensibility Plan
- Add scheduling: “run this every Monday”
- Add workflow chaining: one action triggers another rule
- Add approval gates or human-in-the-loop steps
🧠 Interviewer Follow-Up Questions (for mock discussion):
- How would you support complex conditions like “if a ticket has more than 3 comments”?
- How would you ensure actions don’t fire multiple times from the same event?
- How would you debug or audit why a rule didn’t fire?
- How would you allow users to test rules safely before enabling them in production?
🎤 1. How would you support complex conditions like “if a ticket has more than 3 comments”?
✅ Answer:
We need the rule evaluator to support state-aware conditions, not just event-based data.
🧠 Implementation Options:
Option 1: Enrich the event payload
- When emitting the event (e.g., comment_added), include comment_count in the payload.
- This lets the rule engine evaluate the condition without additional lookups.
Option 2: Query live data on condition evaluation
- Allow the rule engine to call out to the Ticket Service to fetch current metadata (e.g., comment count).
- Add internal caching to reduce DB load.
Option 3: Pre-computed fields
- Maintain a denormalized ticket_stats table: ticket_id, comment_count, attachment_count, etc.
- Updated asynchronously via CDC or event consumers.
In production, you’d likely combine Options 1 and 3 to balance performance and accuracy.
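A small sketch of how Options 1 and 3 might fit together: an event consumer keeps per-ticket counters, and the engine copies them into the payload before evaluating the condition. The in-memory dict stands in for the ticket_stats table, and the `attachment_added` event type and function names are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for the denormalized ticket_stats table.
ticket_stats = defaultdict(lambda: {"comment_count": 0, "attachment_count": 0})


def apply_event(event: dict) -> None:
    """Event consumer that keeps ticket_stats up to date (Option 3)."""
    stats = ticket_stats[event["entity_id"]]
    if event["event_type"] == "comment_added":
        stats["comment_count"] += 1
    elif event["event_type"] == "attachment_added":  # hypothetical event type
        stats["attachment_count"] += 1


def enrich(event: dict) -> dict:
    """Copy the current stats into the payload before evaluation (Option 1)."""
    payload = {**event.get("payload", {}), **ticket_stats[event["entity_id"]]}
    return {**event, "payload": payload}


def has_more_than_three_comments(event: dict) -> bool:
    return enrich(event)["payload"]["comment_count"] > 3
```

Note that this counter is not safe against redelivered events on its own; in practice the consumer would need deduplication by event ID, or the count would be recomputed from the source table.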
🎤 2. How would you ensure actions don’t fire multiple times from the same event?
✅ Answer:
We’d use idempotency + deduplication strategies:
🔐 Idempotency Key:
- Each event (e.g., ticket_status_changed) has a unique event ID.
- When a rule is triggered, generate an idempotency key:
hash(rule_id + event_id)
- Store processed keys in a short-lived cache (e.g., Redis).
- If a duplicate arrives, skip it.
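The key-and-cache approach can be sketched as follows. The `DedupStore` class is an in-memory stand-in for a Redis `SET key value NX EX ttl` call, and the one-hour TTL and the `:` separator (which assumes IDs never contain a colon) are choices made for this example.

```python
import hashlib
import time
from typing import Callable


def idempotency_key(rule_id: str, event_id: str) -> str:
    # The separator keeps ("ab", "c") and ("a", "bc") from producing the same key.
    return hashlib.sha256(f"{rule_id}:{event_id}".encode()).hexdigest()


class DedupStore:
    """In-memory stand-in for an atomic set-if-absent with expiry."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry = {}  # key -> expires_at

    def claim(self, key: str) -> bool:
        """Return True only for the first caller within the TTL window."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # duplicate: this (rule, event) pair was already claimed
        self._expiry[key] = now + self._ttl
        return True


def should_execute(store: DedupStore, rule_id: str, event_id: str) -> bool:
    return store.claim(idempotency_key(rule_id, event_id))
```

With several workers the check-and-set has to be a single atomic operation in the shared store; a separate read followed by a write would let two workers both pass.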
🔁 Retry Safety:
- Ensure that action executions are idempotent (e.g., don’t send the same email twice).
- For example, store a notification log:
{rule_id, event_id, user_id, status: sent}
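Together these give effectively-once execution on top of at-least-once delivery: an event may be delivered more than once, but each action takes effect only once. Because the key cache is short-lived, the notification log is what catches duplicates that arrive after a key has expired.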
🎤 3. How would you debug or audit why a rule didn’t fire?
✅ Answer:
We’d implement a Rule Execution Log + Debug Mode.
🔍 Execution Logs:
For each rule, log:
- Rule ID
- Event ID and type
- Condition result (true/false)
- Action status (executed/skipped/failed)
- Timestamps
Stored in something like:
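-- MySQL-style sketch; column types are illustrative
CREATE TABLE rule_audit_log (
  rule_id VARCHAR(64) NOT NULL,
  event_id VARCHAR(64) NOT NULL,
  event_type VARCHAR(64) NOT NULL,
  trigger_time TIMESTAMP NOT NULL,
  condition_result BOOLEAN,
  action_status ENUM('executed', 'skipped', 'failed'),
  error_message TEXT
);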
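As a sketch of how such a log answers "why didn't my rule fire?", the example below writes audit rows to an in-memory SQLite table and reads back the latest outcome. SQLite has no ENUM type, so a CHECK constraint is used instead; the function names and message strings are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rule_audit_log (
        rule_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        trigger_time TEXT NOT NULL,
        condition_result INTEGER NOT NULL,
        action_status TEXT NOT NULL
            CHECK (action_status IN ('executed', 'skipped', 'failed')),
        error_message TEXT
    )
""")


def log_execution(rule_id, event_id, condition_result, action_status, error_message=None):
    conn.execute(
        "INSERT INTO rule_audit_log VALUES (?, ?, ?, ?, ?, ?)",
        (rule_id, event_id, datetime.now(timezone.utc).isoformat(),
         int(condition_result), action_status, error_message),
    )


def why_not_fired(rule_id, event_id):
    """Explain the most recent outcome of a rule for an event."""
    row = conn.execute(
        "SELECT condition_result, action_status, error_message FROM rule_audit_log"
        " WHERE rule_id = ? AND event_id = ? ORDER BY rowid DESC LIMIT 1",
        (rule_id, event_id),
    ).fetchone()
    if row is None:
        return "rule was never evaluated for this event (trigger did not match)"
    condition_result, action_status, error_message = row
    if not condition_result:
        return "condition evaluated to false"
    if action_status == "failed":
        return f"action failed: {error_message}"
    return f"action {action_status}"
```

The three branches correspond to the three usual explanations: the trigger never matched, the condition was false, or the action itself failed.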
🧪 Debug/Test Mode:
- Allow users to test rules against sample events.
- Show output: “Condition matched ✅ / Action executed ✅”
Optional:
- UI view with rule run history and filters by status/error
This greatly improves observability and user trust.
🎤 4. How would you allow users to test rules safely before enabling them in production?
✅ Answer:
We can offer a “Dry Run” mode and Sandbox Testing.
👷 Dry Run:
- Evaluate the rule (trigger + condition) on live events.
- Log what would happen without executing the action.
- Example: “This rule would have run 8 times in the last week.”
🧪 Test with Sample Events:
- Provide a UI where users can:
- Pick an event type
- Input a sample payload
- Evaluate the rule (and view output)
🧱 Shadow Execution:
- Internally enable the rule in shadow mode:
- It runs but doesn’t trigger actions
- Results are visible in audit logs only
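Dry run and shadow execution can share one code path: the rule is evaluated as usual, and only the final side effect is suppressed. The sketch below assumes a per-rule mode flag; `Mode`, `Runner`, and the status strings are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Mode(Enum):
    ENABLED = "enabled"
    SHADOW = "shadow"   # evaluate and log, never execute


@dataclass
class Rule:
    rule_id: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]
    mode: Mode = Mode.SHADOW   # new rules start in shadow mode in this sketch


@dataclass
class Runner:
    audit_log: list = field(default_factory=list)

    def run(self, rule: Rule, payload: dict) -> None:
        if not rule.condition(payload):
            self.audit_log.append((rule.rule_id, "condition_false"))
        elif rule.mode is Mode.SHADOW:
            # Record what would have happened without performing the action.
            self.audit_log.append((rule.rule_id, "would_execute"))
        else:
            rule.action(payload)
            self.audit_log.append((rule.rule_id, "executed"))

    def would_have_run(self, rule_id: str) -> int:
        return sum(1 for rid, status in self.audit_log
                   if rid == rule_id and status == "would_execute")
```

Counting the `would_execute` entries over a time window is what would back a message like "This rule would have run 8 times in the last week."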
✅ Benefits:
- Helps prevent bad rules from spamming users or causing workflow chaos
- Encourages confident rule creation