Atlassian mock interview System Design Prompt 4

🧩 System Design Prompt #4: Design a Task Automation Rule Engine


🎯 Prompt:

Design a rule engine that allows users to automate actions based on events in a task/project management system (like Jira Automation or Trello Butler).

Users can define rules like:

  • “When a ticket is moved to Done, then notify the reporter.”
  • “If a task is overdue, assign it to a backup user.”
  • “When a comment is added, tag the assignee.”

✅ Core Requirements

👩‍💼 Functional:

  • Users can create rules via a simple UI (trigger → condition → action).
  • Rules must run reliably in response to system events.
  • Should support custom conditions, delays, and retry logic.

🚀 Non-functional:

  • Scalable to millions of events per day
  • Rules should execute asynchronously but with low latency
  • Fault-tolerant and idempotent
  • Easy to extend with new triggers/actions

🧠 What You Should Cover


1. Rule Model / DSL

{
  "trigger": "ticket_status_changed",
  "condition": "status == 'Done'",
  "action": "notify_reporter"
}

Rules can be stored in JSON or a custom DSL. Eventually this can power both UI-driven rule builders and advanced scripting (like Jira’s automation rules).
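As a sketch of how a JSON rule like the one above might be loaded and matched (the field names follow the example; a plain dataclass and equality check stand in for a real rule store and expression evaluator):

```python
import json
from dataclasses import dataclass

@dataclass
class Rule:
    trigger: str    # event type that activates the rule
    condition: str  # expression evaluated against the event payload
    action: str     # action identifier handed to the dispatcher

def load_rule(raw: str) -> Rule:
    doc = json.loads(raw)
    return Rule(doc["trigger"], doc["condition"], doc["action"])

def matches_trigger(rule: Rule, event_type: str) -> bool:
    # First stage of evaluation: cheap trigger match before
    # the (more expensive) condition is evaluated.
    return rule.trigger == event_type

rule = load_rule('{"trigger": "ticket_status_changed", '
                 '"condition": "status == \'Done\'", '
                 '"action": "notify_reporter"}')
print(matches_trigger(rule, "ticket_status_changed"))  # True
```

Keeping the stored form this small is what lets the same model back both a drag-and-drop UI builder and hand-written rules.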


2. High-Level Architecture

🔁 Event Bus

  • All domain services (ticket, comment, assignment) emit events (Kafka or Pub/Sub).
  • Event format: {event_type, entity_id, user_id, payload}
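A concrete event envelope in that format might look like (values are illustrative):

```json
{
  "event_type": "ticket_status_changed",
  "entity_id": "PROJ-123",
  "user_id": "u_42",
  "payload": { "from_status": "In Progress", "to_status": "Done" }
}
```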

⚙️ Rule Engine

  • Subscribes to event streams.
  • For each event, checks if any user-defined rules are triggered.
  • If yes, evaluates conditions and executes actions.

🧩 Action Executor

  • Executes action functions (send notification, assign user, add label).
  • Supports retries, delay queues, and exponential backoff.
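The retry-with-backoff behavior of the Action Executor can be sketched like this (in production each retry would be re-enqueued on a delay queue rather than sleeping in-process; `time.sleep` stands in here):

```python
import time

def execute_with_retry(action, max_attempts=3, base_delay=0.01):
    """Run an action callable, retrying with exponential backoff.

    On final failure the exception propagates so the caller can
    route the action to a dead-letter queue.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # hand off to the DLQ
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "sent"

print(execute_with_retry(flaky))  # prints "sent" after two retries
```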

3. Key Components

Component         | Role
------------------|---------------------------------------------
Rule Service      | CRUD APIs for managing rules
Event Listener    | Kafka consumer or webhook handler
Rule Evaluator    | Parses trigger/condition, runs it safely
Action Dispatcher | Executes or schedules the action
Audit Logger      | Stores logs of rule execution per user/task

4. Scalability & Reliability

  • Kafka partitions per event type for parallelism
  • Redis/Memcached for rule caches to reduce DB lookups
  • Dead Letter Queue (DLQ) for failed actions
  • Use Lambda-like workers or containerized microservices to process each rule evaluation
  • Store idempotency tokens to prevent duplicated executions

5. Security & Isolation

  • Rules are scoped per user or project.
  • Sandboxing: Use a safe, restricted DSL or expression evaluator (e.g., jexl, cel) — no raw JS.
  • Rate limit rules per user/org to prevent abuse.
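To illustrate the "restricted evaluator, no raw JS" idea (jexl or CEL would be the real choice), here is a whitelist-based evaluator built on Python's `ast` module: only comparisons, boolean operators, names, and constants are allowed, so user expressions cannot call functions or touch attributes:

```python
import ast
import operator

_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
        ast.Gt: operator.gt, ast.GtE: operator.ge,
        ast.Lt: operator.lt, ast.LtE: operator.le}

def safe_eval(expr: str, ctx: dict):
    """Evaluate a rule condition against a context dict.

    Anything outside the whitelist (calls, attribute access,
    subscripts, lambdas) raises ValueError instead of executing.
    """
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BoolOp):
            vals = [ev(v) for v in node.values]
            return all(vals) if isinstance(node.op, ast.And) else any(vals)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comp in zip(node.ops, node.comparators):
                right = ev(comp)
                if not _OPS[type(op)](left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.Name):
            return ctx[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("disallowed expression")
    return ev(ast.parse(expr, mode="eval"))

print(safe_eval("status == 'Done' and comment_count > 3",
                {"status": "Done", "comment_count": 5}))  # True
```

The same shape generalizes: the whitelist is the sandbox boundary, and per-org rate limits cap how often these evaluations run.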

6. Example Use Cases

Event                 | Condition        | Action
----------------------|------------------|----------------------------
ticket_status_changed | status == "Done" | send_email(to=reporter)
due_date_passed       | status != "Done" | assign(user_id=backup_user)
comment_added         | user != assignee | mention(assignee)

7. Extensibility Plan

  • Add scheduling: “run this every Monday”
  • Add workflow chaining: one action triggers another rule
  • Add approval gates or human-in-the-loop steps

🧠 Interviewer Follow-Up Questions (for mock discussion):

  1. How would you support complex conditions like “if a ticket has more than 3 comments”?
  2. How would you ensure actions don’t fire multiple times from the same event?
  3. How would you debug or audit why a rule didn’t fire?
  4. How would you allow users to test rules safely before enabling them in production?

🎤 1. How would you support complex conditions like “if a ticket has more than 3 comments”?


Answer:

We need the rule evaluator to support state-aware conditions, not just event-based data.

🧠 Implementation Options:

Option 1: Enrich the event payload

  • When emitting the event (e.g., comment_added), include comment_count in the payload.
  • This lets the rule engine evaluate the condition without additional lookups.

Option 2: Query live data on condition evaluation

  • Allow the rule engine to call out to the Ticket Service to fetch current metadata (e.g., comment count).
  • Add internal caching to reduce DB load.

Option 3: Pre-computed fields

  • Maintain a denormalized ticket_stats table:
    • ticket_id, comment_count, attachment_count, etc.
    • Updated asynchronously via CDC or event consumers.

In production, you’d likely combine Options 1 and 3 to balance performance and accuracy.
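That combination can be sketched as: prefer the enriched payload when the emitting service provided it, and fall back to the pre-computed stats table otherwise (`TICKET_STATS` here is a hypothetical in-memory stand-in for the denormalized `ticket_stats` table kept fresh by CDC):

```python
# Stand-in for the denormalized ticket_stats table.
TICKET_STATS = {"T-1": {"comment_count": 4}}

def comment_count(event: dict) -> int:
    # Option 1: the emitting service enriched the payload.
    payload = event.get("payload", {})
    if "comment_count" in payload:
        return payload["comment_count"]
    # Option 3: fall back to the pre-computed stats table.
    return TICKET_STATS[event["entity_id"]]["comment_count"]

event = {"event_type": "comment_added", "entity_id": "T-1", "payload": {}}
print(comment_count(event) > 3)  # True, via the stats fallback
```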


🎤 2. How would you ensure actions don’t fire multiple times from the same event?


Answer:

We’d use idempotency + deduplication strategies:

🔐 Idempotency Key:

  • Each event (e.g., ticket_status_changed) has a unique event ID.
  • When a rule is triggered, generate an idempotency key: hash(rule_id + event_id)
  • Store processed keys in a short-lived cache (e.g., Redis).
  • If a duplicate arrives, skip it.
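A minimal sketch of that dedup flow, with an in-memory set standing in for Redis (in production this would be `SET key NX EX <ttl>` so the check-and-mark is atomic and keys expire):

```python
import hashlib

_processed: set[str] = set()  # stand-in for the short-lived Redis cache

def idempotency_key(rule_id: str, event_id: str) -> str:
    return hashlib.sha256(f"{rule_id}:{event_id}".encode()).hexdigest()

def try_execute(rule_id: str, event_id: str, action) -> bool:
    key = idempotency_key(rule_id, event_id)
    if key in _processed:
        return False       # duplicate delivery: skip the action
    _processed.add(key)
    action()
    return True

sent = []
print(try_execute("r1", "e1", lambda: sent.append("email")))  # True
print(try_execute("r1", "e1", lambda: sent.append("email")))  # False (deduped)
print(sent)  # ['email']
```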

🔁 Retry Safety:

  • Ensure that action executions are idempotent (e.g., don’t send the same email twice).
  • For example, store a notification log: {rule_id, event_id, user_id, status: sent}

This gives at-least-once delivery with effectively-once action execution: duplicates are detected and skipped rather than re-fired.


🎤 3. How would you debug or audit why a rule didn’t fire?


Answer:

We’d implement a Rule Execution Log + Debug Mode.

🔍 Execution Logs:

For each rule, log:

  • Rule ID
  • Event ID and type
  • Condition result (true/false)
  • Action status (executed/skipped/failed)
  • Timestamps

Stored in something like:

CREATE TABLE rule_audit_log (
  rule_id          BIGINT,
  event_id         VARCHAR(64),
  trigger_time     TIMESTAMP,
  condition_result BOOLEAN,
  action_status    ENUM('executed', 'skipped', 'failed'),
  error_message    TEXT
);

🧪 Debug/Test Mode:

  • Allow users to test rules against sample events.
  • Show output: “Condition matched ✅ / Action executed ✅”

Optional:

  • UI view with rule run history and filters by status/error

This greatly improves observability and user trust.


🎤 4. How would you allow users to test rules safely before enabling them in production?


Answer:

We can offer a “Dry Run” mode and Sandbox Testing.

👷 Dry Run:

  • Evaluate the rule (trigger + condition) on live events.
  • Log what would happen without executing the action.
  • Example: “This rule would have run 8 times in the last week.”
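A dry run is essentially the normal evaluation path with the dispatcher swapped out for a counter. A sketch (replaying recent events through the rule's matcher without executing anything):

```python
from datetime import datetime, timedelta, timezone

def dry_run(rule_matches, events, window=timedelta(days=7)):
    """Count how often a rule *would* have fired over recent events,
    without dispatching any action."""
    cutoff = datetime.now(timezone.utc) - window
    hits = [e for e in events if e["ts"] >= cutoff and rule_matches(e)]
    return len(hits)

now = datetime.now(timezone.utc)
events = [{"ts": now - timedelta(days=d), "status": "Done"} for d in range(10)]
n = dry_run(lambda e: e["status"] == "Done", events)
print(f"This rule would have run {n} times in the last week.")
```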

🧪 Test with Sample Events:

  • Provide a UI where users can:
    • Pick an event type
    • Input a sample payload
    • Evaluate the rule (and view output)

🧱 Shadow Execution:

  • Internally enable the rule in shadow mode:
    • It runs but doesn’t trigger actions
    • Results are visible in audit logs only

✅ Benefits:

  • Helps prevent bad rules from spamming users or causing workflow chaos
  • Encourages confident rule creation