System Design Patterns

This lens isolates failures of design. It focuses on the structural decisions—synchronicity, coupling, and state management—that determine whether a system creates leverage or technical debt.

The Architecture of Reliability

Fig 1. The Blueprint: Orchestrated vs Choreographed Architecture.

The most common failure in modern business automation is not a bug in the code, but a flaw in the architecture. Teams often transition from "SOP thinking" to "Automation thinking" without transitioning to "Systems thinking." They treat automation as a linear sequence of tasks—a script—rather than a living, distributed system.

In a simple script, if Step A fails, the script stops. In a distributed business system, where Zapier talks to Salesforce, which triggers a webhook to Stripe, which updates a database, failure is rarely linear. It is cascading, silent, and multiplicative.

Reliability is not a feature you add at the end of a build; it is the structural integrity of the build itself. This category defines the patterns required to move from brittle automations that break under pressure to resilient architectures that scale with the business.

The Orchestration vs. Choreography Debate

One of the first structural decisions an operator must make is how individual services will coordinate. This is the choice between Orchestration and Choreography.

Orchestration: The Conductor

In an Orchestrated model, a central manager (the "orchestrator") coordinates the interaction. The orchestrator is responsible for telling each service what to do and when to do it. Tools like Make (Integromat) or specialized workflow engines (Camunda, Temporal) are primary examples of orchestrators.

Pros: Centralized error handling, clear visibility into the "State" of a workflow, and easier modification of the sequence.

Cons: The orchestrator becomes a single point of failure and a potential performance bottleneck. It creates a "Hub and Spoke" dependency that can become rigid over time.
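
A minimal sketch of the orchestrated model, assuming three hypothetical step functions (reserve_inventory, charge_card, send_confirmation) that stand in for real service calls. The run_workflow function plays the conductor: it decides the order, records which steps completed, and handles errors in one place.

```python
from typing import Callable

# Hypothetical steps; in a real build each would call an external service.
def reserve_inventory(order: dict) -> None:
    order["inventory_reserved"] = True

def charge_card(order: dict) -> None:
    order["charged"] = True

def send_confirmation(order: dict) -> None:
    order["confirmed"] = True

def run_workflow(order: dict, steps: list[Callable[[dict], None]]) -> dict:
    """Run steps in sequence and keep the workflow state in one place."""
    state = {"completed": [], "failed_step": None, "error": None}
    for step in steps:
        try:
            step(order)
            state["completed"].append(step.__name__)
        except Exception as exc:  # centralized error handling
            state["failed_step"] = step.__name__
            state["error"] = str(exc)
            break  # later steps never run once one step fails
    return state

order = {"id": "order-1"}
state = run_workflow(order, [reserve_inventory, charge_card, send_confirmation])
```

Because the sequence is just a list passed to the orchestrator, reordering or inserting a step is a one-line change. That same centralization is also why the orchestrator is a single point of failure.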

Choreography: The Event

In a Choreographed model, there is no central manager. Instead, each service listens for events and acts independently. Service A finishes its task and publishes an event: "User Created." Service B hears that event and starts the "Welcome Email" process. Service C hears it and starts "CRM Sync."

Pros: Highly decoupled, scalable, and resilient. If Service B is down, Service C still functions perfectly.

Cons: Observability is difficult. Without a central map, answering "Where is this lead right now?" requires querying multiple independent systems. It is easy to lose track of global system state.
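
The choreographed model can be sketched with a tiny in-process event bus. The EventBus class and both handlers are hypothetical illustrations, not a real messaging library; a production system would use a broker or webhook fan-out instead.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Hypothetical in-memory bus: services subscribe to event names."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> list[str]:
        """Deliver to every subscriber; return names of handlers that failed."""
        failed = []
        for handler in self._subscribers[event_name]:
            try:
                handler(payload)
            except Exception:
                failed.append(handler.__name__)  # one failure does not block others
        return failed

crm_synced: list[str] = []

def send_welcome_email(event: dict) -> None:
    raise ConnectionError("email service is down")  # simulates Service B outage

def sync_crm(event: dict) -> None:
    crm_synced.append(event["email"])

bus = EventBus()
bus.subscribe("user.created", send_welcome_email)
bus.subscribe("user.created", sync_crm)
failed_handlers = bus.publish("user.created", {"email": "a@example.com"})
```

Service A only publishes; it does not know who is listening. The cost is visible too: nothing in this sketch can answer where a given user is in the overall process without asking each handler separately.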

The Diagnostic Standard: Use Orchestration for high-complexity, multi-step business logic (like Order Fulfillment). Use Choreography for high-volume, low-dependency synchronization (like Data Mirroring).

The Saga Pattern: Managing Distributed Transactions

In traditional software, if you want to update two tables in a database, you use a "transaction." If one update fails, they both roll back. In automation, there is no "Universal Rollback" button for APIs. You cannot "un-send" an email or "un-charge" a credit card without complex logic.

The Saga Pattern solves this by breaking a long-running transaction into a series of local transactions. Each step in the Saga has a corresponding Compensating Transaction—a way to undo the work if a later step fails. This is a standard distributed transaction pattern used in microservices architecture.

The Structure of a Saga

Command: Charge Card (Stripe).
Success: Update Inventory (Shopify).
Failure: If Shopify is down, trigger the Compensating Transaction (Stripe Refund).

Without the Saga pattern, you end up with "Zombie States"—where a customer is charged but the order doesn't exist. Operators must design "Undo" logic for every destructive "Do" action in their system.
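
A minimal sketch of a Saga runner, assuming each step is supplied as an (action, compensation) pair. The function names below are hypothetical stand-ins for the Stripe and Shopify calls; no real API is invoked.

```python
from typing import Callable

# Each step pairs a "do" action with the "undo" that compensates for it.
Step = tuple[Callable[[dict], None], Callable[[dict], None]]

def run_saga(ctx: dict, steps: list[Step]) -> bool:
    """Run actions in order; on failure, undo completed steps in reverse."""
    compensations = []
    for action, compensate in steps:
        try:
            action(ctx)
            compensations.append(compensate)
        except Exception:
            for undo in reversed(compensations):
                undo(ctx)
            return False
    return True

# Hypothetical stand-ins for the payment and inventory calls.
def charge_card(ctx: dict) -> None:
    ctx["log"].append("charged")

def refund_card(ctx: dict) -> None:
    ctx["log"].append("refunded")

def update_inventory(ctx: dict) -> None:
    raise ConnectionError("inventory service is down")

def restock_inventory(ctx: dict) -> None:
    ctx["log"].append("restocked")

ctx = {"log": []}
ok = run_saga(ctx, [(charge_card, refund_card), (update_inventory, restock_inventory)])
```

Only steps that actually completed are compensated, which is why the failed inventory update triggers a refund but no restock. In a real system the compensations themselves can fail, so they usually need retries as well.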

Idempotency: The Gold Standard of Reliability

The network is unreliable. Webhooks will fire twice. APIs will time out after successfully processing a request. If your automation triggers twice for the same event, will it create two invoices? Will it send two emails? Will it double the shipping cost?

An Idempotent system is one where the same operation can be executed multiple times without changing the result beyond the initial application.

How to Achieve Idempotency

External IDs: Never create a record without checking if an "External ID" (like a Stripe Customer ID or a specific Email address) already exists in the target system.
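Idempotency Keys: Many professional APIs (like Stripe) allow you to pass a unique string (idempotency_key) with a request. The API stores the result of the first request made with that key and returns that stored result for any retry with the same key instead of performing the operation again. Stripe keeps keys for at least 24 hours before they become eligible for removal. This is the cornerstone of our automation reliability checklist.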
Deduplication Tables: Before processing an event, check a "processed_events" table. If the unique Event ID is present, skip the run immediately.
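
The deduplication table approach can be sketched with Python's built-in sqlite3 module. The table name processed_events comes from the text; the handle_event function, the in-memory database, and the invoices list are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

invoices: list[dict] = []  # stands in for the real side effect

def handle_event(event_id: str, amount_cents: int) -> bool:
    """Process an event once. Returns False if it was already handled."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            # The PRIMARY KEY makes "check and claim" a single atomic step.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            invoices.append({"event_id": event_id, "amount_cents": amount_cents})
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: skip the run
    return True

first = handle_event("evt_1", 1000)
second = handle_event("evt_1", 1000)  # the webhook fired twice
```

Inserting the ID and letting the primary key reject duplicates avoids the gap between a separate "check" and "write", where two simultaneous deliveries could both pass the check.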

If you don't design for idempotency, you are building a system that requires manual cleanup by design.

Event-Driven Architecture (EDA)

Most novice automations rely on "Polling"—checking an app every 15 minutes to see if anything is new. This is inefficient, expensive, and leads to high latency.

Event-Driven Architecture (EDA) flips this. The system waits for an event to happen (via a Webhook or Message Queue) and acts immediately. This moves the system from a "Pull" model to a "Push" model, a concept defined by Gartner as a foundational modern infrastructure.

However, EDA introduces new risks, particularly around silent API drift and state mismatch. What happens if the push is lost? What happens if the webhook arrives out of order? This is where Message Queues (like RabbitMQ, Amazon SQS, or even a simple Airtable Queue) become essential. They act as a buffer, ensuring that even if your automation tool is down, the events are saved and can be processed later—a key part of managing the hidden cost of observability.
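
A minimal sketch of the buffering idea, using Python's standard queue module as a stand-in for a real message queue. The version field on each event is an assumption: it represents whatever ordering marker the source system provides (a sequence number or an updated-at timestamp).

```python
import queue

event_queue: "queue.Queue[dict]" = queue.Queue()
lead_status: dict[str, dict] = {}

def receive_webhook(payload: dict) -> None:
    """Accept the event and store it; no business logic runs here."""
    event_queue.put(payload)

def apply_event(event: dict) -> bool:
    """Apply an event only if it is newer than what is already stored."""
    current = lead_status.get(event["lead_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate event arrived late; ignore it
    lead_status[event["lead_id"]] = {
        "status": event["status"],
        "version": event["version"],
    }
    return True

def drain_queue() -> int:
    """Process everything buffered so far; return how many events applied."""
    applied = 0
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            return applied
        if apply_event(event):
            applied += 1

# Events arrive out of order: version 2 lands before version 1.
receive_webhook({"lead_id": "lead-1", "status": "customer", "version": 2})
receive_webhook({"lead_id": "lead-1", "status": "trial", "version": 1})
applied_count = drain_queue()
```

Separating receipt from processing is what lets the worker be offline without losing events. An in-process queue like this one does not survive a restart, so a durable queue is needed for that guarantee to hold in practice.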

CQRS: Separating Command and Query

CQRS (Command Query Responsibility Segregation) is a pattern that separates the logic for writing data (Commands) from the logic for reading data (Queries).

In automation, we often try to do both in one task: "Find lead, calculate total, update lead." As systems grow, this leads to race conditions—where you are querying a record that is currently being modified by another automation.

By segregating these responsibilities, you can optimize the "Read" side of your system (using a high-speed cache or a dedicated reporting database) without slowing down the "Write" side (the actual business logic). This is the foundation of high-performance automation.
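
One way to picture the split is a write-side store and a separate read-side view. All class and function names below are hypothetical. For brevity the view is updated in the same call as the write; in larger systems it is often updated asynchronously and is only eventually consistent.

```python
class LeadStore:
    """Write side: the system of record, a list of purchases per lead."""

    def __init__(self) -> None:
        self.purchases: dict[str, list[int]] = {}

class LeadTotalsView:
    """Read side: a precomputed total per lead, cheap to query."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = {}

    def apply_purchase(self, lead_id: str, amount_cents: int) -> None:
        self.totals[lead_id] = self.totals.get(lead_id, 0) + amount_cents

def record_purchase(
    store: LeadStore, view: LeadTotalsView, lead_id: str, amount_cents: int
) -> None:
    """Command: changes state and returns nothing."""
    store.purchases.setdefault(lead_id, []).append(amount_cents)
    view.apply_purchase(lead_id, amount_cents)

def get_lead_total(view: LeadTotalsView, lead_id: str) -> int:
    """Query: reads the view only and never modifies anything."""
    return view.totals.get(lead_id, 0)

store = LeadStore()
view = LeadTotalsView()
record_purchase(store, view, "lead-1", 1000)
record_purchase(store, view, "lead-1", 2500)
total = get_lead_total(view, "lead-1")
```

Reports and dashboards call only get_lead_total, so they never scan or lock the records that the business logic is writing.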

Implementation Standards: DLQ and Circuit Breakers

Architecture is theoretical until it fails. Two standards are non-negotiable for professional systems:

1. The Dead Letter Queue (DLQ)

A DLQ is where "unsent mail" goes. If an automation fails after 5 retries, do not let the data vanish. Send the raw JSON payload to a dedicated DLQ (a spreadsheet or a log file). A human operator must be able to review and "re-play" these events once the root cause is fixed.
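
A minimal sketch of retry-then-dead-letter handling, assuming the DLQ is a plain Python list of JSON strings (a stand-in for the spreadsheet or log file). The function names are hypothetical, and a real build would wait between retries with backoff, which is omitted here.

```python
import json
from typing import Callable

def process_with_retries(
    payload: dict,
    handler: Callable[[dict], None],
    dead_letters: list[str],
    max_retries: int = 5,
) -> bool:
    """Try once plus max_retries retries; park the payload if all fail."""
    last_error = ""
    for _ in range(1 + max_retries):
        try:
            handler(payload)
            return True
        except Exception as exc:
            last_error = str(exc)
    # Keep the raw payload and the last error so a human can investigate.
    dead_letters.append(json.dumps({"payload": payload, "error": last_error}))
    return False

def replay(dead_letters: list[str], handler: Callable[[dict], None]) -> list[str]:
    """Re-run parked events; return the ones that still fail."""
    still_failing = []
    for raw in dead_letters:
        record = json.loads(raw)
        try:
            handler(record["payload"])
        except Exception:
            still_failing.append(raw)
    return still_failing

attempts = []

def broken_handler(payload: dict) -> None:
    attempts.append(payload["id"])
    raise TimeoutError("upstream API timed out")

dlq: list[str] = []
succeeded = process_with_retries({"id": "evt_9"}, broken_handler, dlq)
```

Once the root cause is fixed, calling replay(dlq, fixed_handler) re-processes the parked events and returns whatever still fails.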

2. The Circuit Breaker

If an API starts returning 500 errors (Internal Server Error), your automation shouldn't keep banging its head against the wall. A Circuit Breaker pattern detects these repeated failures and "trips the circuit," stopping all requests to that API for a set period (e.g., 5 minutes). This prevents your system from incurring massive unnecessary costs or getting your API key banned for "abuse" during an outage.
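
A minimal sketch of a circuit breaker, assuming a failure threshold of three consecutive errors and the 5-minute cooldown from the example above. The class and its parameters are hypothetical; the clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Any, Callable, Optional

class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit open; request not sent")
            # Cooldown elapsed: allow one trial call. A single failure
            # trips the circuit again immediately.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip the circuit
            raise
        self._failures = 0  # any success resets the count
        return result
```

Wrap each outbound API call as breaker.call(send_request, payload), where send_request is whatever function performs the request. While the circuit is open, callers get CircuitOpenError immediately and no request leaves the system.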

Structural fragility is rarely the result of incompetence; it is most often the result of speed. The transition from "Automation" to "Architecture" is the moment a business moves from hobbyist experimentation to professional systemic leverage. Resolve undefined ownership within your fleet and apply these rules within our automation reliability checklist.

Operators diagnosing these design patterns often see the downstream effects in → CRM & Data Integrity