Communication Patterns in Complex Microservices
Executive summary
Complex microservice systems rarely rely on a single “best” communication style. They typically combine synchronous request/response (for user-facing queries and short, deterministic commands) with asynchronous pub/sub (for fan-out notifications, workload buffering, and integration), and then add higher-level coordination patterns such as sagas (for cross-service business transactions) and event sourcing (when an append-only event log must become a system of record). The core architectural work is choosing where you want temporal coupling, what consistency model is acceptable, and how you will make failure semantics explicit (retries, deduplication, ordering, and compensation).
Synchronous request/response (HTTP/REST or gRPC) is straightforward to reason about, but it creates runtime and temporal coupling: availability and latency of upstream services become directly dependent on downstream services. It also makes “chatty” call graphs prone to cascading failures and tail-latency amplification, so production-grade use requires timeouts/deadlines, retries with backoff, and circuit breaking (often via a service mesh). gRPC formalizes RPC types (unary and streaming) over HTTP/2 framing, enabling low-latency service-to-service calls and streaming when appropriate.
Asynchronous pub/sub decouples producers from consumers in time and often in deployment cadence; it improves elasticity and fault isolation, but shifts complexity to delivery semantics (at-most-once vs at-least-once vs exactly-once), ordering (typically “per partition/stream” rather than global), and idempotency (consumers must handle duplicates and replays). For example: Kafka provides total order only within a partition; NATS Core is at-most-once while JetStream adds persistence and at-least-once options; RabbitMQ uses acknowledgements and publisher confirms for reliability and provides different ordering characteristics depending on queue vs stream constructs.
Event sourcing persists state changes as a sequence of domain events, making the event log the primary source of truth. This enables auditability and replay, but demands careful schema evolution, projection rebuild strategies, and operational discipline around versioning and observability.
Sagas address distributed transaction needs by coordinating a sequence of local transactions with compensating actions on failure. They explicitly avoid holding locks across services (unlike classic 2-phase commit approaches); instead they embrace eventual consistency with well-defined rollback/compensation behavior. Orchestration engines (such as workflow orchestrators) can make saga execution and retries durable, while choreography relies on event-driven interactions between services.
Cross-cutting requirements (observability and security) must be designed into each communication path. Standardized trace context propagation (W3C traceparent/tracestate) and OpenTelemetry context propagation are foundational for troubleshooting in both synchronous calls and asynchronous messaging, and event formats like CloudEvents help systems interoperate across tools and platforms.
Scope and evaluation dimensions
This report covers the following communication and coordination patterns, focusing on how they behave in complex microservice systems (multiple teams, independent deployments, partial failures, evolving schemas):
- Request/response (HTTP/REST and gRPC)
- Pub/sub (message brokers and event streaming)
- Observer (in-process reactive/event listener pattern, and how it relates to pub/sub)
- Event sourcing (often paired with CQRS)
- Saga (orchestration and choreography)
- Supporting patterns that almost always become necessary at scale (transactional outbox, schema registry, trace propagation, and request/reply over messaging)
The comparison dimensions used throughout are: message flow, coupling, consistency model, latency, scalability, fault tolerance, delivery guarantees, ordering, idempotency needs, transactional boundaries, observability/monitoring, security implications, and typical failure modes. These dimensions directly align with how real systems fail and how teams operate them.
Pattern catalog with rigorous definitions
Request/response
Definition. A client service sends a request to a server service and waits (synchronously) for a response. Common transports include HTTP APIs (properly called REST only when Fielding’s constraints are followed) and RPC frameworks like gRPC. REST as an architectural style is defined in Chapter 5 of Roy Fielding's doctoral dissertation.
Core properties.
- Strong temporal coupling: the caller’s success depends on the callee being available within a deadline.
- Explicit latency budget via timeouts/deadlines, which also bound how long a slow callee can tie up caller resources (especially important on gRPC, where long-lived streams can keep connections busy).
- Idempotency matters because retries are a standard resilience tactic. RFC 9110 (HTTP Semantics) defines PUT, DELETE, and the safe methods (GET, HEAD, OPTIONS, TRACE) as idempotent; POST is not, so retrying it needs an application-level safeguard such as an idempotency key.
gRPC specifics. gRPC defines unary RPC and streaming RPC types (server streaming, client streaming, bidirectional) and is carried over HTTP/2 framing; this affects multiplexing, load, and observability decisions.
```mermaid
sequenceDiagram
    autonumber
    participant C as Client Service
    participant S as Server Service
    Note over C,S: Request/Response (HTTP/REST or gRPC)
    C->>S: Request (deadline/timeout, auth, trace context)
    alt Success
        S-->>C: Response (200/OK or gRPC status OK)
    else Failure/timeout
        S-->>C: Error / deadline exceeded
        Note over C: Retry? Only safe if idempotent<br/>or protected by Idempotency-Key
    end
```
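The retry note in the diagram can be illustrated with a minimal Python sketch. It assumes a hypothetical `TransientError` type for retryable failures and an in-memory `FakePaymentServer` that deduplicates by an `Idempotency-Key` header; none of these names come from a real library, and a production client would also enforce a per-attempt deadline.

```python
import random
import time
import uuid
from typing import Callable, Dict


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retries(send: Callable[[Dict[str, str]], str],
                      max_attempts: int = 3,
                      base_delay_s: float = 0.05) -> str:
    """Retry `send` with exponential backoff and full jitter.

    The same Idempotency-Key is reused on every attempt so the server
    can recognise a retried request and skip a second side effect.
    """
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)
        except TransientError:
            if attempt == max_attempts:
                raise
            # Full jitter: sleep a random time up to base * 2^(attempt - 1).
            time.sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")


class FakePaymentServer:
    """In-memory stand-in for a server that deduplicates by key."""

    def __init__(self, fail_first: int) -> None:
        self.fail_first = fail_first
        self.calls = 0
        self.charges: Dict[str, str] = {}

    def handle(self, headers: Dict[str, str]) -> str:
        self.calls += 1
        key = headers["Idempotency-Key"]
        # Apply the side effect at most once per key.
        result = self.charges.setdefault(key, f"charge-{len(self.charges) + 1}")
        if self.calls <= self.fail_first:
            # Simulate a response lost after the side effect happened.
            raise TransientError("response lost")
        return result
```

Calling `call_with_retries(FakePaymentServer(fail_first=2).handle)` makes three attempts but records a single charge, which is the property that makes the retry safe.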
Pub/sub
Definition. Publishers emit messages/events to a broker (topic/subject/exchange), and subscribers receive them without the publisher directly addressing specific consumers. Pub/sub can be ephemeral (best-effort) or durable (persisted streams/queues).
Core properties.
- Reduced deployment coupling: producers do not need to know who consumes.
- Reduced temporal coupling when the broker buffers messages, enabling consumers to be offline temporarily (durable pub/sub).
- Increased complexity in:
- delivery semantics (duplication/replay),
- ordering, and
- consumer state (offset/ack management).
Concrete delivery semantics examples (tool-driven).
- Kafka: default is at-least-once; at-most-once is achievable with certain producer/consumer configurations; transactions enable stronger exactly-once processing workflows.
- NATS Core: at-most-once; if a subscriber is offline it will miss messages; ordering is preserved only among messages from a single publisher.
- NATS JetStream: adds persistence and can provide at-least-once delivery; it also documents an “exactly once” quality-of-service on the publishing side via message IDs (deduplication).
- RabbitMQ: uses consumer acknowledgements and publisher confirms to improve data safety; durability depends on declarations and persistence settings, and semantics must be explicitly configured.
```mermaid
sequenceDiagram
    autonumber
    participant P as Publisher
    participant B as Broker (topic/subject/exchange)
    participant S1 as Subscriber A
    participant S2 as Subscriber B
    Note over P,S2: Pub/Sub delivery depends on the broker configuration
    P->>B: Publish(event, key, headers)
    par Fan-out
        B-->>S1: Deliver(event)
        B-->>S2: Deliver(event)
    end
    alt Durable semantics (acks/offsets)
        S1->>B: Ack / commit offset
        S2->>B: Ack / commit offset
    else Best-effort semantics
        Note over S1,S2: No ack; offline consumers may miss events
    end
```
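Because at-least-once delivery makes duplicates normal, consumers usually deduplicate. The following is a minimal in-memory sketch, assuming each message carries a producer-assigned `message_id`; the `Message` and `IdempotentConsumer` types and the redelivery simulation are hypothetical and not tied to any broker client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Message:
    message_id: str  # assumed to be assigned once by the producer
    account: str
    amount: int


@dataclass
class IdempotentConsumer:
    balances: Dict[str, int] = field(default_factory=dict)
    seen_ids: Set[str] = field(default_factory=set)

    def handle(self, msg: Message) -> bool:
        """Apply the message once; return False for a duplicate."""
        if msg.message_id in self.seen_ids:
            return False  # duplicate or replay: acknowledge without side effects
        self.balances[msg.account] = self.balances.get(msg.account, 0) + msg.amount
        # In a real service the state change and the seen-ID record
        # would be committed in the same local transaction.
        self.seen_ids.add(msg.message_id)
        return True


def deliver_at_least_once(consumer: IdempotentConsumer,
                          messages: List[Message]) -> int:
    """Simulate redelivery by delivering every message twice."""
    applied = 0
    for msg in messages + messages:
        applied += consumer.handle(msg)
    return applied
```

The seen-ID set grows without bound in this sketch; real systems typically bound it with a retention window or rely on a broker-side deduplication window.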
Observer
Definition. The Observer pattern is fundamentally in-process: a publisher (subject) notifies multiple subscribers (observers) of a sequence of events. In modern systems, Observer often manifests as reactive streams (Publisher/Subscriber) within a service boundary, and it can be “bridged” to distributed pub/sub at the edge. Java’s Flow.Publisher (in java.util.concurrent) documents that each subscriber receives the same items in the same order unless drops or errors are encountered, which makes it a standardized reactive observer model.
Why it matters in microservices. Many microservice failures and coupling problems begin inside a single service: a “domain event” is raised, multiple handlers run, side effects execute, and then the service tries to publish an integration event. Treating internal eventing as Observer and external eventing as Pub/Sub clarifies boundaries:
- Observer: deterministic ordering and transactional context within a process (if designed that way).
- Pub/Sub: distributed delivery with retries, duplicates, and eventual consistency.
```mermaid
sequenceDiagram
    autonumber
    participant Pub as In-process Publisher
    participant Obs1 as Observer/Subscriber 1
    participant Obs2 as Observer/Subscriber 2
    Pub-->>Obs1: onNext(event)
    Pub-->>Obs2: onNext(event)
    alt Completion
        Pub-->>Obs1: onComplete()
        Pub-->>Obs2: onComplete()
    else Error
        Pub-->>Obs1: onError(err)
        Pub-->>Obs2: onError(err)
    end
```
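As a rough illustration of the in-process side of this boundary, here is a minimal synchronous observer sketch. `EventPublisher` and its methods are illustrative names, not a library API, and the sketch omits the backpressure and completion signals that reactive streams add.

```python
from typing import Callable, List, Tuple

Handler = Callable[[dict], None]


class EventPublisher:
    """Minimal in-process subject; all names are illustrative."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> Callable[[], None]:
        self._handlers.append(handler)
        # Return an unsubscribe function so observers do not leak.
        return lambda: self._handlers.remove(handler)

    def publish(self, event: dict) -> List[Tuple[Handler, Exception]]:
        """Notify observers synchronously, in subscription order.

        A failing observer does not stop later observers; failures are
        collected and returned so the caller decides what to do.
        """
        failures: List[Tuple[Handler, Exception]] = []
        for handler in list(self._handlers):  # copy: handlers may unsubscribe
            try:
                handler(event)
            except Exception as exc:
                failures.append((handler, exc))
        return failures
```

Returning the unsubscribe function addresses the memory-leak failure mode, and collecting exceptions keeps one faulty handler from blocking the rest; whether a failure should instead abort the surrounding transaction is a design decision this sketch leaves to the caller.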
Event sourcing
Definition. Event sourcing stores all changes to application state as an append-only sequence of events; current state is derived by replaying events (often with snapshots for efficiency). This concept is clearly articulated by Martin Fowler: store changes as events and reconstruct past states by replay.
Core properties.
- The event log becomes the primary source of truth; projections/read models are derived.
- Excellent auditability and replay; supports retroactive fixes by reprocessing.
- Requires strict governance: event schema evolution, versioning and compatibility, and operational tooling for replays and projection rebuilds.
```mermaid
flowchart LR
    UI[Client/UI] --> CMD[Command API]
    CMD --> AGG[Domain Aggregate]
    AGG -->|append| ES[(Event Store)]
    ES --> PROJ[Projection Builders]
    PROJ --> RM[(Read Models / Views)]
    UI --> QRY[Query API]
    QRY --> RM
    ES --> BUS[Integration Event Stream]
    BUS --> OTH[Other Services]
```
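The replay idea can be sketched in a few lines of Python. The account aggregate, the `Deposited`/`Withdrawn` event types, and the per-stream `version` field are hypothetical choices made for illustration; a real event store would also handle persistence and optimistic concurrency.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    version: int  # position in the aggregate's stream, starting at 1
    kind: str     # "Deposited" or "Withdrawn" (hypothetical event types)
    amount: int


@dataclass(frozen=True)
class AccountState:
    version: int = 0
    balance: int = 0


def apply(state: AccountState, event: Event) -> AccountState:
    """Pure state transition: one event in, new state out."""
    if event.version != state.version + 1:
        raise ValueError(f"gap or reorder at version {event.version}")
    if event.kind == "Deposited":
        return AccountState(event.version, state.balance + event.amount)
    if event.kind == "Withdrawn":
        return AccountState(event.version, state.balance - event.amount)
    raise ValueError(f"unknown event type {event.kind}")


def replay(events: Iterable[Event],
           snapshot: Optional[AccountState] = None) -> AccountState:
    """Rebuild state from a snapshot (if any) plus the events after it."""
    state = snapshot if snapshot is not None else AccountState()
    for event in events:
        if event.version <= state.version:
            continue  # already folded into the snapshot
        state = apply(state, event)
    return state
```

Replaying the full log and replaying from a snapshot yield the same state, which is the invariant that snapshotting must preserve.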
Saga
Definition. A saga is a sequence of local transactions (each within a single service’s boundary), coordinated so the overall business process reaches a consistent outcome; if a step fails, compensating transactions undo previously completed steps. This idea originates in the classic “Sagas” paper by Hector Garcia-Molina and Kenneth Salem on long-lived transactions, and it is widely adopted in microservices guidance such as the Saga pattern in Microsoft's architecture patterns.
Two implementation styles.
- Choreography: services publish domain events and react to each other’s events, forming a distributed “dance”.
- Orchestration: a central orchestrator/workflow issues commands and decides the next step; compensation is explicitly triggered.
The Azure Saga guidance highlights local transactions, compensation, and the fact that cross-service ACID transactions are not directly applicable when each microservice has its own datastore.
```mermaid
sequenceDiagram
    autonumber
    participant Orch as Orchestrator/Workflow
    participant A as Service A
    participant B as Service B
    participant C as Service C
    Note over Orch,C: Saga orchestration with compensation
    Orch->>A: Do Step A (local tx)
    A-->>Orch: OK (event/response)
    Orch->>B: Do Step B (local tx)
    B-->>Orch: FAIL
    Orch->>A: Compensate A (undo)
    A-->>Orch: Compensation OK
    Orch-->>Orch: Mark saga failed / emit outcome
```
Temporal explicitly frames sagas as a compensating-actions design pattern for distributed operations and shows how workflows can persist state and run compensations on failure, reducing the amount of hand-rolled reliability logic.
Architecture and flow diagrams for common hybrids
Asynchronous request/reply over messaging
A frequent “hybrid” is request/reply using a broker to reduce direct coupling while still getting a response. NATS describes request/reply as publish/subscribe with a “reply subject” (inbox) that responders use to reply. RabbitMQ’s RPC tutorial describes the callback queue approach.
```mermaid
sequenceDiagram
    autonumber
    participant Req as Requester
    participant Br as Broker
    participant Rep as Responder
    Req->>Br: Publish(Request, replyTo=inbox.X)
    Br-->>Rep: Deliver(Request)
    Rep->>Br: Publish(Response, to=inbox.X)
    Br-->>Req: Deliver(Response)
    Note over Req: Timeout + correlationId required
```
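The reply-subject and correlation-ID mechanics can be sketched with an in-memory stand-in for the broker. `InMemoryBroker`, `request`, and `serve_once` are hypothetical names for illustration only; real clients (such as the NATS and RabbitMQ libraries) wrap this logic for you.

```python
import queue
import threading
import time
import uuid
from typing import Callable, Dict


class InMemoryBroker:
    """Toy broker: one FIFO queue per subject (illustrative only)."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def queue_for(self, subject: str) -> queue.Queue:
        with self._lock:
            return self._queues.setdefault(subject, queue.Queue())

    def publish(self, subject: str, message: dict) -> None:
        self.queue_for(subject).put(message)


def request(broker: InMemoryBroker, subject: str, payload: dict,
            timeout_s: float) -> dict:
    """Publish a request and wait for the matching reply or time out."""
    reply_to = f"inbox.{uuid.uuid4().hex}"
    correlation_id = uuid.uuid4().hex
    inbox = broker.queue_for(reply_to)
    broker.publish(subject, {"reply_to": reply_to,
                             "correlation_id": correlation_id,
                             "payload": payload})
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no reply on {subject} within {timeout_s}s")
        try:
            reply = inbox.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no reply on {subject} within {timeout_s}s") from None
        if reply.get("correlation_id") == correlation_id:
            return reply["payload"]
        # Otherwise: a stale or foreign reply; ignore it and keep waiting.


def serve_once(broker: InMemoryBroker, subject: str,
               handler: Callable[[dict], dict]) -> None:
    """Handle one request and publish the reply to its reply subject."""
    msg = broker.queue_for(subject).get(timeout=1)
    broker.publish(msg["reply_to"], {"correlation_id": msg["correlation_id"],
                                     "payload": handler(msg["payload"])})
```

The timeout is what keeps the requester from waiting forever when no responder is running, and the correlation check guards against replies that arrive late for an earlier request.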
Transactional outbox for reliable event publication
Writing local state and publishing an event to a broker as two separate, non-atomic steps is the well-known “dual write” hazard: a crash between the two leaves the database and the broker disagreeing. The outbox pattern solves this by writing the outbound event to an outbox table in the same local database transaction, and then asynchronously relaying it to the broker (often via CDC). Debezium describes the outbox pattern explicitly as a way to reliably exchange data and avoid inconsistent state between a service’s database and downstream consumers.
```mermaid
flowchart LR
    Svc[Service] -->|local tx| DB[(DB + Outbox table)]
    DB -->|CDC / log tailing| Relay[Outbox Relay]
    Relay --> Broker[(Broker / Event Bus)]
    Broker --> Cons[Consumers]
    Note1[Idempotent consumers + schema governance] -.-> Cons
```
Microservices.io also frames the transactional outbox as the pattern for atomically updating state and sending messages/events, especially for saga participants.
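A minimal sketch of the outbox idea using SQLite from the Python standard library follows. The table layout and the `place_order`/`relay_once` helpers are assumptions made for illustration, and the relay here polls the table rather than using CDC, which is a simplification of the log-tailing approach described above.

```python
import json
import sqlite3
from typing import Callable


def init_schema(db: sqlite3.Connection) -> None:
    db.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(db: sqlite3.Connection, order_id: str) -> None:
    """Write business state and the outbound event in ONE transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')",
                   (order_id,))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("OrderPlaced", json.dumps({"order_id": order_id})))


def relay_once(db: sqlite3.Connection,
               publish: Callable[[str, dict], None]) -> int:
    """Publish unpublished outbox rows in order; return how many were sent.

    Publishing happens before the row is marked, so a crash in between
    causes a re-send: delivery is at-least-once and consumers must dedupe.
    """
    rows = db.execute(
        "SELECT seq, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq").fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # may raise; row stays pending
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

The ordering of "publish, then mark as published" is deliberate: the alternative (mark first) could lose an event on a crash, which is usually the worse failure.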
Trace context for distributed observability
OpenTelemetry defines context propagation as the core mechanism to correlate traces/metrics/logs across service boundaries.
The W3C Trace Context specification standardizes traceparent and tracestate headers for interoperability across tracing vendors.
For messaging, W3C has also drafted protocol-specific guidance (e.g., a trace context mapping for AMQP), which is relevant when you propagate trace context through brokers.
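To make the header format concrete, the sketch below parses and regenerates a `traceparent` value by hand. It handles only version `00`, ignores `tracestate`, and assumes new traces are marked sampled (`01`); `parse_traceparent` and `child_headers` are illustrative helpers, and in practice an OpenTelemetry propagator would do this work.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex trace-flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[SpanContext]:
    """Return the parsed context, or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(value)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid per the specification
    return SpanContext(trace_id, span_id, flags)


def child_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build outgoing headers (HTTP or message) that continue the trace."""
    parent = parse_traceparent(incoming.get("traceparent", ""))
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    flags = parent.flags if parent else "01"  # assumption: sample new traces
    span_id = secrets.token_hex(8)  # a new span ID for this hop
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
```

The same dictionary can be attached to an outgoing HTTP request or to broker message headers, which is how a synchronous hop and an asynchronous hop end up in one trace.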
Comparative analysis matrix
The table below compares the patterns themselves (not a specific broker configuration). “Delivery guarantees” and “ordering” are expressed as what the pattern typically implies and what is commonly required for correctness in real systems.
| Dimension | Request/response | Pub/sub | Observer (in-process) | Event sourcing | Saga |
|---|---|---|---|---|---|
| Message flow | Direct call + response; caller waits | Publish to broker; consumers receive asynchronously | Publisher pushes events to local subscribers | Commands append events; projections consume/replay | Sequence of local transactions with compensation |
| Coupling | High runtime + temporal coupling; schema coupling on API | Lower runtime coupling; schema coupling on event contracts | Tight in-process coupling if not modularized | Consumers couple to event schema + replay semantics | Participants couple via coordination protocol + compensations |
| Consistency model | Strong within a single service transaction; cross-service consistency requires additional coordination | Typically eventual consistency across services | Strong within process; depends on transaction scope | Event log is source of truth; projections are eventually consistent | Eventual consistency with compensations and “point of no return” |
| Latency profile | Low best-case; tail latency amplified in call chains | Producer fast; end-to-end latency includes queueing/consumer lag | Very low (in-process), can backpressure | Write path fast; read path depends on projection freshness | Higher: multi-step, includes retries/timeouts and compensation paths |
| Scalability | Scales via load balancing; bottlenecked by synchronous dependencies | Scales via partitions/queues/consumer groups | Scales by CPU threads; single-process bound | Scales by partitioned event streams + projection workers | Scales by parallel steps where possible; coordination overhead |
| Fault tolerance | Needs timeouts, retries, circuit breakers; cascading failures possible | Naturally buffers load; consumers can retry; poison messages possible | Process crash loses ephemeral events unless persisted | Replay + rebuild possible; event store becomes critical dependency | Explicit failure handling; compensation can fail; long-lived workflows |
| Delivery guarantees | “At-most-once per attempt”; retries can duplicate side effects unless idempotent | Depends on broker: often at-least-once; exactly-once rare end-to-end | In-process delivery; errors handled locally | Event append is durable; projections often at-least-once | Each step should be idempotent; compensation must be reliable |
| Ordering | Not guaranteed across concurrent calls; depends on client logic | Often per partition/stream/queue; not global | Typically deterministic in-process ordering | Total order per aggregate stream; global order not assumed | Order is defined by saga state machine / workflow |
| Idempotency needs | Required for safe retries; HTTP defines idempotent methods (PUT/DELETE, safe methods) | Strongly required: duplicates and replays are normal | Moderate (depends on retry/error handling) | Required for projection rebuild and at-least-once consumption | Required for forward retries and compensations |
| Transactional boundaries | Single-service DB transaction; distributed transactions discouraged | “Dual write” hazard; needs outbox/CDC for atomicity | Can share a DB transaction if designed | Event append is atomic; side effects need outbox/handlers | Local transactions only; global atomicity via compensation logic |
| Observability/monitoring | Distributed tracing via context headers; request metrics easy | Needs correlation across async hops; trace context in message headers | Local tracing; bridges to distributed tracing at edges | Need tooling for replay, projection lag, event schema | Needs end-to-end saga visibility; timeline and state tracking |
| Security implications | TLS + service authN/authZ; API gateways and mTLS common | Broker ACLs, encryption, client identity; secret management | In-process only; relies on service boundary security | Event store access is sensitive; encryption + ACLs | Orchestrator is high-privilege; must secure commands/events |
| Typical failure modes | Timeouts; retries causing duplication; cascading failures | Consumer lag; duplicates; reordering; poison messages; broker partition | Subscriber exceptions; blocking handlers; memory leaks | Bad event versioning; replay storms; projection inconsistency | Stuck sagas; compensation failures; split-brain outcomes |
Key grounded points behind the table:
- HTTP idempotent methods are defined in RFC 9110.
- Kafka ordering is per partition, not across partitions.
- NATS Core at-most-once and JetStream at-least-once are explicitly documented.
- RabbitMQ discusses consumer acknowledgements and publisher confirms as reliability primitives and clarifies ordering considerations (especially with streams/queues).
- Saga definition and compensation behavior are documented in the original saga literature and Microsoft’s saga pattern guidance.
- Event sourcing definition and replay rationale are articulated by Fowler and reinforced by microservices pattern catalogs.
Tooling and implementation mapping with concrete examples
Technology-to-pattern map
| Technology / Tool | Best-fit patterns | Notes on semantics and operational implications |
|---|---|---|
| Kafka | Pub/sub (event streaming), event sourcing substrate, async integration | Default delivery often at-least-once; exactly-once requires idempotence + transactions and careful consumer offset handling. Ordering is per partition. |
| RabbitMQ | Pub/sub (fanout/topic), work queues, request/reply over messaging | Reliability via publisher confirms + consumer acks. Ordering depends on queue/stream usage and consumer concurrency. |
| NATS Core | Pub/sub, request/reply | At-most-once by design; request/reply uses reply subjects (“inbox”). |
| NATS JetStream | Durable pub/sub, replay, persisted streams | Adds persistence and at-least-once; provides publish-side dedup (“exactly once” QoS) via message IDs. |
| gRPC | Request/response (unary), streaming, internal RPC | Runs over HTTP/2 framing; has standardized health checking and authentication guidance. |
| HTTP/REST | Request/response, webhooks (event notification) | REST constraints described by Fielding; idempotency semantics in RFC 9110. |
| Axon Framework | Event sourcing + CQRS, saga support | Provides event store abstractions and saga guidance/testing fixtures. |
| Temporal | Saga/workflow orchestration, durable retries and compensation | Provides workflow model and explicitly documents saga as a workflow pattern and testing capabilities. |
| Debezium Outbox | Transactional outbox + CDC relay | Outbox pattern avoids DB/event-bus inconsistency; can propagate trace context via outbox fields. |
| Confluent Schema Registry | Event contract governance | Schema evolution and compatibility modes formalize event versioning for Avro/JSON Schema/Protobuf. |
| CloudEvents | Standard event envelope | Defines interoperable event metadata and formats under CNCF. |
| OpenTelemetry + W3C Trace Context | Observability across patterns | Context propagation links traces across process/network boundaries; W3C standardizes headers; applies to both sync and async. |
Code-level examples
Kafka pub/sub with idempotent + transactional producer
Kafka’s Java producer API documents idempotent and transactional modes; transactional producers can send to multiple partitions/topics atomically, and idempotence strengthens delivery semantics by preventing duplicates from retries.
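```java
// Kafka Java: transactional publish (illustrative fragment; the statements belong inside a method)
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Idempotence + transactions (exact settings depend on your environment)
props.put("enable.idempotence", "true");
props.put("transactional.id", "orders-service-tx-1");
try (KafkaProducer<String, String> p = new KafkaProducer<>(props)) {
    p.initTransactions();
    try {
        p.beginTransaction();
        p.send(new ProducerRecord<>("orders.events", "order-123", "{...Event...}"));
        // If also consuming, you typically commit offsets as part of the same transaction.
        p.commitTransaction();
    } catch (KafkaException e) {
        // Simplified: fatal errors such as ProducerFencedException require closing the producer instead.
        p.abortTransaction();
        throw e;
    }
}
```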
RabbitMQ publisher confirms + consumer acknowledgements
RabbitMQ’s documentation treats publisher confirms and consumer acknowledgements as key safety features, directly aimed at preventing message loss in the presence of failures.
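```python
# RabbitMQ (pika 1.x) conceptual sketch: publisher confirms + manual acks
# Assumes `channel` is an open BlockingChannel and `handle_event` is your handler.
import pika
from pika.exceptions import NackError, UnroutableError

channel.confirm_delivery()  # publisher confirms mode
try:
    channel.basic_publish(
        exchange="events.topic",
        routing_key="order.created",
        body=b"...",
        # persistent message (depends on queue/exchange durability too)
        properties=pika.BasicProperties(delivery_mode=2),
        mandatory=True,  # report unroutable messages as UnroutableError
    )
except (NackError, UnroutableError) as exc:
    # In pika 1.x a failed confirm raises; basic_publish does not return a bool.
    raise RuntimeError("Publish was not confirmed by the broker") from exc


def on_message(ch, method, properties, body):
    try:
        handle_event(body)  # must be idempotent
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # requeue=True can loop forever on a poison message; consider dead-lettering.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
```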
NATS request/reply using inbox subjects
NATS documents request/reply as publish/subscribe using a reply subject (“inbox”), encapsulated by client libraries.
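```go
// NATS Go: request/reply sketch (assumes nc is a connected *nats.Conn)
msg, err := nc.Request("inventory.check", []byte(`{"sku":"A1","qty":2}`), 500*time.Millisecond)
if err != nil {
	// Timeout or no responders: apply the retry policy. msg is nil here.
	log.Printf("inventory.check failed: %v", err)
	return
}
fmt.Println("reply:", string(msg.Data))
```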
Temporal saga-style workflow with compensation
Temporal explicitly frames saga as a design pattern implemented by compensating actions in workflows and documents workflow concepts and testing frameworks (including time skipping).
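```typescript
// Temporal TypeScript: simplified saga-like workflow sketch
// Assumes a hypothetical ./activities module that exports the six activities
// used below, and a PlaceOrderInput type defined elsewhere in the project.
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

const { reserveInventory, releaseInventory, capturePayment, refundPayment, createShipment, cancelShipment } =
  proxyActivities<typeof activities>({ startToCloseTimeout: "1 minute" });

export async function placeOrderWorkflow(input: PlaceOrderInput) {
  const compensations: Array<() => Promise<void>> = [];
  try {
    await reserveInventory(input);
    compensations.push(() => releaseInventory(input));
    await capturePayment(input);
    compensations.push(() => refundPayment(input));
    await createShipment(input);
    compensations.push(() => cancelShipment(input));
    return { status: "CONFIRMED" };
  } catch (e) {
    // compensate in reverse order; a failing compensation stops the loop and fails the workflow
    for (const compensate of compensations.reverse()) {
      await compensate();
    }
    return { status: "FAILED", reason: String(e) };
  }
}
```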
Scenario-driven recommendations, migration, testing, and anti-patterns
When to choose each pattern
Choose request/response when:
- You need a fast user-visible response and the interaction is inherently synchronous (authentication, read queries requiring fresh data, short commands).
- You can keep the call chain shallow and enforce deadlines and idempotency. RFC 9110’s definition of idempotent methods is directly relevant when building retry policies and safe API designs.
Choose pub/sub when:
- You want to decouple producers and consumers, support fan-out, and handle bursty workloads via buffering.
- You can tolerate eventual consistency and are prepared to engineer around duplicates and ordering constraints (e.g., Kafka per-partition order).
Choose Observer (in-process) when:
- You need modular internal event handling within a service boundary (domain events, reactive pipelines).
- You want explicit backpressure and structured handling (e.g., reactive streams Publisher/Subscriber semantics).
Choose event sourcing when:
- The event log itself is valuable: audit trails, temporal queries (“what did we know when?”), replay/rebuild, and strong traceability.
- You are ready for projection management, schema governance, and operational practices around rebuilds and snapshots.
Choose sagas when:
- A business process spans multiple services/datastores and must converge on a consistent outcome without distributed locking.
- You can define compensating actions and accept eventual consistency as operational reality. This is precisely the motivation in the original saga paper and Microsoft’s saga guidance.
Hybrid approaches that work in practice
A common “mature” architecture uses:
- Request/response for queries + commands that need immediate feedback, but emits integration events via pub/sub for downstream reactions.
- Sagas for cross-service “order-like” workflows (order → payment → inventory → shipping), with pub/sub carrying events, and an orchestrator when complexity rises.
- Transactional outbox + CDC to make event publication reliable relative to database changes (avoid dual-write inconsistencies).
For event interoperability and debugging across such hybrids:
- Use CloudEvents as a common event envelope where it helps multi-platform integration. CloudEvents is a CNCF-hosted specification intended to standardize event data formats.
- Use W3C trace context fields and OpenTelemetry propagation so that both synchronous calls and asynchronous events can be stitched into a single trace narrative.
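As a rough illustration of the envelope idea, the sketch below builds a CloudEvents 1.0 structured-mode JSON object by hand. The four required context attributes (`specversion`, `id`, `source`, `type`) come from the specification; the helper names and the example type and source values are hypothetical, and the official SDKs would normally be used instead.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List

REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_cloudevent(event_type: str, source: str, data: Any) -> Dict[str, Any]:
    """Wrap a payload in a CloudEvents 1.0 envelope (JSON structured mode)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),       # unique per source
        "source": source,              # URI-reference identifying the producer
        "type": event_type,            # e.g. a reverse-DNS style event name
        "time": datetime.now(timezone.utc).isoformat(),  # optional attribute
        "datacontenttype": "application/json",           # optional attribute
        "data": data,
    }


def missing_required(event: Dict[str, Any]) -> List[str]:
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


# Example values below are placeholders, not a recommended naming scheme.
event = make_cloudevent("com.example.order.created", "/orders-service",
                        {"order_id": "order-123"})
encoded = json.dumps(event)
```

Because `id` plus `source` identifies an event, consumers can reuse that pair as a deduplication key.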
Migration strategies
Migrating from synchronous-only to event-driven:
- Start by emitting domain/integration events alongside existing APIs. Publishing as a separate step next to the database update introduces dual-write risk, so adopt the outbox pattern early to keep database state and emitted events consistent.
- Introduce consumers gradually, focusing on “side effects” first (notifications, analytics, search indexing) before migrating core business invariants.
- Add schema governance (Schema Registry or equivalent) before events become widely consumed; formal compatibility and versioning rules are essential as teams scale.
Migrating to sagas:
- Identify distributed transaction boundaries and separate them into local transactions with compensations.
- Prefer orchestration when debugging complexity and cross-team ownership make choreography fragile (a concern frequently surfaced in saga guidance and practical patterns).
Adopting event sourcing incrementally:
- Begin with event capture for audit/history while still maintaining current-state tables, then evolve toward projections as read models.
- Plan for snapshotting and replay tooling; event-sourced aggregates can accumulate large event histories.
Testing strategies aligned to the patterns
For request/response APIs:
- Use consumer-driven contract testing to prevent breaking changes. Pact is explicitly designed as a consumer-driven contract testing tool where consumer tests generate contracts.
- Spring Cloud Contract provides an approach and documentation for consumer-driven contracts in JVM ecosystems.
For pub/sub and event-driven integration:
- Treat event schemas as contracts and enforce compatibility with schema tooling (e.g., Schema Registry compatibility checks).
- Use ephemeral real dependencies in tests to catch broker-specific behaviors. Testcontainers provides disposable instances of dependencies (including Kafka) and documents Kafka modules explicitly.
For sagas/workflows:
- Prefer deterministic workflow tests and time-skipping test environments when supported. Temporal documents testing frameworks (e.g., Java TestWorkflowEnvironment with time skipping) to test long-running workflows quickly.
Across all patterns:
- Make idempotency testable: inject duplicates, reorderings, and retries in tests to verify handlers can safely reprocess messages or requests.
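One way to make this testable is a small harness that replays the same updates with injected duplicates and shuffled order, then checks that the final state is unchanged. The sketch below assumes state-replacement updates that carry a per-key `version`; the handler and harness names are hypothetical, and the approach does not apply directly to non-commutative operations such as increments.

```python
import random
from typing import Dict, List


def apply_update(state: Dict[str, dict], update: dict) -> Dict[str, dict]:
    """Last-writer-wins by version: stale or duplicate updates are no-ops."""
    current = state.get(update["key"])
    if current is not None and current["version"] >= update["version"]:
        return state
    new_state = dict(state)
    new_state[update["key"]] = {"version": update["version"],
                                "value": update["value"]}
    return new_state


def final_state(updates: List[dict]) -> Dict[str, dict]:
    state: Dict[str, dict] = {}
    for update in updates:
        state = apply_update(state, update)
    return state


def survives_chaos(updates: List[dict], trials: int = 200, seed: int = 0) -> bool:
    """Check that duplicates and reordering never change the outcome."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    expected = final_state(updates)
    for _ in range(trials):
        noisy = updates + rng.choices(updates, k=len(updates))  # duplicates
        rng.shuffle(noisy)                                      # reordering
        if final_state(noisy) != expected:
            return False
    return True
```

A handler that blindly overwrites state without the version check would fail this harness, which is the kind of regression the test is meant to catch.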
Anti-patterns and failure modes to actively avoid
Distributed monolith via synchronous call chains:
- Too many synchronous hops amplify tail latency and make outages contagious. The failure mode is “everything waits on everything.” This is structurally inherent to request/response temporal coupling.
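The amplification can be made concrete with a back-of-the-envelope calculation. The sketch assumes calls fail or run slow independently of each other, which real systems often violate, so treat the numbers as illustrative: with ten sequential calls, each exceeding its p99 latency 1% of the time, roughly 9.6% of requests hit at least one slow call, and ten dependencies at 99.9% availability give about 99.0% for the chain.

```python
def p_any_slow(p_slow_per_call: float, calls: int) -> float:
    """Probability that at least one of `calls` independent calls is slow."""
    return 1 - (1 - p_slow_per_call) ** calls


def chain_availability(per_service: float, hops: int) -> float:
    """Availability of a request that needs every one of `hops` services."""
    return per_service ** hops


# Ten sequential hops, each slow 1% of the time and available 99.9% of the time.
slow_fraction = p_any_slow(0.01, 10)
availability = chain_availability(0.999, 10)
```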
“Fire-and-forget” events for critical business steps without durability:
- Using best-effort pub/sub for state-changing workflows leads to silent data loss. NATS Core explicitly notes at-most-once delivery and that offline subscribers will not receive messages.
Assuming global ordering:
- Kafka provides total order only within a partition; if you need total ordering you must use a single partition (with scalability costs) or redesign invariants.
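A common redesign is to require order only per business key and route each key to one partition. The sketch below illustrates the idea with CRC32 as a stand-in hash; this is an assumption for illustration, since Kafka's default partitioner uses a different hash (murmur2), and the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; the same key always maps to the same one."""
    # CRC32 is a stand-in hash chosen because it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Tuple[str, str]],
          num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (key, payload) event to its key's partition, in order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions
```

Events for one order stay in publish order within their partition, while events for different orders may be interleaved arbitrarily across partitions. Changing the partition count remaps keys, so per-key ordering across a repartition needs separate handling.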
Ignoring idempotency:
- Kafka retries can create duplicates without idempotent/transactional modes; RabbitMQ redelivery can duplicate; JetStream at-least-once implies duplicates on redelivery. Correctness requires idempotent consumers and/or dedup keys.
No outbox/CDC for dual writes:
- Writing DB state and publishing an event in separate steps is a classic source of inconsistency; outbox guidance exists precisely to avoid that mismatch.
Saga without observable state and compensation discipline:
- If sagas aren’t observable (timeline/state), you get “stuck” business processes. Microsoft explicitly highlights debugging complexity challenges in sagas as the number of services grows.
Security treated as an afterthought:
- Brokers and service-to-service links require encryption and strong identity. Kafka provides authorization and ACL guidance; RabbitMQ documents TLS support; NATS documents mutual TLS; gRPC has authentication guidance; TLS 1.3 is defined in RFC 8446.