In distributed systems, there is a fundamental limitation, often illustrated by the Two Generals’ Problem: it is impossible to guarantee exactly-once delivery of messages over an unreliable network. Acknowledgments get lost, connections time out, and retries are inevitable.
Because we cannot prevent duplicate message delivery, we must design systems that can withstand it. The goal, therefore, is not exactly-once delivery, but exactly-once processing.
1. The Core Mechanism: Idempotency
To achieve exactly-once processing, operations must be idempotent - meaning the result of performing the operation once is the same as performing it multiple times.
The standard pattern for this is the Idempotency Key.
- Tag: The producer assigns a unique key to every message.
- Check: Upon receipt, the consumer checks if this key has already been processed.
- Act:
  - If seen: Discard the message (or return the previous result).
  - If new: Process the message.
The Atomicity Requirement
A critical failure mode discussed in system design is the “check-then-act” race condition.
For this pattern to work, the processing of the business logic and the recording of the idempotency key must happen atomically.
- Correct: Wrap the state change (e.g., `INSERT INTO orders`) and the key storage (`INSERT INTO processed_keys`) in a single ACID database transaction.
- Failure Mode: If you process the order, commit, and then try to save the key, a crash in between results in a duplicate order upon retry.
2. Choosing the Right Key Strategy
The “best” key depends entirely on your throughput, storage capacity, and producer architecture.
A. Random Identifiers (UUIDv4)
The producer generates a standard random UUID for every message.
- Pros: Stateless; producers don’t need to coordinate; trivial to implement.
- Cons: Infinite Storage Growth. To guarantee uniqueness, the consumer must store every UUID ever received.
- Mitigation: Use UUIDv7 or ULID. These embed a timestamp in the identifier. The consumer can then enforce a “retention window” (e.g., “reject any key older than 7 days”). While this technically breaks strict exactly-once guarantees for very old duplicates, it is a pragmatic tradeoff for most systems.
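The retention-window check can be sketched as follows. UUIDv7 stores a 48-bit Unix timestamp in milliseconds in its most significant bits, so the consumer can read the key's age directly from the identifier. The `make_uuid7` helper is a hand-rolled generator included only to keep the example self-contained, and the function names and the 7-day constant are assumptions for illustration:

```python
import os
import time
import uuid
from typing import Optional

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 7-day window, matching the example above


def make_uuid7(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7 by hand: 48-bit Unix ms timestamp followed by random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    ts = now_ms & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (ts << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)   # set version field to 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # set variant bits to 0b10
    return uuid.UUID(int=value)


def timestamp_ms(key: uuid.UUID) -> int:
    """Extract the embedded millisecond timestamp from a UUIDv7."""
    return key.int >> 80


def within_retention(key: uuid.UUID, now_ms: int) -> bool:
    """True if the key is recent enough that its dedup record is still stored."""
    return now_ms - timestamp_ms(key) <= RETENTION_MS
```

A consumer built this way would reject keys that fail `within_retention` outright and could safely delete stored keys once they age past the same window.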
B. Deterministic Content Hashing (UUIDv5)
Instead of a random ID, the key is derived from the message content itself (a hash of the namespace + payload).
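- Pros: If a producer unknowingly sends the same logical request twice, the key is identical. Producers stay stateless: they can regenerate the key after a crash without having stored it. The consumer still has to remember which keys it has processed.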
- Cons: False Positives. If a user legitimately wants to buy the same item twice in a row, a content-hash key might incorrectly reject the second purchase.
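- Best Practice: Hash the intent (e.g., `hash(cart_id + timestamp)`), not just the payload. The timestamp must be fixed when the request is first created and reused on every retry; otherwise each retry produces a new key.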
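A short sketch of the difference between hashing the payload and hashing the intent, using the standard library's `uuid.uuid5`. The namespace value, the `|` separator, and the function names are assumptions chosen for illustration:

```python
import uuid

# Hypothetical application namespace; any fixed UUID would do.
ORDER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def payload_key(payload: str) -> uuid.UUID:
    """Key derived from content only: identical payloads always collide."""
    return uuid.uuid5(ORDER_NAMESPACE, payload)


def intent_key(cart_id: str, created_at: str) -> uuid.UUID:
    """Key derived from the intent: the cart plus the time the request was created.

    created_at is captured once and reused for retries of the same request.
    """
    return uuid.uuid5(ORDER_NAMESPACE, f"{cart_id}|{created_at}")
```

With `payload_key`, two deliberate purchases of the same item are indistinguishable. With `intent_key`, a retry of one checkout maps to the same key while a second checkout gets a different one.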
C. Monotonic Sequences (The “High Watermark”)
The producer uses strictly increasing integers (1, 2, 3…).
- Pros: O(1) Storage. The consumer only needs to store the highest ID seen (the “High Watermark”). If an incoming ID is less than or equal to the watermark, it’s a duplicate.
- Cons: Hard to generate. Producing strictly monotonic numbers in a distributed, multi-threaded environment creates a bottleneck.
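The consumer side of the high-watermark approach fits in a few lines. This is an in-memory sketch with a hypothetical class name; a real consumer would persist the watermark in the same transaction as the business change:

```python
class WatermarkConsumer:
    """Deduplicates by remembering only the highest message ID seen so far."""

    def __init__(self) -> None:
        self.watermark = 0      # highest ID processed; IDs are assumed to start at 1
        self.processed = []     # stands in for the real business side effect

    def receive(self, msg_id: int) -> bool:
        """Process the message unless its ID is at or below the watermark."""
        if msg_id <= self.watermark:
            return False        # duplicate, or a message that arrived too late
        self.processed.append(msg_id)
        self.watermark = msg_id
        return True
```

Note that the check cannot tell a true duplicate from a message that was merely delivered late, which is why this scheme depends on in-order delivery.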
3. Solving the Producer Concurrency Problem
If you choose Monotonic Sequences for their consumer efficiency, you must solve the producer bottleneck. If Thread A takes ID 100 and Thread B takes ID 101, but B finishes first, the consumer will see 101 and set the watermark. When 100 arrives later, it is incorrectly dropped.
Here are three ways to solve this:
The Hi/Lo Algorithm
To avoid hitting the database for every sequence number, use the Hi/Lo Algorithm:
- Hi: The database provides a “block” of IDs (e.g., 1000 at a time) to a producer instance.
- Lo: The producer increments IDs within that block in memory.
This reduces database contention significantly (1 request per 1000 messages) while maintaining uniqueness, though strict monotonicity across multiple producers requires careful partition management.
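A minimal Hi/Lo sketch, with an in-memory `BlockSource` standing in for the database sequence. Both class names are hypothetical, and in practice `next_block` would be a single database round trip:

```python
import threading


class BlockSource:
    """Stand-in for the database: hands out non-overlapping blocks of IDs."""

    def __init__(self, block_size: int = 1000) -> None:
        self.block_size = block_size
        self.requests = 0                 # counts simulated database round trips
        self._next_hi = 0
        self._lock = threading.Lock()

    def next_block(self) -> range:
        with self._lock:
            start = self._next_hi * self.block_size + 1
            self._next_hi += 1
            self.requests += 1
            return range(start, start + self.block_size)


class HiLoGenerator:
    """One per producer instance: serves IDs from its current block in memory."""

    def __init__(self, source: BlockSource) -> None:
        self._source = source
        self._block = iter(())            # empty until the first request
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._block)
            except StopIteration:
                # Block exhausted: fetch a new "hi" range from the source.
                self._block = iter(self._source.next_block())
                return next(self._block)
```

Each generator's IDs are unique and increasing, but IDs from different generators interleave (one instance may emit 4 before another emits 2), which is the cross-producer monotonicity caveat mentioned above.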
Log-Based Change Data Capture (CDC)
Instead of generating IDs in the application layer, use the database’s Transaction Log.
- Outbox Pattern: The producer writes the message intent to an `outbox` table in the same transaction as the business logic.
- Derive Key: A CDC tool (like Debezium) reads the database Write-Ahead Log (WAL).
- Composite Key: In PostgreSQL, the Log Sequence Number (LSN) is monotonic. The idempotency key becomes {Commit LSN, Event LSN}.
This allows you to “have your cake and eat it too”: high-throughput production with monotonic keys for the consumer.
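On the consumer side, the composite key can be compared as an ordered pair. The sketch below assumes the LSNs arrive in PostgreSQL's textual form (two hexadecimal numbers separated by a slash, such as `16/B374D848`) and that events are delivered in commit order; the class and function names are hypothetical and the CDC tool's actual event format is not modeled:

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class LsnDeduplicator:
    """High-watermark deduplication keyed on (commit LSN, event LSN)."""

    def __init__(self) -> None:
        self.watermark = (0, 0)

    def accept(self, commit_lsn: str, event_lsn: str) -> bool:
        key = (parse_lsn(commit_lsn), parse_lsn(event_lsn))
        # Tuples compare element by element: commit LSN first, then event LSN.
        if key <= self.watermark:
            return False
        self.watermark = key
        return True
```

Ordering by commit LSN first keeps events from one transaction together, and the event LSN then orders the changes within it.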
Single-Threaded Partitioning
Tools like Kafka handle this by serializing messages per partition. The “Offset” acts as a naturally monotonic idempotency key. This shifts the complexity from the database to the message broker infrastructure.
4. The “Side Effect” Trap
The atomic transaction model works for database updates. But what if your message processing involves calling an external API (e.g., Stripe, Salesforce)? You cannot roll back a REST call inside a database transaction.
If the DB transaction rolls back but the API call succeeded, you have created a phantom state.
- Solution 1: Idempotency Propagation. Pass your idempotency key to the downstream service (e.g., Stripe accepts an `Idempotency-Key` header).
- Solution 2: The Saga Pattern. Break the transaction into steps. If the local DB commit fails, trigger a “compensating transaction” (e.g., a refund) to undo the external side effect.
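To illustrate idempotency propagation (Solution 1) without calling a real service, the sketch below uses an in-memory `FakePaymentGateway` that behaves the way a key-aware API is assumed to: a repeated key returns the stored result instead of charging again. The gateway, its methods, and `process_with_retry` are all hypothetical stand-ins:

```python
class FakePaymentGateway:
    """In-memory stand-in for an external API that honors idempotency keys."""

    def __init__(self) -> None:
        self.charges = 0          # number of real charges performed
        self._results = {}        # idempotency key -> stored response

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Same key seen before: replay the earlier response, charge nothing.
            return self._results[idempotency_key]
        self.charges += 1
        result = {"charge_id": f"ch_{self.charges}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


def process_with_retry(gateway, key, amount_cents, commit_locally, attempts=3):
    """Charge, then commit locally; on a failed commit, retry with the same key."""
    for _ in range(attempts):
        charge = gateway.charge(key, amount_cents)
        try:
            commit_locally(charge)
            return charge
        except RuntimeError:
            continue              # the retry reuses the key, so no second charge
    raise RuntimeError("local commit kept failing")
```

Because the key travels with every attempt, a rollback on the local side followed by a retry converges on one external charge rather than leaving a phantom one behind.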
Summary Comparison
| Strategy | Storage Cost (Consumer) | Implementation Complexity | Best Use Case |
|---|---|---|---|
| UUIDv4 | High (Index everything) | Low | Low-to-medium volume; simple setups. |
| UUIDv7/ULID | Medium (Prunable index) | Low | High volume where “retention windows” are acceptable. |
| UUIDv5 | High (seen keys must still be stored; no state needed on the producer) | Medium | Content-based deduplication; careful regarding false positives. |
| Monotonic / CDC | Very Low (High Watermark) | High (Requires CDC or complex producing) | Massive scale; systems requiring strict ordering; Kafka consumers. |
While TCP guarantees in-order delivery of bytes within a single connection, it cannot solve application-level duplicates caused by reconnects, retries, and crash-recovery cycles. Whether you use UUIDs with a TTL or complex CDC pipelines, the principle remains: assume the network will lie to you, and trust only your persisted state.