In distributed systems, there is a fundamental axiom often derived from the Two Generals’ Problem: it is mathematically impossible to guarantee exactly-once delivery of messages over an unreliable network. Acknowledgments get lost, connections time out, and retries are inevitable.
Because we cannot prevent duplicate message delivery, we must design systems that can withstand it. The goal, therefore, is not exactly-once delivery, but exactly-once processing.
1. The Core Mechanism: Idempotency
To achieve exactly-once processing, operations must be idempotent - meaning the result of performing the operation once is the same as performing it multiple times.
The standard pattern for this is the Idempotency Key.
- Tag: The producer assigns a unique key to every message.
- Check: Upon receipt, the consumer checks if this key has already been processed.
- Act:
  - If seen: Discard the message (or return the previous result).
  - If new: Process the message.
The Atomicity Requirement
A critical failure mode discussed in system design is the “check-then-act” race condition.
For this pattern to work, the processing of the business logic and the recording of the idempotency key must happen atomically.
- Correct: Wrap the state change (e.g., `INSERT INTO orders`) and the key storage (`INSERT INTO processed_keys`) in a single ACID database transaction (see the sketch below).
- Failure Mode: If you process the order, commit, and then try to save the key, a crash in between results in a duplicate order upon retry.
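A minimal sketch of that transaction, using Python's built-in sqlite3 and illustrative table names (`processed_keys`, `orders`): the key insert and the business insert commit together or not at all.

```python
import sqlite3

def process_order(conn: sqlite3.Connection, idempotency_key: str, payload: str) -> None:
    """Exactly-once processing: key check, business write, and key recording in one transaction."""
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK if anything below raises
            # The primary-key constraint on processed_keys makes the duplicate check
            # atomic with the key insert itself (no check-then-act window).
            conn.execute("INSERT INTO processed_keys (key) VALUES (?)", (idempotency_key,))
            conn.execute("INSERT INTO orders (payload) VALUES (?)", (payload,))
    except sqlite3.IntegrityError:
        pass  # key already recorded: duplicate delivery, safely discarded

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_keys (key TEXT PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT);
""")
process_order(conn, "key-123", "2x coffee beans")
process_order(conn, "key-123", "2x coffee beans")  # retry of the same message
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
```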
2. Choosing the Right Key Strategy
The “best” key depends entirely on your throughput, storage indexes, and producer architecture.
A. Random Identifiers (UUIDv4)
The producer generates a standard random UUID for every message.
- Pros: Stateless; producers don’t need to coordinate; trivial to implement.
- Cons: Unbounded Storage Growth. To guarantee uniqueness, the consumer must store every UUID ever received.
- Mitigation: Use UUIDv7 or ULID. These embed a timestamp in the identifier. The consumer can then enforce a “retention window” (e.g., “reject any key older than 7 days”). While this technically breaks strict exactly-once guarantees for very old duplicates, it is a pragmatic tradeoff for most systems.
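UUIDv7 places a 48-bit Unix-millisecond timestamp in its first six bytes, so a consumer can parse it and enforce the retention window. A rough sketch (the 7-day window and the helper name are illustrative):

```python
import time
import uuid

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # "reject any key older than 7 days"

def within_retention_window(key: uuid.UUID) -> bool:
    # The first 48 bits of a UUIDv7 are the Unix timestamp in milliseconds.
    created_ms = int.from_bytes(key.bytes[:6], "big")
    return (time.time() * 1000) - created_ms <= RETENTION_MS
```

Keys that fail the check can be rejected outright, and rows older than the window pruned from the key index on the same schedule.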
B. Deterministic Content Hashing (UUIDv5)
Instead of a random ID, the key is derived from the message content itself (a hash of the namespace + payload).
- Pros: If a producer unknowingly sends the same logical request twice, the key is identical. It enables “stateless” deduplication.
- Cons: False Positives. If a user legitimately wants to buy the same item twice in a row, a content-hash key might incorrectly reject the second purchase.
- Best Practice: Hash the intent (e.g., `hash(cart_id + timestamp)`), not just the payload.
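A sketch using Python's `uuid.uuid5`; the namespace UUID and the field names (`cart_id`, a client-supplied timestamp) are assumptions for illustration:

```python
import uuid

# Fixed, application-specific namespace; any constant UUID will do.
ORDERS_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")

def content_key(cart_id: str, client_timestamp: str) -> uuid.UUID:
    # Hash the *intent* (which cart, and when the user acted), not the raw payload,
    # so a genuine repeat purchase still gets a fresh key.
    return uuid.uuid5(ORDERS_NS, f"{cart_id}:{client_timestamp}")

# Duplicate deliveries of the same logical request collapse to one key:
assert content_key("cart-42", "2024-05-01T10:00:00Z") == content_key("cart-42", "2024-05-01T10:00:00Z")
# A later, intentional second purchase produces a different key:
assert content_key("cart-42", "2024-05-01T10:05:00Z") != content_key("cart-42", "2024-05-01T10:00:00Z")
```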
C. Monotonic Sequences (The “High Watermark”)
The producer uses strictly increasing integers (1, 2, 3…).
- Pros: O(1) Storage. The consumer only needs to store the highest ID seen (the “High Watermark”). If an incoming ID is less than or equal to the watermark, it’s a duplicate.
- Cons: Hard to generate. Producing strictly monotonic numbers in a distributed, multi-threaded environment creates a bottleneck.
3. Solving the Producer Concurrency Problem
If you choose Monotonic Sequences for their consumer efficiency, you must solve the producer bottleneck. If Thread A takes ID 100 and Thread B takes ID 101, but B finishes first, the consumer will see 101 and set the watermark. When 100 arrives later, it is incorrectly dropped.
Here are three ways to solve this:
The Hi/Lo Algorithm
To avoid hitting the database for every sequence number, use the Hi/Lo Algorithm:
- Hi: The database provides a “block” of IDs (e.g., 1000 at a time) to a producer instance.
- Lo: The producer increments IDs within that block in memory.
This reduces database contention significantly (1 request per 1000 messages) while maintaining uniqueness, though strict monotonicity across multiple producers requires careful partition management.
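A sketch of a Hi/Lo generator. The `fetch_next_hi` callback stands in for a database sequence (one round-trip per block), and a lock keeps the in-memory `lo` counter safe across threads:

```python
import threading

BLOCK_SIZE = 1000  # "Lo" IDs handed out per database round-trip

class HiLoGenerator:
    def __init__(self, fetch_next_hi):
        self._fetch_next_hi = fetch_next_hi  # e.g., increments and returns a DB sequence value
        self._lock = threading.Lock()
        self._hi = 0
        self._lo = BLOCK_SIZE  # forces a block fetch on first use

    def next_id(self) -> int:
        with self._lock:
            if self._lo >= BLOCK_SIZE:
                self._hi = self._fetch_next_hi()  # 1 DB call per 1000 messages
                self._lo = 0
            self._lo += 1
            return self._hi * BLOCK_SIZE + self._lo

# In-memory stand-in for the database sequence, for demonstration only.
_seq = iter(range(1, 1_000))
gen = HiLoGenerator(fetch_next_hi=lambda: next(_seq))
print(gen.next_id(), gen.next_id())  # 1001, 1002 (both from block hi=1)
```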
Log-Based Change Data Capture (CDC)
Instead of generating IDs in the application layer, use the database’s Transaction Log.
- Outbox Pattern: The producer writes the message intent to an `outbox` table in the same transaction as the business logic.
- Derive Key: A CDC tool (like Debezium) reads the database Write-Ahead Log (WAL).
- Composite Key: In PostgreSQL, the Log Sequence Number (LSN) is monotonic. The idempotency key becomes {Commit LSN, Event LSN}.
This allows you to “have your cake and eat it too”—high throughput production with monotonic keys for the consumer.
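A sketch of the outbox write (sqlite3 again, schema illustrative): the business row and the outbox row share one transaction, and a CDC tool such as Debezium later publishes the outbox rows, keyed by their position in the log.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, payload TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, aggregate TEXT, aggregate_id INTEGER,
                         event_type TEXT, body TEXT);
""")

def place_order(customer_id: int, payload: dict) -> None:
    # Business state and message intent commit (or roll back) together.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, payload) VALUES (?, ?)",
            (customer_id, json.dumps(payload)),
        )
        conn.execute(
            "INSERT INTO outbox (aggregate, aggregate_id, event_type, body) VALUES (?, ?, ?, ?)",
            ("order", cur.lastrowid, "OrderPlaced", json.dumps(payload)),
        )

place_order(7, {"item": "coffee beans", "qty": 2})
```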
Single-Threaded Partitioning
Tools like Kafka handle this by serializing messages per partition. The “Offset” acts as a naturally monotonic idempotency key. This shifts the complexity from the database to the message broker infrastructure.
4. The “Side Effect” Trap
The atomic transaction model works for database updates. But what if your message processing involves calling an external API (e.g., Stripe, Salesforce)? You cannot roll back a REST call inside a database transaction.
If the DB transaction rolls back but the API call succeeded, you have created a phantom state.
- Solution 1: Idempotency Propagation. Pass your idempotency key to the downstream service (e.g., Stripe accepts an `Idempotency-Key` header); see the sketch below.
- Solution 2: The Saga Pattern. Break the transaction into steps. If the local DB commit fails, trigger a “compensating transaction” (e.g., a refund) to undo the external side effect.
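A sketch of Solution 1 with the third-party `requests` library; the endpoint and form fields follow Stripe's public charges API, but treat the details as illustrative.

```python
import uuid
import requests

def create_charge(api_key: str, amount_cents: int, currency: str, source: str,
                  idempotency_key: str) -> requests.Response:
    # Propagating our key downstream means a retried call cannot double-charge.
    return requests.post(
        "https://api.stripe.com/v1/charges",
        headers={"Authorization": f"Bearer {api_key}",
                 "Idempotency-Key": idempotency_key},
        data={"amount": amount_cents, "currency": currency, "source": source},
    )

key = str(uuid.uuid4())                                              # generated once per logical operation
create_charge("sk_test_placeholder", 1000, "usd", "tok_visa", key)
create_charge("sk_test_placeholder", 1000, "usd", "tok_visa", key)   # retry: same key, one charge
```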
Summary Comparison
| Strategy | Storage Cost (Consumer) | Implementation Complexity | Best Use Case |
|---|---|---|---|
| UUIDv4 | High (Index everything) | Low | Low-to-medium volume; simple setups. |
| UUIDv7/ULID | Medium (Prunable index) | Low | High volume where “retention windows” are acceptable. |
| UUIDv5 | Zero (Deterministic) | Medium | Content-based deduplication; watch for false positives. |
| Monotonic / CDC | Very Low (High Watermark) | High (Requires CDC or complex producing) | Massive scale; systems requiring strict ordering; Kafka consumers. |
Note
While TCP guarantees packet ordering at the transport layer, it cannot solve application-level duplicates caused by crash-recovery cycles. Whether you use UUIDs with a TTL or complex CDC pipelines, the principle remains: assume the network will lie to you, and trust only your persisted state.
Implementation gotchas
1. The Danger of “Natural” or Business Idempotency Keys
A common mistake is to derive the idempotency key from business data (e.g., user_id + product_id) instead of requiring a random UUID. While this feels “cleaner,” it introduces a major risk: Semantic Drift.
The Collision Problem
Imagine a subscription service where the key is generated as membership_id + month.
- The Intent: Prevent charging a user twice for the same month.
- The Change: The business introduces “Add-on Packs” that can be purchased multiple times a month.
- The Failure: The old idempotency logic sees the same key and rejects the second purchase as a duplicate, even though it’s a valid, separate transaction.
Strategy: Versioning and Intent Scoping
If you cannot use opaque UUIDs and must rely on business data, you must include a Version or Intent prefix in your hashing logic.
Rule of Thumb: If the definition of a “unique action” changes, the key generation algorithm must change too.
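One way to apply the rule, sketched with hypothetical field names: prefix the hash input with an intent label and a scheme version, so that redefining what counts as a unique action changes every generated key.

```python
import hashlib

KEY_SCHEME_VERSION = "v2"  # bump whenever the definition of a "unique action" changes

def business_key(intent: str, membership_id: str, month: str) -> str:
    # "intent" separates e.g. the monthly subscription charge from an add-on purchase,
    # so the new add-on packs never collide with the subscription's key.
    raw = f"{KEY_SCHEME_VERSION}|{intent}|{membership_id}|{month}"
    return hashlib.sha256(raw.encode()).hexdigest()

assert business_key("monthly-subscription", "m-17", "2024-06") != business_key("addon-purchase", "m-17", "2024-06")
```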
2. The “Payload Mismatch” Trap
This is the most frequent implementation bug. It occurs when a client reuses an Idempotency Key but changes the request parameters.
- The Scenario:
  - Client sends `POST /transfer { amount: 100 }` with Key: `uuid-1`.
  - Server processes it and saves the result.
  - Client (due to a bug or malice) sends `POST /transfer { amount: 9000 }` with Key: `uuid-1`.
- The Gotcha: A naive implementation just checks “Does `uuid-1` exist?” It sees “Yes, status: COMPLETED” and returns the saved success response from the $100 transfer.
  - Result: The client thinks they successfully transferred $9,000, but only $100 actually moved.
- The Fix: You must store a hash (checksum) of the request body alongside the key. If the key exists but the hash doesn’t match the current request, throw a 422 Unprocessable Entity or 409 Conflict.
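A sketch of the key-plus-checksum check; an in-memory dict stands in for the idempotency table, and `process` is whatever handler performs the transfer.

```python
import hashlib
import json

_records: dict[str, tuple[str, dict]] = {}  # key -> (request body hash, saved response)

def handle(idempotency_key: str, body: dict, process) -> tuple[int, dict]:
    body_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    if idempotency_key in _records:
        saved_hash, saved_response = _records[idempotency_key]
        if saved_hash != body_hash:
            # Same key, different payload: refuse instead of replaying the old result.
            return 422, {"error": "idempotency key reused with a different request body"}
        return 200, saved_response  # genuine retry: replay the original response
    response = process(body)
    _records[idempotency_key] = (body_hash, response)
    return 201, response

handle("uuid-1", {"amount": 100}, lambda b: {"transferred": b["amount"]})
print(handle("uuid-1", {"amount": 9000}, lambda b: {"transferred": b["amount"]}))  # -> (422, ...)
```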
3. The “Burned Key” on Failure
Deciding what to do when the first attempt fails is tricky.
- The Scenario: Client sends a request. The database is temporarily down. The server catches the exception and records the key status as `FAILED`.
- The Gotcha: The client retries the request (as they should for a 500 error). The server sees the key exists with status `FAILED` and returns the error again—forever. You have effectively “burned” the key on a transient error.
- The Fix:
  - Transient Errors (Network/DB Connection): Do not save the key, or roll back the transaction entirely so the key is never persisted. Allow the retry to proceed as a fresh request.
  - Terminal Errors (Validation, Business Logic): Save the key as `FAILED`. If the client retries, they should get the same validation error.
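A sketch of that branching; the exception classes and the `store` interface are illustrative, not a specific framework.

```python
class TransientError(Exception):
    """Infrastructure hiccup (DB connection, network timeout): retrying may succeed."""

class TerminalError(Exception):
    """Validation or business-rule failure: retrying will never succeed."""

def handle(key: str, request, store, process):
    try:
        result = process(request)
    except TransientError:
        # Do NOT persist the key; return a 500 so the retry arrives as a fresh request.
        return 500, {"error": "temporary failure, please retry"}
    except TerminalError as exc:
        # Persist the key as FAILED so every retry gets the same deterministic answer.
        store.save(key, status="FAILED", response={"error": str(exc)})
        return 400, {"error": str(exc)}
    store.save(key, status="COMPLETED", response=result)
    return 201, result
```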
4. Namespace Collisions (Data Leaks)
This is a critical security vulnerability.
- The Scenario: You rely solely on the `Idempotency-Key` header for uniqueness.
  - User A sends Key: `order-1`.
  - User B (malicious or accidental) sends Key: `order-1`.
- The Gotcha: The server sees `order-1` is completed and returns the cached response. User B just received User A’s order confirmation details, including potential PII.
- The Fix: Never use the Idempotency Key as the global primary key. The composite primary key must be `{user_id, idempotency_key}`. User B cannot access User A’s keys.
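A sketch of the scoping in SQL, run here through sqlite3 (table and column names illustrative): the key is only unique within a user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idempotency_records (
        user_id         TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        response_body   TEXT,
        PRIMARY KEY (user_id, idempotency_key)  -- scoped per user, never global
    );
""")

def cached_response(user_id: str, key: str):
    row = conn.execute(
        "SELECT response_body FROM idempotency_records WHERE user_id = ? AND idempotency_key = ?",
        (user_id, key),
    ).fetchone()
    return row[0] if row else None

conn.execute("INSERT INTO idempotency_records VALUES (?, ?, ?)",
             ("user-a", "order-1", "user A's confirmation"))
assert cached_response("user-b", "order-1") is None  # User B cannot see User A's response
```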
5. The “Zombie Worker” (TTL Exhaustion)
This applies if you use Redis/Memcached locks instead of database transactions.
- The Scenario:
  - Worker A locks Key `X` with a 30-second TTL (Time-To-Live).
  - Worker A gets stuck in a Garbage Collection pause or slow network call for 35 seconds.
  - Redis expires the lock.
  - Worker B picks up the retry, locks Key `X`, and starts processing.
  - Worker A wakes up and finishes processing.
- The Gotcha: Both workers process the transaction. You have double-charged the customer despite having an “idempotency” lock.
- The Fix: Use a Fencing Token.
  - When acquiring a lock, increment a token (e.g., version 1, version 2).
  - When performing the final write/side-effect, check that the token hasn’t been superseded. If the database sees a write from Token 1 but Token 2 already exists, reject the write.
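A sketch of a fencing check at the final write, with in-memory stand-ins for the lock service and the data store: each lock acquisition hands out a larger token, and writes carrying a superseded token are rejected.

```python
import itertools

_tokens = itertools.count(1)        # monotonic fencing tokens (the lock service)
_highest_seen: dict[str, int] = {}  # highest token accepted per key (the data store)

def acquire_lock(key: str) -> int:
    # In Redis this would be a SET ... NX EX 30 plus a counter for the token.
    return next(_tokens)

def fenced_write(key: str, token: int, do_write) -> bool:
    if token < _highest_seen.get(key, 0):
        return False                # zombie worker: its lock was superseded
    _highest_seen[key] = token
    do_write()
    return True

token_a = acquire_lock("X")         # Worker A, token 1
token_b = acquire_lock("X")         # Worker B takes over after A's TTL expires, token 2
fenced_write("X", token_b, lambda: print("worker B commits"))
assert fenced_write("X", token_a, lambda: print("worker A commits")) is False
```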
6. Client-Side Key Rotation
Idempotency relies on the client behaving correctly.
- The Scenario: The client sends a request. The server processes it but the response times out (network cut). The client library sees a timeout.
- The Gotcha: The client code catches the timeout, generates a new UUID, and retries.
- Result: Since the key is new, the server treats it as a new request. Exactly-once processing is broken because the client failed to hold onto the original key.
- The Fix: This is a documentation and client-library issue. You must educate consumers that retries must reuse the same key.
7. The “Resource Deleted” Race
Returning a cached response blindly can be confusing if the world changed in the meantime.
- The Scenario:
  - User creates `Order-1` (idempotent). Success.
  - User deletes `Order-1`.
  - User retries the creation of `Order-1` (maybe an old browser tab refreshed).
- The Gotcha: The idempotency system sees the key `Order-1` was “successfully created” in the past and returns 200 OK with the order details. The user thinks the order is back, but it doesn’t actually exist in the `orders` table anymore.
- The Fix: This is a philosophical design choice.
  - Strict Idempotency: Return the original success (technically correct: “At time T, this succeeded”).
  - State-Aware Idempotency: Check if the resulting resource still exists. If not, return 404 or 410 Gone. (This is harder to implement, as it breaks the separation of concerns.)
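A sketch of the state-aware variant; in-memory dicts stand in for the idempotency store and the orders table.

```python
def replay_or_fall_through(key: str, idempotency_store: dict, orders: dict):
    """Only replay the cached success if the resource it created still exists."""
    record = idempotency_store.get(key)
    if record is None:
        return None                                  # unknown key: run normal creation logic
    if record["order_id"] in orders:
        return 200, record["response"]               # cached success is still valid
    return 410, {"error": "the resource created by this key was later deleted"}
```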
Resource Deleted Race
Below are some other ideas for how to handle the deletion problem.
- The “State Validation” Fix (Recommended)
Instead of blindly returning the cached response, the idempotency layer should perform a lightweight check against the primary database to ensure the resource still exists.
  - The Logic: If the idempotency key exists and points to a “Success” state, verify that the record still exists in the `Orders` table.
  - The Outcome:
    - If the order exists: Return the cached 200 OK.
    - If the order is missing: Treat the request as a brand new request. Re-run the creation logic, or return a 409 Conflict explaining that the ID was previously used and deleted.
- The “Cascading Deletion” Fix
When a user deletes a resource, the system must also invalidate or delete the associated idempotency key.
  - The Logic: Treat the idempotency record and the resource as a single unit. In your `DeleteOrder` service, wrap the database deletion and the idempotency cache eviction in a single transaction (or a reliable distributed routine), as sketched below.
  - The Outcome: When the user retries the creation, the idempotency store has no record of it. The system sees it as a fresh request and successfully recreates the order.
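A sketch of that cascading delete (sqlite3, illustrative schema): the order row and its idempotency record are removed in one transaction, so a later retry of the creation is treated as fresh.

```python
import sqlite3

def delete_order(conn: sqlite3.Connection, order_id: int, user_id: str, idempotency_key: str) -> None:
    with conn:  # both deletes commit together, or neither does
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
        conn.execute(
            "DELETE FROM idempotency_records WHERE user_id = ? AND idempotency_key = ?",
            (user_id, idempotency_key),
        )
```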
- The “Soft Delete” Strategy
Instead of physically removing the row from the orders table, we use a `deleted_at` timestamp.
  - The Logic: The idempotency system returns the cached response. However, because the record still exists (just marked as deleted), you can design your API to:
    - Automatically “un-delete” it.
    - Return a 410 Gone or a 409 Conflict specifically stating that this resource was previously deleted and the ID cannot be reused.