1. The “Payload Mismatch” Trap

This is one of the most common implementation bugs. It occurs when a client reuses an Idempotency Key but changes the request parameters.

  • The Scenario:

    1. Client sends POST /transfer { amount: 100 } with Key: uuid-1.
    2. Server processes it and saves the result.
    3. Client (due to a bug or malice) sends POST /transfer { amount: 9000 } with Key: uuid-1.
  • The Gotcha: A naive implementation just checks “Does uuid-1 exist?” It sees “Yes, status: COMPLETED” and returns the saved success response from the $100 transfer.

    • Result: The client thinks they successfully transferred $9000, but only $100 moved.
  • The Fix: You must store a hash (checksum) of the request body alongside the key. If the key exists but the hash doesn’t match the current request, reject the request with 422 Unprocessable Entity or 409 Conflict instead of replaying the saved response.
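
The hash check can be sketched as follows. This is a minimal in-memory illustration, not a production design: IdempotencyStore, fingerprint, and PayloadMismatch are hypothetical names, the dict stands in for a database table, and canonical JSON plus SHA-256 is just one reasonable way to fingerprint a body.

```python
import hashlib
import json


class PayloadMismatch(Exception):
    """Key reused with a different body; map this to 422 or 409 at the HTTP layer."""


def fingerprint(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that two bodies with
    # the same content but different key order produce the same hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """Hypothetical store; a real one would be a database table."""

    def __init__(self):
        self._records = {}  # key -> (body_hash, saved_response)

    def execute(self, key: str, body: dict, handler):
        body_hash = fingerprint(body)
        record = self._records.get(key)
        if record is not None:
            saved_hash, saved_response = record
            if saved_hash != body_hash:
                # Same key, different payload: never replay the old response.
                raise PayloadMismatch(f"key {key!r} was used with a different body")
            return saved_response
        response = handler(body)
        self._records[key] = (body_hash, response)
        return response
```

With this check in place, the second request in the scenario (amount 9000 under uuid-1) raises PayloadMismatch instead of receiving the saved response for the 100 transfer.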

2. The “Burned Key” on Failure

Deciding what to do when the first attempt fails is tricky.

  • The Scenario: Client sends a request. The database is temporarily down. The server catches the exception and records the key status as FAILED.

  • The Gotcha: The client retries the request (as they should for a 500 error). The server sees the key exists with status FAILED and returns the same error on every retry, indefinitely. You have effectively “burned” the key on a transient error.

  • The Fix:

    • Transient Errors (Network/DB Connection): Do not save the key, or roll back the transaction entirely so the key is never persisted. Allow the retry to proceed as a fresh request.
    • Terminal Errors (Validation, Business Logic): Save the key as FAILED. If the client retries, they should get the same validation error.
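
One way to express this split is sketched below. The exception classes and the in-memory store are hypothetical; in a real service, classifying an error as transient or terminal is the hard part and depends on your stack.

```python
class TransientError(Exception):
    """Hypothetical marker for retryable failures (network, DB connection)."""


class TerminalError(Exception):
    """Hypothetical marker for failures that will never succeed on retry."""


class IdempotencyStore:
    """In-memory stand-in for the key table."""

    def __init__(self):
        self._records = {}  # key -> ("COMPLETED" | "FAILED", payload)

    def execute(self, key, handler):
        if key in self._records:
            status, payload = self._records[key]
            if status == "FAILED":
                # Replay the same terminal error to the retrying client.
                raise TerminalError(payload)
            return payload
        try:
            result = handler()
        except TransientError:
            # Do not persist anything: the retry proceeds as a fresh request.
            raise
        except TerminalError as exc:
            # Retrying cannot help, so remember the failure.
            self._records[key] = ("FAILED", str(exc))
            raise
        self._records[key] = ("COMPLETED", result)
        return result
```

A retry after a TransientError runs the handler again, while a retry after a TerminalError gets the stored error back without touching the handler.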

3. Namespace Collisions (Data Leaks)

This is a critical security vulnerability.

  • The Scenario: You rely solely on the Idempotency-Key header for uniqueness.
    • User A sends Key: order-1.
    • User B (malicious or accidental) sends Key: order-1.
  • The Gotcha: The server sees order-1 is completed and returns the cached response. User B just received User A’s order confirmation details, including potential PII.
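  • The Fix: Never use the Idempotency Key alone as a global primary key. The composite primary key must be {user_id, idempotency_key}, with user_id taken from the authenticated session rather than from the request. User B’s order-1 then resolves to a different record, so User B cannot read User A’s cached responses.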
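
A minimal sketch of the composite key, using an in-memory SQLite database. The table and column names are illustrative assumptions, and user_id is assumed to come from the authenticated session, never from a client-supplied field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_keys (
        user_id         TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        response        TEXT NOT NULL,
        PRIMARY KEY (user_id, idempotency_key)
    )
    """
)


def save_response(user_id: str, key: str, response: str) -> None:
    conn.execute(
        "INSERT INTO idempotency_keys VALUES (?, ?, ?)",
        (user_id, key, response),
    )


def lookup(user_id: str, key: str):
    # Both columns are part of the lookup, so one user's key can never
    # resolve to another user's saved response.
    row = conn.execute(
        "SELECT response FROM idempotency_keys "
        "WHERE user_id = ? AND idempotency_key = ?",
        (user_id, key),
    ).fetchone()
    return row[0] if row else None
```

Here a lookup for order-1 under User B finds nothing even after User A has stored a response under the same key, and both users can hold their own order-1 record.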

4. The “Zombie Worker” (TTL Exhaustion)

This applies if you use Redis/Memcached locks instead of database transactions.

  • The Scenario:

    1. Worker A locks Key X with a 30-second TTL (Time-To-Live).
    2. Worker A gets stuck in a Garbage Collection pause or slow network call for 35 seconds.
    3. Redis expires the lock.
    4. Worker B picks up the retry, locks Key X, and starts processing.
    5. Worker A wakes up and finishes processing.
  • The Gotcha: Both workers process the transaction. You have double-charged the customer despite having an “idempotency” lock.

  • The Fix: Use a Fencing Token.

    • When acquiring a lock, increment a token (e.g., version 1, version 2).
    • When performing the final write/side-effect, check that the token hasn’t been superseded. If the database sees a write from Token 1 but Token 2 already exists, reject the write.
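
The token check can be sketched like this. It assumes the token counter lives in the same store that performs the final write, so the comparison and the write can be one atomic step; KeyStore and StaleTokenError are hypothetical names, and the dicts stand in for database rows.

```python
class StaleTokenError(Exception):
    """The caller's lock was superseded by a newer holder."""


class KeyStore:
    """In-memory stand-in for the database that guards the final write."""

    def __init__(self):
        self._tokens = {}   # key -> latest token issued
        self._results = {}  # key -> committed result

    def acquire(self, key: str) -> int:
        # Every acquisition (including one after a TTL expiry) gets a
        # strictly larger token than any previous holder.
        token = self._tokens.get(key, 0) + 1
        self._tokens[key] = token
        return token

    def commit(self, key: str, token: int, result: str) -> str:
        # In a real database this comparison and the write must be a single
        # atomic statement (for example a conditional UPDATE).
        if token != self._tokens.get(key):
            raise StaleTokenError(f"token {token} superseded for key {key!r}")
        if key in self._results:
            return self._results[key]
        self._results[key] = result
        return result
```

In the scenario above, Worker A holds token 1 and Worker B holds token 2, so Worker A's late commit is rejected and only Worker B's write lands. The side effect itself (the charge) has to be part of, or guarded by, this conditional write for the protection to hold.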

5. Client-Side Key Rotation

Idempotency relies on the client behaving correctly.

  • The Scenario: The client sends a request. The server processes it but the response times out (network cut). The client library sees a timeout.
  • The Gotcha: The client code catches the timeout, generates a new UUID, and retries.
    • Result: Since the key is new, the server treats it as a new request. Exactly-once processing is broken because the client failed to hold onto the original key.
  • The Fix: This is a documentation and client-library issue. You must educate consumers that retries must reuse the same key.
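
On the client side, the rule amounts to generating the key once, outside the retry loop. A minimal sketch follows; send_with_retries and the send callable's signature are assumptions for illustration, and a real client would also add backoff between attempts.

```python
import uuid


def send_with_retries(send, payload, max_attempts=3):
    # Generate the key ONCE, before the loop. Creating it inside the loop
    # would give every retry a new key and defeat idempotency.
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, idempotency_key=key)
        except TimeoutError as exc:
            # The server may have processed the request; retry with the same key.
            last_error = exc
    raise last_error
```

Every attempt made by one call to send_with_retries carries the same key, so a server that already processed the first attempt can return its saved response.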

6. The “Resource Deleted” Race

Returning a cached response blindly can be confusing if the world changed in the meantime.

  • The Scenario:

    1. User creates Order-1 (idempotent). Success.
    2. User deletes Order-1.
    3. User retries the creation of Order-1 (maybe an old browser tab refreshed).
  • The Gotcha: The idempotency system sees the key Order-1 was “successfully created” in the past and returns 200 OK with the order details. The user thinks the order is back, but it doesn’t actually exist in the orders table anymore.

  • The Fix: This is a philosophical design choice.

    • Strict Idempotency: Return the original success (technically correct: “At time T, this succeeded”).
    • State-Aware Idempotency: Check if the resulting resource still exists. If not, return 404 or 410 Gone. (This is harder to implement because the idempotency layer must now query domain state, which breaks the separation of concerns.)
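
The two options differ by a single lookup, as the sketch below suggests. The function name, the Gone exception, and the dict and set standing in for the idempotency table and the orders table are all hypothetical.

```python
class Gone(Exception):
    """The saved response refers to a resource that no longer exists (404/410)."""


def replay(key, saved_responses, orders, state_aware):
    """Return the saved response for a previously completed key."""
    saved = saved_responses[key]
    if state_aware and saved["order_id"] not in orders:
        # State-aware: the idempotency layer consults domain state,
        # which is the coupling the strict variant avoids.
        raise Gone(f"order {saved['order_id']} no longer exists")
    # Strict: replay what happened at time T, whatever the current state is.
    return saved
```

With state_aware=False the deleted order's original success is replayed; with state_aware=True the same retry raises Gone.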
