9 min read

Webhooks Will Fail. Here are the Retry and Idempotency Patterns That Save You.

Practical patterns for reliable webhook delivery: exponential backoff with jitter, idempotency keys, dead letter queues, and signature verification. TypeScript code included.

Transactional Team
Mar 7, 2026

Webhooks fail. Not sometimes. Regularly. Industry data shows that about 3-5% of webhook attempts fail on the first try. The receiver is deploying, their server is overloaded, there is a DNS blip, or the connection times out.

The question is not whether your webhooks will fail. The question is whether your system handles failures gracefully or silently drops events.

What You Will Learn

  • Why webhooks fail and how often
  • Implementing exponential backoff with jitter
  • Idempotency keys to prevent duplicate processing
  • Dead letter queues for permanently failed deliveries
  • Webhook signature verification
  • Ordering guarantees (and when to skip them)

Webhook Delivery: Naive vs Resilient Retry Strategy

Metric                        Naive (Fixed Retry)    Resilient (Backoff + Jitter)
First-Attempt Failure Rate    3-5%                   3-5%
Final Delivery Rate           ~90%                   99.7%
Duplicate Processing          Common                 Prevented
Failed Event Recovery         Lost                   Dead Letter Queue
Thundering Herd Risk          High                   Eliminated

Why Webhooks Fail

Common failure modes, roughly ordered by frequency:

  1. Timeouts (40%) -- receiver takes too long to respond
  2. 5xx errors (25%) -- receiver's server is down or erroring
  3. Connection refused (15%) -- receiver's server is not listening
  4. DNS resolution failures (10%) -- temporary DNS issues
  5. TLS errors (5%) -- certificate problems
  6. 4xx errors (5%) -- misconfigured endpoint, bad URL

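The important distinction: some failures are transient (timeouts, 5xx, connection refused, DNS) and worth retrying. Others are permanent (404, 401) and retrying will never succeed. Two 4xx codes are exceptions: 408 (Request Timeout) and 429 (Too Many Requests) signal a temporary condition and should be retried.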
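The transient/permanent split can be captured in a small classifier. This is a minimal sketch: `isRetryableStatus` and `isRetryable` are hypothetical helper names, and treating every network-level failure as retryable is an assumption you may want to tighten (for example, for persistent TLS errors).

```typescript
// Hypothetical helper: decide whether an HTTP status is worth retrying.
function isRetryableStatus(status: number): boolean {
  if (status === 408 || status === 429) return true; // request timeout, rate limited
  if (status >= 500 && status <= 599) return true;   // receiver-side errors
  return false; // 2xx needs no retry; other 4xx will not fix themselves
}

// Network-level failures (timeout, DNS, connection refused) never produce a
// status code. Assumption in this sketch: they are always retryable.
function isRetryable(status: number | null): boolean {
  return status === null ? true : isRetryableStatus(status);
}

console.log([404, 401, 429, 503].map(isRetryableStatus)); // [ false, false, true, true ]
```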

Pattern 1: Exponential Backoff with Jitter

Naive retry strategies (retrying immediately, or at fixed intervals) cause thundering herd problems. If a receiver goes down for 5 minutes and every pending delivery is retried on the same 30-second schedule, the retries arrive in synchronized bursts, and the receiver is hit with the accumulated backlog the moment it comes back up.

Exponential backoff with jitter solves this:

// retry.ts
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}
 
const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 8,
  baseDelayMs: 1_000,   // 1 second
  maxDelayMs: 86_400_000, // 24 hours
};
 
function calculateRetryDelay(
  attempt: number,
  config: RetryConfig = DEFAULT_RETRY_CONFIG
): number {
  // Exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s...
  const exponentialDelay =
    config.baseDelayMs * Math.pow(2, attempt);
 
  // Cap at max delay
  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
 
  // Add jitter: random value between 0 and the calculated delay
  // Full jitter prevents thundering herd
  const jitter = Math.random() * cappedDelay;
 
  return Math.floor(jitter);
}
 
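// Retry schedule example (approximate, due to jitter):
// Attempt 0: 0-1s
// Attempt 1: 0-2s
// Attempt 2: 0-4s
// Attempt 3: 0-8s
// Attempt 4: 0-16s
// Attempt 5: 0-32s
// Attempt 6: 0-64s (1 min)
// Attempt 7: 0-128s (2 min)
// Note: with these defaults the delay never exceeds 128s, so the 24-hour
// cap is not reached. Raise baseDelayMs or maxRetries to ride out longer outages.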
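It helps to know how long a schedule can keep a delivery alive before it is abandoned. The sketch below sums the upper bound of each delay for a given configuration. `BackoffConfig`, `maxDelayForAttempt`, and `worstCaseTotalDelayMs` are hypothetical names, and the config mirrors the defaults shown above.

```typescript
interface BackoffConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Upper bound of the delay for one attempt (full jitter picks a value below this).
function maxDelayForAttempt(attempt: number, config: BackoffConfig): number {
  return Math.min(config.baseDelayMs * 2 ** attempt, config.maxDelayMs);
}

// Worst-case total time spent waiting across all retries.
function worstCaseTotalDelayMs(config: BackoffConfig): number {
  let total = 0;
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    total += maxDelayForAttempt(attempt, config);
  }
  return total;
}

const defaults: BackoffConfig = {
  maxRetries: 8,
  baseDelayMs: 1_000,
  maxDelayMs: 86_400_000,
};

console.log(worstCaseTotalDelayMs(defaults)); // 255000 (4 minutes 15 seconds)
```

With full jitter the expected total is about half of that, so a receiver outage longer than a few minutes would exhaust this schedule. A larger `baseDelayMs` stretches the window without adding attempts.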

Implementing the Delivery Worker

// webhook-worker.ts
interface WebhookJob {
  id: string;
  url: string;
  payload: Record<string, unknown>;
  eventType: string;
  attempt: number;
  maxRetries: number;
  idempotencyKey: string;
  createdAt: Date;
}
 
// markDelivered, markFailed, scheduleRetry, moveToDeadLetterQueue, and
// SIGNING_SECRET are assumed to be defined elsewhere in your codebase.
async function deliverWebhook(job: WebhookJob): Promise<void> {
  // Serialize once so the signed bytes and the sent bytes are identical
  const body = JSON.stringify(job.payload);
  const signature = signPayload(body, SIGNING_SECRET);

  const controller = new AbortController();
  const timeoutId = setTimeout(
    () => controller.abort(),
    10_000 // 10 second timeout
  );

  try {
    const response = await fetch(job.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Webhook-Id': job.id,
        'X-Webhook-Event': job.eventType,
        'X-Webhook-Signature': signature,
        // Informational only; the signed timestamp is inside the signature header
        'X-Webhook-Timestamp': Date.now().toString(),
        'X-Idempotency-Key': job.idempotencyKey,
      },
      body,
      signal: controller.signal,
    });

    if (response.ok) {
      await markDelivered(job.id);
      return;
    }

    // 408 and 429 are 4xx codes that signal a temporary condition
    const retryable4xx =
      response.status === 408 || response.status === 429;

    // Permanent failure - do not retry
    if (response.status >= 400 && response.status < 500 && !retryable4xx) {
      await markFailed(job.id, `HTTP ${response.status}`, false);
      return;
    }

    // Transient failure - schedule retry
    throw new Error(`HTTP ${response.status}`);
  } catch (error) {
    if (job.attempt >= job.maxRetries) {
      // Max retries exceeded - move to dead letter queue
      await moveToDeadLetterQueue(job, (error as Error).message);
      return;
    }

    const delay = calculateRetryDelay(job.attempt);
    await scheduleRetry(job, delay);
  } finally {
    // Always clear the timer, including when fetch throws
    clearTimeout(timeoutId);
  }
}
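The worker hands the job to `scheduleRetry`, which is not shown. Whatever queue backs it, the rescheduled job needs its attempt counter incremented, or the worker would compute the same delay forever and never reach the dead letter queue. A minimal sketch of that step, where `RetryableJob`, `ScheduledJob`, and `buildRetry` are hypothetical names:

```typescript
interface RetryableJob {
  id: string;
  attempt: number;
  maxRetries: number;
}

interface ScheduledJob<T> {
  job: T;
  runAt: Date; // earliest time the queue should hand the job to a worker
}

// Returns a copy of the job with the attempt counter advanced, plus its run time.
// The original job object is left untouched.
function buildRetry<T extends RetryableJob>(
  job: T,
  delayMs: number,
  now: Date = new Date()
): ScheduledJob<T> {
  return {
    job: { ...job, attempt: job.attempt + 1 },
    runAt: new Date(now.getTime() + delayMs),
  };
}
```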

Pattern 2: Idempotency Keys

Network issues can cause a webhook to be delivered successfully but the acknowledgment lost. The sender thinks it failed and retries. The receiver processes the event twice.

Idempotency keys prevent duplicate processing:

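import { createHash } from 'crypto';

// On the sender side: generate a deterministic key.
// `timestamp` must be the time the event occurred (fixed per event),
// not the time of the delivery attempt, so every retry carries the same key.
function generateIdempotencyKey(
  eventType: string,
  resourceId: string,
  timestamp: number
): string {
  return createHash('sha256')
    .update(`${eventType}:${resourceId}:${timestamp}`)
    .digest('hex');
}

// On the receiver side: check before processing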
async function handleWebhook(req: Request): Promise<Response> {
  const idempotencyKey = req.headers.get('x-idempotency-key');
 
  if (!idempotencyKey) {
    return new Response('Missing idempotency key', { status: 400 });
  }
 
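  // Check if we already processed this event.
  // This check-then-insert sequence is not atomic: back it with a UNIQUE
  // constraint on idempotencyKey so a concurrent duplicate fails at insert time.
  const existing = await db.query.processedWebhooks.findFirst({
    where: eq(processedWebhooks.idempotencyKey, idempotencyKey),
  });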
 
  if (existing) {
    // Already processed - return success without reprocessing
    return new Response('OK', { status: 200 });
  }
 
  // Process the webhook
  const body = await req.json();
  await processEvent(body);
 
  // Record that we processed it
  await db.insert(processedWebhooks).values({
    idempotencyKey,
    eventType: req.headers.get('x-webhook-event') ?? 'unknown',
    processedAt: new Date(),
  });
 
  return new Response('OK', { status: 200 });
}

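Keep processed webhook records for at least as long as the sender's full retry window, including any manual replays from a dead letter queue. Seven days is a reasonable default; after that, you can usually assume no more retries are coming.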
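Two deliveries of the same event can arrive at nearly the same time, so the "have we seen this key?" check works best as a single atomic claim. The sketch below uses an in-memory map as a stand-in for a table with a unique constraint on the key. `ProcessedKeyStore` and `handleOnce` are hypothetical names, and a real deployment would need a shared store rather than process memory.

```typescript
// In-memory stand-in for a table with a UNIQUE constraint on the key.
class ProcessedKeyStore {
  private readonly keys = new Map<string, number>(); // key -> claimed-at (ms)

  // Returns true only for the first caller to claim a key.
  claim(key: string, nowMs: number = Date.now()): boolean {
    if (this.keys.has(key)) return false;
    this.keys.set(key, nowMs);
    return true;
  }

  // Give the claim back so a later retry can process the event.
  release(key: string): void {
    this.keys.delete(key);
  }

  // Drop records older than the retention window; returns how many were removed.
  prune(retentionMs: number, nowMs: number = Date.now()): number {
    let removed = 0;
    this.keys.forEach((claimedAt, key) => {
      if (nowMs - claimedAt > retentionMs) {
        this.keys.delete(key);
        removed++;
      }
    });
    return removed;
  }
}

// Claim first, then process. If processing throws, release the claim so the
// sender's next retry is not mistaken for a duplicate.
async function handleOnce(
  store: ProcessedKeyStore,
  key: string,
  process: () => Promise<void>
): Promise<'processed' | 'duplicate'> {
  if (!store.claim(key)) return 'duplicate';
  try {
    await process();
    return 'processed';
  } catch (err) {
    store.release(key);
    throw err;
  }
}
```

Calling `prune` on a timer with the retention window (for example, 7 days in milliseconds) keeps the store from growing without bound.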

Pattern 3: Dead Letter Queues

When a webhook exhausts all retries, you need somewhere to put it. A dead letter queue (DLQ) stores failed deliveries for manual inspection and replay.

// dead-letter-queue.ts
interface DeadLetterEntry {
  id: string;
  webhookId: string;
  url: string;
  payload: Record<string, unknown>;
  eventType: string;
  idempotencyKey: string; // preserved so a replay can be deduplicated
  lastError: string;
  attempts: number;
  createdAt: Date;
  failedAt: Date;
}

async function moveToDeadLetterQueue(
  job: WebhookJob,
  error: string
): Promise<void> {
  await db.insert(deadLetterQueue).values({
    webhookId: job.id,
    url: job.url,
    payload: job.payload,
    eventType: job.eventType,
    idempotencyKey: job.idempotencyKey,
    lastError: error,
    attempts: job.attempt + 1,
    createdAt: job.createdAt,
    failedAt: new Date(),
  });

  // Alert the team
  await notify({
    channel: 'webhook-failures',
    message: `Webhook ${job.id} to ${job.url} failed after ${job.attempt + 1} attempts: ${error}`,
  });
}

// Replay a dead letter
async function replayDeadLetter(entryId: string): Promise<void> {
  const entry = await db.query.deadLetterQueue.findFirst({
    where: eq(deadLetterQueue.id, entryId),
  });

  if (!entry) throw new Error('Dead letter not found');

  // Re-enqueue with reset attempt counter.
  // Reuse the original idempotency key: an earlier attempt may have been
  // processed even though its acknowledgment was lost.
  await enqueueWebhook({
    url: entry.url,
    payload: entry.payload,
    eventType: entry.eventType,
    idempotencyKey: entry.idempotencyKey,
    attempt: 0,
    maxRetries: 3, // fewer retries on replay
  });
 
  // Remove from DLQ
  await db
    .delete(deadLetterQueue)
    .where(eq(deadLetterQueue.id, entryId));
}

Pattern 4: Webhook Signature Verification

The sender signs every payload and the receiver verifies that signature before trusting the request. Without signature verification, anyone can send fake webhooks to your endpoint.

Sender Side: Sign the Payload

import { createHmac } from 'crypto';
 
function signPayload(payload: string, secret: string): string {
  const timestamp = Math.floor(Date.now() / 1000);
  const signedContent = `${timestamp}.${payload}`;
 
  const signature = createHmac('sha256', secret)
    .update(signedContent)
    .digest('hex');
 
  return `t=${timestamp},v1=${signature}`;
}

Receiver Side: Verify the Signature

import { createHmac, timingSafeEqual } from 'crypto';

function verifyWebhookSignature(
  payload: string,
  signatureHeader: string,
  secret: string,
  toleranceSeconds: number = 300
): boolean {
  const parts = Object.fromEntries(
    signatureHeader.split(',').map((part) => {
      const [key, value] = part.split('=');
      return [key, value];
    })
  );

  const timestamp = parseInt(parts.t, 10);
  const signature = parts.v1;

  // Reject malformed headers (missing or non-numeric fields)
  if (Number.isNaN(timestamp) || !signature) {
    return false;
  }

  // Check timestamp tolerance (limits replay attacks)
  const now = Math.floor(Date.now() / 1000);
  if (Math.abs(now - timestamp) > toleranceSeconds) {
    return false; // Too old or too far in the future
  }

  // Verify signature
  const signedContent = `${timestamp}.${payload}`;
  const expectedSignature = createHmac('sha256', secret)
    .update(signedContent)
    .digest('hex');

  const received = Buffer.from(signature);
  const expected = Buffer.from(expectedSignature);

  // Use timing-safe comparison to prevent timing attacks.
  // timingSafeEqual throws if the lengths differ, so check length first.
  return (
    received.length === expected.length &&
    timingSafeEqual(received, expected)
  );
}

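The timestamp check limits replay attacks. An attacker who captures a valid webhook cannot replay it after the tolerance window expires. A replay inside the window still carries a valid signature, so the idempotency check from Pattern 2 is what stops it from being processed twice.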
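To see the signing scheme and the tolerance window behave end to end, here is a compact, self-contained round trip. It is a sketch rather than the functions above: `sign` and `verify` take the current time as a `nowSeconds` parameter (an assumption added so the clock can be controlled), and the secret and payload are made-up values.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function sign(payload: string, secret: string, nowSeconds: number): string {
  const mac = createHmac('sha256', secret)
    .update(`${nowSeconds}.${payload}`)
    .digest('hex');
  return `t=${nowSeconds},v1=${mac}`;
}

function verify(
  payload: string,
  header: string,
  secret: string,
  nowSeconds: number,
  toleranceSeconds: number = 300
): boolean {
  // Parse "t=...,v1=..." into a key/value map
  const parts = new Map<string, string>();
  for (const part of header.split(',')) {
    const i = part.indexOf('=');
    if (i > 0) parts.set(part.slice(0, i), part.slice(i + 1));
  }

  const timestamp = Number(parts.get('t'));
  const signature = parts.get('v1');
  if (!Number.isInteger(timestamp) || !signature) return false;

  // Outside the tolerance window: reject regardless of signature
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${payload}`)
    .digest('hex');

  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

const secret = 'whsec_example'; // made-up secret for illustration
const body = '{"event":"order.updated"}';
const header = sign(body, secret, 1_000);

console.log(verify(body, header, secret, 1_010));       // true: fresh and untouched
console.log(verify(body + ' ', header, secret, 1_010)); // false: payload was altered
console.log(verify(body, header, secret, 1_301));       // false: outside the 300s window
```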

Ordering Guarantees

Webhooks are delivered asynchronously. If you send event A then event B, event B might arrive first due to network variance or retries. There are two approaches:

Option 1: Accept out-of-order delivery. Include a sequence number or timestamp in the payload. Let the receiver reorder if needed.

// Include ordering metadata in the payload
const payload = {
  event: 'order.updated',
  data: { orderId: '123', status: 'shipped' },
  metadata: {
    sequence: 42,
    occurredAt: '2026-03-07T10:30:00Z',
  },
};
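On the receiving side, one simple way to use the sequence number is to ignore any event that is not newer than the last one applied for the same resource. The sketch below assumes each event carries the full current state (so skipping a stale one loses nothing) and that sequence numbers increase per resource. `OrderedEvent` and `SequenceTracker` are hypothetical names, and a production version would persist the last-seen values.

```typescript
interface OrderedEvent {
  resourceId: string;
  sequence: number;
}

class SequenceTracker {
  private readonly lastSeen = new Map<string, number>();

  // Returns true if the event is newer than anything applied so far for its
  // resource, and records it. Stale or duplicate events return false.
  shouldApply(event: OrderedEvent): boolean {
    const last = this.lastSeen.get(event.resourceId);
    if (last !== undefined && event.sequence <= last) return false;
    this.lastSeen.set(event.resourceId, event.sequence);
    return true;
  }
}

const tracker = new SequenceTracker();
console.log(tracker.shouldApply({ resourceId: 'order:123', sequence: 42 })); // true
console.log(tracker.shouldApply({ resourceId: 'order:123', sequence: 41 })); // false (arrived late)
console.log(tracker.shouldApply({ resourceId: 'order:123', sequence: 43 })); // true
```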

Option 2: Enforce order per resource. Use a queue with partitioning so events for the same resource are delivered sequentially.

// Partition key ensures same-resource events are sequential
await queue.publish({
  topic: 'webhooks',
  partitionKey: `order:${orderId}`, // all events for this order go to same partition
  payload: webhookPayload,
});
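Partitioning works because the partition is derived deterministically from the key, so every event for one resource lands in the same ordered lane. Real brokers use their own hash functions; the sketch below only illustrates the idea, with `partitionFor` as a hypothetical helper.

```typescript
import { createHash } from 'node:crypto';

// Map a partition key to a partition index in [0, partitionCount).
// The same key always maps to the same partition.
function partitionFor(partitionKey: string, partitionCount: number): number {
  const digest = createHash('sha256').update(partitionKey).digest();
  // Use the first 4 bytes of the digest as an unsigned integer
  return digest.readUInt32BE(0) % partitionCount;
}

const first = partitionFor('order:123', 8);
const second = partitionFor('order:123', 8);
console.log(first === second); // true
```

The trade-off is that one slow or failing receiver blocks later events in its partition, which is part of why per-resource ordering is worth enforcing only when needed.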

In practice, option 1 is simpler and works for most use cases. Only enforce ordering when the receiver truly cannot handle out-of-order events.

Receiver Best Practices

If you are on the receiving end of webhooks, a few patterns will save you:

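// 1. Respond quickly, process async
// Note: req.rawBody is not built into Express. Capture it (as a string) with
// the `verify` option of express.json() so the signature is checked against
// the exact bytes that were sent.
app.post('/webhooks', async (req, res) => {
  const signature = req.headers['x-webhook-signature'];

  // Verify signature first
  if (
    typeof signature !== 'string' ||
    !verifyWebhookSignature(req.rawBody, signature, SECRET)
  ) {
    return res.status(401).send('Invalid signature');
  }

  // Enqueue for async processing, keeping the idempotency key
  // so the worker can deduplicate
  await queue.publish({
    topic: 'incoming-webhooks',
    payload: req.body,
    idempotencyKey: req.headers['x-idempotency-key'],
  });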
 
  // Return 200 immediately
  res.status(200).send('OK');
});
 
// 2. Process in a worker
async function processIncomingWebhook(payload: any) {
  // Idempotency check
  // Business logic
  // Error handling
}

Return 200 within 5 seconds, comfortably inside the sender's timeout (10 seconds in the delivery worker above). Do your heavy processing asynchronously. If you take too long to respond, the sender will time out and retry, potentially causing duplicate deliveries.

The Takeaway

Webhooks are unreliable by nature. That is fine. The patterns are well-established: exponential backoff with jitter, idempotency keys, dead letter queues, and signature verification. Implement all four and your webhook system will handle the 3-5% failure rate gracefully.

If you are sending transactional emails through Transactional, our webhook delivery system uses all of these patterns for event notifications (delivered, bounced, opened, clicked). But regardless of your provider, build these patterns into your own webhook infrastructure.
