Webhooks Will Fail. Here Are the Retry and Idempotency Patterns That Save You.
Practical patterns for reliable webhook delivery: exponential backoff with jitter, idempotency keys, dead letter queues, and signature verification. TypeScript code included.
Transactional Team
Mar 7, 2026
Webhooks fail. Not sometimes. Regularly. Industry data shows that about 3-5% of webhook attempts fail on the first try. The receiver is deploying, their server is overloaded, there is a DNS blip, or the connection times out.
The question is not whether your webhooks will fail. The question is whether your system handles failures gracefully or silently drops events.
What You Will Learn
Why webhooks fail and how often
Implementing exponential backoff with jitter
Idempotency keys to prevent duplicate processing
Dead letter queues for permanently failed deliveries
Webhook signature verification
Ordering guarantees (and when to skip them)
Webhook Delivery: Naive vs Resilient Retry Strategy

| | Naive (Fixed Retry) | Resilient (Backoff + Jitter) |
|---|---|---|
| First-Attempt Failure Rate | 3-5% | 3-5% |
| Final Delivery Rate | ~90% | 99.7% |
| Duplicate Processing | Common | Prevented |
| Failed Event Recovery | Lost | Dead Letter Queue |
| Thundering Herd Risk | High | Greatly reduced |
Why Webhooks Fail
Common failure modes, roughly ordered by frequency:
Timeouts (40%) -- receiver takes too long to respond
5xx errors (25%) -- receiver's server is down or erroring
Connection refused (15%) -- receiver's server is not listening
DNS resolution failures (10%) -- temporary DNS issues
TLS errors (5%) -- certificate problems
4xx errors (5%) -- misconfigured endpoint, bad URL
The important distinction: some failures are transient (timeouts, 5xx, DNS) and worth retrying. Others are permanent (404, 401) and retrying will never succeed.
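One way to encode this split in a sender is a small classifier that decides, after each failed attempt, whether another try is worthwhile. This is a sketch under assumptions: the names `isRetryableFailure` and `DeliveryOutcome` are hypothetical, treating 408 (Request Timeout) and 429 (Too Many Requests) as retryable is a common convention rather than something stated above, and classifying TLS errors as permanent is a judgment call (some senders retry them).

```typescript
// Hypothetical outcome shape: either the receiver answered with an HTTP status,
// or the request never completed (timeout, DNS failure, refused connection, TLS error).
type DeliveryOutcome =
  | { kind: 'http'; status: number }
  | { kind: 'network'; code: 'timeout' | 'dns' | 'connection_refused' | 'tls' };

function isRetryableFailure(outcome: DeliveryOutcome): boolean {
  if (outcome.kind === 'network') {
    // Assumption: certificate problems need a human fix, so they are treated as permanent.
    // Timeouts, DNS blips, and refused connections (e.g. mid-deploy) are retried.
    return outcome.code !== 'tls';
  }
  const { status } = outcome;
  if (status >= 200 && status < 300) return false; // delivered, nothing to retry
  if (status === 408 || status === 429) return true; // assumption: treat as transient
  if (status >= 500) return true; // receiver-side errors are usually transient
  return false; // other 4xx (401, 404, ...) will not succeed on retry
}
```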
Pattern 1: Exponential Backoff with Jitter
Naive retry (retrying immediately or at fixed intervals) causes thundering herd problems. If a receiver goes down for 5 minutes and every failed delivery is retried every 30 seconds, the retries pile up in synchronized waves, and the moment the receiver comes back up it is hit by a burst of accumulated retries that can knock it over again.
Exponential backoff with jitter solves this:
```typescript
// retry.ts
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 8,
  baseDelayMs: 1_000, // 1 second
  maxDelayMs: 86_400_000, // 24 hours
};

function calculateRetryDelay(
  attempt: number,
  config: RetryConfig = DEFAULT_RETRY_CONFIG
): number {
  // Exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s...
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);

  // Cap at max delay
  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);

  // Full jitter: pick a random delay between 0 and the capped delay.
  // Spreading retries out keeps failed deliveries from retrying in lockstep.
  const jitteredDelay = Math.random() * cappedDelay;
  return Math.floor(jitteredDelay);
}

// Retry schedule example (upper bounds; actual delays are random due to jitter):
// Attempt 0: 0-1s
// Attempt 1: 0-2s
// Attempt 2: 0-4s
// Attempt 3: 0-8s
// Attempt 4: 0-16s
// Attempt 5: 0-32s
// Attempt 6: 0-64s (1 min)
// Attempt 7: 0-128s (2 min)
//
// Note: with these defaults the 24-hour cap is never reached, and the 8 delays
// add up to at most 255s (about 4 minutes). To ride out longer outages, raise
// maxRetries or baseDelayMs so the schedule actually grows toward maxDelayMs.
```
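`calculateRetryDelay` only computes a delay; something still has to drive the attempts. Below is a minimal in-process sketch, assuming a caller-supplied `send` function that resolves to `true` on a 2xx and `false` on a retryable failure. The names `deliverWithRetry`, `send`, and `sleep` are hypothetical, the full-jitter formula is inlined so the block stands alone, and `sleep` is injectable so tests do not actually wait. A production sender would more likely persist the next attempt time in a queue than sleep in memory.

```typescript
type SendFn = () => Promise<boolean>; // true = delivered (2xx), false = retryable failure

interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves to the number of attempts it took, or throws once retries are
// exhausted (the point where a real sender would dead-letter the job).
async function deliverWithRetry(send: SendFn, opts: RetryOptions): Promise<number> {
  const sleep = opts.sleep ?? defaultSleep;
  // attempt 0 is the first try; attempts 1..maxRetries are retries
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    if (await send()) return attempt + 1;
    if (attempt === opts.maxRetries) break;
    // Full jitter: uniform random delay in [0, min(maxDelay, base * 2^attempt))
    const capped = Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
    await sleep(Math.floor(Math.random() * capped));
  }
  throw new Error(`Delivery failed after ${opts.maxRetries + 1} attempts`);
}
```

With `maxRetries: 8` this makes at most 9 attempts, matching the `job.attempt + 1` counting used in the dead letter queue code later on.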
Pattern 2: Idempotency Keys
Network issues can cause a webhook to be delivered successfully while the acknowledgment is lost. The sender thinks the delivery failed and retries, and the receiver processes the same event twice.
Idempotency keys prevent duplicate processing:
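```typescript
import { createHash } from 'node:crypto';

// On the sender side: generate a deterministic key.
// `timestamp` must be when the event occurred, not when it is sent,
// so that every retry of the same event carries the same key.
function generateIdempotencyKey(
  eventType: string,
  resourceId: string,
  timestamp: number
): string {
  return createHash('sha256')
    .update(`${eventType}:${resourceId}:${timestamp}`)
    .digest('hex');
}
```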
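What matters is that the key is derived from the event itself and sent unchanged on every attempt; if it were derived from the send time, each retry would look like a brand-new event to the receiver. A sketch assuming a hypothetical `WebhookEvent` shape whose `occurredAt` is fixed when the event is created, using the `x-idempotency-key` and `x-webhook-event` header names that the receiver code in this article reads:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical event shape: occurredAt is set once, when the event happens.
interface WebhookEvent {
  type: string;
  resourceId: string;
  occurredAt: number; // Unix ms, never the time of a delivery attempt
  data: Record<string, unknown>;
}

function idempotencyKeyFor(event: WebhookEvent): string {
  return createHash('sha256')
    .update(`${event.type}:${event.resourceId}:${event.occurredAt}`)
    .digest('hex');
}

// Headers for one delivery attempt. Building them again for a retry of the
// same event yields the same idempotency key.
function buildDeliveryHeaders(event: WebhookEvent): Record<string, string> {
  return {
    'content-type': 'application/json',
    'x-idempotency-key': idempotencyKeyFor(event),
    'x-webhook-event': event.type,
  };
}
```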
```typescript
import { eq } from 'drizzle-orm';
// db, processedWebhooks (table), and processEvent are assumed to be defined elsewhere.

// On the receiver side: check before processing
async function handleWebhook(req: Request): Promise<Response> {
  const idempotencyKey = req.headers.get('x-idempotency-key');
  if (!idempotencyKey) {
    return new Response('Missing idempotency key', { status: 400 });
  }

  // Check if we already processed this event
  const existing = await db.query.processedWebhooks.findFirst({
    where: eq(processedWebhooks.idempotencyKey, idempotencyKey),
  });
  if (existing) {
    // Already processed - return success without reprocessing
    return new Response('OK', { status: 200 });
  }

  // Process the webhook
  const body = await req.json();
  await processEvent(body);

  // Record that we processed it.
  // Caveat: check-then-insert is not atomic. Put a UNIQUE constraint on
  // idempotencyKey (or claim the key before processing) so two concurrent
  // retries of the same event cannot both get past the check above.
  await db.insert(processedWebhooks).values({
    idempotencyKey,
    eventType: req.headers.get('x-webhook-event') ?? 'unknown',
    processedAt: new Date(),
  });

  return new Response('OK', { status: 200 });
}
```
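Keep processed webhook records for at least 7 days. That is only safe if the window outlasts the sender's full retry schedule, including any manual replays from a dead letter queue; once it has passed, you can assume no more retries are coming.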
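The check-then-insert in `handleWebhook` leaves a window where two concurrent retries of the same event both pass the check. One way to close it is to claim the key atomically before processing and release it if processing fails. The sketch below shows the idea with an in-memory `Map` and a 7-day retention window; `IdempotencyStore` and `handleOnce` are hypothetical names, and a multi-instance deployment would get the same atomicity from a UNIQUE constraint in a shared database rather than from process memory.

```typescript
type ClaimResult = 'claimed' | 'duplicate';

class IdempotencyStore {
  // key -> time (ms) at which the key was claimed
  private readonly seen = new Map<string, number>();

  constructor(
    private readonly retentionMs: number = 7 * 24 * 60 * 60 * 1000, // 7 days
    private readonly now: () => number = Date.now,
  ) {}

  // Synchronous check-and-set: single-threaded JS cannot interleave another
  // caller between has() and set(), so only one caller wins a given key.
  claim(key: string): ClaimResult {
    this.evictExpired();
    if (this.seen.has(key)) return 'duplicate';
    this.seen.set(key, this.now());
    return 'claimed';
  }

  // Called when processing fails, so the sender's next retry is processed.
  release(key: string): void {
    this.seen.delete(key);
  }

  // Forget keys older than the retention window (O(n); fine for a sketch).
  private evictExpired(): void {
    const cutoff = this.now() - this.retentionMs;
    for (const [key, claimedAt] of this.seen) {
      if (claimedAt < cutoff) this.seen.delete(key);
    }
  }
}

async function handleOnce(
  store: IdempotencyStore,
  key: string,
  work: () => Promise<void>,
): Promise<'processed' | 'duplicate'> {
  if (store.claim(key) === 'duplicate') return 'duplicate';
  try {
    await work();
    return 'processed';
  } catch (err) {
    store.release(key); // let a later retry try again
    throw err;
  }
}
```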
Pattern 3: Dead Letter Queues
When a webhook exhausts all retries, you need somewhere to put it. A dead letter queue (DLQ) stores failed deliveries for manual inspection and replay.
```typescript
// dead-letter-queue.ts
import { eq } from 'drizzle-orm';
// db, deadLetterQueue (table), WebhookJob, notify, and enqueueWebhook are
// assumed to be defined elsewhere in your app.

interface DeadLetterEntry {
  id: string; // generated by the database on insert
  webhookId: string;
  url: string;
  payload: Record<string, unknown>;
  eventType: string;
  lastError: string;
  attempts: number;
  createdAt: Date;
  failedAt: Date;
}

async function moveToDeadLetterQueue(
  job: WebhookJob,
  error: string
): Promise<void> {
  await db.insert(deadLetterQueue).values({
    webhookId: job.id,
    url: job.url,
    payload: job.payload,
    eventType: job.eventType,
    lastError: error,
    attempts: job.attempt + 1, // job.attempt is 0-based
    createdAt: job.createdAt,
    failedAt: new Date(),
  });

  // Alert the team
  await notify({
    channel: 'webhook-failures',
    message: `Webhook ${job.id} to ${job.url} failed after ${job.attempt + 1} attempts: ${error}`,
  });
}

// Replay a dead letter
async function replayDeadLetter(entryId: string): Promise<void> {
  const entry = await db.query.deadLetterQueue.findFirst({
    where: eq(deadLetterQueue.id, entryId),
  });
  if (!entry) throw new Error('Dead letter not found');

  // Re-enqueue with reset attempt counter
  await enqueueWebhook({
    url: entry.url,
    payload: entry.payload,
    eventType: entry.eventType,
    attempt: 0,
    maxRetries: 3, // fewer retries on replay
  });

  // Remove from DLQ. If this delete fails, the entry can be replayed twice;
  // idempotency keys on the receiver are what make that harmless.
  await db
    .delete(deadLetterQueue)
    .where(eq(deadLetterQueue.id, entryId));
}
```
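The DLQ code assumes something upstream decides when a job has exhausted its retries. A small pure function can make that decision explicit by combining the transient/permanent distinction with the attempt budget, so permanent failures (401, 404) go straight to the DLQ instead of working through the whole backoff schedule. `nextStep` and its argument shape are hypothetical; in a real sender the `dead-letter` branch would call something like `moveToDeadLetterQueue`, and `random` is injectable only to make the result testable.

```typescript
type NextStep =
  | { action: 'retry'; attempt: number; delayMs: number }
  | { action: 'dead-letter'; reason: string };

interface FailedAttempt {
  attempt: number; // 0-based index of the attempt that just failed
  maxRetries: number;
  retryable: boolean; // e.g. true for timeouts/5xx, false for 401/404
  error: string;
}

function nextStep(
  failed: FailedAttempt,
  baseDelayMs = 1_000,
  maxDelayMs = 86_400_000,
  random: () => number = Math.random,
): NextStep {
  if (!failed.retryable) {
    return { action: 'dead-letter', reason: `permanent failure: ${failed.error}` };
  }
  if (failed.attempt >= failed.maxRetries) {
    return { action: 'dead-letter', reason: `retries exhausted: ${failed.error}` };
  }
  // Full jitter, same formula as calculateRetryDelay
  const capped = Math.min(baseDelayMs * 2 ** failed.attempt, maxDelayMs);
  return {
    action: 'retry',
    attempt: failed.attempt + 1,
    delayMs: Math.floor(random() * capped),
  };
}
```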
Pattern 4: Webhook Signature Verification
The sender signs every payload with a shared secret, and the receiver verifies that signature before trusting the request. Without signature verification, anyone who finds your endpoint URL can send fake webhooks to it.
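```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function verifyWebhookSignature(
  payload: string, // raw request body, exactly as received (not re-serialized JSON)
  signatureHeader: string, // e.g. "t=<unix seconds>,v1=<hex signature>"
  secret: string,
  toleranceSeconds: number = 300
): boolean {
  // Parse "key=value" pairs from the header
  const parts: Record<string, string> = Object.fromEntries(
    signatureHeader.split(',').map((part) => {
      const [key, value] = part.trim().split('=');
      return [key, value];
    })
  );

  const timestamp = parseInt(parts.t, 10);
  const signature = parts.v1;
  // Reject malformed headers: a NaN timestamp would slip past the tolerance check
  if (!Number.isFinite(timestamp) || !signature) {
    return false;
  }

  // Check timestamp tolerance (prevent replay attacks)
  const now = Math.floor(Date.now() / 1000);
  if (Math.abs(now - timestamp) > toleranceSeconds) {
    return false; // Too old or too far in the future
  }

  // Verify signature
  const signedContent = `${timestamp}.${payload}`;
  const expectedSignature = createHmac('sha256', secret)
    .update(signedContent)
    .digest('hex');

  const received = Buffer.from(signature, 'utf8');
  const expected = Buffer.from(expectedSignature, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  if (received.length !== expected.length) {
    return false;
  }
  // Use timing-safe comparison to prevent timing attacks
  return timingSafeEqual(received, expected);
}
```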
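The timestamp check prevents replay attacks. Because the timestamp is part of the signed content, an attacker who captures a valid webhook cannot update it, so the captured request stops verifying once the tolerance window expires. Within the window a replay is still possible, which is one more reason to pair signatures with idempotency keys.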
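For completeness, here is a sketch of the sender's half of the same scheme: sign `timestamp.payload` with the shared secret and send both values in one header. The `t=...,v1=...` format mirrors what `verifyWebhookSignature` parses; `signWebhookPayload` and the example secret are hypothetical, and the sender must transmit exactly the string it signed.

```typescript
import { createHmac } from 'node:crypto';

// Produces a header value like "t=1709807400,v1=<hex HMAC-SHA256>"
function signWebhookPayload(
  payload: string,
  secret: string,
  timestampSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const signature = createHmac('sha256', secret)
    .update(`${timestampSeconds}.${payload}`)
    .digest('hex');
  return `t=${timestampSeconds},v1=${signature}`;
}

// Usage: serialize once, sign that exact string, send that exact string.
const body = JSON.stringify({ event: 'order.updated', data: { orderId: '123' } });
const header = signWebhookPayload(body, 'example-shared-secret'); // hypothetical secret
```

Signing the serialized string, rather than the object, is what lets the receiver verify against the raw body without caring how either side formats JSON.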
Ordering Guarantees
Webhooks are delivered asynchronously. If you send event A then event B, event B might arrive first due to network variance or retries. There are two approaches:
Option 1: Accept out-of-order delivery. Include a sequence number or timestamp in the payload. Let the receiver reorder if needed.
```typescript
// Include ordering metadata in the payload
const payload = {
  event: 'order.updated',
  data: { orderId: '123', status: 'shipped' },
  metadata: {
    sequence: 42,
    occurredAt: '2026-03-07T10:30:00Z',
  },
};
```
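On the receiving side, the simplest use of `metadata.sequence` is often not reordering at all but discarding stale updates: remember the highest sequence applied per resource and ignore anything older. This sketch assumes sequence numbers increase per resource (keyed here by `orderId`); `LatestWinsApplier` is a hypothetical name, and its in-memory map stands in for a column on the resource's own row.

```typescript
interface OrderEvent {
  event: string;
  data: { orderId: string; status: string };
  metadata: { sequence: number; occurredAt: string };
}

class LatestWinsApplier {
  private readonly lastSequence = new Map<string, number>();
  readonly status = new Map<string, string>();

  // Returns true if the event was applied, false if it was stale or a duplicate.
  apply(e: OrderEvent): boolean {
    const last = this.lastSequence.get(e.data.orderId) ?? -Infinity;
    if (e.metadata.sequence <= last) return false; // older than what we already have
    this.lastSequence.set(e.data.orderId, e.metadata.sequence);
    this.status.set(e.data.orderId, e.data.status);
    return true;
  }
}
```

This only works when each event carries the resource's full current state (as `status` does here); delta-style events need genuine reordering or per-resource ordered delivery.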
Option 2: Enforce order per resource. Use a queue with partitioning so events for the same resource are delivered sequentially.
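```typescript
// Partition key ensures same-resource events are sequential.
// `queue` stands in for a partitioned message queue; orderId and
// webhookPayload come from the surrounding code.
await queue.publish({
  topic: 'webhooks',
  partitionKey: `order:${orderId}`, // all events for this order go to same partition
  payload: webhookPayload,
});
```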
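Inside a single worker process, a similar per-resource guarantee can be approximated by chaining work per key, so events for one order run strictly one after another while different orders proceed in parallel. This is a minimal sketch; `KeyedSerialQueue` is a hypothetical name, and it does not replace a partitioned broker once more than one worker consumes the same events.

```typescript
class KeyedSerialQueue {
  // Tail of the promise chain for each key (never rejects)
  private readonly tails = new Map<string, Promise<void>>();

  run(key: string, task: () => Promise<void>): Promise<void> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start this task only after the previous task for the same key settles
    const next = prev.then(() => task());
    // Store a non-rejecting tail so one failed task doesn't block later ones
    // (the caller still sees the rejection through `next`)
    const tail = next.catch(() => {});
    this.tails.set(key, tail);
    // Drop the entry once this is the last queued task, so the map doesn't grow forever
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return next;
  }
}
```

Unlike a strict partitioned queue, this sketch lets later events proceed after an earlier one fails; a sender that must never skip an event would instead hold the key until the failed task is retried or dead-lettered.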
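In practice, option 1 is simpler and works for most use cases. Only enforce ordering when the receiver truly cannot handle out-of-order events: strict per-resource ordering also means a single failing event blocks every later event for that resource until it succeeds or is dead-lettered.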
Receiver Best Practices
If you are on the receiving end of webhooks, a few patterns will save you:
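```typescript
import express from 'express';
// verifyWebhookSignature, queue, and SECRET are assumed to be defined elsewhere.

const app = express();

// 1. Respond quickly, process async.
// express.raw() keeps the body as a Buffer, so the signature is checked against
// the exact bytes the sender signed (re-serialized JSON may not match).
app.post('/webhooks', express.raw({ type: 'application/json' }), async (req, res) => {
  const rawBody = (req.body as Buffer).toString('utf8');

  // Verify signature first
  if (!verifyWebhookSignature(rawBody, req.get('x-webhook-signature') ?? '', SECRET)) {
    return res.status(401).send('Invalid signature');
  }

  try {
    // Enqueue for async processing
    await queue.publish({
      topic: 'incoming-webhooks',
      payload: JSON.parse(rawBody),
    });
  } catch {
    // Could not enqueue: a non-2xx response tells the sender to retry later
    return res.status(500).send('Try again later');
  }

  // Return 200 immediately
  res.status(200).send('OK');
});

// 2. Process in a worker
async function processIncomingWebhook(payload: unknown): Promise<void> {
  // Idempotency check
  // Business logic
  // Error handling
}
```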
Return 200 within 5 seconds. Do your heavy processing asynchronously. If you take too long to respond, the sender will time out and retry, potentially causing duplicate deliveries.
The Takeaway
Webhooks are unreliable by nature. That is fine. The patterns are well-established: exponential backoff with jitter, idempotency keys, dead letter queues, and signature verification. Implement all four and your webhook system will handle the 3-5% failure rate gracefully.
If you are sending transactional emails through Transactional, our webhook delivery system uses all of these patterns for event notifications (delivered, bounced, opened, clicked). But regardless of your provider, build these patterns into your own webhook infrastructure.