Case Study
Designing a Fault-Tolerant Multi-Dialing System for AI-Driven Voice Calls
Designing and implementing a low-latency, fault-tolerant multi-dialing system that orchestrated AI and human-assisted outbound and inbound calls at scale.
Context & Problem
AI Calls was built to automate outbound cold calls and handle inbound customer calls using AI. Users could upload leads via Excel or connect their CRM, configure AI agents, and launch calling campaigns where multiple phone numbers dialed leads in parallel. While calling a single number was straightforward, coordinating multiple numbers, campaigns, and AI conversations in real time introduced significant system design challenges.
The core challenge was multi-dialing orchestration under real-world constraints. A single user could assign up to four phone numbers to a campaign, and each lead could have multiple phone numbers. The system had to ensure that:
- A lead is never called more than once at the same time
- A phone number never dials a lead already being contacted by another number
- When one call connects, all parallel calls are immediately cancelled
- Campaigns can be paused and resumed without losing progress
- Server crashes or restarts do not cause duplicate calls or infinite loops
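The first two invariants reduce to an atomic per-lead lock. A minimal sketch, assuming an in-memory lock keyed by lead ID (the class and method names here are illustrative, not from the real system; production would use something like Redis `SET key value NX PX ttl` so the lock survives process restarts and expires on crash):

```typescript
// Per-lead locking: a worker (phone number) may only dial a lead it can lock.
class LeadLock {
  private locked = new Map<string, string>(); // leadId -> workerId holding it

  // Returns true only for the first worker to claim the lead.
  tryAcquire(leadId: string, workerId: string): boolean {
    if (this.locked.has(leadId)) return false;
    this.locked.set(leadId, workerId);
    return true;
  }

  // Only the holder may release, so a slow worker can't free another's lock.
  release(leadId: string, workerId: string): void {
    if (this.locked.get(leadId) === workerId) this.locked.delete(leadId);
  }
}
```

The TTL in the Redis variant doubles as a safety net: if a worker dies mid-call, the lead becomes dialable again after the lock expires rather than being stuck forever.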
At peak usage, campaigns ran in parallel across accounts, handling thousands of calls per day. Any inconsistency in state management could result in duplicate calls, dropped leads, or stalled campaigns, making correctness and fault tolerance more important than raw throughput.
Key Engineering Decisions
Worker–Task Based Dialing Algorithm
I modeled each phone number as a worker and each call attempt as a task. When a campaign started, workers were created for each number and tasks were generated for each lead–number combination. Tasks were assigned in groups so that all parallel calls for a single lead were coordinated, while already-assigned tasks were skipped during scheduling. This ensured no lead or number was ever double-booked.
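The grouping logic above can be sketched as follows (type and function names are illustrative, not the real schema): every lead–number pair becomes a task, and a lead's tasks are dispatched as one group so its parallel calls can later be cancelled together when one connects.

```typescript
// One task per (lead, phone number) combination.
type Task = { leadId: string; phone: string; assigned: boolean };

function buildTasks(leadPhones: Map<string, string[]>): Task[] {
  const tasks: Task[] = [];
  for (const [leadId, phones] of leadPhones) {
    for (const phone of phones) tasks.push({ leadId, phone, assigned: false });
  }
  return tasks;
}

// Assign the next lead's whole task group; skip any lead that already has an
// assigned task, so no lead or number is ever double-booked.
function assignNextGroup(tasks: Task[]): Task[] {
  const busy = new Set(tasks.filter(t => t.assigned).map(t => t.leadId));
  for (const t of tasks) {
    if (!t.assigned && !busy.has(t.leadId)) {
      const group = tasks.filter(g => g.leadId === t.leadId);
      group.forEach(g2 => (g2.assigned = true)); // claim the whole group atomically
      return group;
    }
  }
  return []; // nothing left to schedule
}
```

Returning the group as a unit is what makes "cancel all siblings on connect" cheap: the caller already holds references to every parallel call for that lead.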
State-Driven Call Outcomes and Retries
Every call attempt stored its outcome (human answered, machine detected, unreachable, cancelled, etc.). This enabled flexible retry strategies, accurate campaign progression, and future workflows such as flagging problematic numbers. Once all tasks for a lead completed, workers automatically advanced to the next available lead.
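A retry policy driven by these outcomes might look like the sketch below; the outcome names mirror the prose, while the retry limit and the policy itself are illustrative assumptions.

```typescript
// Outcomes recorded per call attempt, as described above.
type Outcome = "human_answered" | "machine_detected" | "unreachable" | "cancelled";

// Decide whether a lead's number should be dialed again.
function shouldRetry(outcome: Outcome, attempts: number, maxAttempts = 3): boolean {
  if (attempts >= maxAttempts) return false;
  switch (outcome) {
    case "human_answered":
      return false; // conversation happened, lead is done
    case "cancelled":
      return false; // a sibling parallel call connected instead
    case "machine_detected":
    case "unreachable":
      return true; // worth another attempt later
  }
}
```

Keeping this as a pure function over stored outcomes is what makes strategies swappable and enables the follow-on workflows mentioned above, such as flagging numbers that repeatedly come back unreachable.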
Fault Tolerance via Persistent State and Recovery
Campaign state was persisted using a combination of database storage and Redis. In case of server crashes or restarts, cron-based recovery jobs resumed campaigns from the last known safe state, preventing infinite loops or duplicate calls.
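The recovery job can be sketched as a periodic scan for campaigns whose heartbeat has gone stale, re-queuing their in-flight work (field names like `lastHeartbeat` and `inFlight` are illustrative, not the real schema):

```typescript
// Minimal campaign state as it might be persisted.
type Campaign = {
  id: string;
  lastHeartbeat: number; // ms epoch of last progress update
  inFlight: string[];    // task IDs that were dialing when the server died
  pending: string[];     // task IDs waiting to be scheduled
};

// Cron-style recovery: re-queue in-flight tasks of stale campaigns.
// Re-queuing (instead of re-dialing immediately) lets the scheduler re-check
// lead locks, so an interrupted task is never dialed twice at once.
function recoverStale(campaigns: Campaign[], now: number, staleMs: number): string[] {
  const recovered: string[] = [];
  for (const c of campaigns) {
    if (now - c.lastHeartbeat > staleMs && c.inFlight.length > 0) {
      c.pending.push(...c.inFlight);
      c.inFlight = [];
      recovered.push(c.id);
    }
  }
  return recovered;
}
```

Because recovery only moves tasks back to pending, running the job twice is harmless, which is what prevents the infinite loops and duplicate calls mentioned above.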
Real-Time AI Call Pipeline
Because no end-to-end AI voice platforms were available at the time, I designed a custom streaming pipeline: Vonage streamed audio (640-byte buffers every 25ms) to Deepgram for speech-to-text, GPT generated responses, AssemblyAI handled text-to-speech, and audio was streamed back to Vonage in near real time.
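One recurring detail in a pipeline like this is frame sizing on the return leg: synthesized audio arrives in chunks of arbitrary size and must be re-cut into the fixed 640-byte frames the telephony websocket expects. A minimal sketch of that re-framing (buffering logic is illustrative):

```typescript
// Re-cut an incoming byte stream into fixed-size frames for the websocket.
class FrameChunker {
  private pending = new Uint8Array(0);
  constructor(private frameSize = 640) {}

  // Append a chunk and emit as many complete frames as are now available;
  // any remainder is buffered until the next push.
  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending);
    merged.set(chunk, this.pending.length);
    const frames: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.frameSize) {
      frames.push(merged.slice(offset, offset + this.frameSize));
      offset += this.frameSize;
    }
    this.pending = merged.slice(offset);
    return frames;
  }
}
```

Getting this boundary handling right matters for latency as much as correctness: emitting frames as soon as they are complete, rather than waiting for a full utterance, is what keeps the stream near real time.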
Low-Latency Optimization via Dedicated AI Service
To reduce end-to-end latency to ~2 seconds, I introduced a dedicated microservice (‘brain’) responsible solely for AI audio streams. This service was resource-prioritized, used optimized prompts and models, and carefully managed audio buffers to avoid bottlenecks.
Trade-offs & Constraints
The system introduced significant complexity: coordinating workers and tasks, maintaining consistent state across failures, and managing a real-time AI pipeline. There was also operational overhead in running multiple services and tuning AI latency. These trade-offs were accepted to guarantee correctness, avoid duplicate calls, and deliver a reliable experience under concurrent campaign execution.
Business Impact
- Handled concurrent outbound campaigns with up to four parallel calls per account
- Achieved ~2-second end-to-end AI response latency through streaming and service isolation
- Reached production usage with multiple contracts before acquisition
What I'd improve at 10× scale
At higher scale, I would introduce distributed worker coordination using a queue-based scheduler, stronger idempotency guarantees around call initiation, and regional deployment of the AI streaming service to further reduce latency. I would also explore newer end-to-end AI voice platforms to simplify parts of the pipeline while preserving orchestration control.
