Latency Optimization for a Production Voice AI Platform
A systems-level latency study of a production voice-AI platform (Anthropic Claude intent detection + ElevenLabs TTS over a NestJS WebSocket pipeline). The central finding is that running intent detection and TTS concurrently, rather than shaving time off any single stage, is the highest-leverage optimization: it cuts median end-to-end latency from 3,277 ms to 1,909 ms, a reduction of 1,368 ms (about 42%).
Accepted at ICANN 2026 (peer-reviewed; Springer LNCS proceedings). Registration confirmed; camera-ready June 2026.
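
To make the central finding concrete, here is a minimal TypeScript sketch of the difference between awaiting two stages one after the other and starting them together. It is an illustration under stated assumptions, not the platform's actual code: `fakeStage`, `runSequential`, `runConcurrent`, the `Intent` and `Audio` types, and the 300 ms / 500 ms figures are all hypothetical, and the sketch assumes the TTS request does not need the intent result as input (for example, audio that can be requested speculatively). Neither the Anthropic nor the ElevenLabs client is called; timers stand in for the network round trips.

```typescript
// Hypothetical result types; the real payloads are not described in the source.
type Intent = { name: string };
type Audio = { bytes: number };

// Stand-in for a network call that resolves after `ms` milliseconds.
// This is NOT a real API client, only a timer.
function fakeStage<T>(ms: number, value: T): () => Promise<T> {
  return () => new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}

// Sequential: TTS starts only after intent detection has finished,
// so the total wait is roughly the sum of the two stages.
async function runSequential(
  detectIntent: () => Promise<Intent>,
  synthesize: () => Promise<Audio>,
): Promise<[Intent, Audio]> {
  const intent = await detectIntent();
  const audio = await synthesize();
  return [intent, audio];
}

// Concurrent: both stages are started before either is awaited,
// so the total wait is roughly that of the slower stage.
async function runConcurrent(
  detectIntent: () => Promise<Intent>,
  synthesize: () => Promise<Audio>,
): Promise<[Intent, Audio]> {
  const intentPromise = detectIntent();
  const audioPromise = synthesize();
  return Promise.all([intentPromise, audioPromise]);
}

// Idealized latency model that ignores scheduling and network jitter.
function sequentialLatencyMs(stageMs: number[]): number {
  return stageMs.reduce((sum, ms) => sum + ms, 0);
}

function concurrentLatencyMs(stageMs: number[]): number {
  return stageMs.length === 0 ? 0 : Math.max(...stageMs);
}

// Illustrative numbers only (not measurements from the study).
const stages = [300, 500];
console.log(`sequential ~${sequentialLatencyMs(stages)} ms`);
console.log(`concurrent ~${concurrentLatencyMs(stages)} ms`);
```

In this idealized model, overlapping the stages removes the shorter stage from the critical path entirely, which is why it can outweigh trimming any one stage. In a real pipeline the gain depends on how much of the TTS work can actually begin before intent detection completes.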