Try It Live
Call our demo number to hear the AI voice agent in action. It holds a natural conversation and can answer your questions.
How It Works
Phase 1: Connection
Phase 2: Conversation Loop
Total response time: under 1 second
How It's Built
A real-time voice pipeline powered by best-in-class APIs, orchestrated on Fly.io infrastructure.
What It Does
Performance
Challenges Faced
Conversation history grows with each turn (user message + AI response + tool results), pushing the token count from ~1,200 to ~2,500+ over 4-5 turns. This causes latency creep and will eventually hit context limits. Solution: implement a sliding window or conversation summarization to keep performance consistent over longer calls (see the sketch below).
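One possible shape for the sliding window, assuming the history is kept as an OpenAI-style messages array; trimHistory and MAX_HISTORY_MESSAGES are illustrative names, not the project's actual code.

```javascript
// Minimal sliding-window sketch: keep the system prompt plus the most
// recent messages so the token count stays roughly flat as the call runs.
const MAX_HISTORY_MESSAGES = 12; // ~3-4 turns of user/assistant/tool messages

function trimHistory(messages) {
  // messages[0] is assumed to be the system prompt; always keep it.
  const [system, ...rest] = messages;
  if (rest.length <= MAX_HISTORY_MESSAGES) return messages;
  // A production version should also avoid splitting a tool call
  // from its matching tool result at the window boundary.
  return [system, ...rest.slice(-MAX_HISTORY_MESSAGES)];
}

// Applied before every LLM request:
// const reply = await llm.chat(trimHistory(conversation));
```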
Concurrent TTS requests were hitting Cartesia's WebSocket connection limit (a maximum of 2), causing audio failures. Solution: added a queueSpeakText() method in cartesia.js that processes TTS requests sequentially, preventing rate-limit errors (sketched below).
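A minimal sketch of how such a sequential queue can be built by chaining promises; the queueSpeakText name comes from the text above, but the internals here are assumptions.

```javascript
// Each TTS request is chained onto the tail of a promise queue, so at
// most one synthesis is in flight on the WebSocket at any time.
class CartesiaClient {
  constructor() {
    this.ttsQueue = Promise.resolve(); // tail of the queue
  }

  queueSpeakText(text) {
    const result = this.ttsQueue.then(() => this.speakText(text));
    // Swallow errors on the chain itself so one failed request
    // doesn't block every request queued after it.
    this.ttsQueue = result.catch(() => {});
    return result;
  }

  async speakText(text) {
    // ...open the WebSocket, stream audio, resolve when synthesis completes
  }
}
```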
The Cartesia TTS SDK v2.2.9 had a critical bug where websocket.send() hung indefinitely in Node.js, causing 10-second timeouts and complete silence. Solution: bypassed the SDK entirely and opened a direct WebSocket connection to Cartesia's API (see below).
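A sketch of what that bypass can look like using the ws package. The endpoint, version string, and message fields follow Cartesia's documented WebSocket TTS API, but treat them as assumptions to verify against the current docs rather than the project's exact code.

```javascript
import WebSocket from 'ws';

function speakDirect(text, onAudioChunk) {
  return new Promise((resolve, reject) => {
    // Endpoint and query parameters per Cartesia's docs (verify version string).
    const ws = new WebSocket(
      `wss://api.cartesia.ai/tts/websocket?api_key=${process.env.CARTESIA_API_KEY}&cartesia_version=2024-06-10`
    );

    ws.on('open', () => {
      ws.send(JSON.stringify({
        model_id: 'sonic-english', // assumed model id
        transcript: text,
        voice: { mode: 'id', id: process.env.CARTESIA_VOICE_ID },
        output_format: { container: 'raw', encoding: 'pcm_s16le', sample_rate: 16000 },
        context_id: `ctx-${Date.now()}`,
      }));
    });

    ws.on('message', (raw) => {
      const msg = JSON.parse(raw.toString());
      // Audio arrives as base64 chunks; 'done' signals end of synthesis.
      if (msg.type === 'chunk') onAudioChunk(Buffer.from(msg.data, 'base64'));
      if (msg.type === 'done') { ws.close(); resolve(); }
      if (msg.type === 'error') { ws.close(); reject(new Error(msg.error)); }
    });

    ws.on('error', reject);
  });
}
```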
LLMs sometimes "leak" internal function-call syntax into their spoken responses, saying things like "Note: customer wants..." instead of speaking naturally while silently collecting data. Solution: explicit prompt instructions with examples of bad vs. good responses (an illustrative excerpt follows).
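An illustrative excerpt of that kind of instruction; the wording below is an assumption, not the project's actual prompt.

```javascript
// Hypothetical system-prompt excerpt showing the bad-vs-good pattern.
const PROMPT_EXCERPT = `
When you call a tool, never narrate the call out loud.

BAD:  "Note: customer wants a callback tomorrow. Logging that now."
GOOD: "Sure, we'll give you a call back tomorrow. Anything else?"

Say only what a human agent would say; collect data silently via tools.
`;
```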
The agent was loading 1,286 Q&A entries instead of 7, causing 13-14k tokens per LLM call and steadily growing latency (3s → 4s → 5s). Root cause: a database view was aggregating Q&As from ALL users instead of just the current user. The diagnosis came from noticing the massive token count in the logs. Solution: fixed the data source to return only the user's own Q&A entries (sketched below).
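A sketch of the scoped query plus a cheap regression guard, assuming a node-postgres-style client; the table and column names (qa_entries, user_id) are hypothetical.

```javascript
// Scope the Q&A lookup to the current user instead of reading from the
// view that aggregated every user's rows.
async function getUserQAEntries(db, userId) {
  const { rows } = await db.query(
    'SELECT question, answer FROM qa_entries WHERE user_id = $1',
    [userId]
  );
  return rows;
}

// Cheap guard: log an approximate token count per LLM call so a blowup
// like 7 -> 1,286 entries is visible immediately in the logs.
function approxTokens(text) {
  return Math.ceil(text.length / 4); // rough heuristic: ~4 chars per token
}
```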