
Voice Agent Demo

Real-time AI voice conversations

Try It Live

Call our demo number to experience the AI voice agent in action.
It demonstrates natural conversation and can answer your questions.

How It Works

Phase 1: Connection

You Call
Your Phone
Dial our demo number from any phone, landline or mobile. The system is built so business owners never miss after-hours calls: customers reach an intelligent AI instead of voicemail.
We Answer
Twilio
Twilio is a cloud platform that routes phone calls through the internet, connecting your call to our AI system instead of a traditional phone line.

Phase 2: Conversation Loop

You Speak
Vocal Cords
Talk naturally into your phone, just like you would with a human. Ask questions, give commands, or have a conversation.
We Listen
Deepgram
Deepgram is an AI speech recognition service that converts your spoken words into text in real-time with remarkable accuracy, even handling accents and background noise.
AI Thinks
Groq
Groq provides ultra-fast AI computing that processes your question and generates an intelligent, contextual response in milliseconds, using some of the fastest inference infrastructure available.
We Respond
Cartesia
Cartesia is AI voice synthesis that converts the text response into natural, human-sounding speech. The voice you hear is generated in real-time, not pre-recorded.
↻ This cycle repeats for the entire conversation

Total response time: under 1 second
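The conversation loop above can be sketched as a simple async pipeline. The service functions here (`transcribe`, `generateReply`, `synthesize`) are illustrative stand-ins for Deepgram, Groq, and Cartesia, not the project's actual API.

```javascript
// Illustrative sketch of one turn of the conversation loop.
// `services` bundles hypothetical stand-ins for the real providers.
async function conversationTurn(audioChunk, services, history) {
  const text = await services.transcribe(audioChunk);   // Deepgram STT
  history.push({ role: 'user', content: text });
  const reply = await services.generateReply(history);  // Groq LLM
  history.push({ role: 'assistant', content: reply });
  const speech = await services.synthesize(reply);      // Cartesia TTS
  return speech; // streamed back to the caller via Twilio
}
```

Each turn appends to the shared history so the model keeps conversational context across the call.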

How It's Built

A real-time voice pipeline powered by best-in-class APIs, orchestrated on Fly.io infrastructure.

Twilio Media Streams
Deepgram STT
Groq Llama 3.3 70B
Cartesia Sonic 3
Neon PostgreSQL
Fly.io

What It Does

Natural Conversation Back-and-forth dialogue that sounds and feels human
Business Context Answers questions using your knowledge base and data
Human-Like Voice Ultra-realistic speech synthesis with sub-second latency

Performance

~1s
Total Response Latency
8kHz
Audio Quality (PCM mu-law)
Direct WS
No SDK overhead

Challenges Faced

End-Call Handling In Progress

Two problems surfaced during live demo testing. First, after the agent closed the WebSocket, the caller stayed on a silent line for nearly a minute because the TwiML had a <Pause> verb that kept running after <Connect><Stream> ended; removing the Pause lets Twilio end the PSTN leg as soon as the stream closes. Second, the original goodbye detector matched natural phrases like "take care" and "goodbye" anywhere in an AI response, so the system hung up on callers mid-question whenever the model happened to use one of those phrases. It was replaced with a silent sentinel token, [[END_CALL]], that the model emits only after the caller has explicitly wrapped up. The handler strips the token before sending text to TTS, then closes the connection once the spoken audio finishes. The prompt's negative examples are still being tuned so the model does not emit the sentinel proactively.
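The sentinel handling can be sketched as below; the function name and shape are illustrative, not the project's actual handler.

```javascript
// Hypothetical sketch of the [[END_CALL]] sentinel handling.
const END_SENTINEL = '[[END_CALL]]';

function processReply(reply) {
  const endCall = reply.includes(END_SENTINEL);
  // Strip the sentinel so it is never spoken aloud by TTS.
  const spoken = reply.replaceAll(END_SENTINEL, '').trim();
  return { spoken, endCall };
}
```

Only when `endCall` is true does the handler close the connection, and only after playback of `spoken` finishes. A phrase like "take care" in ordinary conversation no longer triggers a hangup.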

Context Window Growth Resolved

Conversation history grew with each turn (user message + AI response + tool results), increasing token count from ~1,200 to ~2,500+ over 4-5 turns. This caused latency creep and would have eventually hit context limits. Solution: added context window management to keep token usage stable across longer calls.
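One common shape for this kind of context management is a sliding window that keeps the system prompt and drops the oldest turns first. This is a minimal sketch, assuming a rough 4-characters-per-token estimate; the real implementation may count tokens differently.

```javascript
// Illustrative token budget; the project's actual limit may differ.
const MAX_TOKENS = 1500;

// Crude token estimate: ~4 characters per token.
function estimateTokens(messages) {
  return messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
}

function trimHistory(messages) {
  // Always keep the system prompt (index 0); drop the oldest turns first,
  // but never trim below the two most recent messages.
  const [system, ...rest] = messages;
  while (rest.length > 2 && estimateTokens([system, ...rest]) > MAX_TOKENS) {
    rest.shift();
  }
  return [system, ...rest];
}
```

Trimming per turn keeps the prompt size, and therefore latency, roughly flat as the call gets longer.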

TTS Request Queuing Resolved

Concurrent TTS requests were hitting Cartesia's WebSocket connection limit (max 2), causing audio failures. Solution: added queueSpeakText() method in cartesia.js that processes TTS requests sequentially, preventing rate limit errors.
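Sequential queuing can be implemented by chaining each request onto a promise chain, in the spirit of `queueSpeakText()`. This is a sketch; `speakText` stands in for the actual Cartesia call in `cartesia.js`.

```javascript
// Sketch of sequential TTS queuing. Each request waits for the previous
// one, so at most one request is in flight against the WebSocket
// connection limit at any time.
class TtsQueue {
  constructor(speakText) {
    this.speakText = speakText;   // stand-in for the real Cartesia call
    this.chain = Promise.resolve();
  }

  queueSpeakText(text) {
    this.chain = this.chain.then(() => this.speakText(text));
    return this.chain; // resolves when this request's audio is done
  }
}
```

Callers can fire-and-forget or await the returned promise; either way, requests play out in submission order instead of racing for connections.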

SDK WebSocket Bug Resolved

The Cartesia TTS SDK v2.2.9 had a critical bug where websocket.send() hung indefinitely in Node.js, causing 10-second timeouts and complete silence. Solution: bypassed the SDK entirely and implemented direct WebSocket connection to Cartesia's API.
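The core of the direct-connection fix is explicit open-state handling: never call `send()` until the socket is actually open. A minimal sketch of that pattern, where the socket is assumed to behave like one from the `ws` package (numeric `readyState`, `open`/`error` events):

```javascript
// Wait for the socket to be open before sending, instead of trusting an
// SDK wrapper to do it. readyState 1 is OPEN in the WebSocket spec.
function sendWhenOpen(socket, payload) {
  return new Promise((resolve, reject) => {
    const send = () => {
      socket.send(payload);
      resolve();
    };
    if (socket.readyState === 1 /* OPEN */) return send();
    socket.once('open', send);
    socket.once('error', reject);
  });
}
```

With the raw connection, failures surface as rejected promises rather than an indefinite hang and a silent call.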

LLM Function Call Leakage Resolved

LLMs sometimes "leak" internal function call syntax into their spoken responses, saying things like "Note: customer wants..." instead of speaking naturally while silently collecting data. Solution: explicit prompt instructions with examples of bad vs good responses.
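The guardrail is prompt-side, not code-side. An illustrative excerpt of the kind of bad-vs-good instruction block described above (the project's exact wording may differ):

```javascript
// Hypothetical excerpt of the anti-leakage system prompt instructions.
const antiLeakInstructions = `
Never narrate internal tool use in your spoken reply.
BAD:  "Note: customer wants a quote for two windows."
GOOD: "Sure, I can help with a quote for two windows."
Collect data silently via function calls; speak only natural sentences.
`;
```

Concrete contrasting examples tend to steer models more reliably than an abstract "don't mention tools" rule alone.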

Database View Aggregation Bug Resolved

The agent was loading 1,286 Q&A entries instead of 7, causing 13-14k tokens per LLM call and growing latency (3s → 4s → 5s). Root cause: a database view was aggregating Q&As from ALL users instead of just the current user. Diagnosis came from noticing the massive token count in logs. Solution: fixed the data source to return only the user's own Q&A entries.
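The shape of the fix is a user-scoped query instead of an all-users view. A minimal sketch, where the table and column names (`qa_entries`, `user_id`) are assumptions, not the project's actual schema:

```javascript
// Scope the Q&A lookup to the current user rather than reading from a
// view that aggregates every user's entries.
const qaForUserSql = `
  SELECT question, answer
  FROM qa_entries
  WHERE user_id = $1
`;

async function loadUserQa(db, userId) {
  const { rows } = await db.query(qaForUserSql, [userId]);
  return rows; // ~7 entries for one user instead of 1,286 for all users
}
```

The symptom that exposed the bug, 13-14k tokens per LLM call, is also the right thing to alert on: prompt token counts per request make this class of data-scoping regression visible immediately.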