
Voice Agent Demo

Real-time AI voice conversations

Try It Live

Call our demo number to experience the AI voice agent in action.
It demonstrates natural conversation and can answer your questions.

How It Works

Phase 1: Connection

You Call
Your Phone
Dial our demo number from any phone - landline or mobile. This system helps business owners never miss after-hours calls. Customers reach an intelligent AI instead of voicemail.
We Answer
Twilio
Twilio is a cloud platform that routes phone calls through the internet, connecting your call to our AI system instead of a traditional phone line.

Phase 2: Conversation Loop

You Speak
Vocal Cords
Talk naturally into your phone, just like you would with a human. Ask questions, give commands, or have a conversation.
We Listen
Deepgram
Deepgram is an AI speech recognition service that converts your spoken words into text in real-time with remarkable accuracy, even handling accents and background noise.
AI Thinks
Groq
Groq provides ultra-fast AI inference that processes your question and generates an intelligent, contextual response in milliseconds - among the fastest LLM infrastructure available.
We Respond
Cartesia
Cartesia is AI voice synthesis that converts the text response into natural, human-sounding speech. The voice you hear is generated in real-time, not pre-recorded.
↻ This cycle repeats for the entire conversation

Total response time: under 1 second
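One turn of the conversation loop above can be sketched in code. This is an illustration, not the production implementation: the function names and `services` shape are invented, and the real STT/LLM/TTS calls (Deepgram, Groq, Cartesia) are stubbed out so the flow itself is visible.

```javascript
// One turn of the loop: caller audio in → text → LLM reply → audio out.
// transcribe / generateReply / synthesize stand in for Deepgram, Groq,
// and Cartesia; here they are stubs so no network calls are needed.
async function handleTurn(audioChunk, history, services) {
  const userText = await services.transcribe(audioChunk);   // Deepgram STT
  history.push({ role: "user", content: userText });

  const replyText = await services.generateReply(history);  // Groq LLM
  history.push({ role: "assistant", content: replyText });

  const replyAudio = await services.synthesize(replyText);  // Cartesia TTS
  return replyAudio;                                        // streamed back via Twilio
}

// Stub services that demonstrate the flow without any network calls.
const stubServices = {
  transcribe: async () => "What are your opening hours?",
  generateReply: async (history) =>
    `You asked: "${history[history.length - 1].content}" We open at 9am.`,
  synthesize: async (text) => Buffer.from(text), // pretend this is mu-law audio
};

handleTurn(Buffer.alloc(160), [], stubServices).then((audio) => {
  console.log(audio.toString());
});
```

In the real system this cycle runs continuously over a Twilio Media Streams WebSocket rather than once per call.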

How It's Built

A real-time voice pipeline powered by best-in-class APIs, orchestrated on Fly.io infrastructure.

Twilio Media Streams
Deepgram STT
Groq Llama 3.3 70B
Cartesia Sonic 3
Neon PostgreSQL
Fly.io

What It Does

Natural Conversation: Back-and-forth dialogue that sounds and feels human
Business Context: Answers questions using your knowledge base and data
Human-Like Voice: Ultra-realistic speech synthesis with sub-second latency

Performance

~1s
Total Response Latency
8kHz
Audio Quality (PCM mu-law)
Direct WS
No SDK overhead

Challenges Faced

Context Window Growth In Progress

Conversation history grows with each turn (user message + AI response + tool results), increasing token count from ~1,200 to ~2,500+ over 4-5 turns. This causes latency creep and will eventually hit context limits. Planned solution: implement a sliding window or conversation summarization to keep performance consistent over longer calls.
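A minimal sketch of the sliding-window idea described above. The function name and the message-count budget are hypothetical; a production version would count tokens (or summarize dropped turns) rather than counting messages.

```javascript
// Trim conversation history to the system prompt plus the most recent
// messages, so token count stays roughly flat as the call goes on.
function slidingWindow(history, maxMessages = 8) {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxMessages)];
}

// Simulate a 10-turn call: 1 system prompt + 20 user/assistant messages.
const history = [{ role: "system", content: "You are a phone agent." }];
for (let i = 1; i <= 10; i++) {
  history.push({ role: "user", content: `question ${i}` });
  history.push({ role: "assistant", content: `answer ${i}` });
}

const trimmed = slidingWindow(history);
console.log(trimmed.length);     // 9: system prompt + last 8 messages
console.log(trimmed[1].content); // "question 7" (oldest surviving message)
```

Summarization would replace the dropped turns with one short synthetic message instead of discarding them outright.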

TTS Request Queuing Resolved

Concurrent TTS requests were hitting Cartesia's WebSocket connection limit (max 2), causing audio failures. Solution: added queueSpeakText() method in cartesia.js that processes TTS requests sequentially, preventing rate limit errors.
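The queuing fix can be sketched as a promise chain. The class and the `speakText` signature below are illustrative; the real `queueSpeakText()` lives in `cartesia.js`.

```javascript
// Serialize TTS requests so at most one is in flight at a time, staying
// under Cartesia's concurrent WebSocket limit. Each request chains onto
// the previous one's promise, so requests complete strictly in order.
class TtsQueue {
  constructor(speakText) {
    this.speakText = speakText;     // the underlying (rate-limited) TTS call
    this.tail = Promise.resolve();  // end of the current chain
  }

  queueSpeakText(text) {
    const next = this.tail.then(() => this.speakText(text));
    // Keep the chain alive even if one request fails.
    this.tail = next.catch(() => {});
    return next;
  }
}

// Demo with a fake TTS call whose duration depends on text length:
// "hi" would normally finish first, but the queue preserves order.
const order = [];
const fakeSpeak = (text) =>
  new Promise((resolve) =>
    setTimeout(() => { order.push(text); resolve(); }, text.length)
  );

const queue = new TtsQueue(fakeSpeak);
Promise.all(["hello there", "hi"].map((t) => queue.queueSpeakText(t)))
  .then(() => console.log(order)); // [ 'hello there', 'hi' ]
```

The `catch` on the stored tail matters: without it, one failed request would reject every request queued after it.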

SDK WebSocket Bug Resolved

The Cartesia TTS SDK v2.2.9 had a critical bug where websocket.send() hung indefinitely in Node.js, causing 10-second timeouts and complete silence. Solution: bypassed the SDK entirely and implemented direct WebSocket connection to Cartesia's API.
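The pattern behind the workaround can be sketched as a timeout guard around raw WebSocket calls, so a hung operation becomes a recoverable error instead of silence. The endpoint and payload shape in the comments are placeholders, not Cartesia's actual API.

```javascript
// Rejects if the wrapped promise has not settled within ms milliseconds;
// this turns a silently hung call into an error the pipeline can handle.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)), ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Direct WebSocket usage, shape only (real URL and message format follow
// Cartesia's WS API, which this sketch does not reproduce):
//
//   const WebSocket = require("ws");
//   const ws = new WebSocket(CARTESIA_WS_URL);
//   await withTimeout(new Promise((res) => ws.once("open", res)), 2000, "connect");
//   ws.send(JSON.stringify(ttsRequest)); // raw send, no SDK wrapper to hang
//   ws.on("message", (chunk) => playAudio(chunk));

withTimeout(new Promise((res) => setTimeout(() => res("ok"), 10)), 1000)
  .then((v) => console.log(v)); // prints "ok"
```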

LLM Function Call Leakage Resolved

LLMs sometimes "leak" internal function call syntax into their spoken responses, saying things like "Note: customer wants..." instead of speaking naturally while silently collecting data. Solution: explicit prompt instructions with examples of bad vs good responses.
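A sketch of the kind of prompt guardrail this describes. The wording below is illustrative, not the production system prompt.

```javascript
// System-prompt excerpt contrasting bad vs good replies, so the model
// keeps tool/function activity out of what it says aloud.
const PROMPT_GUARDRAIL = `
When you record information with a tool, do it silently.
Never narrate internal actions or notes in your spoken reply.

Bad:  "Note: customer wants a Tuesday appointment. I've logged that."
Good: "Great, I can help you set up a Tuesday appointment."
`.trim();

console.log(PROMPT_GUARDRAIL.split("\n").length); // 5 lines of instruction
```

Concrete bad/good pairs tend to work better than abstract rules like "don't mention functions," which is the approach the fix above takes.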

Database View Aggregation Bug Resolved

The agent was loading 1,286 Q&A entries instead of 7, causing 13-14k tokens per LLM call and growing latency (3s → 4s → 5s). Root cause: a database view was aggregating Q&As from ALL users instead of just the current user. Diagnosis came from noticing the massive token count in logs. Solution: fixed the data source to return only the user's own Q&A entries.
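The shape of the fix, in hypothetical SQL (view, table, and column names are invented for illustration; the real Neon PostgreSQL schema differs):

```sql
-- Before: the view returned Q&A entries for ALL users, so every
-- LLM call loaded 1,286 rows instead of the current user's 7.

-- After: scope every query against the view to one user.
SELECT question, answer
FROM user_qa_entries
WHERE user_id = $1;  -- only the current user's entries
```

The telltale symptom was in the logs: a per-call token count (~13-14k) far larger than seven Q&A entries could produce.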