Try It Live
Call our demo number to hear the AI voice agent in action. It holds a natural conversation and can answer your questions.
How It Works
Phase 1: Connection
Phase 2: Conversation Loop
Total response time: under 1 second
How It's Built
A real-time voice pipeline powered by best-in-class APIs, orchestrated on Fly.io infrastructure.
What It Does
Performance
Challenges Faced
Conversation history grows with each turn (user message + AI response + tool results), pushing the token count from ~1,200 to ~2,500+ over 4-5 turns. This causes latency creep and will eventually hit context limits. Solution: implement a sliding window or conversation summarization to keep performance consistent over longer calls (see the sketch below).
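One possible shape for the sliding window, assuming the history is kept as an OpenAI-style messages array; trimHistory and MAX_HISTORY_MESSAGES are illustrative names, not the project's actual code.

```javascript
// Minimal sliding-window sketch: keep the system prompt plus the most
// recent messages so the token count stays roughly flat as the call runs.
const MAX_HISTORY_MESSAGES = 12; // ~3-4 turns of user/assistant/tool messages

function trimHistory(messages) {
  // messages[0] is assumed to be the system prompt; always keep it.
  const [system, ...rest] = messages;
  if (rest.length <= MAX_HISTORY_MESSAGES) return messages;
  // A production version should also avoid splitting a tool call
  // from its matching tool result at the window boundary.
  return [system, ...rest.slice(-MAX_HISTORY_MESSAGES)];
}

// Applied before every LLM request:
// const reply = await llm.chat(trimHistory(conversation));
```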
Concurrent TTS requests were hitting Cartesia's WebSocket connection limit (a maximum of 2), causing audio failures. Solution: added a queueSpeakText() method in cartesia.js that processes TTS requests sequentially, preventing rate-limit errors (sketched below).
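A minimal sketch of how such a sequential queue can be built by chaining promises; the queueSpeakText name comes from the text above, but the internals here are assumptions.

```javascript
// Each TTS request is chained onto the tail of a promise queue, so at
// most one synthesis is in flight on the WebSocket at any time.
class CartesiaClient {
  constructor() {
    this.ttsQueue = Promise.resolve(); // tail of the queue
  }

  queueSpeakText(text) {
    const result = this.ttsQueue.then(() => this.speakText(text));
    // Swallow errors on the chain itself so one failed request
    // doesn't block every request queued after it.
    this.ttsQueue = result.catch(() => {});
    return result;
  }

  async speakText(text) {
    // ...open the WebSocket, stream audio, resolve when synthesis completes
  }
}
```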
The Cartesia TTS SDK v2.2.9 had a critical bug where websocket.send() hung indefinitely in Node.js, causing 10-second timeouts and complete silence. Solution: bypassed the SDK entirely and opened a direct WebSocket connection to Cartesia's API (see below).
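A sketch of what that bypass can look like using the ws package. The endpoint, version string, and message fields follow Cartesia's documented WebSocket TTS API, but treat them as assumptions to verify against the current docs rather than the project's exact code.

```javascript
import WebSocket from 'ws';

function speakDirect(text, onAudioChunk) {
  return new Promise((resolve, reject) => {
    // Endpoint and query parameters per Cartesia's docs (verify version string).
    const ws = new WebSocket(
      `wss://api.cartesia.ai/tts/websocket?api_key=${process.env.CARTESIA_API_KEY}&cartesia_version=2024-06-10`
    );

    ws.on('open', () => {
      ws.send(JSON.stringify({
        model_id: 'sonic-english', // assumed model id
        transcript: text,
        voice: { mode: 'id', id: process.env.CARTESIA_VOICE_ID },
        output_format: { container: 'raw', encoding: 'pcm_s16le', sample_rate: 16000 },
        context_id: `ctx-${Date.now()}`,
      }));
    });

    ws.on('message', (raw) => {
      const msg = JSON.parse(raw.toString());
      // Audio arrives as base64 chunks; 'done' signals end of synthesis.
      if (msg.type === 'chunk') onAudioChunk(Buffer.from(msg.data, 'base64'));
      if (msg.type === 'done') { ws.close(); resolve(); }
      if (msg.type === 'error') { ws.close(); reject(new Error(msg.error)); }
    });

    ws.on('error', reject);
  });
}
```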
LLMs sometimes "leak" internal function-call syntax into their spoken responses, saying things like "Note: customer wants..." instead of speaking naturally while silently collecting data. Solution: explicit prompt instructions with examples of bad vs. good responses (an illustrative excerpt follows).
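An illustrative excerpt of that kind of instruction; the wording below is an assumption, not the project's actual prompt.

```javascript
// Hypothetical system-prompt excerpt showing the bad-vs-good pattern.
const PROMPT_EXCERPT = `
When you call a tool, never narrate the call out loud.

BAD:  "Note: customer wants a callback tomorrow. Logging that now."
GOOD: "Sure, we'll give you a call back tomorrow. Anything else?"

Say only what a human agent would say; collect data silently via tools.
`;
```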
The agent was loading 1,286 Q&A entries instead of 7, causing 13-14k tokens per LLM call and steadily growing latency (3s → 4s → 5s). Root cause: a database view was aggregating Q&As from ALL users instead of just the current user. The diagnosis came from noticing the massive token count in the logs. Solution: fixed the data source to return only the user's own Q&A entries (sketched below).
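A sketch of the scoped query plus a cheap regression guard, assuming a node-postgres-style client; the table and column names (qa_entries, user_id) are hypothetical.

```javascript
// Scope the Q&A lookup to the current user instead of reading from the
// view that aggregated every user's rows.
async function getUserQAEntries(db, userId) {
  const { rows } = await db.query(
    'SELECT question, answer FROM qa_entries WHERE user_id = $1',
    [userId]
  );
  return rows;
}

// Cheap guard: log an approximate token count per LLM call so a blowup
// like 7 -> 1,286 entries is visible immediately in the logs.
function approxTokens(text) {
  return Math.ceil(text.length / 4); // rough heuristic: ~4 chars per token
}
```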