Normalized for Mintlify from knowledge-base/aiconnected-apps-and-modules/modules/aiConnected-voice/aiConnected-voice-junior-dev-prd.mdx.

Voice by aiConnected - Junior Developer PRD

Comprehensive Outline

Purpose: This outline defines a PRD detailed enough that a junior developer with no prior context could build the entire system. Every decision is documented. Nothing is assumed.

PART 1: Foundation & Context

Estimated: 15-20 pages

1. Project Overview

  • 1.1 What We’re Building (plain English)
  • 1.2 Why We’re Building It (business problem)
  • 1.3 Who It’s For (target users)
  • 1.4 Success Looks Like (measurable outcomes)

2. Glossary of Terms

  • 2.1 Telephony Terms (PSTN, SIP, DTMF, IVR, PBX, etc.)
  • 2.2 WebRTC Terms (ICE, STUN, TURN, SDP, etc.)
  • 2.3 AI/ML Terms (LLM, STT, TTS, VAD, embeddings, etc.)
  • 2.4 Platform Terms (tenant, agency, knowledge base, etc.)
  • 2.5 Infrastructure Terms (container, webhook, WebSocket, etc.)

3. Architecture Overview

  • 3.1 System Diagram (with explanation of each box)
  • 3.2 Data Flow Narrative (step-by-step what happens on a call)
  • 3.3 Technology Choices (what we’re using and WHY)
  • 3.4 What We’re NOT Building (explicit scope boundaries)

4. Development Environment Setup

  • 4.1 Required Accounts & API Keys
  • 4.2 Local Development Tools
  • 4.3 Repository Structure
  • 4.4 Environment Variables Reference
  • 4.5 How to Run Locally

PART 2: Database Design

Estimated: 20-25 pages

5. Database Architecture

  • 5.1 Why PostgreSQL
  • 5.2 Database Naming Conventions
  • 5.3 Common Patterns Used (UUIDs, timestamps, soft deletes)

6. Schema: Core Entities

  • 6.1 agencies table (full DDL + field explanations)
  • 6.2 tenants table (full DDL + field explanations)
  • 6.3 users table (full DDL + field explanations)
  • 6.4 user_roles and permissions tables

7. Schema: Telephony Entities

  • 7.1 phone_numbers table
  • 7.2 calls table
  • 7.3 call_events table (state machine history)
  • 7.4 call_transfers table

8. Schema: AI & Content Entities

  • 8.1 knowledge_bases table
  • 8.2 knowledge_documents table
  • 8.3 knowledge_chunks table (with embeddings)
  • 8.4 transcripts table
  • 8.5 recordings table

9. Schema: Configuration Entities

  • 9.1 voice_configurations table
  • 9.2 agent_personalities table
  • 9.3 greetings table
  • 9.4 business_hours table

10. Schema: Billing & Analytics

  • 10.1 usage_records table
  • 10.2 billing_events table
  • 10.3 call_analytics table

11. Indexes & Performance

  • 11.1 Required Indexes (with explanations)
  • 11.2 Partitioning Strategy
  • 11.3 Query Patterns to Optimize For

12. Migrations

  • 12.1 Migration File Naming Convention
  • 12.2 Initial Migration Script
  • 12.3 How to Add New Migrations

PART 3: API Design

Estimated: 25-30 pages

13. API Architecture

  • 13.1 REST vs GraphQL Decision
  • 13.2 URL Structure & Naming
  • 13.3 Authentication (JWT implementation)
  • 13.4 Authorization (RBAC implementation)
  • 13.5 Error Response Format
  • 13.6 Pagination Standard
  • 13.7 Rate Limiting

14. Agency Management APIs

  • 14.1 POST /api/v1/agencies - Create agency
  • 14.2 GET /api/v1/agencies/{id} - Get agency
  • 14.3 PUT /api/v1/agencies/{id} - Update agency
  • 14.4 GET /api/v1/agencies/{id}/tenants - List tenants
  • 14.5 GET /api/v1/agencies/{id}/usage - Get usage

15. Tenant Management APIs

  • 15.1 POST /api/v1/tenants - Create tenant
  • 15.2 GET /api/v1/tenants/{id} - Get tenant
  • 15.3 PUT /api/v1/tenants/{id} - Update tenant
  • 15.4 DELETE /api/v1/tenants/{id} - Deactivate tenant
  • 15.5 GET /api/v1/tenants/{id}/config - Get configuration
  • 15.6 PUT /api/v1/tenants/{id}/config - Update configuration

16. Phone Number APIs

  • 16.1 GET /api/v1/phone-numbers/available - Search available
  • 16.2 POST /api/v1/phone-numbers - Provision number
  • 16.3 GET /api/v1/tenants/{id}/phone-numbers - List tenant numbers
  • 16.4 PUT /api/v1/phone-numbers/{id} - Configure number
  • 16.5 DELETE /api/v1/phone-numbers/{id} - Release number

17. Call Control APIs

  • 17.1 POST /api/v1/calls/outbound - Initiate call
  • 17.2 GET /api/v1/calls/{id} - Get call status
  • 17.3 POST /api/v1/calls/{id}/transfer - Transfer call
  • 17.4 POST /api/v1/calls/{id}/hold - Hold call
  • 17.5 POST /api/v1/calls/{id}/resume - Resume call
  • 17.6 POST /api/v1/calls/{id}/hangup - End call
  • 17.7 GET /api/v1/tenants/{id}/calls - List calls

18. Knowledge Base APIs

  • 18.1 POST /api/v1/tenants/{id}/knowledge-base - Create KB
  • 18.2 GET /api/v1/tenants/{id}/knowledge-base - Get KB
  • 18.3 POST /api/v1/knowledge-bases/{id}/documents - Add document
  • 18.4 DELETE /api/v1/knowledge-bases/{id}/documents/{did} - Remove
  • 18.5 POST /api/v1/knowledge-bases/{id}/query - Query KB

19. Recording & Transcript APIs

  • 19.1 GET /api/v1/calls/{id}/recording - Get recording URL
  • 19.2 GET /api/v1/calls/{id}/transcript - Get transcript
  • 19.3 GET /api/v1/tenants/{id}/recordings - List recordings

20. Analytics APIs

  • 20.1 GET /api/v1/tenants/{id}/analytics/summary - Dashboard data
  • 20.2 GET /api/v1/tenants/{id}/analytics/calls - Call metrics
  • 20.3 GET /api/v1/tenants/{id}/analytics/usage - Usage metrics

21. Webhook Endpoints (Inbound)

  • 21.1 POST /webhooks/gotoconnect - Call events
  • 21.2 POST /webhooks/livekit - Room events
  • 21.3 POST /webhooks/deepgram - Transcription events
  • 21.4 Webhook signature validation

PART 4: GoToConnect Integration

Estimated: 20-25 pages

22. GoToConnect Account Setup

  • 22.1 Required Account Type
  • 22.2 API Credentials Location
  • 22.3 Webhook Configuration Steps
  • 22.4 Phone Number Provisioning

23. GoToConnect Authentication

  • 23.1 OAuth 2.0 Flow (step-by-step)
  • 23.2 Token Storage
  • 23.3 Token Refresh Logic
  • 23.4 Error Handling

24. Webhook Events from GoToConnect

  • 24.1 call.ringing - Inbound call arriving
  • 24.2 call.answered - Call connected
  • 24.3 call.ended - Call terminated
  • 24.4 Event Payload Schemas
  • 24.5 Event Processing Logic

25. GoToConnect API Calls

  • 25.1 Answer Call
  • 25.2 Transfer Call
  • 25.3 Hold/Resume
  • 25.4 Hangup
  • 25.5 Get Call Status
  • 25.6 List Lines/Extensions

26. Phone Number Management

  • 26.1 Search Available Numbers
  • 26.2 Provision Number
  • 26.3 Configure Number Routing
  • 26.4 Release Number

27. Ooma WebRTC Softphone Integration

  • 27.1 What is Ooma Softphone
  • 27.2 Why We Need It
  • 27.3 Auto-Answer Configuration
  • 27.4 Audio Stream Access

PART 5: WebRTC Bridge Service

Estimated: 20-25 pages

28. Bridge Architecture

  • 28.1 Purpose of the Bridge
  • 28.2 Component Diagram
  • 28.3 Threading Model
  • 28.4 State Machine

29. Browser Automation Layer

  • 29.1 Puppeteer/Playwright Setup
  • 29.2 Ooma Login Automation
  • 29.3 Session Management
  • 29.4 Health Monitoring
  • 29.5 Crash Recovery

30. Audio Capture

  • 30.1 Capturing Browser Audio
  • 30.2 Audio Format (sample rate, channels, encoding)
  • 30.3 Buffer Management
  • 30.4 Latency Considerations

31. LiveKit Connection

  • 31.1 Creating LiveKit Room
  • 31.2 Publishing Audio Track
  • 31.3 Subscribing to Agent Audio
  • 31.4 Track Management

32. Audio Routing

  • 32.1 Caller → Agent Flow
  • 32.2 Agent → Caller Flow
  • 32.3 Mixing (if needed)
  • 32.4 Volume Normalization

33. Bridge Lifecycle

  • 33.1 Initialization Sequence
  • 33.2 Call Setup Sequence
  • 33.3 Active Call Management
  • 33.4 Call Teardown Sequence
  • 33.5 Error Recovery

PART 6: LiveKit Integration

Estimated: 20-25 pages

34. LiveKit Cloud Setup

  • 34.1 Account Creation
  • 34.2 Project Configuration
  • 34.3 API Credentials
  • 34.4 Webhook Configuration

35. Room Management

  • 35.1 Room Naming Convention
  • 35.2 Room Creation Logic
  • 35.3 Room Configuration Options
  • 35.4 Room Deletion/Cleanup

36. Participant Management

  • 36.1 Participant Types (caller, agent, supervisor)
  • 36.2 Participant Identity Format
  • 36.3 Permissions by Role
  • 36.4 Participant Lifecycle

37. Token Generation

  • 37.1 JWT Structure
  • 37.2 Claims & Grants
  • 37.3 Token Service Implementation
  • 37.4 Token Refresh Strategy

38. Audio Track Handling

  • 38.1 Track Publication
  • 38.2 Track Subscription
  • 38.3 Track Quality Settings
  • 38.4 Mute/Unmute

39. LiveKit Webhooks

  • 39.1 Room Started
  • 39.2 Room Finished
  • 39.3 Participant Joined
  • 39.4 Participant Left
  • 39.5 Track Published/Unpublished

40. Recording with Egress

  • 40.1 Egress Types
  • 40.2 Starting Recording
  • 40.3 Stopping Recording
  • 40.4 Storage Configuration
  • 40.5 Recording Retrieval

PART 7: Voice AI Pipeline

Estimated: 25-30 pages

41. Pipeline Architecture

  • 41.1 Component Diagram
  • 41.2 Data Flow (audio in → text → response → audio out)
  • 41.3 Latency Budget Breakdown
  • 41.4 Error Handling Strategy

42. Deepgram STT Integration

  • 42.1 Account Setup
  • 42.2 WebSocket Connection
  • 42.3 Audio Streaming Format
  • 42.4 Transcription Options (model, language, punctuation)
  • 42.5 Handling Interim Results
  • 42.6 Handling Final Results
  • 42.7 Error Recovery

43. Voice Activity Detection (VAD)

  • 43.1 What VAD Does
  • 43.2 Silero VAD Setup
  • 43.3 Configuration Parameters
  • 43.4 Speech Start Detection
  • 43.5 Speech End Detection
  • 43.6 Barge-In Handling

44. Claude LLM Integration

  • 44.1 API Setup
  • 44.2 System Prompt Design
  • 44.3 Conversation History Management
  • 44.4 Streaming Responses
  • 44.5 Function Calling (tools)
  • 44.6 Token Management
  • 44.7 Error Handling

45. Knowledge Base Retrieval (RAG)

  • 45.1 When to Query KB
  • 45.2 Query Construction
  • 45.3 Embedding Generation
  • 45.4 Vector Search
  • 45.5 Context Injection into Prompt
  • 45.6 Citation Handling

46. Chatterbox TTS Integration

  • 46.1 RunPod Setup
  • 46.2 API Endpoint Configuration
  • 46.3 Voice Selection
  • 46.4 Text Preprocessing
  • 46.5 Audio Generation
  • 46.6 Streaming Audio Output
  • 46.7 Error Handling

47. Pipeline Orchestration

  • 47.1 Turn-Taking Logic
  • 47.2 Interruption Handling
  • 47.3 Silence Handling
  • 47.4 Timeout Handling
  • 47.5 Graceful Degradation

PART 8: Agent Service

Estimated: 20-25 pages

48. LiveKit Agents Framework

  • 48.1 What is LiveKit Agents
  • 48.2 Agent Architecture
  • 48.3 Worker Setup
  • 48.4 Agent Dispatch

49. Agent Lifecycle

  • 49.1 Agent Pool Management
  • 49.2 Agent Assignment
  • 49.3 Agent State Machine
  • 49.4 Agent Cleanup

50. Conversation State

  • 50.1 State Structure
  • 50.2 State Persistence
  • 50.3 State Transitions
  • 50.4 State Recovery

51. Intent Handling

  • 51.1 Intent Detection Approach
  • 51.2 Common Intents
  • 51.3 Intent → Action Mapping
  • 51.4 Fallback Handling

52. Call Actions

  • 52.1 Transfer to Human
  • 52.2 Transfer to Another AI
  • 52.3 Place on Hold
  • 52.4 Schedule Callback
  • 52.5 End Call

53. Multi-Tenant Agent Configuration

  • 53.1 Loading Tenant Config
  • 53.2 Personality Injection
  • 53.3 Voice Selection
  • 53.4 Knowledge Base Binding

PART 9: Multi-Tenancy & Security

Estimated: 15-20 pages

54. Multi-Tenant Architecture

  • 54.1 Tenant Isolation Model
  • 54.2 Data Segregation
  • 54.3 Resource Quotas
  • 54.4 Tenant Context Propagation

55. Authentication

  • 55.1 JWT Implementation Details
  • 55.2 Token Claims
  • 55.3 Token Validation
  • 55.4 Session Management

56. Authorization

  • 56.1 Role Definitions
  • 56.2 Permission Matrix
  • 56.3 RBAC Implementation
  • 56.4 Resource-Level Permissions

57. Data Security

  • 57.1 Encryption at Rest
  • 57.2 Encryption in Transit
  • 57.3 PII Handling
  • 57.4 Data Retention Policies

58. API Security

  • 58.1 Rate Limiting Implementation
  • 58.2 Input Validation
  • 58.3 SQL Injection Prevention
  • 58.4 CORS Configuration

59. Audit Logging

  • 59.1 What to Log
  • 59.2 Log Format
  • 59.3 Log Storage
  • 59.4 Log Retention

PART 10: Deployment & Operations

Estimated: 20-25 pages

60. Infrastructure Setup

  • 60.1 DigitalOcean Configuration
  • 60.2 Dokploy Setup
  • 60.3 Network Architecture
  • 60.4 SSL/TLS Configuration

61. Container Configuration

  • 61.1 Dockerfile for Each Service
  • 61.2 Docker Compose (local dev)
  • 61.3 Resource Limits
  • 61.4 Health Checks

62. Environment Configuration

  • 62.1 Environment Variables Reference
  • 62.2 Secrets Management
  • 62.3 Configuration by Environment

63. CI/CD Pipeline

  • 63.1 GitHub Actions Setup
  • 63.2 Build Process
  • 63.3 Test Process
  • 63.4 Deploy Process

64. Monitoring

  • 64.1 Health Check Endpoints
  • 64.2 Metrics Collection
  • 64.3 Log Aggregation
  • 64.4 Alerting Rules

65. Scaling

  • 65.1 Horizontal Scaling Strategy
  • 65.2 Auto-Scaling Configuration
  • 65.3 Load Balancing
  • 65.4 Database Scaling

66. Disaster Recovery

  • 66.1 Backup Strategy
  • 66.2 Recovery Procedures
  • 66.3 Failover Configuration

67. Runbooks

  • 67.1 Common Issues & Fixes
  • 67.2 Escalation Procedures
  • 67.3 Incident Response

68. Cost Management

  • 68.1 Cost Breakdown by Component
  • 68.2 Cost Monitoring
  • 68.3 Optimization Strategies

Summary: 10 Parts

| Part | Sections | Focus Area | Est. Pages |
|------|----------|------------|------------|
| 1 | 1-4 | Foundation & Context | 15-20 |
| 2 | 5-12 | Database Design | 20-25 |
| 3 | 13-21 | API Design | 25-30 |
| 4 | 22-27 | GoToConnect Integration | 20-25 |
| 5 | 28-33 | WebRTC Bridge Service | 20-25 |
| 6 | 34-40 | LiveKit Integration | 20-25 |
| 7 | 41-47 | Voice AI Pipeline | 25-30 |
| 8 | 48-53 | Agent Service | 20-25 |
| 9 | 54-59 | Multi-Tenancy & Security | 15-20 |
| 10 | 60-68 | Deployment & Operations | 20-25 |
Total: 68 sections across 10 parts, ~200-250 pages

Part 1: Foundation & Context

Document Version: 1.0
Last Updated: January 25, 2026
Part: 1 of 10
Sections: 1-4
Audience: Junior developers with no prior context

Section 1: Project Overview

1.1 What We’re Building (Plain English)

Voice by aiConnected is a white-label Voice AI contact center platform. Let’s break down what each of those words means:

White-label: The platform is designed to be rebranded. When Agency X uses our platform to serve their client (a dental office), the dental office never sees “aiConnected” anywhere. They see Agency X’s branding. We’re invisible. We’re the infrastructure behind the scenes.

Voice AI: The core product is an artificial intelligence that talks on the phone. Real phone calls. A human calls a phone number, and an AI answers. The AI can:
  • Understand what the human is saying (speech-to-text)
  • Figure out what they want (intent recognition)
  • Look up information to answer questions (knowledge base)
  • Generate appropriate responses (large language model)
  • Speak the response out loud (text-to-speech)
  • Take actions like transferring to a human, scheduling callbacks, etc.
Contact center: This is industry terminology for “call center” - a centralized system that handles phone communications for a business. Our platform replaces or augments human call center agents with AI agents.

Platform: This isn’t a single application. It’s a complete system with multiple services, databases, integrations, and user interfaces that work together.

The Product in One Sentence

Voice by aiConnected lets marketing agencies offer their clients AI-powered phone systems that answer calls, help customers, and sound completely natural - all under the agency’s own brand.

A Concrete Example

  1. Oxford Pierpont (an agency) signs up for Voice by aiConnected
  2. Oxford Pierpont onboards their client, Smile Dental (a dental office)
  3. We provision a phone number: (555) 123-4567
  4. Smile Dental advertises this number for appointments
  5. Sarah (a patient) calls (555) 123-4567
  6. Our AI answers: “Thank you for calling Smile Dental, this is Dr. Smith’s office. How can I help you today?”
  7. Sarah says: “I need to schedule a teeth cleaning”
  8. The AI accesses Smile Dental’s scheduling information and helps Sarah book an appointment
  9. The entire call is recorded and transcribed
  10. Oxford Pierpont can see analytics across all their clients
  11. Smile Dental can see their own call history and transcripts
  12. Sarah never knows she talked to an AI - it sounded that natural

What Makes This Different

| Traditional IVR | Competitors | Voice by aiConnected |
|-----------------|-------------|----------------------|
| “Press 1 for sales, press 2 for support…” | AI voice, but single-tenant | AI voice with full multi-tenant architecture |
| Frustrating, limited | No white-label option | Complete white-label - agencies can resell |
| No intelligence | Expensive ($0.15+/min) | Cost-effective (~$0.05-0.08/min to customer) |
| Can’t understand natural speech | Limited customization | Per-tenant knowledge bases, voices, personalities |

1.2 Why We’re Building It (Business Problem)

The Pain Points We’re Solving

For Businesses (End Customers like Smile Dental):
  1. Phone calls go unanswered. Small businesses miss 40-60% of calls because staff are busy with in-person customers. Each missed call is a missed opportunity - potentially $500+ in lost revenue for a dental office.
  2. Hiring is expensive and unreliable. A receptionist costs $35,000-50,000/year plus benefits. They call in sick. They quit. They need training. They can only work 8 hours a day.
  3. After-hours coverage is nearly impossible. Answering services cost $1-3 per call and often provide poor experiences. The business loses customers who call at 7 PM.
  4. Consistency is a challenge. Human staff have good days and bad days. The customer experience varies wildly.
For Agencies (Our Direct Customers like Oxford Pierpont):
  1. Agencies want to offer AI solutions but can’t build them. They see the opportunity but lack technical expertise.
  2. Existing solutions don’t allow reselling. Most AI voice products are direct-to-business. Agencies can’t white-label them.
  3. Agencies need recurring revenue. One-time website builds are feast-or-famine. Voice AI is a monthly subscription model.

Market Timing

Why build this now? Because all the pieces finally exist:
  1. LLMs are good enough. Claude, GPT-4, and others can now hold genuinely helpful conversations. Two years ago, they couldn’t.
  2. Speech technology has matured. Deepgram’s Nova-2 model achieves >95% accuracy. Text-to-speech voices (like Chatterbox) are nearly indistinguishable from humans.
  3. Real-time infrastructure exists. LiveKit provides sub-100ms audio routing. WebRTC is battle-tested.
  5. Costs have plummeted. What would have cost $1/minute in 2022 now costs ~$0.025/minute.
  5. Businesses are actively seeking automation. Post-pandemic labor shortages have made every business owner aware of the need to automate.

1.3 Who It’s For (Target Users)

Primary Users: Agencies

Profile:
  • Marketing agencies with 10-100 clients
  • Digital agencies expanding into AI services
  • Call center operators looking to add AI options
  • Managed service providers (MSPs)
What they need:
  • Zero technical expertise required to deploy
  • Ability to brand as their own
  • Management dashboard for all their clients
  • Competitive pricing to mark up and profit
Example Agency Persona - “Digital Dave”:
  • Runs a 5-person marketing agency
  • Has 30 small business clients
  • Offers websites, SEO, social media
  • Wants to add “AI services” to his offerings
  • Needs to be able to set up a new client in under an hour
  • Wants to charge clients $300-500/month for the service

Secondary Users: Tenants (Agency’s Clients)

Profile:
  • Small to medium businesses
  • Service-based businesses (dental, legal, HVAC, etc.)
  • High call volume but can’t staff phones adequately
  • Value customer experience
What they need:
  • Calls answered professionally 24/7
  • Accurate information about their business
  • Easy access to call recordings and transcripts
  • Simple setup (they’re not technical)
Example Tenant Persona - “Dr. Sarah’s Dental Office”:
  • 3-dentist practice
  • Receives 50-100 calls/day
  • Front desk staff overwhelmed
  • Misses 30% of calls
  • Loses an estimated $10,000/month in missed appointments
  • Willing to pay $400/month to never miss a call again

Tertiary Users: Platform Admins (Us - aiConnected)

What we need:
  • Visibility into all agencies and tenants
  • Ability to manage billing
  • System health monitoring
  • Support access when agencies need help

1.4 Success Looks Like (Measurable Outcomes)

Technical Success Metrics

| Metric | Target | How We’ll Measure |
|--------|--------|-------------------|
| Call Answer Rate | 99.9% | Calls answered / calls received |
| First Response Latency | <2 seconds | Time from call connect to AI speaking |
| Response Latency | <1000ms | Time from human stops speaking to AI starts |
| Speech Recognition Accuracy | >95% | Deepgram reported confidence scores |
| Call Completion Rate | >85% | Calls that end normally vs. dropped/failed |
| System Uptime | 99.9% | Total uptime / total time |
| Concurrent Call Capacity | 100/tenant | Load tested maximum |

Business Success Metrics (Year 1)

| Metric | Target | Notes |
|--------|--------|-------|
| Agency Partners | 25 | Paying agencies |
| Total Tenants | 250 | Across all agencies |
| Monthly Call Minutes | 500,000 | Billable minutes |
| Monthly Recurring Revenue | $50,000 | From agency subscriptions |
| Gross Margin | >60% | Revenue minus direct costs |
| Net Promoter Score | >40 | Customer satisfaction |
| Churn Rate | <5%/month | Agencies leaving |

What “Done” Looks Like for MVP

The MVP is complete when:
  1. ✅ An agency can sign up and create their account
  2. ✅ The agency can create a tenant (their client)
  3. ✅ A phone number can be provisioned for the tenant
  4. ✅ The tenant can upload documents to create a knowledge base
  5. ✅ Inbound calls to that number are answered by AI
  6. ✅ The AI can answer questions using the knowledge base
  7. ✅ The AI can transfer calls to a human number
  8. ✅ All calls are recorded and transcribed
  9. ✅ The agency can view calls across all tenants
  10. ✅ The tenant can view their own calls
  11. ✅ Response latency is consistently under 1 second
  12. ✅ The system handles 10 concurrent calls without degradation

Section 2: Glossary of Terms

This glossary exists so you never have to Google a term. Every technical word used in this document is defined here. Read this section once, then use it as a reference.

2.1 Telephony Terms

PSTN (Public Switched Telephone Network)

The traditional phone system. When you pick up a landline or make a cell phone call, you’re using PSTN. It’s the global network of telephone lines, fiber optic cables, switching centers, and cellular networks that allow any phone to call any other phone. Why it matters: Our AI needs to receive calls from PSTN. Regular people dial regular phone numbers. We need to bridge PSTN to our internet-based AI system.

VoIP (Voice over Internet Protocol)

Phone calls transmitted over the internet instead of traditional phone lines. Skype, Zoom, and WhatsApp calls are VoIP. The audio is converted to data packets and sent over the internet. Why it matters: Once we receive a call from PSTN, we convert it to VoIP to route through our system.

SIP (Session Initiation Protocol)

A signaling protocol for starting, maintaining, and ending VoIP calls. SIP handles the “who’s calling whom” and “call is ending” messages - but not the actual audio. Why it matters: GoToConnect and many telephony systems use SIP. Understanding SIP helps debug call connection issues.

WebRTC (Web Real-Time Communication)

A technology that enables real-time audio/video communication directly in web browsers. Unlike SIP, WebRTC is designed for the modern web and handles both signaling and media. Why it matters: Our WebRTC bridge converts between the telephony world (SIP/PSTN) and the AI world (LiveKit). WebRTC is how audio gets from the phone call into our processing pipeline.

DTMF (Dual-Tone Multi-Frequency)

The tones generated when you press buttons on a phone keypad. Each button produces a unique combination of two frequencies. “Press 1 for sales” systems use DTMF. Why it matters: Some callers may try to press buttons to navigate. Our system needs to detect and handle DTMF input appropriately.

IVR (Interactive Voice Response)

Those automated phone systems that say “Press 1 for sales, press 2 for support.” Traditional IVRs are frustrating and limited because they can’t understand natural speech. Why it matters: We’re replacing IVR with conversational AI. Understanding IVR helps explain our value proposition.

PBX (Private Branch Exchange)

A private telephone network within an organization. Think of the phone system inside a corporate office where everyone has extensions. Why it matters: GoToConnect provides cloud PBX functionality. We integrate with their system.

Trunk / SIP Trunk

A connection between phone systems. A SIP trunk is a virtual connection that allows VoIP calls to flow between two systems over the internet. Why it matters: Telephony providers charge based on trunks and concurrent call capacity.

DID (Direct Inward Dialing)

A phone number that routes directly to a specific endpoint without requiring the caller to dial an extension. When you call a business’s main number, that’s a DID. Why it matters: Each tenant gets one or more DIDs. These are the phone numbers customers actually call.

ANI (Automatic Number Identification)

The caller’s phone number, transmitted with the call. This is how caller ID works. Why it matters: We capture ANI to identify repeat callers and log call metadata.

CDR (Call Detail Record)

A record of a phone call containing metadata: who called, who answered, when, how long, etc. Every call generates a CDR. Why it matters: CDRs are essential for billing, analytics, and compliance.

E.164

The international standard format for phone numbers: +[country code][number]. Example: +15551234567 for a US number. Why it matters: We store all phone numbers in E.164 format for consistency. Always convert to E.164 before storing or comparing.
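As a sketch of that normalization rule (US-style numbers only; a production implementation would use a dedicated library such as `phonenumbers` to handle international formats - the function name here is illustrative, not part of any spec):

```python
import re

def to_e164(raw: str, default_country_code: str = "1") -> str:
    """Normalize a dialable US-style number to E.164, e.g. +15551234567.

    Illustrative sketch only: strips formatting characters, prepends the
    country code for bare 10-digit national numbers.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parens, '+'
    if len(digits) == 10:            # bare national number
        digits = default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    raise ValueError(f"cannot normalize {raw!r} to E.164")

print(to_e164("(555) 123-4567"))  # → +15551234567
```

Normalizing once at the storage boundary means every later comparison (caller lookup, DID routing) is a simple string equality check.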

2.2 WebRTC Terms

ICE (Interactive Connectivity Establishment)

A framework for establishing peer-to-peer connections through NATs and firewalls. ICE tries multiple connection methods and picks the best one that works. Why it matters: WebRTC connections can be tricky because of network configurations. ICE handles the complexity of actually connecting two endpoints.

STUN (Session Traversal Utilities for NAT)

A protocol that helps a client discover its public IP address and what type of NAT (network address translation) is between it and the public internet. Why it matters: STUN servers help establish direct connections when possible.

TURN (Traversal Using Relays around NAT)

A protocol that relays traffic through an intermediary server when direct connections aren’t possible. It’s a fallback when STUN fails. Why it matters: TURN servers cost money (bandwidth) but ensure connections work in restrictive network environments.

SDP (Session Description Protocol)

A format for describing multimedia communication sessions. When two WebRTC endpoints connect, they exchange SDP messages describing what codecs they support, what media they want to send/receive, etc. Why it matters: SDP is how WebRTC endpoints negotiate connection parameters.

Peer-to-Peer (P2P)

Direct communication between two endpoints without an intermediary server. WebRTC prefers P2P for lowest latency. Why it matters: P2P is ideal but not always possible. We use LiveKit as an SFU when P2P isn’t feasible.

SFU (Selective Forwarding Unit)

A server that receives media streams from multiple participants and selectively forwards them to other participants. Unlike MCU (mixing), SFU just routes streams without processing them. Why it matters: LiveKit is an SFU. It receives audio from the caller and forwards it to the AI, and vice versa.

Media Track

A single stream of audio or video. An audio track carries sound; a video track carries images. WebRTC connections can have multiple tracks. Why it matters: We work exclusively with audio tracks. The caller publishes an audio track; the AI publishes an audio track.

Codec

An algorithm that encodes and decodes audio or video. Different codecs have different trade-offs between quality, latency, and bandwidth. Why it matters: We use the Opus codec for audio because it’s designed for real-time voice communication with low latency.

Opus

An audio codec specifically designed for interactive real-time applications. It handles everything from low-bandwidth voice to high-quality music. It’s the default codec for WebRTC audio. Why it matters: All our audio is encoded with Opus. Sample rate is typically 48kHz with 20ms frames.

Sample Rate

How many audio samples are captured per second. 48000 Hz (48 kHz) means 48,000 samples per second. Higher sample rates = better quality but more data. Why it matters: Different components expect different sample rates. We standardize on 48kHz for LiveKit but may need 16kHz for some STT services.

Frame

A chunk of audio samples. Audio is processed in frames, not individual samples. A 20ms frame at 48kHz contains 960 samples. Why it matters: Audio processing is frame-based. Understanding frame size helps with buffer management and latency calculations.
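The frame arithmetic above can be checked directly; the same calculation drives buffer sizing throughout the bridge:

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000

# 20 ms at 48 kHz → 960 samples, as stated above
assert samples_per_frame(48_000, 20) == 960

# The same frame duration at a 16 kHz STT input rate is only 320 samples
assert samples_per_frame(16_000, 20) == 320

# Byte size of one frame of 16-bit (2 bytes/sample) mono PCM at 48 kHz
bytes_per_frame = samples_per_frame(48_000, 20) * 2
print(bytes_per_frame)  # 1920
```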

2.3 AI/ML Terms

LLM (Large Language Model)

An AI model trained on massive amounts of text that can understand and generate human-like text. Examples: Claude (Anthropic), GPT-4 (OpenAI), Llama (Meta). Why it matters: The LLM is the “brain” of our AI agent. It understands what the caller wants and generates appropriate responses.

STT (Speech-to-Text)

The process of converting spoken audio into written text. Also called ASR (Automatic Speech Recognition). Why it matters: We must convert the caller’s speech to text before the LLM can process it. Deepgram Nova-2 is our STT provider.

TTS (Text-to-Speech)

The process of converting written text into spoken audio. Also called speech synthesis. Why it matters: After the LLM generates a text response, we must convert it to audio for the caller to hear. Chatterbox is our TTS provider.

VAD (Voice Activity Detection)

Detecting when someone is speaking versus when there’s silence or background noise. Why it matters: VAD tells us when the caller starts and stops speaking. This is critical for turn-taking in conversation.

Barge-In

When a caller interrupts the AI while it’s speaking. The AI should stop talking and listen. Why it matters: Natural conversations include interruptions. Our AI must handle barge-in gracefully.

Turn-Taking

The conversational pattern of one party speaking, then the other, back and forth. Humans do this naturally; AI must be programmed to do it. Why it matters: Poor turn-taking makes conversations awkward. The AI shouldn’t talk over the caller or leave long silences.
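A minimal sketch of turn-taking as a two-state machine driven by VAD events; the `TurnTaker` class and event names here are hypothetical, and the real orchestration (Section 47) must also handle timeouts and cancelling in-flight TTS on barge-in:

```python
class TurnTaker:
    """Toy turn-taking state machine: 'listening' vs 'responding'."""

    def __init__(self) -> None:
        self.state = "listening"

    def on_event(self, event: str) -> str:
        if self.state == "listening" and event == "caller_stopped_speaking":
            self.state = "responding"   # the AI's turn to talk
        elif self.state == "responding" and event == "caller_started_speaking":
            self.state = "listening"    # barge-in: stop TTS and listen
        elif self.state == "responding" and event == "ai_finished_speaking":
            self.state = "listening"    # normal hand-back of the turn
        return self.state

tt = TurnTaker()
assert tt.on_event("caller_stopped_speaking") == "responding"
assert tt.on_event("caller_started_speaking") == "listening"  # barge-in
```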

Latency

The delay between cause and effect. In our context: the time between when the caller stops speaking and when the AI starts responding. Why it matters: High latency feels unnatural. We target <1000ms total latency.
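One way to reason about the <1000ms target is as a per-stage budget. The stage figures below are hypothetical placeholders - the real breakdown belongs in Section 41.3 and must be measured, not assumed:

```python
# Hypothetical per-stage latency budget (milliseconds) for one turn:
# caller stops speaking → AI audio reaches the caller.
latency_budget_ms = {
    "vad_end_of_speech": 200,     # silence needed to decide the caller stopped
    "stt_final_result": 150,      # Deepgram finalizing the transcript
    "llm_first_token": 350,       # Claude starting its streamed response
    "tts_first_audio": 200,       # Chatterbox producing the first audio chunk
    "network_and_buffering": 100, # transport + jitter buffers
}

total = sum(latency_budget_ms.values())
assert total <= 1000, f"budget exceeds 1s target: {total}ms"
```

Framing latency this way makes regressions attributable: when the total creeps past 1000ms, per-stage metrics show which component blew its allocation.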

Streaming

Processing data as it arrives rather than waiting for all of it. Streaming STT transcribes words as they’re spoken; streaming TTS generates audio as text is produced. Why it matters: Streaming is essential for low latency. We can’t wait for the caller to finish a complete sentence before starting to process.

Embeddings

Numerical representations of text that capture semantic meaning. Similar texts have similar embeddings. Why it matters: We use embeddings to search the knowledge base. When a caller asks a question, we embed the question and find knowledge chunks with similar embeddings.
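“Similar texts have similar embeddings” is usually measured with cosine similarity. A toy example with made-up 3-dimensional vectors (real embedding models emit hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Fabricated toy vectors purely to illustrate the comparison
teeth_cleaning = [0.9, 0.1, 0.0]
dental_hygiene = [0.8, 0.2, 0.1]
billing_policy = [0.0, 0.1, 0.9]

# Semantically related texts score higher than unrelated ones
assert cosine_similarity(teeth_cleaning, dental_hygiene) > \
       cosine_similarity(teeth_cleaning, billing_policy)
```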

Vector Database

A database optimized for storing and searching embeddings. Regular databases search by exact match; vector databases search by similarity. Why it matters: Knowledge base search uses vector similarity. We store document embeddings and query by similarity.

RAG (Retrieval-Augmented Generation)

A technique where the LLM is given relevant information retrieved from a knowledge base before generating a response. This grounds the AI’s responses in actual facts. Why it matters: RAG is how our AI answers questions about a specific business. We retrieve relevant knowledge and inject it into the LLM prompt.
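The “inject it into the LLM prompt” step can be sketched as a small helper. The function name and prompt wording are illustrative only; retrieval itself (embedding the question and running vector search) is specified in Sections 45.3-45.4:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Place retrieved knowledge ahead of the caller's question so the
    LLM grounds its answer in tenant-specific facts."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using ONLY the business information below. "
        "If the answer is not present, say you will transfer the caller.\n\n"
        f"Business information:\n{context}\n\n"
        f"Caller question: {question}"
    )

prompt = build_rag_prompt(
    "What are your Saturday hours?",
    ["Smile Dental is open Saturdays 9am-1pm."],
)
```

The explicit “ONLY the business information below” instruction is one of the prompting guards against hallucination discussed later in this glossary.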

Prompt

The input given to an LLM. This includes system instructions, context, and the user’s message. Why it matters: Prompt design significantly affects AI quality. We carefully craft prompts to make the AI behave appropriately for each tenant.

System Prompt

Instructions given to the LLM that set its behavior, personality, and constraints. The system prompt is typically hidden from the end user. Why it matters: Each tenant has a customized system prompt that defines their AI’s personality and knowledge.

Context Window

The maximum amount of text an LLM can process at once. Measured in tokens. Claude Sonnet has a 200K token context window. Why it matters: Conversation history must fit in the context window. Long calls may require summarization.

Token

A unit of text processing for LLMs. Roughly 4 characters or 0.75 words in English. LLMs charge by token and have token limits. Why it matters: Token usage affects cost and context limits. We track tokens for billing and to avoid exceeding limits.
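The 4-characters-per-token rule of thumb can be turned into a quick estimator. This heuristic is illustrative only - exact counts require the model's real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token in English.
    return max(1, round(len(text) / 4))

greeting = "Thank you for calling Smile Dental, this is Dr. Smith's office."
tokens = estimate_tokens(greeting)
print(tokens)  # 16
```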

Function Calling / Tool Use

The ability of an LLM to request execution of external functions. The AI says “I need to check the calendar” and we execute that function and return results. Why it matters: Function calling lets our AI take actions - transfer calls, look up information, schedule appointments, etc.
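As a concrete sketch, a call-transfer tool might be declared like this. The shape follows Anthropic's tool-use format (name, description, JSON Schema input_schema), but the tool itself is hypothetical - confirm the exact schema against current Anthropic documentation:

```python
# Hypothetical tool definition in the shape Anthropic's tool-use API expects.
transfer_call_tool = {
    "name": "transfer_call",
    "description": "Transfer the active call to a human staff member.",
    "input_schema": {
        "type": "object",
        "properties": {
            "department": {
                "type": "string",
                "description": "Which department to transfer to, e.g. 'front desk'.",
            },
            "reason": {
                "type": "string",
                "description": "Why the caller is being transferred.",
            },
        },
        "required": ["department"],
    },
}

# When the model requests this tool, our agent executes the real transfer
# and returns the result to the model as a tool_result message.
```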

Hallucination

When an LLM generates plausible-sounding but false information. The AI confidently states something that isn’t true. Why it matters: Hallucinations are dangerous in business contexts. RAG and careful prompting reduce but don’t eliminate hallucinations.

2.4 Platform Terms

Agency

In our platform, an agency is a business partner who resells Voice by aiConnected to their clients. The agency is our direct customer. Example: Oxford Pierpont is an agency with 30 client tenants.

Tenant

An end-customer business that uses the platform through an agency. The tenant is the agency’s customer. Example: Smile Dental is a tenant under Oxford Pierpont.

Platform Admin

An aiConnected employee who manages the overall platform. Can see all agencies and tenants.

Agency Admin

A user who manages an agency account. Can create/manage tenants, view agency-wide analytics, etc.

Tenant Admin

A user who manages a single tenant account. Can configure their knowledge base, view their call history, etc.

Knowledge Base

A collection of information about a tenant’s business that the AI uses to answer questions. Can include documents, FAQs, and structured data. Example: Smile Dental’s knowledge base includes their service list, pricing, hours, and policies.

Voice Configuration

Settings that define how the AI sounds and behaves for a tenant. Includes voice selection, speaking rate, personality traits.

Personality

The behavioral characteristics of the AI agent - formal vs casual, concise vs verbose, etc.

2.5 Infrastructure Terms

Container

A lightweight, standalone package that includes everything needed to run a piece of software. Containers are consistent across development and production. Why it matters: We deploy our services as Docker containers. This ensures consistency across environments.

Docker

The most popular containerization platform. We write Dockerfiles that define how to build containers.

Kubernetes / K8s

A system for orchestrating containers at scale - handling deployment, scaling, and management. We use Dokploy (which uses Docker Swarm) instead of Kubernetes for simplicity.

Dokploy

An open-source platform for deploying Docker applications. Simpler than Kubernetes. This is our deployment platform on DigitalOcean.

Webhook

An HTTP callback - a way for one service to notify another when something happens. Instead of polling “did anything happen?”, the service pushes notifications. Why it matters: GoToConnect sends webhooks when calls arrive. LiveKit sends webhooks when participants join/leave. Our system is event-driven via webhooks.

WebSocket

A protocol for persistent, bidirectional communication between client and server. Unlike HTTP (request/response), WebSocket connections stay open for real-time data flow. Why it matters: Deepgram STT uses WebSocket for streaming audio in and transcriptions out. Real-time communication requires WebSocket.

REST API

A standard way to build web APIs using HTTP methods (GET, POST, PUT, DELETE) and JSON data. Why it matters: Our management APIs are REST. Agencies and tenants interact with the platform via the REST API (and a UI built on it).

JWT (JSON Web Token)

A compact, self-contained token for securely transmitting information. Used for authentication - proving who a user is. Why it matters: Our authentication system uses JWT. Users log in and receive a token that proves their identity.
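To make the three-part structure concrete, here is a toy HS256 JWT built by hand. This is illustration only - production code should use a vetted library (e.g. PyJWT), and the secret and claims here are placeholders:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 with padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

secret = b"demo-secret"  # placeholder; real secrets come from environment variables
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user-123", "role": "tenant_admin"}

# A JWT is header.payload.signature, each part base64url-encoded.
signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
token = f"{signing_input}.{signature}"

print(token.count("."))  # 2 - three dot-separated parts
```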

UUID (Universally Unique Identifier)

A 128-bit identifier that’s practically guaranteed to be unique. Example: 550e8400-e29b-41d4-a716-446655440000 Why it matters: We use UUIDs as primary keys for most database records. They’re generated client-side without coordination.
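Python's standard library generates version 4 (random) UUIDs with no central coordination:

```python
import uuid

# Each call produces a fresh, practically-unique identifier.
call_id = uuid.uuid4()
print(len(str(call_id)))  # 36 characters: 32 hex digits plus 4 hyphens
```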

Environment Variable

A configuration value set outside the code. Allows the same code to run differently in development vs production. Why it matters: API keys, database URLs, and feature flags are environment variables. Never hardcode secrets.

Redis

An in-memory data store used for caching, session storage, and pub/sub messaging. Very fast because data is in RAM. Why it matters: We use Redis for real-time state (active calls), caching, and as a message queue.

PostgreSQL

A powerful open-source relational database. Our primary data store for all persistent data.

n8n

An open-source workflow automation tool. Think “Zapier but self-hosted.” We use n8n for orchestrating webhooks and automations.

Section 3: Architecture Overview

3.1 System Diagram (With Explanation)

┌─────────────────────────────────────────────────────────────────────────────┐
│                              PSTN NETWORK                                    │
│                    (Traditional Phone System)                                │
│                                                                              │
│    📱 Caller's Phone ─────────────────────────────────────────┐             │
│                                                                │             │
└────────────────────────────────────────────────────────────────│─────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│                           GOTOCONNECT                                        │
│                    (Cloud Telephony Provider)                                │
│                                                                              │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                   │
│    │   Phone     │    │    Call     │    │  Webhook    │                   │
│    │  Numbers    │    │   Control   │    │   Events    │                   │
│    │   (DIDs)    │    │    API      │    │             │                   │
│    └─────────────┘    └─────────────┘    └──────┬──────┘                   │
│                                                  │                          │
└──────────────────────────────────────────────────│──────────────────────────┘

                    ┌──────────────────────────────┼───────────────────┐
                    │                              │                   │
                    ▼                              ▼                   │
┌─────────────────────────────────────┐  ┌─────────────────────────┐  │
│         ORCHESTRATION LAYER         │  │    WEBRTC BRIDGE        │  │
│              (n8n)                   │  │      SERVICE            │  │
│                                      │  │                         │  │
│  ┌────────────────────────────────┐ │  │  ┌───────────────────┐  │  │
│  │  • Receive call webhooks       │ │  │  │ Ooma WebRTC       │  │  │
│  │  • Route to appropriate flow   │ │  │  │ Softphone         │  │  │
│  │  • Trigger call setup          │ │  │  │ (Browser-based)   │  │  │
│  │  • Handle post-call processing │ │  │  └─────────┬─────────┘  │  │
│  └────────────────────────────────┘ │  │            │            │  │
│                                      │  │  ┌─────────▼─────────┐  │  │
└──────────────────────────────────────┘  │  │ Audio Capture &   │  │  │
                                          │  │ Forwarding        │  │  │
                                          │  └─────────┬─────────┘  │  │
                                          │            │            │  │
                                          └────────────│────────────┘  │
                                                       │               │
                                                       ▼               │
┌─────────────────────────────────────────────────────────────────────────────┐
│                            LIVEKIT CLOUD                                     │
│                    (Real-Time Media Infrastructure)                          │
│                                                                              │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                   │
│    │    Room     │    │   Audio     │    │  Recording  │                   │
│    │ Management  │    │   Routing   │    │   (Egress)  │                   │
│    └─────────────┘    └─────────────┘    └─────────────┘                   │
│                              │                                              │
│                              │ Audio Streams                                │
│                              ▼                                              │
└──────────────────────────────│──────────────────────────────────────────────┘



┌─────────────────────────────────────────────────────────────────────────────┐
│                         AI AGENT SERVICE                                     │
│                 (LiveKit Agents Framework + Our Logic)                       │
│                                                                              │
│    ┌─────────────────────────────────────────────────────────────────────┐  │
│    │                        VOICE PIPELINE                                │  │
│    │                                                                      │  │
│    │   ┌─────────┐      ┌─────────┐      ┌─────────┐      ┌─────────┐   │  │
│    │   │ CALLER  │      │  STT    │      │  LLM    │      │  TTS    │   │  │
│    │   │ AUDIO   │─────▶│Deepgram │─────▶│ Claude  │─────▶│Chatter- │   │  │
│    │   │   IN    │      │ Nova-2  │      │ Sonnet  │      │  box    │   │  │
│    │   └─────────┘      └─────────┘      └────┬────┘      └────┬────┘   │  │
│    │                                          │                │        │  │
│    │                                          ▼                │        │  │
│    │                                    ┌──────────┐           │        │  │
│    │                                    │ Knowledge│           │        │  │
│    │                                    │   Base   │           │        │  │
│    │                                    │  (RAG)   │           │        │  │
│    │                                    └──────────┘           │        │  │
│    │                                                           ▼        │  │
│    │   ┌─────────┐                                      ┌─────────┐    │  │
│    │   │ AGENT   │◀─────────────────────────────────────│  AUDIO  │    │  │
│    │   │ AUDIO   │                                      │   OUT   │    │  │
│    │   │   OUT   │                                      │         │    │  │
│    │   └─────────┘                                      └─────────┘    │  │
│    │                                                                      │  │
│    └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

                               │ Writes to

┌─────────────────────────────────────────────────────────────────────────────┐
│                          DATA LAYER                                          │
│                                                                              │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                   │
│    │ PostgreSQL  │    │   Redis     │    │   S3/DO    │                   │
│    │  (Primary   │    │  (Cache,    │    │  Spaces    │                   │
│    │   Data)     │    │   State)    │    │(Recordings)│                   │
│    └─────────────┘    └─────────────┘    └─────────────┘                   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

                               │ Powers

┌─────────────────────────────────────────────────────────────────────────────┐
│                      MANAGEMENT LAYER                                        │
│                                                                              │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                   │
│    │   REST API  │    │  Web UI     │    │  Webhooks   │                   │
│    │  (Backend)  │    │ (Frontend)  │    │   (Out)     │                   │
│    └─────────────┘    └─────────────┘    └─────────────┘                   │
│                                                                              │
│              Used by: Agencies, Tenants, Platform Admins                    │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Component-by-Component Explanation

PSTN Network

The traditional phone network. When Sarah picks up her phone and dials (555) 123-4567, her call travels through PSTN. This is outside our control - it’s the global telephone infrastructure.

GoToConnect

Our telephony provider. They give us:
  • Phone numbers (DIDs) that customers call
  • The ability to answer and control calls programmatically
  • Webhook notifications when calls arrive
  • APIs to transfer, hold, and hangup calls
GoToConnect is the bridge between PSTN and our internet-based system. We chose them because they offer the Ooma WebRTC softphone, which lets us capture audio via browser.

n8n (Orchestration Layer)

An automation platform that receives webhooks and coordinates responses. When GoToConnect sends a “call.ringing” webhook, n8n:
  1. Receives the webhook
  2. Looks up the phone number to find the tenant
  3. Triggers the WebRTC bridge to answer
  4. Initiates LiveKit room creation
  5. Dispatches an AI agent
n8n is the “traffic controller” that coordinates all the services.

WebRTC Bridge Service

This is custom software we build. It:
  1. Runs a headless browser with Ooma’s WebRTC softphone
  2. Auto-answers incoming calls
  3. Captures the audio stream from the browser
  4. Forwards that audio to LiveKit
  5. Receives audio from LiveKit (the AI speaking)
  6. Plays that audio through the browser to the caller
This bridge is necessary because GoToConnect doesn’t give us direct audio access - we have to go through their softphone.

LiveKit Cloud

A real-time communication platform (like a specialized video conferencing backend). LiveKit:
  • Creates “rooms” for each call
  • Routes audio between participants (caller, AI agent, supervisors)
  • Records calls (via Egress)
  • Handles all the WebRTC complexity
We use LiveKit Cloud (managed service) rather than self-hosting to reduce complexity.

AI Agent Service

Built on the LiveKit Agents framework. This is where the magic happens:
  1. Subscribes to the caller’s audio from LiveKit
  2. Streams audio to Deepgram for transcription
  3. Sends transcriptions to Claude for response generation
  4. Streams Claude’s response to Chatterbox for speech synthesis
  5. Publishes synthesized audio back to LiveKit
Voice Pipeline Components:
  • Deepgram Nova-2: Converts caller’s speech to text. Streaming, real-time.
  • Claude Sonnet: Generates intelligent responses. Understands context, follows instructions.
  • Knowledge Base (RAG): Vector database with tenant-specific information. Grounds Claude’s responses in facts.
  • Chatterbox-Turbo: Converts Claude’s text responses to natural-sounding speech. Runs on RunPod GPU.
Data Layer
  • PostgreSQL: All persistent data - users, tenants, calls, transcripts, etc.
  • Redis: Fast, temporary data - active call state, caching, pub/sub messaging
  • S3/DigitalOcean Spaces: Object storage for call recordings (audio files)
Management Layer
  • REST API: Backend service that powers all management operations
  • Web UI: React-based dashboard for agencies and tenants
  • Webhooks (Out): Notify external systems when events occur (call completed, etc.)

3.2 Data Flow Narrative (Step-by-Step What Happens on a Call)

Let’s follow a complete call from start to finish. Sarah is calling Smile Dental.

Phase 1: Call Initiation (0-3 seconds)

T+0.0s: Sarah dials (555) 123-4567
  • Her phone connects to PSTN
  • PSTN routes to GoToConnect (which owns that number)
T+0.5s: GoToConnect receives the call
  • Looks up routing for (555) 123-4567
  • Finds it’s configured to ring the Ooma softphone extension
  • Sends HTTP POST webhook to our n8n endpoint:
{
  "event": "call.ringing",
  "callId": "call-123456",
  "from": "+15559876543",
  "to": "+15551234567",
  "timestamp": "2026-01-25T10:00:00Z"
}
T+0.6s: n8n receives webhook
  • Workflow triggers
  • Looks up phone number +15551234567 in database
  • Finds: tenant_id = “smile-dental”, agency_id = “oxford-pierpont”
  • Loads tenant configuration: voice settings, greeting, personality
  • Creates a call record in PostgreSQL with status = “ringing”
T+0.7s: n8n triggers WebRTC bridge
  • Sends command: “Answer call on line X”
  • Bridge’s browser-based softphone picks up
T+1.0s: Call connects
  • GoToConnect sees the softphone answered
  • Audio path established: Sarah ↔ GoToConnect ↔ Softphone in Browser
  • GoToConnect sends “call.answered” webhook
T+1.2s: n8n creates LiveKit room
  • Room name: “call-smile-dental-call-123456”
  • Generates access tokens for bridge (as “caller”) and agent
T+1.5s: Bridge joins LiveKit room
  • Opens WebSocket connection to LiveKit
  • Starts publishing caller audio as an audio track
  • Subscribes to receive agent audio track
T+1.8s: AI Agent dispatched
  • n8n notifies Agent Service: “Join room call-smile-dental-call-123456”
  • Agent Service assigns an available agent worker
  • Agent loads Smile Dental’s configuration and knowledge base
T+2.0s: Agent joins LiveKit room
  • Subscribes to caller’s audio track
  • Ready to publish agent audio track
  • Initializes STT connection to Deepgram
  • Prepares Claude conversation with system prompt
T+2.5s: Agent speaks greeting
  • Retrieves greeting from tenant config: “Thank you for calling Smile Dental, this is Dr. Smith’s office. How can I help you today?”
  • Sends greeting to Chatterbox TTS
  • Receives audio stream back
  • Publishes to LiveKit
  • Sarah hears the greeting through her phone
Total call setup time: ~2.5 seconds
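The T+0.6s lookup step can be sketched as follows. The in-memory routing table stands in for the real phone_numbers database query, and the function name is hypothetical:

```python
# Hypothetical in-memory stand-in for the phone_numbers table lookup
# that n8n performs when the "call.ringing" webhook arrives.
NUMBER_ROUTING = {
    "+15551234567": {"tenant_id": "smile-dental", "agency_id": "oxford-pierpont"},
}

def route_incoming_call(webhook: dict) -> dict:
    route = NUMBER_ROUTING.get(webhook["to"])
    if route is None:
        raise LookupError(f"No tenant configured for {webhook['to']}")
    # The real workflow would now create a call record with status="ringing",
    # trigger the bridge to answer, and create the LiveKit room.
    return {"call_id": webhook["callId"], "status": "ringing", **route}

call = route_incoming_call({
    "event": "call.ringing",
    "callId": "call-123456",
    "from": "+15559876543",
    "to": "+15551234567",
})
print(call["tenant_id"])  # smile-dental
```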

Phase 2: Conversation (Duration varies)

T+3.0s: Sarah starts speaking
  • “Yeah, hi, I need to schedule a teeth cleaning”
  • Audio flows: Sarah’s phone → PSTN → GoToConnect → Softphone → Bridge → LiveKit → Agent
T+3.0s to T+5.0s: Speech-to-Text processing
  • Agent streams audio to Deepgram via WebSocket
  • Deepgram sends interim results as Sarah speaks:
    • T+3.2s: “Yeah”
    • T+3.5s: “Yeah hi”
    • T+3.8s: “Yeah hi I need to”
    • T+4.2s: “Yeah hi I need to schedule”
    • T+4.8s: “Yeah hi I need to schedule a teeth cleaning”
  • VAD (Voice Activity Detection) detects Sarah stopped speaking at T+5.0s
  • Deepgram sends final transcript: “Yeah, hi, I need to schedule a teeth cleaning.”
T+5.0s: Agent processes transcript
  • Recognizes intent: appointment scheduling
  • Queries knowledge base: “teeth cleaning appointment scheduling”
  • Retrieves relevant chunks:
    • “Teeth cleaning appointments are 45 minutes”
    • “Available Monday-Friday 8am-5pm, Saturday 9am-2pm”
    • “New patient cleaning: $150, Existing patient: $100”
T+5.1s: Agent sends to Claude
  • Constructs prompt with:
    • System prompt (personality, instructions)
    • Knowledge base context (retrieved chunks)
    • Conversation history (just the greeting so far)
    • User message: “Yeah, hi, I need to schedule a teeth cleaning.”
T+5.1s to T+5.8s: Claude generates response
  • Claude processes and generates response (streaming)
  • As tokens stream back, agent buffers them into sentence chunks
  • First sentence ready: “I’d be happy to help you schedule a cleaning!”
T+5.8s: First sentence to TTS
  • Sends “I’d be happy to help you schedule a cleaning!” to Chatterbox
  • Chatterbox generates audio and streams back
T+5.9s: Agent starts speaking
  • Publishes audio to LiveKit
  • Audio flows: Agent → LiveKit → Bridge → Softphone → GoToConnect → PSTN → Sarah’s phone
  • Sarah hears: “I’d be happy to help you schedule a cleaning!”
T+6.2s: Continues with rest of response
  • Meanwhile, Claude has generated more: “Are you an existing patient with us, or will this be your first visit?”
  • TTS generates audio, agent publishes
  • Sarah hears the complete response
Response latency: ~900ms (from Sarah stopping speaking to AI starting to respond)
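Reading the per-stage timings off the timeline above (illustrative figures, not measurements):

```python
# Illustrative latency budget (ms), taken from the T+5.0s -> T+5.9s timeline.
stage_ms = {
    "stt_final_and_retrieval": 100,  # T+5.0 -> T+5.1
    "llm_first_sentence":      700,  # T+5.1 -> T+5.8
    "tts_first_audio":         100,  # T+5.8 -> T+5.9
}

total = sum(stage_ms.values())
print(total)  # 900 - under the <1000ms target, with LLM time dominating
```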

Phase 3: Continued Conversation

This back-and-forth continues:
  • Sarah: “I’ve been there before, maybe two years ago?”
  • Agent: (checks if it matters, decides to proceed) “Great, let me check our availability. What days work best for you?”
  • Sarah: “Anytime Thursday or Friday afternoon”
  • Agent: “I have openings Thursday at 2pm, 3:30pm, or Friday at 1pm and 4pm. Which works best?”
  • Sarah: “Thursday at 3:30 works”
  • Agent: “Perfect! I have you down for Thursday at 3:30pm for a teeth cleaning. Can I confirm your phone number for appointment reminders?”
  • …and so on
Throughout:
  • Every utterance is transcribed and stored
  • Conversation history grows, sent to Claude each turn
  • Agent can access tenant knowledge base as needed
  • Full audio is being recorded via LiveKit Egress

Phase 4: Call Completion

Sarah: “That’s all I needed, thanks!”

Agent recognizes the call is ending:
  • Intent: end conversation
  • Response: “You’re all set! We’ll see you Thursday at 3:30. Have a great day!”
Agent initiates hangup
  • Sends command to n8n: “End call call-123456”
  • n8n tells GoToConnect to hang up
  • GoToConnect terminates the call
Post-call processing:
  1. LiveKit room closes (all participants left)
  2. LiveKit Egress finalizes recording, uploads to storage
  3. n8n workflow triggers:
    • Updates call record: status = “completed”, duration = 180 seconds
    • Triggers transcript finalization
    • Generates call summary (optional Claude call)
    • Calculates usage for billing
    • Sends webhook to tenant (if configured)
Call record in database:
{
  "id": "call-123456",
  "tenant_id": "smile-dental",
  "from_number": "+15559876543",
  "to_number": "+15551234567",
  "direction": "inbound",
  "status": "completed",
  "started_at": "2026-01-25T10:00:00Z",
  "answered_at": "2026-01-25T10:00:01Z",
  "ended_at": "2026-01-25T10:03:00Z",
  "duration_seconds": 180,
  "recording_url": "https://storage.example.com/recordings/call-123456.wav",
  "transcript_id": "transcript-789",
  "outcome": "appointment_scheduled",
  "cost_cents": 45
}
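Derived fields like duration_seconds come straight from the timestamps; a minimal sketch:

```python
from datetime import datetime

record = {
    "started_at": "2026-01-25T10:00:00Z",
    "ended_at": "2026-01-25T10:03:00Z",
}

def parse(ts: str) -> datetime:
    # fromisoformat in older Python versions doesn't accept the "Z" suffix.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

duration = int((parse(record["ended_at"]) - parse(record["started_at"])).total_seconds())
print(duration)  # 180
```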

3.3 Technology Choices (What We’re Using and WHY)

Every technology choice has a reason. Here’s why we chose each component:

Telephony: GoToConnect

What it is: Cloud-based business phone system with API access.

Why we chose it:
  1. Ooma WebRTC Softphone - Critical. GoToConnect offers a browser-based softphone through Ooma, which lets us capture audio without specialized telephony hardware.
  2. Webhook support - Sends real-time notifications for call events.
  3. Call control API - Programmatic transfer, hold, hangup.
  4. Reasonable pricing - ~$0.005/minute for PSTN usage.
  5. Existing relationship - Bob’s company already uses GoToConnect.
Alternatives considered:
  • Twilio - More developer-friendly but more expensive, no Ooma equivalent
  • Vonage - Similar capabilities but less familiar
  • Direct SIP - Would require significant telephony expertise

Real-Time Media: LiveKit Cloud

What it is: Managed WebRTC infrastructure for real-time audio/video.

Why we chose it:
  1. LiveKit Agents Framework - Purpose-built for AI voice agents. Handles VAD, turn-taking, pipeline orchestration.
  2. Cloud-hosted - No infrastructure to manage.
  3. Low latency - Sub-100ms audio routing.
  4. Recording built-in - Egress feature for call recording.
  5. Scalable - Handles thousands of concurrent rooms.
Alternatives considered:
  • Self-hosted LiveKit - More control but operational burden
  • Twilio Video - Less AI-focused, no agents framework
  • Daily.co - Good but less mature agent tooling
  • Custom WebRTC - Too much complexity

Speech-to-Text: Deepgram Nova-2

What it is: Real-time speech recognition API.

Why we chose it:
  1. Accuracy - Nova-2 is best-in-class for conversational speech.
  2. Streaming - Real-time results as speech happens.
  3. Latency - Designed for real-time use cases.
  4. Pricing - $0.0043/minute is competitive.
  5. LiveKit integration - Works well with LiveKit Agents.
Alternatives considered:
  • Google Speech-to-Text - Good but more expensive
  • AWS Transcribe - Higher latency
  • Whisper - Not designed for real-time streaming
  • AssemblyAI - Good but Deepgram has edge on latency

Language Model: Claude Sonnet (Anthropic)

What it is: Large language model for generating responses.

Why we chose it:
  1. Quality - Claude produces natural, helpful responses.
  2. Instruction following - Excellent at staying in character.
  3. Function calling - Reliable tool use for actions.
  4. Context window - 200K tokens handles long conversations.
  5. Safety - Built-in refusal of harmful requests.
Alternatives considered:
  • GPT-4 - Comparable but OpenAI has reliability concerns
  • Llama - Would need to self-host, more complexity
  • Claude Opus - Overkill for this use case, more expensive

Text-to-Speech: Chatterbox-Turbo on RunPod

What it is: Open-source TTS model running on GPU cloud.

Why we chose it:
  1. Quality - Natural-sounding voice synthesis.
  2. Cost - Much cheaper than commercial TTS at scale.
  3. Customization - Can fine-tune for specific voices.
  4. Latency - Fast enough for real-time with GPU acceleration.
  5. No per-character fees - Just GPU time.
Alternatives considered:
  • ElevenLabs - Excellent quality but $0.30/1000 chars adds up
  • Amazon Polly - Robotic sounding
  • Google TTS - Better than Polly but not great
  • Play.ht - Good but expensive for volume

Database: PostgreSQL

What it is: Open-source relational database.

Why we chose it:
  1. Reliability - Battle-tested, ACID compliant.
  2. pgvector extension - Native vector similarity search for RAG.
  3. JSON support - Flexible for varied data shapes.
  4. Familiar - Team knows it well.
  5. Managed options - DigitalOcean, AWS RDS, etc.
Alternatives considered:
  • MySQL - No native vector support
  • MongoDB - Less suited for relational data
  • Separate vector DB - Added complexity

Cache/State: Redis

What it is: In-memory data store.

Why we chose it:
  1. Speed - Sub-millisecond operations.
  2. Pub/Sub - Real-time messaging between services.
  3. TTL support - Automatic expiration for temporary data.
  4. Familiar - Industry standard.
Alternatives considered:
  • Memcached - Less feature-rich
  • KeyDB - Compatible but less proven

Orchestration: n8n

What it is: Open-source workflow automation.

Why we chose it:
  1. Visual workflows - Easy to build and debug.
  2. Webhook handling - First-class support.
  3. Self-hosted - No per-execution fees.
  4. Extensible - Custom code nodes when needed.
  5. Bob’s familiarity - Already using it.
Alternatives considered:
  • Zapier - Too expensive at scale
  • Custom code - More flexibility but slower to develop
  • Temporal - Overkill for our needs

Deployment: Dokploy on DigitalOcean

What it is: Container orchestration platform on cloud VMs.

Why we chose it:
  1. Simplicity - Easier than Kubernetes.
  2. Cost - DigitalOcean is affordable.
  3. Control - Self-managed but not too complex.
  4. Docker-native - Standard containerization.
Alternatives considered:
  • Kubernetes - Overkill for initial scale
  • AWS ECS - More complex, vendor lock-in
  • Heroku - Expensive at scale
  • Render - Good but less control

3.4 What We’re NOT Building (Explicit Scope Boundaries)

Clear boundaries prevent scope creep. Here’s what’s explicitly out of scope:

Not Building: Outbound Dialer (MVP)

We will support outbound calls eventually, but MVP is inbound-only. Outbound dialers require:
  • Campaign management
  • Do-not-call list compliance
  • Predictive dialing algorithms
  • Different conversation patterns
Why not: Inbound is simpler and provides immediate value. Outbound comes in Phase 2.

Not Building: Video Calls

Voice only. No video support. Video would require:
  • Different pipeline (video processing)
  • Higher bandwidth
  • Different use cases entirely
Why not: Our value prop is voice. Video is a different product.

Not Building: SMS/Chat

Voice only. No text messaging or web chat. These would require:
  • Different interaction patterns
  • Different latency expectations
  • Different UI
Why not: Focus. We do voice exceptionally well first.

Not Building: Custom Voice Cloning

We use pre-trained voices. We won’t clone customer voices or create fully custom voices. This would require:
  • Voice recording sessions
  • Fine-tuning pipelines
  • Legal consent frameworks
Why not: Complexity. Pre-trained voices are good enough for MVP.

Not Building: On-Premise Deployment

Cloud only. No on-premise option. On-prem would require:
  • Different deployment models
  • Customer-managed infrastructure
  • Support complexity
Why not: Operational simplicity. Enterprise on-prem is a future consideration.

Not Building: Direct Consumer Sales

Agencies only. We don’t sell directly to end businesses. Direct sales would require:
  • Different sales motion
  • Support infrastructure
  • Competing with our own customers
Why not: Channel strategy. Agencies scale better than direct sales.

Not Building: Full CRM

We capture call data but we’re not a CRM. Integrations with Salesforce, HubSpot, etc. are planned, but we won’t replicate CRM functionality. Why not: Focus. Others do CRM well. We do voice AI well.

Not Building: Appointment Scheduling Backend

The AI can help schedule appointments conversationally, but we won’t build a full scheduling system (calendar management, availability, etc.). We’ll integrate with existing systems. Why not: Reinventing the wheel. Calendly, Acuity, etc. exist.

Section 4: Development Environment Setup

This section tells you exactly how to set up your development machine to work on this project. Follow these steps in order.

4.1 Required Accounts & API Keys

Before writing any code, you need accounts with these services. Create accounts and gather API keys.

4.1.1 GoToConnect (Telephony)

What you need:
  • GoToConnect account with API access
  • OAuth 2.0 credentials (Client ID and Client Secret)
  • At least one phone number provisioned
  • Webhook endpoint configured
How to get it:
  1. Contact GoToConnect sales for a developer/partner account
  2. Access the admin portal at admin.goto.com
  3. Navigate to Integrations → API Credentials
  4. Create new OAuth 2.0 application
  5. Note the Client ID and Client Secret
  6. Configure redirect URI for OAuth flow
Environment variables:
GOTOCONNECT_CLIENT_ID=your_client_id_here
GOTOCONNECT_CLIENT_SECRET=your_client_secret_here
GOTOCONNECT_REDIRECT_URI=https://yourapp.com/oauth/callback
GOTOCONNECT_WEBHOOK_SECRET=your_webhook_secret
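GOTOCONNECT_WEBHOOK_SECRET is typically used to verify that incoming webhooks really came from GoToConnect. The sketch below shows the generic HMAC-SHA256 pattern; GoToConnect's actual signing scheme (header name, encoding, which bytes are signed) must be confirmed in their documentation:

```python
import hashlib
import hmac

def verify_webhook(secret: str, raw_body: bytes, signature_header: str) -> bool:
    # Generic HMAC-SHA256 check - not GoToConnect's documented scheme.
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest prevents timing attacks on the comparison.
    return hmac.compare_digest(expected, signature_header)

secret = "your_webhook_secret"  # from GOTOCONNECT_WEBHOOK_SECRET
body = b'{"event": "call.ringing", "callId": "call-123456"}'
good_sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, good_sig))        # True
print(verify_webhook(secret, body, "tampered-sig"))  # False
```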

4.1.2 LiveKit Cloud

What you need:
  • LiveKit Cloud account
  • API Key and Secret
  • WebSocket URL for your project
How to get it:
  1. Sign up at cloud.livekit.io
  2. Create a new project
  3. Go to Settings → Keys
  4. Note the API Key and Secret
  5. Note the WebSocket URL (wss://your-project.livekit.cloud)
Environment variables:
LIVEKIT_API_KEY=your_api_key_here
LIVEKIT_API_SECRET=your_api_secret_here
LIVEKIT_WS_URL=wss://your-project.livekit.cloud

4.1.3 Deepgram (STT)

What you need:
  • Deepgram account
  • API key
How to get it:
  1. Sign up at console.deepgram.com
  2. Create a new project
  3. Go to API Keys
  4. Create new key with appropriate permissions
Environment variables:
DEEPGRAM_API_KEY=your_api_key_here

4.1.4 Anthropic (LLM)

What you need:
  • Anthropic API account
  • API key
How to get it:
  1. Sign up at console.anthropic.com
  2. Go to API Keys
  3. Create new key
Environment variables:
ANTHROPIC_API_KEY=your_api_key_here

4.1.5 RunPod (TTS Hosting)

What you need:
  • RunPod account
  • API key
  • GPU endpoint URL (after deploying Chatterbox)
How to get it:
  1. Sign up at runpod.io
  2. Add payment method
  3. Go to Settings → API Keys
  4. Create new key
  5. Deploy Chatterbox template (instructions in Part 7)
Environment variables:
RUNPOD_API_KEY=your_api_key_here
CHATTERBOX_ENDPOINT_URL=https://your-endpoint.runpod.ai

4.1.6 DigitalOcean

What you need:
  • DigitalOcean account
  • API token
  • Spaces access keys (for object storage)
How to get it:
  1. Sign up at digitalocean.com
  2. Go to API → Tokens → Generate New Token
  3. Go to Spaces → Manage Keys → Generate New Key
Environment variables:
DO_API_TOKEN=your_token_here
DO_SPACES_KEY=your_spaces_key
DO_SPACES_SECRET=your_spaces_secret
DO_SPACES_ENDPOINT=nyc3.digitaloceanspaces.com
DO_SPACES_BUCKET=voice-aiconnected-recordings

4.1.7 Database Connection

For local development:
DATABASE_URL=postgresql://postgres:password@localhost:5432/voice_aiconnected
REDIS_URL=redis://localhost:6379
For production (managed databases):
DATABASE_URL=postgresql://user:pass@db-host:5432/voice_aiconnected?sslmode=require
REDIS_URL=rediss://user:pass@redis-host:6379
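Before running anything, it helps to sanity-check that a connection string is well-formed. This small standard-library helper is illustrative only (not part of the project code) and simply splits a PostgreSQL URL into its parts:

```python
from urllib.parse import urlparse, parse_qs

def parse_database_url(url: str) -> dict:
    """Break a PostgreSQL connection string into its components for inspection."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    return {
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
        "sslmode": query.get("sslmode", ["disable"])[0],
    }

# The local and production URLs shown above
local = parse_database_url(
    "postgresql://postgres:password@localhost:5432/voice_aiconnected"
)
prod = parse_database_url(
    "postgresql://user:pass@db-host:5432/voice_aiconnected?sslmode=require"
)
```

Note the production URL carries `sslmode=require`; forgetting it is a common cause of connection failures against managed databases.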

4.2 Local Development Tools

Install these tools on your development machine.

4.2.1 Required Software

Node.js (v20 LTS)
# Using nvm (recommended)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 20
nvm use 20
node --version  # Should show v20.x.x
Python (3.11+)
# Using pyenv (recommended)
curl https://pyenv.run | bash
pyenv install 3.11.7
pyenv global 3.11.7
python --version  # Should show 3.11.x
Docker & Docker Compose
# macOS
brew install --cask docker

# Ubuntu
sudo apt-get update
sudo apt-get install docker.io docker-compose-v2
sudo usermod -aG docker $USER  # Log out and back in after this

# Verify
docker --version
docker compose version
PostgreSQL Client
# macOS
brew install postgresql@15

# Ubuntu
sudo apt-get install postgresql-client-15

# Verify
psql --version
Redis CLI
# macOS
brew install redis

# Ubuntu
sudo apt-get install redis-tools

# Verify
redis-cli --version
Git
# Should already be installed, verify:
git --version

# If not:
# macOS: xcode-select --install
# Ubuntu: sudo apt-get install git
4.2.2 VS Code Setup

VS Code Extensions:
  • Python (Microsoft)
  • Pylance
  • ESLint
  • Prettier
  • Docker
  • GitLens
  • Thunder Client (API testing)
  • PostgreSQL (ckolkman)
VS Code settings.json additions:
{
  "python.defaultInterpreterPath": "~/.pyenv/shims/python",
  "editor.formatOnSave": true,
  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter"
  },
  "files.exclude": {
    "**/__pycache__": true,
    "**/.pytest_cache": true,
    "**/node_modules": true
  }
}

4.2.3 Helpful CLI Tools

# HTTPie - Better than curl for API testing
brew install httpie  # or pip install httpie

# jq - JSON processor
brew install jq  # or sudo apt-get install jq

# ngrok - Expose local server for webhooks
brew install ngrok  # or download from ngrok.com

# lazydocker - Docker TUI
brew install lazydocker

4.3 Repository Structure

The project is organized as a monorepo with the following structure:
voice-aiconnected/
├── README.md                    # Project overview
├── CLAUDE-CODE-CONTINUATION-PROMPT.md  # Handoff document
├── docker-compose.yml           # Local development services
├── docker-compose.prod.yml      # Production compose file
├── .env.example                 # Example environment variables
├── .gitignore                   # Git ignore rules

├── docs/                        # Documentation
│   ├── JUNIOR-DEV-PRD-PART-01.md
│   ├── JUNIOR-DEV-PRD-PART-02.md
│   ├── ... (through PART-10)
│   ├── 01-SYSTEM-ARCHITECTURE-OVERVIEW.md
│   ├── 02-GOTOCONNECT-INTEGRATION-SPECIFICATION.md
│   ├── 03-VOICE-PIPELINE-ARCHITECTURE.md
│   ├── 04-WEBRTC-BRIDGE-TECHNICAL-DESIGN.md
│   └── 05-LIVEKIT-INTEGRATION-SPECIFICATION.md

├── services/                    # Backend services
│   │
│   ├── api/                     # REST API service
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   │   ├── pyproject.toml
│   │   ├── src/
│   │   │   ├── __init__.py
│   │   │   ├── main.py          # FastAPI application
│   │   │   ├── config.py        # Configuration
│   │   │   ├── database.py      # Database connection
│   │   │   ├── models/          # SQLAlchemy models
│   │   │   ├── schemas/         # Pydantic schemas
│   │   │   ├── routers/         # API route handlers
│   │   │   ├── services/        # Business logic
│   │   │   └── utils/           # Utilities
│   │   └── tests/
│   │
│   ├── agent/                   # AI Agent service
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   │   ├── pyproject.toml
│   │   ├── src/
│   │   │   ├── __init__.py
│   │   │   ├── main.py          # Agent entry point
│   │   │   ├── config.py
│   │   │   ├── agent.py         # Agent implementation
│   │   │   ├── pipeline/        # Voice pipeline components
│   │   │   │   ├── stt.py       # Speech-to-text
│   │   │   │   ├── llm.py       # LLM integration
│   │   │   │   ├── tts.py       # Text-to-speech
│   │   │   │   └── vad.py       # Voice activity detection
│   │   │   ├── knowledge/       # RAG implementation
│   │   │   └── actions/         # Agent actions (transfer, etc.)
│   │   └── tests/
│   │
│   ├── bridge/                  # WebRTC Bridge service
│   │   ├── Dockerfile
│   │   ├── package.json
│   │   ├── src/
│   │   │   ├── index.ts         # Entry point
│   │   │   ├── config.ts
│   │   │   ├── browser.ts       # Browser automation
│   │   │   ├── softphone.ts     # Ooma softphone control
│   │   │   ├── livekit.ts       # LiveKit connection
│   │   │   └── audio.ts         # Audio routing
│   │   └── tests/
│   │
│   └── worker/                  # Background job worker
│       ├── Dockerfile
│       ├── requirements.txt
│       ├── src/
│       │   ├── __init__.py
│       │   ├── main.py
│       │   ├── jobs/            # Job definitions
│       │   └── utils/
│       └── tests/

├── web/                         # Frontend application
│   ├── Dockerfile
│   ├── package.json
│   ├── next.config.js
│   ├── src/
│   │   ├── app/                 # Next.js app router
│   │   ├── components/          # React components
│   │   ├── lib/                 # Utilities
│   │   └── styles/              # CSS
│   └── tests/

├── migrations/                  # Database migrations
│   ├── alembic.ini
│   ├── env.py
│   └── versions/                # Migration files

├── scripts/                     # Utility scripts
│   ├── setup-dev.sh             # Development setup
│   ├── seed-data.py             # Seed database
│   └── deploy.sh                # Deployment script

└── infra/                       # Infrastructure configuration
    ├── dokploy/                 # Dokploy configuration
    ├── nginx/                   # Nginx configuration
    └── monitoring/              # Monitoring setup

Service Responsibilities

api/ - REST API Service
  • Handles all HTTP requests from frontend and external systems
  • Manages authentication and authorization
  • CRUD operations for all entities
  • Exposes webhooks for external systems
agent/ - AI Agent Service
  • LiveKit Agents worker
  • Voice pipeline (STT → LLM → TTS)
  • Knowledge base queries
  • Conversation management
bridge/ - WebRTC Bridge Service
  • Browser automation (Puppeteer)
  • Ooma softphone control
  • Audio capture and routing
  • LiveKit media publishing
worker/ - Background Worker
  • Async job processing
  • Post-call processing
  • Transcript finalization
  • Usage aggregation
  • Scheduled tasks
web/ - Frontend Application
  • Agency dashboard
  • Tenant dashboard
  • Admin dashboard
  • Configuration interfaces

4.4 Environment Variables Reference

Complete list of all environment variables used by the system:
# =============================================================================
# ENVIRONMENT CONFIGURATION
# =============================================================================

# Environment: development, staging, production
NODE_ENV=development
PYTHON_ENV=development

# =============================================================================
# DATABASE
# =============================================================================

# PostgreSQL connection string
DATABASE_URL=postgresql://postgres:password@localhost:5432/voice_aiconnected

# Redis connection string
REDIS_URL=redis://localhost:6379

# =============================================================================
# GOTOCONNECT (TELEPHONY)
# =============================================================================

# OAuth 2.0 credentials
GOTOCONNECT_CLIENT_ID=your_client_id
GOTOCONNECT_CLIENT_SECRET=your_client_secret
GOTOCONNECT_REDIRECT_URI=http://localhost:3000/oauth/gotoconnect/callback

# API base URL
GOTOCONNECT_API_URL=https://api.goto.com

# Webhook validation
GOTOCONNECT_WEBHOOK_SECRET=your_webhook_secret

# Ooma Softphone credentials (for browser automation)
OOMA_USERNAME=your_ooma_username
OOMA_PASSWORD=your_ooma_password

# =============================================================================
# LIVEKIT
# =============================================================================

# LiveKit Cloud credentials
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_WS_URL=wss://your-project.livekit.cloud

# Webhook validation
LIVEKIT_WEBHOOK_SECRET=your_webhook_secret

# =============================================================================
# DEEPGRAM (STT)
# =============================================================================

DEEPGRAM_API_KEY=your_api_key

# Model configuration
DEEPGRAM_MODEL=nova-2
DEEPGRAM_LANGUAGE=en-US

# =============================================================================
# ANTHROPIC (LLM)
# =============================================================================

ANTHROPIC_API_KEY=your_api_key

# Model configuration
ANTHROPIC_MODEL=claude-sonnet-4-20250514
ANTHROPIC_MAX_TOKENS=1024

# =============================================================================
# CHATTERBOX (TTS)
# =============================================================================

# RunPod endpoint
CHATTERBOX_ENDPOINT_URL=https://your-endpoint.runpod.ai
RUNPOD_API_KEY=your_api_key

# Voice configuration
CHATTERBOX_DEFAULT_VOICE=default
CHATTERBOX_SAMPLE_RATE=24000

# =============================================================================
# OBJECT STORAGE (RECORDINGS)
# =============================================================================

# DigitalOcean Spaces (S3-compatible)
DO_SPACES_KEY=your_key
DO_SPACES_SECRET=your_secret
DO_SPACES_ENDPOINT=nyc3.digitaloceanspaces.com
DO_SPACES_BUCKET=voice-aiconnected-recordings
DO_SPACES_REGION=nyc3

# =============================================================================
# AUTHENTICATION
# =============================================================================

# JWT configuration
JWT_SECRET=your_very_long_random_secret_at_least_32_characters
JWT_ALGORITHM=HS256
JWT_EXPIRATION_HOURS=24

# Refresh token
REFRESH_TOKEN_SECRET=another_very_long_random_secret
REFRESH_TOKEN_EXPIRATION_DAYS=30

# =============================================================================
# APPLICATION
# =============================================================================

# API service
API_HOST=0.0.0.0
API_PORT=8000
API_BASE_URL=http://localhost:8000

# Frontend
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000

# CORS
CORS_ORIGINS=http://localhost:3000,http://localhost:8000

# =============================================================================
# WEBHOOKS (INBOUND)
# =============================================================================

# n8n webhook URLs (where GoToConnect/LiveKit send events)
N8N_WEBHOOK_BASE_URL=http://localhost:5678/webhook

# =============================================================================
# LOGGING & MONITORING
# =============================================================================

# Log level: DEBUG, INFO, WARNING, ERROR
LOG_LEVEL=INFO

# Sentry (error tracking)
SENTRY_DSN=https://your_sentry_dsn

# =============================================================================
# FEATURE FLAGS
# =============================================================================

# Enable/disable features
FEATURE_OUTBOUND_CALLS=false
FEATURE_RECORDING=true
FEATURE_TRANSCRIPTION=true

# =============================================================================
# RATE LIMITING
# =============================================================================

# API rate limits
RATE_LIMIT_PER_MINUTE=100
RATE_LIMIT_PER_HOUR=1000

# =============================================================================
# DEVELOPMENT ONLY
# =============================================================================

# Debug mode (never enable in production)
DEBUG=true

# Skip webhook signature validation (never in production)
SKIP_WEBHOOK_VALIDATION=false

# Use mock services (for testing without external APIs)
USE_MOCK_STT=false
USE_MOCK_LLM=false
USE_MOCK_TTS=false

4.5 How to Run Locally

Step-by-step instructions to get the system running on your machine.

Step 1: Clone the Repository

git clone https://github.com/oxfordpierpont/Voice-aiConnected.git
cd Voice-aiConnected

Step 2: Copy Environment Variables

cp .env.example .env
Edit .env and fill in all the API keys from Section 4.1.

Step 3: Start Infrastructure Services

# Start PostgreSQL, Redis, and other infrastructure
docker compose up -d postgres redis

# Verify they're running
docker compose ps

Step 4: Initialize Database

# Create database
docker compose exec postgres psql -U postgres -c "CREATE DATABASE voice_aiconnected;"

# Run migrations
cd services/api
python -m alembic upgrade head

# Seed development data (optional)
cd ../../scripts
python seed-data.py

Step 5: Start Backend Services

Option A: Using Docker Compose (Recommended)
# From project root
docker compose up -d api agent bridge worker

# View logs
docker compose logs -f api
Option B: Running Directly (For Active Development)

Terminal 1 - API Service:
cd services/api
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload --port 8000
Terminal 2 - Agent Service:
cd services/agent
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python src/main.py
Terminal 3 - Bridge Service:
cd services/bridge
npm install
npm run dev

Step 6: Start Frontend

cd web
npm install
npm run dev
Frontend will be available at http://localhost:3000

Step 7: Start n8n (Workflow Automation)

docker compose up -d n8n
n8n will be available at http://localhost:5678

Step 8: Expose Webhooks (For Testing)

GoToConnect needs to reach your local machine with webhooks.
# Start ngrok tunnel
ngrok http 8000

# Note the URL, e.g., https://abc123.ngrok.io
# Configure this URL in GoToConnect webhook settings
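Once webhooks can reach your machine, validate their signatures before trusting them (this is what GOTOCONNECT_WEBHOOK_SECRET is for). The sketch below assumes an HMAC-SHA256 hex signature; the actual header name and encoding must be confirmed against GoToConnect's webhook documentation:

```python
import hashlib
import hmac

def verify_webhook_signature(payload: bytes, signature_hex: str, secret: str) -> bool:
    """Constant-time check of an HMAC-SHA256 webhook signature.

    ASSUMPTION: hex-encoded HMAC-SHA256 over the raw request body. Verify the
    exact signing scheme in the provider's docs before relying on this.
    """
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature_hex)

body = b'{"event": "call.started"}'
secret = "your_webhook_secret"
good_sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
```

This is also why SKIP_WEBHOOK_VALIDATION exists for development only and must never be enabled in production.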

Step 9: Verify Everything Works

# Check API health
curl http://localhost:8000/health
# Expected: {"status": "healthy", "version": "0.1.0"}

# Check database connection
curl http://localhost:8000/health/db
# Expected: {"status": "connected"}

# Check Redis connection
curl http://localhost:8000/health/redis
# Expected: {"status": "connected"}

Step 10: Make a Test Call

  1. Log into the frontend at http://localhost:3000
  2. Create a test tenant with a phone number
  3. Call the phone number
  4. You should hear the AI greeting!

Troubleshooting Common Issues

Issue: Database connection refused
Connection refused to localhost:5432
Solution: Ensure the PostgreSQL container is running (docker compose ps). Start it with docker compose up -d postgres.

Issue: Redis connection refused
Connection refused to localhost:6379
Solution: Ensure the Redis container is running: docker compose up -d redis.

Issue: API key errors
401 Unauthorized from Deepgram/Anthropic/etc.
Solution: Verify the API keys in your .env file. Check for trailing whitespace or quotes.

Issue: Webhook not received
GoToConnect shows webhook failed
Solution: Ensure ngrok is running and the URL is correctly configured in GoToConnect. Check the ngrok web interface at http://localhost:4040 for incoming requests.

Issue: Browser automation fails
Puppeteer cannot launch browser
Solution: Ensure Chrome/Chromium is installed. On Ubuntu: sudo apt-get install chromium-browser. You may need to configure Puppeteer to use the installed browser.

Issue: Port already in use
Error: listen EADDRINUSE: address already in use :::8000
Solution: Kill the existing process (lsof -i :8000, then kill -9 <PID>), or change the port in .env.

End of Part 1

You now have:
  1. ✅ Complete understanding of what we’re building and why
  2. ✅ Full glossary of every technical term
  3. ✅ Detailed architecture with component explanations
  4. ✅ Complete development environment setup
Next: Part 2 - Database Design

Part 2 will cover:
  • Complete database schema with DDL
  • Every table, column, index explained
  • Migration strategy
  • Query patterns

Document End - Part 1 of 10

Junior Developer PRD - Part 2: Database Design

Document Version: 1.0
Last Updated: January 25, 2026
Part: 2 of 10
Sections: 5-12
Audience: Junior developers with no prior context

Section 5: Database Architecture

5.1 Why PostgreSQL

We use PostgreSQL as our primary database. Here’s why:

Reasons for Choosing PostgreSQL

1. Relational Data Model Fits Our Domain

Our data is inherently relational:
  • Agencies have many Tenants
  • Tenants have many Phone Numbers
  • Phone Numbers receive many Calls
  • Calls have Transcripts and Recordings
A relational database with foreign keys and joins is the natural fit.

2. pgvector Extension for AI/RAG

PostgreSQL has the pgvector extension, which adds:
  • Vector data type for storing embeddings
  • Vector similarity search operators
  • Indexes for fast nearest-neighbor queries
This means we can store knowledge base embeddings directly in PostgreSQL without needing a separate vector database like Pinecone or Weaviate. One less service to manage.

3. JSONB for Flexible Data

Some data doesn’t fit neatly into columns:
  • Tenant configuration varies by tenant
  • Call metadata varies by call type
  • Webhook payloads from external systems
PostgreSQL’s JSONB type lets us store JSON data efficiently with indexing support.

4. Battle-Tested Reliability

PostgreSQL has:
  • ACID compliance (data integrity guaranteed)
  • Excellent crash recovery
  • Mature replication for high availability
  • Decades of production use
5. Managed Options Available

We can self-host or use managed services:
  • DigitalOcean Managed Databases
  • AWS RDS
  • Supabase
  • Neon
Managed databases handle backups, updates, and scaling.

6. Team Familiarity

The team knows PostgreSQL. Using familiar technology means faster development and easier debugging.

What We’re NOT Using

MongoDB - Document databases are great for some use cases, but our relational data benefits from joins and foreign key constraints.

MySQL - A good database, but it lacks native vector support; we’d need a separate vector database.

SQLite - Not suitable for concurrent access from multiple services.

Separate Vector Database - Adding Pinecone/Weaviate/Milvus would mean another service to manage, another point of failure, and data synchronization challenges.

5.2 Database Naming Conventions

Consistency makes code easier to read and write. Follow these conventions exactly.

Table Names

  • Plural nouns: users, calls, tenants (not user, call, tenant)
  • Snake_case: phone_numbers, knowledge_bases (not phoneNumbers, KnowledgeBases)
  • Lowercase only: call_events (not Call_Events or CALL_EVENTS)

Column Names

  • Snake_case: created_at, tenant_id, phone_number
  • Lowercase only: Always
  • Descriptive: started_at not start, duration_seconds not dur
  • Boolean columns: Prefix with is_ or has_: is_active, has_voicemail
  • Foreign keys: {referenced_table_singular}_id: tenant_id, user_id, call_id
  • Timestamps: Suffix with _at: created_at, updated_at, deleted_at, started_at, ended_at

Index Names

Format: `ix_{table}_{column(s)}`

Examples:
  • ix_calls_tenant_id
  • ix_calls_started_at
  • ix_users_email
  • ix_phone_numbers_tenant_id_number
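The naming convention is mechanical enough to capture in a tiny helper. This function is illustrative only, not existing project code:

```python
def index_name(table: str, *columns: str) -> str:
    """Build an index name following the ix_{table}_{columns} convention above."""
    return "_".join(["ix", table, *columns])
```

Generating names this way (in migration helpers, for example) keeps them consistent across hand-written and generated migrations.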

Constraint Names

**Primary Keys**: `pk_{table}`

  • pk_users
  • pk_calls

**Foreign Keys**: `fk_{table}_{referenced_table}`

  • fk_tenants_agencies
  • fk_calls_tenants

**Unique Constraints**: `uq_{table}_{column(s)}`

  • uq_users_email
  • uq_phone_numbers_number

**Check Constraints**: `ck_{table}_{description}`

  • ck_calls_duration_positive
  • ck_users_email_format

Enum Types

Format: `{table}_{column}_enum` or descriptive name

Examples:
  • call_status_enum
  • call_direction_enum
  • user_role_enum

5.3 Common Patterns Used

These patterns appear throughout the schema. Understand them once, recognize them everywhere.

Pattern 1: UUID Primary Keys

Every table uses UUID as the primary key, not auto-incrementing integers.
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
Why UUIDs:
  • Can be generated client-side without database round-trip
  • No sequential guessing (security)
  • Easy to merge data from multiple sources
  • Works well with distributed systems
Why NOT auto-increment:
  • Requires database to generate ID
  • Sequential IDs leak information (how many records exist)
  • Merging data from multiple sources causes conflicts
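The client-side generation point is concrete: `gen_random_uuid()` in the DDL is only a fallback, because the application can mint the key itself before the INSERT ever reaches PostgreSQL:

```python
import uuid

# Generate primary keys in the application — no database round-trip needed.
call_id = uuid.uuid4()
another_id = uuid.uuid4()

# The canonical string form is what ends up in URLs and logs.
call_id_str = str(call_id)
```

Knowing the ID before the row exists is handy when, for example, a call record and its first call_events row are written in the same transaction.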

Pattern 2: Timestamp Columns

Every table has these timestamp columns:
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
TIMESTAMPTZ (timestamp with time zone) stores the absolute moment in time. PostgreSQL converts to UTC internally and converts back to the client’s timezone on retrieval.

Why not TIMESTAMP: Without a timezone, you don’t know what “2026-01-25 10:00:00” means. Is it UTC? EST? PST?

Automatic updated_at: We use a trigger to automatically update updated_at when a row changes:
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Applied to each table:
CREATE TRIGGER update_users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

Pattern 3: Soft Deletes

We don’t actually delete records. We mark them as deleted:
deleted_at TIMESTAMPTZ DEFAULT NULL
If deleted_at is NULL: Record is active
If deleted_at has a value: Record was deleted at that time
Why soft deletes:
  • Data recovery is possible
  • Audit trail preserved
  • Foreign key relationships don’t break
  • Billing and analytics remain accurate
Querying active records:
SELECT * FROM tenants WHERE deleted_at IS NULL;
Querying deleted records:
SELECT * FROM tenants WHERE deleted_at IS NOT NULL;
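The same semantics can be mirrored in application code. This is an illustrative sketch using plain dicts (the real services would use SQLAlchemy models), showing that a soft delete is an update, not a removal:

```python
from datetime import datetime, timezone

def soft_delete(row: dict) -> None:
    """Mirror of UPDATE ... SET deleted_at = NOW(): mark the row, don't remove it."""
    row["deleted_at"] = datetime.now(timezone.utc)

def active(rows: list[dict]) -> list[dict]:
    """Application-side equivalent of WHERE deleted_at IS NULL."""
    return [r for r in rows if r["deleted_at"] is None]

tenants = [
    {"name": "Smile Dental", "deleted_at": None},
    {"name": "Acme HVAC", "deleted_at": None},
]
soft_delete(tenants[1])  # Acme HVAC is now hidden from active queries
```

The practical discipline is the same in SQL and in code: every "list" query must remember the deleted_at filter, which is why the partial indexes later in this document include WHERE deleted_at IS NULL.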

Pattern 4: JSONB Configuration Columns

For flexible, schema-less data within a record:
settings JSONB NOT NULL DEFAULT '{}'::jsonb,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb
When to use JSONB:
  • Data structure varies between records
  • External system payloads
  • User-configurable settings
  • Data you don’t query by frequently
When NOT to use JSONB:
  • Data you query/filter by frequently (use columns)
  • Relationships to other tables (use foreign keys)
  • Data with strict schema requirements

Pattern 5: Enum Types for Status Fields

For fields with a fixed set of values:
CREATE TYPE call_status_enum AS ENUM (
    'pending',
    'ringing',
    'answered',
    'completed',
    'failed',
    'cancelled'
);

-- Usage in table:
status call_status_enum NOT NULL DEFAULT 'pending'
Why enums:
  • Database enforces valid values
  • Typos caught at insert time
  • Self-documenting schema
  • More efficient storage than strings
When to use enums:
  • Fixed set of values that rarely changes
  • Values are known at schema design time
When NOT to use enums:
  • Values added/removed frequently
  • User-defined values
  • Hundreds of possible values
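On the application side, the database enum is typically mirrored by a Python enum so typos are caught before the INSERT. A sketch (the class name and placement are assumptions; only the values come from the DDL above):

```python
from enum import Enum

class CallStatus(str, Enum):
    """Python mirror of call_status_enum — keep these values in sync with the DDL."""
    PENDING = "pending"
    RINGING = "ringing"
    ANSWERED = "answered"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
```

Subclassing `str` means the members serialize naturally in JSON responses and compare equal to the raw database strings.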

Pattern 6: Tenant Isolation

Most tables include a tenant_id foreign key:
tenant_id UUID NOT NULL REFERENCES tenants(id)
Every query should filter by tenant_id. This ensures data isolation between tenants.
-- CORRECT: Filter by tenant
SELECT * FROM calls WHERE tenant_id = $1 AND id = $2;

-- WRONG: No tenant filter (data leak risk)
SELECT * FROM calls WHERE id = $1;

Pattern 7: Audit Columns for Sensitive Operations

For tables where we need to track who did what:
created_by UUID REFERENCES users(id),
updated_by UUID REFERENCES users(id)

Section 6: Schema - Core Entities

6.1 agencies Table

Agencies are our direct customers - businesses that resell Voice by aiConnected to their clients.
-- =============================================================================
-- AGENCIES TABLE
-- =============================================================================
-- Agencies are businesses that resell Voice by aiConnected to their clients.
-- Each agency can have multiple tenants (their clients).
-- =============================================================================

CREATE TABLE agencies (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) NOT NULL,  -- URL-friendly identifier: "oxford-pierpont"
    
    -- Contact Information
    contact_email VARCHAR(255) NOT NULL,
    contact_phone VARCHAR(50),
    contact_name VARCHAR(255),
    
    -- Address (for billing/legal)
    address_line1 VARCHAR(255),
    address_line2 VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(2) DEFAULT 'US',  -- ISO 3166-1 alpha-2
    
    -- Business Details
    company_name VARCHAR(255),  -- Legal company name if different from "name"
    tax_id VARCHAR(50),         -- EIN for US companies
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'active',  -- active, suspended, cancelled
    is_verified BOOLEAN NOT NULL DEFAULT FALSE,     -- Email/identity verified
    
    -- Limits & Quotas
    max_tenants INTEGER NOT NULL DEFAULT 100,       -- Maximum tenants allowed
    max_concurrent_calls INTEGER NOT NULL DEFAULT 50, -- Across all tenants
    
    -- Billing
    billing_email VARCHAR(255),
    stripe_customer_id VARCHAR(255),  -- For payment processing
    billing_plan VARCHAR(50) DEFAULT 'starter',  -- starter, growth, scale, enterprise
    
    -- Settings (flexible JSON for agency-specific config)
    settings JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example settings:
    -- {
    --   "branding": {
    --     "logo_url": "https://...",
    --     "primary_color": "#1a73e8"
    --   },
    --   "defaults": {
    --     "voice_id": "default",
    --     "timezone": "America/New_York"
    --   },
    --   "notifications": {
    --     "email_on_new_tenant": true
    --   }
    -- }
    
    -- Metadata (for internal use)
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL,
    
    -- Constraints
    CONSTRAINT uq_agencies_slug UNIQUE (slug),
    CONSTRAINT uq_agencies_contact_email UNIQUE (contact_email),
    CONSTRAINT ck_agencies_status CHECK (status IN ('active', 'suspended', 'cancelled'))
);

-- Indexes
CREATE INDEX ix_agencies_status ON agencies(status) WHERE deleted_at IS NULL;
CREATE INDEX ix_agencies_created_at ON agencies(created_at);
CREATE INDEX ix_agencies_billing_plan ON agencies(billing_plan) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_agencies_updated_at
    BEFORE UPDATE ON agencies
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE agencies IS 'Businesses that resell Voice by aiConnected to their clients';
COMMENT ON COLUMN agencies.slug IS 'URL-friendly identifier, must be unique';
COMMENT ON COLUMN agencies.max_tenants IS 'Maximum number of tenants this agency can create';
COMMENT ON COLUMN agencies.settings IS 'Agency-specific configuration as JSON';

Column Explanations

| Column | Type | Purpose |
|---|---|---|
| id | UUID | Unique identifier, auto-generated |
| name | VARCHAR(255) | Display name: “Oxford Pierpont” |
| slug | VARCHAR(100) | URL-safe identifier: “oxford-pierpont” |
| contact_email | VARCHAR(255) | Primary contact email |
| contact_phone | VARCHAR(50) | Primary contact phone |
| contact_name | VARCHAR(255) | Primary contact person’s name |
| address_* | Various | Physical/billing address |
| company_name | VARCHAR(255) | Legal entity name |
| tax_id | VARCHAR(50) | Tax identification number |
| status | VARCHAR(50) | Account status: active/suspended/cancelled |
| is_verified | BOOLEAN | Has the agency verified their identity |
| max_tenants | INTEGER | Quota: how many tenants allowed |
| max_concurrent_calls | INTEGER | Quota: simultaneous calls across all tenants |
| billing_email | VARCHAR(255) | Where to send invoices |
| stripe_customer_id | VARCHAR(255) | Reference to Stripe customer |
| billing_plan | VARCHAR(50) | Pricing tier |
| settings | JSONB | Flexible configuration |
| metadata | JSONB | Internal tracking data |
| created_at | TIMESTAMPTZ | When the record was created |
| updated_at | TIMESTAMPTZ | When the record was last modified |
| deleted_at | TIMESTAMPTZ | When the record was soft-deleted (NULL if active) |

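The PRD does not prescribe how slugs like “oxford-pierpont” are generated from display names, but one plausible helper looks like this (illustrative only, not existing project code):

```python
import re

def slugify(name: str) -> str:
    """Derive a URL-friendly slug from a display name.

    ASSUMPTION: lowercase, alphanumeric runs joined by hyphens. The actual
    slug rules used by the platform may differ.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug[:100]  # slug columns are VARCHAR(100)
```

Remember that agency slugs are globally unique (uq_agencies_slug), while tenant slugs only need to be unique within their agency (uq_tenants_agency_slug), so a collision check against the right scope is still required after generation.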
6.2 tenants Table

Tenants are the end-customer businesses (agency’s clients) that use the voice AI.
-- =============================================================================
-- TENANTS TABLE
-- =============================================================================
-- Tenants are end-customer businesses that use Voice AI.
-- Each tenant belongs to one agency.
-- Tenants have their own phone numbers, knowledge base, and configuration.
-- =============================================================================

CREATE TABLE tenants (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationship
    agency_id UUID NOT NULL REFERENCES agencies(id),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL,           -- Display name: "Smile Dental"
    slug VARCHAR(100) NOT NULL,           -- Unique within agency: "smile-dental"
    
    -- Business Information
    business_type VARCHAR(100),           -- dental, legal, hvac, etc.
    timezone VARCHAR(50) NOT NULL DEFAULT 'America/New_York',
    
    -- Contact (for the business)
    contact_email VARCHAR(255),
    contact_phone VARCHAR(50),
    contact_name VARCHAR(255),
    website_url VARCHAR(500),
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'active',  -- active, suspended, cancelled
    
    -- Limits & Quotas
    max_concurrent_calls INTEGER NOT NULL DEFAULT 10,
    max_monthly_minutes INTEGER,  -- NULL = unlimited
    
    -- Configuration
    settings JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example settings:
    -- {
    --   "voice": {
    --     "voice_id": "alloy",
    --     "speaking_rate": 1.0,
    --     "language": "en-US"
    --   },
    --   "behavior": {
    --     "greeting_delay_ms": 500,
    --     "silence_timeout_ms": 5000,
    --     "max_call_duration_seconds": 1800
    --   },
    --   "features": {
    --     "call_recording": true,
    --     "transcription": true,
    --     "sentiment_analysis": false
    --   },
    --   "transfer": {
    --     "default_number": "+15551234567",
    --     "business_hours_only": true
    --   }
    -- }
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL,
    
    -- Constraints
    CONSTRAINT uq_tenants_agency_slug UNIQUE (agency_id, slug),
    CONSTRAINT ck_tenants_status CHECK (status IN ('active', 'suspended', 'cancelled'))
);

-- Indexes
CREATE INDEX ix_tenants_agency_id ON tenants(agency_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_tenants_status ON tenants(status) WHERE deleted_at IS NULL;
CREATE INDEX ix_tenants_created_at ON tenants(created_at);
CREATE INDEX ix_tenants_business_type ON tenants(business_type) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_tenants_updated_at
    BEFORE UPDATE ON tenants
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE tenants IS 'End-customer businesses that use Voice AI, belonging to an agency';
COMMENT ON COLUMN tenants.slug IS 'URL-friendly identifier, unique within agency';
COMMENT ON COLUMN tenants.timezone IS 'IANA timezone identifier for business hours calculations';
COMMENT ON COLUMN tenants.settings IS 'Tenant-specific configuration as JSON';

Column Explanations

| Column | Type | Purpose |
|---|---|---|
| id | UUID | Unique identifier |
| agency_id | UUID | Which agency owns this tenant |
| name | VARCHAR(255) | Display name: "Smile Dental" |
| slug | VARCHAR(100) | URL-safe identifier, unique within agency |
| business_type | VARCHAR(100) | Industry category for analytics |
| timezone | VARCHAR(50) | IANA timezone (America/New_York) |
| contact_* | Various | Business contact information |
| website_url | VARCHAR(500) | Business website |
| status | VARCHAR(50) | Account status |
| max_concurrent_calls | INTEGER | How many simultaneous calls are allowed |
| max_monthly_minutes | INTEGER | Monthly minute quota (NULL = unlimited) |
| settings | JSONB | All tenant configuration |
| metadata | JSONB | Internal tracking |
| created_at | TIMESTAMPTZ | Creation timestamp |
| updated_at | TIMESTAMPTZ | Last modification |
| deleted_at | TIMESTAMPTZ | Soft delete timestamp |
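Because tenant configuration lives in the `settings` JSONB column, the application should read it with explicit fallbacks so a missing key never breaks a call. A sketch of that read, using JSONB path operators (`:tenant_id` is a placeholder for an application-supplied parameter; the default values shown are assumptions, not platform-defined constants):

```sql
-- Read a tenant's voice configuration, falling back to defaults
-- when a key is absent from the settings JSONB.
SELECT
    name,
    COALESCE(settings #>> '{voice,voice_id}', 'alloy')             AS voice_id,
    COALESCE((settings #>> '{voice,speaking_rate}')::numeric, 1.0) AS speaking_rate,
    COALESCE(settings #>> '{voice,language}', 'en-US')             AS language
FROM tenants
WHERE id = :tenant_id
  AND deleted_at IS NULL;  -- never serve config for soft-deleted tenants
```

The `#>>` operator extracts a value at a JSON path as text; casting and `COALESCE` keep the query safe against partially populated settings.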

6.3 users Table

Users are humans who log into the platform - agency admins, tenant admins, etc.
-- =============================================================================
-- USERS TABLE
-- =============================================================================
-- Users are humans who log into the platform.
-- A user can belong to an agency (agency staff) or a tenant (tenant staff).
-- Platform admins have neither agency_id nor tenant_id.
-- =============================================================================

CREATE TABLE users (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships (one or none, not both)
    agency_id UUID REFERENCES agencies(id),  -- NULL if platform admin or tenant user
    tenant_id UUID REFERENCES tenants(id),   -- NULL if platform admin or agency user
    
    -- Authentication
    email VARCHAR(255) NOT NULL,
    password_hash VARCHAR(255) NOT NULL,  -- bcrypt hash
    
    -- Profile
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    phone VARCHAR(50),
    avatar_url VARCHAR(500),
    
    -- Role (see user_roles table for detailed permissions)
    role VARCHAR(50) NOT NULL DEFAULT 'user',
    -- Possible roles:
    -- 'platform_admin' - Full platform access (aiConnected staff)
    -- 'agency_admin' - Full agency access
    -- 'agency_user' - Limited agency access
    -- 'tenant_admin' - Full tenant access
    -- 'tenant_user' - Limited tenant access
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'active',  -- active, suspended, invited
    is_verified BOOLEAN NOT NULL DEFAULT FALSE,     -- Email verified
    
    -- Security
    last_login_at TIMESTAMPTZ,
    last_login_ip VARCHAR(45),  -- IPv6 can be 45 chars
    failed_login_attempts INTEGER NOT NULL DEFAULT 0,
    locked_until TIMESTAMPTZ,
    
    -- Password Reset
    password_reset_token VARCHAR(255),
    password_reset_expires TIMESTAMPTZ,
    
    -- Email Verification
    email_verification_token VARCHAR(255),
    email_verified_at TIMESTAMPTZ,
    
    -- Preferences
    preferences JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example preferences:
    -- {
    --   "theme": "dark",
    --   "timezone": "America/Los_Angeles",
    --   "notifications": {
    --     "email": true,
    --     "sms": false
    --   },
    --   "dashboard": {
    --     "default_view": "calls"
    --   }
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL,
    
    -- Constraints
    CONSTRAINT uq_users_email UNIQUE (email),
    CONSTRAINT ck_users_status CHECK (status IN ('active', 'suspended', 'invited')),
    CONSTRAINT ck_users_role CHECK (role IN ('platform_admin', 'agency_admin', 'agency_user', 'tenant_admin', 'tenant_user')),
    -- User must belong to agency OR tenant OR neither (platform admin), not both
    CONSTRAINT ck_users_ownership CHECK (
        (agency_id IS NULL AND tenant_id IS NULL) OR  -- platform admin
        (agency_id IS NOT NULL AND tenant_id IS NULL) OR  -- agency user
        (agency_id IS NULL AND tenant_id IS NOT NULL)  -- tenant user
    )
);

-- Indexes
CREATE INDEX ix_users_email ON users(email) WHERE deleted_at IS NULL;
CREATE INDEX ix_users_agency_id ON users(agency_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_users_tenant_id ON users(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_users_role ON users(role) WHERE deleted_at IS NULL;
CREATE INDEX ix_users_status ON users(status) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE users IS 'Human users who log into the platform';
COMMENT ON COLUMN users.password_hash IS 'bcrypt hashed password, never store plaintext';
COMMENT ON COLUMN users.role IS 'User role determining permission level';
COMMENT ON COLUMN users.failed_login_attempts IS 'Counter for rate limiting, reset on successful login';

Column Explanations

| Column | Type | Purpose |
|---|---|---|
| id | UUID | Unique identifier |
| agency_id | UUID | Agency this user belongs to (NULL if tenant user or platform admin) |
| tenant_id | UUID | Tenant this user belongs to (NULL if agency user or platform admin) |
| email | VARCHAR(255) | Login email, must be unique |
| password_hash | VARCHAR(255) | bcrypt hash of password |
| first_name | VARCHAR(100) | User's first name |
| last_name | VARCHAR(100) | User's last name |
| phone | VARCHAR(50) | Contact phone number |
| avatar_url | VARCHAR(500) | Profile picture URL |
| role | VARCHAR(50) | Permission level |
| status | VARCHAR(50) | Account status |
| is_verified | BOOLEAN | Has email been verified |
| last_login_at | TIMESTAMPTZ | When user last logged in |
| last_login_ip | VARCHAR(45) | IP address of last login |
| failed_login_attempts | INTEGER | Count of failed logins (for lockout) |
| locked_until | TIMESTAMPTZ | Account locked until this time |
| password_reset_* | Various | Password reset flow fields |
| email_verification_* | Various | Email verification flow fields |
| preferences | JSONB | User preferences and settings |
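The security columns above work together at login time: the application fetches the candidate row, rejects locked or non-active accounts, and only then compares the supplied password against `password_hash` with bcrypt (never in SQL). A sketch of that fetch (`:email` is a placeholder; whether emails are stored lowercased is an application decision not specified here):

```sql
-- Fetch a login candidate. Lockout and status checks happen in the query;
-- the bcrypt comparison against password_hash happens in application code.
SELECT id, password_hash, role, agency_id, tenant_id, failed_login_attempts
FROM users
WHERE email = :email
  AND deleted_at IS NULL
  AND status = 'active'
  AND (locked_until IS NULL OR locked_until <= NOW());
```

On a failed comparison the application increments `failed_login_attempts` and, past a threshold, sets `locked_until`; on success it resets the counter and records `last_login_at` / `last_login_ip`.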

6.4 user_roles and permissions Tables

Fine-grained permission control for advanced use cases.
-- =============================================================================
-- PERMISSIONS TABLE
-- =============================================================================
-- Defines all possible permissions in the system.
-- These are referenced by roles.
-- =============================================================================

CREATE TABLE permissions (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Permission Definition
    code VARCHAR(100) NOT NULL,  -- 'calls.view', 'tenants.create', etc.
    name VARCHAR(255) NOT NULL,   -- Human-readable name
    description TEXT,
    
    -- Grouping
    category VARCHAR(100) NOT NULL,  -- 'calls', 'tenants', 'analytics', etc.
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Constraints
    CONSTRAINT uq_permissions_code UNIQUE (code)
);

-- Seed default permissions
INSERT INTO permissions (code, name, description, category) VALUES
    -- Call permissions
    ('calls.view', 'View Calls', 'View call list and details', 'calls'),
    ('calls.listen', 'Listen to Recordings', 'Listen to call recordings', 'calls'),
    ('calls.export', 'Export Calls', 'Export call data to CSV/Excel', 'calls'),
    ('calls.delete', 'Delete Calls', 'Delete call records', 'calls'),
    
    -- Tenant permissions
    ('tenants.view', 'View Tenants', 'View tenant list and details', 'tenants'),
    ('tenants.create', 'Create Tenants', 'Create new tenants', 'tenants'),
    ('tenants.edit', 'Edit Tenants', 'Modify tenant settings', 'tenants'),
    ('tenants.delete', 'Delete Tenants', 'Delete tenants', 'tenants'),
    
    -- Phone number permissions
    ('phone_numbers.view', 'View Phone Numbers', 'View phone number list', 'phone_numbers'),
    ('phone_numbers.provision', 'Provision Numbers', 'Add new phone numbers', 'phone_numbers'),
    ('phone_numbers.configure', 'Configure Numbers', 'Change number settings', 'phone_numbers'),
    ('phone_numbers.release', 'Release Numbers', 'Remove phone numbers', 'phone_numbers'),
    
    -- Knowledge base permissions
    ('knowledge.view', 'View Knowledge Base', 'View knowledge documents', 'knowledge'),
    ('knowledge.create', 'Add Knowledge', 'Add documents to knowledge base', 'knowledge'),
    ('knowledge.edit', 'Edit Knowledge', 'Modify knowledge documents', 'knowledge'),
    ('knowledge.delete', 'Delete Knowledge', 'Remove knowledge documents', 'knowledge'),
    
    -- Analytics permissions
    ('analytics.view', 'View Analytics', 'View basic analytics', 'analytics'),
    ('analytics.export', 'Export Analytics', 'Export analytics data', 'analytics'),
    ('analytics.advanced', 'Advanced Analytics', 'Access advanced analytics features', 'analytics'),
    
    -- User management permissions
    ('users.view', 'View Users', 'View user list', 'users'),
    ('users.create', 'Create Users', 'Invite new users', 'users'),
    ('users.edit', 'Edit Users', 'Modify user profiles', 'users'),
    ('users.delete', 'Delete Users', 'Remove users', 'users'),
    
    -- Settings permissions
    ('settings.view', 'View Settings', 'View configuration', 'settings'),
    ('settings.edit', 'Edit Settings', 'Modify configuration', 'settings'),
    
    -- Billing permissions (agency only)
    ('billing.view', 'View Billing', 'View invoices and usage', 'billing'),
    ('billing.manage', 'Manage Billing', 'Update payment methods', 'billing')
;

-- =============================================================================
-- ROLE_PERMISSIONS TABLE
-- =============================================================================
-- Maps roles to their permissions.
-- This defines what each role can do.
-- =============================================================================

CREATE TABLE role_permissions (
    -- Composite Primary Key
    role VARCHAR(50) NOT NULL,
    permission_id UUID NOT NULL REFERENCES permissions(id),
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Constraints
    CONSTRAINT pk_role_permissions PRIMARY KEY (role, permission_id)
);

-- Seed role permissions

-- Platform Admin: Everything
INSERT INTO role_permissions (role, permission_id)
SELECT 'platform_admin', id FROM permissions;

-- Agency Admin: Everything except platform-level
INSERT INTO role_permissions (role, permission_id)
SELECT 'agency_admin', id FROM permissions
WHERE code NOT LIKE 'platform.%';

-- Agency User: View + limited actions
INSERT INTO role_permissions (role, permission_id)
SELECT 'agency_user', id FROM permissions
WHERE code IN (
    'calls.view', 'calls.listen',
    'tenants.view',
    'phone_numbers.view',
    'knowledge.view',
    'analytics.view'
);

-- Tenant Admin: Full tenant access
INSERT INTO role_permissions (role, permission_id)
SELECT 'tenant_admin', id FROM permissions
WHERE code IN (
    'calls.view', 'calls.listen', 'calls.export',
    'phone_numbers.view', 'phone_numbers.configure',
    'knowledge.view', 'knowledge.create', 'knowledge.edit', 'knowledge.delete',
    'analytics.view', 'analytics.export',
    'users.view', 'users.create', 'users.edit',
    'settings.view', 'settings.edit'
);

-- Tenant User: View only
INSERT INTO role_permissions (role, permission_id)
SELECT 'tenant_user', id FROM permissions
WHERE code IN (
    'calls.view', 'calls.listen',
    'knowledge.view',
    'analytics.view'
);

-- Index for lookup
CREATE INDEX ix_role_permissions_role ON role_permissions(role);
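With the seed data above, an authorization check is a single EXISTS query: resolve the user's role, then look for the permission code in the role mapping. A sketch (`:user_id` and `:permission_code` are placeholder parameters):

```sql
-- Does this user hold this permission? Uses ix_role_permissions_role
-- for the role lookup and uq_permissions_code for the code match.
SELECT EXISTS (
    SELECT 1
    FROM users u
    JOIN role_permissions rp ON rp.role = u.role
    JOIN permissions p      ON p.id = rp.permission_id
    WHERE u.id = :user_id
      AND p.code = :permission_code
      AND u.deleted_at IS NULL
) AS has_permission;
```

Because permissions are keyed by role rather than per user, granting a role a new capability is one `role_permissions` insert with no per-user migration.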

Section 7: Schema - Telephony Entities

7.1 phone_numbers Table

Phone numbers provisioned through GoToConnect and assigned to tenants.
-- =============================================================================
-- PHONE_NUMBERS TABLE
-- =============================================================================
-- Phone numbers provisioned from GoToConnect and assigned to tenants.
-- Each phone number belongs to exactly one tenant.
-- =============================================================================

CREATE TABLE phone_numbers (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Phone Number (E.164 format)
    number VARCHAR(20) NOT NULL,  -- +15551234567
    
    -- Display Information
    friendly_name VARCHAR(255),  -- "Main Office Line"
    
    -- Provider Information
    provider VARCHAR(50) NOT NULL DEFAULT 'gotoconnect',  -- gotoconnect, twilio, etc.
    provider_id VARCHAR(255),     -- Provider's ID for this number
    provider_data JSONB NOT NULL DEFAULT '{}'::jsonb,  -- Provider-specific data
    
    -- Capabilities
    capabilities JSONB NOT NULL DEFAULT '{
        "voice": true,
        "sms": false,
        "mms": false,
        "fax": false
    }'::jsonb,
    
    -- Configuration
    settings JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example settings:
    -- {
    --   "greeting_id": "uuid-of-greeting",
    --   "voicemail_enabled": true,
    --   "voicemail_greeting_id": "uuid-of-vm-greeting",
    --   "transfer_enabled": true,
    --   "transfer_number": "+15559876543",
    --   "business_hours_id": "uuid-of-hours",
    --   "after_hours_action": "voicemail"  -- voicemail, transfer, message
    -- }
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'active',  -- active, suspended, released
    
    -- Timestamps
    provisioned_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- When number was acquired
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL,
    
    -- Constraints
    CONSTRAINT uq_phone_numbers_number UNIQUE (number),
    CONSTRAINT ck_phone_numbers_status CHECK (status IN ('active', 'suspended', 'released')),
    CONSTRAINT ck_phone_numbers_e164 CHECK (number ~ '^\+[1-9]\d{1,14}$')  -- E.164 format
);

-- Indexes
CREATE INDEX ix_phone_numbers_tenant_id ON phone_numbers(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_phone_numbers_number ON phone_numbers(number) WHERE deleted_at IS NULL;
CREATE INDEX ix_phone_numbers_status ON phone_numbers(status) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_phone_numbers_updated_at
    BEFORE UPDATE ON phone_numbers
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE phone_numbers IS 'Phone numbers provisioned for tenants';
COMMENT ON COLUMN phone_numbers.number IS 'Phone number in E.164 format (+15551234567)';
COMMENT ON COLUMN phone_numbers.provider_id IS 'The ID assigned by the telephony provider';
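The first thing the system does on an inbound call is resolve the dialed number to its tenant and per-number settings. A sketch of that routing lookup (`:to_number` is a placeholder for the dialed E.164 number from the provider webhook):

```sql
-- Route an inbound call: map the dialed number to tenant + settings.
-- uq_phone_numbers_number guarantees at most one row.
SELECT pn.id AS phone_number_id,
       pn.tenant_id,
       pn.settings,
       t.status AS tenant_status,
       t.max_concurrent_calls
FROM phone_numbers pn
JOIN tenants t ON t.id = pn.tenant_id
WHERE pn.number = :to_number        -- e.g. '+15551234567'
  AND pn.status = 'active'
  AND pn.deleted_at IS NULL;
```

If no row comes back (unknown, suspended, or released number), the call should be rejected before any LiveKit or AI resources are allocated.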

7.2 calls Table

The central table tracking all phone calls.
-- =============================================================================
-- CALLS TABLE
-- =============================================================================
-- Records every phone call processed by the system.
-- This is one of the most frequently queried tables.
-- =============================================================================

CREATE TYPE call_direction_enum AS ENUM ('inbound', 'outbound');
CREATE TYPE call_status_enum AS ENUM (
    'pending',      -- Call initiated but not yet connected
    'ringing',      -- Phone is ringing
    'answered',     -- Call connected, conversation active
    'completed',    -- Call ended normally
    'failed',       -- Call failed to connect
    'cancelled',    -- Call cancelled before connecting
    'transferred',  -- Call transferred to another number
    'voicemail'     -- Call went to voicemail
);

CREATE TABLE calls (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    phone_number_id UUID NOT NULL REFERENCES phone_numbers(id),
    
    -- Call Identification
    external_call_id VARCHAR(255),  -- GoToConnect's call ID
    livekit_room_name VARCHAR(255), -- LiveKit room name
    
    -- Direction
    direction call_direction_enum NOT NULL,
    
    -- Parties
    from_number VARCHAR(20) NOT NULL,  -- Caller's number (E.164)
    to_number VARCHAR(20) NOT NULL,    -- Called number (E.164)
    
    -- For outbound calls, track the campaign/reason
    -- NOTE: campaigns is defined later in this document. If you run the DDL
    -- strictly in order, create calls without this FK and add it with
    -- ALTER TABLE after campaigns exists.
    campaign_id UUID REFERENCES campaigns(id),  -- Optional link to outbound campaign
    
    -- Status
    status call_status_enum NOT NULL DEFAULT 'pending',
    
    -- Timing
    initiated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- When call was initiated
    ringing_at TIMESTAMPTZ,                            -- When phone started ringing
    answered_at TIMESTAMPTZ,                           -- When call was answered
    ended_at TIMESTAMPTZ,                              -- When call ended
    
    -- Duration (calculated, but stored for query performance)
    duration_seconds INTEGER,  -- Total duration from answer to end
    ring_duration_seconds INTEGER,  -- Time ringing before answer
    
    -- Outcome
    outcome VARCHAR(100),  -- appointment_scheduled, question_answered, transferred, etc.
    outcome_details JSONB DEFAULT '{}'::jsonb,
    
    -- Sentiment Analysis (if enabled)
    sentiment_score DECIMAL(3,2),  -- -1.00 to 1.00
    sentiment_label VARCHAR(50),   -- positive, negative, neutral
    
    -- Cost Tracking
    cost_cents INTEGER,  -- Total cost in cents
    cost_breakdown JSONB DEFAULT '{}'::jsonb,
    -- Per-component cost in cents, e.g.:
    -- {"telephony": 1, "stt": 2, "llm": 3, "tts": 1, "livekit": 1}
    
    -- Recording
    recording_url VARCHAR(500),
    recording_duration_seconds INTEGER,
    recording_storage_path VARCHAR(500),  -- Internal storage path
    
    -- Transcript Reference
    transcript_id UUID,  -- Will be foreign key to transcripts
    
    -- Error Information (for failed calls)
    error_code VARCHAR(100),
    error_message TEXT,
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example metadata:
    -- {
    --   "user_agent": "iPhone/15.0",
    --   "carrier": "Verizon",
    --   "gotoconnect_data": { ... },
    --   "livekit_data": { ... }
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Constraints
    CONSTRAINT ck_calls_duration_positive CHECK (duration_seconds IS NULL OR duration_seconds >= 0),
    CONSTRAINT ck_calls_cost_positive CHECK (cost_cents IS NULL OR cost_cents >= 0)
);

-- Indexes (heavily optimized for common query patterns)

-- Most common: List calls for a tenant
CREATE INDEX ix_calls_tenant_id_created_at ON calls(tenant_id, created_at DESC);

-- Filter by status
CREATE INDEX ix_calls_tenant_id_status ON calls(tenant_id, status);

-- Filter by phone number
CREATE INDEX ix_calls_phone_number_id ON calls(phone_number_id);

-- Time-based queries for analytics
CREATE INDEX ix_calls_initiated_at ON calls(initiated_at);
CREATE INDEX ix_calls_answered_at ON calls(answered_at) WHERE answered_at IS NOT NULL;

-- Lookup by external ID
CREATE INDEX ix_calls_external_call_id ON calls(external_call_id) WHERE external_call_id IS NOT NULL;

-- LiveKit room lookup
CREATE INDEX ix_calls_livekit_room_name ON calls(livekit_room_name) WHERE livekit_room_name IS NOT NULL;

-- Direction filter
CREATE INDEX ix_calls_tenant_direction ON calls(tenant_id, direction);

-- Outcome analysis
CREATE INDEX ix_calls_tenant_outcome ON calls(tenant_id, outcome) WHERE outcome IS NOT NULL;

-- Trigger for updated_at
CREATE TRIGGER update_calls_updated_at
    BEFORE UPDATE ON calls
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE calls IS 'All phone calls processed by the system';
COMMENT ON COLUMN calls.external_call_id IS 'Call ID from GoToConnect for correlation';
COMMENT ON COLUMN calls.duration_seconds IS 'Conversation duration, NULL if not answered';
COMMENT ON COLUMN calls.cost_cents IS 'Total cost in cents for billing';

Column Explanations

| Column | Type | Purpose |
|---|---|---|
| id | UUID | Unique call identifier |
| tenant_id | UUID | Which tenant this call belongs to |
| phone_number_id | UUID | Which phone number received/made the call |
| external_call_id | VARCHAR(255) | GoToConnect's ID for correlation |
| livekit_room_name | VARCHAR(255) | LiveKit room for audio routing |
| direction | ENUM | inbound or outbound |
| from_number | VARCHAR(20) | Caller's phone number |
| to_number | VARCHAR(20) | Recipient's phone number |
| status | ENUM | Current call state |
| initiated_at | TIMESTAMPTZ | When call started |
| ringing_at | TIMESTAMPTZ | When ringing began |
| answered_at | TIMESTAMPTZ | When call was answered |
| ended_at | TIMESTAMPTZ | When call ended |
| duration_seconds | INTEGER | Length of conversation |
| outcome | VARCHAR(100) | Result classification |
| sentiment_score | DECIMAL | Caller sentiment (-1 to 1) |
| cost_cents | INTEGER | Total cost for billing |
| recording_url | VARCHAR(500) | URL to access recording |
| transcript_id | UUID | Link to transcript record |
| error_* | Various | Error details if call failed |
| metadata | JSONB | Additional call data |
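The timing columns make the common dashboard queries cheap: `answered_at IS NOT NULL` distinguishes answered from missed calls without a status scan. A sketch of a daily-volume query over the last 30 days (`:tenant_id` is a placeholder):

```sql
-- Daily call volume and answer rate for one tenant.
-- The tenant/time filter is served by ix_calls_tenant_id_created_at.
SELECT
    date_trunc('day', initiated_at) AS day,
    COUNT(*)                        AS total_calls,
    COUNT(answered_at)              AS answered_calls,  -- COUNT skips NULLs
    ROUND(COUNT(answered_at)::numeric / NULLIF(COUNT(*), 0), 2) AS answer_rate,
    COALESCE(SUM(duration_seconds), 0) AS total_seconds
FROM calls
WHERE tenant_id = :tenant_id
  AND initiated_at >= NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;
```

`NULLIF` guards the division on days with zero calls, and summing the stored `duration_seconds` avoids recomputing durations from timestamps on every query.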

7.3 call_events Table

State machine history for every call - tracks every status change.
-- =============================================================================
-- CALL_EVENTS TABLE
-- =============================================================================
-- Tracks every state transition for a call.
-- This is an append-only audit log.
-- =============================================================================

CREATE TABLE call_events (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationship
    call_id UUID NOT NULL REFERENCES calls(id),
    
    -- Event Information
    event_type VARCHAR(100) NOT NULL,
    -- Event types:
    -- 'status_changed' - Status transition
    -- 'participant_joined' - Someone joined the call
    -- 'participant_left' - Someone left the call
    -- 'recording_started' - Recording began
    -- 'recording_stopped' - Recording ended
    -- 'transfer_initiated' - Transfer started
    -- 'transfer_completed' - Transfer finished
    -- 'dtmf_received' - Button press detected
    -- 'speech_detected' - VAD detected speech
    -- 'response_generated' - AI generated response
    -- 'error_occurred' - Error happened
    
    -- Previous and New State (for status_changed events)
    previous_status call_status_enum,
    new_status call_status_enum,
    
    -- Event Data
    data JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example data by event_type:
    -- status_changed: {"reason": "caller_hangup"}
    -- participant_joined: {"participant_id": "...", "participant_type": "ai_agent"}
    -- dtmf_received: {"digit": "1"}
    -- speech_detected: {"duration_ms": 2500, "transcript": "..."}
    -- response_generated: {"response": "...", "latency_ms": 450}
    -- error_occurred: {"error_code": "timeout", "message": "..."}
    
    -- Source of event
    source VARCHAR(100) NOT NULL,  -- gotoconnect, livekit, agent, system
    
    -- Timestamp (event time, not record creation time)
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Record creation
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX ix_call_events_call_id ON call_events(call_id);
CREATE INDEX ix_call_events_call_id_occurred_at ON call_events(call_id, occurred_at);
CREATE INDEX ix_call_events_event_type ON call_events(event_type);
CREATE INDEX ix_call_events_occurred_at ON call_events(occurred_at);

-- Comments
COMMENT ON TABLE call_events IS 'Append-only audit log of all call state transitions';
COMMENT ON COLUMN call_events.occurred_at IS 'When the event actually occurred (may differ from created_at)';
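Because `call_events` is the audit log for the `calls` state machine, a status change should update the call row and append its event in one transaction so the two never diverge. A sketch (`:call_id` is a placeholder; the `status = 'ringing'` guard is one possible defense against out-of-order provider webhooks, not a mandated pattern):

```sql
-- Transition a call from ringing to answered atomically.
BEGIN;

UPDATE calls
SET status = 'answered',
    answered_at = NOW()
WHERE id = :call_id
  AND status = 'ringing';   -- ignore stale/out-of-order webhooks

INSERT INTO call_events
    (call_id, event_type, previous_status, new_status, data, source)
VALUES
    (:call_id, 'status_changed', 'ringing', 'answered',
     '{"reason": "callee_answered"}'::jsonb, 'gotoconnect');

COMMIT;
```

In application code, check the UPDATE's row count first and skip the event insert when it is zero, so a rejected transition leaves no misleading audit entry.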

7.4 call_transfers Table

Tracks when calls are transferred to humans or other destinations.
-- =============================================================================
-- CALL_TRANSFERS TABLE
-- =============================================================================
-- Records call transfers from AI to human or other destinations.
-- =============================================================================

CREATE TYPE transfer_type_enum AS ENUM ('cold', 'warm', 'blind');
CREATE TYPE transfer_status_enum AS ENUM ('pending', 'ringing', 'connected', 'failed', 'rejected');

CREATE TABLE call_transfers (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    call_id UUID NOT NULL REFERENCES calls(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Transfer Details
    transfer_type transfer_type_enum NOT NULL,
    -- cold: Caller placed on hold, AI speaks to recipient first
    -- warm: AI introduces caller, then connects
    -- blind: Immediate transfer without introduction
    
    -- Destination
    destination_number VARCHAR(20) NOT NULL,  -- E.164
    destination_name VARCHAR(255),            -- "Dr. Smith" or "Main Office"
    
    -- Reason for Transfer
    reason VARCHAR(255),         -- User requested, business hours, escalation
    reason_details TEXT,         -- Additional context
    
    -- Status
    status transfer_status_enum NOT NULL DEFAULT 'pending',
    
    -- Timing
    initiated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    connected_at TIMESTAMPTZ,    -- When transfer target answered
    completed_at TIMESTAMPTZ,    -- When transfer fully completed or failed
    
    -- Outcome
    outcome VARCHAR(100),        -- connected, no_answer, busy, rejected
    
    -- Error Information
    error_code VARCHAR(100),
    error_message TEXT,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX ix_call_transfers_call_id ON call_transfers(call_id);
CREATE INDEX ix_call_transfers_tenant_id ON call_transfers(tenant_id);
CREATE INDEX ix_call_transfers_status ON call_transfers(status);
CREATE INDEX ix_call_transfers_initiated_at ON call_transfers(initiated_at);

-- Trigger for updated_at
CREATE TRIGGER update_call_transfers_updated_at
    BEFORE UPDATE ON call_transfers
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE call_transfers IS 'Records of call transfers from AI to other destinations';
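A natural health metric on this table is the connect rate per transfer type, since warm and blind transfers tend to fail differently. A sketch of that report (`:tenant_id` is a placeholder):

```sql
-- Transfer outcomes per type over the last 7 days:
-- how often do handoffs actually connect?
SELECT
    transfer_type,
    COUNT(*) AS attempts,
    COUNT(*) FILTER (WHERE status = 'connected') AS connected,
    ROUND(COUNT(*) FILTER (WHERE status = 'connected')::numeric
          / NULLIF(COUNT(*), 0), 2) AS connect_rate
FROM call_transfers
WHERE tenant_id = :tenant_id
  AND initiated_at >= NOW() - INTERVAL '7 days'
GROUP BY transfer_type;
```

The `FILTER` clause counts only matching rows per aggregate, which keeps this a single pass over `ix_call_transfers_tenant_id`-filtered rows.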

Section 8: Schema - AI & Content Entities

8.1 knowledge_bases Table

Container for tenant knowledge - documents, FAQs, etc.
-- =============================================================================
-- KNOWLEDGE_BASES TABLE
-- =============================================================================
-- Each tenant has one primary knowledge base.
-- The knowledge base contains documents that the AI can reference.
-- =============================================================================

CREATE TABLE knowledge_bases (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationship
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL DEFAULT 'Primary Knowledge Base',
    description TEXT,
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'active',  -- active, processing, error
    
    -- Statistics (denormalized for quick access)
    document_count INTEGER NOT NULL DEFAULT 0,
    chunk_count INTEGER NOT NULL DEFAULT 0,
    total_tokens INTEGER NOT NULL DEFAULT 0,
    
    -- Processing Status
    last_processed_at TIMESTAMPTZ,
    processing_error TEXT,
    
    -- Configuration
    settings JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example settings:
    -- {
    --   "chunk_size": 500,
    --   "chunk_overlap": 50,
    --   "embedding_model": "text-embedding-3-small"
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL,
    
    -- Constraints
    CONSTRAINT uq_knowledge_bases_tenant UNIQUE (tenant_id)  -- One KB per tenant
);

-- Indexes
CREATE INDEX ix_knowledge_bases_tenant_id ON knowledge_bases(tenant_id) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_knowledge_bases_updated_at
    BEFORE UPDATE ON knowledge_bases
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE knowledge_bases IS 'Container for tenant knowledge documents';
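Since `uq_knowledge_bases_tenant` enforces one knowledge base per tenant, provisioning can be idempotent with `ON CONFLICT`. A sketch (`:tenant_id` is a placeholder; note that a soft-deleted knowledge base still occupies the unique slot, so restore-vs-recreate needs an application-level decision):

```sql
-- Create the tenant's knowledge base if it does not exist yet.
-- On conflict nothing is inserted and RETURNING yields no row,
-- which signals "already provisioned" to the caller.
INSERT INTO knowledge_bases (tenant_id, name)
VALUES (:tenant_id, 'Primary Knowledge Base')
ON CONFLICT (tenant_id) DO NOTHING
RETURNING id;
```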

8.2 knowledge_documents Table

Individual documents uploaded to knowledge bases.
-- =============================================================================
-- KNOWLEDGE_DOCUMENTS TABLE
-- =============================================================================
-- Individual documents within a knowledge base.
-- Documents are processed into chunks for retrieval.
-- =============================================================================

CREATE TYPE document_type_enum AS ENUM ('text', 'pdf', 'url', 'faq');
CREATE TYPE document_status_enum AS ENUM ('pending', 'processing', 'ready', 'error');

CREATE TABLE knowledge_documents (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),  -- Denormalized for query efficiency
    
    -- Document Information
    name VARCHAR(255) NOT NULL,
    document_type document_type_enum NOT NULL,
    
    -- Content
    original_content TEXT,        -- Original text content (for text type)
    source_url VARCHAR(500),      -- Source URL (for url type)
    storage_path VARCHAR(500),    -- Path to stored file (for pdf type)
    
    -- Processing
    status document_status_enum NOT NULL DEFAULT 'pending',
    processed_at TIMESTAMPTZ,
    processing_error TEXT,
    
    -- Statistics
    chunk_count INTEGER NOT NULL DEFAULT 0,
    token_count INTEGER NOT NULL DEFAULT 0,
    character_count INTEGER NOT NULL DEFAULT 0,
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example metadata:
    -- {
    --   "file_name": "services.pdf",
    --   "file_size": 102400,
    --   "mime_type": "application/pdf",
    --   "source": "upload",
    --   "uploaded_by": "user-uuid"
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_knowledge_documents_kb_id ON knowledge_documents(knowledge_base_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_knowledge_documents_tenant_id ON knowledge_documents(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_knowledge_documents_status ON knowledge_documents(status) WHERE deleted_at IS NULL;
CREATE INDEX ix_knowledge_documents_type ON knowledge_documents(document_type) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_knowledge_documents_updated_at
    BEFORE UPDATE ON knowledge_documents
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE knowledge_documents IS 'Source documents for knowledge base retrieval';

8.3 knowledge_chunks Table

Chunked document content with embeddings for vector search.
-- =============================================================================
-- KNOWLEDGE_CHUNKS TABLE
-- =============================================================================
-- Documents are split into chunks for efficient retrieval.
-- Each chunk has an embedding vector for similarity search.
-- =============================================================================

-- Enable pgvector extension (run once)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_chunks (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    document_id UUID NOT NULL REFERENCES knowledge_documents(id) ON DELETE CASCADE,
    knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Content
    content TEXT NOT NULL,
    
    -- Position within document
    chunk_index INTEGER NOT NULL,  -- 0-based position in document
    start_char INTEGER,            -- Starting character position
    end_char INTEGER,              -- Ending character position
    
    -- Embedding
    embedding vector(1536),  -- OpenAI text-embedding-3-small dimension
    -- Note: Dimension depends on embedding model used
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example metadata:
    -- {
    --   "section": "Services",
    --   "heading": "Teeth Cleaning",
    --   "tokens": 150
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes

-- Standard lookups
CREATE INDEX ix_knowledge_chunks_document_id ON knowledge_chunks(document_id);
CREATE INDEX ix_knowledge_chunks_kb_id ON knowledge_chunks(knowledge_base_id);
CREATE INDEX ix_knowledge_chunks_tenant_id ON knowledge_chunks(tenant_id);

-- Vector similarity search (IVFFlat index for approximate nearest neighbor)
-- Note: Create this AFTER initial data load for better index quality
CREATE INDEX ix_knowledge_chunks_embedding ON knowledge_chunks 
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);  -- Adjust lists based on data size

-- Comments
COMMENT ON TABLE knowledge_chunks IS 'Document chunks with embeddings for vector search';
COMMENT ON COLUMN knowledge_chunks.embedding IS 'Vector embedding from text-embedding-3-small (1536 dimensions)';

Vector Search Query Example

-- Find similar chunks for a query embedding
SELECT 
    kc.id,
    kc.content,
    kd.name as document_name,
    1 - (kc.embedding <=> $1) as similarity  -- Cosine similarity
FROM knowledge_chunks kc
JOIN knowledge_documents kd ON kc.document_id = kd.id
WHERE kc.tenant_id = $2
    AND kd.deleted_at IS NULL
ORDER BY kc.embedding <=> $1  -- Order by cosine distance
LIMIT 5;

-- $1 = query embedding vector
-- $2 = tenant_id

8.4 transcripts Table

Full transcripts of conversations.
-- =============================================================================
-- TRANSCRIPTS TABLE
-- =============================================================================
-- Complete transcripts of calls with speaker-attributed turns.
-- =============================================================================

CREATE TABLE transcripts (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    call_id UUID NOT NULL REFERENCES calls(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Status
    status VARCHAR(50) NOT NULL DEFAULT 'processing',  -- processing, complete, error
    
    -- Full Text (for search)
    full_text TEXT,  -- Complete transcript as single text
    
    -- Statistics
    turn_count INTEGER NOT NULL DEFAULT 0,
    word_count INTEGER NOT NULL DEFAULT 0,
    duration_seconds INTEGER,
    
    -- Processing
    processed_at TIMESTAMPTZ,
    processing_error TEXT,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX ix_transcripts_call_id ON transcripts(call_id);
CREATE INDEX ix_transcripts_tenant_id ON transcripts(tenant_id);
CREATE INDEX ix_transcripts_status ON transcripts(status);

-- Full-text search index
CREATE INDEX ix_transcripts_full_text ON transcripts 
    USING gin(to_tsvector('english', full_text))
    WHERE full_text IS NOT NULL;

-- Trigger for updated_at
CREATE TRIGGER update_transcripts_updated_at
    BEFORE UPDATE ON transcripts
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Add the foreign key from calls.transcript_id now that transcripts exists
ALTER TABLE calls ADD CONSTRAINT fk_calls_transcript 
    FOREIGN KEY (transcript_id) REFERENCES transcripts(id);

-- =============================================================================
-- TRANSCRIPT_TURNS TABLE
-- =============================================================================
-- Individual turns (utterances) within a transcript.
-- =============================================================================

CREATE TYPE speaker_type_enum AS ENUM ('caller', 'agent', 'system');

CREATE TABLE transcript_turns (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    transcript_id UUID NOT NULL REFERENCES transcripts(id) ON DELETE CASCADE,
    call_id UUID NOT NULL REFERENCES calls(id),
    
    -- Turn Information
    turn_index INTEGER NOT NULL,  -- 0-based order
    speaker speaker_type_enum NOT NULL,
    
    -- Content
    content TEXT NOT NULL,
    
    -- Timing (relative to call start)
    start_time_ms INTEGER NOT NULL,  -- Milliseconds from call start
    end_time_ms INTEGER NOT NULL,
    duration_ms INTEGER GENERATED ALWAYS AS (end_time_ms - start_time_ms) STORED,
    
    -- Confidence (from STT)
    confidence DECIMAL(5,4),  -- 0.0000 to 1.0000
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "is_interruption": false,
    --   "sentiment": "neutral",
    --   "intent": "greeting"
    -- }
    
    -- Timestamp
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX ix_transcript_turns_transcript_id ON transcript_turns(transcript_id);
CREATE INDEX ix_transcript_turns_call_id ON transcript_turns(call_id);
CREATE INDEX ix_transcript_turns_order ON transcript_turns(transcript_id, turn_index);

-- Comments
COMMENT ON TABLE transcripts IS 'Complete call transcripts';
COMMENT ON TABLE transcript_turns IS 'Individual speaker turns within transcripts';
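To render a conversation view or export a transcript, the turns can be reassembled in order. A sketch query against the tables above (`$1` = transcript id):

```sql
-- Reassemble a transcript in speaking order, with per-turn timing.
-- Served by ix_transcript_turns_order (transcript_id, turn_index).
SELECT
    turn_index,
    speaker,
    content,
    start_time_ms,
    duration_ms,
    confidence
FROM transcript_turns
WHERE transcript_id = $1
ORDER BY turn_index;
```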

8.5 recordings Table

Call recording metadata and storage references.
-- =============================================================================
-- RECORDINGS TABLE
-- =============================================================================
-- Metadata for call recordings stored in object storage.
-- Actual audio files are in S3/DigitalOcean Spaces.
-- =============================================================================

CREATE TYPE recording_status_enum AS ENUM ('processing', 'ready', 'error', 'deleted');
CREATE TYPE recording_format_enum AS ENUM ('wav', 'mp3', 'ogg', 'webm');

CREATE TABLE recordings (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    call_id UUID NOT NULL REFERENCES calls(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Storage
    storage_provider VARCHAR(50) NOT NULL DEFAULT 'do_spaces',  -- do_spaces, s3, gcs
    storage_bucket VARCHAR(255) NOT NULL,
    storage_key VARCHAR(500) NOT NULL,  -- Path within bucket
    
    -- Access
    public_url VARCHAR(500),          -- If publicly accessible
    signed_url VARCHAR(1000),         -- Pre-signed URL for temporary access
    signed_url_expires_at TIMESTAMPTZ,
    
    -- File Information
    format recording_format_enum NOT NULL DEFAULT 'wav',
    file_size_bytes BIGINT NOT NULL,
    duration_seconds INTEGER NOT NULL,
    sample_rate INTEGER NOT NULL DEFAULT 48000,
    channels INTEGER NOT NULL DEFAULT 2,  -- 1=mono, 2=stereo
    bitrate INTEGER,  -- For compressed formats
    
    -- Status
    status recording_status_enum NOT NULL DEFAULT 'processing',
    
    -- Processing
    processed_at TIMESTAMPTZ,
    processing_error TEXT,
    
    -- Retention
    retention_days INTEGER,           -- NULL = keep forever
    expires_at TIMESTAMPTZ,           -- When recording will be deleted
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "livekit_egress_id": "...",
    --   "channels": {
    --     "left": "caller",
    --     "right": "agent"
    --   }
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_recordings_call_id ON recordings(call_id);
CREATE INDEX ix_recordings_tenant_id ON recordings(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_recordings_status ON recordings(status) WHERE deleted_at IS NULL;
CREATE INDEX ix_recordings_expires_at ON recordings(expires_at) WHERE expires_at IS NOT NULL;

-- Trigger for updated_at
CREATE TRIGGER update_recordings_updated_at
    BEFORE UPDATE ON recordings
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE recordings IS 'Metadata for call recordings stored in object storage';
COMMENT ON COLUMN recordings.storage_key IS 'Object key/path within storage bucket';
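A retention job can use expires_at to find recordings due for deletion. A sketch, assuming a worker that removes the objects from storage after the rows are marked:

```sql
-- Soft-delete expired recordings and hand the storage locations to the
-- cleanup worker, which then removes the objects from the bucket.
-- Served by ix_recordings_expires_at.
UPDATE recordings
SET status = 'deleted',
    deleted_at = NOW(),
    updated_at = NOW()
WHERE expires_at IS NOT NULL
  AND expires_at <= NOW()
  AND status = 'ready'
  AND deleted_at IS NULL
RETURNING id, storage_bucket, storage_key;
```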

Section 9: Schema - Configuration Entities

9.1 voice_configurations Table

TTS voice settings for tenants.
-- =============================================================================
-- VOICE_CONFIGURATIONS TABLE
-- =============================================================================
-- Defines voice settings for TTS output.
-- Each tenant can have multiple voice configurations (e.g., a separate after-hours voice).
-- =============================================================================

CREATE TABLE voice_configurations (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL,        -- "Main Voice", "After Hours Voice"
    is_default BOOLEAN NOT NULL DEFAULT FALSE,
    
    -- Voice Selection
    provider VARCHAR(50) NOT NULL DEFAULT 'chatterbox',  -- chatterbox, elevenlabs, etc.
    voice_id VARCHAR(255) NOT NULL,    -- Provider's voice identifier
    voice_name VARCHAR(255),           -- Human-readable: "Sarah", "James"
    
    -- Voice Parameters
    speaking_rate DECIMAL(3,2) NOT NULL DEFAULT 1.00,  -- 0.50 to 2.00
    pitch DECIMAL(3,2) NOT NULL DEFAULT 1.00,          -- 0.50 to 2.00
    volume DECIMAL(3,2) NOT NULL DEFAULT 1.00,         -- 0.00 to 1.00
    
    -- Language
    language_code VARCHAR(10) NOT NULL DEFAULT 'en-US',
    
    -- Advanced Settings
    settings JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "stability": 0.75,
    --   "similarity_boost": 0.75,
    --   "style": 0.5,
    --   "use_speaker_boost": true
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_voice_configurations_tenant_id ON voice_configurations(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_voice_configurations_default ON voice_configurations(tenant_id, is_default) WHERE is_default = TRUE AND deleted_at IS NULL;

-- Ensure only one default per tenant
CREATE UNIQUE INDEX uq_voice_configurations_default 
    ON voice_configurations(tenant_id) 
    WHERE is_default = TRUE AND deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_voice_configurations_updated_at
    BEFORE UPDATE ON voice_configurations
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE voice_configurations IS 'TTS voice settings for tenants';
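At call time the agent needs exactly one voice. A sketch lookup that prefers an explicitly requested configuration and falls back to the tenant's default (`$1` = tenant_id, `$2` = optional configuration id, may be NULL):

```sql
-- Resolve the voice for a call: the requested configuration if given,
-- otherwise the tenant's default (unique per uq_voice_configurations_default).
SELECT *
FROM voice_configurations
WHERE tenant_id = $1
  AND deleted_at IS NULL
  AND (id = $2 OR is_default = TRUE)
ORDER BY (id = $2) DESC  -- an explicitly requested row sorts first
LIMIT 1;
```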

9.2 agent_personalities Table

AI personality and behavior configuration.
-- =============================================================================
-- AGENT_PERSONALITIES TABLE
-- =============================================================================
-- Defines the AI agent's personality, tone, and behavior.
-- =============================================================================

CREATE TABLE agent_personalities (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL,        -- "Friendly Receptionist", "Professional Agent"
    is_default BOOLEAN NOT NULL DEFAULT FALSE,
    
    -- Agent Identity
    agent_name VARCHAR(100),           -- Name the AI uses: "Hi, I'm Sarah"
    
    -- System Prompt Components
    base_prompt TEXT NOT NULL,         -- Core instructions
    personality_traits TEXT,           -- "friendly, helpful, professional"
    tone_description TEXT,             -- "warm and welcoming"
    
    -- Behavior Rules
    behavior_rules JSONB NOT NULL DEFAULT '[]'::jsonb,
    -- Example:
    -- [
    --   "Always greet callers warmly",
    --   "Never discuss competitor products",
    --   "Transfer to human if caller is upset"
    -- ]
    
    -- Knowledge Instructions
    knowledge_instructions TEXT,       -- How to use knowledge base
    
    -- Response Style
    response_style JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "max_response_length": "medium",  -- short, medium, long
    --   "formality": "casual",            -- formal, semi-formal, casual
    --   "use_filler_words": false,
    --   "confirm_understanding": true
    -- }
    
    -- Capabilities
    capabilities JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "can_schedule_appointments": true,
    --   "can_provide_pricing": true,
    --   "can_transfer_calls": true,
    --   "can_take_messages": true
    -- }
    
    -- Escalation Rules
    escalation_rules JSONB NOT NULL DEFAULT '[]'::jsonb,
    -- Example:
    -- [
    --   {"trigger": "angry_customer", "action": "transfer", "target": "manager"},
    --   {"trigger": "legal_question", "action": "transfer", "target": "legal"},
    --   {"trigger": "three_failed_attempts", "action": "human_takeover"}
    -- ]
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_agent_personalities_tenant_id ON agent_personalities(tenant_id) WHERE deleted_at IS NULL;

-- Ensure only one default per tenant
CREATE UNIQUE INDEX uq_agent_personalities_default 
    ON agent_personalities(tenant_id) 
    WHERE is_default = TRUE AND deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_agent_personalities_updated_at
    BEFORE UPDATE ON agent_personalities
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE agent_personalities IS 'AI agent personality and behavior configuration';

9.3 greetings Table

Pre-configured greeting messages.
-- =============================================================================
-- GREETINGS TABLE
-- =============================================================================
-- Pre-configured greeting messages for different scenarios.
-- =============================================================================

CREATE TYPE greeting_type_enum AS ENUM (
    'initial',          -- First greeting when call is answered
    'return_caller',    -- Greeting for recognized callers
    'after_hours',      -- After business hours greeting
    'holiday',          -- Holiday greeting
    'voicemail',        -- Voicemail greeting
    'hold',             -- Hold message
    'transfer',         -- Transfer announcement
    'goodbye'           -- Ending message
);

CREATE TABLE greetings (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Basic Information
    name VARCHAR(255) NOT NULL,
    greeting_type greeting_type_enum NOT NULL,
    
    -- Content
    text_content TEXT NOT NULL,  -- The greeting text
    -- Example: "Thank you for calling {business_name}. How can I help you today?"
    
    -- Placeholders supported:
    -- {business_name} - Tenant's business name
    -- {caller_name} - If recognized
    -- {current_time} - Current time
    -- {agent_name} - AI agent's name
    
    -- Audio (optional pre-recorded)
    audio_url VARCHAR(500),      -- Pre-recorded audio URL
    audio_duration_ms INTEGER,
    
    -- Status
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    
    -- Scheduling (for holiday greetings)
    active_from TIMESTAMPTZ,
    active_until TIMESTAMPTZ,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_greetings_tenant_id ON greetings(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_greetings_type ON greetings(tenant_id, greeting_type) WHERE deleted_at IS NULL AND is_active = TRUE;

-- Trigger for updated_at
CREATE TRIGGER update_greetings_updated_at
    BEFORE UPDATE ON greetings
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE greetings IS 'Pre-configured greeting messages for various scenarios';
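Selecting the greeting for a scenario must honor the scheduling window, so an active holiday greeting beats the evergreen one. A sketch (`$1` = tenant_id, `$2` = greeting_type):

```sql
-- Pick the current greeting for a scenario. Scheduled greetings
-- (non-NULL active_from) take precedence over always-on ones.
SELECT *
FROM greetings
WHERE tenant_id = $1
  AND greeting_type = $2
  AND is_active = TRUE
  AND deleted_at IS NULL
  AND (active_from IS NULL OR active_from <= NOW())
  AND (active_until IS NULL OR active_until >= NOW())
ORDER BY active_from DESC NULLS LAST
LIMIT 1;
```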

9.4 business_hours Table

Business hours for controlling AI behavior by time.
-- =============================================================================
-- BUSINESS_HOURS TABLE
-- =============================================================================
-- Defines when the business is open.
-- Used to determine appropriate greetings and transfer behavior.
-- =============================================================================

CREATE TABLE business_hours (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    phone_number_id UUID REFERENCES phone_numbers(id),  -- NULL = applies to all numbers
    
    -- Basic Information
    name VARCHAR(255) NOT NULL DEFAULT 'Default Hours',
    
    -- Regular Hours (by day of week)
    -- Stored as JSONB for flexibility
    weekly_hours JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "monday": [{"open": "09:00", "close": "17:00"}],
    --   "tuesday": [{"open": "09:00", "close": "17:00"}],
    --   "wednesday": [{"open": "09:00", "close": "17:00"}],
    --   "thursday": [{"open": "09:00", "close": "17:00"}],
    --   "friday": [{"open": "09:00", "close": "17:00"}],
    --   "saturday": [{"open": "10:00", "close": "14:00"}],
    --   "sunday": []  -- Closed
    -- }
    -- 
    -- Multiple ranges for split shifts:
    -- "monday": [{"open": "09:00", "close": "12:00"}, {"open": "13:00", "close": "17:00"}]
    
    -- Timezone for interpreting hours
    timezone VARCHAR(50) NOT NULL DEFAULT 'America/New_York',
    
    -- Status
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMPTZ DEFAULT NULL
);

-- Indexes
CREATE INDEX ix_business_hours_tenant_id ON business_hours(tenant_id) WHERE deleted_at IS NULL;
CREATE INDEX ix_business_hours_phone_number_id ON business_hours(phone_number_id) WHERE deleted_at IS NULL;

-- Trigger for updated_at
CREATE TRIGGER update_business_hours_updated_at
    BEFORE UPDATE ON business_hours
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- =============================================================================
-- BUSINESS_HOUR_OVERRIDES TABLE
-- =============================================================================
-- Specific date overrides (holidays, special hours).
-- =============================================================================

CREATE TABLE business_hour_overrides (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    business_hours_id UUID NOT NULL REFERENCES business_hours(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    -- Override Information
    override_date DATE NOT NULL,
    name VARCHAR(255),  -- "Christmas", "New Year's Day"
    
    -- Hours for this day (empty array = closed)
    hours JSONB NOT NULL DEFAULT '[]'::jsonb,
    -- Example: [{"open": "10:00", "close": "14:00"}] or [] for closed
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Constraints
    CONSTRAINT uq_business_hour_overrides_date UNIQUE (business_hours_id, override_date)
);

-- Indexes
CREATE INDEX ix_business_hour_overrides_date ON business_hour_overrides(override_date);
CREATE INDEX ix_business_hour_overrides_business_hours_id ON business_hour_overrides(business_hours_id);

-- Trigger for updated_at
CREATE TRIGGER update_business_hour_overrides_updated_at
    BEFORE UPDATE ON business_hour_overrides
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE business_hours IS 'Regular business hours by day of week';
COMMENT ON TABLE business_hour_overrides IS 'Date-specific overrides for holidays and special hours';
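Answering "is the business open right now?" means checking a date override first, then falling back to the weekly schedule, all in the tenant's timezone. An illustrative sketch (`$1` = business_hours id); the same logic could equally live in application code:

```sql
-- Is the business open right now? Overrides win over weekly_hours.
WITH bh AS (
    SELECT id, weekly_hours, timezone
    FROM business_hours
    WHERE id = $1 AND is_active = TRUE AND deleted_at IS NULL
),
local_now AS (
    -- Current wall-clock time in the business's timezone
    SELECT (NOW() AT TIME ZONE bh.timezone) AS ts FROM bh
),
today_hours AS (
    SELECT COALESCE(
        -- 1. Date-specific override (empty array = closed that day)
        (SELECT o.hours
           FROM business_hour_overrides o, local_now
          WHERE o.business_hours_id = $1
            AND o.override_date = local_now.ts::date),
        -- 2. Regular weekday entry ('monday', 'tuesday', ...)
        (SELECT bh.weekly_hours -> trim(to_char(local_now.ts, 'day'))
           FROM bh, local_now),
        '[]'::jsonb
    ) AS hours
)
SELECT EXISTS (
    SELECT 1
    FROM today_hours, local_now,
         jsonb_array_elements(today_hours.hours) AS r
    WHERE local_now.ts::time >= (r->>'open')::time
      AND local_now.ts::time <  (r->>'close')::time
) AS is_open;
```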

Section 10: Schema - Billing & Analytics

10.1 usage_records Table

Granular usage tracking for billing.
-- =============================================================================
-- USAGE_RECORDS TABLE
-- =============================================================================
-- Tracks granular usage for billing purposes.
-- One record per billable event.
-- =============================================================================

CREATE TYPE usage_type_enum AS ENUM (
    'call_minutes',      -- Voice call duration
    'stt_minutes',       -- Speech-to-text processing
    'llm_tokens',        -- LLM input/output tokens
    'tts_characters',    -- Text-to-speech characters
    'storage_gb',        -- Recording storage
    'phone_number'       -- Phone number rental
);

CREATE TABLE usage_records (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    agency_id UUID NOT NULL REFERENCES agencies(id),
    tenant_id UUID REFERENCES tenants(id),  -- NULL for agency-level charges
    call_id UUID REFERENCES calls(id),      -- NULL for non-call usage
    
    -- Usage Information
    usage_type usage_type_enum NOT NULL,
    quantity DECIMAL(20,6) NOT NULL,  -- Amount of usage
    unit VARCHAR(50) NOT NULL,        -- 'minutes', 'tokens', 'characters', 'gb', 'number'
    
    -- Pricing (at time of usage)
    unit_price_cents DECIMAL(20,6) NOT NULL,  -- Price per unit in cents
    total_cents DECIMAL(20,2) NOT NULL,       -- quantity * unit_price_cents
    
    -- Billing Period
    usage_date DATE NOT NULL,
    billing_period_start DATE NOT NULL,
    billing_period_end DATE NOT NULL,
    
    -- Status
    is_billed BOOLEAN NOT NULL DEFAULT FALSE,
    billed_at TIMESTAMPTZ,
    invoice_id UUID,  -- Reference to invoice when billed
    
    -- Metadata
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example:
    -- {
    --   "call_id": "...",
    --   "component": "deepgram",
    --   "model": "nova-2"
    -- }
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes (optimized for billing queries)
CREATE INDEX ix_usage_records_agency_period ON usage_records(agency_id, billing_period_start, billing_period_end);
CREATE INDEX ix_usage_records_tenant_period ON usage_records(tenant_id, billing_period_start, billing_period_end) WHERE tenant_id IS NOT NULL;
CREATE INDEX ix_usage_records_date ON usage_records(usage_date);
CREATE INDEX ix_usage_records_unbilled ON usage_records(agency_id, is_billed) WHERE is_billed = FALSE;
CREATE INDEX ix_usage_records_call_id ON usage_records(call_id) WHERE call_id IS NOT NULL;
CREATE INDEX ix_usage_records_type ON usage_records(usage_type);

-- Comments
COMMENT ON TABLE usage_records IS 'Granular usage tracking for billing';
COMMENT ON COLUMN usage_records.unit_price_cents IS 'Price per unit at time of usage, stored for historical accuracy';
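Invoicing aggregates the unbilled rows per agency and billing period. A sketch (`$1` = agency_id, `$2` = billing_period_start):

```sql
-- Unbilled usage for one agency and billing period, grouped by type.
-- Served by ix_usage_records_unbilled.
SELECT
    usage_type,
    unit,
    SUM(quantity)    AS total_quantity,
    SUM(total_cents) AS total_cents
FROM usage_records
WHERE agency_id = $1
  AND billing_period_start = $2
  AND is_billed = FALSE
GROUP BY usage_type, unit
ORDER BY total_cents DESC;
```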

10.2 billing_events Table

Billing-related events (invoices, payments, etc.).
-- =============================================================================
-- BILLING_EVENTS TABLE
-- =============================================================================
-- Tracks billing lifecycle events.
-- =============================================================================

CREATE TYPE billing_event_type_enum AS ENUM (
    'invoice_created',
    'invoice_sent',
    'payment_initiated',
    'payment_succeeded',
    'payment_failed',
    'refund_issued',
    'credit_applied',
    'subscription_started',
    'subscription_changed',
    'subscription_cancelled'
);

CREATE TABLE billing_events (
    -- Primary Key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Relationships
    agency_id UUID NOT NULL REFERENCES agencies(id),
    
    -- Event Information
    event_type billing_event_type_enum NOT NULL,
    
    -- Amounts
    amount_cents INTEGER,
    currency VARCHAR(3) DEFAULT 'USD',
    
    -- External References
    stripe_event_id VARCHAR(255),
    stripe_invoice_id VARCHAR(255),
    stripe_payment_intent_id VARCHAR(255),
    
    -- Event Data
    data JSONB NOT NULL DEFAULT '{}'::jsonb,
    
    -- Timestamp
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX ix_billing_events_agency_id ON billing_events(agency_id);
CREATE INDEX ix_billing_events_type ON billing_events(event_type);
CREATE INDEX ix_billing_events_occurred_at ON billing_events(occurred_at);
CREATE INDEX ix_billing_events_stripe_event ON billing_events(stripe_event_id) WHERE stripe_event_id IS NOT NULL;

-- Comments
COMMENT ON TABLE billing_events IS 'Audit log of billing-related events';

10.3 call_analytics Table

Pre-aggregated analytics for dashboards.
-- =============================================================================
-- CALL_ANALYTICS TABLE
-- =============================================================================
-- Pre-aggregated metrics for fast dashboard queries.
-- Populated by background jobs.
-- =============================================================================

CREATE TABLE call_analytics (
    -- Primary Key (composite)
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    date DATE NOT NULL,
    hour INTEGER NOT NULL,  -- 0-23
    
    PRIMARY KEY (tenant_id, date, hour),
    
    -- Call Counts
    total_calls INTEGER NOT NULL DEFAULT 0,
    answered_calls INTEGER NOT NULL DEFAULT 0,
    missed_calls INTEGER NOT NULL DEFAULT 0,
    failed_calls INTEGER NOT NULL DEFAULT 0,
    
    -- Direction Breakdown
    inbound_calls INTEGER NOT NULL DEFAULT 0,
    outbound_calls INTEGER NOT NULL DEFAULT 0,
    
    -- Duration Metrics
    total_duration_seconds INTEGER NOT NULL DEFAULT 0,
    avg_duration_seconds INTEGER NOT NULL DEFAULT 0,
    max_duration_seconds INTEGER NOT NULL DEFAULT 0,
    
    -- Wait Time Metrics
    total_wait_seconds INTEGER NOT NULL DEFAULT 0,
    avg_wait_seconds INTEGER NOT NULL DEFAULT 0,
    
    -- Response Time Metrics (AI latency)
    avg_first_response_ms INTEGER,
    avg_response_latency_ms INTEGER,
    
    -- Outcome Breakdown
    outcomes JSONB NOT NULL DEFAULT '{}'::jsonb,
    -- Example: {"appointment_scheduled": 5, "question_answered": 10, "transferred": 2}
    
    -- Sentiment Summary
    positive_calls INTEGER NOT NULL DEFAULT 0,
    neutral_calls INTEGER NOT NULL DEFAULT 0,
    negative_calls INTEGER NOT NULL DEFAULT 0,
    
    -- Cost
    total_cost_cents INTEGER NOT NULL DEFAULT 0,
    
    -- Transfer Metrics
    transfer_count INTEGER NOT NULL DEFAULT 0,
    successful_transfers INTEGER NOT NULL DEFAULT 0,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes (optimized for dashboard queries)
CREATE INDEX ix_call_analytics_tenant_date ON call_analytics(tenant_id, date DESC);
CREATE INDEX ix_call_analytics_date ON call_analytics(date DESC);

-- Trigger for updated_at
CREATE TRIGGER update_call_analytics_updated_at
    BEFORE UPDATE ON call_analytics
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- Comments
COMMENT ON TABLE call_analytics IS 'Pre-aggregated hourly call metrics for dashboards';
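The background job can populate each bucket idempotently with an upsert. A sketch for the call-count columns, assuming the calls table's initiated_at and status fields from Section 7; the remaining metrics follow the same pattern (`$1`/`$2` bound the hour being aggregated):

```sql
-- Idempotent hourly rollup: re-running the job for an hour overwrites
-- the bucket instead of double-counting.
INSERT INTO call_analytics (tenant_id, date, hour, total_calls, answered_calls)
SELECT
    tenant_id,
    initiated_at::date,
    EXTRACT(HOUR FROM initiated_at)::int,
    COUNT(*),
    COUNT(*) FILTER (WHERE status = 'answered')
FROM calls
WHERE initiated_at >= $1 AND initiated_at < $2
GROUP BY 1, 2, 3
ON CONFLICT (tenant_id, date, hour) DO UPDATE
SET total_calls    = EXCLUDED.total_calls,
    answered_calls = EXCLUDED.answered_calls,
    updated_at     = NOW();
```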

Section 11: Indexes & Performance

11.1 Required Indexes (With Explanations)

All indexes are defined inline with table definitions above. Here’s a summary of indexing strategy:

Primary Access Patterns

1. List calls for a tenant (most common)
CREATE INDEX ix_calls_tenant_id_created_at ON calls(tenant_id, created_at DESC);
Covers: SELECT * FROM calls WHERE tenant_id = $1 ORDER BY created_at DESC LIMIT 50

2. Find active records (soft delete filter)
-- Pattern: WHERE deleted_at IS NULL
CREATE INDEX ix_tenants_agency_id ON tenants(agency_id) WHERE deleted_at IS NULL;
Partial index only includes non-deleted records.

3. Status filtering
CREATE INDEX ix_calls_tenant_id_status ON calls(tenant_id, status);
Covers: SELECT * FROM calls WHERE tenant_id = $1 AND status = 'answered'

4. Time-range queries
CREATE INDEX ix_calls_initiated_at ON calls(initiated_at);
CREATE INDEX ix_call_analytics_tenant_date ON call_analytics(tenant_id, date DESC);
Covers date-range analytics queries.

5. Vector similarity search
CREATE INDEX ix_knowledge_chunks_embedding ON knowledge_chunks 
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
Approximate nearest neighbor for RAG queries.

6. Full-text search
CREATE INDEX ix_transcripts_full_text ON transcripts 
    USING gin(to_tsvector('english', full_text));
Covers transcript search: WHERE to_tsvector('english', full_text) @@ to_tsquery('appointment')

Index Maintenance

-- Check index usage
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes
SELECT 
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    idx_scan as index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) DESC;

-- Reindex for performance (run during low-traffic periods)
REINDEX INDEX CONCURRENTLY ix_calls_tenant_id_created_at;

11.2 Partitioning Strategy

For tables that grow very large, we use table partitioning.

Calls Table Partitioning (Future)

When the calls table exceeds ~10 million rows, partition by month:
-- Create partitioned table
CREATE TABLE calls_partitioned (
    LIKE calls INCLUDING ALL
) PARTITION BY RANGE (initiated_at);

-- Create monthly partitions
CREATE TABLE calls_y2026m01 PARTITION OF calls_partitioned
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE calls_y2026m02 PARTITION OF calls_partitioned
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Automate partition creation with pg_partman extension

Benefits of Partitioning

  1. Query Performance: Queries filtering by date only scan relevant partitions
  2. Maintenance: Can vacuum/reindex individual partitions
  3. Data Retention: Can drop old partitions instead of DELETE
  4. Parallel Query: PostgreSQL can scan partitions in parallel

When to Partition

  • calls: When > 10M rows
  • call_events: When > 50M rows
  • transcript_turns: When > 100M rows
  • usage_records: When > 50M rows
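Until pg_partman is in place, the monthly DDL shown above can be generated by a small helper. A sketch (assumes the calls_partitioned layout from the example; the helper itself is illustrative, not part of the codebase):

```python
from datetime import date

def monthly_partition_ddl(parent: str, month_start: date) -> str:
    """Generate DDL for one monthly RANGE partition of `parent`."""
    # The upper bound is exclusive: the first day of the following month
    if month_start.month == 12:
        next_start = date(month_start.year + 1, 1, 1)
    else:
        next_start = date(month_start.year, month_start.month + 1, 1)
    name = f"{parent}_y{month_start.year}m{month_start.month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF {parent}\n"
        f"    FOR VALUES FROM ('{month_start.isoformat()}') "
        f"TO ('{next_start.isoformat()}');"
    )

ddl = monthly_partition_ddl("calls_partitioned", date(2026, 1, 1))
```

A scheduled job can run this a month ahead so the next partition always exists before rows arrive.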

11.3 Query Patterns to Optimize For

Pattern 1: Tenant Call List

Query:
SELECT 
    c.*,
    pn.number as phone_number,
    pn.friendly_name
FROM calls c
JOIN phone_numbers pn ON c.phone_number_id = pn.id
WHERE c.tenant_id = $1
ORDER BY c.initiated_at DESC
LIMIT 50 OFFSET 0;
Optimization:
  • Index: ix_calls_tenant_id_created_at
  • Limit result set size
  • Consider cursor-based pagination for large offsets
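For the large-offset case above, keyset (cursor) pagination replaces OFFSET with a row-value comparison on the sort key, so the index seeks directly to the next page. A sketch (psycopg-style placeholders; names are illustrative):

```python
def keyset_page_sql(tenant_id, cursor=None):
    """Build a keyset-paginated call-list query.

    `cursor` is the (initiated_at, id) pair of the last row on the
    previous page, or None for the first page.
    """
    sql = "SELECT * FROM calls WHERE tenant_id = %s"
    params = [tenant_id]
    if cursor is not None:
        # Row-value comparison seeks past the previous page without OFFSET
        sql += " AND (initiated_at, id) < (%s, %s)"
        params.extend(cursor)
    sql += " ORDER BY initiated_at DESC, id DESC LIMIT 50"
    return sql, params

sql, params = keyset_page_sql("tenant-uuid", ("2026-01-25T10:00:00Z", "call-uuid"))
```

The `id` tiebreaker keeps ordering deterministic when two calls share the same `initiated_at`; the same composite index that serves Pattern 1 serves this query.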

Pattern 2: Analytics Dashboard

Query:
SELECT 
    date,
    SUM(total_calls) as total_calls,
    SUM(answered_calls) as answered_calls,
    AVG(avg_duration_seconds) as avg_duration
FROM call_analytics
WHERE tenant_id = $1
  AND date BETWEEN $2 AND $3
GROUP BY date
ORDER BY date;
Optimization:
  • Pre-aggregated table (call_analytics)
  • Index: ix_call_analytics_tenant_date
Pattern 3: Knowledge Base Search (RAG)

Query:
SELECT 
    kc.content,
    kd.name as document_name,
    1 - (kc.embedding <=> $1) as similarity
FROM knowledge_chunks kc
JOIN knowledge_documents kd ON kc.document_id = kd.id
WHERE kc.tenant_id = $2
  AND kd.deleted_at IS NULL
ORDER BY kc.embedding <=> $1
LIMIT 5;
Optimization:
  • IVFFlat index on embeddings
  • Filter by tenant_id first (reduces vector search scope)
Pattern 4: Transcript Full-Text Search

Query:
SELECT 
    t.id,
    t.full_text,
    c.initiated_at,
    ts_rank(to_tsvector('english', t.full_text), query) as rank
FROM transcripts t
JOIN calls c ON t.call_id = c.id,
     to_tsquery('english', $1) query
WHERE t.tenant_id = $2
  AND to_tsvector('english', t.full_text) @@ query
ORDER BY rank DESC
LIMIT 20;
Optimization:
  • GIN index on tsvector
  • Tenant filter with FTS

Section 12: Migrations

12.1 Migration File Naming Convention

We use Alembic for database migrations. Migration files follow this naming pattern:
{timestamp}_{description}.py
Examples:
  • 20260125_1000_initial_schema.py
  • 20260125_1100_add_phone_numbers.py
  • 20260126_0900_add_call_sentiment.py

Migration File Structure

"""Add phone_numbers table

Revision ID: 20260125_1100
Revises: 20260125_1000
Create Date: 2026-01-25 11:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# revision identifiers
revision = '20260125_1100'
down_revision = '20260125_1000'
branch_labels = None
depends_on = None


def upgrade() -> None:
    """Apply migration."""
    op.create_table(
        'phone_numbers',
        sa.Column('id', postgresql.UUID(), server_default=sa.text('gen_random_uuid()'), nullable=False),
        # ... columns
        sa.PrimaryKeyConstraint('id', name='pk_phone_numbers')
    )
    op.create_index('ix_phone_numbers_tenant_id', 'phone_numbers', ['tenant_id'])


def downgrade() -> None:
    """Reverse migration."""
    op.drop_index('ix_phone_numbers_tenant_id')
    op.drop_table('phone_numbers')

12.2 Initial Migration Script

The initial migration creates all tables defined in this document. Here’s the structure:
"""Initial schema

Revision ID: 20260125_1000
Revises: 
Create Date: 2026-01-25 10:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

revision = '20260125_1000'
down_revision = None
branch_labels = None
depends_on = None


def upgrade() -> None:
    # Enable extensions
    op.execute('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"')
    op.execute('CREATE EXTENSION IF NOT EXISTS "vector"')
    
    # Create ENUM types
    op.execute("""
        CREATE TYPE call_direction_enum AS ENUM ('inbound', 'outbound');
        CREATE TYPE call_status_enum AS ENUM (
            'pending', 'ringing', 'answered', 'completed', 
            'failed', 'cancelled', 'transferred', 'voicemail'
        );
        -- ... other enums
    """)
    
    # Create updated_at trigger function
    op.execute("""
        CREATE OR REPLACE FUNCTION update_updated_at_column()
        RETURNS TRIGGER AS $$
        BEGIN
            NEW.updated_at = NOW();
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
    """)
    
    # Create tables in dependency order
    # 1. agencies (no dependencies)
    # 2. tenants (depends on agencies)
    # 3. users (depends on agencies, tenants)
    # 4. phone_numbers (depends on tenants)
    # 5. calls (depends on tenants, phone_numbers)
    # ... etc
    
    # Create agencies table
    op.create_table('agencies', ...)
    
    # Create tenants table
    op.create_table('tenants', ...)
    
    # ... continue for all tables
    
    # Create triggers
    op.execute("""
        CREATE TRIGGER update_agencies_updated_at
            BEFORE UPDATE ON agencies
            FOR EACH ROW
            EXECUTE FUNCTION update_updated_at_column();
    """)
    # ... triggers for all tables


def downgrade() -> None:
    # Drop in reverse order
    op.drop_table('call_analytics')
    op.drop_table('usage_records')
    # ... all tables
    
    # Drop ENUM types
    op.execute('DROP TYPE IF EXISTS call_status_enum')
    op.execute('DROP TYPE IF EXISTS call_direction_enum')
    # ... all enums
    
    # Drop function
    op.execute('DROP FUNCTION IF EXISTS update_updated_at_column')
    
    # Drop extensions
    op.execute('DROP EXTENSION IF EXISTS vector')

12.3 How to Add New Migrations

Step 1: Generate Migration File

cd services/api
alembic revision -m "add_sentiment_to_calls"
This creates a new file in migrations/versions/.

Step 2: Edit the Migration

"""Add sentiment columns to calls

Revision ID: 20260130_1500
Revises: 20260125_1000
Create Date: 2026-01-30 15:00:00.000000
"""
from alembic import op
import sqlalchemy as sa

revision = '20260130_1500'
down_revision = '20260125_1000'


def upgrade() -> None:
    op.add_column('calls', 
        sa.Column('sentiment_score', sa.Numeric(3, 2), nullable=True)
    )
    op.add_column('calls',
        sa.Column('sentiment_label', sa.String(50), nullable=True)
    )


def downgrade() -> None:
    op.drop_column('calls', 'sentiment_label')
    op.drop_column('calls', 'sentiment_score')

Step 3: Test Migration Locally

# Apply migration
alembic upgrade head

# Verify
psql $DATABASE_URL -c "\d calls"

# Test rollback
alembic downgrade -1

# Re-apply
alembic upgrade head

Step 4: Commit Migration

git add migrations/versions/20260130_1500_add_sentiment_to_calls.py
git commit -m "Add sentiment columns to calls table"

Migration Best Practices

  1. Always test rollback - Every upgrade() must have a working downgrade()
  2. Avoid data loss - Don’t drop columns without migrating data first
  3. Use transactions - Alembic wraps migrations in transactions by default
  4. Handle large tables carefully - Adding indexes to large tables can lock them. CREATE INDEX CONCURRENTLY avoids the lock, but it cannot run inside a transaction, so use Alembic's autocommit block:
# CONCURRENTLY can't run in a transaction; escape Alembic's default transaction
with op.get_context().autocommit_block():
    op.execute('CREATE INDEX CONCURRENTLY ix_calls_new ON calls (new_column)')
  5. Don’t modify old migrations - Once deployed, migrations are immutable
  6. Test with production-like data - A migration that works on empty tables might fail on real data
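A related hazard is backfilling data: a single UPDATE across millions of rows holds locks for the whole statement and bloats the WAL. One common pattern is to split the backfill into bounded batches; a sketch of a statement generator (table, column, and batch size are illustrative):

```python
def backfill_batches(table, column, value, total_rows, batch_size=10_000):
    """Yield UPDATE statements that backfill `column` in bounded batches."""
    for _ in range(0, total_rows, batch_size):
        # Each statement touches at most `batch_size` rows, so locks stay short
        yield (
            f"UPDATE {table} SET {column} = {value} "
            f"WHERE {column} IS NULL "
            f"AND id IN (SELECT id FROM {table} "
            f"WHERE {column} IS NULL LIMIT {batch_size})"
        )

stmts = list(backfill_batches("calls", "sentiment_label", "'neutral'", 25_000))
```

In a migration each statement would run via op.execute(), ideally outside one giant transaction so progress survives an interruption.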

End of Part 2

You now have:
  1. ✅ Complete understanding of database architecture decisions
  2. ✅ Full DDL for all 25+ tables
  3. ✅ Comprehensive column documentation
  4. ✅ Index strategy with explanations
  5. ✅ Migration workflow
Next: Part 3 - API Design

Part 3 will cover:
  • REST API architecture
  • Authentication and authorization
  • Complete endpoint specifications
  • Request/response schemas
  • Error handling

Document End - Part 2 of 10

Junior Developer PRD - Part 3: API Design

Document Version: 1.0
Last Updated: January 25, 2026
Part: 3 of 10
Sections: 13-22
Audience: Junior developers with no prior context

Section 13: REST API Architecture

13.1 What is REST (Quick Refresher)

REST (Representational State Transfer) is an architectural style for designing web APIs. Our API follows REST principles:
1. Resources are nouns, not verbs
  • Good: /api/v1/calls (noun)
  • Bad: /api/v1/getCalls (verb)
2. HTTP methods indicate actions
  • GET - Read (retrieve data)
  • POST - Create (new resource)
  • PUT - Update (replace entire resource)
  • PATCH - Partial Update (modify specific fields)
  • DELETE - Delete (remove resource)
3. URLs represent resource hierarchy
  • /api/v1/tenants/{tenant_id}/calls - Calls belonging to a tenant
  • /api/v1/agencies/{agency_id}/tenants - Tenants belonging to an agency
4. Stateless requests
  • Each request contains all information needed
  • Server doesn’t store client session state
  • Authentication token sent with every request
5. Standard HTTP status codes
  • 2xx - Success
  • 4xx - Client error (your fault)
  • 5xx - Server error (our fault)

13.2 API URL Structure

Base URL

Production:  https://api.voice.aiconnected.com
Staging:     https://api.staging.voice.aiconnected.com
Development: http://localhost:8000

URL Pattern

{base_url}/api/v{version}/{resource}/{id}/{sub-resource}/{sub-id}
Examples:
GET  /api/v1/agencies                      # List all agencies
GET  /api/v1/agencies/abc123               # Get specific agency
GET  /api/v1/agencies/abc123/tenants       # List tenants for agency
POST /api/v1/agencies/abc123/tenants       # Create tenant under agency
GET  /api/v1/tenants/xyz789/calls          # List calls for tenant
GET  /api/v1/calls/call123                 # Get specific call
GET  /api/v1/calls/call123/transcript      # Get call's transcript

Versioning Strategy

We use URL path versioning (/api/v1/, /api/v2/). Why URL versioning:
  • Explicit and visible
  • Easy to route at load balancer
  • Clear which version client is using
  • Can run multiple versions simultaneously
Version lifecycle:
  • v1 - Current stable version
  • v2 - Next version (when breaking changes needed)
  • Old versions deprecated with 6-month warning
  • Deprecated versions return Warning header

13.3 Request Format

Headers

Every request must include:
Content-Type: application/json
Accept: application/json
Authorization: Bearer {jwt_token}
Optional headers:
X-Request-ID: {uuid}           # For request tracing (generated if not provided)
X-Tenant-ID: {tenant_id}       # Context override (for agency users)
Accept-Language: en-US         # For localized messages

Request Body (POST/PUT/PATCH)

Always JSON:
{
  "name": "Smile Dental",
  "contact_email": "info@smiledental.com",
  "timezone": "America/New_York"
}

Query Parameters (GET)

For filtering, pagination, and sorting:
GET /api/v1/calls?status=completed&limit=50&offset=0&sort=-created_at

13.4 Response Format

Successful Response (Single Resource)

{
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "type": "tenant",
    "attributes": {
      "name": "Smile Dental",
      "slug": "smile-dental",
      "status": "active",
      "created_at": "2026-01-25T10:00:00Z",
      "updated_at": "2026-01-25T10:00:00Z"
    },
    "relationships": {
      "agency": {
        "id": "agency-uuid",
        "type": "agency"
      }
    }
  },
  "meta": {
    "request_id": "req-12345"
  }
}

Successful Response (Collection)

{
  "data": [
    {
      "id": "call-1",
      "type": "call",
      "attributes": { ... }
    },
    {
      "id": "call-2",
      "type": "call",
      "attributes": { ... }
    }
  ],
  "meta": {
    "request_id": "req-12345",
    "pagination": {
      "total": 1250,
      "limit": 50,
      "offset": 0,
      "has_more": true
    }
  },
  "links": {
    "self": "/api/v1/calls?limit=50&offset=0",
    "next": "/api/v1/calls?limit=50&offset=50",
    "prev": null,
    "first": "/api/v1/calls?limit=50&offset=0",
    "last": "/api/v1/calls?limit=50&offset=1200"
  }
}

Error Response

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "details": [
      {
        "field": "contact_email",
        "message": "Invalid email format",
        "code": "INVALID_FORMAT"
      },
      {
        "field": "name",
        "message": "Name is required",
        "code": "REQUIRED"
      }
    ]
  },
  "meta": {
    "request_id": "req-12345"
  }
}
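On the server, a single helper can produce this envelope so every endpoint returns the same shape. A minimal sketch (function name and request-ID format are illustrative, not the actual implementation):

```python
import uuid

def error_response(code, message, details=None, request_id=None):
    """Build the standard error envelope used by all endpoints."""
    body = {
        "error": {"code": code, "message": message},
        "meta": {"request_id": request_id or f"req-{uuid.uuid4()}"},
    }
    if details:
        # Field-level validation errors go in `details`
        body["error"]["details"] = details
    return body

resp = error_response(
    "VALIDATION_ERROR",
    "Request validation failed",
    details=[{"field": "contact_email", "message": "Invalid email format",
              "code": "INVALID_FORMAT"}],
)
```

Centralizing the envelope means a new error code only ever touches the code tables in 13.6, never the response plumbing.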

13.5 HTTP Status Codes

Success Codes (2xx)

| Code | Name | When to Use |
|------|------|-------------|
| 200 | OK | Successful GET, PUT, PATCH |
| 201 | Created | Successful POST (resource created) |
| 202 | Accepted | Request accepted for async processing |
| 204 | No Content | Successful DELETE (no body returned) |

Client Error Codes (4xx)

| Code | Name | When to Use |
|------|------|-------------|
| 400 | Bad Request | Invalid JSON, missing required fields |
| 401 | Unauthorized | Missing or invalid authentication token |
| 403 | Forbidden | Valid token but insufficient permissions |
| 404 | Not Found | Resource doesn’t exist |
| 409 | Conflict | Resource already exists, state conflict |
| 422 | Unprocessable Entity | Valid JSON but business logic error |
| 429 | Too Many Requests | Rate limit exceeded |

Server Error Codes (5xx)

| Code | Name | When to Use |
|------|------|-------------|
| 500 | Internal Server Error | Unexpected server error |
| 502 | Bad Gateway | Upstream service error |
| 503 | Service Unavailable | Server overloaded or maintenance |
| 504 | Gateway Timeout | Upstream service timeout |

13.6 Error Code Reference

Standardized error codes for programmatic handling:

Authentication Errors (AUTH_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| AUTH_TOKEN_MISSING | 401 | No Authorization header |
| AUTH_TOKEN_INVALID | 401 | Malformed or expired token |
| AUTH_TOKEN_EXPIRED | 401 | Token has expired |
| AUTH_REFRESH_REQUIRED | 401 | Access token expired, use refresh token |
| AUTH_INVALID_CREDENTIALS | 401 | Wrong email or password |
| AUTH_ACCOUNT_LOCKED | 403 | Account locked due to failed attempts |
| AUTH_ACCOUNT_SUSPENDED | 403 | Account has been suspended |
| AUTH_EMAIL_NOT_VERIFIED | 403 | Email verification required |

Authorization Errors (AUTHZ_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| AUTHZ_PERMISSION_DENIED | 403 | Lacks required permission |
| AUTHZ_RESOURCE_ACCESS_DENIED | 403 | Can’t access this specific resource |
| AUTHZ_ROLE_REQUIRED | 403 | Specific role required |
| AUTHZ_TENANT_MISMATCH | 403 | Resource belongs to different tenant |
| AUTHZ_AGENCY_MISMATCH | 403 | Resource belongs to different agency |

Validation Errors (VAL_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| VAL_REQUIRED | 400 | Required field missing |
| VAL_INVALID_FORMAT | 400 | Field format invalid |
| VAL_INVALID_TYPE | 400 | Wrong data type |
| VAL_OUT_OF_RANGE | 400 | Value outside allowed range |
| VAL_TOO_LONG | 400 | String exceeds max length |
| VAL_TOO_SHORT | 400 | String below min length |
| VAL_INVALID_ENUM | 400 | Value not in allowed set |
| VAL_INVALID_EMAIL | 400 | Invalid email format |
| VAL_INVALID_PHONE | 400 | Invalid phone number format |
| VAL_INVALID_URL | 400 | Invalid URL format |
| VAL_INVALID_UUID | 400 | Invalid UUID format |

Resource Errors (RES_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| RES_NOT_FOUND | 404 | Resource doesn’t exist |
| RES_ALREADY_EXISTS | 409 | Resource already exists |
| RES_CONFLICT | 409 | State conflict |
| RES_DELETED | 410 | Resource was deleted |
| RES_LOCKED | 423 | Resource is locked |

Business Logic Errors (BIZ_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| BIZ_QUOTA_EXCEEDED | 422 | Quota limit reached |
| BIZ_SUBSCRIPTION_REQUIRED | 422 | Feature requires subscription |
| BIZ_INVALID_STATE | 422 | Invalid state transition |
| BIZ_DEPENDENCY_EXISTS | 422 | Can’t delete, has dependencies |
| BIZ_OPERATION_FAILED | 422 | Business operation failed |

Rate Limiting Errors (RATE_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| RATE_LIMIT_EXCEEDED | 429 | Too many requests |
| RATE_LIMIT_MINUTE | 429 | Per-minute limit exceeded |
| RATE_LIMIT_HOUR | 429 | Per-hour limit exceeded |
| RATE_LIMIT_DAY | 429 | Per-day limit exceeded |

Server Errors (SRV_*)

| Code | HTTP Status | Description |
|------|-------------|-------------|
| SRV_INTERNAL_ERROR | 500 | Unexpected server error |
| SRV_DATABASE_ERROR | 500 | Database operation failed |
| SRV_EXTERNAL_SERVICE | 502 | External service error |
| SRV_TIMEOUT | 504 | Operation timed out |
| SRV_MAINTENANCE | 503 | Server in maintenance mode |

Section 14: Authentication

14.1 Authentication Flow Overview

We use JWT (JSON Web Tokens) for authentication.
┌─────────────┐                              ┌─────────────┐
│   Client    │                              │   Server    │
└──────┬──────┘                              └──────┬──────┘
       │                                            │
       │  1. POST /api/v1/auth/login               │
       │     {email, password}                     │
       │ ─────────────────────────────────────────>│
       │                                            │
       │  2. 200 OK                                 │
       │     {access_token, refresh_token}         │
       │ <─────────────────────────────────────────│
       │                                            │
       │  3. GET /api/v1/tenants                   │
       │     Authorization: Bearer {access_token}  │
       │ ─────────────────────────────────────────>│
       │                                            │
       │  4. 200 OK                                 │
       │     {data: [...]}                         │
       │ <─────────────────────────────────────────│
       │                                            │
       │  ... access_token expires ...              │
       │                                            │
       │  5. POST /api/v1/auth/refresh             │
       │     {refresh_token}                       │
       │ ─────────────────────────────────────────>│
       │                                            │
       │  6. 200 OK                                 │
       │     {access_token, refresh_token}         │
       │ <─────────────────────────────────────────│
       │                                            │

14.2 JWT Token Structure

Access Token

Header:
{
  "alg": "HS256",
  "typ": "JWT"
}
Payload:
{
  "sub": "user-uuid-12345",
  "email": "user@example.com",
  "role": "agency_admin",
  "agency_id": "agency-uuid",
  "tenant_id": null,
  "permissions": ["calls.view", "calls.listen", "tenants.view", "tenants.create"],
  "iat": 1706180400,
  "exp": 1706266800,
  "jti": "token-unique-id"
}
Payload Fields:
| Field | Description |
|-------|-------------|
| sub | Subject - User ID |
| email | User’s email address |
| role | User’s role (platform_admin, agency_admin, etc.) |
| agency_id | Agency the user belongs to (null for platform admins) |
| tenant_id | Tenant the user belongs to (null for agency users) |
| permissions | Array of permission codes |
| iat | Issued at (Unix timestamp) |
| exp | Expiration time (Unix timestamp) |
| jti | JWT ID - unique identifier for this token |
Token Lifetime:
  • Access token: 24 hours
  • Refresh token: 30 days
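The access token is a standard HS256 JWT, which the standard library can illustrate end to end. A sketch (use a maintained library such as PyJWT in the real service; the secret and claims here are placeholders):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64 for each segment
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Produce a header.payload.signature HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        f"{b64url(json.dumps(header).encode())}."
        f"{b64url(json.dumps(payload).encode())}"
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = sign_jwt(
    {"sub": "user-uuid-12345", "role": "agency_admin", "exp": 1706266800},
    b"dev-secret",
)
```

Because the payload is only base64-encoded, never put secrets in claims; the signature proves integrity, not confidentiality.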

Refresh Token

Refresh tokens are opaque strings stored in the database:
{
  "token": "rt_a1b2c3d4e5f6...",
  "user_id": "user-uuid",
  "expires_at": "2026-02-25T10:00:00Z",
  "created_at": "2026-01-25T10:00:00Z",
  "revoked": false
}

14.3 Authentication Endpoints

POST /api/v1/auth/login

Authenticate user and receive tokens. Request:
{
  "email": "user@example.com",
  "password": "secure_password_123"
}
Success Response (200):
{
  "data": {
    "access_token": "eyJhbGciOiJIUzI1NiIs...",
    "refresh_token": "rt_a1b2c3d4e5f6...",
    "token_type": "Bearer",
    "expires_in": 86400,
    "user": {
      "id": "user-uuid",
      "email": "user@example.com",
      "first_name": "John",
      "last_name": "Doe",
      "role": "agency_admin",
      "agency_id": "agency-uuid",
      "tenant_id": null
    }
  }
}
Error Responses: 401 - Invalid credentials:
{
  "error": {
    "code": "AUTH_INVALID_CREDENTIALS",
    "message": "Invalid email or password"
  }
}
403 - Account locked:
{
  "error": {
    "code": "AUTH_ACCOUNT_LOCKED",
    "message": "Account locked due to too many failed attempts. Try again in 30 minutes.",
    "details": {
      "locked_until": "2026-01-25T11:00:00Z"
    }
  }
}

POST /api/v1/auth/refresh

Get new access token using refresh token. Request:
{
  "refresh_token": "rt_a1b2c3d4e5f6..."
}
Success Response (200):
{
  "data": {
    "access_token": "eyJhbGciOiJIUzI1NiIs...",
    "refresh_token": "rt_new_token...",
    "token_type": "Bearer",
    "expires_in": 86400
  }
}
Note: Refresh token rotation - old refresh token is invalidated, new one issued.

POST /api/v1/auth/logout

Revoke refresh token. Request:
{
  "refresh_token": "rt_a1b2c3d4e5f6..."
}
Success Response (204): No content

POST /api/v1/auth/password/forgot

Request password reset. Request:
{
  "email": "user@example.com"
}
Success Response (202):
{
  "data": {
    "message": "If an account exists, a password reset email has been sent."
  }
}
Note: Always return 202 even if email doesn’t exist (security).

POST /api/v1/auth/password/reset

Reset password with token. Request:
{
  "token": "reset-token-from-email",
  "password": "new_secure_password_456",
  "password_confirmation": "new_secure_password_456"
}
Success Response (200):
{
  "data": {
    "message": "Password has been reset successfully."
  }
}

POST /api/v1/auth/email/verify

Verify email address. Request:
{
  "token": "verification-token-from-email"
}
Success Response (200):
{
  "data": {
    "message": "Email verified successfully."
  }
}

GET /api/v1/auth/me

Get current user information. Request Headers:
Authorization: Bearer {access_token}
Success Response (200):
{
  "data": {
    "id": "user-uuid",
    "type": "user",
    "attributes": {
      "email": "user@example.com",
      "first_name": "John",
      "last_name": "Doe",
      "role": "agency_admin",
      "permissions": ["calls.view", "calls.listen", ...],
      "agency": {
        "id": "agency-uuid",
        "name": "Oxford Pierpont"
      },
      "tenant": null,
      "last_login_at": "2026-01-25T09:00:00Z"
    }
  }
}

14.4 Token Validation Implementation

# Pseudocode for token validation middleware

def validate_token(request):
    # 1. Extract token from header
    auth_header = request.headers.get('Authorization')
    if not auth_header:
        raise AuthError('AUTH_TOKEN_MISSING', 'Authorization header required')
    
    if not auth_header.startswith('Bearer '):
        raise AuthError('AUTH_TOKEN_INVALID', 'Invalid authorization format')
    
    token = auth_header[7:]  # Remove 'Bearer ' prefix
    
    # 2. Decode and verify JWT
    try:
        payload = jwt.decode(
            token,
            SECRET_KEY,
            algorithms=['HS256']
        )
    except jwt.ExpiredSignatureError:
        raise AuthError('AUTH_TOKEN_EXPIRED', 'Token has expired')
    except jwt.InvalidTokenError:
        raise AuthError('AUTH_TOKEN_INVALID', 'Invalid token')
    
    # 3. Load user from database (for current state)
    user = db.get_user(payload['sub'])
    if not user:
        raise AuthError('AUTH_TOKEN_INVALID', 'User not found')
    
    if user.status == 'suspended':
        raise AuthError('AUTH_ACCOUNT_SUSPENDED', 'Account suspended')
    
    # 4. Attach user to request context
    request.user = user
    request.permissions = payload['permissions']
    
    return True

Section 15: Authorization (RBAC)

15.1 Role-Based Access Control Overview

Authorization determines what an authenticated user can do.

Role Hierarchy

Platform Admin
    └── Agency Admin
            ├── Agency User
            └── Tenant Admin
                    └── Tenant User

Scope Isolation

┌─────────────────────────────────────────────────────────────────┐
│                        PLATFORM                                  │
│                                                                  │
│   ┌────────────────────┐    ┌────────────────────┐              │
│   │     AGENCY A       │    │     AGENCY B       │              │
│   │                    │    │                    │              │
│   │  ┌──────────────┐  │    │  ┌──────────────┐  │              │
│   │  │  Tenant A1   │  │    │  │  Tenant B1   │  │              │
│   │  └──────────────┘  │    │  └──────────────┘  │              │
│   │  ┌──────────────┐  │    │  ┌──────────────┐  │              │
│   │  │  Tenant A2   │  │    │  │  Tenant B2   │  │              │
│   │  └──────────────┘  │    │  └──────────────┘  │              │
│   │                    │    │                    │              │
│   └────────────────────┘    └────────────────────┘              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

- Platform Admin: Can access EVERYTHING
- Agency A Admin: Can access Agency A + Tenants A1, A2
- Tenant A1 Admin: Can ONLY access Tenant A1
- Agency B users CANNOT access Agency A data (and vice versa)

15.2 Permission Matrix

What Each Role Can Do

Roles: Platform Admin, Agency Admin, Agency User, Tenant Admin, Tenant User. Permissions marked "Own only" are limited to the user's own agency or tenant record.
Agencies
  • agencies.view (Own only)
  • agencies.create
  • agencies.edit (Own only)
  • agencies.delete
Tenants
  • tenants.view (Own only)
  • tenants.create
  • tenants.edit (Own only)
  • tenants.delete
Phone Numbers
  • phone_numbers.view
  • phone_numbers.provision
  • phone_numbers.configure
  • phone_numbers.release
Calls
  • calls.view
  • calls.listen
  • calls.export
  • calls.delete
Knowledge Base
  • knowledge.view
  • knowledge.create
  • knowledge.edit
  • knowledge.delete
Analytics
  • analytics.view
  • analytics.export
  • analytics.advanced
Users
  • users.view (Own only)
  • users.create
  • users.edit (Own only)
  • users.delete
Settings
  • settings.view
  • settings.edit
Billing
  • billing.view
  • billing.manage

15.3 Authorization Implementation

Permission Check Decorator

# Python decorator for endpoint authorization

from functools import wraps

def require_permission(permission: str):
    """Decorator that checks if user has required permission."""
    def decorator(func):
        @wraps(func)
        async def wrapper(request, *args, **kwargs):
            if permission not in request.permissions:
                raise AuthzError(
                    code='AUTHZ_PERMISSION_DENIED',
                    message=f'Permission required: {permission}'
                )
            return await func(request, *args, **kwargs)
        return wrapper
    return decorator


# Usage:
@router.get('/api/v1/tenants/{tenant_id}/calls')
@require_permission('calls.view')
async def list_calls(request, tenant_id: str):
    # Permission already verified
    ...

Resource Access Check

# Check if user can access a specific resource

def can_access_tenant(user, tenant_id: str) -> bool:
    """Check if user can access a specific tenant."""
    
    # Platform admins can access everything
    if user.role == 'platform_admin':
        return True
    
    # Tenant users can only access their own tenant
    if user.tenant_id:
        return user.tenant_id == tenant_id
    
    # Agency users can access tenants in their agency
    if user.agency_id:
        tenant = db.get_tenant(tenant_id)
        return tenant and tenant.agency_id == user.agency_id
    
    return False


def can_access_call(user, call_id: str) -> bool:
    """Check if user can access a specific call."""
    
    # Platform admins can access everything
    if user.role == 'platform_admin':
        return True
    
    call = db.get_call(call_id)
    if not call:
        return False
    
    # Tenant users check tenant match
    if user.tenant_id:
        return call.tenant_id == user.tenant_id
    
    # Agency users check agency via tenant
    if user.agency_id:
        tenant = db.get_tenant(call.tenant_id)
        return tenant and tenant.agency_id == user.agency_id
    
    return False

Query Filtering by Scope

# Automatically filter queries by user's scope

def build_tenant_filter(user) -> dict:
    """Build query filter based on user's scope."""
    
    if user.role == 'platform_admin':
        return {}  # No filter - can see all
    
    if user.tenant_id:
        return {'tenant_id': user.tenant_id}
    
    if user.agency_id:
        # Get all tenant IDs for this agency
        tenant_ids = db.get_tenant_ids_for_agency(user.agency_id)
        return {'tenant_id': {'$in': tenant_ids}}
    
    # Should never reach here
    raise AuthzError('AUTHZ_PERMISSION_DENIED', 'Invalid user scope')


# Usage in endpoint:
@router.get('/api/v1/calls')
@require_permission('calls.view')
async def list_calls(request, limit: int = 50, offset: int = 0):
    filters = build_tenant_filter(request.user)
    calls = db.query_calls(filters, limit=limit, offset=offset)
    return {'data': calls}

Section 16: Pagination, Filtering & Sorting

16.1 Pagination

Offset-Based Pagination

For most list endpoints, we use offset-based pagination: Request:
GET /api/v1/calls?limit=50&offset=100
Query Parameters:
  • limit - Number of items per page (default: 50, max: 100)
  • offset - Number of items to skip (default: 0)
Response:
{
  "data": [...],
  "meta": {
    "pagination": {
      "total": 1250,
      "limit": 50,
      "offset": 100,
      "has_more": true,
      "page": 3,
      "total_pages": 25
    }
  },
  "links": {
    "self": "/api/v1/calls?limit=50&offset=100",
    "next": "/api/v1/calls?limit=50&offset=150",
    "prev": "/api/v1/calls?limit=50&offset=50",
    "first": "/api/v1/calls?limit=50&offset=0",
    "last": "/api/v1/calls?limit=50&offset=1200"
  }
}

Cursor-Based Pagination (For Large Datasets)

For real-time or very large datasets, use cursor-based pagination: Request:
GET /api/v1/call-events?limit=100&cursor=eyJpZCI6IjEyMzQ1In0=
Query Parameters:
  • limit - Number of items per page
  • cursor - Opaque cursor from previous response
Response:
{
  "data": [...],
  "meta": {
    "pagination": {
      "limit": 100,
      "has_more": true,
      "next_cursor": "eyJpZCI6IjY3ODkwIn0=",
      "prev_cursor": "eyJpZCI6IjEyMzQ1In0="
    }
  },
  "links": {
    "next": "/api/v1/call-events?limit=100&cursor=eyJpZCI6IjY3ODkwIn0=",
    "prev": "/api/v1/call-events?limit=100&cursor=eyJpZCI6IjEyMzQ1In0="
  }
}
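Although the cursor is opaque to clients, server-side it can simply be base64-encoded JSON carrying the last row's key. A sketch (the encoding choice is illustrative; any reversible, tamper-tolerant scheme works):

```python
import base64
import json

def encode_cursor(last_row_key: dict) -> str:
    """Wrap the last row's key in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps(last_row_key).encode()).decode()

def decode_cursor(cursor: str) -> dict:
    """Recover the key for the next page's WHERE clause."""
    return json.loads(base64.urlsafe_b64decode(cursor))

cur = encode_cursor({"id": "12345"})
```

Because clients treat the cursor as opaque, the server is free to change its contents (e.g. add a timestamp column) without breaking the API contract.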

16.2 Filtering

Filter Syntax

Filters are query parameters with the field name and value:
GET /api/v1/calls?status=completed&direction=inbound

Filter Operators

For advanced filtering, use operators:
| Operator | Syntax | Example | Meaning |
|----------|--------|---------|---------|
| Equals | field=value | status=completed | Exact match |
| Not equals | field[ne]=value | status[ne]=failed | Not equal |
| Greater than | field[gt]=value | duration[gt]=60 | Greater than |
| Greater or equal | field[gte]=value | duration[gte]=60 | Greater or equal |
| Less than | field[lt]=value | duration[lt]=300 | Less than |
| Less or equal | field[lte]=value | duration[lte]=300 | Less or equal |
| In list | field[in]=a,b,c | status[in]=completed,transferred | In list |
| Not in list | field[nin]=a,b,c | status[nin]=failed,cancelled | Not in list |
| Contains | field[contains]=value | from_number[contains]=555 | Contains substring |
| Starts with | field[starts]=value | from_number[starts]=+1 | Starts with |
| Is null | field[null]=true | ended_at[null]=true | Is null |
| Is not null | field[null]=false | ended_at[null]=false | Is not null |

Date Range Filtering

GET /api/v1/calls?initiated_at[gte]=2026-01-01T00:00:00Z&initiated_at[lt]=2026-02-01T00:00:00Z
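On the server these parameters can be parsed into a neutral filter structure before they reach the query builder. A sketch (function and structure names are illustrative, not the actual implementation):

```python
import re

# Matches "field" or "field[op]" keys
_FILTER_KEY = re.compile(r"^(\w+)(?:\[(\w+)\])?$")

def parse_filters(query_params: dict) -> dict:
    """Turn {'duration[gte]': '60', 'status': 'completed'} into
    {'duration': {'gte': '60'}, 'status': {'eq': 'completed'}}."""
    filters = {}
    for key, value in query_params.items():
        m = _FILTER_KEY.match(key)
        if not m:
            continue  # not a filter-shaped parameter
        field, op = m.group(1), m.group(2) or "eq"
        filters.setdefault(field, {})[op] = value
    return filters

f = parse_filters({
    "status": "completed",
    "duration[gte]": "60",
    "initiated_at[lt]": "2026-02-01T00:00:00Z",
})
```

The query builder then validates each field against the per-endpoint allow-lists below before translating operators into SQL.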

Filterable Fields by Endpoint

Calls:
  • status - Call status
  • direction - inbound/outbound
  • from_number - Caller number
  • to_number - Destination number
  • phone_number_id - Phone number UUID
  • initiated_at - Call start time
  • answered_at - Answer time
  • ended_at - End time
  • duration - Duration in seconds
  • outcome - Call outcome
  • sentiment_label - positive/neutral/negative
Tenants:
  • status - Tenant status
  • business_type - Business category
  • created_at - Creation date
Phone Numbers:
  • status - Number status
  • provider - Telephony provider

16.3 Sorting

Sort Syntax

Use the sort parameter with field names:
GET /api/v1/calls?sort=created_at
Prefix with - for descending order:
GET /api/v1/calls?sort=-created_at
Multiple sort fields (comma-separated):
GET /api/v1/calls?sort=-initiated_at,status
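The sort parameter can be parsed the same way, with an allow-list so clients cannot sort by arbitrary (unindexed or private) columns. A sketch (illustrative, not the actual implementation):

```python
def parse_sort(sort_param: str, allowed: set) -> list:
    """Turn '-initiated_at,status' into
    [('initiated_at', 'DESC'), ('status', 'ASC')]."""
    order = []
    for part in sort_param.split(","):
        # Leading '-' means descending order
        if part.startswith("-"):
            field, direction = part[1:], "DESC"
        else:
            field, direction = part, "ASC"
        if field not in allowed:
            raise ValueError(f"Cannot sort by: {field}")
        order.append((field, direction))
    return order

order = parse_sort("-initiated_at,status", {"initiated_at", "status", "duration"})
```

Rejecting unknown fields with a 400 (VAL_INVALID_ENUM) keeps sort behavior aligned with the sortable-field lists below.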

Sortable Fields by Endpoint

Calls:
  • initiated_at (default: -initiated_at)
  • answered_at
  • ended_at
  • duration
  • status
  • from_number
Tenants:
  • name
  • created_at (default: -created_at)
  • status
Users:
  • email
  • first_name
  • last_name
  • created_at
  • last_login_at

16.4 Search

Full-text search on specific endpoints:
GET /api/v1/calls?q=appointment+scheduling
Searches across:
  • Transcript content
  • Caller number
  • Outcome description
GET /api/v1/knowledge/documents?q=teeth+cleaning
Searches across:
  • Document name
  • Document content

Section 17: Agency Management API

17.1 List Agencies

Endpoint: GET /api/v1/agencies
Authorization: platform_admin only
Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| limit | integer | Items per page (default: 50, max: 100) |
| offset | integer | Items to skip |
| status | string | Filter by status |
| billing_plan | string | Filter by billing plan |
| sort | string | Sort field (default: -created_at) |
| q | string | Search name, contact_email |

Request:
GET /api/v1/agencies?status=active&limit=20
Authorization: Bearer {token}
Response (200):
{
  "data": [
    {
      "id": "agency-uuid-1",
      "type": "agency",
      "attributes": {
        "name": "Oxford Pierpont",
        "slug": "oxford-pierpont",
        "contact_email": "info@oxfordpierpont.com",
        "contact_phone": "+15551234567",
        "status": "active",
        "billing_plan": "growth",
        "max_tenants": 100,
        "tenant_count": 45,
        "created_at": "2026-01-15T10:00:00Z",
        "updated_at": "2026-01-20T15:30:00Z"
      }
    },
    {
      "id": "agency-uuid-2",
      "type": "agency",
      "attributes": {
        "name": "Digital Marketing Co",
        "slug": "digital-marketing-co",
        "contact_email": "hello@digitalmarketing.co",
        "status": "active",
        "billing_plan": "starter",
        "max_tenants": 25,
        "tenant_count": 12,
        "created_at": "2026-01-10T08:00:00Z",
        "updated_at": "2026-01-18T09:15:00Z"
      }
    }
  ],
  "meta": {
    "pagination": {
      "total": 45,
      "limit": 20,
      "offset": 0,
      "has_more": true
    }
  }
}
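The pagination block in the meta above can be derived directly from the total row count and the request's limit/offset. A minimal sketch:

```python
def pagination_meta(total, limit, offset):
    """Build the pagination block returned in list responses."""
    return {
        "total": total,
        "limit": limit,
        "offset": offset,
        # More rows exist beyond this page when the window ends before total.
        "has_more": offset + limit < total,
    }
```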

17.2 Get Agency

**Endpoint:** `GET /api/v1/agencies/{agency_id}`

Authorization: platform_admin or agency_admin/agency_user (own agency only)

Request:
GET /api/v1/agencies/agency-uuid-1
Authorization: Bearer {token}
Response (200):
{
  "data": {
    "id": "agency-uuid-1",
    "type": "agency",
    "attributes": {
      "name": "Oxford Pierpont",
      "slug": "oxford-pierpont",
      "contact_email": "info@oxfordpierpont.com",
      "contact_phone": "+15551234567",
      "contact_name": "Bob Smith",
      "address_line1": "123 Main Street",
      "address_line2": "Suite 400",
      "city": "Atlanta",
      "state": "GA",
      "postal_code": "30301",
      "country": "US",
      "company_name": "Oxford Pierpont LLC",
      "status": "active",
      "is_verified": true,
      "billing_plan": "growth",
      "billing_email": "billing@oxfordpierpont.com",
      "max_tenants": 100,
      "max_concurrent_calls": 50,
      "settings": {
        "branding": {
          "logo_url": "https://cdn.example.com/logos/oxford-pierpont.png",
          "primary_color": "#1a73e8"
        },
        "defaults": {
          "voice_id": "sarah",
          "timezone": "America/New_York"
        }
      },
      "created_at": "2026-01-15T10:00:00Z",
      "updated_at": "2026-01-20T15:30:00Z"
    },
    "relationships": {
      "tenants": {
        "count": 45,
        "link": "/api/v1/agencies/agency-uuid-1/tenants"
      },
      "users": {
        "count": 5,
        "link": "/api/v1/agencies/agency-uuid-1/users"
      }
    }
  }
}
Error (404):
{
  "error": {
    "code": "RES_NOT_FOUND",
    "message": "Agency not found"
  }
}

17.3 Create Agency

**Endpoint:** `POST /api/v1/agencies`

Authorization: platform_admin only

Request:
{
  "name": "New Agency",
  "slug": "new-agency",
  "contact_email": "admin@newagency.com",
  "contact_phone": "+15551234567",
  "contact_name": "Jane Doe",
  "company_name": "New Agency LLC",
  "billing_plan": "starter",
  "settings": {
    "defaults": {
      "timezone": "America/Los_Angeles"
    }
  }
}
Validation Rules:

| Field | Rules |
|-------|-------|
| name | Required, 1-255 chars |
| slug | Required, 1-100 chars, lowercase alphanumeric and hyphens, unique |
| contact_email | Required, valid email, unique |
| contact_phone | Optional, valid phone format |
| contact_name | Optional, 1-255 chars |
| company_name | Optional, 1-255 chars |
| billing_plan | Optional, one of: starter, growth, scale, enterprise |
| settings | Optional, valid JSON object |

Response (201):
{
  "data": {
    "id": "new-agency-uuid",
    "type": "agency",
    "attributes": {
      "name": "New Agency",
      "slug": "new-agency",
      "contact_email": "admin@newagency.com",
      "status": "active",
      "is_verified": false,
      "billing_plan": "starter",
      "max_tenants": 25,
      "max_concurrent_calls": 10,
      "created_at": "2026-01-25T12:00:00Z",
      "updated_at": "2026-01-25T12:00:00Z"
    }
  }
}
Error (409 - Conflict):
{
  "error": {
    "code": "RES_ALREADY_EXISTS",
    "message": "Agency with this slug already exists",
    "details": {
      "field": "slug",
      "value": "new-agency"
    }
  }
}
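The slug rules above (1-100 chars, lowercase alphanumeric and hyphens) can be checked with a small validator before hitting the uniqueness constraint. A sketch; the regex is an interpretation of the stated rules, not a spec-mandated pattern:

```python
import re

# Lowercase letters, digits, and hyphens only, per the validation table.
SLUG_RE = re.compile(r"^[a-z0-9-]+$")

def validate_slug(slug):
    """Return True when the slug satisfies the documented format rules."""
    return 1 <= len(slug) <= 100 and bool(SLUG_RE.fullmatch(slug))
```

Uniqueness still has to be enforced in the database; a passing format check only avoids a round trip for obviously invalid input.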

17.4 Update Agency

**Endpoint:** `PATCH /api/v1/agencies/{agency_id}`

Authorization: platform_admin or agency_admin (own agency, limited fields)

Request (Platform Admin - Full Access):
{
  "name": "Updated Agency Name",
  "billing_plan": "growth",
  "max_tenants": 150,
  "status": "active"
}
Request (Agency Admin - Limited Fields):
{
  "name": "Updated Agency Name",
  "contact_email": "newemail@agency.com",
  "contact_phone": "+15559876543",
  "settings": {
    "branding": {
      "logo_url": "https://cdn.example.com/new-logo.png"
    }
  }
}
Fields Agency Admin CAN Update:
  • name
  • contact_email, contact_phone, contact_name
  • address fields
  • settings (branding, defaults)
Fields Agency Admin CANNOT Update:
  • slug
  • status
  • billing_plan
  • max_tenants, max_concurrent_calls
Response (200):
{
  "data": {
    "id": "agency-uuid",
    "type": "agency",
    "attributes": {
      "name": "Updated Agency Name",
      "...": "..."
    }
  }
}

17.5 Delete Agency

**Endpoint:** `DELETE /api/v1/agencies/{agency_id}`

Authorization: platform_admin only

Request:
DELETE /api/v1/agencies/agency-uuid
Authorization: Bearer {token}
Response (204): No content

Error (422 - Has Dependencies):
{
  "error": {
    "code": "BIZ_DEPENDENCY_EXISTS",
    "message": "Cannot delete agency with active tenants",
    "details": {
      "active_tenants": 12
    }
  }
}
Note: This is a soft delete. The agency and all of its data are retained but marked as deleted.
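The soft-delete behavior can be sketched with a `deleted_at` column: the delete endpoint stamps the timestamp, and every read path filters on it. A minimal illustration using SQLite in place of PostgreSQL (table shape simplified for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agencies (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT  -- NULL means the row is live
    )
""")
conn.execute("INSERT INTO agencies VALUES ('a1', 'Oxford Pierpont', NULL)")
conn.execute("INSERT INTO agencies VALUES ('a2', 'Old Agency', NULL)")

def soft_delete_agency(agency_id):
    # DELETE /agencies/{id} only stamps deleted_at; no row is removed.
    conn.execute(
        "UPDATE agencies SET deleted_at = datetime('now') WHERE id = ?",
        (agency_id,),
    )

def list_agencies():
    # Every read path must exclude soft-deleted rows.
    rows = conn.execute("SELECT id, name FROM agencies WHERE deleted_at IS NULL")
    return [dict(zip(("id", "name"), r)) for r in rows]

soft_delete_agency("a2")
```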

Section 18: Tenant Management API

18.1 List Tenants

**Endpoint:** `GET /api/v1/tenants`
**Alternative:** `GET /api/v1/agencies/{agency_id}/tenants` (scoped to agency)

Authorization: Based on user role and scope

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| agency_id | uuid | Filter by agency (platform_admin only) |
| status | string | Filter by status |
| business_type | string | Filter by business type |
| limit | integer | Items per page |
| offset | integer | Items to skip |
| sort | string | Sort field |
| q | string | Search name |

Request:
GET /api/v1/tenants?status=active&business_type=dental&limit=20
Authorization: Bearer {token}
Response (200):
{
  "data": [
    {
      "id": "tenant-uuid-1",
      "type": "tenant",
      "attributes": {
        "name": "Smile Dental",
        "slug": "smile-dental",
        "business_type": "dental",
        "timezone": "America/New_York",
        "status": "active",
        "contact_email": "info@smiledental.com",
        "phone_number_count": 2,
        "created_at": "2026-01-20T10:00:00Z"
      },
      "relationships": {
        "agency": {
          "id": "agency-uuid-1",
          "name": "Oxford Pierpont"
        }
      }
    }
  ],
  "meta": {
    "pagination": {
      "total": 45,
      "limit": 20,
      "offset": 0,
      "has_more": true
    }
  }
}

18.2 Get Tenant

**Endpoint:** `GET /api/v1/tenants/{tenant_id}`

Authorization: Based on user scope

Response (200):
{
  "data": {
    "id": "tenant-uuid-1",
    "type": "tenant",
    "attributes": {
      "name": "Smile Dental",
      "slug": "smile-dental",
      "business_type": "dental",
      "timezone": "America/New_York",
      "contact_email": "info@smiledental.com",
      "contact_phone": "+15551234567",
      "contact_name": "Dr. Sarah Smith",
      "website_url": "https://smiledental.com",
      "status": "active",
      "max_concurrent_calls": 10,
      "max_monthly_minutes": null,
      "settings": {
        "voice": {
          "voice_id": "alloy",
          "speaking_rate": 1.0,
          "language": "en-US"
        },
        "behavior": {
          "greeting_delay_ms": 500,
          "silence_timeout_ms": 5000,
          "max_call_duration_seconds": 1800
        },
        "features": {
          "call_recording": true,
          "transcription": true
        },
        "transfer": {
          "default_number": "+15551234567",
          "business_hours_only": true
        }
      },
      "created_at": "2026-01-20T10:00:00Z",
      "updated_at": "2026-01-22T14:30:00Z"
    },
    "relationships": {
      "agency": {
        "id": "agency-uuid-1",
        "name": "Oxford Pierpont"
      },
      "phone_numbers": {
        "count": 2,
        "link": "/api/v1/tenants/tenant-uuid-1/phone-numbers"
      },
      "knowledge_base": {
        "id": "kb-uuid-1",
        "link": "/api/v1/tenants/tenant-uuid-1/knowledge-base"
      }
    }
  }
}

18.3 Create Tenant

**Endpoint:** `POST /api/v1/tenants`
**Alternative:** `POST /api/v1/agencies/{agency_id}/tenants`

Authorization: platform_admin or agency_admin

Request:
{
  "agency_id": "agency-uuid-1",
  "name": "Happy Paws Veterinary",
  "slug": "happy-paws-vet",
  "business_type": "veterinary",
  "timezone": "America/Chicago",
  "contact_email": "info@happypawsvet.com",
  "contact_phone": "+15551234567",
  "contact_name": "Dr. Mike Johnson",
  "website_url": "https://happypawsvet.com",
  "settings": {
    "voice": {
      "voice_id": "james",
      "speaking_rate": 0.95
    },
    "features": {
      "call_recording": true,
      "transcription": true
    }
  }
}
Validation Rules:

| Field | Rules |
|-------|-------|
| agency_id | Required (unless using nested endpoint), valid UUID, agency must exist |
| name | Required, 1-255 chars |
| slug | Required, 1-100 chars, lowercase alphanumeric and hyphens, unique within agency |
| business_type | Optional, 1-100 chars |
| timezone | Required, valid IANA timezone |
| contact_email | Optional, valid email |
| contact_phone | Optional, valid phone |
| settings | Optional, valid JSON |

Response (201):
{
  "data": {
    "id": "new-tenant-uuid",
    "type": "tenant",
    "attributes": {
      "name": "Happy Paws Veterinary",
      "slug": "happy-paws-vet",
      "status": "active",
      "...": "..."
    }
  }
}
Error (422 - Quota Exceeded):
{
  "error": {
    "code": "BIZ_QUOTA_EXCEEDED",
    "message": "Agency has reached maximum tenant limit",
    "details": {
      "current_tenants": 25,
      "max_tenants": 25
    }
  }
}

18.4 Update Tenant

**Endpoint:** `PATCH /api/v1/tenants/{tenant_id}`

Authorization: Based on user scope

Request:
{
  "name": "Updated Tenant Name",
  "contact_email": "newemail@tenant.com",
  "settings": {
    "voice": {
      "voice_id": "sarah"
    }
  }
}
Note: Settings are merged, not replaced. To remove a setting, set it to null.

Response (200):
{
  "data": {
    "id": "tenant-uuid",
    "type": "tenant",
    "attributes": {
      "name": "Updated Tenant Name",
      "...": "..."
    }
  }
}
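The merge-with-null-removal semantics described above can be sketched as a recursive dictionary merge (a minimal illustration, not the production implementation):

```python
def merge_settings(current, patch):
    """Recursively merge patch into current; a null (None) value removes the key."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # null in the PATCH body deletes the setting
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)  # merge nested objects
        else:
            merged[key] = value  # scalars and new keys overwrite
    return merged

current = {"voice": {"voice_id": "alloy", "speaking_rate": 1.0}}
patch = {"voice": {"voice_id": "sarah"}}
```

Note that patching `voice.voice_id` leaves the sibling `speaking_rate` untouched, which is exactly what distinguishes a merge from a replace.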

18.5 Delete Tenant

**Endpoint:** `DELETE /api/v1/tenants/{tenant_id}`

Authorization: platform_admin or agency_admin

Response (204): No content

Error (422):
{
  "error": {
    "code": "BIZ_DEPENDENCY_EXISTS",
    "message": "Cannot delete tenant with active phone numbers",
    "details": {
      "active_phone_numbers": 2
    }
  }
}

18.6 Get Tenant Settings

**Endpoint:** `GET /api/v1/tenants/{tenant_id}/settings`

Authorization: tenant_admin or higher

Response (200):
{
  "data": {
    "voice": {
      "voice_id": "sarah",
      "speaking_rate": 1.0,
      "pitch": 1.0,
      "language": "en-US"
    },
    "behavior": {
      "greeting_delay_ms": 500,
      "silence_timeout_ms": 5000,
      "max_call_duration_seconds": 1800,
      "enable_barge_in": true
    },
    "features": {
      "call_recording": true,
      "transcription": true,
      "sentiment_analysis": false
    },
    "transfer": {
      "enabled": true,
      "default_number": "+15551234567",
      "business_hours_only": true,
      "transfer_greeting": "Please hold while I transfer you."
    },
    "notifications": {
      "email_on_missed_call": true,
      "email_on_transfer": false,
      "webhook_url": null
    }
  }
}

Section 19: Phone Number Management API

19.1 List Phone Numbers

**Endpoint:** `GET /api/v1/phone-numbers`
**Alternative:** `GET /api/v1/tenants/{tenant_id}/phone-numbers`

Authorization: Based on user scope

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| tenant_id | uuid | Filter by tenant |
| status | string | Filter by status |
| provider | string | Filter by provider |

Response (200):
{
  "data": [
    {
      "id": "pn-uuid-1",
      "type": "phone_number",
      "attributes": {
        "number": "+15551234567",
        "friendly_name": "Main Office Line",
        "provider": "gotoconnect",
        "status": "active",
        "capabilities": {
          "voice": true,
          "sms": false
        },
        "provisioned_at": "2026-01-20T10:00:00Z"
      },
      "relationships": {
        "tenant": {
          "id": "tenant-uuid-1",
          "name": "Smile Dental"
        }
      }
    }
  ]
}

19.2 Search Available Numbers

**Endpoint:** `GET /api/v1/phone-numbers/available`

Authorization: agency_admin or higher

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| area_code | string | Filter by area code (e.g., “404”) |
| contains | string | Number contains pattern |
| state | string | US state code (e.g., “GA”) |
| country | string | Country code (default: “US”) |
| limit | integer | Results to return (default: 20) |

Request:
GET /api/v1/phone-numbers/available?area_code=404&limit=10
Authorization: Bearer {token}
Response (200):
{
  "data": [
    {
      "number": "+14045551234",
      "formatted": "(404) 555-1234",
      "locality": "Atlanta",
      "region": "GA",
      "country": "US",
      "capabilities": {
        "voice": true,
        "sms": true
      },
      "monthly_cost_cents": 100,
      "setup_cost_cents": 0
    },
    {
      "number": "+14045555678",
      "formatted": "(404) 555-5678",
      "locality": "Atlanta",
      "region": "GA",
      "country": "US",
      "capabilities": {
        "voice": true,
        "sms": false
      },
      "monthly_cost_cents": 100,
      "setup_cost_cents": 0
    }
  ],
  "meta": {
    "search_params": {
      "area_code": "404",
      "country": "US"
    }
  }
}

19.3 Provision Phone Number

**Endpoint:** `POST /api/v1/phone-numbers`

Authorization: agency_admin or higher

Request:
{
  "tenant_id": "tenant-uuid-1",
  "number": "+14045551234",
  "friendly_name": "Main Office Line",
  "settings": {
    "greeting_id": null,
    "voicemail_enabled": true,
    "transfer_enabled": true,
    "transfer_number": "+14045559999"
  }
}
Response (201):
{
  "data": {
    "id": "new-pn-uuid",
    "type": "phone_number",
    "attributes": {
      "number": "+14045551234",
      "friendly_name": "Main Office Line",
      "provider": "gotoconnect",
      "provider_id": "gtc-12345",
      "status": "active",
      "provisioned_at": "2026-01-25T12:00:00Z"
    }
  }
}
Error (422 - Number Unavailable):
{
  "error": {
    "code": "BIZ_OPERATION_FAILED",
    "message": "Phone number is no longer available",
    "details": {
      "number": "+14045551234"
    }
  }
}

19.4 Update Phone Number

**Endpoint:** `PATCH /api/v1/phone-numbers/{phone_number_id}`

Authorization: agency_admin or tenant_admin (own tenant)

Request:
{
  "friendly_name": "Updated Line Name",
  "settings": {
    "voicemail_enabled": false,
    "greeting_id": "greeting-uuid-1"
  }
}
Response (200):
{
  "data": {
    "id": "pn-uuid",
    "type": "phone_number",
    "attributes": {
      "...": "..."
    }
  }
}

19.5 Release Phone Number

**Endpoint:** `DELETE /api/v1/phone-numbers/{phone_number_id}`

Authorization: agency_admin or higher

Response (204): No content

Note: This releases the number back to the provider. The number may be reassigned to someone else. This action cannot be undone.

Confirmation Required: For safety, the request must include a confirmation header:
DELETE /api/v1/phone-numbers/pn-uuid
Authorization: Bearer {token}
X-Confirm-Release: true
Without confirmation header:
{
  "error": {
    "code": "BIZ_CONFIRMATION_REQUIRED",
    "message": "Please confirm phone number release",
    "details": {
      "number": "+14045551234",
      "warning": "This action cannot be undone. The number will be released to the provider."
    }
  }
}
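The confirmation-header check on the server side can be sketched as a guard that returns the 422 payload when `X-Confirm-Release` is absent (a minimal illustration of the documented behavior):

```python
def check_release_confirmation(headers, number):
    """Return the error payload if X-Confirm-Release is missing, else None."""
    if headers.get("X-Confirm-Release", "").lower() == "true":
        return None  # confirmed: proceed with the release
    return {
        "error": {
            "code": "BIZ_CONFIRMATION_REQUIRED",
            "message": "Please confirm phone number release",
            "details": {
                "number": number,
                "warning": "This action cannot be undone. The number will be released to the provider.",
            },
        }
    }
```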

Section 20: Call Management API

20.1 List Calls

**Endpoint:** `GET /api/v1/calls`
**Alternative:** `GET /api/v1/tenants/{tenant_id}/calls`

Authorization: Based on user scope

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| tenant_id | uuid | Filter by tenant |
| phone_number_id | uuid | Filter by phone number |
| status | string | Filter by status |
| direction | string | inbound or outbound |
| outcome | string | Filter by outcome |
| sentiment_label | string | positive, neutral, negative |
| initiated_at[gte] | datetime | Started after |
| initiated_at[lt] | datetime | Started before |
| duration[gte] | integer | Minimum duration (seconds) |
| duration[lt] | integer | Maximum duration (seconds) |
| q | string | Search transcript |
| limit | integer | Items per page |
| offset | integer | Items to skip |
| sort | string | Sort field (default: -initiated_at) |

Request:
GET /api/v1/calls?status=completed&direction=inbound&initiated_at[gte]=2026-01-20T00:00:00Z&limit=50
Authorization: Bearer {token}
Response (200):
{
  "data": [
    {
      "id": "call-uuid-1",
      "type": "call",
      "attributes": {
        "direction": "inbound",
        "status": "completed",
        "from_number": "+15559876543",
        "to_number": "+15551234567",
        "initiated_at": "2026-01-25T10:00:00Z",
        "answered_at": "2026-01-25T10:00:02Z",
        "ended_at": "2026-01-25T10:03:15Z",
        "duration_seconds": 193,
        "outcome": "appointment_scheduled",
        "sentiment_label": "positive",
        "sentiment_score": 0.75,
        "has_recording": true,
        "has_transcript": true
      },
      "relationships": {
        "tenant": {
          "id": "tenant-uuid-1",
          "name": "Smile Dental"
        },
        "phone_number": {
          "id": "pn-uuid-1",
          "number": "+15551234567"
        }
      }
    }
  ],
  "meta": {
    "pagination": {
      "total": 1250,
      "limit": 50,
      "offset": 0,
      "has_more": true
    },
    "aggregations": {
      "total_duration_seconds": 45678,
      "average_duration_seconds": 183,
      "status_counts": {
        "completed": 1100,
        "transferred": 100,
        "voicemail": 50
      }
    }
  }
}
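The aggregations block in the meta above can be computed over the full matching result set (not just the returned page). A sketch of that computation:

```python
from collections import Counter

def call_aggregations(calls):
    """Compute the aggregations block from the matching call rows."""
    durations = [c["duration_seconds"] for c in calls]
    total = sum(durations)
    return {
        "total_duration_seconds": total,
        "average_duration_seconds": round(total / len(calls)) if calls else 0,
        "status_counts": dict(Counter(c["status"] for c in calls)),
    }

calls = [
    {"status": "completed", "duration_seconds": 193},
    {"status": "completed", "duration_seconds": 207},
    {"status": "transferred", "duration_seconds": 80},
]
```

In production these would typically be SQL aggregates (SUM, AVG, GROUP BY) rather than an in-memory pass, but the shape of the output is the same.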

20.2 Get Call

**Endpoint:** `GET /api/v1/calls/{call_id}`

Authorization: Based on user scope

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| include | string | Include related resources: transcript, recording, events |

Request:
GET /api/v1/calls/call-uuid-1?include=transcript,events
Authorization: Bearer {token}
Response (200):
{
  "data": {
    "id": "call-uuid-1",
    "type": "call",
    "attributes": {
      "external_call_id": "gtc-call-12345",
      "livekit_room_name": "call-smile-dental-call-uuid-1",
      "direction": "inbound",
      "status": "completed",
      "from_number": "+15559876543",
      "to_number": "+15551234567",
      "initiated_at": "2026-01-25T10:00:00Z",
      "ringing_at": "2026-01-25T10:00:00Z",
      "answered_at": "2026-01-25T10:00:02Z",
      "ended_at": "2026-01-25T10:03:15Z",
      "duration_seconds": 193,
      "ring_duration_seconds": 2,
      "outcome": "appointment_scheduled",
      "outcome_details": {
        "appointment_date": "2026-01-30",
        "appointment_time": "14:30",
        "service": "teeth_cleaning"
      },
      "sentiment_score": 0.75,
      "sentiment_label": "positive",
      "cost_cents": 48,
      "cost_breakdown": {
        "telephony": 10,
        "stt": 12,
        "llm": 20,
        "tts": 5,
        "livekit": 1
      },
      "recording_url": "https://storage.example.com/recordings/call-uuid-1.wav",
      "recording_duration_seconds": 193,
      "metadata": {
        "caller_recognized": false,
        "transfer_attempted": false
      }
    },
    "relationships": {
      "tenant": {
        "id": "tenant-uuid-1",
        "name": "Smile Dental"
      },
      "phone_number": {
        "id": "pn-uuid-1",
        "number": "+15551234567",
        "friendly_name": "Main Office Line"
      },
      "transcript": {
        "id": "transcript-uuid-1"
      }
    },
    "included": {
      "transcript": {
        "id": "transcript-uuid-1",
        "full_text": "Agent: Thank you for calling Smile Dental...",
        "turn_count": 12,
        "word_count": 245
      },
      "events": [
        {
          "id": "event-1",
          "event_type": "status_changed",
          "previous_status": "pending",
          "new_status": "ringing",
          "occurred_at": "2026-01-25T10:00:00Z"
        },
        {
          "id": "event-2",
          "event_type": "status_changed",
          "previous_status": "ringing",
          "new_status": "answered",
          "occurred_at": "2026-01-25T10:00:02Z"
        }
      ]
    }
  }
}

20.3 Get Call Transcript

**Endpoint:** `GET /api/v1/calls/{call_id}/transcript`

Authorization: Based on user scope

Response (200):
{
  "data": {
    "id": "transcript-uuid-1",
    "type": "transcript",
    "attributes": {
      "status": "complete",
      "full_text": "Agent: Thank you for calling Smile Dental, this is Dr. Smith's office. How can I help you today?\n\nCaller: Hi, I need to schedule a teeth cleaning.\n\nAgent: I'd be happy to help you schedule a cleaning! Are you an existing patient with us, or will this be your first visit?\n\n...",
      "turn_count": 12,
      "word_count": 245,
      "duration_seconds": 193,
      "turns": [
        {
          "turn_index": 0,
          "speaker": "agent",
          "content": "Thank you for calling Smile Dental, this is Dr. Smith's office. How can I help you today?",
          "start_time_ms": 2000,
          "end_time_ms": 6500,
          "confidence": 0.98
        },
        {
          "turn_index": 1,
          "speaker": "caller",
          "content": "Hi, I need to schedule a teeth cleaning.",
          "start_time_ms": 7000,
          "end_time_ms": 9500,
          "confidence": 0.95
        },
        {
          "turn_index": 2,
          "speaker": "agent",
          "content": "I'd be happy to help you schedule a cleaning! Are you an existing patient with us, or will this be your first visit?",
          "start_time_ms": 10000,
          "end_time_ms": 15000,
          "confidence": 0.99
        }
      ]
    }
  }
}

20.4 Get Call Recording

**Endpoint:** `GET /api/v1/calls/{call_id}/recording`

Authorization: Based on user scope + calls.listen permission

Response (200):
{
  "data": {
    "id": "recording-uuid-1",
    "type": "recording",
    "attributes": {
      "status": "ready",
      "format": "wav",
      "file_size_bytes": 4567890,
      "duration_seconds": 193,
      "sample_rate": 48000,
      "channels": 2,
      "download_url": "https://signed-url.example.com/recordings/call-uuid-1.wav?signature=...",
      "download_url_expires_at": "2026-01-25T11:00:00Z",
      "stream_url": "https://signed-url.example.com/recordings/call-uuid-1.wav?signature=...&streaming=true"
    }
  }
}
Note: URLs are pre-signed and expire after 1 hour.

20.5 Get Call Events

**Endpoint:** `GET /api/v1/calls/{call_id}/events`

Authorization: Based on user scope

Query Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| event_type | string | Filter by event type |
| limit | integer | Items per page |
| cursor | string | Pagination cursor |

Response (200):
{
  "data": [
    {
      "id": "event-uuid-1",
      "type": "call_event",
      "attributes": {
        "event_type": "status_changed",
        "previous_status": "pending",
        "new_status": "ringing",
        "source": "gotoconnect",
        "data": {},
        "occurred_at": "2026-01-25T10:00:00.000Z"
      }
    },
    {
      "id": "event-uuid-2",
      "type": "call_event",
      "attributes": {
        "event_type": "status_changed",
        "previous_status": "ringing",
        "new_status": "answered",
        "source": "gotoconnect",
        "data": {},
        "occurred_at": "2026-01-25T10:00:02.150Z"
      }
    },
    {
      "id": "event-uuid-3",
      "type": "call_event",
      "attributes": {
        "event_type": "speech_detected",
        "source": "agent",
        "data": {
          "speaker": "caller",
          "duration_ms": 2500,
          "transcript": "Hi, I need to schedule a teeth cleaning."
        },
        "occurred_at": "2026-01-25T10:00:07.000Z"
      }
    },
    {
      "id": "event-uuid-4",
      "type": "call_event",
      "attributes": {
        "event_type": "response_generated",
        "source": "agent",
        "data": {
          "response": "I'd be happy to help you schedule a cleaning!...",
          "latency_ms": 450,
          "tokens_used": 85
        },
        "occurred_at": "2026-01-25T10:00:07.450Z"
      }
    }
  ]
}

20.6 Initiate Call Transfer

**Endpoint:** `POST /api/v1/calls/{call_id}/transfer`

Authorization: Based on user scope + active call required

Request:
{
  "destination_number": "+15559999999",
  "destination_name": "Dr. Smith",
  "transfer_type": "warm",
  "reason": "Customer requested to speak with dentist",
  "announcement": "I have a patient on the line who would like to speak with you about their upcoming appointment."
}
Validation Rules:

| Field | Rules |
|-------|-------|
| destination_number | Required, valid E.164 phone number |
| destination_name | Optional, 1-255 chars |
| transfer_type | Required, one of: cold, warm, blind |
| reason | Optional, 1-255 chars |
| announcement | Optional, message for warm transfer |

Response (202 - Accepted):
{
  "data": {
    "id": "transfer-uuid-1",
    "type": "call_transfer",
    "attributes": {
      "status": "pending",
      "transfer_type": "warm",
      "destination_number": "+15559999999",
      "destination_name": "Dr. Smith",
      "initiated_at": "2026-01-25T10:02:00Z"
    }
  }
}
Error (422 - Invalid State):
{
  "error": {
    "code": "BIZ_INVALID_STATE",
    "message": "Call is not in a transferable state",
    "details": {
      "current_status": "completed",
      "required_status": "answered"
    }
  }
}
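The transfer validation rules above can be expressed as a small request validator. A sketch; the E.164 regex (leading `+`, country code starting 1-9, at most 15 digits) is the standard pattern, while the function shape is illustrative:

```python
import re

# E.164: leading +, first digit 1-9, up to 15 digits total.
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")

def validate_transfer_request(body):
    """Check the documented transfer rules; return a list of error messages."""
    errors = []
    if not E164_RE.fullmatch(body.get("destination_number", "")):
        errors.append("destination_number must be a valid E.164 phone number")
    if body.get("transfer_type") not in {"cold", "warm", "blind"}:
        errors.append("transfer_type must be one of: cold, warm, blind")
    return errors
```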

20.7 End Call

**Endpoint:** `POST /api/v1/calls/{call_id}/hangup`

Authorization: Based on user scope + active call required

Request:
{
  "reason": "supervisor_requested"
}
Response (202 - Accepted):
{
  "data": {
    "id": "call-uuid-1",
    "type": "call",
    "attributes": {
      "status": "completed",
      "ended_at": "2026-01-25T10:03:15Z"
    }
  }
}

Section 21: Knowledge Base API

21.1 Get Knowledge Base

**Endpoint:** `GET /api/v1/tenants/{tenant_id}/knowledge-base`

Authorization: Based on user scope

Response (200):
{
  "data": {
    "id": "kb-uuid-1",
    "type": "knowledge_base",
    "attributes": {
      "name": "Primary Knowledge Base",
      "status": "active",
      "document_count": 5,
      "chunk_count": 127,
      "total_tokens": 45000,
      "last_processed_at": "2026-01-24T15:00:00Z",
      "settings": {
        "chunk_size": 500,
        "chunk_overlap": 50,
        "embedding_model": "text-embedding-3-small"
      }
    },
    "relationships": {
      "documents": {
        "count": 5,
        "link": "/api/v1/tenants/tenant-uuid-1/knowledge-base/documents"
      }
    }
  }
}

21.2 List Documents

**Endpoint:** `GET /api/v1/tenants/{tenant_id}/knowledge-base/documents`

Authorization: Based on user scope + knowledge.view

Response (200):
{
  "data": [
    {
      "id": "doc-uuid-1",
      "type": "knowledge_document",
      "attributes": {
        "name": "Services and Pricing",
        "document_type": "text",
        "status": "ready",
        "chunk_count": 25,
        "token_count": 8500,
        "character_count": 34000,
        "processed_at": "2026-01-20T10:30:00Z",
        "created_at": "2026-01-20T10:00:00Z"
      }
    },
    {
      "id": "doc-uuid-2",
      "type": "knowledge_document",
      "attributes": {
        "name": "FAQ",
        "document_type": "faq",
        "status": "ready",
        "chunk_count": 45,
        "token_count": 12000,
        "created_at": "2026-01-21T09:00:00Z"
      }
    },
    {
      "id": "doc-uuid-3",
      "type": "knowledge_document",
      "attributes": {
        "name": "Office Policies",
        "document_type": "pdf",
        "status": "ready",
        "chunk_count": 30,
        "token_count": 10000,
        "created_at": "2026-01-22T14:00:00Z"
      }
    }
  ]
}

21.3 Create Document (Text)

**Endpoint:** `POST /api/v1/tenants/{tenant_id}/knowledge-base/documents`

Authorization: Based on user scope + knowledge.create

Request (Text Document):
{
  "name": "Services and Pricing",
  "document_type": "text",
  "content": "# Dental Services\n\n## Teeth Cleaning\n- Standard cleaning: $100\n- Deep cleaning: $200\n\nAppointments are 45 minutes for standard cleaning.\n\n## Teeth Whitening\n- In-office whitening: $300\n- Take-home kit: $150\n\n## Hours\nMonday - Friday: 8am - 5pm\nSaturday: 9am - 2pm\nSunday: Closed"
}
Response (202 - Accepted for Processing):
{
  "data": {
    "id": "new-doc-uuid",
    "type": "knowledge_document",
    "attributes": {
      "name": "Services and Pricing",
      "document_type": "text",
      "status": "processing",
      "created_at": "2026-01-25T12:00:00Z"
    }
  }
}

21.4 Create Document (File Upload)

**Endpoint:** `POST /api/v1/tenants/{tenant_id}/knowledge-base/documents/upload`

Authorization: Based on user scope + knowledge.create

Request: multipart/form-data
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary

------WebKitFormBoundary
Content-Disposition: form-data; name="file"; filename="policies.pdf"
Content-Type: application/pdf

[binary PDF data]
------WebKitFormBoundary
Content-Disposition: form-data; name="name"

Office Policies
------WebKitFormBoundary--
Response (202 - Accepted for Processing):
{
  "data": {
    "id": "new-doc-uuid",
    "type": "knowledge_document",
    "attributes": {
      "name": "Office Policies",
      "document_type": "pdf",
      "status": "processing",
      "metadata": {
        "file_name": "policies.pdf",
        "file_size": 102400,
        "mime_type": "application/pdf"
      }
    }
  }
}
Supported File Types:
  • PDF (.pdf)
  • Word (.docx)
  • Text (.txt)
  • Markdown (.md)
Max File Size: 10 MB
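The file-type and size limits above can be enforced client-side before the upload request is made. A minimal pre-flight check (the function and constant names are illustrative):

```python
import os

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, per the documented limit

def validate_upload(filename, size_bytes):
    """Return an error message for unsupported or oversized files, else None."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {ext or '(none)'}"
    if size_bytes > MAX_FILE_SIZE:
        return f"File too large: {size_bytes} bytes (max {MAX_FILE_SIZE})"
    return None
```

The server still validates independently; this just saves the user a wasted multipart upload.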

21.5 Create Document (URL)

**Endpoint:** `POST /api/v1/tenants/{tenant_id}/knowledge-base/documents`

Request (URL Document):
{
  "name": "Website FAQ",
  "document_type": "url",
  "source_url": "https://smiledental.com/faq"
}
Response (202 - Accepted for Processing):
{
  "data": {
    "id": "new-doc-uuid",
    "type": "knowledge_document",
    "attributes": {
      "name": "Website FAQ",
      "document_type": "url",
      "source_url": "https://smiledental.com/faq",
      "status": "processing"
    }
  }
}

21.6 Get Document

**Endpoint:** `GET /api/v1/tenants/{tenant_id}/knowledge-base/documents/{document_id}`

Authorization: Based on user scope + knowledge.view

Response (200):
{
  "data": {
    "id": "doc-uuid-1",
    "type": "knowledge_document",
    "attributes": {
      "name": "Services and Pricing",
      "document_type": "text",
      "status": "ready",
      "original_content": "# Dental Services\n\n## Teeth Cleaning...",
      "chunk_count": 25,
      "token_count": 8500,
      "character_count": 34000,
      "processed_at": "2026-01-20T10:30:00Z",
      "metadata": {
        "source": "manual_entry",
        "uploaded_by": "user-uuid"
      },
      "created_at": "2026-01-20T10:00:00Z",
      "updated_at": "2026-01-20T10:30:00Z"
    }
  }
}

21.7 Update Document

**Endpoint:** `PATCH /api/v1/tenants/{tenant_id}/knowledge-base/documents/{document_id}`

Authorization: Based on user scope + knowledge.edit

Request:
{
  "name": "Updated Document Name",
  "content": "Updated content..."
}
Note: Updating content triggers reprocessing (chunking and embedding).
Response (202 - Accepted for Reprocessing):
{
  "data": {
    "id": "doc-uuid-1",
    "type": "knowledge_document",
    "attributes": {
      "name": "Updated Document Name",
      "status": "processing"
    }
  }
}

21.8 Delete Document

**Endpoint:** `DELETE /api/v1/tenants/{tenant_id}/knowledge-base/documents/{document_id}`

Authorization: Based on user scope + knowledge.delete
Response (204): No content

21.9 Search Knowledge Base

**Endpoint:** `POST /api/v1/tenants/{tenant_id}/knowledge-base/search`

Authorization: Based on user scope + knowledge.view
Request:
{
  "query": "How much does teeth cleaning cost?",
  "limit": 5,
  "min_similarity": 0.7
}
Response (200):
{
  "data": [
    {
      "chunk_id": "chunk-uuid-1",
      "document_id": "doc-uuid-1",
      "document_name": "Services and Pricing",
      "content": "## Teeth Cleaning\n- Standard cleaning: $100\n- Deep cleaning: $200\n\nAppointments are 45 minutes for standard cleaning.",
      "similarity": 0.92,
      "metadata": {
        "section": "Services"
      }
    },
    {
      "chunk_id": "chunk-uuid-2",
      "document_id": "doc-uuid-2",
      "document_name": "FAQ",
      "content": "Q: How much does a cleaning cost?\nA: Standard teeth cleaning is $100 for existing patients. New patients may have an additional exam fee of $50.",
      "similarity": 0.88,
      "metadata": {
        "section": "Pricing"
      }
    }
  ],
  "meta": {
    "query": "How much does teeth cleaning cost?",
    "results_count": 2,
    "search_time_ms": 45
  }
}
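Under the hood, a search like this typically ranks chunk embeddings by cosine similarity against the query embedding, drops anything below `min_similarity`, and returns the top `limit` results. A minimal sketch of that ranking step (the embeddings here are toy vectors, not real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_chunks(query_vec, chunks, limit=5, min_similarity=0.7):
    """chunks: list of (chunk_id, embedding). Returns (chunk_id, score) pairs,
    filtered by the similarity threshold and sorted best-first."""
    scored = [(chunk_id, cosine_similarity(query_vec, emb)) for chunk_id, emb in chunks]
    scored = [s for s in scored if s[1] >= min_similarity]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:limit]
```

In production this ranking usually happens inside the database (e.g. a vector index) rather than in application code, but the semantics of `limit` and `min_similarity` are the same.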

Section 22: Webhook Endpoints (Inbound)

These endpoints receive webhook notifications from external services.

22.1 GoToConnect Webhooks

**Endpoint:** `POST /api/v1/webhooks/gotoconnect`

Authentication: Webhook signature validation

Signature Validation

GoToConnect signs webhooks with HMAC-SHA256:
import hmac
import hashlib
import time

def validate_gotoconnect_webhook(request):
    signature = request.headers.get('X-GTC-Signature')
    timestamp = request.headers.get('X-GTC-Timestamp')
    if not signature or not timestamp:
        raise WebhookValidationError("Missing signature headers")
    
    body = request.body.decode('utf-8')  # raw request bytes as text
    
    # Reconstruct the signed payload
    signed_payload = f"{timestamp}.{body}"
    
    # Calculate expected signature
    expected = hmac.new(
        GOTOCONNECT_WEBHOOK_SECRET.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Compare signatures in constant time
    if not hmac.compare_digest(signature, expected):
        raise WebhookValidationError("Invalid signature")
    
    # Check timestamp freshness (prevent replay attacks)
    timestamp_age = time.time() - int(timestamp)
    if timestamp_age > 300:  # 5 minutes
        raise WebhookValidationError("Webhook too old")
    
    return True
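To exercise the validator locally, you can forge a request the same way the service would sign it. The signing scheme below follows the snippet above (`timestamp.body` signed with HMAC-SHA256); verify the header names and payload format against the real service before relying on it:

```python
import hmac
import hashlib
import time

def sign_payload(secret: str, timestamp: str, body: str) -> str:
    """Compute the HMAC-SHA256 hex digest over 'timestamp.body'."""
    signed_payload = f"{timestamp}.{body}"
    return hmac.new(secret.encode(), signed_payload.encode(), hashlib.sha256).hexdigest()

# Example: sign a body, then verify it the way the webhook handler does
secret = "test-secret"
ts = str(int(time.time()))
body = '{"event": "call.ringing"}'
signature = sign_payload(secret, ts, body)

expected = hmac.new(secret.encode(), f"{ts}.{body}".encode(), hashlib.sha256).hexdigest()
assert hmac.compare_digest(signature, expected)
```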

Event: call.ringing

Received when an incoming call starts ringing.
Payload:
{
  "event": "call.ringing",
  "call_id": "gtc-call-12345",
  "account_id": "gtc-account-1",
  "line_id": "gtc-line-1",
  "from": "+15559876543",
  "to": "+15551234567",
  "direction": "inbound",
  "timestamp": "2026-01-25T10:00:00.000Z"
}
Processing:
  1. Look up phone number +15551234567 → find tenant
  2. Create call record in database
  3. Trigger WebRTC bridge to answer
  4. Create LiveKit room
  5. Dispatch AI agent
Response (200):
{
  "received": true,
  "call_id": "internal-call-uuid"
}
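Step 1 of the processing list (looking up the dialed number to find the tenant) is sketched below with an in-memory map. In production this is a query against the `phone_numbers` table, and numbers should be normalized to E.164 first; the helper names and the toy normalizer are illustrative:

```python
from typing import Optional

# Illustrative in-memory mapping; production code queries the phone_numbers table
PHONE_TO_TENANT = {
    '+15551234567': 'tenant-smile-dental',
}

def normalize_e164(raw: str) -> str:
    """Tiny normalizer: keep digits, assume US when exactly 10 digits remain."""
    digits = ''.join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        digits = '1' + digits
    return '+' + digits

def resolve_tenant(to_number: str) -> Optional[str]:
    """Return the tenant id for a dialed number, or None if unknown."""
    return PHONE_TO_TENANT.get(normalize_e164(to_number))
```

An unknown number should be logged and the webhook acknowledged anyway, so the provider does not keep retrying.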

Event: call.answered

Received when the call is answered.
Payload:
{
  "event": "call.answered",
  "call_id": "gtc-call-12345",
  "answered_at": "2026-01-25T10:00:02.150Z",
  "timestamp": "2026-01-25T10:00:02.150Z"
}
Processing:
  1. Update call record: status = answered, answered_at = timestamp
  2. Record call_event

Event: call.ended

Received when the call ends.
Payload:
{
  "event": "call.ended",
  "call_id": "gtc-call-12345",
  "ended_at": "2026-01-25T10:03:15.000Z",
  "duration": 193,
  "reason": "caller_hangup",
  "timestamp": "2026-01-25T10:03:15.000Z"
}
Processing:
  1. Update call record: status = completed, ended_at, duration_seconds
  2. Close LiveKit room
  3. Trigger post-call processing (transcript finalization, analytics)

22.2 LiveKit Webhooks

**Endpoint:** `POST /api/v1/webhooks/livekit`

Authentication: Webhook signature validation (JWT)

Signature Validation

LiveKit signs webhooks with the API secret:
import jwt

def validate_livekit_webhook(request):
    auth_header = request.headers.get('Authorization')
    if not auth_header or not auth_header.startswith('Bearer '):
        raise WebhookValidationError("Missing Authorization header")
    
    token = auth_header[7:]
    
    try:
        payload = jwt.decode(
            token,
            LIVEKIT_API_SECRET,
            algorithms=['HS256'],
            options={'verify_aud': False}
        )
    except jwt.InvalidTokenError:
        raise WebhookValidationError("Invalid token")
    
    # Verify the webhook event claim
    if 'video' not in payload:
        raise WebhookValidationError("Not a LiveKit webhook")
    
    return payload

Event: room_started

Payload:
{
  "event": "room_started",
  "room": {
    "name": "call-smile-dental-call-uuid-1",
    "sid": "RM_xxx",
    "creation_time": 1706180400
  }
}

Event: room_finished

Payload:
{
  "event": "room_finished",
  "room": {
    "name": "call-smile-dental-call-uuid-1",
    "sid": "RM_xxx"
  }
}
Processing:
  1. Mark LiveKit room as closed
  2. If call still marked as active, end it

Event: participant_joined

Payload:
{
  "event": "participant_joined",
  "room": {
    "name": "call-smile-dental-call-uuid-1",
    "sid": "RM_xxx"
  },
  "participant": {
    "identity": "caller-call-uuid-1",
    "sid": "PA_xxx",
    "name": "Caller",
    "metadata": "{\"type\":\"caller\"}"
  }
}
Processing:
  1. Record call_event: participant_joined
  2. Update participant count

Event: participant_left

Payload:
{
  "event": "participant_left",
  "room": {
    "name": "call-smile-dental-call-uuid-1",
    "sid": "RM_xxx"
  },
  "participant": {
    "identity": "caller-call-uuid-1",
    "sid": "PA_xxx"
  }
}
Processing:
  1. Record call_event: participant_left
  2. If caller left, initiate call ending

Event: track_published

Payload:
{
  "event": "track_published",
  "room": {
    "name": "call-smile-dental-call-uuid-1"
  },
  "participant": {
    "identity": "caller-call-uuid-1"
  },
  "track": {
    "sid": "TR_xxx",
    "type": "audio",
    "source": "microphone"
  }
}

Event: egress_ended

Payload:
{
  "event": "egress_ended",
  "egress_info": {
    "egress_id": "EG_xxx",
    "room_name": "call-smile-dental-call-uuid-1",
    "status": "EGRESS_COMPLETE",
    "file": {
      "filename": "call-uuid-1.wav",
      "size": 4567890,
      "duration": 193.5,
      "location": "s3://bucket/recordings/call-uuid-1.wav"
    }
  }
}
Processing:
  1. Update recording record with file info
  2. Mark recording as ready
  3. Trigger transcript processing if enabled

22.3 Deepgram Webhooks (If Using Callback Mode)

**Endpoint:** `POST /api/v1/webhooks/deepgram`

Note: We primarily use streaming WebSocket transcription, but callback mode is used for batch transcription.
Payload:
{
  "metadata": {
    "request_id": "req-12345",
    "sha256": "abc123...",
    "created": "2026-01-25T10:05:00.000Z",
    "duration": 193.5,
    "channels": 2
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Thank you for calling Smile Dental...",
            "confidence": 0.96,
            "words": [
              {"word": "Thank", "start": 0.0, "end": 0.3, "confidence": 0.99},
              {"word": "you", "start": 0.3, "end": 0.5, "confidence": 0.98}
            ]
          }
        ]
      }
    ]
  }
}
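Extracting the final transcript from a callback payload means walking `results.channels[*].alternatives` and taking the top alternative per channel. A minimal parser for the payload shape shown above:

```python
def extract_transcripts(payload: dict) -> list:
    """Return the top alternative per channel as {'transcript', 'confidence'} dicts."""
    results = []
    for channel in payload.get('results', {}).get('channels', []):
        alternatives = channel.get('alternatives', [])
        if not alternatives:
            continue
        best = alternatives[0]  # alternatives are ordered best-first
        results.append({
            'transcript': best.get('transcript', ''),
            'confidence': best.get('confidence', 0.0),
        })
    return results
```

The word-level timestamps in `words` can be joined across channels the same way when building a diarized transcript.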

22.4 Webhook Security Best Practices

1. Always Validate Signatures

# Never skip signature validation
if not validate_signature(request):
    return Response(status=401)

2. Check Timestamp Freshness

# Prevent replay attacks: reject webhooks older than 5 minutes
if webhook_timestamp < time.time() - 300:
    return Response(status=401)

3. Use HTTPS Only

# Reject non-HTTPS webhooks
if request.headers.get('X-Forwarded-Proto') != 'https':
    return Response(status=400)

4. Idempotency

# Handle duplicate webhooks gracefully
webhook_id = request.headers.get('X-Webhook-ID')
if db.webhook_processed(webhook_id):
    return Response(status=200)  # Already processed, OK

# Process webhook
process_webhook(request)
db.mark_webhook_processed(webhook_id)

5. Respond Quickly

# Acknowledge immediately, process async
@app.route('/webhooks/gotoconnect', methods=['POST'])
def handle_webhook():
    # Validate
    validate_signature(request)
    
    # Queue for async processing
    queue.enqueue(process_gotoconnect_webhook, request.json)
    
    # Respond immediately
    return Response(status=200)

6. Retry Logic on Failures

External services retry failed webhooks. Handle retries:
def process_webhook(data):
    try:
        # Process...
        pass
    except TemporaryError:
        # Return 5xx to trigger retry
        raise
    except PermanentError:
        # Return 4xx to stop retries
        log_permanent_failure(data)
        return Response(status=400)

End of Part 3

You now have:
  1. ✅ Complete REST API architecture and conventions
  2. ✅ Full authentication system with JWT
  3. ✅ Comprehensive RBAC authorization
  4. ✅ Pagination, filtering, and sorting patterns
  5. ✅ Complete endpoint specifications for all resources
  6. ✅ Inbound webhook handling
Next: Part 4 - GoToConnect Integration

Part 4 will cover:
  • GoToConnect account setup
  • OAuth 2.0 implementation
  • Webhook event processing
  • Call control API usage
  • Phone number management
  • Ooma WebRTC Softphone automation

Document End - Part 3 of 10

Junior Developer PRD - Part 4: GoToConnect Integration

Document Version: 1.0
Last Updated: January 25, 2026
Part: 4 of 10
Sections: 23-30
Audience: Junior developers with no prior context

Section 23: GoToConnect Overview

23.1 What is GoToConnect

GoToConnect (formerly Jive) is a cloud-based business phone system owned by GoTo (formerly LogMeIn). It provides:
  • VoIP Phone Service - Cloud-hosted phone lines
  • Phone Numbers - Provision and manage DIDs
  • Call Routing - PBX functionality
  • WebRTC Calling - Browser-based soft phone
  • APIs - Programmatic control of calls and data
For Voice by aiConnected, GoToConnect serves as our telephony provider - the bridge between the traditional phone network (PSTN) and our internet-based AI system.

23.2 GoToConnect API Architecture

GoToConnect provides several API domains:
| API | Base URL | Purpose |
|---|---|---|
| Authentication | https://authentication.logmeininc.com | OAuth 2.0 token management |
| Admin | https://api.goto.com | Account & user management |
| Voice Admin | https://api.goto.com/voice-admin/v1 | Phone system configuration |
| Web Calls | https://webrtc.jive.com/web-calls-v1 | WebRTC call control |
| Call Events | https://api.goto.com/call-events-report/v1 | Call event reporting & webhooks |
| Recording | https://api.goto.com/recording/v1 | Call recording access |
| Notification Channel | https://api.goto.com/notification-channel/v1 | Webhook subscriptions |

23.3 Authentication Scopes

GoToConnect uses OAuth 2.0 scopes to control API access:
| Scope | Permission |
|---|---|
| identity:scim.me | Read user identity |
| voice-admin.v1.read | Read phone system config |
| voice-admin.v1.write | Modify phone system config |
| call-events.v1.notifications.manage | Manage call event webhooks |
| call-events.v1.reads.read | Read call history |
| calls.v2.initiate | Initiate outbound calls |
| cr.v1.read | Read call recordings |
| users.v1.lines.read | Read user line assignments |

23.4 Key Concepts

Account Key

Every GoToConnect organization has an accountKey - a unique identifier for the account. This is required for most API calls.

Line

A “line” represents a phone line/extension in the system. Each user can have multiple lines assigned. Lines have:
  • Extension number (e.g., “1001”)
  • Phone numbers (DIDs) associated
  • Call forwarding rules

Device

A device is an endpoint that can make/receive calls:
  • Physical desk phone
  • Mobile app
  • WebRTC softphone (browser)

Session

For WebRTC calls, a “session” represents an active connection between a client and GoToConnect’s WebRTC infrastructure.

Section 24: OAuth 2.0 Authentication

24.1 OAuth Flow Overview

GoToConnect uses OAuth 2.0 Authorization Code flow:
┌──────────────┐                                    ┌──────────────┐
│    User      │                                    │ GoToConnect  │
│   Browser    │                                    │    OAuth     │
└──────┬───────┘                                    └──────┬───────┘
       │                                                   │
       │  1. User clicks "Connect GoToConnect"             │
       │ ─────────────────────────────────────────────────>│
       │                                                   │
       │  2. Redirect to GoTo login page                   │
       │ <─────────────────────────────────────────────────│
       │                                                   │
       │  3. User enters credentials                       │
       │ ─────────────────────────────────────────────────>│
       │                                                   │
       │  4. Redirect to callback with auth code           │
       │ <─────────────────────────────────────────────────│
       │                                                   │
       │                                                   │
┌──────┴───────┐                                    ┌──────┴───────┐
│  Our Server  │                                    │ GoToConnect  │
└──────┬───────┘                                    └──────┬───────┘
       │                                                   │
       │  5. Exchange code for tokens                      │
       │     POST /oauth/token                             │
       │ ─────────────────────────────────────────────────>│
       │                                                   │
       │  6. Return access_token + refresh_token           │
       │ <─────────────────────────────────────────────────│
       │                                                   │
       │  7. Use access_token for API calls                │
       │ ─────────────────────────────────────────────────>│
       │                                                   │

24.2 Step 1: Redirect to Authorization

Build the authorization URL and redirect the user:
from urllib.parse import urlencode

def get_authorization_url(state: str) -> str:
    """Generate GoToConnect OAuth authorization URL."""
    
    params = {
        'response_type': 'code',
        'client_id': GOTOCONNECT_CLIENT_ID,
        'redirect_uri': GOTOCONNECT_REDIRECT_URI,
        'state': state,  # CSRF protection
        'scope': ' '.join([
            'identity:scim.me',
            'voice-admin.v1.read',
            'voice-admin.v1.write',
            'call-events.v1.notifications.manage',
            'call-events.v1.reads.read',
            'calls.v2.initiate',
            'cr.v1.read',
            'users.v1.lines.read'
        ])
    }
    
    base_url = 'https://authentication.logmeininc.com/oauth/authorize'
    return f"{base_url}?{urlencode(params)}"


# In your web framework:
@app.get('/oauth/gotoconnect/start')
def start_oauth():
    # Generate random state for CSRF protection
    state = secrets.token_urlsafe(32)
    
    # Store state in session
    session['oauth_state'] = state
    
    # Redirect to GoToConnect
    auth_url = get_authorization_url(state)
    return redirect(auth_url)

24.3 Step 2: Handle Callback

When user authorizes, GoToConnect redirects to your callback URL:
GET /oauth/gotoconnect/callback?code=AUTH_CODE&state=STATE_VALUE
@app.get('/oauth/gotoconnect/callback')
async def oauth_callback(code: str, state: str):
    # Verify state matches (CSRF protection)
    if state != session.get('oauth_state'):
        raise HTTPException(400, 'Invalid state parameter')
    
    # Exchange code for tokens
    tokens = await exchange_code_for_tokens(code)
    
    # Store tokens securely
    await store_gotoconnect_tokens(
        agency_id=current_user.agency_id,
        access_token=tokens['access_token'],
        refresh_token=tokens['refresh_token'],
        expires_at=datetime.utcnow() + timedelta(seconds=tokens['expires_in'])
    )
    
    # Get account info
    account_info = await get_gotoconnect_account(tokens['access_token'])
    
    # Store account key
    await update_agency_gotoconnect_config(
        agency_id=current_user.agency_id,
        account_key=account_info['account_key']
    )
    
    return redirect('/settings/integrations?status=connected')

24.4 Step 3: Exchange Code for Tokens

import httpx

async def exchange_code_for_tokens(code: str) -> dict:
    """Exchange authorization code for access and refresh tokens."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://authentication.logmeininc.com/oauth/token',
            data={
                'grant_type': 'authorization_code',
                'code': code,
                'redirect_uri': GOTOCONNECT_REDIRECT_URI,
                'client_id': GOTOCONNECT_CLIENT_ID,
                'client_secret': GOTOCONNECT_CLIENT_SECRET
            },
            headers={
                'Content-Type': 'application/x-www-form-urlencoded'
            }
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Token exchange failed: {response.text}")
        
        return response.json()

# Response example:
# {
#     "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
#     "token_type": "Bearer",
#     "expires_in": 3600,
#     "refresh_token": "a1b2c3d4e5f6...",
#     "scope": "identity:scim.me voice-admin.v1.read ..."
# }

24.5 Token Refresh

Access tokens expire (typically 1 hour). Use refresh token to get new access token:
async def refresh_access_token(refresh_token: str) -> dict:
    """Get new access token using refresh token."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://authentication.logmeininc.com/oauth/token',
            data={
                'grant_type': 'refresh_token',
                'refresh_token': refresh_token,
                'client_id': GOTOCONNECT_CLIENT_ID,
                'client_secret': GOTOCONNECT_CLIENT_SECRET
            },
            headers={
                'Content-Type': 'application/x-www-form-urlencoded'
            }
        )
        
        if response.status_code != 200:
            # Refresh token may be invalid/expired
            # User needs to re-authorize
            raise GoToConnectReauthorizationRequired()
        
        return response.json()


async def get_valid_access_token(agency_id: str) -> str:
    """Get a valid access token, refreshing if necessary."""
    
    tokens = await get_stored_tokens(agency_id)
    
    if not tokens:
        raise GoToConnectNotConnected()
    
    # Check if access token is expired (with 5 minute buffer)
    if tokens.expires_at < datetime.utcnow() + timedelta(minutes=5):
        # Refresh the token
        new_tokens = await refresh_access_token(tokens.refresh_token)
        
        # Store new tokens
        await store_gotoconnect_tokens(
            agency_id=agency_id,
            access_token=new_tokens['access_token'],
            refresh_token=new_tokens.get('refresh_token', tokens.refresh_token),
            expires_at=datetime.utcnow() + timedelta(seconds=new_tokens['expires_in'])
        )
        
        return new_tokens['access_token']
    
    return tokens.access_token

24.6 Token Storage Schema

CREATE TABLE gotoconnect_credentials (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agency_id UUID NOT NULL REFERENCES agencies(id) UNIQUE,
    
    -- Account Info
    account_key VARCHAR(255) NOT NULL,
    organization_name VARCHAR(255),
    
    -- OAuth Tokens (encrypted at rest)
    access_token_encrypted BYTEA NOT NULL,
    refresh_token_encrypted BYTEA NOT NULL,
    token_expires_at TIMESTAMPTZ NOT NULL,
    
    -- Scopes granted
    scopes TEXT[] NOT NULL,
    
    -- Status
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    last_used_at TIMESTAMPTZ,
    last_error TEXT,
    
    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ix_gotoconnect_credentials_agency ON gotoconnect_credentials(agency_id);
Important: Encrypt tokens at rest using application-level encryption (e.g., Fernet symmetric encryption with a key from environment variables).
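A minimal sketch of that pattern using Fernet from the `cryptography` package. Generating the key inline is only for illustration; in production the key comes from an environment variable or a secrets manager and must stay stable across restarts:

```python
from cryptography.fernet import Fernet

# Illustration only: in production, load a stable key from an env var or
# secrets manager -- a regenerated key cannot decrypt previously stored tokens.
ENCRYPTION_KEY = Fernet.generate_key()
fernet = Fernet(ENCRYPTION_KEY)

def encrypt_token(token: str) -> bytes:
    """Encrypt a token for storage in a BYTEA column."""
    return fernet.encrypt(token.encode())

def decrypt_token(encrypted: bytes) -> str:
    """Decrypt a token read back from the database."""
    return fernet.decrypt(encrypted).decode()
```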

Section 25: Voice Admin API

The Voice Admin API manages phone system configuration.

25.1 Get Account Information

Retrieve account configuration details.
**Endpoint:** `GET /voice-admin/v1/accounts/{accountKey}`

Required Scope: voice-admin.v1.read
async def get_account_info(access_token: str, account_key: str) -> dict:
    """Get GoToConnect account information."""
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.goto.com/voice-admin/v1/accounts/{account_key}',
            headers={
                'Authorization': f'Bearer {access_token}'
            }
        )
        
        if response.status_code == 401:
            raise GoToConnectAuthError(response.json())
        elif response.status_code == 403:
            raise GoToConnectPermissionError(response.json())
        elif response.status_code == 404:
            raise GoToConnectNotFoundError('Account not found')
        elif response.status_code != 200:
            raise GoToConnectError(f"Unexpected response: {response.status_code}")
        
        return response.json()

# Response example:
# {
#     "extensionDigits": 4
# }

25.2 List Lines (Extensions)

Get all phone lines in the account.

**Endpoint:** `GET /voice-admin/v1/lines`

Required Scope: voice-admin.v1.read
async def list_lines(
    access_token: str, 
    account_key: str,
    line_type: str = None  # 'user', 'shared', 'fax', etc.
) -> list:
    """List all phone lines in the account."""
    
    params = {'accountKey': account_key}
    if line_type:
        params['lineType'] = line_type
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            'https://api.goto.com/voice-admin/v1/lines',
            params=params,
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to list lines: {response.text}")
        
        return response.json().get('items', [])

# Response example:
# {
#     "items": [
#         {
#             "id": "line-uuid-1",
#             "name": "Main Reception",
#             "extension": "1001",
#             "lineType": "user",
#             "phoneNumbers": [
#                 {
#                     "phoneNumber": "+14045551234",
#                     "type": "direct"
#                 }
#             ],
#             "owner": {
#                 "id": "user-uuid",
#                 "name": "John Doe",
#                 "email": "john@company.com"
#             }
#         }
#     ]
# }

25.3 Get Line Details

Get details for a specific line.
**Endpoint:** `GET /voice-admin/v1/lines/{lineId}`

async def get_line(access_token: str, line_id: str) -> dict:
    """Get details for a specific line."""
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.goto.com/voice-admin/v1/lines/{line_id}',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code == 404:
            raise GoToConnectNotFoundError('Line not found')
        elif response.status_code != 200:
            raise GoToConnectError(f"Failed to get line: {response.text}")
        
        return response.json()

25.4 Search Available Phone Numbers

Search for phone numbers available for purchase.

**Endpoint:** `GET /voice-admin/v1/phone-numbers/available`

Required Scope: voice-admin.v1.read
async def search_available_numbers(
    access_token: str,
    account_key: str,
    area_code: str = None,
    country: str = 'US',
    state: str = None,
    contains: str = None,
    limit: int = 20
) -> list:
    """Search for available phone numbers to provision."""
    
    params = {
        'accountKey': account_key,
        'country': country,
        'limit': limit
    }
    
    if area_code:
        params['areaCode'] = area_code
    if state:
        params['state'] = state
    if contains:
        params['contains'] = contains
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            'https://api.goto.com/voice-admin/v1/phone-numbers/available',
            params=params,
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to search numbers: {response.text}")
        
        return response.json().get('items', [])

# Response example:
# {
#     "items": [
#         {
#             "phoneNumber": "+14045551234",
#             "locality": "Atlanta",
#             "region": "GA",
#             "country": "US",
#             "capabilities": ["voice", "sms"]
#         }
#     ]
# }

25.5 Create Phone Number Order

Order a new phone number.

**Endpoint:** `POST /voice-admin/v1/phone-number-orders`

Required Scope: voice-admin.v1.write
async def order_phone_number(
    access_token: str,
    account_key: str,
    area_code: str
) -> dict:
    """Start phone number ordering process."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://api.goto.com/voice-admin/v1/phone-number-orders',
            json={
                'accountKey': account_key,
                'areaCode': area_code
            },
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code == 201:
            return response.json()  # Returns order ID
        elif response.status_code == 400:
            error = response.json()
            if error.get('errorCode') == 'NO_PHONE_NUMBER_FOUND_FOR_AREA_CODE':
                raise GoToConnectNoNumbersAvailable(area_code)
            raise GoToConnectError(error.get('message', 'Order failed'))
        else:
            raise GoToConnectError(f"Order failed: {response.text}")

# Response:
# {
#     "id": "order-uuid-12345"
# }

25.6 Get Phone Number Order Status

Check the status of a phone number order.
**Endpoint:** `GET /voice-admin/v1/phone-number-orders/{orderId}`

async def get_order_status(access_token: str, order_id: str) -> dict:
    """Get phone number order status."""
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.goto.com/voice-admin/v1/phone-number-orders/{order_id}',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to get order: {response.text}")
        
        return response.json()

# Response:
# {
#     "id": "order-uuid-12345",
#     "status": "complete",  # pending, processing, complete, failed
#     "phoneNumber": "+14045551234",
#     "createdAt": "2026-01-25T10:00:00Z",
#     "completedAt": "2026-01-25T10:00:30Z"
# }
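Since orders complete asynchronously, callers typically poll until the status leaves `pending`/`processing`. A generic polling helper is sketched below; the fetch function is injected so it works with `get_order_status` above or any other async checker (names and intervals are illustrative):

```python
import asyncio

async def poll_until_done(fetch, interval_s: float = 2.0, max_attempts: int = 30) -> dict:
    """Call fetch() until the returned dict's status is terminal, or give up.

    fetch: async callable returning a dict with a 'status' key.
    """
    for _ in range(max_attempts):
        result = await fetch()
        if result.get('status') in ('complete', 'failed'):
            return result
        await asyncio.sleep(interval_s)
    raise TimeoutError('Order did not reach a terminal status in time')
```

Usage might look like `await poll_until_done(lambda: get_order_status(token, order_id))`; a webhook from the provider, where available, is preferable to polling.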

25.7 Assign Phone Number to Line

Assign a provisioned number to a specific line.
**Endpoint:** `PATCH /voice-admin/v1/lines/{lineId}`

async def assign_number_to_line(
    access_token: str,
    line_id: str,
    phone_number: str
) -> dict:
    """Assign a phone number to a line."""
    
    async with httpx.AsyncClient() as client:
        response = await client.patch(
            f'https://api.goto.com/voice-admin/v1/lines/{line_id}',
            json={
                'phoneNumbers': [
                    {
                        'phoneNumber': phone_number,
                        'type': 'direct'
                    }
                ]
            },
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code not in (200, 204):
            raise GoToConnectError(f"Failed to assign number: {response.text}")
        
        return response.json() if response.content else {}

Section 26: WebRTC Call Control API

The Web Calls API (webrtc.jive.com) provides programmatic control of WebRTC calls. This is how we answer incoming calls and route audio to our AI system.

26.1 Understanding WebRTC Sessions

A WebRTC session represents an authenticated connection to GoToConnect’s real-time infrastructure:
┌─────────────────┐              ┌─────────────────┐
│  Our Service    │              │  GoToConnect    │
│  (Bridge)       │              │  WebRTC Server  │
└────────┬────────┘              └────────┬────────┘
         │                                │
         │  1. Create Session             │
         │     POST /sessions             │
         │─────────────────────────────────>
         │                                │
         │  2. Session Created            │
         │     {sessionId, wsUrl}         │
         │<─────────────────────────────────
         │                                │
         │  3. Connect WebSocket          │
         │     ws://wsUrl                 │
         │═════════════════════════════════
         │                                │
         │  4. Receive call events        │
         │     {type: "incoming_call"}    │
         │<═════════════════════════════════
         │                                │
         │  5. Answer call                │
         │     POST /sessions/{id}/calls/ │
         │           {callId}/answer      │
         │─────────────────────────────────>
         │                                │
         │  6. WebRTC media established   │
         │     (audio streams flow)       │
         │<════════════════════════════════>
         │                                │

26.2 Create WebRTC Session

Create a session to receive and control calls.

**Endpoint:** `POST /web-calls-v1/sessions`

Base URL: https://webrtc.jive.com
async def create_webrtc_session(
    access_token: str,
    device_name: str = 'AI Voice Agent'
) -> dict:
    """Create a WebRTC session for call control."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://webrtc.jive.com/web-calls-v1/sessions',
            json={
                'deviceName': device_name,
                'deviceType': 'web'  # or 'mobile', 'desktop'
            },
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 201:
            raise GoToConnectError(f"Failed to create session: {response.text}")
        
        return response.json()

# Response:
# {
#     "sessionId": "session-uuid-12345",
#     "wsUrl": "wss://webrtc.jive.com/ws/session-uuid-12345",
#     "expiresAt": "2026-01-25T11:00:00Z"
# }

26.3 Session WebSocket Events

Connect to the WebSocket URL to receive real-time events:
import asyncio
import json
import websockets

class GoToConnectWebSocket:
    def __init__(self, ws_url: str, access_token: str):
        self.ws_url = ws_url
        self.access_token = access_token
        self.ws = None
        self.handlers = {}
    
    async def connect(self):
        """Connect to GoToConnect WebSocket."""
        headers = {
            'Authorization': f'Bearer {self.access_token}'
        }
        
        self.ws = await websockets.connect(
            self.ws_url,
            extra_headers=headers
        )
        
        # Start listening for events
        asyncio.create_task(self._listen())
    
    async def _listen(self):
        """Listen for WebSocket events."""
        try:
            async for message in self.ws:
                event = json.loads(message)
                await self._handle_event(event)
        except websockets.ConnectionClosed:
            # Handle reconnection
            await self._reconnect()
    
    async def _handle_event(self, event: dict):
        """Route event to appropriate handler."""
        event_type = event.get('type')
        
        if event_type in self.handlers:
            await self.handlers[event_type](event)
        else:
            print(f"Unhandled event type: {event_type}")
    
    def on(self, event_type: str, handler):
        """Register event handler."""
        self.handlers[event_type] = handler


# Event types:
# - incoming_call: New inbound call
# - call_state_changed: Call state updated
# - call_ended: Call terminated
# - dtmf_received: DTMF digit pressed
# - media_state_changed: Audio/video state changed

Incoming Call Event

{
  "type": "incoming_call",
  "callId": "call-uuid-12345",
  "from": {
    "phoneNumber": "+15559876543",
    "displayName": "John Doe"
  },
  "to": {
    "phoneNumber": "+15551234567",
    "extension": "1001"
  },
  "lineId": "line-uuid",
  "direction": "inbound",
  "timestamp": "2026-01-25T10:00:00.000Z"
}
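
Numbers in these payloads arrive E.164-formatted (`+15559876543`), but numbers stored on our side may not be. A small normalizer keeps database lookups consistent — this is our own US-centric sketch, not part of the API; a production system should use a library such as `phonenumbers` instead:

```python
import re

def normalize_to_e164(number: str, default_country: str = '+1') -> str:
    """Best-effort normalization of a dialed/caller number to E.164.

    Sketch for US-style numbers only; use the phonenumbers package
    for anything international or production-grade.
    """
    digits = re.sub(r'[^\d+]', '', number)  # strip spaces, dashes, parens
    if digits.startswith('+'):
        return digits
    if len(digits) == 11 and digits.startswith('1'):
        return f'+{digits}'
    if len(digits) == 10:
        return f'{default_country}{digits}'
    raise ValueError(f"Cannot normalize number: {number}")
```

Run caller and dialed numbers through this before calling `db.get_phone_number_by_number` so formatting variants all hit the same row.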

Call State Changed Event

{
  "type": "call_state_changed",
  "callId": "call-uuid-12345",
  "state": "ringing",  // ringing, active, held, ended
  "previousState": "incoming",
  "timestamp": "2026-01-25T10:00:02.000Z"
}
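
The states listed above (`incoming`, `ringing`, `active`, `held`, `ended`) imply a simple state machine. The transition table below is our own inference from these sample payloads, not something the API guarantees; treat it only as a client-side sanity check on `call_state_changed` events:

```python
# Hypothetical transition table inferred from the event docs above;
# the real platform may allow additional transitions.
VALID_TRANSITIONS = {
    'incoming': {'ringing', 'ended'},
    'ringing': {'active', 'ended'},
    'active': {'held', 'ended'},
    'held': {'active', 'ended'},
    'ended': set(),
}

def is_valid_transition(previous_state: str, new_state: str) -> bool:
    """Return True if a call_state_changed event looks consistent."""
    return new_state in VALID_TRANSITIONS.get(previous_state, set())
```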

26.4 Answer Incoming Call

Answer a ringing call.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/answer`

async def answer_call(
    access_token: str,
    session_id: str,
    call_id: str,
    sdp_offer: str | None = None  # WebRTC SDP offer (optional)
) -> dict:
    """Answer an incoming call."""
    
    body = {}
    if sdp_offer:
        body['sdpOffer'] = sdp_offer
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/answer',
            json=body if body else None,
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 404:
            raise GoToConnectNotFoundError('Call not found or already ended')
        elif response.status_code == 409:
            raise GoToConnectConflictError('Call already answered')
        else:
            raise GoToConnectError(f"Failed to answer call: {response.text}")

# Response:
# {
#     "callId": "call-uuid-12345",
#     "state": "active",
#     "sdpAnswer": "v=0\r\no=- 12345 ..."  // WebRTC SDP answer
# }

26.5 Place Call on Hold

Put an active call on hold.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/hold`

async def hold_call(
    access_token: str,
    session_id: str,
    call_id: str
) -> dict:
    """Place call on hold."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/hold',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to hold call: {response.text}")
        
        return response.json()

26.6 Resume Call from Hold

Take a call off hold.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/unhold`

async def unhold_call(
    access_token: str,
    session_id: str,
    call_id: str
) -> dict:
    """Resume call from hold."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/unhold',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to unhold call: {response.text}")
        
        return response.json()

26.7 Mute/Unmute

Control audio muting.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/mute` **Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/unmute`

async def mute_call(access_token: str, session_id: str, call_id: str) -> dict:
    """Mute outgoing audio."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/mute',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to mute: {response.text}")
        
        return response.json()


async def unmute_call(access_token: str, session_id: str, call_id: str) -> dict:
    """Unmute outgoing audio."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/unmute',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to unmute: {response.text}")
        
        return response.json()

26.8 Send DTMF Tones

Send touch-tone digits during a call.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/dtmf`

async def send_dtmf(
    access_token: str,
    session_id: str,
    call_id: str,
    digit: str  # '0'-'9', '*', '#'
) -> dict:
    """Send DTMF tone during call."""
    
    if digit not in '0123456789*#':
        raise ValueError(f"Invalid DTMF digit: {digit}")
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/dtmf',
            json={'digit': digit},
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to send DTMF: {response.text}")
        
        return response.json()
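
`send_dtmf` sends one digit per request. When a whole sequence is needed (an extension followed by `#`, say), it helps to validate and normalize the string up front; this helper is our own, not part of the API — each returned digit would then be passed to `send_dtmf` one at a time:

```python
VALID_DTMF = set('0123456789*#')

def normalize_dtmf_sequence(sequence: str) -> list[str]:
    """Strip common separators and validate each DTMF digit.

    Raises ValueError on the first invalid character so nothing is
    sent if the sequence is malformed.
    """
    digits = [d for d in sequence if d not in ' -()']
    for d in digits:
        if d not in VALID_DTMF:
            raise ValueError(f"Invalid DTMF digit: {d}")
    return digits
```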

26.9 Blind Transfer

Transfer call directly to another number without consultation.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/blind-transfer`

async def blind_transfer(
    access_token: str,
    session_id: str,
    call_id: str,
    destination: str  # Phone number or extension
) -> dict:
    """Blind transfer call to destination."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/blind-transfer',
            json={'destination': destination},
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 400:
            raise GoToConnectError('Invalid transfer destination')
        else:
            raise GoToConnectError(f"Transfer failed: {response.text}")

26.10 Warm Transfer

Transfer with consultation - speak to recipient before completing transfer.
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/calls/{callId}/warm-transfer`

async def start_warm_transfer(
    access_token: str,
    session_id: str,
    call_id: str,
    destination: str
) -> dict:
    """Initiate warm transfer - original caller put on hold."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}/warm-transfer',
            json={'destination': destination},
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Warm transfer failed: {response.text}")
        
        return response.json()

# After consulting with transfer target:
# - Complete transfer: POST .../complete-transfer
# - Cancel transfer: POST .../cancel-transfer

26.11 End Call (Hangup)

Terminate an active call.
**Endpoint:** `DELETE /web-calls-v1/sessions/{sessionId}/calls/{callId}`

async def hangup_call(
    access_token: str,
    session_id: str,
    call_id: str
) -> None:
    """End/hangup a call."""
    
    async with httpx.AsyncClient() as client:
        response = await client.delete(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/calls/{call_id}',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code not in (200, 204):
            raise GoToConnectError(f"Failed to hangup: {response.text}")

26.12 Refresh Session

Keep session alive (call periodically).
**Endpoint:** `POST /web-calls-v1/sessions/{sessionId}/refresh`

async def refresh_session(access_token: str, session_id: str) -> dict:
    """Refresh WebRTC session to prevent expiration."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f'https://webrtc.jive.com/web-calls-v1/sessions/{session_id}/refresh',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to refresh session: {response.text}")
        
        return response.json()

# Call every 5-10 minutes to keep session alive

Section 27: Notification Channel API (Webhooks)

The Notification Channel API lets us subscribe to events via webhooks.

27.1 Create Notification Channel

Register a webhook URL to receive events.
**Endpoint:** `POST /notification-channel/v1/channels`

async def create_notification_channel(
    access_token: str,
    webhook_url: str,
    channel_type: str = 'integrations'  # or 'webhook'
) -> dict:
    """Create notification channel for webhooks."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://api.goto.com/notification-channel/v1/channels',
            json={
                'webhookUrl': webhook_url,
                'channelType': channel_type
            },
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 201:
            raise GoToConnectError(f"Failed to create channel: {response.text}")
        
        return response.json()

# Response:
# {
#     "channelId": "channel-uuid-12345",
#     "webhookUrl": "https://your-server.com/webhooks/gotoconnect",
#     "channelLifetime": "PT24H",  # 24 hours
#     "expiresAt": "2026-01-26T10:00:00Z"
# }
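
`channelLifetime` is an ISO-8601 duration (`PT24H` = 24 hours). A small parser — our own sketch covering only the hour/minute/second forms seen in these responses; use a full parser such as the `isodate` package for anything else — lets us schedule renewal safely before expiry:

```python
import re
from datetime import datetime, timedelta

def parse_iso_duration(value: str) -> timedelta:
    """Parse simple ISO-8601 durations like 'PT24H' or 'PT30M'."""
    match = re.fullmatch(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', value)
    if not match:
        raise ValueError(f"Unsupported duration: {value}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def renewal_deadline(created_at: datetime, lifetime: str,
                     safety_margin: timedelta = timedelta(hours=1)) -> datetime:
    """When to renew the channel: expiry minus a safety margin."""
    return created_at + parse_iso_duration(lifetime) - safety_margin
```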

27.2 Extend Channel Lifetime

Prevent channel from expiring.
**Endpoint:** `PUT /notification-channel/v1/channels/{channelId}/channel-lifetime`

async def extend_channel_lifetime(
    access_token: str,
    channel_id: str
) -> dict:
    """Extend notification channel lifetime."""
    
    async with httpx.AsyncClient() as client:
        response = await client.put(
            f'https://api.goto.com/notification-channel/v1/channels/{channel_id}/channel-lifetime',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to extend channel: {response.text}")
        
        return response.json()

27.3 Delete Notification Channel

Remove a webhook subscription.
**Endpoint:** `DELETE /notification-channel/v1/channels/{channelId}`

async def delete_notification_channel(
    access_token: str,
    channel_id: str
) -> None:
    """Delete notification channel."""
    
    async with httpx.AsyncClient() as client:
        response = await client.delete(
            f'https://api.goto.com/notification-channel/v1/channels/{channel_id}',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code not in (200, 204):
            raise GoToConnectError(f"Failed to delete channel: {response.text}")

Section 28: Call Events API

Subscribe to and receive call events via webhooks.

28.1 Create Call Events Subscription

Subscribe to call events for specific lines.
**Endpoint:** `POST /call-events-report/v1/subscriptions`

async def subscribe_to_call_events(
    access_token: str,
    channel_id: str,
    line_ids: list | None = None  # None = all lines
) -> dict:
    """Subscribe to call events."""
    
    body = {
        'channelId': channel_id
    }
    
    if line_ids:
        body['lineIds'] = line_ids
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://api.goto.com/call-events-report/v1/subscriptions',
            json=body,
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 201:
            raise GoToConnectError(f"Failed to subscribe: {response.text}")
        
        return response.json()

# Response:
# {
#     "subscriptionId": "sub-uuid-12345",
#     "channelId": "channel-uuid",
#     "lineIds": ["line-1", "line-2"]
# }

28.2 Webhook Event Payloads

Call Started Event

{
  "source": "call-events",
  "type": "call.started",
  "timestamp": "2026-01-25T10:00:00.000Z",
  "data": {
    "callId": "call-uuid-12345",
    "organizationId": "org-uuid",
    "accountKey": "12345",
    "lineId": "line-uuid",
    "direction": "inbound",
    "callerNumber": "+15559876543",
    "callerName": "John Doe",
    "dialedNumber": "+15551234567",
    "startTime": "2026-01-25T10:00:00.000Z"
  }
}

Call Answered Event

{
  "source": "call-events",
  "type": "call.answered",
  "timestamp": "2026-01-25T10:00:02.150Z",
  "data": {
    "callId": "call-uuid-12345",
    "answeredBy": {
      "userId": "user-uuid",
      "deviceType": "webrtc"
    },
    "answerTime": "2026-01-25T10:00:02.150Z"
  }
}

Call Ended Event

{
  "source": "call-events",
  "type": "call.ended",
  "timestamp": "2026-01-25T10:03:15.000Z",
  "data": {
    "callId": "call-uuid-12345",
    "endTime": "2026-01-25T10:03:15.000Z",
    "duration": 193,
    "endReason": "caller_hangup",
    "outcome": "answered"  // answered, missed, voicemail, busy
  }
}
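
Note that `duration` (193) in the sample above matches `answerTime` → `endTime` (10:00:02.150 → 10:03:15.000), not `startTime` → `endTime` (which would be 195), suggesting it is talk time rather than total ring-plus-talk time. A quick cross-check helper — our own; verify the exact semantics against the provider's documentation before relying on it:

```python
from datetime import datetime

def parse_event_time(value: str) -> datetime:
    """Parse the ISO-8601 'Z'-suffixed timestamps used in these payloads."""
    return datetime.fromisoformat(value.replace('Z', '+00:00'))

def talk_time_seconds(answer_time: str, end_time: str) -> int:
    """Talk time from answer to hangup, rounded to whole seconds."""
    delta = parse_event_time(end_time) - parse_event_time(answer_time)
    return round(delta.total_seconds())
```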

Call Transferred Event

{
  "source": "call-events",
  "type": "call.transferred",
  "timestamp": "2026-01-25T10:02:00.000Z",
  "data": {
    "callId": "call-uuid-12345",
    "transferType": "blind",
    "transferredTo": "+15559999999",
    "transferredBy": {
      "userId": "user-uuid"
    }
  }
}

28.3 Process Webhook Events

from fastapi import Request, HTTPException
import hmac
import hashlib

@app.post('/webhooks/gotoconnect/call-events')
async def handle_call_event_webhook(request: Request):
    """Handle GoToConnect call event webhooks."""
    
    # Get raw body for signature verification
    body = await request.body()
    
    # Verify signature (if provided)
    signature = request.headers.get('X-Signature')
    if signature and not verify_webhook_signature(body, signature):
        raise HTTPException(401, 'Invalid signature')
    
    # Parse event
    event = await request.json()
    
    event_type = event.get('type')
    event_data = event.get('data', {})
    
    # Route to appropriate handler
    if event_type == 'call.started':
        await handle_call_started(event_data)
    elif event_type == 'call.answered':
        await handle_call_answered(event_data)
    elif event_type == 'call.ended':
        await handle_call_ended(event_data)
    elif event_type == 'call.transferred':
        await handle_call_transferred(event_data)
    else:
        logger.warning(f"Unknown event type: {event_type}")
    
    return {'received': True}


async def handle_call_started(data: dict):
    """Handle new inbound call."""
    call_id = data['callId']
    line_id = data['lineId']
    caller_number = data['callerNumber']
    dialed_number = data['dialedNumber']
    
    # Look up tenant by dialed number
    phone_number = await db.get_phone_number_by_number(dialed_number)
    if not phone_number:
        logger.error(f"Unknown phone number: {dialed_number}")
        return
    
    # Create call record
    call = await db.create_call(
        tenant_id=phone_number.tenant_id,
        phone_number_id=phone_number.id,
        external_call_id=call_id,
        direction='inbound',
        from_number=caller_number,
        to_number=dialed_number,
        status='ringing',
        initiated_at=datetime.utcnow()
    )
    
    # Trigger AI agent setup
    await setup_ai_agent_for_call(call)


async def handle_call_ended(data: dict):
    """Handle call termination."""
    call_id = data['callId']
    duration = data.get('duration', 0)
    end_reason = data.get('endReason', 'unknown')
    
    # Update call record
    call = await db.get_call_by_external_id(call_id)
    if call:
        await db.update_call(
            call.id,
            status='completed',
            ended_at=datetime.utcnow(),
            duration_seconds=duration
        )
        
        # Trigger post-call processing
        await process_call_completion(call)
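
The handler above calls `verify_webhook_signature`, which has not been defined. GoToConnect's exact signing scheme is not documented here, so the sketch below assumes a common pattern — a hex-encoded HMAC-SHA256 of the raw request body with a shared secret — and should be adjusted to match the provider's actual header name, algorithm, and encoding:

```python
import hashlib
import hmac

WEBHOOK_SECRET = b'change-me'  # placeholder; load from configuration

def verify_webhook_signature(body: bytes, signature: str,
                             secret: bytes = WEBHOOK_SECRET) -> bool:
    """Constant-time check of an assumed HMAC-SHA256 webhook signature."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match length via timing
    return hmac.compare_digest(expected, signature)
```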

Section 29: Recording API

Access call recordings from GoToConnect.

29.1 Get Recording Content

Download the audio file for a recording.
**Endpoint:** `GET /recording/v1/recordings/{recordingId}/content`

async def get_recording_content(
    access_token: str,
    recording_id: str
) -> bytes:
    """Download recording audio content."""
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.goto.com/recording/v1/recordings/{recording_id}/content',
            headers={'Authorization': f'Bearer {access_token}'},
            follow_redirects=True
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to get recording: {response.text}")
        
        return response.content

29.2 Get Recording Token URL

Get a time-limited URL to access recording.
**Endpoint:** `GET /recording/v1/recordings/{recordingId}/content/token`

async def get_recording_url(
    access_token: str,
    recording_id: str
) -> str:
    """Get signed URL for recording access."""
    
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.goto.com/recording/v1/recordings/{recording_id}/content/token',
            headers={'Authorization': f'Bearer {access_token}'}
        )
        
        if response.status_code != 200:
            raise GoToConnectError(f"Failed to get recording URL: {response.text}")
        
        data = response.json()
        return data.get('url')

# Response:
# {
#     "url": "https://recordings.goto.com/download?token=abc123...",
#     "expiresAt": "2026-01-25T11:00:00Z"
# }

29.3 Subscribe to Recording Events

Get notified when recordings are available.
**Endpoint:** `POST /recording/v1/subscriptions`

async def subscribe_to_recordings(
    access_token: str,
    channel_id: str
) -> dict:
    """Subscribe to recording availability events."""
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            'https://api.goto.com/recording/v1/subscriptions',
            json={'channelId': channel_id},
            headers={
                'Authorization': f'Bearer {access_token}',
                'Content-Type': 'application/json'
            }
        )
        
        if response.status_code != 201:
            raise GoToConnectError(f"Failed to subscribe: {response.text}")
        
        return response.json()

Recording Available Event

{
  "source": "recordings",
  "type": "recording.available",
  "timestamp": "2026-01-25T10:05:00.000Z",
  "data": {
    "recordingId": "rec-uuid-12345",
    "callId": "call-uuid-12345",
    "duration": 193,
    "format": "mp3",
    "size": 1234567
  }
}

Section 30: Complete Integration Flow

30.1 Initial Setup Flow

When an agency connects GoToConnect:

async def setup_gotoconnect_integration(agency_id: str):
    """Complete GoToConnect integration setup."""
    
    # 1. User completes OAuth flow (handled by callback)
    # access_token and account_key are now stored
    
    access_token = await get_valid_access_token(agency_id)
    account_key = await get_agency_account_key(agency_id)
    
    # 2. Create notification channel for webhooks
    channel = await create_notification_channel(
        access_token,
        webhook_url=f"{BASE_URL}/webhooks/gotoconnect/events"
    )
    
    await db.update_agency_gotoconnect_config(
        agency_id,
        channel_id=channel['channelId']
    )
    
    # 3. Subscribe to call events
    subscription = await subscribe_to_call_events(
        access_token,
        channel_id=channel['channelId']
    )
    
    await db.update_agency_gotoconnect_config(
        agency_id,
        call_events_subscription_id=subscription['subscriptionId']
    )
    
    # 4. Subscribe to recordings
    recording_sub = await subscribe_to_recordings(
        access_token,
        channel_id=channel['channelId']
    )
    
    # 5. Get available lines
    lines = await list_lines(access_token, account_key)
    
    await db.store_agency_lines(agency_id, lines)
    
    return {
        'status': 'connected',
        'lines_available': len(lines)
    }

30.2 Phone Number Provisioning Flow

When provisioning a new phone number for a tenant:

async def provision_phone_number(
    agency_id: str,
    tenant_id: str,
    area_code: str
) -> dict:
    """Provision a new phone number from GoToConnect."""
    
    access_token = await get_valid_access_token(agency_id)
    account_key = await get_agency_account_key(agency_id)
    
    # 1. Search for available numbers
    available = await search_available_numbers(
        access_token,
        account_key,
        area_code=area_code,
        limit=5
    )
    
    if not available:
        raise NoNumbersAvailableError(f"No numbers in area code {area_code}")
    
    # 2. Create order for first available
    order = await order_phone_number(
        access_token,
        account_key,
        area_code=area_code
    )
    
    # 3. Poll for order completion (or wait for webhook)
    for _ in range(30):  # Max 30 seconds
        status = await get_order_status(access_token, order['id'])
        
        if status['status'] == 'complete':
            phone_number = status['phoneNumber']
            break
        elif status['status'] == 'failed':
            raise PhoneNumberProvisioningError(status.get('error'))
        
        await asyncio.sleep(1)
    else:
        raise PhoneNumberProvisioningError("Order timed out")
    
    # 4. Get or create a dedicated line for this tenant
    line = await get_or_create_tenant_line(
        access_token,
        account_key,
        tenant_id
    )
    
    # 5. Assign phone number to line
    await assign_number_to_line(
        access_token,
        line['id'],
        phone_number
    )
    
    # 6. Store in our database
    phone_number_record = await db.create_phone_number(
        tenant_id=tenant_id,
        number=phone_number,
        provider='gotoconnect',
        provider_id=order['id'],
        provider_data={
            'line_id': line['id'],
            'account_key': account_key
        }
    )
    
    return phone_number_record

30.3 Call Handling Flow

Complete flow when an inbound call arrives:

class CallHandler:
    """Handles inbound call lifecycle."""
    
    def __init__(self, agency_id: str):
        self.agency_id = agency_id
        self.access_token = None
        self.session_id = None
        self.ws = None
    
    async def initialize(self):
        """Initialize WebRTC session."""
        self.access_token = await get_valid_access_token(self.agency_id)
        
        # Create WebRTC session
        session = await create_webrtc_session(
            self.access_token,
            device_name='AI Voice Agent'
        )
        
        self.session_id = session['sessionId']
        
        # Connect WebSocket
        self.ws = GoToConnectWebSocket(
            session['wsUrl'],
            self.access_token
        )
        
        # Register handlers
        self.ws.on('incoming_call', self.on_incoming_call)
        self.ws.on('call_state_changed', self.on_call_state_changed)
        self.ws.on('call_ended', self.on_call_ended)
        
        await self.ws.connect()
        
        # Start session refresh timer
        asyncio.create_task(self._refresh_session_periodically())
    
    async def on_incoming_call(self, event: dict):
        """Handle incoming call event."""
        call_id = event['callId']
        from_number = event['from']['phoneNumber']
        to_number = event['to']['phoneNumber']
        
        logger.info(f"Incoming call {call_id} from {from_number} to {to_number}")
        
        # Look up tenant
        phone_number = await db.get_phone_number_by_number(to_number)
        if not phone_number:
            logger.error(f"Unknown number: {to_number}")
            return
        
        # Create call record
        call = await db.create_call(
            tenant_id=phone_number.tenant_id,
            phone_number_id=phone_number.id,
            external_call_id=call_id,
            direction='inbound',
            from_number=from_number,
            to_number=to_number,
            status='ringing'
        )
        
        # Answer the call
        try:
            result = await answer_call(
                self.access_token,
                self.session_id,
                call_id
            )
            
            # Update call status
            await db.update_call(call.id, status='answered')
            
            # Start AI agent
            await self.start_ai_agent(call, result)
            
        except Exception as e:
            logger.error(f"Failed to answer call: {e}")
            await db.update_call(call.id, status='failed', error_message=str(e))
    
    async def start_ai_agent(self, call, webrtc_result: dict):
        """Initialize AI agent for the call."""
        
        # Get tenant configuration
        tenant = await db.get_tenant(call.tenant_id)
        
        # Create LiveKit room
        room_name = f"call-{tenant.slug}-{call.id}"
        room = await livekit.create_room(room_name)
        
        # Store room info
        await db.update_call(call.id, livekit_room_name=room_name)
        
        # The WebRTC SDP from GoToConnect provides the audio streams
        # We need to bridge these to LiveKit
        await bridge_webrtc_to_livekit(
            gotoconnect_sdp=webrtc_result.get('sdpAnswer'),
            livekit_room=room_name,
            call_id=call.id
        )
        
        # Dispatch AI agent to join room
        await dispatch_ai_agent(
            room_name=room_name,
            tenant_id=tenant.id,
            call_id=call.id
        )
    
    async def transfer_call(self, call_id: str, destination: str, transfer_type: str = 'blind'):
        """Transfer call to another number."""
        
        call = await db.get_call(call_id)
        gtc_call_id = call.external_call_id
        
        if transfer_type == 'blind':
            await blind_transfer(
                self.access_token,
                self.session_id,
                gtc_call_id,
                destination
            )
        else:
            await start_warm_transfer(
                self.access_token,
                self.session_id,
                gtc_call_id,
                destination
            )
        
        # Record transfer
        await db.create_call_transfer(
            call_id=call.id,
            tenant_id=call.tenant_id,
            transfer_type=transfer_type,
            destination_number=destination,
            status='pending'
        )
    
    async def end_call(self, call_id: str):
        """End/hangup a call."""
        
        call = await db.get_call(call_id)
        
        await hangup_call(
            self.access_token,
            self.session_id,
            call.external_call_id
        )
        
        await db.update_call(call.id, status='completed')
    
    async def _refresh_session_periodically(self):
        """Keep session alive."""
        while True:
            await asyncio.sleep(300)  # Every 5 minutes
            try:
                await refresh_session(self.access_token, self.session_id)
            except Exception as e:
                logger.error(f"Session refresh failed: {e}")
                # Attempt to recreate session
                await self.initialize()

30.4 Background Jobs

Channel Lifetime Renewal

async def renew_notification_channels():
    """Renew all agency notification channels before expiry."""
    
    agencies = await db.get_agencies_with_gotoconnect()
    
    for agency in agencies:
        try:
            access_token = await get_valid_access_token(agency.id)
            channel_id = agency.gotoconnect_config.channel_id
            
            await extend_channel_lifetime(access_token, channel_id)
            
            logger.info(f"Renewed channel for agency {agency.id}")
            
        except Exception as e:
            logger.error(f"Failed to renew channel for {agency.id}: {e}")
            # Alert operations team
            await send_alert(
                f"GoToConnect channel renewal failed for {agency.name}",
                str(e)
            )

# Run every 12 hours
scheduler.add_job(renew_notification_channels, 'interval', hours=12)

Token Refresh

async def refresh_expiring_tokens():
    """Proactively refresh tokens before expiry."""
    
    # Find tokens expiring in next 15 minutes
    expiring = await db.get_expiring_gotoconnect_credentials(
        expires_before=datetime.utcnow() + timedelta(minutes=15)
    )
    
    for cred in expiring:
        try:
            await get_valid_access_token(cred.agency_id)  # Will refresh if needed
            logger.info(f"Refreshed token for agency {cred.agency_id}")
        except GoToConnectReauthorizationRequired:
            logger.error(f"Agency {cred.agency_id} needs to reauthorize")
            await notify_agency_reauth_required(cred.agency_id)
        except Exception as e:
            logger.error(f"Token refresh failed for {cred.agency_id}: {e}")

# Run every 10 minutes
scheduler.add_job(refresh_expiring_tokens, 'interval', minutes=10)

30.5 Error Handling Reference

GoToConnect Error Codes

| Error Code | HTTP Status | Meaning | Action |
|---|---|---|---|
| AUTHN_INVALID_TOKEN | 401 | Token invalid | Refresh or reauthorize |
| AUTHN_EXPIRED_TOKEN | 401 | Token expired | Refresh token |
| AUTHN_MALFORMED_TOKEN | 401 | Token malformed | Reauthorize |
| AUTHZ_INSUFFICIENT_SCOPE | 403 | Missing scope | Request additional scopes |
| NOT_FOUND | 404 | Resource not found | Check IDs |
| INVALID_ACCOUNT_KEY | 400 | Bad account key | Verify account key |
| INVALID_AREA_CODE | 400 | Invalid area code | Use valid area code |
| NO_PHONE_NUMBER_FOUND | 400 | No numbers available | Try different area code |
| TOO_MANY_REQUESTS | 429 | Rate limited | Back off and retry |
| UNKNOWN_ERROR | 500 | Server error | Retry with backoff |
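
The `GoToConnect*Error` classes raised throughout this part map directly onto this table. One way to define the hierarchy, plus a helper that turns a status code into the right exception (class names match this document's usage; `error_for_status` is our own name):

```python
class GoToConnectError(Exception):
    """Base error for GoToConnect API failures."""

class GoToConnectAuthError(GoToConnectError):
    """401: invalid, expired, or malformed token."""

class GoToConnectNotFoundError(GoToConnectError):
    """404: resource not found."""

class GoToConnectConflictError(GoToConnectError):
    """409: resource in the wrong state (e.g. call already answered)."""

class GoToConnectRateLimitError(GoToConnectError):
    """429: rate limited; back off and retry."""

class GoToConnectServerError(GoToConnectError):
    """5xx: provider-side failure; retry with backoff."""

def error_for_status(status_code: int, text: str = '') -> GoToConnectError:
    """Map an HTTP status from the table above to an exception instance."""
    mapping = {
        401: GoToConnectAuthError,
        404: GoToConnectNotFoundError,
        409: GoToConnectConflictError,
        429: GoToConnectRateLimitError,
    }
    if status_code >= 500:
        return GoToConnectServerError(text)
    return mapping.get(status_code, GoToConnectError)(text)
```

Using one mapper keeps the per-endpoint wrappers consistent: each can `raise error_for_status(response.status_code, response.text)` instead of hand-rolling the branches.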

Error Handling Strategy

import asyncio
from typing import TypeVar, Callable

T = TypeVar('T')

async def with_retry(
    func: Callable[..., T],
    *args,
    max_retries: int = 3,
    backoff_factor: float = 1.5,
    **kwargs
) -> T:
    """Execute function with exponential backoff retry."""
    
    last_error = None
    
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        
        except GoToConnectRateLimitError as e:
            # Rate limited - back off
            wait_time = backoff_factor ** attempt
            logger.warning(f"Rate limited, waiting {wait_time}s")
            await asyncio.sleep(wait_time)
            last_error = e
        
        except GoToConnectAuthError as e:
            # Auth error - try refreshing token once
            if attempt == 0:
                try:
                    await refresh_access_token_for_request(args, kwargs)
                    continue
                except Exception:
                    pass
            raise
        
        except GoToConnectServerError as e:
            # Server error - retry with backoff
            wait_time = backoff_factor ** attempt
            logger.warning(f"Server error, retrying in {wait_time}s")
            await asyncio.sleep(wait_time)
            last_error = e
        
        except GoToConnectError:
            # Other errors - don't retry
            raise
    
    raise last_error or GoToConnectError("Max retries exceeded")
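To see the retry behavior in isolation, here is a simplified, self-contained variant with a stubbed transient error; `TransientError`, `flaky`, and the `base_delay` parameter are illustrative stand-ins, not part of the real integration:

```python
import asyncio

class TransientError(Exception):
    """Stand-in for a retryable server error."""

async def with_retry(func, max_retries=3, base_delay=0.01, backoff_factor=1.5):
    """Simplified retry loop: back off on transient errors, re-raise the
    last one if retries are exhausted."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return await func()
        except TransientError as e:
            last_error = e
            await asyncio.sleep(base_delay * backoff_factor ** attempt)
    raise last_error

calls = 0

async def flaky():
    # Fails twice, then succeeds -- simulating a recovering server
    global calls
    calls += 1
    if calls < 3:
        raise TransientError("503 from upstream")
    return "ok"

result = asyncio.run(with_retry(flaky))
```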

End of Part 4

You now have:
  1. ✅ Complete OAuth 2.0 authentication implementation
  2. ✅ Voice Admin API for account and phone number management
  3. ✅ WebRTC Call Control API for programmatic call handling
  4. ✅ Notification Channel API for webhooks
  5. ✅ Call Events API for event subscriptions
  6. ✅ Recording API for call recording access
  7. ✅ Complete integration flows with code examples
Next: Part 5 - WebRTC Bridge Service

Part 5 will cover:
  • Bridge architecture connecting GoToConnect WebRTC to LiveKit
  • Audio capture and forwarding
  • SDP negotiation handling
  • Bridge state management
  • Error recovery and reconnection

Document End - Part 4 of 10

Junior Developer PRD Part 5: WebRTC Bridge Service

Document Version: 1.0
Last Updated: January 25, 2026
Sections: 28-33
Estimated Reading Time: 45 minutes

How to Use This Document

This is Part 5 of a 10-part PRD series. Each part is designed to be read in order, building on concepts from previous parts.

Prerequisites: Before reading this document, you should have completed:
  • Part 1: Foundation & Context (understanding of the overall system)
  • Part 2: Database Design (understanding of data models)
  • Part 3: API Design (understanding of REST endpoints)
  • Part 4: GoToConnect Integration (understanding of telephony layer)
What You’ll Learn:
  • What the WebRTC Bridge does and why it exists
  • How audio flows between phone callers and AI agents
  • How to implement bidirectional audio streaming
  • How to manage WebRTC connections with aiortc
  • How to integrate with LiveKit for real-time communication
  • How to handle connection lifecycle and error recovery

Section 28: Bridge Architecture

28.1 What is the WebRTC Bridge?

The WebRTC Bridge is the most critical component in our voice infrastructure. It’s the “glue” that connects telephone callers to our AI processing pipeline. Without it, there’s no way to get audio from a phone call into our AI system.

The Problem It Solves

When someone calls our platform:
  1. Their phone call arrives via PSTN (public phone network)
  2. GoToConnect receives the call and converts it to WebRTC audio
  3. But how do we get that audio to our AI agent?
That’s what the bridge does. It:
  • Establishes a WebRTC connection with GoToConnect to receive caller audio
  • Establishes a separate connection with LiveKit to send audio to AI agents
  • Routes audio bidirectionally between these two connections
  • Handles codec conversion, resampling, and buffering

Why Can’t We Just Connect GoToConnect Directly to LiveKit?

Good question! In theory, both use WebRTC. But there are several problems:
  1. Different Authentication: GoToConnect uses OAuth + proprietary signaling; LiveKit uses JWT tokens
  2. Different Signaling: GoToConnect controls SDP exchange through their API; LiveKit uses their own protocol
  3. Codec Negotiation: GoToConnect may offer different codecs than LiveKit expects
  4. Participant Management: LiveKit needs to track participants in rooms; GoToConnect doesn’t know about rooms
  5. Processing Opportunity: We need to capture audio for our AI pipeline anyway
The bridge acts as an intelligent intermediary that speaks both “languages.”

28.2 High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           WEBRTC BRIDGE                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│                               PSTN NETWORK                                  │
│                                    │                                        │
│                                    ▼                                        │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                         GoToConnect                                   │  │
│   │                                                                       │  │
│   │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐             │  │
│   │   │  SIP/PSTN   │───▶│   Media     │───▶│   WebRTC    │             │  │
│   │   │  Gateway    │    │   Server    │    │   Endpoint  │             │  │
│   │   └─────────────┘    └─────────────┘    └──────┬──────┘             │  │
│   │                                                │                     │  │
│   └────────────────────────────────────────────────┼─────────────────────┘  │
│                                                    │                        │
│                                        WebRTC (DTLS-SRTP)                  │
│                                                    │                        │
│   ┌────────────────────────────────────────────────┼─────────────────────┐  │
│   │                      BRIDGE CORE               │                     │  │
│   │                                                │                     │  │
│   │   ┌────────────────────────────────────────────▼────────────────┐   │  │
│   │   │                  GoTo Connection Handler                     │   │  │
│   │   │                                                              │   │  │
│   │   │  ┌───────────┐  ┌───────────┐  ┌───────────┐               │   │  │
│   │   │  │   SDP     │  │   ICE     │  │  Audio    │               │   │  │
│   │   │  │ Negotiator│  │  Agent    │  │  Track    │               │   │  │
│   │   │  └───────────┘  └───────────┘  └─────┬─────┘               │   │  │
│   │   │                                      │                      │   │  │
│   │   └──────────────────────────────────────┼──────────────────────┘   │  │
│   │                                          │                          │  │
│   │   ┌──────────────────────────────────────▼──────────────────────┐   │  │
│   │   │                    AUDIO BRIDGE                              │   │  │
│   │   │                                                              │   │  │
│   │   │  ┌───────────┐  ┌───────────┐  ┌───────────┐               │   │  │
│   │   │  │  Decoder  │  │ Resampler │  │  Encoder  │               │   │  │
│   │   │  │(Opus/G711)│  │ (48k↔16k) │  │  (Opus)   │               │   │  │
│   │   │  └───────────┘  └───────────┘  └───────────┘               │   │  │
│   │   │                                                              │   │  │
│   │   │  ┌──────────────────────────────────────────────────────┐   │   │  │
│   │   │  │             Bidirectional Buffer                      │   │   │  │
│   │   │  │        GoTo → Pipeline    Pipeline → GoTo             │   │   │  │
│   │   │  └──────────────────────────────────────────────────────┘   │   │  │
│   │   │                                                              │   │  │
│   │   └──────────────────────────────────────┬──────────────────────┘   │  │
│   │                                          │                          │  │
│   │   ┌──────────────────────────────────────▼──────────────────────┐   │  │
│   │   │                 LiveKit Connection Handler                   │   │  │
│   │   │                                                              │   │  │
│   │   │  ┌───────────┐  ┌───────────┐  ┌───────────┐               │   │  │
│   │   │  │   Room    │  │  Audio    │  │  Audio    │               │   │  │
│   │   │  │  Client   │  │  Source   │  │   Sink    │               │   │  │
│   │   │  └───────────┘  └───────────┘  └───────────┘               │   │  │
│   │   │                                                              │   │  │
│   │   └──────────────────────────────────────────────────────────────┘   │  │
│   │                                                                      │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                                    │                        │
│                                        WebRTC (DTLS-SRTP)                  │
│                                                    │                        │
│   ┌────────────────────────────────────────────────┼─────────────────────┐  │
│   │                      LiveKit Cloud             │                     │  │
│   │                                                │                     │  │
│   │   ┌────────────────────────────────────────────▼────────────────┐   │  │
│   │   │                Room: call_{tenant}_{call_id}                │   │  │
│   │   │                                                              │   │  │
│   │   │   ┌─────────────────┐      ┌─────────────────┐             │   │  │
│   │   │   │  bridge_{id}    │      │  agent_{id}     │             │   │  │
│   │   │   │  (caller audio) │◀────▶│  (AI audio)     │             │   │  │
│   │   │   └─────────────────┘      └─────────────────┘             │   │  │
│   │   │                                                              │   │  │
│   │   └──────────────────────────────────────────────────────────────┘   │  │
│   │                                                                      │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

28.3 Component Responsibilities

GoTo Connection Handler

  • Receives SDP offers from GoToConnect
  • Negotiates audio codecs (prefers Opus, falls back to G.711)
  • Manages ICE candidate exchange
  • Receives caller audio from GoToConnect WebRTC
  • Sends AI response audio back to GoToConnect

Audio Bridge

  • Decodes incoming audio (Opus or G.711 to PCM)
  • Resamples audio between different sample rates (8kHz, 16kHz, 48kHz)
  • Buffers audio to handle timing variations
  • Encodes outgoing audio (PCM to Opus)
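Resampling between rates that divide evenly (48 kHz ↔ 16 kHz is a 3:1 ratio) can be illustrated with a naive decimator; this is a pure-Python sketch for intuition only, since a production bridge would use a proper polyphase resampler (e.g. PyAV's `AudioResampler`):

```python
def downsample_48k_to_16k(pcm):
    """Naive 3:1 decimation for mono 48 kHz -> 16 kHz PCM samples.

    Averaging each group of three samples acts as a crude
    anti-aliasing filter; real resamplers apply a designed
    low-pass filter before decimating.
    """
    # Trim to a multiple of 3 samples, then average each group of 3
    n = len(pcm) - (len(pcm) % 3)
    return [(pcm[i] + pcm[i + 1] + pcm[i + 2]) // 3 for i in range(0, n, 3)]

# 20 ms of 48 kHz audio = 960 samples -> 320 samples at 16 kHz
frame = [0] * 960
out = downsample_48k_to_16k(frame)
```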

LiveKit Connection Handler

  • Creates and joins LiveKit rooms
  • Publishes caller audio as a track
  • Subscribes to AI agent audio tracks
  • Manages participant lifecycle

28.4 Design Goals

| Goal | Target | Why It Matters |
|---|---|---|
| Audio latency | < 50ms bridge overhead | Users notice delays > 150ms total |
| Connection setup | < 2 seconds | Callers expect fast answers |
| Audio quality | No degradation | Poor quality = poor user experience |
| Reliability | 99.9% call completion | Dropped calls lose customers |
| Scalability | 1000 concurrent calls | Support growth |
| Resource efficiency | < 50MB RAM per call | Keep infrastructure costs low |

28.5 Technology Choice: aiortc

We use aiortc (Python asyncio WebRTC implementation) for the bridge. Here’s why:

What is aiortc?

aiortc is a Python library that implements the WebRTC specification. It provides:
  • Full WebRTC stack in pure Python
  • asyncio-based for non-blocking I/O
  • Built-in codecs (Opus, G.711, VP8, H.264)
  • Support for audio/video/data channels

Why Not Browser-Based?

An alternative approach would be to run a headless browser (Playwright/Puppeteer) and use the browser’s WebRTC. We don’t do this because:
  1. Resource Usage: Each browser instance uses 200-500MB RAM; aiortc uses ~50MB
  2. Startup Time: Browsers take 2-5 seconds to start; aiortc is instant
  3. Direct Control: With aiortc, we have direct access to audio frames; browsers add abstraction
  4. Simpler Deployment: No need to install Chrome/Chromium in containers
  5. Better Debugging: Python code is easier to debug than browser internals

When Browser Automation IS Used

We do use Playwright in Part 4 for the Ooma softphone login automation. That’s a different use case - we need to authenticate to GoToConnect’s web interface. Once authenticated, we hand off to aiortc for the actual WebRTC connection.

28.6 Threading Model

The bridge uses multiple threads/tasks for performance:
┌─────────────────────────────────────────────────────────────────────────────┐
│                         THREADING MODEL                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      ASYNCIO EVENT LOOP                             │   │
│   │                                                                     │   │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │   │
│   │   │  WebSocket  │  │  HTTP       │  │  Event      │               │   │
│   │   │  Handler    │  │  Server     │  │  Publisher  │               │   │
│   │   └─────────────┘  └─────────────┘  └─────────────┘               │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      AIORTC MEDIA THREAD                            │   │
│   │                                                                     │   │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │   │
│   │   │    RTP      │  │   Codec     │  │   RTCP      │               │   │
│   │   │  Processing │  │  Encode/    │  │  Processing │               │   │
│   │   │             │  │  Decode     │  │             │               │   │
│   │   └─────────────┘  └─────────────┘  └─────────────┘               │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      AUDIO PROCESSING THREAD                        │   │
│   │                                                                     │   │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │   │
│   │   │  Resampling │  │   Buffer    │  │   Format    │               │   │
│   │   │             │  │  Management │  │  Conversion │               │   │
│   │   └─────────────┘  └─────────────┘  └─────────────┘               │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   COMMUNICATION:                                                            │
│   • asyncio.Queue for cross-thread audio transfer                          │
│   • Thread-safe buffers for frame handoff                                  │
│   • Events for synchronization                                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why Multiple Threads?

  1. Asyncio Event Loop: Handles all I/O operations (network, API calls)
  2. Media Thread: aiortc runs RTP/RTCP processing in a dedicated thread for timing accuracy
  3. Audio Processing Thread: Heavy operations like resampling don’t block the event loop
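The cross-thread handoff listed above is the standard asyncio pattern: a worker thread schedules puts onto the event loop with `loop.call_soon_threadsafe`, because `asyncio.Queue` itself is not thread-safe. A minimal sketch (not the bridge's actual code):

```python
import asyncio
import threading

def audio_worker(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue) -> None:
    """Simulated audio thread: hand frames to the event loop's queue.

    put_nowait must be scheduled via call_soon_threadsafe because
    asyncio.Queue may only be touched from the loop's own thread.
    """
    for i in range(3):
        frame = f"frame-{i}"
        loop.call_soon_threadsafe(queue.put_nowait, frame)

async def main() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    worker = threading.Thread(target=audio_worker, args=(loop, queue))
    worker.start()
    # Consume frames on the event loop side
    frames = [await queue.get() for _ in range(3)]
    worker.join()
    return frames

frames = asyncio.run(main())
```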

28.7 State Machine

The bridge follows a strict state machine to ensure consistent behavior:
┌─────────────────────────────────────────────────────────────────────────────┐
│                      BRIDGE STATE MACHINE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│                           ┌──────────────┐                                  │
│                           │   CREATED    │                                  │
│                           └───────┬──────┘                                  │
│                                   │                                         │
│                           initialize()                                      │
│                                   │                                         │
│                                   ▼                                         │
│                        ┌──────────────────┐                                 │
│                        │  INITIALIZING    │                                 │
│                        └────────┬─────────┘                                 │
│                                 │                                           │
│                     receive SDP offer                                       │
│                                 │                                           │
│                                 ▼                                           │
│                        ┌──────────────────┐                                 │
│                        │   NEGOTIATING    │                                 │
│                        └────────┬─────────┘                                 │
│                                 │                                           │
│                    ICE + DTLS complete                                      │
│                                 │                                           │
│                                 ▼                                           │
│                        ┌──────────────────┐                                 │
│                        │   CONNECTING     │◀──────────┐                    │
│                        └────────┬─────────┘           │                    │
│                                 │              ICE restart                  │
│                       connected │                     │                    │
│                                 │                     │                    │
│                                 ▼                     │                    │
│                        ┌──────────────────┐           │                    │
│                        │    CONNECTED     │           │                    │
│                        └────────┬─────────┘           │                    │
│                                 │                     │                    │
│                         start()│                      │                    │
│                                 │                     │                    │
│                                 ▼                     │                    │
│                        ┌──────────────────┐           │                    │
│                        │     ACTIVE       │───────────┘                    │
│                        └────────┬─────────┘     ICE disconnected           │
│                                 │                                           │
│              hangup / timeout / error                                       │
│                                 │                                           │
│                                 ▼                                           │
│                       ┌───────────────────┐                                 │
│                       │  DISCONNECTING    │                                 │
│                       └─────────┬─────────┘                                 │
│                                 │                                           │
│                        cleanup complete                                     │
│                                 │                                           │
│                                 ▼                                           │
│                       ┌───────────────────┐                                 │
│                       │   TERMINATED      │                                 │
│                       └───────────────────┘                                 │
│                                                                             │
│   ERROR TRANSITIONS:                                                        │
│   • Any state can transition to FAILED on unrecoverable error              │
│   • Timeout in INITIALIZING/NEGOTIATING/CONNECTING → FAILED                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
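The state machine above can be enforced in code with an enum and a transition table; a sketch whose state names mirror the diagram (the real bridge implementation may differ):

```python
from enum import Enum, auto

class BridgeState(Enum):
    CREATED = auto()
    INITIALIZING = auto()
    NEGOTIATING = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    ACTIVE = auto()
    DISCONNECTING = auto()
    TERMINATED = auto()
    FAILED = auto()

# Legal transitions from the diagram; FAILED is reachable from any state
TRANSITIONS = {
    BridgeState.CREATED: {BridgeState.INITIALIZING},
    BridgeState.INITIALIZING: {BridgeState.NEGOTIATING},
    BridgeState.NEGOTIATING: {BridgeState.CONNECTING},
    BridgeState.CONNECTING: {BridgeState.CONNECTED},
    BridgeState.CONNECTED: {BridgeState.ACTIVE},
    # ACTIVE can loop back to CONNECTING on ICE restart
    BridgeState.ACTIVE: {BridgeState.CONNECTING, BridgeState.DISCONNECTING},
    BridgeState.DISCONNECTING: {BridgeState.TERMINATED},
    BridgeState.TERMINATED: set(),
    BridgeState.FAILED: set(),
}

def transition(current: BridgeState, new: BridgeState) -> BridgeState:
    """Validate a state change; FAILED is always a legal target."""
    if new is BridgeState.FAILED or new in TRANSITIONS[current]:
        return new
    raise ValueError(f"Illegal transition: {current.name} -> {new.name}")
```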

28.8 Environment Variables

The bridge requires these environment variables:
# WebRTC Configuration
STUN_SERVERS="stun:stun.l.google.com:19302,stun:stun1.l.google.com:19302"
TURN_SERVER_URL="turn:turn.example.com:3478"
TURN_SERVER_USERNAME="turnuser"
TURN_SERVER_PASSWORD="turnpassword"

# LiveKit Configuration  
LIVEKIT_URL="wss://aiconnected.livekit.cloud"
LIVEKIT_API_KEY="your-api-key"
LIVEKIT_API_SECRET="your-api-secret"

# Bridge Settings
BRIDGE_MAX_CONCURRENT_CALLS=1000
BRIDGE_AUDIO_BUFFER_MS=100
BRIDGE_CONNECTION_TIMEOUT_MS=30000
BRIDGE_HEALTH_CHECK_INTERVAL_MS=10000

# Logging
LOG_LEVEL="INFO"
LOG_AUDIO_FRAMES="false"  # Enable for debugging
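A small loader can turn these variables into typed settings. A sketch assuming the variable names above (the `load_bridge_config` helper is illustrative):

```python
import os

def load_bridge_config(env=os.environ) -> dict:
    """Parse bridge environment variables with the documented defaults."""
    return {
        # STUN_SERVERS is a comma-separated list
        "stun_servers": [
            s.strip()
            for s in env.get("STUN_SERVERS", "stun:stun.l.google.com:19302").split(",")
            if s.strip()
        ],
        "max_concurrent_calls": int(env.get("BRIDGE_MAX_CONCURRENT_CALLS", "1000")),
        "audio_buffer_ms": int(env.get("BRIDGE_AUDIO_BUFFER_MS", "100")),
        "connection_timeout_ms": int(env.get("BRIDGE_CONNECTION_TIMEOUT_MS", "30000")),
        # Boolean flags are lowercase "true"/"false" strings
        "log_audio_frames": env.get("LOG_AUDIO_FRAMES", "false").lower() == "true",
    }

cfg = load_bridge_config({"STUN_SERVERS": "stun:a:3478, stun:b:3478"})
```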

Section 29: aiortc WebRTC Implementation

29.1 What is WebRTC?

Before diving into code, let’s understand WebRTC (Web Real-Time Communication):

Core Concepts

Peer Connection: A connection between two endpoints that can carry audio/video/data.

SDP (Session Description Protocol): A text format describing what media capabilities each peer has. Example:
v=0
o=- 12345 1 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
ICE (Interactive Connectivity Establishment): The process of finding a network path between two peers, handling NAT traversal.

STUN Server: Tells a client its public IP address (for direct connections).

TURN Server: Relays media when a direct connection is impossible (firewall blocking).
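The `a=rtpmap` lines in the SDP snippet above are what the bridge inspects during codec negotiation. A minimal parser for that snippet:

```python
def parse_rtpmap(sdp: str) -> dict:
    """Extract payload-type -> codec mappings from a=rtpmap lines."""
    codecs = {}
    for line in sdp.splitlines():
        if line.startswith("a=rtpmap:"):
            # e.g. "a=rtpmap:111 opus/48000/2" -> {111: "opus/48000/2"}
            payload, codec = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(payload)] = codec
    return codecs

sdp = """v=0
o=- 12345 1 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000"""

codecs = parse_rtpmap(sdp)
```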

Offer/Answer Model

WebRTC uses an “offer/answer” model:
  1. Offerer creates an SDP offer listing their capabilities
  2. Answerer receives offer, creates answer with compatible settings
  3. Both exchange ICE candidates (network paths)
  4. Connection established when media flows
┌───────────────┐                    ┌───────────────┐
│   Offerer     │                    │   Answerer    │
│  (GoToConnect)│                    │   (Bridge)    │
└───────┬───────┘                    └───────┬───────┘
        │                                    │
        │  1. createOffer()                  │
        │────────────────────────────────────▶
        │        SDP Offer                   │
        │                                    │
        │                    2. setRemoteDescription(offer)
        │                    3. createAnswer()
        │                                    │
        │◀────────────────────────────────────
        │        SDP Answer                  │
        │                                    │
        │  4. setRemoteDescription(answer)   │
        │                                    │
        │  5. ICE candidates (trickle)       │
        │◀───────────────────────────────────▶
        │                                    │
        │  6. DTLS handshake                 │
        │◀═══════════════════════════════════▶
        │                                    │
        │  7. SRTP media                     │
        │◀═══════════════════════════════════▶
        │                                    │

29.2 aiortc Basics

Installation

pip install aiortc

Core Classes

from aiortc import (
    RTCPeerConnection,      # Main WebRTC connection
    RTCSessionDescription,  # SDP offer/answer
    RTCIceCandidate,        # ICE candidate
    RTCConfiguration,       # STUN/TURN config
    RTCIceServer,           # Individual ICE server
    MediaStreamTrack,       # Base class for audio/video
)
from aiortc.mediastreams import AudioStreamTrack
import av  # PyAV for audio frames

29.3 WebRTC Connection Implementation

Here’s our WebRTC connection wrapper:
# bridge/webrtc/connection.py

import asyncio
from aiortc import (
    RTCPeerConnection,
    RTCConfiguration,
    RTCIceServer,
    RTCSessionDescription,
    RTCIceCandidate,
    MediaStreamTrack,
)
from dataclasses import dataclass
from typing import Optional, Callable, List
import logging

logger = logging.getLogger(__name__)


@dataclass
class WebRTCConfig:
    """Configuration for WebRTC connection."""
    
    # STUN servers for NAT traversal
    stun_servers: Optional[List[str]] = None
    
    # TURN servers for relay (when direct fails)
    turn_servers: Optional[List[dict]] = None
    
    # ICE transport policy: "all" or "relay"
    ice_transport_policy: str = "all"
    
    # Bundle policy for media
    bundle_policy: str = "max-bundle"
    
    def __post_init__(self):
        """Set default STUN servers if none provided."""
        if self.stun_servers is None:
            self.stun_servers = [
                "stun:stun.l.google.com:19302",
                "stun:stun1.l.google.com:19302",
            ]


class WebRTCConnection:
    """
    Wrapper around aiortc RTCPeerConnection.
    
    Provides a simplified interface for managing
    WebRTC connections with proper lifecycle handling.
    
    Example usage:
        config = WebRTCConfig()
        conn = WebRTCConnection(config, call_id="call-123")
        
        await conn.initialize()
        await conn.set_remote_description(sdp_offer, "offer")
        answer = await conn.create_answer()
        
        # Connection events are handled via callbacks
        conn.on_track = handle_remote_track
        conn.on_connection_state_change = handle_state_change
    """
    
    def __init__(
        self,
        config: WebRTCConfig = None,
        call_id: str = None,
    ):
        self.config = config or WebRTCConfig()
        self.call_id = call_id or "unknown"
        
        # The underlying RTCPeerConnection
        self._pc: Optional[RTCPeerConnection] = None
        
        # Track management
        self._local_tracks: List[MediaStreamTrack] = []
        self._remote_tracks: List[MediaStreamTrack] = []
        
        # Event callbacks (set by user)
        self.on_track: Optional[Callable] = None
        self.on_ice_candidate: Optional[Callable] = None
        self.on_connection_state_change: Optional[Callable] = None
        self.on_ice_connection_state_change: Optional[Callable] = None
    
    async def initialize(self) -> None:
        """
        Initialize the peer connection.
        
        Must be called before any other operations.
        """
        # Build ICE server configuration
        ice_servers = self._build_ice_servers()
        
        # Create RTCConfiguration
        # Note: aiortc's RTCConfiguration only accepts iceServers;
        # iceTransportPolicy and bundlePolicy are browser-only options
        # and are kept on WebRTCConfig for documentation purposes.
        rtc_config = RTCConfiguration(iceServers=ice_servers)
        
        # Create peer connection
        self._pc = RTCPeerConnection(configuration=rtc_config)
        
        # Set up event handlers
        self._pc.on("track", self._handle_track)
        # Note: aiortc gathers ICE candidates during setLocalDescription
        # (no trickle ICE), so "icecandidate" may never fire; the handler
        # is registered for interface symmetry.
        self._pc.on("icecandidate", self._handle_ice_candidate)
        self._pc.on("connectionstatechange", self._handle_connection_state_change)
        self._pc.on("iceconnectionstatechange", self._handle_ice_connection_state_change)
        
        logger.info(f"[{self.call_id}] WebRTC connection initialized")
    
    def _build_ice_servers(self) -> List[RTCIceServer]:
        """Build ICE server configuration from config."""
        servers = []
        
        # Add STUN servers
        for url in self.config.stun_servers:
            servers.append(RTCIceServer(urls=[url]))
        
        # Add TURN servers (if configured)
        if self.config.turn_servers:
            for turn in self.config.turn_servers:
                servers.append(RTCIceServer(
                    urls=[turn["url"]],
                    username=turn.get("username"),
                    credential=turn.get("credential"),
                ))
        
        return servers
    
    async def add_track(self, track: MediaStreamTrack) -> None:
        """
        Add a local track to the connection.
        
        Tracks must be added before creating an offer
        or after receiving a remote offer.
        
        Args:
            track: Audio or video track to add
        """
        if self._pc is None:
            raise RuntimeError("Connection not initialized")
        
        self._pc.addTrack(track)
        self._local_tracks.append(track)
        logger.debug(f"[{self.call_id}] Added track: {track.kind}")
    
    async def create_offer(self) -> str:
        """
        Create an SDP offer.
        
        Use when initiating a connection (outbound calls).
        
        Returns:
            SDP offer string
        """
        if self._pc is None:
            raise RuntimeError("Connection not initialized")
        
        offer = await self._pc.createOffer()
        await self._pc.setLocalDescription(offer)
        
        logger.debug(f"[{self.call_id}] Created offer")
        return self._pc.localDescription.sdp
    
    async def create_answer(self) -> str:
        """
        Create an SDP answer.
        
        Use after receiving a remote offer (inbound calls).
        
        Returns:
            SDP answer string
        """
        if self._pc is None:
            raise RuntimeError("Connection not initialized")
        
        answer = await self._pc.createAnswer()
        await self._pc.setLocalDescription(answer)
        
        logger.debug(f"[{self.call_id}] Created answer")
        return self._pc.localDescription.sdp
    
    async def set_remote_description(
        self,
        sdp: str,
        sdp_type: str,
    ) -> None:
        """
        Set the remote SDP description.
        
        Args:
            sdp: Raw SDP string
            sdp_type: "offer" or "answer"
        """
        if self._pc is None:
            raise RuntimeError("Connection not initialized")
        
        description = RTCSessionDescription(sdp=sdp, type=sdp_type)
        await self._pc.setRemoteDescription(description)
        
        logger.debug(f"[{self.call_id}] Set remote description: {sdp_type}")
    
    async def add_ice_candidate(
        self,
        candidate: str,
        sdp_mid: str,
        sdp_mline_index: int,
    ) -> None:
        """
        Add a remote ICE candidate.
        
        Called when receiving ICE candidates from remote peer.
        
        Args:
            candidate: ICE candidate string
            sdp_mid: Media stream ID
            sdp_mline_index: Media line index
        """
        if self._pc is None:
            raise RuntimeError("Connection not initialized")
        
        # aiortc's RTCIceCandidate cannot be constructed from the raw
        # "candidate:..." string; parse it with candidate_from_sdp first
        from aiortc.sdp import candidate_from_sdp
        
        sdp_part = candidate.split(":", 1)[1] if candidate.startswith("candidate:") else candidate
        ice_candidate = candidate_from_sdp(sdp_part)
        ice_candidate.sdpMid = sdp_mid
        ice_candidate.sdpMLineIndex = sdp_mline_index
        
        await self._pc.addIceCandidate(ice_candidate)
        logger.debug(f"[{self.call_id}] Added ICE candidate")
    
    async def close(self) -> None:
        """
        Close the connection and cleanup resources.
        
        Always call this when done with the connection.
        """
        if self._pc:
            await self._pc.close()
            self._pc = None
        
        # Stop all local tracks
        for track in self._local_tracks:
            track.stop()
        
        self._local_tracks.clear()
        self._remote_tracks.clear()
        
        logger.info(f"[{self.call_id}] WebRTC connection closed")
    
    # Event handlers
    
    def _handle_track(self, track: MediaStreamTrack) -> None:
        """Handle incoming track from remote peer."""
        logger.info(f"[{self.call_id}] Received track: {track.kind}")
        self._remote_tracks.append(track)
        
        if self.on_track:
            asyncio.create_task(self.on_track(track))
    
    def _handle_ice_candidate(self, candidate: RTCIceCandidate) -> None:
        """Handle locally gathered ICE candidate."""
        if candidate and self.on_ice_candidate:
            asyncio.create_task(self.on_ice_candidate(candidate))
    
    def _handle_connection_state_change(self) -> None:
        """Handle connection state change."""
        state = self._pc.connectionState if self._pc else "closed"
        logger.info(f"[{self.call_id}] Connection state: {state}")
        
        if self.on_connection_state_change:
            asyncio.create_task(self.on_connection_state_change(state))
    
    def _handle_ice_connection_state_change(self) -> None:
        """Handle ICE connection state change."""
        state = self._pc.iceConnectionState if self._pc else "closed"
        logger.info(f"[{self.call_id}] ICE state: {state}")
        
        if self.on_ice_connection_state_change:
            asyncio.create_task(self.on_ice_connection_state_change(state))
    
    # Properties
    
    @property
    def connection_state(self) -> str:
        """Current connection state."""
        return self._pc.connectionState if self._pc else "closed"
    
    @property
    def ice_connection_state(self) -> str:
        """Current ICE connection state."""
        return self._pc.iceConnectionState if self._pc else "closed"
    
    @property
    def signaling_state(self) -> str:
        """Current signaling state."""
        return self._pc.signalingState if self._pc else "closed"
    
    @property
    def remote_tracks(self) -> List[MediaStreamTrack]:
        """List of remote tracks."""
        return self._remote_tracks.copy()

29.4 SDP Negotiation

SDP parsing and manipulation is complex. Here’s our negotiator:
# bridge/webrtc/sdp_negotiator.py

import re
from dataclasses import dataclass, field
from typing import List, Optional
from enum import Enum


class SDPType(Enum):
    """SDP message types."""
    OFFER = "offer"
    ANSWER = "answer"
    PRANSWER = "pranswer"


@dataclass
class CodecInfo:
    """Information about a codec in SDP."""
    payload_type: int   # e.g., 111 for Opus
    name: str           # e.g., "opus"
    clock_rate: int     # e.g., 48000
    channels: int = 1   # e.g., 2 for stereo
    fmtp: Optional[str] = None  # Format parameters


@dataclass
class MediaDescription:
    """Parsed media section from SDP."""
    media_type: str      # "audio" or "video"
    port: int            # Port number (usually 9)
    protocol: str        # e.g., "UDP/TLS/RTP/SAVPF"
    formats: List[int]   # Payload type numbers
    codecs: List[CodecInfo] = field(default_factory=list)
    direction: str = "sendrecv"  # sendrecv, sendonly, recvonly, inactive
    ice_ufrag: Optional[str] = None
    ice_pwd: Optional[str] = None
    fingerprint: Optional[str] = None
    setup: Optional[str] = None  # actpass, active, passive
    mid: Optional[str] = None
    candidates: List[str] = field(default_factory=list)


@dataclass
class ParsedSDP:
    """Fully parsed SDP."""
    version: int
    origin: str
    session_name: str
    timing: str
    media: List[MediaDescription] = field(default_factory=list)
    ice_ufrag: Optional[str] = None
    ice_pwd: Optional[str] = None
    fingerprint: Optional[str] = None
    groups: List[str] = field(default_factory=list)


class SDPNegotiator:
    """
    Handles SDP parsing and codec negotiation.
    
    Manages codec selection between GoToConnect and our bridge,
    ensuring we use the best available codec.
    
    Preferred codecs (in order):
    1. Opus (48kHz, stereo) - Best quality
    2. PCMU (G.711 μ-law, 8kHz) - Telephony standard
    3. PCMA (G.711 A-law, 8kHz) - European telephony
    """
    
    # Preferred codecs in priority order
    PREFERRED_AUDIO_CODECS = [
        ("opus", 48000, 2),    # Opus stereo
        ("PCMU", 8000, 1),     # G.711 μ-law
        ("PCMA", 8000, 1),     # G.711 A-law
    ]
    
    def parse_sdp(self, sdp: str) -> ParsedSDP:
        """
        Parse an SDP string into structured data.
        
        Args:
            sdp: Raw SDP string
        
        Returns:
            ParsedSDP with all sections parsed
        
        Example:
            >>> negotiator = SDPNegotiator()
            >>> parsed = negotiator.parse_sdp(raw_sdp)
            >>> print(parsed.media[0].codecs)
        """
        # Handle both \r\n and \n line endings
        lines = sdp.strip().split('\r\n')
        if len(lines) == 1:
            lines = sdp.strip().split('\n')
        
        parsed = ParsedSDP(
            version=0,
            origin="",
            session_name="",
            timing="",
        )
        
        current_media: Optional[MediaDescription] = None
        
        for line in lines:
            if not line or '=' not in line:
                continue
            
            key, value = line[0], line[2:]
            
            # Session-level attributes
            if key == 'v':
                parsed.version = int(value)
            elif key == 'o':
                parsed.origin = value
            elif key == 's':
                parsed.session_name = value
            elif key == 't':
                parsed.timing = value
            elif key == 'm':
                # New media section
                if current_media:
                    parsed.media.append(current_media)
                current_media = self._parse_media_line(value)
            elif key == 'a' and current_media:
                # Media-level attribute
                self._parse_media_attribute(current_media, value)
            elif key == 'a':
                # Session-level attribute
                self._parse_session_attribute(parsed, value)
        
        # Don't forget the last media section
        if current_media:
            parsed.media.append(current_media)
        
        return parsed
    
    def _parse_media_line(self, value: str) -> MediaDescription:
        """Parse m= line (e.g., 'm=audio 9 UDP/TLS/RTP/SAVPF 111 0 8')."""
        parts = value.split()
        return MediaDescription(
            media_type=parts[0],
            port=int(parts[1]),
            protocol=parts[2],
            formats=[int(f) for f in parts[3:]],
        )
    
    def _parse_media_attribute(
        self,
        media: MediaDescription,
        value: str,
    ) -> None:
        """Parse media-level attribute."""
        if value.startswith("rtpmap:"):
            codec = self._parse_rtpmap(value[7:])
            if codec:
                media.codecs.append(codec)
        elif value.startswith("fmtp:"):
            self._attach_fmtp(media, value[5:])
        elif value.startswith("ice-ufrag:"):
            media.ice_ufrag = value[10:]
        elif value.startswith("ice-pwd:"):
            media.ice_pwd = value[8:]
        elif value.startswith("fingerprint:"):
            media.fingerprint = value[12:]
        elif value.startswith("setup:"):
            media.setup = value[6:]
        elif value.startswith("mid:"):
            media.mid = value[4:]
        elif value.startswith("candidate:"):
            media.candidates.append(value)
        elif value in ("sendrecv", "sendonly", "recvonly", "inactive"):
            media.direction = value
    
    def _parse_session_attribute(
        self,
        parsed: ParsedSDP,
        value: str,
    ) -> None:
        """Parse session-level attribute."""
        if value.startswith("ice-ufrag:"):
            parsed.ice_ufrag = value[10:]
        elif value.startswith("ice-pwd:"):
            parsed.ice_pwd = value[8:]
        elif value.startswith("fingerprint:"):
            parsed.fingerprint = value[12:]
        elif value.startswith("group:"):
            parsed.groups.append(value[6:])
    
    def _parse_rtpmap(self, value: str) -> Optional[CodecInfo]:
        """Parse rtpmap attribute (e.g., '111 opus/48000/2')."""
        # [\w\-] so hyphenated names like "telephone-event" also match
        match = re.match(r'(\d+)\s+([\w\-]+)/(\d+)(?:/(\d+))?', value)
        if match:
            return CodecInfo(
                payload_type=int(match.group(1)),
                name=match.group(2),
                clock_rate=int(match.group(3)),
                channels=int(match.group(4)) if match.group(4) else 1,
            )
        return None
    
    def _attach_fmtp(
        self,
        media: MediaDescription,
        value: str,
    ) -> None:
        """Attach fmtp parameters to matching codec."""
        parts = value.split(' ', 1)
        if len(parts) == 2:
            payload_type = int(parts[0])
            for codec in media.codecs:
                if codec.payload_type == payload_type:
                    codec.fmtp = parts[1]
                    break
    
    def negotiate_codecs(
        self,
        offered: List[CodecInfo],
    ) -> List[CodecInfo]:
        """
        Negotiate codecs from an offer.
        
        Returns codecs in our preferred order that are
        also supported by the remote peer.
        
        Args:
            offered: List of codecs from remote SDP
        
        Returns:
            List of mutually supported codecs
        """
        negotiated = []
        
        for pref_name, pref_rate, pref_channels in self.PREFERRED_AUDIO_CODECS:
            for offered_codec in offered:
                if (offered_codec.name.lower() == pref_name.lower() and
                    offered_codec.clock_rate == pref_rate):
                    negotiated.append(offered_codec)
                    break
        
        return negotiated
    
    def get_best_codec(self, offered: List[CodecInfo]) -> Optional[CodecInfo]:
        """
        Get the single best codec from an offer.
        
        Args:
            offered: List of codecs from remote SDP
        
        Returns:
            Best codec or None if no compatible codec found
        """
        negotiated = self.negotiate_codecs(offered)
        return negotiated[0] if negotiated else None
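The preference logic above can be exercised in isolation. Here is a minimal, self-contained sketch (the `parse_rtpmap` and `best_codec` helpers are illustrative stand-ins that mirror the class methods) showing why an offer that lists G.711 first still negotiates Opus:

```python
import re

# Our preference order, independent of the order in the remote offer
PREFERRED = [("opus", 48000), ("PCMU", 8000), ("PCMA", 8000)]

def parse_rtpmap(value: str):
    """Parse an rtpmap value like '111 opus/48000/2' into (pt, name, rate, channels)."""
    m = re.match(r'(\d+)\s+([\w\-]+)/(\d+)(?:/(\d+))?', value)
    if not m:
        return None
    return (int(m.group(1)), m.group(2), int(m.group(3)),
            int(m.group(4)) if m.group(4) else 1)

def best_codec(rtpmaps):
    """Return the first offered codec that matches our preference order."""
    codecs = [c for c in (parse_rtpmap(v) for v in rtpmaps) if c]
    for name, rate in PREFERRED:
        for c in codecs:
            if c[1].lower() == name.lower() and c[2] == rate:
                return c
    return None

# An offer listing G.711 before Opus still negotiates Opus:
offer = ["0 PCMU/8000", "8 PCMA/8000", "111 opus/48000/2"]
print(best_codec(offer))  # (111, 'opus', 48000, 2)
```

The key point is that selection iterates over *our* preference list first, so the remote peer's ordering never demotes a higher-quality codec.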

29.5 ICE Candidate Handling

ICE candidates are exchanged asynchronously (trickle ICE):
# bridge/webrtc/ice_manager.py

import asyncio
from dataclasses import dataclass
from typing import List, Optional, Callable
from enum import Enum
import logging

logger = logging.getLogger(__name__)


class ICEConnectionState(Enum):
    """ICE connection states."""
    NEW = "new"
    CHECKING = "checking"
    CONNECTED = "connected"
    COMPLETED = "completed"
    DISCONNECTED = "disconnected"
    FAILED = "failed"
    CLOSED = "closed"


@dataclass
class ICECandidate:
    """Parsed ICE candidate."""
    foundation: str
    component: int
    protocol: str
    priority: int
    ip: str
    port: int
    type: str  # host, srflx, relay
    related_address: Optional[str] = None
    related_port: Optional[int] = None
    
    @classmethod
    def parse(cls, candidate_str: str) -> Optional["ICECandidate"]:
        """
        Parse ICE candidate string.
        
        Example candidate:
        "candidate:0 1 UDP 2130706431 192.168.1.100 54321 typ host"
        """
        try:
            parts = candidate_str.split()
            if not parts[0].startswith("candidate:"):
                return None
            
            return cls(
                foundation=parts[0].split(":")[1],
                component=int(parts[1]),
                protocol=parts[2],
                priority=int(parts[3]),
                ip=parts[4],
                port=int(parts[5]),
                type=parts[7],
            )
        except (IndexError, ValueError):
            return None


class ICEManager:
    """
    Manages ICE candidate exchange.
    
    Handles:
    - Gathering local candidates
    - Processing remote candidates
    - Tracking connectivity state
    - Candidate trickling to GoToConnect
    """
    
    def __init__(
        self,
        call_id: str,
        goto_client: "GoToCallControlClient",
    ):
        self.call_id = call_id
        self.goto_client = goto_client
        
        # Candidate tracking
        self._local_candidates: List[ICECandidate] = []
        self._remote_candidates: List[ICECandidate] = []
        self._pending_remote: asyncio.Queue = asyncio.Queue()
        
        # State
        self._connection_state = ICEConnectionState.NEW
        self._gathering_complete = False
        
        # Callbacks
        self.on_connection_state_change: Optional[Callable] = None
    
    async def handle_local_candidate(
        self,
        candidate: "RTCIceCandidate",
    ) -> None:
        """
        Handle a locally gathered ICE candidate.
        
        Sends the candidate to GoToConnect via their API.
        
        Args:
            candidate: aiortc RTCIceCandidate
        """
        if candidate is None:
            # Gathering complete
            self._gathering_complete = True
            logger.info(f"[{self.call_id}] ICE gathering complete")
            return
        
        # Parse for logging
        parsed = ICECandidate.parse(candidate.candidate)
        if parsed:
            self._local_candidates.append(parsed)
            logger.debug(
                f"[{self.call_id}] Local candidate: "
                f"{parsed.type} {parsed.ip}:{parsed.port}"
            )
        
        # Send to GoToConnect
        try:
            await self.goto_client.send_ice_candidate(
                call_id=self.call_id,
                candidate=candidate.candidate,
                sdp_mid=candidate.sdpMid,
                sdp_mline_index=candidate.sdpMLineIndex,
            )
        except Exception as e:
            logger.error(f"[{self.call_id}] Failed to send ICE candidate: {e}")
    
    async def handle_remote_candidate(
        self,
        candidate_data: dict,
        peer_connection: "RTCPeerConnection",
    ) -> None:
        """
        Handle a remote ICE candidate from GoToConnect.
        
        Args:
            candidate_data: Dict with candidate, sdpMid, sdpMLineIndex
            peer_connection: aiortc peer connection to add candidate to
        """
        from aiortc.sdp import candidate_from_sdp
        
        candidate_str = candidate_data.get("candidate", "")
        
        # Parse for logging
        parsed = ICECandidate.parse(candidate_str)
        if parsed:
            self._remote_candidates.append(parsed)
            logger.debug(
                f"[{self.call_id}] Remote candidate: "
                f"{parsed.type} {parsed.ip}:{parsed.port}"
            )
        
        # aiortc cannot build an RTCIceCandidate from the raw SDP string;
        # candidate_from_sdp expects the "candidate:" prefix stripped
        sdp_part = (
            candidate_str.split(":", 1)[1]
            if candidate_str.startswith("candidate:")
            else candidate_str
        )
        ice_candidate = candidate_from_sdp(sdp_part)
        ice_candidate.sdpMid = candidate_data.get("sdpMid", "0")
        ice_candidate.sdpMLineIndex = candidate_data.get("sdpMLineIndex", 0)
        
        await peer_connection.addIceCandidate(ice_candidate)
    
    def update_connection_state(self, state: str) -> None:
        """Update ICE connection state."""
        try:
            new_state = ICEConnectionState(state)
        except ValueError:
            logger.warning(f"[{self.call_id}] Unknown ICE state: {state}")
            return
        
        old_state = self._connection_state
        self._connection_state = new_state
        
        logger.info(
            f"[{self.call_id}] ICE state: "
            f"{old_state.value} → {new_state.value}"
        )
        
        if self.on_connection_state_change:
            asyncio.create_task(
                self.on_connection_state_change(new_state)
            )
    
    @property
    def is_connected(self) -> bool:
        """Whether ICE is in a connected state."""
        return self._connection_state in (
            ICEConnectionState.CONNECTED,
            ICEConnectionState.COMPLETED,
        )
    
    @property
    def local_candidates_count(self) -> int:
        """Number of local candidates gathered."""
        return len(self._local_candidates)
    
    @property
    def remote_candidates_count(self) -> int:
        """Number of remote candidates received."""
        return len(self._remote_candidates)
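The candidate grammar that `ICECandidate.parse` relies on comes from RFC 5245: whitespace-separated fields, with the literal token `typ` sitting immediately before the candidate type. A standalone sketch of the same field splitting (the `parse_candidate` helper is illustrative):

```python
def parse_candidate(line: str):
    """Split a candidate line into its RFC 5245 fields, or return None if malformed."""
    parts = line.split()
    if not parts[0].startswith("candidate:"):
        return None
    return {
        "foundation": parts[0].split(":", 1)[1],
        "component": int(parts[1]),   # 1 = RTP, 2 = RTCP
        "protocol": parts[2],         # UDP or TCP
        "priority": int(parts[3]),
        "ip": parts[4],
        "port": int(parts[5]),
        "type": parts[7],             # parts[6] is the literal token "typ"
    }

host = parse_candidate("candidate:0 1 UDP 2130706431 192.168.1.100 54321 typ host")
relay = parse_candidate("candidate:2 1 UDP 16777215 203.0.113.9 61000 typ relay")
print(host["type"], relay["type"])  # host relay
```

Host candidates carry the machine's local address, `srflx` candidates carry the public address learned via STUN, and `relay` candidates point at a TURN server; priorities are ordered so that direct paths are tried before relays.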

Section 30: Audio Capture & Processing

30.1 Audio Fundamentals

Before processing audio, understand these concepts:

Sample Rate

How many audio samples per second:
  • 8000 Hz: Telephone quality (G.711)
  • 16000 Hz: Wideband telephony
  • 48000 Hz: High-quality audio (Opus default)
Higher sample rate = better quality but more bandwidth.

Bit Depth

How many bits per sample:
  • 16-bit: Standard for voice (-32768 to 32767)
  • 32-bit float: Used in processing

Frame Size

Audio is processed in chunks called “frames”:
| Duration | 8 kHz | 16 kHz | 48 kHz |
| --- | --- | --- | --- |
| 10 ms | 80 samples | 160 samples | 480 samples |
| 20 ms | 160 samples | 320 samples | 960 samples |
| 40 ms | 320 samples | 640 samples | 1920 samples |
20 ms at 48 kHz (960 samples) is the most common frame size for Opus.

Byte Size

For 16-bit mono audio, bytes = samples × 2:
| Duration | 8 kHz | 16 kHz | 48 kHz |
| --- | --- | --- | --- |
| 10 ms | 160 bytes | 320 bytes | 960 bytes |
| 20 ms | 320 bytes | 640 bytes | 1920 bytes |
| 40 ms | 640 bytes | 1280 bytes | 3840 bytes |
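Both tables follow from two formulas: samples = rate × duration / 1000, and bytes = samples × channels × 2 for 16-bit PCM. A quick sanity check (the helper names are illustrative, not part of the codebase):

```python
def frame_samples(sample_rate_hz: int, duration_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    return sample_rate_hz * duration_ms // 1000

def frame_bytes(sample_rate_hz: int, duration_ms: int,
                channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size of one frame in bytes for 16-bit PCM."""
    return frame_samples(sample_rate_hz, duration_ms) * channels * bytes_per_sample

# 20ms at 48kHz: the standard Opus frame
print(frame_samples(48000, 20))  # 960
print(frame_bytes(48000, 20))    # 1920
```

Keeping these two formulas in mind makes buffer sizing throughout the pipeline mechanical rather than mysterious.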

30.2 Audio Frame Processing with PyAV

aiortc uses PyAV (FFmpeg bindings) for audio frames:
# bridge/audio/frames.py

import av
import numpy as np
import fractions


class AudioFrameProcessor:
    """
    Utilities for working with PyAV AudioFrames.
    
    Handles conversion between AudioFrame and numpy arrays,
    as well as common audio manipulations.
    """
    
    @staticmethod
    def frame_to_numpy(frame: av.AudioFrame) -> np.ndarray:
        """
        Convert AudioFrame to numpy array.
        
        Args:
            frame: PyAV AudioFrame
        
        Returns:
            numpy array of shape (samples, channels)
        
        Example:
            >>> frame = await track.recv()
            >>> audio = AudioFrameProcessor.frame_to_numpy(frame)
            >>> print(audio.shape)  # (960, 1) for 20ms mono
        """
        # to_ndarray() returns (channels, samples) for planar formats and
        # (1, samples * channels) interleaved for packed formats like "s16";
        # for mono audio the two layouts are identical
        data = frame.to_ndarray()
        
        # Transpose to (samples, channels) for easier processing
        if data.ndim == 2:
            data = data.T
        
        return data
    
    @staticmethod
    def numpy_to_frame(
        data: np.ndarray,
        sample_rate: int,
        pts: int = 0,
        format: str = "s16",
        layout: str = "mono",
    ) -> av.AudioFrame:
        """
        Convert numpy array to AudioFrame.
        
        Args:
            data: numpy array (samples,) or (samples, channels)
            sample_rate: Sample rate in Hz (e.g., 48000)
            pts: Presentation timestamp (frame number)
            format: Audio format ("s16" for 16-bit signed)
            layout: Channel layout ("mono" or "stereo")
        
        Returns:
            PyAV AudioFrame
        
        Example:
            >>> audio = np.zeros(960, dtype=np.int16)  # Silence
            >>> frame = AudioFrameProcessor.numpy_to_frame(
            ...     audio, sample_rate=48000, pts=0
            ... )
        """
        # Ensure 2D array
        if data.ndim == 1:
            data = data.reshape(-1, 1)
        
        samples = data.shape[0]
        channels = data.shape[1]
        
        # Determine layout
        if channels == 1:
            layout = "mono"
        elif channels == 2:
            layout = "stereo"
        
        # Create frame
        frame = av.AudioFrame(
            format=format,
            layout=layout,
            samples=samples,
        )
        
        # Set frame data: packed "s16" stores samples interleaved, and a
        # C-order (samples, channels) array is already in interleaved order
        frame.planes[0].update(np.ascontiguousarray(data, dtype=np.int16).tobytes())
        
        frame.sample_rate = sample_rate
        frame.pts = pts
        frame.time_base = fractions.Fraction(1, sample_rate)
        
        return frame
    
    @staticmethod
    def get_frame_info(frame: av.AudioFrame) -> dict:
        """
        Get information about an audio frame.
        
        Useful for debugging and logging.
        """
        return {
            "samples": frame.samples,
            "sample_rate": frame.sample_rate,
            "channels": len(frame.layout.channels),
            "format": frame.format.name,
            "pts": frame.pts,
            "duration_ms": (frame.samples / frame.sample_rate) * 1000,
            "size_bytes": sum(len(p) for p in frame.planes),
        }
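A detail worth internalizing before touching `planes[0].update()`: packed `s16` stores samples interleaved (left, right, left, right, ...) as 16-bit signed integers in native byte order (little-endian on common platforms). A PyAV-free sketch using `struct` (helper name is illustrative):

```python
import struct

def interleave_s16(left, right):
    """Pack two mono channels into interleaved 16-bit little-endian bytes (packed 's16' layout)."""
    out = bytearray()
    for l, r in zip(left, right):
        # "<hh" = two little-endian signed 16-bit integers per sample pair
        out += struct.pack("<hh", l, r)
    return bytes(out)

data = interleave_s16([100, 200], [-100, -200])
# 2 samples x 2 channels x 2 bytes = 8 bytes
print(len(data))  # 8
print(struct.unpack("<4h", data))  # (100, -100, 200, -200)
```

Planar formats like `s16p` instead keep one plane per channel; mixing the two layouts up produces audio that plays but sounds garbled, which makes this bug easy to ship and hard to spot in logs.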

30.3 Audio Buffering

Audio streams need buffering to handle timing variations:
# bridge/audio/buffer.py

import asyncio
import numpy as np
import logging

logger = logging.getLogger(__name__)


class AudioBuffer:
    """
    Ring buffer for audio frames.
    
    Provides smooth audio flow by buffering frames
    and handling timing variations. Uses a circular
    buffer to efficiently manage memory.
    
    Example:
        buffer = AudioBuffer(
            max_duration_ms=500,
            sample_rate=48000,
            channels=1
        )
        
        # Write incoming audio
        await buffer.write(audio_data)
        
        # Read 20ms chunks for processing
        chunk = await buffer.read(960)  # 20ms at 48kHz
    """
    
    def __init__(
        self,
        max_duration_ms: float = 500,
        sample_rate: int = 48000,
        channels: int = 1,
    ):
        self.sample_rate = sample_rate
        self.channels = channels
        
        # Calculate buffer size
        max_samples = int(sample_rate * max_duration_ms / 1000)
        self._buffer = np.zeros((max_samples, channels), dtype=np.int16)
        
        # Ring buffer pointers
        self._write_pos = 0
        self._read_pos = 0
        self._available = 0
        
        # Thread safety
        self._lock = asyncio.Lock()
    
    async def write(self, data: np.ndarray) -> int:
        """
        Write audio data to buffer.
        
        Args:
            data: Audio samples (samples,) or (samples, channels)
        
        Returns:
            Number of samples written
        """
        async with self._lock:
            # Ensure correct shape
            if data.ndim == 1:
                data = data.reshape(-1, 1)
            
            samples = data.shape[0]
            buffer_size = self._buffer.shape[0]
            
            # Check available space
            space = buffer_size - self._available
            if samples > space:
                # Buffer full - drop oldest data
                drop = samples - space
                self._read_pos = (self._read_pos + drop) % buffer_size
                self._available -= drop
                logger.warning(f"Buffer overflow, dropped {drop} samples")
            
            # Write data
            end_pos = self._write_pos + samples
            
            if end_pos <= buffer_size:
                # Simple write
                self._buffer[self._write_pos:end_pos] = data
            else:
                # Wrap around
                first_part = buffer_size - self._write_pos
                self._buffer[self._write_pos:] = data[:first_part]
                self._buffer[:end_pos - buffer_size] = data[first_part:]
            
            self._write_pos = end_pos % buffer_size
            self._available += samples
            
            return samples
    
    async def read(self, samples: int) -> np.ndarray:
        """
        Read audio data from buffer.
        
        Args:
            samples: Number of samples to read
        
        Returns:
            Audio data or silence if not enough available
        """
        async with self._lock:
            buffer_size = self._buffer.shape[0]
            
            if self._available < samples:
                # Not enough data - return silence
                logger.debug(
                    f"Buffer underrun: wanted {samples}, "
                    f"have {self._available}"
                )
                return np.zeros((samples, self.channels), dtype=np.int16)
            
            # Read data
            end_pos = self._read_pos + samples
            
            if end_pos <= buffer_size:
                # Simple read
                data = self._buffer[self._read_pos:end_pos].copy()
            else:
                # Wrap around
                first_part = buffer_size - self._read_pos
                data = np.concatenate([
                    self._buffer[self._read_pos:],
                    self._buffer[:end_pos - buffer_size],
                ])
            
            self._read_pos = end_pos % buffer_size
            self._available -= samples
            
            return data
    
    @property
    def available_samples(self) -> int:
        """Number of samples available to read."""
        return self._available
    
    @property
    def available_ms(self) -> float:
        """Duration of audio available in milliseconds."""
        return (self._available / self.sample_rate) * 1000
    
    @property
    def buffer_utilization(self) -> float:
        """Buffer utilization as percentage (0-1)."""
        return self._available / self._buffer.shape[0]
    
    def clear(self) -> None:
        """Clear the buffer."""
        self._write_pos = 0
        self._read_pos = 0
        self._available = 0

30.4 Audio Resampling

Different parts of the pipeline use different sample rates:
# bridge/audio/resampler.py

import numpy as np
import av
from typing import Optional


class AudioResampler:
    """
    High-quality audio resampling.
    
    Converts between different sample rates while
    maintaining audio quality using PyAV's resampler
    (which uses FFmpeg internally).
    
    Common conversions:
    - 8kHz → 48kHz (G.711 to Opus)
    - 48kHz → 16kHz (Opus to STT)
    - 16kHz → 48kHz (TTS to Opus)
    """
    
    def __init__(
        self,
        input_rate: int,
        output_rate: int,
        input_channels: int = 1,
        output_channels: int = 1,
        input_format: str = "s16",
        output_format: str = "s16",
    ):
        self.input_rate = input_rate
        self.output_rate = output_rate
        self.input_channels = input_channels
        self.output_channels = output_channels
        
        # Determine layouts
        in_layout = "mono" if input_channels == 1 else "stereo"
        out_layout = "mono" if output_channels == 1 else "stereo"
        
        # Create PyAV resampler
        self._resampler = av.AudioResampler(
            format=output_format,
            layout=out_layout,
            rate=output_rate,
        )
        
        # Ratio for numpy-based fallback
        self._ratio = output_rate / input_rate
    
    def resample_frame(self, frame: av.AudioFrame) -> Optional[av.AudioFrame]:
        """
        Resample an AudioFrame.
        
        Args:
            frame: Input frame at input_rate
        
        Returns:
            Resampled frame at output_rate
        """
        frames = self._resampler.resample(frame)
        return frames[0] if frames else None
    
    def resample_numpy(self, data: np.ndarray) -> np.ndarray:
        """
        Resample numpy audio data.
        
        Args:
            data: Input samples (samples,) or (samples, channels)
        
        Returns:
            Resampled samples
        
        Note: For best quality, use resample_frame() when possible.
        This method uses linear interpolation.
        """
        if self.input_rate == self.output_rate:
            return data
        
        # Ensure 2D
        if data.ndim == 1:
            data = data.reshape(-1, 1)
        
        input_samples = data.shape[0]
        output_samples = int(input_samples * self._ratio)
        
        # Linear interpolation resampling
        indices = np.linspace(0, input_samples - 1, output_samples)
        
        output = np.zeros((output_samples, data.shape[1]), dtype=data.dtype)
        
        for ch in range(data.shape[1]):
            output[:, ch] = np.interp(
                indices,
                np.arange(input_samples),
                data[:, ch],
            ).astype(data.dtype)
        
        return output
    
    def flush(self) -> Optional[av.AudioFrame]:
        """Flush any remaining samples from resampler."""
        frames = self._resampler.resample(None)
        return frames[0] if frames else None


class MultiRateResampler:
    """
    Manages resamplers for multiple rate conversions.
    
    Caches resamplers for common conversions to avoid
    recreating them for each frame.
    """
    
    def __init__(self):
        self._resamplers: dict[tuple, AudioResampler] = {}
    
    def get_resampler(
        self,
        input_rate: int,
        output_rate: int,
        channels: int = 1,
    ) -> AudioResampler:
        """Get or create a resampler for the given rates."""
        key = (input_rate, output_rate, channels)
        
        if key not in self._resamplers:
            self._resamplers[key] = AudioResampler(
                input_rate=input_rate,
                output_rate=output_rate,
                input_channels=channels,
                output_channels=channels,
            )
        
        return self._resamplers[key]
    
    def resample(
        self,
        data: np.ndarray,
        input_rate: int,
        output_rate: int,
        channels: int = 1,
    ) -> np.ndarray:
        """Resample audio data using cached resampler."""
        resampler = self.get_resampler(input_rate, output_rate, channels)
        return resampler.resample_numpy(data)
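`resample_numpy` falls back to linear interpolation; the same idea in plain Python makes the mechanics explicit (the endpoint mapping matches the `np.linspace(0, input_samples - 1, output_samples)` used above — this is an illustrative sketch, not the production path):

```python
def resample_linear(data, in_rate: int, out_rate: int):
    """Linear-interpolation resampling of one channel, matching the numpy fallback."""
    if in_rate == out_rate:
        return list(data)
    n_in = len(data)
    n_out = int(n_in * out_rate / in_rate)
    out = []
    for i in range(n_out):
        # Fractional position in the input, spanning [0, n_in - 1]
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        # Blend the two neighboring input samples
        out.append(data[lo] * (1 - frac) + data[hi] * frac)
    return out

up = resample_linear([0, 10, 20, 30], 8000, 16000)
print(len(up))            # 8: doubling the rate doubles the sample count
print(up[0], up[-1])      # endpoints are preserved
```

Linear interpolation is cheap but acts as a poor low-pass filter, which is why the class docstring steers you toward the FFmpeg-backed `resample_frame()` whenever a frame is available.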

30.5 Custom Audio Tracks

We need custom track implementations for both receiving and sending audio:
# bridge/webrtc/tracks.py

import asyncio
import fractions
import time
from typing import Optional
from aiortc import MediaStreamTrack
from av import AudioFrame
import numpy as np


class AudioTrackSink(MediaStreamTrack):
    """
    Audio track that receives frames from a WebRTC peer.
    
    Wraps an incoming remote track and provides access
    to audio frames for processing.
    
    Example:
        remote_track = ...  # From track event
        sink = AudioTrackSink(
            track=remote_track,
            on_frame=handle_audio_frame
        )
        
        # Frames are delivered to handle_audio_frame
    """
    
    kind = "audio"
    
    def __init__(
        self,
        track: MediaStreamTrack,
        on_frame=None,  # async callable awaited with each received frame
    ):
        super().__init__()
        self._track = track
        self.on_frame = on_frame
        self._running = True
    
    async def recv(self) -> AudioFrame:
        """Receive and process audio frames."""
        frame = await self._track.recv()
        
        if self.on_frame and self._running:
            await self.on_frame(frame)
        
        return frame
    
    def stop(self) -> None:
        """Stop receiving frames."""
        self._running = False
        super().stop()


class AudioTrackSource(MediaStreamTrack):
    """
    Audio track that generates frames for a WebRTC peer.
    
    Receives audio from our processing pipeline and sends
    it to the remote peer via WebRTC.
    
    Example:
        source = AudioTrackSource(
            sample_rate=48000,
            channels=1,
            samples_per_frame=960  # 20ms
        )
        
        # Add to peer connection
        await connection.add_track(source)
        
        # Push audio to be sent
        await source.push_audio(audio_data)
    """
    
    kind = "audio"
    
    def __init__(
        self,
        sample_rate: int = 48000,
        channels: int = 1,
        samples_per_frame: int = 960,  # 20ms at 48kHz
    ):
        super().__init__()
        
        self.sample_rate = sample_rate
        self.channels = channels
        self.samples_per_frame = samples_per_frame
        
        # Frame timing
        self._frame_duration = samples_per_frame / sample_rate
        self._start_time: Optional[float] = None
        self._frame_count = 0
        
        # Audio buffer queue
        self._queue: asyncio.Queue[np.ndarray] = asyncio.Queue(maxsize=50)
        
        # Silence frame for when buffer is empty
        self._silence = np.zeros(
            (samples_per_frame, channels),
            dtype=np.int16,
        )
    
    async def recv(self) -> AudioFrame:
        """
        Generate the next audio frame.
        
        Called automatically by aiortc at the required rate.
        """
        # Initialize timing on first frame
        if self._start_time is None:
            self._start_time = time.time()
        
        # Calculate expected time for this frame
        expected_time = self._start_time + (
            self._frame_count * self._frame_duration
        )
        
        # Wait until it's time to send this frame
        now = time.time()
        if now < expected_time:
            await asyncio.sleep(expected_time - now)
        
        # Get audio data from queue or use silence
        try:
            audio_data = self._queue.get_nowait()
        except asyncio.QueueEmpty:
            audio_data = self._silence
        
        # Create AudioFrame
        frame = AudioFrame(
            format="s16",
            layout="mono" if self.channels == 1 else "stereo",
            samples=self.samples_per_frame,
        )
        
        # Set frame data
        frame.planes[0].update(audio_data.tobytes())
        frame.sample_rate = self.sample_rate
        frame.pts = self._frame_count * self.samples_per_frame
        frame.time_base = fractions.Fraction(1, self.sample_rate)
        
        self._frame_count += 1
        
        return frame
    
    async def push_audio(self, audio_data: np.ndarray) -> bool:
        """
        Push audio data to be sent.
        
        Args:
            audio_data: Audio samples as numpy array (int16)
        
        Returns:
            True if queued, False if queue was full (data dropped)
        """
        try:
            self._queue.put_nowait(audio_data)
            return True
        except asyncio.QueueFull:
            # Drop oldest frame to make room
            try:
                self._queue.get_nowait()
                self._queue.put_nowait(audio_data)
                return True
            except asyncio.QueueEmpty:
                return False
    
    def clear_buffer(self) -> None:
        """Clear the audio buffer."""
        while not self._queue.empty():
            try:
                self._queue.get_nowait()
            except asyncio.QueueEmpty:
                break
    
    @property
    def queue_size(self) -> int:
        """Current number of frames in queue."""
        return self._queue.qsize()
    
    @property
    def queue_duration_ms(self) -> float:
        """Duration of audio in queue in milliseconds."""
        return self._queue.qsize() * self._frame_duration * 1000
    
    def stop(self) -> None:
        """Stop the track."""
        self.clear_buffer()
        super().stop()

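The pacing in AudioTrackSource.recv hangs off a fixed relationship between samples_per_frame, sample_rate, pts, and time_base. A standalone sketch of that arithmetic (plain stdlib, no aiortc needed):

```python
import fractions

# Assumed defaults from AudioTrackSource above: 48 kHz mono, 960-sample frames.
sample_rate = 48000
samples_per_frame = 960

# Each frame covers samples_per_frame / sample_rate seconds: 20 ms.
frame_ms = 1000 * samples_per_frame // sample_rate

# pts counts samples; time_base converts sample counts into seconds,
# so frame N begins at (N * samples_per_frame) * time_base seconds.
time_base = fractions.Fraction(1, sample_rate)
start_seconds = (100 * samples_per_frame) * time_base

print(frame_ms)              # 20
print(float(start_seconds))  # 2.0
```

Because pts counts samples and time_base is 1/sample_rate, the receiver can recover each frame's start time exactly, with no floating-point drift over a long call.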
Section 31: LiveKit Connection

31.1 LiveKit Overview

LiveKit is an open-source WebRTC SFU (Selective Forwarding Unit) that provides:
  • Room-based architecture for real-time communication
  • Server-side SDKs for Python, Go, Node.js
  • Low-latency media routing
  • Automatic scaling

Why LiveKit?

  • Room model: Logical grouping for calls
  • Participant management: Track who’s in each call
  • Server-side API: Create rooms, manage participants
  • Recording (Egress): Built-in recording service
  • Agents framework: Dispatch AI agents to rooms

Room Architecture for Calls

┌─────────────────────────────────────────────────────────────────────────────┐
│                      LIVEKIT ROOM ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                    Room: call_{tenant}_{call_id}                    │   │
│   │                                                                      │   │
│   │   PARTICIPANTS:                                                      │   │
│   │                                                                      │   │
│   │   ┌─────────────────┐       ┌─────────────────┐                    │   │
│   │   │  bridge_{id}    │       │  agent_{id}     │                    │   │
│   │   │                 │       │                 │                    │   │
│   │   │  Tracks:        │       │  Tracks:        │                    │   │
│   │   │  • caller_audio │◀─────▶│  • agent_audio  │                    │   │
│   │   │    (publish)    │       │    (publish)    │                    │   │
│   │   │                 │       │                 │                    │   │
│   │   │  Subscriptions: │       │  Subscriptions: │                    │   │
│   │   │  • agent_audio  │       │  • caller_audio │                    │   │
│   │   │                 │       │                 │                    │   │
│   │   └─────────────────┘       └─────────────────┘                    │   │
│   │                                                                      │   │
│   │   Optional:                                                          │   │
│   │   ┌─────────────────┐                                               │   │
│   │   │  supervisor_{id}│  (For QA monitoring)                          │   │
│   │   │  • subscribe all│                                               │   │
│   │   │  • hidden=true  │                                               │   │
│   │   └─────────────────┘                                               │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   AUDIO FLOW:                                                               │
│                                                                             │
│   Caller → GoTo → Bridge → [caller_audio track] → Agent Worker             │
│   Agent Worker → [agent_audio track] → Bridge → GoTo → Caller              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
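The room and participant names in the diagram follow a flat naming convention. A tiny illustrative helper (hypothetical, not part of any SDK) that produces them:

```python
def room_name(tenant_id: str, call_id: str, prefix: str = "call_") -> str:
    """One room per call: call_{tenant}_{call_id}."""
    return f"{prefix}{tenant_id}_{call_id}"


def participant_identity(role: str, entity_id: str) -> str:
    """Identities are role-prefixed: bridge_{id}, agent_{id}, supervisor_{id}."""
    return f"{role}_{entity_id}"


print(room_name("acme", "call-123"))               # call_acme_call-123
print(participant_identity("bridge", "call-123"))  # bridge_call-123
```

Keeping tenant and call IDs in the room name makes rooms self-describing in logs and in the LiveKit dashboard, and the role prefix is what the bridge later uses to recognize agent participants.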

31.2 LiveKit Python SDK

Installation

pip install livekit livekit-api
The livekit package contains the real-time client SDK. The livekit-api package contains the server-side API client.

31.3 LiveKit Connection Handler

Here’s our LiveKit integration:
# bridge/livekit/connection_handler.py

import asyncio
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Callable
from livekit import rtc, api
import numpy as np
import logging

logger = logging.getLogger(__name__)


@dataclass
class LiveKitConfig:
    """Configuration for LiveKit connection."""
    url: str               # wss://aiconnected.livekit.cloud
    api_key: str           # API key from LiveKit Cloud
    api_secret: str        # API secret from LiveKit Cloud
    room_prefix: str = "call_"  # Prefix for room names


@dataclass
class LiveKitRoomInfo:
    """Information about a LiveKit room."""
    room_name: str
    participant_identity: str
    participant_name: str


class LiveKitConnectionHandler:
    """
    Manages connection to LiveKit for voice pipeline integration.
    
    Handles:
    - Room creation and joining
    - Publishing caller audio
    - Subscribing to agent audio
    - Participant lifecycle events
    
    Example:
        config = LiveKitConfig(
            url="wss://aiconnected.livekit.cloud",
            api_key="...",
            api_secret="..."
        )
        
        handler = LiveKitConnectionHandler(config, call_id="call-123")
        
        # Connect to room
        await handler.connect()
        
        # Publish caller audio
        await handler.publish_audio(audio_data)
        
        # Agent audio is delivered via callback
        handler.on_agent_audio = handle_agent_audio
    """
    
    def __init__(
        self,
        config: LiveKitConfig,
        call_id: str,
        tenant_id: str = "default",
    ):
        self.config = config
        self.call_id = call_id
        self.tenant_id = tenant_id
        
        # Room info
        self.room_info = LiveKitRoomInfo(
            room_name=f"{config.room_prefix}{tenant_id}_{call_id}",
            participant_identity=f"bridge_{call_id}",
            participant_name="WebRTC Bridge",
        )
        
        # LiveKit components
        self._room: Optional[rtc.Room] = None
        self._api = api.LiveKitAPI(
            url=config.url.replace("wss://", "https://"),
            api_key=config.api_key,
            api_secret=config.api_secret,
        )
        
        # Audio tracks
        self._local_source: Optional[rtc.AudioSource] = None
        self._local_track: Optional[rtc.LocalAudioTrack] = None
        self._remote_tracks: dict[str, rtc.RemoteAudioTrack] = {}
        
        # Callbacks
        self.on_agent_audio: Optional[Callable] = None
        self.on_connected: Optional[Callable] = None
        self.on_disconnected: Optional[Callable] = None
        self.on_agent_joined: Optional[Callable] = None
        
        # State
        self._connected = False
        self._audio_tasks: list[asyncio.Task] = []
    
    async def connect(self) -> None:
        """
        Connect to LiveKit and join the room.
        
        Creates the room if it doesn't exist, then joins
        as a participant with the bridge identity.
        """
        # Create room if it doesn't exist
        try:
            await self._api.room.create_room(
                api.CreateRoomRequest(
                    name=self.room_info.room_name,
                    empty_timeout=300,  # 5 minutes
                    max_participants=10,
                )
            )
            logger.info(f"[{self.call_id}] Created LiveKit room")
        except Exception as e:
            # Room may already exist - that's OK
            logger.debug(f"[{self.call_id}] Room creation: {e}")
        
        # Generate access token
        token = api.AccessToken(
            self.config.api_key,
            self.config.api_secret,
        )
        token.with_identity(self.room_info.participant_identity)
        token.with_name(self.room_info.participant_name)
        token.with_grants(api.VideoGrants(
            room_join=True,
            room=self.room_info.room_name,
            can_publish=True,
            can_subscribe=True,
        ))
        token.with_ttl(timedelta(hours=1))  # livekit-api expects a timedelta
        
        # Create room client
        self._room = rtc.Room()
        
        # Set up event handlers. livekit-rtc event callbacks must be
        # synchronous, so schedule our async handlers as tasks.
        self._room.on(
            "participant_connected",
            lambda p: asyncio.create_task(self._on_participant_connected(p)),
        )
        self._room.on(
            "participant_disconnected",
            lambda p: asyncio.create_task(self._on_participant_disconnected(p)),
        )
        self._room.on(
            "track_subscribed",
            lambda t, pub, p: asyncio.create_task(
                self._on_track_subscribed(t, pub, p)
            ),
        )
        self._room.on(
            "track_unsubscribed",
            lambda t, pub, p: asyncio.create_task(
                self._on_track_unsubscribed(t, pub, p)
            ),
        )
        self._room.on(
            "disconnected",
            lambda *args: asyncio.create_task(self._on_disconnected()),
        )
        
        # Connect to room
        await self._room.connect(
            self.config.url,
            token.to_jwt(),
        )
        
        self._connected = True
        logger.info(
            f"[{self.call_id}] Connected to LiveKit room: "
            f"{self.room_info.room_name}"
        )
        
        # Create and publish local audio track
        await self._setup_local_audio()
        
        if self.on_connected:
            await self.on_connected()
    
    async def _setup_local_audio(self) -> None:
        """Set up local audio track for publishing caller audio."""
        # Create audio source
        self._local_source = rtc.AudioSource(
            sample_rate=48000,
            num_channels=1,
        )
        
        # Create track from source
        self._local_track = rtc.LocalAudioTrack.create_audio_track(
            "caller_audio",
            self._local_source,
        )
        
        # Publish track
        options = rtc.TrackPublishOptions(
            source=rtc.TrackSource.SOURCE_MICROPHONE,
        )
        
        await self._room.local_participant.publish_track(
            self._local_track,
            options,
        )
        
        logger.info(f"[{self.call_id}] Published caller audio track")
    
    async def publish_audio(self, audio_data: np.ndarray) -> None:
        """
        Publish audio data to LiveKit.
        
        Args:
            audio_data: PCM audio samples (int16, 48kHz, mono)
        """
        if not self._local_source or not self._connected:
            return
        
        # Create audio frame
        frame = rtc.AudioFrame(
            data=audio_data.tobytes(),
            sample_rate=48000,
            num_channels=1,
            samples_per_channel=len(audio_data),
        )
        
        # Capture frame to source
        await self._local_source.capture_frame(frame)
    
    async def _on_participant_connected(
        self,
        participant: rtc.RemoteParticipant,
    ) -> None:
        """Handle new participant joining."""
        logger.info(
            f"[{self.call_id}] Participant connected: {participant.identity}"
        )
        
        # Check if it's an agent
        if participant.identity.startswith("agent_"):
            if self.on_agent_joined:
                await self.on_agent_joined(participant.identity)
    
    async def _on_participant_disconnected(
        self,
        participant: rtc.RemoteParticipant,
    ) -> None:
        """Handle participant leaving."""
        logger.info(
            f"[{self.call_id}] Participant disconnected: {participant.identity}"
        )
    
    async def _on_track_subscribed(
        self,
        track: rtc.Track,
        publication: rtc.RemoteTrackPublication,
        participant: rtc.RemoteParticipant,
    ) -> None:
        """Handle subscribing to a remote track."""
        if track.kind != rtc.TrackKind.KIND_AUDIO:
            return
        
        logger.info(
            f"[{self.call_id}] Subscribed to audio track "
            f"from {participant.identity}"
        )
        
        # Store track
        self._remote_tracks[participant.identity] = track
        
        # Start receiving audio
        if self.on_agent_audio:
            task = asyncio.create_task(
                self._receive_audio(track, participant.identity)
            )
            self._audio_tasks.append(task)
    
    async def _receive_audio(
        self,
        track: rtc.RemoteAudioTrack,
        participant_id: str,
    ) -> None:
        """
        Receive audio frames from a track.
        
        Runs continuously, delivering frames to on_agent_audio callback.
        """
        audio_stream = rtc.AudioStream(track)
        
        async for frame_event in audio_stream:
            frame = frame_event.frame
            
            if self.on_agent_audio:
                # Convert to numpy
                audio_data = np.frombuffer(
                    frame.data,
                    dtype=np.int16,
                )
                
                await self.on_agent_audio(audio_data, participant_id)
    
    async def _on_track_unsubscribed(
        self,
        track: rtc.Track,
        publication: rtc.RemoteTrackPublication,
        participant: rtc.RemoteParticipant,
    ) -> None:
        """Handle unsubscribing from a remote track."""
        if participant.identity in self._remote_tracks:
            del self._remote_tracks[participant.identity]
    
    async def _on_disconnected(self) -> None:
        """Handle disconnection from room."""
        self._connected = False
        logger.warning(f"[{self.call_id}] Disconnected from LiveKit")
        
        if self.on_disconnected:
            await self.on_disconnected()
    
    async def disconnect(self) -> None:
        """Disconnect from LiveKit room."""
        # Cancel audio tasks
        for task in self._audio_tasks:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
        self._audio_tasks.clear()
        
        # Disconnect from room
        if self._room:
            await self._room.disconnect()
            self._room = None
        
        self._connected = False
        logger.info(f"[{self.call_id}] Disconnected from LiveKit")
    
    async def delete_room(self) -> None:
        """Delete the LiveKit room after call ends."""
        try:
            await self._api.room.delete_room(
                api.DeleteRoomRequest(room=self.room_info.room_name)
            )
            logger.info(f"[{self.call_id}] Deleted LiveKit room")
        except Exception as e:
            logger.warning(f"[{self.call_id}] Failed to delete room: {e}")
    
    @property
    def is_connected(self) -> bool:
        """Whether connected to LiveKit."""
        return self._connected
    
    @property
    def room_name(self) -> str:
        """Current room name."""
        return self.room_info.room_name

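publish_audio above hands LiveKit raw int16 PCM. The byte accounting for one 20 ms mono frame at 48 kHz works out as follows (stdlib-only sketch, no livekit dependency):

```python
import array

# Matches the rtc.AudioSource settings above: 48 kHz, mono, int16 PCM.
sample_rate = 48000
channels = 1
frame_ms = 20

samples_per_channel = sample_rate * frame_ms // 1000          # 960
pcm = array.array("h", [0] * samples_per_channel * channels)  # int16 silence

data = pcm.tobytes()
print(samples_per_channel)  # 960
print(len(data))            # 1920: two bytes per int16 sample
```

This is why samples_per_channel is passed explicitly alongside the raw bytes: the byte length alone is ambiguous once channel count or sample width varies.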
31.4 Token Generation

Tokens are generated for different participant types:
# bridge/livekit/tokens.py

from livekit import api
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class TokenConfig:
    """Configuration for token generation."""
    api_key: str
    api_secret: str


class LiveKitTokenGenerator:
    """
    Generate LiveKit access tokens.
    
    Provides tokens for different participant types
    with appropriate permissions.
    """
    
    def __init__(self, config: TokenConfig):
        self.config = config
    
    def generate_bridge_token(
        self,
        room_name: str,
        call_id: str,
        ttl_seconds: int = 3600,
    ) -> str:
        """
        Generate token for WebRTC bridge.
        
        Bridge can publish (caller audio) and subscribe (agent audio).
        """
        token = api.AccessToken(
            self.config.api_key,
            self.config.api_secret,
        )
        token.with_identity(f"bridge_{call_id}")
        token.with_name("WebRTC Bridge")
        token.with_grants(api.VideoGrants(
            room_join=True,
            room=room_name,
            can_publish=True,
            can_subscribe=True,
        ))
        token.with_ttl(timedelta(seconds=ttl_seconds))
        return token.to_jwt()
    
    def generate_agent_token(
        self,
        room_name: str,
        agent_id: str,
        ttl_seconds: int = 7200,
    ) -> str:
        """
        Generate token for AI agent.
        
        Agent can publish (response audio) and subscribe (caller audio).
        Also can publish data (for metadata/events).
        """
        token = api.AccessToken(
            self.config.api_key,
            self.config.api_secret,
        )
        token.with_identity(f"agent_{agent_id}")
        token.with_name("AI Agent")
        token.with_grants(api.VideoGrants(
            room_join=True,
            room=room_name,
            can_publish=True,
            can_subscribe=True,
            can_publish_data=True,
        ))
        token.with_ttl(timedelta(seconds=ttl_seconds))
        return token.to_jwt()
    
    def generate_supervisor_token(
        self,
        room_name: str,
        supervisor_id: str,
        hidden: bool = True,
        ttl_seconds: int = 3600,
    ) -> str:
        """
        Generate token for human supervisor.
        
        Supervisor can listen to calls for QA.
        Hidden by default (participants don't see them).
        """
        token = api.AccessToken(
            self.config.api_key,
            self.config.api_secret,
        )
        token.with_identity(f"supervisor_{supervisor_id}")
        token.with_name("Supervisor")
        token.with_grants(api.VideoGrants(
            room_join=True,
            room=room_name,
            can_publish=True,  # Can speak if needed
            can_subscribe=True,
            hidden=hidden,
        ))
        token.with_ttl(timedelta(seconds=ttl_seconds))
        return token.to_jwt()
    
    def generate_recording_token(
        self,
        room_name: str,
        recording_id: str,
        ttl_seconds: int = 86400,
    ) -> str:
        """
        Generate token for recording service.
        
        Recording participant only subscribes (no publishing).
        Always hidden from other participants.
        """
        token = api.AccessToken(
            self.config.api_key,
            self.config.api_secret,
        )
        token.with_identity(f"recorder_{recording_id}")
        token.with_name("Recording Service")
        token.with_grants(api.VideoGrants(
            room_join=True,
            room=room_name,
            can_publish=False,
            can_subscribe=True,
            hidden=True,
            recorder=True,
        ))
        token.with_ttl(timedelta(seconds=ttl_seconds))
        return token.to_jwt()

Section 32: Audio Routing

32.1 Bidirectional Audio Flow

The bridge routes audio in two directions:
┌─────────────────────────────────────────────────────────────────────────────┐
│                    BIDIRECTIONAL AUDIO FLOW                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   INBOUND (Caller → Agent):                                                │
│   ─────────────────────────                                                 │
│                                                                             │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐       │
│   │  GoTo     │    │  Decode   │    │ Resample  │    │ LiveKit   │       │
│   │  WebRTC   │───▶│  Opus/    │───▶│  to       │───▶│ Publish   │       │
│   │  Receive  │    │  G711     │    │  48kHz    │    │           │       │
│   └───────────┘    └───────────┘    └───────────┘    └───────────┘       │
│                                                                             │
│        │                │                │                │                │
│        │                │                │                │                │
│        ▼                ▼                ▼                ▼                │
│   Opus/G711        PCM 8-48kHz      PCM 48kHz         To Agent            │
│   RTP packets      raw audio        uniform           via SFU              │
│                                                                             │
│                                                                             │
│   OUTBOUND (Agent → Caller):                                               │
│   ──────────────────────────                                                │
│                                                                             │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐       │
│   │ LiveKit   │    │ Resample  │    │  Encode   │    │  GoTo     │       │
│   │ Subscribe │───▶│  to       │───▶│  Opus/    │───▶│  WebRTC   │       │
│   │           │    │  match    │    │  G711     │    │  Send     │       │
│   └───────────┘    └───────────┘    └───────────┘    └───────────┘       │
│                                                                             │
│        │                │                │                │                │
│        │                │                │                │                │
│        ▼                ▼                ▼                ▼                │
│   From Agent       Match GoTo      RTP packets         To Caller          │
│   PCM 48kHz        codec rate      Opus/G711           via PSTN           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

32.2 Audio Bridge Implementation

Here’s the complete audio bridge that coordinates both directions:
# bridge/audio/audio_bridge.py

import asyncio
from dataclasses import dataclass
from typing import Optional, Callable
import numpy as np
import logging
from enum import Enum

from bridge.webrtc.tracks import AudioTrackSink, AudioTrackSource
from bridge.audio.resampler import MultiRateResampler
from bridge.audio.buffer import AudioBuffer

logger = logging.getLogger(__name__)


class AudioDirection(Enum):
    """Audio flow direction."""
    INBOUND = "inbound"    # Caller → Agent
    OUTBOUND = "outbound"  # Agent → Caller


@dataclass
class AudioBridgeConfig:
    """Configuration for audio bridge."""
    # Sample rates
    goto_sample_rate: int = 48000  # May be 8000 for G.711
    livekit_sample_rate: int = 48000
    
    # Buffer settings
    buffer_duration_ms: int = 100
    
    # Frame size
    frame_size_samples: int = 960  # 20ms at 48kHz


class AudioBridge:
    """
    Bidirectional audio bridge between GoToConnect and LiveKit.
    
    Routes caller audio to the AI agent (inbound) and
    routes AI responses back to the caller (outbound).
    
    Example:
        bridge = AudioBridge(
            config=AudioBridgeConfig(),
            call_id="call-123"
        )
        
        # Set up audio sources
        bridge.set_goto_audio_source(goto_audio_track)
        bridge.set_livekit_audio_sink(livekit_audio_source)
        
        # Start routing
        await bridge.start()
    """
    
    def __init__(
        self,
        config: AudioBridgeConfig,
        call_id: str,
    ):
        self.config = config
        self.call_id = call_id
        
        # Resamplers
        self._resamplers = MultiRateResampler()
        
        # Buffers
        self._inbound_buffer = AudioBuffer(
            max_duration_ms=config.buffer_duration_ms,
            sample_rate=config.livekit_sample_rate,
            channels=1,
        )
        self._outbound_buffer = AudioBuffer(
            max_duration_ms=config.buffer_duration_ms,
            sample_rate=config.goto_sample_rate,
            channels=1,
        )
        
        # Audio sources/sinks
        self._goto_source: Optional[AudioTrackSink] = None
        self._goto_sink: Optional[AudioTrackSource] = None
        self._livekit_publish: Optional[Callable] = None
        self._livekit_callback: Optional[Callable] = None
        
        # State
        self._running = False
        self._inbound_task: Optional[asyncio.Task] = None
        self._outbound_task: Optional[asyncio.Task] = None
        
        # Metrics
        self._inbound_frames = 0
        self._outbound_frames = 0
        
        # Callbacks for monitoring
        self.on_inbound_audio: Optional[Callable] = None  # For STT
        self.on_outbound_audio: Optional[Callable] = None  # For logging
    
    def set_goto_audio_source(
        self,
        track_sink: AudioTrackSink,
        sample_rate: int,
    ) -> None:
        """
        Set the GoToConnect audio source (caller audio).
        
        Args:
            track_sink: Wrapped remote track from GoTo
            sample_rate: Sample rate of GoTo audio
        """
        self._goto_source = track_sink
        self.config.goto_sample_rate = sample_rate
    
    def set_goto_audio_sink(
        self,
        track_source: AudioTrackSource,
    ) -> None:
        """
        Set the GoToConnect audio sink (for sending to caller).
        
        Args:
            track_source: Local track source for GoTo connection
        """
        self._goto_sink = track_source
    
    def set_livekit_publish(
        self,
        publish_func: Callable,
    ) -> None:
        """
        Set the LiveKit publish function.
        
        Args:
            publish_func: Async function to publish audio to LiveKit
        """
        self._livekit_publish = publish_func
    
    def set_livekit_callback(
        self,
        callback: Callable,
    ) -> None:
        """
        Set callback for receiving LiveKit audio (agent responses).
        
        This is called by LiveKit connection handler when audio arrives.
        """
        self._livekit_callback = callback
    
    async def start(self) -> None:
        """Start the audio bridge."""
        if self._running:
            return
        
        self._running = True
        
        # Start inbound routing (Caller → Agent)
        self._inbound_task = asyncio.create_task(
            self._inbound_loop()
        )
        
        # Start outbound routing (Agent → Caller)
        self._outbound_task = asyncio.create_task(
            self._outbound_loop()
        )
        
        logger.info(f"[{self.call_id}] Audio bridge started")
    
    async def stop(self) -> None:
        """Stop the audio bridge."""
        self._running = False
        
        # Cancel tasks
        for task in [self._inbound_task, self._outbound_task]:
            if task:
                task.cancel()
                try:
                    await task
                except asyncio.CancelledError:
                    pass
        
        logger.info(
            f"[{self.call_id}] Audio bridge stopped. "
            f"Inbound: {self._inbound_frames}, "
            f"Outbound: {self._outbound_frames}"
        )
    
    async def handle_goto_audio(self, frame: "AudioFrame") -> None:
        """
        Handle incoming audio from GoToConnect (caller audio).
        
        Called by GoTo connection handler when audio arrives.
        """
        # Convert frame to numpy
        audio_data = frame.to_ndarray()
        if audio_data.ndim == 2:
            audio_data = audio_data.T  # (samples, channels)
        
        # Flatten to mono if needed
        if audio_data.ndim == 2 and audio_data.shape[1] > 1:
            audio_data = audio_data.mean(axis=1).astype(np.int16)
        
        # Resample if needed
        if frame.sample_rate != self.config.livekit_sample_rate:
            audio_data = self._resamplers.resample(
                audio_data,
                frame.sample_rate,
                self.config.livekit_sample_rate,
            )
        
        # Buffer for smooth delivery
        await self._inbound_buffer.write(audio_data.reshape(-1, 1))
        
        self._inbound_frames += 1
    
    async def handle_livekit_audio(
        self,
        audio_data: np.ndarray,
        participant_id: str,
    ) -> None:
        """
        Handle incoming audio from LiveKit (agent responses).
        
        Called by LiveKit connection handler when audio arrives.
        """
        # Resample if needed
        if self.config.livekit_sample_rate != self.config.goto_sample_rate:
            audio_data = self._resamplers.resample(
                audio_data,
                self.config.livekit_sample_rate,
                self.config.goto_sample_rate,
            )
        
        # Buffer for smooth delivery
        await self._outbound_buffer.write(audio_data.reshape(-1, 1))
        
        self._outbound_frames += 1
    
    async def _inbound_loop(self) -> None:
        """
        Inbound routing loop (Caller → Agent).
        
        Reads from inbound buffer and publishes to LiveKit.
        """
        frame_duration = self.config.frame_size_samples / self.config.livekit_sample_rate
        
        while self._running:
            try:
                # Read frame from buffer
                audio = await self._inbound_buffer.read(
                    self.config.frame_size_samples
                )
                
                # Publish to LiveKit
                if self._livekit_publish:
                    await self._livekit_publish(audio.flatten())
                
                # Optional callback for STT processing
                if self.on_inbound_audio:
                    await self.on_inbound_audio(audio.flatten())
                
                # Pace to frame duration
                await asyncio.sleep(frame_duration)
                
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.error(f"[{self.call_id}] Inbound error: {e}")
    
    async def _outbound_loop(self) -> None:
        """
        Outbound routing loop (Agent → Caller).
        
        Reads from outbound buffer and sends to GoTo.
        """
        frame_duration = self.config.frame_size_samples / self.config.goto_sample_rate
        
        while self._running:
            try:
                # Read frame from buffer
                audio = await self._outbound_buffer.read(
                    self.config.frame_size_samples
                )
                
                # Send to GoTo
                if self._goto_sink:
                    await self._goto_sink.push_audio(audio)
                
                # Optional callback for logging
                if self.on_outbound_audio:
                    await self.on_outbound_audio(audio.flatten())
                
                # Pace to frame duration
                await asyncio.sleep(frame_duration)
                
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.error(f"[{self.call_id}] Outbound error: {e}")
    
    @property
    def inbound_buffer_ms(self) -> float:
        """Milliseconds of audio in inbound buffer."""
        return self._inbound_buffer.available_ms
    
    @property
    def outbound_buffer_ms(self) -> float:
        """Milliseconds of audio in outbound buffer."""
        return self._outbound_buffer.available_ms
    
    def get_stats(self) -> dict:
        """Get audio bridge statistics."""
        return {
            "inbound_frames": self._inbound_frames,
            "outbound_frames": self._outbound_frames,
            "inbound_buffer_ms": self.inbound_buffer_ms,
            "outbound_buffer_ms": self.outbound_buffer_ms,
            "running": self._running,
        }
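
The routing loops above pace delivery by sleeping one frame duration per iteration, computed as `frame_size_samples / sample_rate`. As a quick sanity check of that arithmetic — assuming a 960-sample frame at a 48 kHz LiveKit rate (illustrative values, not config defaults stated above):

```python
# Frame pacing math used by the routing loops.
# frame_size_samples=960 and a 48 kHz rate are assumptions for illustration.
frame_size_samples = 960
livekit_sample_rate = 48_000

frame_duration = frame_size_samples / livekit_sample_rate  # seconds per frame
frames_per_second = livekit_sample_rate / frame_size_samples

print(frame_duration)     # 0.02 → each loop iteration delivers 20 ms of audio
print(frames_per_second)  # 50.0
```

So each loop wakes 50 times per second; if the buffer runs dry between wakes, the caller hears a gap, which is why both directions buffer before delivery.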

32.3 Volume Normalization

Optional volume normalization to ensure consistent levels:
# bridge/audio/normalizer.py

import numpy as np
from typing import Optional


class AudioNormalizer:
    """
    Audio volume normalization.
    
    Ensures consistent audio levels across different
    callers and agents.
    """
    
    def __init__(
        self,
        target_db: float = -20.0,  # Target level in dB
        max_gain_db: float = 20.0,  # Maximum gain to apply
        attack_ms: float = 10.0,    # Attack time
        release_ms: float = 100.0,  # Release time
        sample_rate: int = 48000,
    ):
        self.target_db = target_db
        self.max_gain_db = max_gain_db
        
        # Convert times to coefficients
        self.attack_coeff = np.exp(
            -1.0 / (attack_ms * sample_rate / 1000)
        )
        self.release_coeff = np.exp(
            -1.0 / (release_ms * sample_rate / 1000)
        )
        
        # State
        self._envelope = 0.0
        self._current_gain = 1.0
    
    def process(self, audio: np.ndarray) -> np.ndarray:
        """
        Process audio through normalizer.
        
        Args:
            audio: Input audio samples (int16)
        
        Returns:
            Normalized audio samples (int16)
        """
        # Convert to float for processing
        audio_float = audio.astype(np.float32) / 32768.0
        
        # Calculate RMS level
        rms = np.sqrt(np.mean(audio_float ** 2) + 1e-10)
        
        # Update envelope (RMS follower with attack/release smoothing)
        if rms > self._envelope:
            self._envelope = (
                self.attack_coeff * self._envelope +
                (1 - self.attack_coeff) * rms
            )
        else:
            self._envelope = (
                self.release_coeff * self._envelope +
                (1 - self.release_coeff) * rms
            )
        
        # Calculate required gain
        if self._envelope > 1e-6:
            target_linear = 10 ** (self.target_db / 20)
            required_gain = target_linear / self._envelope
            
            # Limit gain
            max_gain_linear = 10 ** (self.max_gain_db / 20)
            required_gain = min(required_gain, max_gain_linear)
            
            # Smooth gain changes
            self._current_gain = (
                0.99 * self._current_gain +
                0.01 * required_gain
            )
        
        # Apply gain
        output = audio_float * self._current_gain
        
        # Clip and convert back to int16
        output = np.clip(output, -1.0, 1.0)
        return (output * 32767).astype(np.int16)
    
    def reset(self) -> None:
        """Reset normalizer state."""
        self._envelope = 0.0
        self._current_gain = 1.0

Section 33: Bridge Lifecycle

33.1 Complete Call Lifecycle

The bridge goes through distinct phases during a call:
┌─────────────────────────────────────────────────────────────────────────────┐
│                      COMPLETE CALL LIFECYCLE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                        PHASE 1: SETUP                                │   │
│   │                                                                      │   │
│   │   1. Webhook received: call.ringing                                 │   │
│   │   2. Create Bridge instance                                         │   │
│   │   3. Initialize GoTo WebRTC peer                                    │   │
│   │   4. Initialize LiveKit connection                                  │   │
│   │   5. Answer call via GoTo API                                       │   │
│   │   6. Receive SDP offer from GoTo                                    │   │
│   │                                                                      │   │
│   │   Duration: ~500ms                                                   │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                       │                                     │
│                                       ▼                                     │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                     PHASE 2: NEGOTIATION                             │   │
│   │                                                                      │   │
│   │   7. Parse SDP offer, extract codecs                                │   │
│   │   8. Select preferred codec (Opus > G.711)                          │   │
│   │   9. Generate SDP answer                                            │   │
│   │   10. Send answer to GoTo                                           │   │
│   │   11. Begin ICE candidate exchange                                  │   │
│   │                                                                      │   │
│   │   Duration: ~300ms                                                   │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                       │                                     │
│                                       ▼                                     │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                     PHASE 3: CONNECTION                              │   │
│   │                                                                      │   │
│   │   12. ICE connectivity checks                                       │   │
│   │   13. DTLS handshake                                                │   │
│   │   14. SRTP session established                                      │   │
│   │   15. Connection state → CONNECTED                                  │   │
│   │   16. Join LiveKit room                                             │   │
│   │   17. Publish caller audio track                                    │   │
│   │   18. Subscribe to agent audio track                                │   │
│   │                                                                      │   │
│   │   Duration: ~500-2000ms (depends on network)                        │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                       │                                     │
│                                       ▼                                     │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      PHASE 4: ACTIVE CALL                            │   │
│   │                                                                      │   │
│   │   19. Bidirectional audio streaming                                 │   │
│   │   20. Continuous health monitoring                                  │   │
│   │   21. Handle hold/resume if needed                                  │   │
│   │   22. Handle network transitions (ICE restart)                      │   │
│   │                                                                      │   │
│   │   Duration: Call duration (seconds to hours)                        │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                       │                                     │
│                                       ▼                                     │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       PHASE 5: TEARDOWN                              │   │
│   │                                                                      │   │
│   │   23. Call ended (hangup, timeout, or error)                        │   │
│   │   24. Stop audio processing                                         │   │
│   │   25. Leave LiveKit room                                            │   │
│   │   26. Close GoTo WebRTC connection                                  │   │
│   │   27. Delete LiveKit room                                           │   │
│   │   28. Clean up resources                                            │   │
│   │   29. Log final metrics                                             │   │
│   │                                                                      │   │
│   │   Duration: ~200ms                                                   │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
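
Adding up the per-phase estimates in the diagram gives the latency budget a caller experiences before the agent can speak (these are the diagram's estimates, not measurements):

```python
# Time-to-active budget implied by the phase durations above.
setup_ms = 500
negotiation_ms = 300
connection_ms = (500, 2000)  # best and worst case, depends on network

best = setup_ms + negotiation_ms + connection_ms[0]
worst = setup_ms + negotiation_ms + connection_ms[1]
print(best, worst)  # 1300 2800 → call is active within roughly 1.3-2.8 s
```

This is the range the `time_to_connect` metric in Section 33.2 is designed to track against.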

33.2 Lifecycle Manager

The lifecycle manager coordinates all phases:
# bridge/lifecycle/manager.py

import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional, Callable, Dict, Any
from enum import Enum
import logging

logger = logging.getLogger(__name__)


class BridgePhase(Enum):
    """Bridge lifecycle phases."""
    CREATED = "created"
    INITIALIZING = "initializing"
    NEGOTIATING = "negotiating"
    CONNECTING = "connecting"
    CONNECTED = "connected"
    ACTIVE = "active"
    DISCONNECTING = "disconnecting"
    TERMINATED = "terminated"
    FAILED = "failed"


@dataclass
class LifecycleMetrics:
    """Metrics collected during bridge lifecycle."""
    created_at: float = 0.0
    initialized_at: float = 0.0
    negotiation_started_at: float = 0.0
    negotiation_completed_at: float = 0.0
    connected_at: float = 0.0
    first_audio_at: float = 0.0
    terminated_at: float = 0.0
    
    total_audio_frames_received: int = 0
    total_audio_frames_sent: int = 0
    total_bytes_received: int = 0
    total_bytes_sent: int = 0
    
    ice_candidates_sent: int = 0
    ice_candidates_received: int = 0
    
    reconnection_attempts: int = 0
    
    @property
    def time_to_connect(self) -> Optional[float]:
        """Time from creation to connected state."""
        if self.connected_at and self.created_at:
            return self.connected_at - self.created_at
        return None
    
    @property
    def time_to_first_audio(self) -> Optional[float]:
        """Time from creation to first audio frame."""
        if self.first_audio_at and self.created_at:
            return self.first_audio_at - self.created_at
        return None
    
    @property
    def call_duration(self) -> Optional[float]:
        """Total call duration in seconds."""
        if self.terminated_at and self.connected_at:
            return self.terminated_at - self.connected_at
        return None
    
    def to_dict(self) -> dict:
        """Convert to dictionary for logging."""
        return {
            "time_to_connect_ms": (
                self.time_to_connect * 1000 if self.time_to_connect else None
            ),
            "time_to_first_audio_ms": (
                self.time_to_first_audio * 1000 if self.time_to_first_audio else None
            ),
            "call_duration_s": self.call_duration,
            "audio_frames_received": self.total_audio_frames_received,
            "audio_frames_sent": self.total_audio_frames_sent,
            "ice_candidates_sent": self.ice_candidates_sent,
            "ice_candidates_received": self.ice_candidates_received,
            "reconnection_attempts": self.reconnection_attempts,
        }


class BridgeLifecycleManager:
    """
    Manages the complete lifecycle of a WebRTC bridge.
    
    Coordinates initialization, connection, active state,
    and teardown across all bridge components.
    
    Also handles:
    - Phase timeouts (fail if stuck in a phase)
    - Metrics collection
    - Error recovery
    """
    
    def __init__(
        self,
        call_id: str,
        bridge: "AudioBridge",
        goto_handler: "GoToConnectionHandler",
        livekit_handler: "LiveKitConnectionHandler",
    ):
        self.call_id = call_id
        self.bridge = bridge
        self.goto_handler = goto_handler
        self.livekit_handler = livekit_handler
        
        # State
        self._phase = BridgePhase.CREATED
        self._phase_lock = asyncio.Lock()
        
        # Metrics
        self.metrics = LifecycleMetrics(created_at=time.time())
        
        # Callbacks
        self._on_phase_change: Optional[Callable] = None
        self._on_error: Optional[Callable] = None
        
        # Timeouts for each phase
        self._phase_timeouts: Dict[BridgePhase, float] = {
            BridgePhase.INITIALIZING: 5.0,   # 5 seconds
            BridgePhase.NEGOTIATING: 10.0,   # 10 seconds
            BridgePhase.CONNECTING: 30.0,    # 30 seconds
        }
        
        # Timeout task
        self._timeout_task: Optional[asyncio.Task] = None
    
    async def initialize(self) -> None:
        """Initialize all bridge components."""
        await self._transition_to(BridgePhase.INITIALIZING)
        self.metrics.initialized_at = time.time()
        
        try:
            # Initialize GoTo connection
            await self.goto_handler.initialize()
            
            # Connect to LiveKit
            await self.livekit_handler.connect()
            
            logger.info(f"[{self.call_id}] Bridge initialized")
            
        except Exception as e:
            logger.error(f"[{self.call_id}] Initialization failed: {e}")
            await self._transition_to(BridgePhase.FAILED)
            raise
    
    async def handle_inbound_call(self, sdp_offer: str) -> str:
        """
        Handle an inbound call.
        
        Args:
            sdp_offer: SDP offer from GoToConnect
        
        Returns:
            SDP answer to send back
        """
        await self._transition_to(BridgePhase.NEGOTIATING)
        self.metrics.negotiation_started_at = time.time()
        
        try:
            answer = await self.goto_handler.handle_inbound_call(sdp_offer)
            self.metrics.negotiation_completed_at = time.time()
            
            await self._transition_to(BridgePhase.CONNECTING)
            
            logger.info(f"[{self.call_id}] SDP negotiation completed")
            return answer
            
        except Exception as e:
            logger.error(f"[{self.call_id}] Negotiation failed: {e}")
            await self._transition_to(BridgePhase.FAILED)
            raise
    
    async def on_connected(self) -> None:
        """Called when WebRTC connection is established."""
        await self._transition_to(BridgePhase.CONNECTED)
        self.metrics.connected_at = time.time()
        
        # Wire up audio routing
        self.bridge.set_goto_audio_source(
            self.goto_handler._remote_audio,
            self.goto_handler.config.goto_sample_rate,
        )
        self.bridge.set_goto_audio_sink(self.goto_handler._local_audio)
        self.bridge.set_livekit_publish(self.livekit_handler.publish_audio)
        self.livekit_handler.on_agent_audio = self.bridge.handle_livekit_audio
        
        # Start audio bridge
        await self.bridge.start()
        await self._transition_to(BridgePhase.ACTIVE)
        
        logger.info(
            f"[{self.call_id}] Call active. "
            f"Time to connect: {self.metrics.time_to_connect:.2f}s"
        )
    
    async def on_first_audio(self) -> None:
        """Called when first audio frame is received."""
        if self.metrics.first_audio_at == 0:
            self.metrics.first_audio_at = time.time()
            
            logger.info(
                f"[{self.call_id}] First audio. "
                f"Time to audio: {self.metrics.time_to_first_audio:.2f}s"
            )
    
    async def terminate(self, reason: str = "normal") -> None:
        """
        Terminate the bridge.
        
        Args:
            reason: Termination reason for logging
        """
        if self._phase in (BridgePhase.TERMINATED, BridgePhase.FAILED):
            return
        
        await self._transition_to(BridgePhase.DISCONNECTING)
        
        # Stop audio bridge
        try:
            await self.bridge.stop()
        except Exception as e:
            logger.warning(f"[{self.call_id}] Error stopping bridge: {e}")
        
        # Disconnect from LiveKit
        try:
            await self.livekit_handler.disconnect()
            await self.livekit_handler.delete_room()
        except Exception as e:
            logger.warning(f"[{self.call_id}] Error disconnecting LiveKit: {e}")
        
        # Close GoTo connection
        try:
            await self.goto_handler.close()
        except Exception as e:
            logger.warning(f"[{self.call_id}] Error closing GoTo: {e}")
        
        self.metrics.terminated_at = time.time()
        await self._transition_to(BridgePhase.TERMINATED)
        
        # call_duration is None if the call never reached CONNECTED
        duration = self.metrics.call_duration or 0.0
        logger.info(
            f"[{self.call_id}] Call terminated ({reason}). "
            f"Duration: {duration:.1f}s, "
            f"Metrics: {self.metrics.to_dict()}"
        )
    
    async def _transition_to(self, new_phase: BridgePhase) -> None:
        """Transition to a new phase."""
        async with self._phase_lock:
            old_phase = self._phase
            self._phase = new_phase
            
            # Cancel existing timeout
            if self._timeout_task:
                self._timeout_task.cancel()
                self._timeout_task = None
            
            # Set new timeout if applicable
            if new_phase in self._phase_timeouts:
                timeout = self._phase_timeouts[new_phase]
                self._timeout_task = asyncio.create_task(
                    self._phase_timeout(new_phase, timeout)
                )
            
            logger.debug(
                f"[{self.call_id}] Phase: {old_phase.value} → {new_phase.value}"
            )
            
            if self._on_phase_change:
                await self._on_phase_change(old_phase, new_phase)
    
    async def _phase_timeout(
        self,
        phase: BridgePhase,
        timeout: float,
    ) -> None:
        """Handle phase timeout."""
        try:
            await asyncio.sleep(timeout)
            
            # Check if still in this phase
            if self._phase == phase:
                logger.error(
                    f"[{self.call_id}] Timeout in {phase.value} "
                    f"after {timeout}s"
                )
                await self._transition_to(BridgePhase.FAILED)
                
                if self._on_error:
                    await self._on_error(f"Timeout in {phase.value}")
                    
        except asyncio.CancelledError:
            pass
    
    @property
    def phase(self) -> BridgePhase:
        """Current lifecycle phase."""
        return self._phase
    
    @property
    def is_active(self) -> bool:
        """Whether bridge is in active call state."""
        return self._phase == BridgePhase.ACTIVE
    
    def on_phase_change(self, callback: Callable) -> None:
        """Set phase change callback."""
        self._on_phase_change = callback
    
    def on_error(self, callback: Callable) -> None:
        """Set error callback."""
        self._on_error = callback

33.3 Bridge Manager (Service Level)

Manages all active bridges across the service:
# bridge/lifecycle/bridge_manager.py

import asyncio
from typing import Dict, Optional
import logging

from bridge.lifecycle.manager import BridgeLifecycleManager, BridgePhase

logger = logging.getLogger(__name__)


class BridgeManager:
    """
    Manages all active bridges in the service.
    
    Provides:
    - Bridge creation and lookup
    - Concurrent bridge limit enforcement
    - Graceful shutdown of all bridges
    - Health monitoring
    """
    
    def __init__(
        self,
        max_concurrent_bridges: int = 1000,
    ):
        self.max_concurrent_bridges = max_concurrent_bridges
        
        # Active bridges by call_id
        self._bridges: Dict[str, BridgeLifecycleManager] = {}
        self._lock = asyncio.Lock()
        
        # Health check task
        self._health_task: Optional[asyncio.Task] = None
    
    async def start(self) -> None:
        """Start the bridge manager."""
        self._health_task = asyncio.create_task(self._health_check_loop())
        logger.info(
            f"Bridge manager started. "
            f"Max concurrent: {self.max_concurrent_bridges}"
        )
    
    async def stop(self) -> None:
        """Stop the bridge manager and all bridges."""
        # Stop health check
        if self._health_task:
            self._health_task.cancel()
            try:
                await self._health_task
            except asyncio.CancelledError:
                pass
        
        # Terminate all bridges
        async with self._lock:
            bridges = list(self._bridges.values())
        
        if bridges:
            logger.info(f"Terminating {len(bridges)} active bridges")
            await asyncio.gather(
                *[b.terminate("service_shutdown") for b in bridges],
                return_exceptions=True,
            )
        
        logger.info("Bridge manager stopped")
    
    async def create_bridge(
        self,
        call_id: str,
        tenant_id: str,
        goto_config: dict,
        livekit_config: dict,
    ) -> BridgeLifecycleManager:
        """
        Create a new bridge for a call.
        
        Args:
            call_id: Unique call identifier
            tenant_id: Tenant identifier
            goto_config: GoToConnect configuration
            livekit_config: LiveKit configuration
        
        Returns:
            BridgeLifecycleManager for the new bridge
        
        Raises:
            BridgeCapacityError: If at max capacity
            BridgeExistsError: If bridge already exists for call
        """
        async with self._lock:
            # Check capacity
            if len(self._bridges) >= self.max_concurrent_bridges:
                raise BridgeCapacityError(
                    f"At maximum capacity: {self.max_concurrent_bridges}"
                )
            
            # Check for existing bridge
            if call_id in self._bridges:
                raise BridgeExistsError(
                    f"Bridge already exists for call: {call_id}"
                )
            
            # Create bridge components
            from bridge.audio.audio_bridge import AudioBridge, AudioBridgeConfig
            from bridge.goto.connection_handler import GoToConnectionHandler
            from bridge.livekit.connection_handler import LiveKitConnectionHandler
            
            bridge = AudioBridge(
                config=AudioBridgeConfig(),
                call_id=call_id,
            )
            
            goto_handler = GoToConnectionHandler(
                call_info=goto_config,
                goto_client=goto_config["client"],
            )
            
            livekit_handler = LiveKitConnectionHandler(
                config=livekit_config,
                call_id=call_id,
                tenant_id=tenant_id,
            )
            
            # Create lifecycle manager
            manager = BridgeLifecycleManager(
                call_id=call_id,
                bridge=bridge,
                goto_handler=goto_handler,
                livekit_handler=livekit_handler,
            )
            
            self._bridges[call_id] = manager
            
            logger.info(
                f"Created bridge for call {call_id}. "
                f"Active bridges: {len(self._bridges)}"
            )
            
            return manager
    
    async def get_bridge(
        self,
        call_id: str,
    ) -> Optional[BridgeLifecycleManager]:
        """Get an existing bridge by call ID."""
        async with self._lock:
            return self._bridges.get(call_id)
    
    async def remove_bridge(self, call_id: str) -> None:
        """Remove a bridge (after termination)."""
        async with self._lock:
            if call_id in self._bridges:
                del self._bridges[call_id]
                logger.info(
                    f"Removed bridge for call {call_id}. "
                    f"Active bridges: {len(self._bridges)}"
                )
    
    async def _health_check_loop(self) -> None:
        """Periodic health check of all bridges."""
        while True:
            try:
                await asyncio.sleep(30)  # Check every 30 seconds
                
                async with self._lock:
                    bridges = list(self._bridges.items())
                
                failed_calls = []
                
                for call_id, manager in bridges:
                    # Collect bridges that have finished or failed
                    if manager.phase in (
                        BridgePhase.FAILED,
                        BridgePhase.TERMINATED,
                    ):
                        failed_calls.append(call_id)
                
                # Remove failed/terminated bridges
                for call_id in failed_calls:
                    await self.remove_bridge(call_id)
                
                if failed_calls:
                    logger.info(f"Cleaned up {len(failed_calls)} bridges")
                    
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.error(f"Health check error: {e}")
    
    @property
    def active_count(self) -> int:
        """Number of active bridges."""
        return len(self._bridges)
    
    @property
    def available_capacity(self) -> int:
        """Number of additional bridges that can be created."""
        return self.max_concurrent_bridges - len(self._bridges)
    
    def get_stats(self) -> dict:
        """Get bridge manager statistics."""
        return {
            "active_bridges": len(self._bridges),
            "max_capacity": self.max_concurrent_bridges,
            "available_capacity": self.available_capacity,
        }


class BridgeCapacityError(Exception):
    """Raised when bridge manager is at capacity."""
    pass


class BridgeExistsError(Exception):
    """Raised when trying to create duplicate bridge."""
    pass
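
`BridgeManager.stop()` tears all bridges down in parallel with `asyncio.gather(..., return_exceptions=True)`, so one bridge that fails to close cleanly never blocks the others from terminating. A standalone sketch of that pattern (`terminate` here is a stand-in, not the real lifecycle method):

```python
# Graceful-shutdown pattern from BridgeManager.stop():
# gather everything, collect failures instead of raising.
import asyncio

async def terminate(call_id: str) -> str:
    if call_id == "bad":
        raise RuntimeError("hangup race")
    return f"{call_id}: terminated"

async def shutdown() -> list:
    return await asyncio.gather(
        *[terminate(c) for c in ("a", "bad", "c")],
        return_exceptions=True,  # exceptions come back as list items
    )

results = asyncio.run(shutdown())
print(results)  # ['a: terminated', RuntimeError('hangup race'), 'c: terminated']
```

Without `return_exceptions=True`, the first failing bridge would raise out of `gather` and the remaining bridges would be left half-closed during service shutdown.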

33.4 Error Recovery

Handling various failure modes:
# bridge/lifecycle/recovery.py

import asyncio
from typing import Optional
from enum import Enum
import logging

logger = logging.getLogger(__name__)


class FailureType(Enum):
    """Types of failures that can occur."""
    ICE_FAILED = "ice_failed"
    DTLS_FAILED = "dtls_failed"
    GOTO_DISCONNECTED = "goto_disconnected"
    LIVEKIT_DISCONNECTED = "livekit_disconnected"
    AUDIO_TIMEOUT = "audio_timeout"
    API_ERROR = "api_error"
    UNKNOWN = "unknown"


class RecoveryAction(Enum):
    """Actions that can be taken to recover."""
    ICE_RESTART = "ice_restart"
    RECONNECT_GOTO = "reconnect_goto"
    RECONNECT_LIVEKIT = "reconnect_livekit"
    FULL_RECONNECT = "full_reconnect"
    TERMINATE = "terminate"


class BridgeRecoveryHandler:
    """
    Handles error recovery for bridges.
    
    Determines appropriate recovery action based on
    failure type and attempts recovery.
    """
    
    def __init__(
        self,
        lifecycle_manager: "BridgeLifecycleManager",
        max_recovery_attempts: int = 3,
    ):
        self.lifecycle_manager = lifecycle_manager
        self.max_recovery_attempts = max_recovery_attempts
        
        self._recovery_attempts = 0
        self._last_failure: Optional[FailureType] = None
    
    async def handle_failure(
        self,
        failure_type: FailureType,
        error_message: str = "",
    ) -> bool:
        """
        Handle a failure and attempt recovery.
        
        Args:
            failure_type: Type of failure
            error_message: Optional error details
        
        Returns:
            True if recovered, False if should terminate
        """
        call_id = self.lifecycle_manager.call_id
        
        logger.warning(
            f"[{call_id}] Failure: {failure_type.value}. "
            f"Message: {error_message}"
        )
        
        # Check if we've exceeded retry limit
        self._recovery_attempts += 1
        if self._recovery_attempts > self.max_recovery_attempts:
            logger.error(
                f"[{call_id}] Max recovery attempts exceeded. "
                f"Terminating."
            )
            return False
        
        # Determine recovery action
        action = self._determine_action(failure_type)
        
        logger.info(
            f"[{call_id}] Recovery action: {action.value} "
            f"(attempt {self._recovery_attempts}/{self.max_recovery_attempts})"
        )
        
        # Execute recovery
        try:
            if action == RecoveryAction.ICE_RESTART:
                await self._perform_ice_restart()
                return True
                
            elif action == RecoveryAction.RECONNECT_GOTO:
                await self._perform_goto_reconnect()
                return True
                
            elif action == RecoveryAction.RECONNECT_LIVEKIT:
                await self._perform_livekit_reconnect()
                return True
                
            elif action == RecoveryAction.FULL_RECONNECT:
                await self._perform_full_reconnect()
                return True
                
            else:  # TERMINATE
                return False
                
        except Exception as e:
            logger.error(f"[{call_id}] Recovery failed: {e}")
            return False
    
    def _determine_action(
        self,
        failure_type: FailureType,
    ) -> RecoveryAction:
        """Determine appropriate recovery action."""
        mapping = {
            FailureType.ICE_FAILED: RecoveryAction.ICE_RESTART,
            FailureType.DTLS_FAILED: RecoveryAction.FULL_RECONNECT,
            FailureType.GOTO_DISCONNECTED: RecoveryAction.RECONNECT_GOTO,
            FailureType.LIVEKIT_DISCONNECTED: RecoveryAction.RECONNECT_LIVEKIT,
            FailureType.AUDIO_TIMEOUT: RecoveryAction.ICE_RESTART,
            FailureType.API_ERROR: RecoveryAction.TERMINATE,
            FailureType.UNKNOWN: RecoveryAction.TERMINATE,
        }
        return mapping.get(failure_type, RecoveryAction.TERMINATE)
    
    async def _perform_ice_restart(self) -> None:
        """Perform ICE restart on GoTo connection."""
        call_id = self.lifecycle_manager.call_id
        logger.info(f"[{call_id}] Performing ICE restart")
        
        # Request ICE restart from GoTo
        # This triggers a new offer/answer exchange
        await self.lifecycle_manager.goto_handler.request_ice_restart()
        
        # Wait for reconnection
        await asyncio.sleep(5)
        
        # Check if connected
        if not self.lifecycle_manager.goto_handler.is_connected:
            raise RuntimeError("ICE restart failed")
        
        self.lifecycle_manager.metrics.reconnection_attempts += 1
    
    async def _perform_goto_reconnect(self) -> None:
        """Reconnect to GoToConnect."""
        call_id = self.lifecycle_manager.call_id
        logger.info(f"[{call_id}] Reconnecting to GoToConnect")
        
        # Close existing connection
        await self.lifecycle_manager.goto_handler.close()
        
        # Reinitialize
        await self.lifecycle_manager.goto_handler.initialize()
        
        self.lifecycle_manager.metrics.reconnection_attempts += 1
    
    async def _perform_livekit_reconnect(self) -> None:
        """Reconnect to LiveKit."""
        call_id = self.lifecycle_manager.call_id
        logger.info(f"[{call_id}] Reconnecting to LiveKit")
        
        # Disconnect
        await self.lifecycle_manager.livekit_handler.disconnect()
        
        # Wait briefly
        await asyncio.sleep(1)
        
        # Reconnect
        await self.lifecycle_manager.livekit_handler.connect()
        
        self.lifecycle_manager.metrics.reconnection_attempts += 1
    
    async def _perform_full_reconnect(self) -> None:
        """Perform full reconnection of both sides."""
        await self._perform_goto_reconnect()
        await self._perform_livekit_reconnect()
    
    def reset_attempts(self) -> None:
        """Reset recovery attempt counter (after successful period)."""
        self._recovery_attempts = 0

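The docstring on `reset_attempts` says it should be called "after a successful period," but nothing above shows how to detect that period. Below is a minimal sketch of one way to wire it up, assuming a callback-based health signal from the bridge. The `StabilityWatchdog` class and its method names are illustrative, not part of the source; `on_stable` would be bound to `BridgeRecoveryHandler.reset_attempts` in the real bridge.

```python
import asyncio
from typing import Callable, Optional


class StabilityWatchdog:
    """Refills the recovery budget after continuous healthy operation.

    Hypothetical helper: `on_stable` would be wired to
    BridgeRecoveryHandler.reset_attempts in the real bridge.
    """

    def __init__(self, on_stable: Callable[[], None], stable_after: float = 30.0):
        self.on_stable = on_stable        # e.g. recovery_handler.reset_attempts
        self.stable_after = stable_after  # seconds of uninterrupted health required
        self._task: Optional[asyncio.Task] = None

    def connection_healthy(self) -> None:
        """Call when the bridge reports a healthy connection; restarts the countdown."""
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.ensure_future(self._wait_and_reset())

    def connection_failed(self) -> None:
        """Call on any failure; stability is no longer continuous."""
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _wait_and_reset(self) -> None:
        # If this sleep completes without being cancelled, the bridge has
        # been healthy for `stable_after` seconds straight.
        await asyncio.sleep(self.stable_after)
        self.on_stable()
```

This keeps the retry budget strict during a flapping connection (each failure cancels the countdown) while forgiving isolated glitches separated by long healthy stretches.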
33.5 Testing the Bridge

Unit Tests

# tests/bridge/test_lifecycle.py

import pytest
import asyncio
from unittest.mock import AsyncMock, MagicMock

from bridge.lifecycle.manager import (
    BridgeLifecycleManager,
    BridgePhase,
    LifecycleMetrics,
)


@pytest.fixture
def mock_bridge():
    """Create a mock audio bridge."""
    bridge = MagicMock()
    bridge.start = AsyncMock()
    bridge.stop = AsyncMock()
    return bridge


@pytest.fixture
def mock_goto_handler():
    """Create a mock GoTo handler."""
    handler = MagicMock()
    handler.initialize = AsyncMock()
    handler.handle_inbound_call = AsyncMock(return_value="v=0\r\n...")
    handler.close = AsyncMock()
    return handler


@pytest.fixture
def mock_livekit_handler():
    """Create a mock LiveKit handler."""
    handler = MagicMock()
    handler.connect = AsyncMock()
    handler.disconnect = AsyncMock()
    handler.delete_room = AsyncMock()
    return handler


@pytest.mark.asyncio
async def test_lifecycle_phases(
    mock_bridge,
    mock_goto_handler,
    mock_livekit_handler,
):
    """Test that lifecycle progresses through correct phases."""
    manager = BridgeLifecycleManager(
        call_id="test-call",
        bridge=mock_bridge,
        goto_handler=mock_goto_handler,
        livekit_handler=mock_livekit_handler,
    )
    
    # Initial state
    assert manager.phase == BridgePhase.CREATED
    
    # Initialize
    await manager.initialize()
    assert manager.phase == BridgePhase.CONNECTING
    
    # Mock SDP negotiation
    await manager.handle_inbound_call("v=0\r\n...")
    
    # Mock connection
    await manager.on_connected()
    assert manager.phase == BridgePhase.ACTIVE
    
    # Terminate
    await manager.terminate()
    assert manager.phase == BridgePhase.TERMINATED


@pytest.mark.asyncio
async def test_metrics_collection(
    mock_bridge,
    mock_goto_handler,
    mock_livekit_handler,
):
    """Test that metrics are collected correctly."""
    manager = BridgeLifecycleManager(
        call_id="test-call",
        bridge=mock_bridge,
        goto_handler=mock_goto_handler,
        livekit_handler=mock_livekit_handler,
    )
    
    # Initialize and connect
    await manager.initialize()
    await manager.handle_inbound_call("v=0\r\n...")
    await manager.on_connected()
    
    # Verify timing metrics
    assert manager.metrics.time_to_connect is not None
    assert manager.metrics.time_to_connect > 0
    
    # Terminate
    await manager.terminate()
    
    # Verify duration
    assert manager.metrics.call_duration is not None


@pytest.mark.asyncio
async def test_phase_timeout():
    """Test that phases time out correctly."""
    # TODO: drive the manager into CONNECTING, let the phase timeout
    # elapse, and assert it transitions to a failed/terminated state.
    pytest.skip("Phase timeout test not yet implemented")

Integration Tests

# tests/bridge/test_integration.py

import pytest
import asyncio

from bridge.lifecycle.bridge_manager import BridgeManager


@pytest.mark.asyncio
async def test_bridge_manager_capacity():
    """Test bridge manager enforces capacity limits."""
    manager = BridgeManager(max_concurrent_bridges=2)
    await manager.start()
    
    try:
        # Create two bridges
        await manager.create_bridge(
            call_id="call-1",
            tenant_id="tenant-1",
            goto_config={},
            livekit_config={},
        )
        await manager.create_bridge(
            call_id="call-2",
            tenant_id="tenant-1",
            goto_config={},
            livekit_config={},
        )
        
        # Third should fail
        with pytest.raises(Exception) as exc_info:
            await manager.create_bridge(
                call_id="call-3",
                tenant_id="tenant-1",
                goto_config={},
                livekit_config={},
            )
        
        assert "capacity" in str(exc_info.value).lower()
        
    finally:
        await manager.stop()

Part 5 Summary

In this part, you learned about the WebRTC Bridge Service:

Section 28: Bridge Architecture

  • The bridge connects GoToConnect (phone calls) to LiveKit (AI processing)
  • Uses aiortc (Python WebRTC) for direct control over audio
  • Multi-threaded design for performance
  • State machine ensures consistent behavior

Section 29: aiortc WebRTC Implementation

  • WebRTC uses offer/answer model for negotiation
  • SDP describes media capabilities
  • ICE handles NAT traversal for connectivity
  • Custom connection wrapper for simplified usage

Section 30: Audio Capture & Processing

  • Audio frames are processed at 20ms intervals
  • Sample rates vary: 8kHz (G.711), 16kHz (wideband), 48kHz (Opus)
  • Ring buffers handle timing variations
  • Resampling converts between rates

Section 31: LiveKit Connection

  • LiveKit provides room-based real-time communication
  • Token-based authentication with scoped permissions
  • Participants publish and subscribe to tracks
  • Audio flows from bridge to agent and back

Section 32: Audio Routing

  • Bidirectional audio: inbound (caller→agent) and outbound (agent→caller)
  • Buffers smooth out timing jitter
  • Optional volume normalization for consistency

Section 33: Bridge Lifecycle

  • Five phases: Setup → Negotiation → Connection → Active → Teardown
  • Lifecycle manager coordinates all components
  • Metrics collected for monitoring
  • Error recovery handles common failures

What’s Next

In Part 6: LiveKit Integration, you’ll learn:
  • LiveKit Cloud setup and configuration
  • Room management for calls
  • AI Agent framework integration
  • Recording with LiveKit Egress
  • Real-time events and monitoring

End of Part 5

Junior Developer PRD - Part 6: LiveKit Integration

Comprehensive Implementation Guide for Junior Developers


Document Information

| Field | Value |
|---|---|
| Document Title | Junior Developer PRD - Part 6: LiveKit Integration |
| Version | 1.0.0 |
| Last Updated | January 2026 |
| Author | Voice by aiConnected Technical Team |
| Status | Draft |
| Audience | Junior Developers |
| Prerequisites | Parts 1-5 of this PRD |
| Estimated Reading Time | 45 minutes |



Section 34: LiveKit Cloud Setup

34.1 Account Creation

What is LiveKit?

LiveKit is an open-source platform for real-time audio and video communication. Think of it as the infrastructure that enables multiple people (or AI agents and phone callers) to talk to each other in real-time, similar to how Zoom or Google Meet works behind the scenes. For Voice by aiConnected, LiveKit serves as the central hub where:
  • Phone callers (via GoToConnect) connect
  • AI agents join to process speech
  • Human supervisors can monitor calls
  • All audio is routed between participants

Why LiveKit Cloud?

LiveKit offers two deployment options:
| Option | Description | When to Use |
|---|---|---|
| Self-hosted | You run LiveKit servers yourself | Large scale, strict data requirements |
| LiveKit Cloud | LiveKit manages servers for you | Faster setup, automatic scaling, global reach |
Voice by aiConnected uses LiveKit Cloud because it:
  • Eliminates server management overhead
  • Provides automatic global distribution
  • Scales automatically with call volume
  • Reduces operational complexity

Creating a LiveKit Cloud Account

Step 1: Sign Up

Go to https://cloud.livekit.io and create an account.

Step 2: Create a Project

After signing in:
  1. Click “Create Project”
  2. Name it something like “voice-aiconnected-prod” or “voice-aiconnected-dev”
  3. Select a primary region (we use us-west-2)
Step 3: Get Your Credentials

After creating the project, you’ll receive two critical pieces of information:

| Credential | What It Is | Example Format |
|---|---|---|
| API Key | Public identifier for your project | APIxxxxxxxx |
| API Secret | Private key for signing tokens | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
⚠️ CRITICAL: Never commit your API Secret to source code. Always use environment variables.

LiveKit URLs

Your LiveKit Cloud project provides these URLs:
WebSocket URL: wss://aiconnected.livekit.cloud
API URL: https://aiconnected.livekit.cloud
The WebSocket URL is used for real-time connections (clients joining rooms), while the API URL is used for server-side operations (creating rooms, managing participants).

34.2 Project Configuration

Environment Variables

Create these environment variables for your LiveKit configuration:
# .env file (NEVER commit this file)

# LiveKit Core Credentials
LIVEKIT_API_KEY=APIxxxxxxxxx
LIVEKIT_API_SECRET=your-secret-key-here

# LiveKit URLs
LIVEKIT_WS_URL=wss://aiconnected.livekit.cloud
LIVEKIT_API_URL=https://aiconnected.livekit.cloud

# Optional: Region Configuration
LIVEKIT_PRIMARY_REGION=us-west-2
LIVEKIT_FALLBACK_REGIONS=us-east-1,eu-west-1
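A misconfigured environment usually surfaces as a confusing connection failure much later. It can help to fail fast at startup instead. Below is a minimal, hedged sketch of such a check; the function name and error message are illustrative, and `environ` is injectable purely so the check is testable.

```python
import os

# The two variables without which nothing LiveKit-related can work.
REQUIRED_VARS = ["LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]


def check_livekit_env(environ=os.environ) -> None:
    """Raise a clear error at startup if required LiveKit variables are missing.

    `environ` defaults to os.environ; a plain dict can be passed in tests.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required LiveKit environment variables: {', '.join(missing)}"
        )
```

Calling this once in your service's entry point turns a vague "tokens rejected" failure at call time into an immediate, named configuration error.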

Configuration Data Class

In Python, we create a configuration class to manage LiveKit settings:
"""
LiveKit Cloud configuration for Voice by aiConnected.
"""
from dataclasses import dataclass
from typing import Optional, List
import os


@dataclass
class LiveKitConfig:
    """
    LiveKit Cloud configuration.
    
    This class holds all settings needed to connect to LiveKit Cloud.
    Values are loaded from environment variables for security.
    """
    
    # API credentials - these authenticate our platform with LiveKit
    api_key: str
    api_secret: str
    
    # WebSocket URL for real-time connections
    # Clients (agents, bridges) connect here to join rooms
    ws_url: str  # Example: wss://aiconnected.livekit.cloud
    
    # HTTP URL for REST API calls
    # Server uses this to create rooms, manage participants, etc.
    api_url: str  # Example: https://aiconnected.livekit.cloud
    
    # Region configuration for global deployment
    primary_region: str = "us-west-2"
    fallback_regions: Optional[List[str]] = None
    
    # Connection settings
    max_reconnect_attempts: int = 5      # How many times to retry connection
    reconnect_interval_ms: int = 1000    # Wait 1 second between retries
    connection_timeout_ms: int = 10000   # Timeout after 10 seconds
    
    # Room defaults
    default_room_empty_timeout: int = 300  # 5 minutes - room closes if empty
    default_max_participants: int = 10     # Max people/agents in one room
    
    def __post_init__(self):
        """Set default fallback regions if not provided."""
        if self.fallback_regions is None:
            self.fallback_regions = ["us-east-1", "eu-west-1"]
    
    @classmethod
    def from_environment(cls) -> "LiveKitConfig":
        """
        Load configuration from environment variables.
        
        This is the recommended way to create a config instance
        because it keeps secrets out of source code.
        
        Raises:
            KeyError: If required environment variables are missing
        """
        return cls(
            api_key=os.environ["LIVEKIT_API_KEY"],
            api_secret=os.environ["LIVEKIT_API_SECRET"],
            ws_url=os.environ.get(
                "LIVEKIT_WS_URL", 
                "wss://aiconnected.livekit.cloud"
            ),
            api_url=os.environ.get(
                "LIVEKIT_API_URL",
                "https://aiconnected.livekit.cloud"
            ),
        )
    
    def get_region_url(self, region: str) -> str:
        """
        Get WebSocket URL for a specific region.
        
        Useful for connecting to specific geographic regions
        for lower latency.
        
        Args:
            region: AWS region code like 'us-west-2'
            
        Returns:
            WebSocket URL for that region
        """
        return f"wss://{region}.aiconnected.livekit.cloud"


# Global configuration instance (singleton pattern)
_config: Optional[LiveKitConfig] = None


def get_livekit_config() -> LiveKitConfig:
    """
    Get the global LiveKit configuration.
    
    Uses lazy initialization - config is only loaded
    from environment on first access.
    
    Returns:
        LiveKitConfig instance
    """
    global _config
    if _config is None:
        _config = LiveKitConfig.from_environment()
    return _config

Why These Settings Matter

| Setting | Purpose | Impact if Wrong |
|---|---|---|
| api_key | Identifies your project | Can’t authenticate |
| api_secret | Signs tokens | Tokens rejected |
| ws_url | Where clients connect | Connection fails |
| api_url | Where server API calls go | Can’t create rooms |
| max_reconnect_attempts | Retry limit | Drops calls too easily OR hangs |
| default_room_empty_timeout | Room cleanup | Wastes resources OR drops calls |

34.3 API Credentials

Understanding API Keys vs API Secrets

Think of these like a username and password:
| Credential | Public/Private | Where Used | Can Be Shared? |
|---|---|---|---|
| API Key | Public | In tokens, logs, debugging | Yes |
| API Secret | PRIVATE | Only on server | NEVER |

How Credentials Are Used

┌─────────────────────────────────────────────────────────────────────────────┐
│                     API Credential Usage Flow                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐       │
│  │   Your Server   │     │  JWT Token      │     │  LiveKit Cloud  │       │
│  │                 │     │                 │     │                 │       │
│  │  api_key        │────▶│  Contains:      │────▶│  Validates:     │       │
│  │  api_secret     │     │  - api_key      │     │  - Signature    │       │
│  │                 │     │  - Permissions  │     │  - Expiration   │       │
│  │  Creates token  │     │  - Signed with  │     │  - API key      │       │
│  │  for client     │     │    api_secret   │     │                 │       │
│  └─────────────────┘     └─────────────────┘     └─────────────────┘       │
│                                                                             │
│  The api_secret NEVER leaves your server - it's only used to sign tokens   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
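To make the diagram concrete, here is a stdlib-only sketch of how an HS256 JWT is assembled and signed with the API secret. In production you would use LiveKit's server SDK rather than hand-rolling tokens, and the exact claim names here (`video`, `roomJoin`) are illustrative; the point is that only the signature step touches the secret.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Build and sign a minimal HS256 JWT (illustrative claim set)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                 # identifies the project (public)
        "sub": identity,                # who this token is for
        "exp": int(time.time()) + 600,  # valid for 10 minutes
        "video": {"roomJoin": True, "room": room},
    }
    # header.claims is the signing input; only this server holds the secret.
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"
```

LiveKit Cloud looks up the secret for the `iss` (API key), recomputes the HMAC over the same signing input, and rejects the token if the signatures differ or `exp` has passed.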

Secure Credential Storage

DO NOT do this:
# ❌ WRONG - credentials in code
api_key = "APIabcdefgh"
api_secret = "supersecretkey123"
DO this instead:
# ✅ CORRECT - credentials from environment
import os

api_key = os.environ["LIVEKIT_API_KEY"]
api_secret = os.environ["LIVEKIT_API_SECRET"]

Credential Rotation

If you suspect your API secret has been compromised:
  1. Go to LiveKit Cloud dashboard
  2. Navigate to your project settings
  3. Click “Rotate API Secret”
  4. Update your environment variables immediately
  5. Restart all services

34.4 Webhook Configuration

What Are Webhooks?

Webhooks are HTTP callbacks that LiveKit sends to your server when events happen. Instead of constantly asking LiveKit “Did anything happen?”, LiveKit tells you when something happens.

Events LiveKit Can Notify You About

EventWhen It FiresWhat You Might Do
room_startedRoom is createdStart billing timer
room_finishedRoom closesStop billing, save analytics
participant_joinedSomeone joinsUpdate dashboard, log
participant_leftSomeone leavesCheck if call ended
track_publishedAudio/video startsVerify connection working
egress_startedRecording beginsLog for compliance
egress_endedRecording endsProcess/store recording

Setting Up Webhooks in LiveKit Cloud

Step 1: Configure Your Webhook Endpoint

In your LiveKit Cloud project settings:
  1. Go to “Webhooks” section
  2. Add your endpoint URL: https://api.yourdomain.com/webhooks/livekit
  3. Select which events you want to receive
Step 2: Create a Webhook Handler
"""
LiveKit webhook receiver endpoint.
"""
from fastapi import FastAPI, Request, HTTPException
import hmac
import hashlib
import jwt
import json
import logging

# get_livekit_config() is the helper defined in Section 34.2;
# adjust this import to wherever your LiveKitConfig module lives.
from config.livekit import get_livekit_config

logger = logging.getLogger(__name__)

app = FastAPI()


@app.post("/webhooks/livekit")
async def handle_livekit_webhook(request: Request):
    """
    Receive and process LiveKit webhook events.
    
    LiveKit sends POST requests to this endpoint when
    events occur (room created, participant joined, etc.)
    """
    # Get the raw body for signature verification
    body = await request.body()
    
    # Get the Authorization header (contains the JWT)
    auth_header = request.headers.get("Authorization")
    
    if not auth_header:
        logger.warning("Webhook received without Authorization header")
        raise HTTPException(status_code=401, detail="Missing Authorization")
    
    # Validate the webhook signature
    if not validate_webhook(body, auth_header):
        logger.warning("Invalid webhook signature")
        raise HTTPException(status_code=401, detail="Invalid signature")
    
    # Parse the webhook payload
    try:
        payload = json.loads(body)
        event_type = payload.get("event")
        
        logger.info(f"Received LiveKit webhook: {event_type}")
        
        # Route to appropriate handler based on event type
        if event_type == "room_started":
            await handle_room_started(payload)
        elif event_type == "room_finished":
            await handle_room_finished(payload)
        elif event_type == "participant_joined":
            await handle_participant_joined(payload)
        elif event_type == "participant_left":
            await handle_participant_left(payload)
        # ... handle other events
        
        return {"status": "ok"}
        
    except json.JSONDecodeError:
        logger.error("Failed to parse webhook JSON")
        raise HTTPException(status_code=400, detail="Invalid JSON")


def validate_webhook(body: bytes, auth_header: str) -> bool:
    """
    Validate that the webhook actually came from LiveKit.
    
    LiveKit signs webhooks using JWT with your API secret.
    This ensures attackers can't send fake events.
    """
    try:
        # Extract token from "Bearer <token>" format
        if auth_header.startswith("Bearer "):
            token = auth_header[7:]
        else:
            token = auth_header
        
        # Decode and validate the JWT
        config = get_livekit_config()
        
        payload = jwt.decode(
            token,
            config.api_secret,
            algorithms=["HS256"],
            options={"verify_exp": True}
        )
        
        # Verify the API key matches
        if payload.get("iss") != config.api_key:
            return False
        
        # If there's a body hash, verify it
        if "sha256" in payload:
            expected_hash = payload["sha256"]
            actual_hash = hashlib.sha256(body).hexdigest()
            
            if not hmac.compare_digest(expected_hash, actual_hash):
                return False
        
        return True
        
    except jwt.ExpiredSignatureError:
        return False
    except jwt.InvalidTokenError:
        return False

Webhook Security Best Practices

  1. Always validate signatures - Never process webhooks without checking the JWT
  2. Use HTTPS - LiveKit won’t send webhooks to HTTP endpoints
  3. Respond quickly - Return 200 OK within a few seconds
  4. Process asynchronously - Queue events for background processing
  5. Handle duplicates - Webhooks might be sent more than once
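Practices 3 through 5 above (respond quickly, process asynchronously, handle duplicates) fit together naturally: the HTTP endpoint should only validate, deduplicate, and enqueue, leaving real work to a background task. A minimal sketch of that shape follows; the `WebhookQueue` class is illustrative, and it assumes each payload carries a unique `id` field for deduplication. A production version would bound the seen-ID set and persist it across restarts.

```python
import asyncio


class WebhookQueue:
    """Accepts webhook events fast, processes them in the background,
    and drops duplicate deliveries (assumes each payload has a unique 'id')."""

    def __init__(self, handler):
        self.handler = handler                  # async function(payload)
        self.queue: asyncio.Queue = asyncio.Queue()
        self._seen: set = set()                 # unbounded here; cap it in production

    def accept(self, payload: dict) -> bool:
        """Called from the HTTP endpoint; returns immediately so the
        endpoint can answer 200 OK without waiting on processing."""
        event_id = payload.get("id")
        if event_id is not None:
            if event_id in self._seen:
                return False                    # duplicate delivery, ignore
            self._seen.add(event_id)
        self.queue.put_nowait(payload)
        return True

    async def worker(self) -> None:
        """Background task that does the real processing."""
        while True:
            payload = await self.queue.get()
            try:
                await self.handler(payload)     # route by event type, etc.
            finally:
                self.queue.task_done()
```

In the FastAPI handler above, the `if/elif` dispatch block would move into `handler`, and the endpoint body would shrink to validation plus a single `accept()` call.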

Section 35: Room Management

35.1 Room Naming Convention

Why Naming Conventions Matter

In LiveKit, rooms are identified by name. A good naming convention:
  • Makes debugging easier (you can tell what a room is for)
  • Enables filtering (find all rooms for a specific tenant)
  • Prevents collisions (two different calls won’t share a room name)
  • Supports multi-tenancy (isolate tenants from each other)

Voice by aiConnected Room Naming Format

Format: {type}-{tenant_id}-{call_id}[-{suffix}]

Examples:
- call-acme-550e8400-e29b-41d4-a716-446655440000
- outbound-bigco-7c9e6679-7425-40de-944b-e07fc1f90ae7
- transfer-acme-550e8400-e29b-41d4-a716-446655440000-warm

Breaking Down the Format

| Component | Purpose | Rules | Example |
|---|---|---|---|
| type | What kind of call | call, outbound, transfer, conference, test | call |
| tenant_id | Which customer | Lowercase, alphanumeric with hyphens | acme-corp |
| call_id | Unique call identifier | UUID format | 550e8400-e29b-41d4... |
| suffix | Optional variant | Lowercase, alphanumeric with hyphens | warm |

Room Types

from enum import Enum


class RoomType(Enum):
    """Types of rooms in the Voice by aiConnected platform."""
    
    CALL = "call"           # Standard inbound/outbound calls
    OUTBOUND = "outbound"   # Explicitly outbound campaigns
    TRANSFER = "transfer"   # Call transfer staging rooms
    CONFERENCE = "conference"  # Multi-party conferences
    TEST = "test"           # Testing and development

Room Naming Implementation

"""
Room naming utilities and conventions.
"""
from dataclasses import dataclass
from typing import Optional
import uuid
import re

# RoomType is the enum defined in the snippet above


@dataclass
class RoomNameComponents:
    """
    Parsed components of a room name.
    
    When we receive a room name like "call-acme-550e8400...",
    this class holds each piece separately for easy access.
    """
    room_type: RoomType
    tenant_id: str
    call_id: str
    suffix: Optional[str] = None
    
    @property
    def full_name(self) -> str:
        """Reconstruct the full room name from components."""
        name = f"{self.room_type.value}-{self.tenant_id}-{self.call_id}"
        if self.suffix:
            name = f"{name}-{self.suffix}"
        return name


class RoomNaming:
    """
    Utilities for creating and parsing room names.
    
    This class ensures all room names follow our convention,
    making it easy to:
    - Generate consistent names
    - Parse names to get components
    - Validate names are correct
    """
    
    # Regular expression pattern for valid room names
    # This matches: type-tenant-uuid[-suffix]
    PATTERN = re.compile(
        r"^(?P<type>call|outbound|transfer|conference|test)-"
        r"(?P<tenant>[a-z0-9-]+)-"
        r"(?P<call_id>[a-f0-9-]{36})"
        r"(?:-(?P<suffix>[a-z0-9-]+))?$"
    )
    
    @classmethod
    def generate(
        cls,
        room_type: RoomType,
        tenant_id: str,
        call_id: Optional[str] = None,
        suffix: Optional[str] = None,
    ) -> str:
        """
        Generate a room name following our convention.
        
        Args:
            room_type: Type of room (CALL, OUTBOUND, etc.)
            tenant_id: Tenant identifier (must be lowercase alphanumeric)
            call_id: Call identifier (UUID, auto-generated if not provided)
            suffix: Optional suffix for variants like 'warm' transfers
            
        Returns:
            Formatted room name string
            
        Raises:
            ValueError: If tenant_id or suffix contains invalid characters
            
        Example:
            >>> RoomNaming.generate(RoomType.CALL, "acme")
            'call-acme-550e8400-e29b-41d4-a716-446655440000'
        """
        # Validate tenant_id - only lowercase letters, numbers, hyphens
        if not re.match(r"^[a-z0-9-]+$", tenant_id):
            raise ValueError(
                f"Invalid tenant_id: {tenant_id}. "
                "Must be lowercase alphanumeric with hyphens only."
            )
        
        # Generate or validate call_id
        if call_id is None:
            call_id = str(uuid.uuid4())
        else:
            # Make sure it's a valid UUID
            try:
                uuid.UUID(call_id)
            except ValueError:
                raise ValueError(f"Invalid call_id: {call_id}. Must be a valid UUID.")
        
        # Build the name
        name = f"{room_type.value}-{tenant_id}-{call_id}"
        
        # Add suffix if provided
        if suffix:
            if not re.match(r"^[a-z0-9-]+$", suffix):
                raise ValueError(
                    f"Invalid suffix: {suffix}. "
                    "Must be lowercase alphanumeric with hyphens only."
                )
            name = f"{name}-{suffix}"
        
        return name
    
    @classmethod
    def parse(cls, room_name: str) -> Optional[RoomNameComponents]:
        """
        Parse a room name into its components.
        
        Args:
            room_name: Room name to parse
            
        Returns:
            RoomNameComponents if valid, None if name doesn't match pattern
            
        Example:
            >>> result = RoomNaming.parse("call-acme-550e8400-e29b-41d4-a716-446655440000")
            >>> result.tenant_id
            'acme'
        """
        match = cls.PATTERN.match(room_name)
        if not match:
            return None
        
        return RoomNameComponents(
            room_type=RoomType(match.group("type")),
            tenant_id=match.group("tenant"),
            call_id=match.group("call_id"),
            suffix=match.group("suffix"),
        )
    
    @classmethod
    def is_valid(cls, room_name: str) -> bool:
        """Check if a room name follows our convention."""
        return cls.PATTERN.match(room_name) is not None
    
    @classmethod
    def for_tenant(cls, tenant_id: str) -> str:
        """
        Get a filter pattern for all rooms belonging to a tenant.
        
        Useful for listing or monitoring all rooms for a specific customer.
        """
        return f"*-{tenant_id}-*"
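The glob pattern returned by `for_tenant` can be applied with the stdlib `fnmatch` module when filtering a room listing. A small self-contained sketch (the helper name is illustrative):

```python
from fnmatch import fnmatch
from typing import List


def rooms_for_tenant(room_names: List[str], tenant_id: str) -> List[str]:
    """Filter a list of room names down to one tenant, using the
    same '*-{tenant}-*' glob shape that RoomNaming.for_tenant returns."""
    pattern = f"*-{tenant_id}-*"
    return [name for name in room_names if fnmatch(name, pattern)]
```

Note one caveat of glob matching: `*-acme-*` would also match a hypothetical tenant named `big-acme`. When tenant IDs can share suffixes, parse each name with `RoomNaming.parse` and compare `tenant_id` exactly instead of relying on the glob.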

35.2 Room Creation Logic

When Rooms Are Created

Rooms are created at the start of a call:
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Room Creation Flow                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. Call Arrives ──▶ 2. Create Room ──▶ 3. Participants Join               │
│                                                                             │
│  ┌─────────────┐     ┌─────────────────┐     ┌─────────────────────────┐   │
│  │ GoToConnect │     │ Room Service    │     │ Room: call-acme-xxx     │   │
│  │ webhook     │────▶│ creates room    │────▶│                         │   │
│  │ received    │     │ in LiveKit      │     │ ┌───────┐   ┌───────┐   │   │
│  └─────────────┘     └─────────────────┘     │ │Caller │   │ Agent │   │   │
│                                              │ └───────┘   └───────┘   │   │
│                                              └─────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Room Configuration

"""
Room configuration for different call scenarios.
"""
from dataclasses import dataclass, field
from typing import Dict, Any
from datetime import datetime
import json


@dataclass
class RoomConfig:
    """
    Configuration for creating a LiveKit room.
    
    This defines all the settings for a new room, including
    timeouts, participant limits, and metadata about the call.
    """
    
    # Basic settings
    name: str                          # Room name (follows our convention)
    empty_timeout: int = 300           # Seconds before empty room closes (5 min)
    max_participants: int = 10         # Maximum people/agents allowed
    
    # Call metadata - who's calling who
    tenant_id: str = ""
    call_direction: str = "inbound"    # inbound, outbound
    caller_number: str = ""            # Phone number of caller
    called_number: str = ""            # Phone number that was dialed
    agent_id: str = ""                 # Which AI agent handles this call
    
    # Feature flags
    enable_recording: bool = False     # Should we record this call?
    enable_transcription: bool = True  # Should we transcribe this call?
    
    # Additional custom metadata
    custom_metadata: Dict[str, Any] = field(default_factory=dict)
    
    def to_metadata_json(self) -> str:
        """
        Convert configuration to JSON for LiveKit room metadata.
        
        LiveKit stores metadata as a JSON string, so we need
        to serialize our configuration.
        """
        metadata = {
            "tenant_id": self.tenant_id,
            "call_direction": self.call_direction,
            "caller_number": self.caller_number,
            "called_number": self.called_number,
            "agent_id": self.agent_id,
            "enable_recording": self.enable_recording,
            "enable_transcription": self.enable_transcription,
            "created_at": datetime.utcnow().isoformat(),
            **self.custom_metadata,
        }
        return json.dumps(metadata)

Room Service Implementation

"""
Service for creating and managing LiveKit rooms.
"""
from livekit.api import LiveKitAPI, CreateRoomRequest, DeleteRoomRequest
from typing import Optional, Dict, List
import logging

logger = logging.getLogger(__name__)


class RoomService:
    """
    Service for managing LiveKit rooms.
    
    This class handles all room operations:
    - Creating rooms for new calls
    - Getting information about existing rooms
    - Deleting rooms when calls end
    - Listing rooms for a tenant
    
    It also maintains a local cache of room information
    to reduce API calls to LiveKit.
    """
    
    def __init__(self, config: LiveKitConfig):
        """
        Initialize the room service.
        
        Args:
            config: LiveKit configuration with API credentials
        """
        self.config = config
        self._api = LiveKitAPI(
            url=config.api_url,
            api_key=config.api_key,
            api_secret=config.api_secret,
        )
        
        # Cache of active rooms to avoid repeated API calls
        self._active_rooms: Dict[str, RoomInfo] = {}
    
    async def create_room(self, config: RoomConfig) -> RoomInfo:
        """
        Create a new LiveKit room.
        
        This is called when a new call starts to create
        the "meeting space" where participants will connect.
        
        Args:
            config: Room configuration with name, settings, metadata
            
        Returns:
            RoomInfo with details about the created room
            
        Raises:
            RoomCreationError: If LiveKit rejects the request
            
        Example:
            >>> room_config = RoomConfig(
            ...     name="call-acme-123...",
            ...     tenant_id="acme",
            ...     caller_number="+15551234567"
            ... )
            >>> room = await room_service.create_room(room_config)
            >>> print(room.sid)  # Server-assigned room ID
        """
        try:
            # Build the request for LiveKit
            request = CreateRoomRequest(
                name=config.name,
                empty_timeout=config.empty_timeout,
                max_participants=config.max_participants,
                metadata=config.to_metadata_json(),
            )
            
            logger.info(
                f"Creating room: {config.name}",
                extra={
                    "tenant_id": config.tenant_id,
                    "call_direction": config.call_direction,
                }
            )
            
            # Call LiveKit API
            room = await self._api.room.create_room(request)
            
            # Convert to our internal format
            room_info = RoomInfo(
                name=room.name,
                sid=room.sid,
                creation_time=datetime.fromtimestamp(room.creation_time),
                num_participants=room.num_participants,
                metadata=json.loads(room.metadata) if room.metadata else {},
            )
            
            # Cache it
            self._active_rooms[config.name] = room_info
            
            logger.info(
                f"Room created: {config.name} (sid: {room_info.sid})"
            )
            
            return room_info
            
        except Exception as e:
            logger.error(f"Failed to create room {config.name}: {e}")
            raise RoomCreationError(f"Failed to create room: {e}") from e
    
    async def get_room(self, room_name: str) -> Optional[RoomInfo]:
        """
        Get information about an existing room.
        
        First checks local cache, then queries LiveKit if needed.
        
        Args:
            room_name: Name of the room to look up
            
        Returns:
            RoomInfo if room exists, None otherwise
        """
        # Check cache first
        if room_name in self._active_rooms:
            return self._active_rooms[room_name]
        
        try:
            # Query LiveKit - list_rooms takes a ListRoomsRequest and
            # returns a response whose .rooms field holds the matches
            response = await self._api.room.list_rooms(
                ListRoomsRequest(names=[room_name])
            )
            
            if response.rooms:
                room = response.rooms[0]
                room_info = RoomInfo(
                    name=room.name,
                    sid=room.sid,
                    creation_time=datetime.fromtimestamp(room.creation_time),
                    num_participants=room.num_participants,
                    metadata=json.loads(room.metadata) if room.metadata else {},
                )
                
                # Update cache
                self._active_rooms[room_name] = room_info
                return room_info
            
            return None
            
        except Exception as e:
            logger.error(f"Failed to get room {room_name}: {e}")
            return None
    
    async def delete_room(self, room_name: str) -> bool:
        """
        Delete a room.
        
        Called when a call ends to clean up resources.
        Any participants still in the room will be disconnected.
        
        Args:
            room_name: Name of the room to delete
            
        Returns:
            True if deleted successfully, False otherwise
        """
        try:
            await self._api.room.delete_room(
                DeleteRoomRequest(room=room_name)
            )
            
            # Remove from cache
            self._active_rooms.pop(room_name, None)
            
            logger.info(f"Room deleted: {room_name}")
            return True
            
        except Exception as e:
            logger.error(f"Failed to delete room {room_name}: {e}")
            return False
    
    async def list_rooms_for_tenant(self, tenant_id: str) -> List[RoomInfo]:
        """
        List all active rooms for a specific tenant.
        
        Useful for dashboards showing current call activity.
        
        Args:
            tenant_id: Tenant identifier
            
        Returns:
            List of RoomInfo for all active rooms
        """
        try:
            # Get all rooms from LiveKit - an empty ListRoomsRequest
            # returns every active room
            response = await self._api.room.list_rooms(ListRoomsRequest())
            
            # Filter to just this tenant's rooms
            tenant_rooms = []
            for room in response.rooms:
                # Parse room name to check tenant
                parsed = RoomNaming.parse(room.name)
                if parsed and parsed.tenant_id == tenant_id:
                    room_info = RoomInfo(
                        name=room.name,
                        sid=room.sid,
                        creation_time=datetime.fromtimestamp(room.creation_time),
                        num_participants=room.num_participants,
                        metadata=json.loads(room.metadata) if room.metadata else {},
                    )
                    tenant_rooms.append(room_info)
                    self._active_rooms[room.name] = room_info
            
            return tenant_rooms
            
        except Exception as e:
            logger.error(f"Failed to list rooms for tenant {tenant_id}: {e}")
            return []


class RoomCreationError(Exception):
    """Raised when room creation fails."""
    pass

35.3 Room Configuration Options

Configuration Options Explained

| Option | Type | Default | Purpose |
|---|---|---|---|
| empty_timeout | int (seconds) | 300 | How long room stays open with no participants |
| max_participants | int | 10 | Maximum concurrent participants |
| metadata | JSON string | (none) | Custom data stored with the room |

Different Configurations for Different Call Types

"""
Room configuration presets for different call scenarios.
"""
from dataclasses import dataclass
from typing import Optional
import uuid


@dataclass
class CallContext:
    """
    Context information about a call.
    
    Contains everything we know about a call that affects
    how we configure its room.
    """
    tenant_id: str
    caller_number: str
    called_number: str
    call_id: Optional[str] = None
    agent_id: Optional[str] = None
    campaign_id: Optional[str] = None
    
    def __post_init__(self):
        # Auto-generate call_id if not provided
        if self.call_id is None:
            self.call_id = str(uuid.uuid4())


class RoomConfigFactory:
    """
    Factory for creating room configurations.
    
    Instead of manually setting up RoomConfig for each call,
    use this factory to get sensible defaults for different
    call types.
    """
    
    @staticmethod
    def for_inbound_call(
        context: CallContext,
        enable_recording: bool = False,
    ) -> RoomConfig:
        """
        Create room config for an inbound call.
        
        Inbound calls are customer-initiated. We use:
        - 5 minute empty timeout (in case of brief disconnections)
        - Up to 5 participants (caller, agent, potential supervisors)
        - Transcription enabled
        
        Args:
            context: Call context with tenant, phone numbers, etc.
            enable_recording: Whether to record this call
            
        Returns:
            RoomConfig optimized for inbound calls
        """
        return RoomConfig(
            name=RoomNaming.generate(
                room_type=RoomType.CALL,
                tenant_id=context.tenant_id,
                call_id=context.call_id,
            ),
            empty_timeout=300,       # 5 minutes
            max_participants=5,
            tenant_id=context.tenant_id,
            call_direction="inbound",
            caller_number=context.caller_number,
            called_number=context.called_number,
            agent_id=context.agent_id or "",
            enable_recording=enable_recording,
            enable_transcription=True,
            custom_metadata={
                "source": "goto_inbound",
            },
        )
    
    @staticmethod
    def for_outbound_call(
        context: CallContext,
        campaign_type: str = "general",
        enable_recording: bool = True,
    ) -> RoomConfig:
        """
        Create room config for an outbound call.
        
        Outbound calls are agent-initiated (calling customers).
        We use:
        - Shorter 2 minute timeout
        - Recording often enabled for compliance
        - Campaign metadata for tracking
        
        Args:
            context: Call context
            campaign_type: Type of outbound campaign
            enable_recording: Whether to record (usually True for compliance)
            
        Returns:
            RoomConfig optimized for outbound calls
        """
        return RoomConfig(
            name=RoomNaming.generate(
                room_type=RoomType.OUTBOUND,
                tenant_id=context.tenant_id,
                call_id=context.call_id,
            ),
            empty_timeout=120,       # 2 minutes (shorter for outbound)
            max_participants=5,
            tenant_id=context.tenant_id,
            call_direction="outbound",
            caller_number=context.called_number,  # Our number
            called_number=context.caller_number,  # Customer number
            agent_id=context.agent_id or "",
            enable_recording=enable_recording,
            enable_transcription=True,
            custom_metadata={
                "source": "outbound_campaign",
                "campaign_id": context.campaign_id or "",
                "campaign_type": campaign_type,
            },
        )
    
    @staticmethod
    def for_warm_transfer(
        context: CallContext,
        source_room: str,
        target_extension: str,
    ) -> RoomConfig:
        """
        Create room config for a warm transfer.
        
        Warm transfers connect the caller to a new recipient
        while the original agent introduces them. We use:
        - Short 1 minute timeout
        - Reference to original room
        - Always record transfers
        
        Args:
            context: Call context
            source_room: Name of the original room
            target_extension: Extension being transferred to
            
        Returns:
            RoomConfig for the transfer staging room
        """
        return RoomConfig(
            name=RoomNaming.generate(
                room_type=RoomType.TRANSFER,
                tenant_id=context.tenant_id,
                call_id=context.call_id,
                suffix="warm",
            ),
            empty_timeout=60,        # 1 minute
            max_participants=5,
            tenant_id=context.tenant_id,
            call_direction="transfer",
            caller_number=context.caller_number,
            called_number=target_extension,
            agent_id=context.agent_id or "",
            enable_recording=True,   # Always record transfers
            enable_transcription=True,
            custom_metadata={
                "transfer_type": "warm",
                "source_room": source_room,
                "target_extension": target_extension,
            },
        )
    
    @staticmethod
    def for_test(
        tenant_id: str = "test",
        test_name: str = "unit_test",
    ) -> RoomConfig:
        """
        Create room config for testing.
        
        Test rooms use minimal resources and short timeouts.
        
        Args:
            tenant_id: Tenant ID (default "test")
            test_name: Name of the test
            
        Returns:
            RoomConfig optimized for testing
        """
        return RoomConfig(
            name=RoomNaming.generate(
                room_type=RoomType.TEST,
                tenant_id=tenant_id,
            ),
            empty_timeout=30,        # Very short for tests
            max_participants=3,
            tenant_id=tenant_id,
            call_direction="test",
            caller_number="+15551234567",
            called_number="+15559876543",
            enable_recording=False,
            enable_transcription=False,
            custom_metadata={
                "test_name": test_name,
                "environment": "test",
            },
        )
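
Picking a preset is then a one-liner at the call site. The sketch below keeps only a trimmed `CallContext` to show the one behavior that is easy to miss: `__post_init__` auto-generates a `call_id` when none is supplied, so every room name gets a unique identifier, while an explicitly supplied ID is left untouched:

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallContext:
    """Trimmed copy of the dataclass above - demo only."""
    tenant_id: str
    caller_number: str
    called_number: str
    call_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Auto-generate a call_id when the caller didn't supply one
        if self.call_id is None:
            self.call_id = str(uuid.uuid4())


auto = CallContext("acme", "+15551234567", "+15559876543")
fixed = CallContext("acme", "+15551234567", "+15559876543", call_id="call-42")

print(len(auto.call_id))  # 36 (canonical UUID string length)
print(fixed.call_id)      # call-42
```
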

35.4 Room Deletion/Cleanup

When Rooms Are Deleted

Rooms are deleted:
  1. Automatically - When empty_timeout expires with no participants
  2. Explicitly - When we call delete_room after a call ends
  3. On Error - When something goes wrong and we need to clean up

Room Lifecycle States

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Room Lifecycle States                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐             │
│  │ CREATING │───▶│  ACTIVE  │───▶│ DRAINING │───▶│  CLOSED  │             │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘             │
│                                                                             │
│  CREATING:   Room is being created, not yet ready for participants         │
│  ACTIVE:     Room is accepting participants, call in progress              │
│  DRAINING:   Room is closing, no new participants allowed                  │
│  CLOSED:     Room has been terminated, all resources released              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Room Lifecycle Manager

"""
Room lifecycle state management.
"""
from enum import Enum
from dataclasses import dataclass, field
from typing import Optional, Callable, Awaitable, List, Dict, Set
from datetime import datetime
import asyncio
import logging

logger = logging.getLogger(__name__)


class RoomState(Enum):
    """Room lifecycle states."""
    CREATING = "creating"
    ACTIVE = "active"
    DRAINING = "draining"
    CLOSED = "closed"


class RoomLifecycleManager:
    """
    Manages room lifecycle state transitions.
    
    This ensures rooms follow valid state progressions and
    notifies other parts of the system when state changes.
    
    Valid transitions:
    - None → CREATING (new room)
    - CREATING → ACTIVE (room ready)
    - CREATING → CLOSED (creation failed)
    - ACTIVE → DRAINING (call ending)
    - ACTIVE → CLOSED (abrupt end)
    - DRAINING → CLOSED (graceful end)
    """
    
    # Define which transitions are allowed
    VALID_TRANSITIONS = {
        None: {RoomState.CREATING},
        RoomState.CREATING: {RoomState.ACTIVE, RoomState.CLOSED},
        RoomState.ACTIVE: {RoomState.DRAINING, RoomState.CLOSED},
        RoomState.DRAINING: {RoomState.CLOSED},
        RoomState.CLOSED: set(),  # Terminal state - no transitions allowed
    }
    
    def __init__(self):
        # Track current state of each room
        self._room_states: Dict[str, RoomState] = {}
        
        # Callbacks to notify on state changes
        self._callbacks: List[Callable] = []
        
        # Lock for thread-safe state changes
        self._lock = asyncio.Lock()
    
    def subscribe(self, callback: Callable):
        """
        Subscribe to state change events.
        
        Your callback will be called whenever a room changes state.
        
        Args:
            callback: Async function called with (room_name, old_state, new_state)
        """
        self._callbacks.append(callback)
    
    def get_state(self, room_name: str) -> Optional[RoomState]:
        """Get the current state of a room."""
        return self._room_states.get(room_name)
    
    async def transition(
        self,
        room_name: str,
        new_state: RoomState,
    ) -> bool:
        """
        Transition a room to a new state.
        
        Args:
            room_name: Name of the room
            new_state: State to transition to
            
        Returns:
            True if transition was valid and executed
            
        Raises:
            InvalidStateTransitionError: If transition is not allowed
        """
        async with self._lock:
            current_state = self._room_states.get(room_name)
            
            # Check if this transition is allowed
            valid_targets = self.VALID_TRANSITIONS.get(current_state, set())
            if new_state not in valid_targets:
                raise InvalidStateTransitionError(
                    f"Cannot transition room {room_name} "
                    f"from {current_state} to {new_state}"
                )
            
            # Execute the transition
            self._room_states[room_name] = new_state
            
            logger.info(
                f"Room state transition: {room_name} "
                f"{current_state} -> {new_state}"
            )
        
        # Notify callbacks (outside lock to prevent deadlocks)
        for callback in self._callbacks:
            try:
                await callback(room_name, current_state, new_state)
            except Exception as e:
                logger.error(f"Error in state change callback: {e}")
        
        # Clean up closed rooms from our tracking after a short delay,
        # scheduled as a background task so transition() returns immediately
        if new_state == RoomState.CLOSED:
            async def _cleanup() -> None:
                await asyncio.sleep(5)
                async with self._lock:
                    self._room_states.pop(room_name, None)
            asyncio.create_task(_cleanup())
        
        return True
    
    def get_rooms_by_state(self, state: RoomState) -> List[str]:
        """Get all rooms in a specific state."""
        return [
            name for name, s in self._room_states.items()
            if s == state
        ]


class InvalidStateTransitionError(Exception):
    """Raised when an invalid state transition is attempted."""
    pass
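
The transition table can be checked without any LiveKit dependency. This standalone sketch replays the `VALID_TRANSITIONS` mapping from the manager above and confirms, for example, that CLOSED is terminal:

```python
from enum import Enum
from typing import Dict, Optional, Set


class RoomState(Enum):
    CREATING = "creating"
    ACTIVE = "active"
    DRAINING = "draining"
    CLOSED = "closed"


# Same table as RoomLifecycleManager.VALID_TRANSITIONS above
VALID_TRANSITIONS: Dict[Optional[RoomState], Set[RoomState]] = {
    None: {RoomState.CREATING},
    RoomState.CREATING: {RoomState.ACTIVE, RoomState.CLOSED},
    RoomState.ACTIVE: {RoomState.DRAINING, RoomState.CLOSED},
    RoomState.DRAINING: {RoomState.CLOSED},
    RoomState.CLOSED: set(),  # terminal state
}


def can_transition(current: Optional[RoomState], new: RoomState) -> bool:
    """Check whether a state change is allowed by the table."""
    return new in VALID_TRANSITIONS.get(current, set())


print(can_transition(None, RoomState.CREATING))            # True
print(can_transition(RoomState.CLOSED, RoomState.ACTIVE))  # False
```
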

Section 36: Participant Management

36.1 Participant Types

Who Joins LiveKit Rooms?

In Voice by aiConnected, several types of participants can join a call room:
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Participant Types                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐            │
│  │   SIP_CALLER    │  │    AI_AGENT     │  │   SUPERVISOR    │            │
│  │                 │  │                 │  │                 │            │
│  │ Phone caller    │  │ Voice AI bot    │  │ Human monitor   │            │
│  │ bridged via     │  │ processing      │  │ for quality     │            │
│  │ GoToConnect     │  │ speech          │  │ assurance       │            │
│  │                 │  │                 │  │                 │            │
│  │ ✓ Publish audio │  │ ✓ Publish audio │  │ ✓ Publish audio │            │
│  │ ✓ Subscribe all │  │ ✓ Subscribe all │  │ ✓ Subscribe all │            │
│  │ ✗ Hidden        │  │ ✗ Hidden        │  │ ✓ Hidden        │            │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘            │
│                                                                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐            │
│  │    OBSERVER     │  │    RECORDER     │  │   HUMAN_AGENT   │            │
│  │                 │  │                 │  │                 │            │
│  │ Silent monitor  │  │ Recording bot   │  │ Live human      │            │
│  │ for debugging   │  │ for archival    │  │ agent takeover  │            │
│  │ or training     │  │                 │  │                 │            │
│  │                 │  │                 │  │                 │            │
│  │ ✗ Publish       │  │ ✗ Publish       │  │ ✓ Publish audio │            │
│  │ ✓ Subscribe all │  │ ✓ Subscribe all │  │ ✓ Subscribe all │            │
│  │ ✓ Hidden        │  │ ✓ Hidden        │  │ ✗ Hidden        │            │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Participant Type Implementation

"""
Participant type definitions and permissions.
"""
from enum import Enum
from dataclasses import dataclass


class ParticipantType(Enum):
    """
    Types of participants that can join a call.
    
    Each type has different default permissions and capabilities.
    """
    SIP_CALLER = "sip_caller"       # Phone caller via GoToConnect
    AI_AGENT = "ai_agent"           # Voice AI processing bot
    SUPERVISOR = "supervisor"       # Human supervisor for QA
    OBSERVER = "observer"           # Silent observer for monitoring
    RECORDER = "recorder"           # Recording service
    HUMAN_AGENT = "human_agent"     # Live human agent
    WEBRTC_CALLER = "webrtc_caller" # Browser-based caller
    SYSTEM = "system"               # System services


@dataclass
class ParticipantPermissions:
    """
    Permissions that control what a participant can do.
    
    These permissions are encoded into the access token,
    so they're enforced by LiveKit itself.
    """
    can_publish: bool = True          # Can publish audio tracks
    can_subscribe: bool = True        # Can subscribe to others' tracks
    can_publish_data: bool = False    # Can send data messages
    can_update_metadata: bool = False # Can update own metadata
    hidden: bool = False              # Hidden from other participants
    
    @classmethod
    def for_type(cls, participant_type: ParticipantType) -> "ParticipantPermissions":
        """
        Get default permissions for a participant type.
        
        These are sensible defaults - you can customize them when
        creating tokens if needed.
        
        Args:
            participant_type: Type of participant
            
        Returns:
            ParticipantPermissions with appropriate defaults
        """
        if participant_type == ParticipantType.SIP_CALLER:
            # Phone callers can talk and listen, nothing else
            return cls(
                can_publish=True,
                can_subscribe=True,
                can_publish_data=False,
                hidden=False,
            )
            
        elif participant_type == ParticipantType.AI_AGENT:
            # AI agents need full access for processing
            return cls(
                can_publish=True,
                can_subscribe=True,
                can_publish_data=True,     # Send metadata updates
                can_update_metadata=True,  # Update status
                hidden=False,
            )
            
        elif participant_type == ParticipantType.SUPERVISOR:
            # Supervisors can speak but are hidden by default
            return cls(
                can_publish=True,   # Can intervene if needed
                can_subscribe=True,
                can_publish_data=True,
                hidden=True,        # Hidden from caller
            )
            
        elif participant_type == ParticipantType.OBSERVER:
            # Observers can only listen, always hidden
            return cls(
                can_publish=False,
                can_subscribe=True,
                can_publish_data=False,
                hidden=True,
            )
            
        elif participant_type == ParticipantType.RECORDER:
            # Recorders are silent, hidden, receive all audio
            return cls(
                can_publish=False,
                can_subscribe=True,
                can_publish_data=False,
                hidden=True,
            )
            
        elif participant_type == ParticipantType.HUMAN_AGENT:
            # Human agents taking over have full access, visible
            return cls(
                can_publish=True,
                can_subscribe=True,
                can_publish_data=True,
                can_update_metadata=True,
                hidden=False,
            )
        
        else:
            # Default - basic permissions
            return cls()
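
A quick way to sanity-check the defaults: the trimmed sketch below keeps just two participant types and their `for_type` branches, mirroring the logic above (it is not the full class):

```python
from dataclasses import dataclass
from enum import Enum


class ParticipantType(Enum):
    AI_AGENT = "ai_agent"
    OBSERVER = "observer"


@dataclass
class ParticipantPermissions:
    can_publish: bool = True
    can_subscribe: bool = True
    hidden: bool = False

    @classmethod
    def for_type(cls, t: ParticipantType) -> "ParticipantPermissions":
        # Mirrors two branches of the full method above
        if t is ParticipantType.OBSERVER:
            return cls(can_publish=False, can_subscribe=True, hidden=True)
        if t is ParticipantType.AI_AGENT:
            return cls(can_publish=True, can_subscribe=True, hidden=False)
        return cls()


obs = ParticipantPermissions.for_type(ParticipantType.OBSERVER)
print(obs.can_publish, obs.hidden)  # False True
```

Because these permissions are baked into the access token, getting them wrong here means re-issuing a token, not just flipping a flag mid-call.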

36.2 Participant Identity Format

Structured Identities

We use a structured format for participant identities that encodes:
  • What type of participant they are
  • Which tenant they belong to
  • A unique identifier
Format: {type}:{tenant}:{id}

Examples:
- sip_caller:acme:call-550e8400-e29b-41d4-a716-446655440000
- ai_agent:acme:agent-001
- supervisor:acme:user-john-smith

Why Structured Identities?

| Benefit | Explanation |
|---|---|
| Debugging | Instantly see who's who in logs |
| Filtering | Find all agents, or all callers, for a tenant |
| Security | Verify a participant belongs to the correct tenant |
| Analytics | Track metrics by participant type |

Identity Implementation

"""
Structured participant identity management.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParticipantIdentity:
    """
    Structured participant identity.
    
    This provides a consistent way to identify participants
    across the system, encoding type and tenant information
    in the identity string.
    """
    participant_type: ParticipantType
    tenant_id: str
    unique_id: str
    
    @property
    def identity(self) -> str:
        """
        Get the full identity string.
        
        This is what gets stored in LiveKit and used throughout
        the system.
        """
        return f"{self.participant_type.value}:{self.tenant_id}:{self.unique_id}"
    
    @classmethod
    def parse(cls, identity: str) -> Optional["ParticipantIdentity"]:
        """
        Parse an identity string back into components.
        
        Args:
            identity: Identity string like "ai_agent:acme:agent-001"
            
        Returns:
            ParticipantIdentity if valid, None if format is wrong
        """
        parts = identity.split(":", 2)  # Split into max 3 parts
        if len(parts) != 3:
            return None
        
        try:
            participant_type = ParticipantType(parts[0])
            return cls(
                participant_type=participant_type,
                tenant_id=parts[1],
                unique_id=parts[2],
            )
        except ValueError:
            # Unknown participant type
            return None
    
    @classmethod
    def for_caller(cls, tenant_id: str, call_id: str) -> "ParticipantIdentity":
        """Create identity for a phone caller."""
        return cls(ParticipantType.SIP_CALLER, tenant_id, call_id)
    
    @classmethod
    def for_agent(cls, tenant_id: str, agent_id: str) -> "ParticipantIdentity":
        """Create identity for an AI agent."""
        return cls(ParticipantType.AI_AGENT, tenant_id, agent_id)
    
    @classmethod
    def for_supervisor(cls, tenant_id: str, user_id: str) -> "ParticipantIdentity":
        """Create identity for a supervisor."""
        return cls(ParticipantType.SUPERVISOR, tenant_id, user_id)
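
`parse` is the inverse of the `identity` property, which makes round-trip checks cheap. The sketch below reproduces just enough of the class (two participant types, no helper constructors) to show the round trip and the `None` return on a malformed string:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ParticipantType(Enum):
    SIP_CALLER = "sip_caller"
    AI_AGENT = "ai_agent"


@dataclass
class ParticipantIdentity:
    """Trimmed copy of the class above - demo only."""
    participant_type: ParticipantType
    tenant_id: str
    unique_id: str

    @property
    def identity(self) -> str:
        return f"{self.participant_type.value}:{self.tenant_id}:{self.unique_id}"

    @classmethod
    def parse(cls, identity: str) -> Optional["ParticipantIdentity"]:
        parts = identity.split(":", 2)  # split into at most 3 parts
        if len(parts) != 3:
            return None
        try:
            return cls(ParticipantType(parts[0]), parts[1], parts[2])
        except ValueError:
            return None  # unknown participant type


ident = ParticipantIdentity.parse("ai_agent:acme:agent-001")
print(ident.tenant_id, ident.unique_id)       # acme agent-001
print(ParticipantIdentity.parse("no-colons"))  # None
```
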

36.3 Permissions by Role

Permission Matrix

The matrix below restates the defaults from `ParticipantPermissions.for_type` above:

| Permission | Caller | AI Agent | Supervisor | Observer | Recorder | Human Agent |
|---|---|---|---|---|---|---|
| can_publish | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| can_subscribe | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| can_publish_data | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ |
| can_update_metadata | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ |
| hidden | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ |

What Each Permission Means

can_publish
  • Allows publishing audio (and video if supported)
  • If false, participant can only listen
  • Callers and agents need this to speak
can_subscribe
  • Allows receiving others’ audio
  • Almost always true - everyone needs to hear
  • Could be false for one-way broadcast
can_publish_data
  • Allows sending data channel messages
  • Agents use this to send transcription updates
  • Not needed for basic voice calls
can_update_metadata
  • Allows changing own metadata
  • Agents update status (processing, responding)
  • Not needed for callers
hidden
  • Other participants don’t know you’re there
  • Perfect for supervisors monitoring calls
  • Observers and recorders are always hidden

36.4 Participant Lifecycle

States a Participant Goes Through

┌─────────────────────────────────────────────────────────────────────────────┐
│                      Participant Lifecycle States                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌───────────┐    ┌──────────────┐    ┌──────────────┐    │
│  │ JOINING  │───▶│ CONNECTED │───▶│ RECONNECTING │───▶│ DISCONNECTED │    │
│  └──────────┘    └───────────┘    └──────────────┘    └──────────────┘    │
│       │                │                                      ▲            │
│       │                └──────────────────────────────────────┘            │
│       │                      (normal disconnect)                           │
│       │                                                                     │
│       └─────────────────────────────────────────────────────────┘          │
│                       (failed to connect)                                  │
│                                                                             │
│  JOINING:       Token issued, connecting to room                           │
│  CONNECTED:     Successfully in room, audio flowing                        │
│  RECONNECTING:  Temporarily lost connection, trying to recover             │
│  DISCONNECTED:  Left the room (normal or abnormal)                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Participant Manager Implementation

"""
Participant lifecycle management.
"""
from dataclasses import dataclass, field
from typing import Optional, Dict, List
from datetime import datetime
from enum import Enum
import asyncio
import logging

logger = logging.getLogger(__name__)


class ParticipantState(Enum):
    """Participant connection states."""
    JOINING = "joining"           # Token issued, connecting
    CONNECTED = "connected"       # Successfully in room
    RECONNECTING = "reconnecting" # Temporarily disconnected
    DISCONNECTED = "disconnected" # Left the room


@dataclass
class ParticipantInfo:
    """
    Information about a participant in a room.
    
    Tracks everything we need to know about someone
    currently (or recently) in a call.
    """
    identity: str
    name: str
    participant_type: ParticipantType
    sid: Optional[str] = None      # Server-assigned ID from LiveKit
    state: ParticipantState = ParticipantState.JOINING
    room_name: Optional[str] = None
    joined_at: Optional[datetime] = None
    
    # Track status
    audio_track_published: bool = False
    
    @property
    def is_active(self) -> bool:
        """Check if participant is currently active in room."""
        return self.state in (
            ParticipantState.CONNECTED,
            ParticipantState.RECONNECTING,
        )


class ParticipantManager:
    """
    Manages participant lifecycle within rooms.
    
    Responsibilities:
    - Track who's in each room
    - Handle join/leave events
    - Generate tokens for new participants
    - Remove participants when needed
    """
    
    def __init__(self, token_service: 'TokenService'):
        self.token_service = token_service
        
        # Track participants: room_name -> identity -> ParticipantInfo
        self._participants: Dict[str, Dict[str, ParticipantInfo]] = {}
        
        # Lock for thread-safe updates
        self._lock = asyncio.Lock()
    
    async def create_token_for_participant(
        self,
        room_name: str,
        identity: ParticipantIdentity,
        display_name: Optional[str] = None,
        ttl_seconds: int = 3600,
    ) -> str:
        """
        Create a token for a participant to join a room.
        
        This is the first step when someone needs to join a call.
        The token encodes their permissions and identity.
        
        Args:
            room_name: Room to join
            identity: Structured participant identity
            display_name: Human-readable name (optional)
            ttl_seconds: How long token is valid
            
        Returns:
            JWT access token string
        """
        # Get default permissions for this participant type
        permissions = ParticipantPermissions.for_type(identity.participant_type)
        
        # Generate the token
        token = await self.token_service.generate_token(
            room_name=room_name,
            participant_identity=identity.identity,
            participant_name=display_name or identity.unique_id,
            permissions=permissions,
            ttl_seconds=ttl_seconds,
        )
        
        # Track this participant as joining
        async with self._lock:
            if room_name not in self._participants:
                self._participants[room_name] = {}
            
            self._participants[room_name][identity.identity] = ParticipantInfo(
                identity=identity.identity,
                name=display_name or identity.unique_id,
                participant_type=identity.participant_type,
                room_name=room_name,
                state=ParticipantState.JOINING,
            )
        
        logger.info(
            f"Created token for {identity.identity} to join {room_name}"
        )
        
        return token
    
    async def handle_participant_joined(
        self,
        room_name: str,
        participant_identity: str,
        participant_sid: str,
    ):
        """
        Handle notification that a participant joined.
        
        Called when we receive a webhook from LiveKit saying
        someone connected to a room.
        """
        async with self._lock:
            if room_name not in self._participants:
                self._participants[room_name] = {}
            
            if participant_identity in self._participants[room_name]:
                # Update existing entry
                info = self._participants[room_name][participant_identity]
                info.sid = participant_sid
                info.state = ParticipantState.CONNECTED
                info.joined_at = datetime.utcnow()
            else:
                # Create new entry (in case we missed the token creation)
                parsed = ParticipantIdentity.parse(participant_identity)
                participant_type = parsed.participant_type if parsed else ParticipantType.SYSTEM
                
                self._participants[room_name][participant_identity] = ParticipantInfo(
                    identity=participant_identity,
                    name=participant_identity,
                    participant_type=participant_type,
                    sid=participant_sid,
                    state=ParticipantState.CONNECTED,
                    room_name=room_name,
                    joined_at=datetime.utcnow(),
                )
        
        logger.info(f"Participant joined: {participant_identity} in {room_name}")
    
    async def handle_participant_left(
        self,
        room_name: str,
        participant_identity: str,
    ):
        """
        Handle notification that a participant left.
        
        Called when we receive a webhook from LiveKit saying
        someone disconnected from a room.
        """
        async with self._lock:
            if (
                room_name in self._participants and
                participant_identity in self._participants[room_name]
            ):
                info = self._participants[room_name][participant_identity]
                info.state = ParticipantState.DISCONNECTED
        
        logger.info(f"Participant left: {participant_identity} from {room_name}")
    
    def get_room_participants(
        self,
        room_name: str,
        active_only: bool = True,
    ) -> List[ParticipantInfo]:
        """
        Get all participants in a room.
        
        Args:
            room_name: Room to query
            active_only: If True, only return connected participants
            
        Returns:
            List of ParticipantInfo objects
        """
        if room_name not in self._participants:
            return []
        
        participants = list(self._participants[room_name].values())
        
        if active_only:
            participants = [p for p in participants if p.is_active]
        
        return participants
    
    def get_participants_by_type(
        self,
        room_name: str,
        participant_type: ParticipantType,
    ) -> List[ParticipantInfo]:
        """Get all participants of a specific type in a room."""
        participants = self.get_room_participants(room_name)
        return [p for p in participants if p.participant_type == participant_type]

Section 37: Token Generation

37.1 JWT Structure

What is a JWT?

JWT (JSON Web Token) is a standard for securely transmitting information. LiveKit uses JWTs to authenticate participants and authorize their actions. A JWT has three parts:
  1. Header - Says it’s a JWT and which algorithm signed it
  2. Payload - Contains the actual data (claims)
  3. Signature - Proves the token wasn’t tampered with
┌─────────────────────────────────────────────────────────────────────────────┐
│                           JWT Token Structure                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                           JWT Token                                  │   │
│  │                                                                      │   │
│  │  Header (base64 encoded):                                           │   │
│  │  {                                                                  │   │
│  │    "alg": "HS256",     // Algorithm: HMAC SHA-256                   │   │
│  │    "typ": "JWT"        // Type: JSON Web Token                      │   │
│  │  }                                                                  │   │
│  │                                                                      │   │
│  │  Payload (base64 encoded):                                          │   │
│  │  {                                                                  │   │
│  │    "sub": "ai_agent:acme:agent-001",  // Subject (identity)        │   │
│  │    "iss": "APIxxxxxxxxx",              // Issuer (your API key)    │   │
│  │    "nbf": 1704067200,                  // Not before (Unix time)   │   │
│  │    "exp": 1704070800,                  // Expires (Unix time)      │   │
│  │    "name": "AI Assistant",             // Display name             │   │
│  │    "video": {                          // LiveKit permissions      │   │
│  │      "room": "call-acme-xxx",          // Which room               │   │
│  │      "roomJoin": true,                 // Can join                 │   │
│  │      "canPublish": true,               // Can publish audio        │   │
│  │      "canSubscribe": true,             // Can receive audio        │   │
│  │      "canPublishData": true,           // Can send data            │   │
│  │      "hidden": false                   // Visible to others        │   │
│  │    },                                                               │   │
│  │    "metadata": "{\"agent_id\":\"001\"}" // Custom JSON data        │   │
│  │  }                                                                  │   │
│  │                                                                      │   │
│  │  Signature:                                                         │   │
│  │  HMACSHA256(                                                        │   │
│  │    base64(header) + "." + base64(payload),                         │   │
│  │    api_secret                          // Signed with your secret  │   │
│  │  )                                                                  │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  Final token: header.payload.signature (all base64 encoded)                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
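To make the three-part structure concrete, here is a minimal stdlib-only sketch that signs a payload the same way (HS256 over `base64url(header).base64url(payload)`). In production the LiveKit SDK builds tokens for you; the claims and secret below are placeholders:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_jwt(payload: dict, api_secret: str) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(
        api_secret.encode(), signing_input.encode(), hashlib.sha256
    ).digest()
    return signing_input + "." + b64url(signature)


# Placeholder claims mirroring the diagram above
token = make_jwt(
    {"sub": "ai_agent:acme:agent-001", "iss": "APIxxxxxxxxx", "exp": 1704070800},
    "not-a-real-secret",
)
```

Anyone holding the secret can recompute the signature over the first two parts and compare; anyone without it can read the claims (they are only base64-encoded, not encrypted) but cannot forge or alter them.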

37.2 Claims & Grants

Standard JWT Claims

| Claim | Full Name | Purpose | Example |
|-------|-----------|---------|---------|
| `sub` | Subject | Who the token is for | `ai_agent:acme:001` |
| `iss` | Issuer | Who created the token | `APIxxxxxxxxx` |
| `nbf` | Not Before | When the token becomes valid | `1704067200` |
| `exp` | Expiration | When the token expires | `1704070800` |
| `name` | Name | Display name | `AI Assistant` |

LiveKit Video Grants

The video claim contains LiveKit-specific permissions:
| Grant | Type | Purpose |
|-------|------|---------|
| `room` | string | Which room this token is for |
| `roomJoin` | bool | Can join the room |
| `canPublish` | bool | Can publish tracks (audio/video) |
| `canSubscribe` | bool | Can receive others' tracks |
| `canPublishData` | bool | Can send data messages |
| `hidden` | bool | Invisible to other participants |
| `recorder` | bool | Special recording permissions |
| `roomCreate` | bool | Can create new rooms |
| `roomList` | bool | Can list rooms |
| `roomAdmin` | bool | Full admin access |

37.3 Token Service Implementation

"""
Token generation service for LiveKit room access.
"""
from livekit import api
from dataclasses import dataclass, field
from typing import Optional, Dict, Any
from datetime import datetime
from enum import Enum
import json
import logging

logger = logging.getLogger(__name__)


class TokenPurpose(Enum):
    """
    Purpose of the token.
    
    Affects default permissions - each purpose has sensible
    defaults that can be overridden.
    """
    CALLER = "caller"           # Phone caller
    AGENT = "agent"             # AI agent
    SUPERVISOR = "supervisor"   # Human supervisor
    OBSERVER = "observer"       # Silent observer
    RECORDING = "recording"     # Recording service
    ADMIN = "admin"             # Administrative access


@dataclass
class TokenRequest:
    """
    Request for a new access token.
    
    Contains all the information needed to generate a token.
    """
    room_name: str                           # Room to join
    participant_identity: str                # Who is this token for
    participant_name: str = ""               # Display name
    purpose: TokenPurpose = TokenPurpose.CALLER
    metadata: Dict[str, Any] = field(default_factory=dict)
    ttl_seconds: int = 3600                  # 1 hour default
    
    def __post_init__(self):
        if not self.participant_name:
            self.participant_name = self.participant_identity


class TokenService:
    """
    Service for generating LiveKit access tokens.
    
    This is the central point for all token generation.
    It ensures consistent permissions and provides audit logging.
    
    Usage:
        service = TokenService(api_key, api_secret)
        token = await service.generate_token(request)
    """
    
    def __init__(
        self,
        api_key: str,
        api_secret: str,
        default_ttl: int = 3600,      # 1 hour
        max_ttl: int = 86400,         # 24 hours
    ):
        """
        Initialize the token service.
        
        Args:
            api_key: Your LiveKit API key
            api_secret: Your LiveKit API secret
            default_ttl: Default token lifetime in seconds
            max_ttl: Maximum allowed token lifetime
        """
        self.api_key = api_key
        self.api_secret = api_secret
        self.default_ttl = default_ttl
        self.max_ttl = max_ttl
        
        # Track tokens for auditing
        self._token_count = 0
    
    async def generate_token(self, request: TokenRequest) -> str:
        """
        Generate an access token for room access.
        
        Args:
            request: TokenRequest with participant details
            
        Returns:
            JWT token string that can be used to join the room
            
        Example:
            >>> request = TokenRequest(
            ...     room_name="call-acme-xxx",
            ...     participant_identity="ai_agent:acme:001",
            ...     purpose=TokenPurpose.AGENT
            ... )
            >>> token = await service.generate_token(request)
        """
        # Validate and cap TTL
        ttl = min(request.ttl_seconds, self.max_ttl)
        if ttl <= 0:
            ttl = self.default_ttl
        
        # Build the token using the SDK's fluent AccessToken API
        from datetime import timedelta  # local import keeps this snippet self-contained

        token = (
            api.AccessToken(api_key=self.api_key, api_secret=self.api_secret)
            .with_identity(request.participant_identity)
            .with_name(request.participant_name)
            .with_ttl(timedelta(seconds=ttl))
        )
        
        # Attach metadata if provided (must be a JSON string)
        if request.metadata:
            token = token.with_metadata(json.dumps(request.metadata))
        
        # Configure permissions based on purpose
        video_grants = self._get_grants_for_purpose(
            request.purpose,
            request.room_name
        )
        token = token.with_grants(video_grants)
        
        # Generate the JWT
        jwt_token = token.to_jwt()
        
        # Track for auditing
        self._token_count += 1
        
        logger.info(
            f"Generated token for {request.participant_identity}",
            extra={
                "room_name": request.room_name,
                "purpose": request.purpose.value,
                "ttl": ttl,
            }
        )
        
        return jwt_token
    
    def _get_grants_for_purpose(
        self,
        purpose: TokenPurpose,
        room_name: str
    ) -> api.VideoGrants:
        """
        Get appropriate permissions for a token purpose.
        
        Each purpose has sensible defaults that balance
        functionality with security.
        """
        grants = api.VideoGrants(
            room_join=True,
            room=room_name,
        )
        
        if purpose == TokenPurpose.CALLER:
            # Callers can talk and listen
            grants.can_publish = True
            grants.can_subscribe = True
            grants.can_publish_data = False
            grants.hidden = False
            
        elif purpose == TokenPurpose.AGENT:
            # Agents need full access
            grants.can_publish = True
            grants.can_subscribe = True
            grants.can_publish_data = True
            grants.hidden = False
            
        elif purpose == TokenPurpose.SUPERVISOR:
            # Supervisors are hidden but can speak if needed
            grants.can_publish = True
            grants.can_subscribe = True
            grants.can_publish_data = True
            grants.hidden = True
            
        elif purpose == TokenPurpose.OBSERVER:
            # Observers can only listen
            grants.can_publish = False
            grants.can_subscribe = True
            grants.hidden = True
            
        elif purpose == TokenPurpose.RECORDING:
            # Recording service receives all, publishes nothing
            grants.can_publish = False
            grants.can_subscribe = True
            grants.hidden = True
            grants.recorder = True
            
        elif purpose == TokenPurpose.ADMIN:
            # Admin has full access
            grants.can_publish = True
            grants.can_subscribe = True
            grants.can_publish_data = True
            grants.room_admin = True
        
        return grants
    
    async def generate_agent_token(
        self,
        room_name: str,
        agent_id: str,
        agent_name: str = "AI Assistant",
    ) -> str:
        """
        Convenience method for generating agent tokens.
        
        AI agents typically need:
        - 2 hour token lifetime (longer calls)
        - Full publish/subscribe access
        - Ability to send data messages
        """
        request = TokenRequest(
            room_name=room_name,
            participant_identity=f"ai_agent:{agent_id}",
            participant_name=agent_name,
            purpose=TokenPurpose.AGENT,
            metadata={"agent_id": agent_id},
            ttl_seconds=7200,  # 2 hours
        )
        return await self.generate_token(request)
    
    async def generate_supervisor_token(
        self,
        room_name: str,
        supervisor_id: str,
        supervisor_name: str = "Supervisor",
        visible: bool = False,
    ) -> str:
        """
        Convenience method for generating supervisor tokens.
        
        Supervisors can choose to be visible or hidden.
        """
        request = TokenRequest(
            room_name=room_name,
            participant_identity=f"supervisor:{supervisor_id}",
            participant_name=supervisor_name,
            purpose=TokenPurpose.SUPERVISOR,
            metadata={"supervisor_id": supervisor_id, "visible": visible},
        )
        return await self.generate_token(request)
    
    async def generate_recording_token(
        self,
        room_name: str,
        recording_id: str,
    ) -> str:
        """
        Convenience method for generating recording tokens.
        
        Recording tokens have very long lifetimes since recordings
        can run for hours.
        """
        request = TokenRequest(
            room_name=room_name,
            participant_identity=f"recorder:{recording_id}",
            participant_name="Recording Service",
            purpose=TokenPurpose.RECORDING,
            metadata={"recording_id": recording_id},
            ttl_seconds=86400,  # 24 hours
        )
        return await self.generate_token(request)

37.4 Token Refresh Strategy

Why Token Refresh Matters

Tokens have limited lifetimes for security. But what happens if a call lasts longer than the token’s TTL? We need to refresh tokens before they expire.
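The arithmetic behind the refresh schedule is simple: refresh a safety margin before expiry, clamped at zero for tokens already inside the margin. A small sketch (the helper name is ours, not a LiveKit API):

```python
def refresh_delay(ttl_seconds: int, margin_seconds: int = 300) -> int:
    """Seconds to wait before minting a replacement token.

    Refresh `margin_seconds` before expiry; if the token is already
    inside the margin (or expired), refresh immediately.
    """
    return max(0, ttl_seconds - margin_seconds)


# A 1-hour token with a 5-minute margin refreshes after 55 minutes
assert refresh_delay(3600) == 3300
# A token with only 2 minutes left refreshes immediately
assert refresh_delay(120) == 0
```

The manager below applies exactly this calculation, then re-schedules itself after each successful refresh so long calls keep rolling forward.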

Token Refresh Manager

"""
Token refresh management for long-running connections.
"""
from typing import Optional, Callable, Awaitable
import asyncio
import logging
from datetime import datetime, timedelta

logger = logging.getLogger(__name__)


class TokenRefreshManager:
    """
    Manages automatic token refresh for long-running connections.
    
    Instead of waiting for tokens to expire (which would disconnect
    the participant), this proactively refreshes tokens before
    expiration.
    
    Usage:
        manager = TokenRefreshManager(token_service)
        manager.schedule_refresh(
            room_name="call-acme-xxx",
            identity="ai_agent:acme:001",
            current_ttl=3600,
            on_refresh=lambda new_token: use_new_token(new_token)
        )
    """
    
    def __init__(
        self,
        token_service: TokenService,
        refresh_before_expiry_seconds: int = 300,  # Refresh 5 min before expiry
    ):
        """
        Initialize the refresh manager.
        
        Args:
            token_service: Service for generating new tokens
            refresh_before_expiry_seconds: How long before expiry to refresh
        """
        self.token_service = token_service
        self.refresh_before_expiry = refresh_before_expiry_seconds
        
        # Track scheduled refreshes
        self._refresh_tasks: dict[str, asyncio.Task] = {}
    
    def schedule_refresh(
        self,
        room_name: str,
        identity: str,
        current_ttl: int,
        purpose: TokenPurpose,
        on_refresh: Callable[[str], Awaitable[None]],
    ):
        """
        Schedule a token refresh.
        
        Args:
            room_name: Room the token is for
            identity: Participant identity
            current_ttl: Current token's lifetime in seconds
            purpose: Token purpose (for regeneration)
            on_refresh: Callback to receive new token
        """
        # Calculate when to refresh
        refresh_in = current_ttl - self.refresh_before_expiry
        if refresh_in <= 0:
            # Token is about to expire or already expired
            refresh_in = 0
        
        # Create a key for tracking
        key = f"{room_name}:{identity}"
        
        # Cancel existing refresh task if any
        if key in self._refresh_tasks:
            self._refresh_tasks[key].cancel()
        
        # Schedule new refresh
        task = asyncio.create_task(
            self._refresh_after_delay(
                room_name, identity, purpose, refresh_in, on_refresh
            )
        )
        self._refresh_tasks[key] = task
        
        logger.info(
            f"Scheduled token refresh for {identity} in {refresh_in} seconds"
        )
    
    async def _refresh_after_delay(
        self,
        room_name: str,
        identity: str,
        purpose: TokenPurpose,
        delay_seconds: int,
        on_refresh: Callable[[str], Awaitable[None]],
    ):
        """Wait and then refresh the token."""
        try:
            # Wait until refresh time
            await asyncio.sleep(delay_seconds)
            
            # Generate new token
            request = TokenRequest(
                room_name=room_name,
                participant_identity=identity,
                purpose=purpose,
            )
            new_token = await self.token_service.generate_token(request)
            
            # Notify callback
            await on_refresh(new_token)
            
            logger.info(f"Token refreshed for {identity}")
            
            # Schedule next refresh
            self.schedule_refresh(
                room_name, identity,
                request.ttl_seconds, purpose,
                on_refresh
            )
            
        except asyncio.CancelledError:
            # Task was cancelled (participant left or explicit cancel)
            pass
        except Exception as e:
            logger.error(f"Token refresh failed for {identity}: {e}")
    
    def cancel_refresh(self, room_name: str, identity: str):
        """Cancel a scheduled refresh (e.g., when participant leaves)."""
        key = f"{room_name}:{identity}"
        if key in self._refresh_tasks:
            self._refresh_tasks[key].cancel()
            del self._refresh_tasks[key]

Section 38: Audio Track Handling

38.1 Track Publication

What is Track Publication?

When a participant wants to share audio (or video), they “publish” a track to the room. Other participants can then “subscribe” to that track to receive the audio.
┌─────────────────────────────────────────────────────────────────────────────┐
│                        Audio Track Publication Flow                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Participant A                    LiveKit                    Participant B  │
│  (Publisher)                      Room                       (Subscriber)   │
│                                                                             │
│  ┌──────────────┐              ┌──────────┐              ┌──────────────┐  │
│  │ Audio Source │              │          │              │    Audio     │  │
│  │ (Microphone) │              │          │              │    Sink      │  │
│  └──────┬───────┘              │          │              └──────▲───────┘  │
│         │                      │          │                     │          │
│         ▼                      │          │                     │          │
│  ┌──────────────┐              │          │              ┌──────────────┐  │
│  │ Encode Opus  │──────────────│ Route    │──────────────│ Decode Opus  │  │
│  │ + Publish    │   PUBLISH    │ Audio    │   SUBSCRIBE  │ + Playback   │  │
│  └──────────────┘              │          │              └──────────────┘  │
│                                │          │                                │
│                                └──────────┘                                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Audio Track Options

"""
Audio track configuration and management.
"""
from livekit import rtc
from dataclasses import dataclass
from enum import Enum


class AudioQuality(Enum):
    """
    Audio quality presets.
    
    Different use cases need different quality/bandwidth tradeoffs.
    """
    VOICE = "voice"         # Optimized for speech (default for calls)
    MUSIC = "music"         # Higher quality for hold music
    TELEPHONY = "telephony" # Minimal bandwidth (8kHz equivalent)


@dataclass
class AudioTrackOptions:
    """
    Configuration options for audio tracks.
    
    These settings control how audio is encoded and transmitted.
    LiveKit uses Opus codec which is excellent for voice.
    """
    
    # Sample rate (Hz) - how many audio samples per second
    # 48000 is CD quality, 16000 is "wideband" voice
    sample_rate: int = 48000
    
    # Channels - 1 for mono (voice), 2 for stereo (music)
    channels: int = 1
    
    # Bitrate (bps) - how much data per second
    # 32kbps is good quality for voice, 96kbps for music
    bitrate: int = 32000
    
    # DTX (Discontinuous Transmission) - sends less data during silence
    # Saves bandwidth but adds tiny latency when speech resumes
    dtx: bool = True
    
    # FEC (Forward Error Correction) - adds redundancy for packet loss
    # Small bandwidth increase but much better quality on lossy networks
    fec: bool = True
    
    # RED (Redundant Encoding) - includes previous frame for recovery
    # Best protection against packet loss
    red: bool = True
    
    # Frame size in milliseconds - how much audio per packet
    # 20ms is standard, smaller = lower latency, larger = better compression
    frame_size_ms: int = 20
    
    @classmethod
    def for_voice(cls) -> "AudioTrackOptions":
        """
        Get voice-optimized settings.
        
        This is the default for phone calls - good quality
        at low bandwidth with protection against packet loss.
        """
        return cls(
            sample_rate=48000,
            channels=1,
            bitrate=32000,
            dtx=True,
            fec=True,
            red=True,
        )
    
    @classmethod
    def for_music(cls) -> "AudioTrackOptions":
        """
        Get music-quality settings.
        
        Use for hold music or when audio quality is paramount.
        Higher bandwidth but better audio.
        """
        return cls(
            sample_rate=48000,
            channels=2,      # Stereo
            bitrate=96000,   # Higher bitrate
            dtx=False,       # Don't reduce during "quiet" parts
            fec=True,
            red=True,
        )
    
    @classmethod
    def for_low_bandwidth(cls) -> "AudioTrackOptions":
        """
        Get minimal bandwidth settings.
        
        Use when bandwidth is constrained - still intelligible
        but lower quality.
        """
        return cls(
            sample_rate=16000,  # Wideband (telephone is 8kHz)
            channels=1,
            bitrate=16000,
            dtx=True,
            fec=False,  # Save bandwidth
            red=False,
        )

Audio Track Publisher

"""
Audio track publishing for AI agents.
"""
from livekit import rtc
from typing import Optional, AsyncIterator
import numpy as np
import asyncio
import logging

logger = logging.getLogger(__name__)


class AudioTrackPublisher:
    """
    Publishes audio tracks to LiveKit rooms.
    
    AI agents use this to send TTS (text-to-speech) audio
    back to callers.
    
    Usage:
        publisher = AudioTrackPublisher(room)
        await publisher.start()
        await publisher.write_frames(tts_audio_bytes)
        await publisher.stop()
    """
    
    def __init__(
        self,
        room: rtc.Room,
        options: Optional[AudioTrackOptions] = None,
    ):
        """
        Initialize the publisher.
        
        Args:
            room: LiveKit room to publish to
            options: Audio configuration (defaults to voice-optimized)
        """
        self.room = room
        self.options = options or AudioTrackOptions.for_voice()
        
        self._source: Optional[rtc.AudioSource] = None
        self._track: Optional[rtc.LocalAudioTrack] = None
        self._published = False
    
    async def start(self) -> rtc.LocalAudioTrack:
        """
        Start publishing audio.
        
        Creates an audio source and track, then publishes it
        to the room. After calling this, you can write audio
        frames with write_frames().
        
        Returns:
            The published LocalAudioTrack
        """
        if self._published:
            return self._track
        
        # Create audio source
        # This is what we write audio data into
        self._source = rtc.AudioSource(
            sample_rate=self.options.sample_rate,
            num_channels=self.options.channels,
        )
        
        # Create local track from the source
        self._track = rtc.LocalAudioTrack.create_audio_track(
            "agent_audio",  # Track name
            self._source,
        )
        
        # Publish to the room
        options = rtc.TrackPublishOptions()
        options.dtx = self.options.dtx
        options.red = self.options.red
        
        await self.room.local_participant.publish_track(
            self._track,
            options,
        )
        
        self._published = True
        
        logger.info(
            "Audio track published",
            extra={
                "sample_rate": self.options.sample_rate,
                "channels": self.options.channels,
            }
        )
        
        return self._track
    
    async def stop(self):
        """Stop publishing and clean up."""
        if not self._published:
            return
        
        if self._track:
            await self.room.local_participant.unpublish_track(
                self._track.sid
            )
            self._track = None
        
        self._source = None
        self._published = False
        
        logger.info("Audio track unpublished")
    
    async def write_frames(self, audio_data: bytes, sample_rate: Optional[int] = None):
        """
        Write audio frames to be transmitted.
        
        Args:
            audio_data: Raw PCM audio (16-bit signed integers)
            sample_rate: Sample rate of the data (optional, uses configured rate)
            
        Example:
            # Get TTS output as bytes
            tts_audio = await tts_service.synthesize("Hello!")
            
            # Send to caller
            await publisher.write_frames(tts_audio)
        """
        if not self._published or not self._source:
            logger.warning("Attempted to write frames before publishing")
            return
        
        # Convert bytes to numpy array
        audio_array = np.frombuffer(audio_data, dtype=np.int16)
        
        # Create audio frame
        frame = rtc.AudioFrame(
            data=audio_array.tobytes(),
            sample_rate=sample_rate or self.options.sample_rate,
            num_channels=self.options.channels,
            samples_per_channel=len(audio_array) // self.options.channels,
        )
        
        # Send to source
        await self._source.capture_frame(frame)
    
    async def stream_audio(
        self,
        audio_iterator: AsyncIterator[bytes],
        sample_rate: int = None,
    ):
        """
        Stream audio from an async iterator.
        
        Useful for streaming TTS output directly without
        buffering the entire response.
        
        Args:
            audio_iterator: Async iterator yielding audio chunks
            sample_rate: Sample rate of the audio
        """
        async for chunk in audio_iterator:
            await self.write_frames(chunk, sample_rate)
    
    @property
    def is_published(self) -> bool:
        """Check if currently publishing."""
        return self._published

38.2 Track Subscription

Subscribing to Audio

When you join a room, you can subscribe to audio tracks published by other participants. This is how the AI agent hears the caller.
"""
Audio track subscription for processing caller audio.
"""
from livekit import rtc
from typing import Optional, Callable, Awaitable, List
import asyncio
import logging

logger = logging.getLogger(__name__)

# Type alias for audio frame callbacks
AudioFrameCallback = Callable[[rtc.AudioFrame, str], Awaitable[None]]


class AudioTrackSubscriber:
    """
    Subscribes to audio tracks from remote participants.
    
    AI agents use this to receive caller audio for speech-to-text
    processing.
    
    Usage:
        subscriber = AudioTrackSubscriber(room)
        
        async def process_audio(frame, participant_identity):
            # Send to STT service
            text = await stt.transcribe(frame)
        
        subscriber.on_audio_frame(process_audio)
    """
    
    def __init__(self, room: rtc.Room):
        """
        Initialize the subscriber.
        
        Args:
            room: LiveKit room to subscribe in
        """
        self.room = room
        self._callbacks: List[AudioFrameCallback] = []
        self._subscribed_tracks: dict[str, rtc.RemoteAudioTrack] = {}
        self._frame_tasks: dict[str, asyncio.Task] = {}
        
        # Register for track events. The SDK's event emitter invokes handlers
        # synchronously, so the async handlers are scheduled as tasks.
        self.room.on(
            "track_subscribed",
            lambda *args: asyncio.create_task(self._handle_track_subscribed(*args)),
        )
        self.room.on(
            "track_unsubscribed",
            lambda *args: asyncio.create_task(self._handle_track_unsubscribed(*args)),
        )
    
    def on_audio_frame(self, callback: AudioFrameCallback):
        """
        Register a callback for audio frames.
        
        Your callback will be called for each audio frame received
        from any subscribed participant.
        
        Args:
            callback: Async function(frame, participant_identity)
        """
        self._callbacks.append(callback)
    
    async def _handle_track_subscribed(
        self,
        track: rtc.Track,
        publication: rtc.RemoteTrackPublication,
        participant: rtc.RemoteParticipant,
    ):
        """Handle track subscription event."""
        # Only care about audio tracks
        if track.kind != rtc.TrackKind.KIND_AUDIO:
            return
        
        audio_track = track  # type: rtc.RemoteAudioTrack
        self._subscribed_tracks[participant.identity] = audio_track
        
        # Start processing frames from this track
        task = asyncio.create_task(
            self._process_audio_frames(audio_track, participant.identity)
        )
        self._frame_tasks[participant.identity] = task
        
        logger.info(
            f"Subscribed to audio from {participant.identity}"
        )
    
    async def _handle_track_unsubscribed(
        self,
        track: rtc.Track,
        publication: rtc.RemoteTrackPublication,
        participant: rtc.RemoteParticipant,
    ):
        """Handle track unsubscription event."""
        if track.kind != rtc.TrackKind.KIND_AUDIO:
            return
        
        # Stop processing frames
        self._subscribed_tracks.pop(participant.identity, None)
        
        task = self._frame_tasks.pop(participant.identity, None)
        if task:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
        
        logger.info(f"Unsubscribed from audio of {participant.identity}")
    
    async def _process_audio_frames(
        self,
        track: rtc.RemoteAudioTrack,
        participant_identity: str,
    ):
        """
        Process audio frames from a track.
        
        This runs continuously while subscribed, forwarding
        each frame to registered callbacks.
        """
        audio_stream = rtc.AudioStream(track)
        
        try:
            async for frame_event in audio_stream:
                frame = frame_event.frame
                
                # Call all registered callbacks
                for callback in self._callbacks:
                    try:
                        await callback(frame, participant_identity)
                    except Exception as e:
                        logger.error(f"Error in audio frame callback: {e}")
                        
        except asyncio.CancelledError:
            pass
        except Exception as e:
            logger.error(f"Error processing audio frames: {e}")
        finally:
            await audio_stream.aclose()
    
    async def close(self):
        """Clean up all subscriptions."""
        # Cancel all frame processing tasks
        for task in self._frame_tasks.values():
            task.cancel()
        
        for task in self._frame_tasks.values():
            try:
                await task
            except asyncio.CancelledError:
                pass
        
        self._frame_tasks.clear()
        self._subscribed_tracks.clear()
        self._callbacks.clear()

38.3 Track Quality Settings

Quality vs Bandwidth Tradeoff

| Setting | Voice Call | Music/Ads | Low Bandwidth |
|---|---|---|---|
| Sample Rate | 48000 Hz | 48000 Hz | 16000 Hz |
| Channels | 1 (mono) | 2 (stereo) | 1 (mono) |
| Bitrate | 32 kbps | 96 kbps | 16 kbps |
| DTX | On | Off | On |
| FEC | On | On | Off |
| Latency | ~50ms | ~50ms | ~30ms |
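
The three columns above can be captured as preset option bundles. A minimal sketch, assuming a hypothetical `AudioPublishOptions` dataclass with the fields the publisher code reads (`sample_rate`, `channels`, `dtx`, `red`); `bitrate` is included for completeness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioPublishOptions:
    """Hypothetical preset bundle mirroring the quality table above."""
    sample_rate: int = 48000   # Hz
    channels: int = 1          # 1 = mono, 2 = stereo
    bitrate: int = 32_000      # bits per second
    dtx: bool = True           # Discontinuous transmission: skip packets during silence
    red: bool = True           # Redundant audio data (FEC) to recover from packet loss


VOICE_CALL = AudioPublishOptions()
MUSIC_ADS = AudioPublishOptions(channels=2, bitrate=96_000, dtx=False)
LOW_BANDWIDTH = AudioPublishOptions(sample_rate=16_000, bitrate=16_000, red=False)
```

Frozen dataclasses keep the presets immutable, so a handler cannot accidentally mutate a shared configuration.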

Adaptive Quality

LiveKit automatically adjusts quality based on network conditions:
┌─────────────────────────────────────────────────────────────────────────────┐
│                      Adaptive Quality Control                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Network Good ──────▶ Full Quality (48kHz, 32kbps, FEC on)                 │
│                                                                             │
│  Network Moderate ──▶ Reduced Quality (lower bitrate, FEC on)              │
│                                                                             │
│  Network Poor ──────▶ Minimal Quality (DTX aggressive, reduced FEC)        │
│                                                                             │
│  Network Very Poor ─▶ May pause/resume as needed                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

38.4 Mute/Unmute

Muting Tracks

Muting prevents audio from being transmitted without disconnecting:
"""
Track muting utilities.
"""
from livekit import rtc
from livekit.api import RoomServiceClient
import logging

logger = logging.getLogger(__name__)


class MuteController:
    """
    Controls muting/unmuting of audio tracks.
    
    There are two ways to mute:
    1. Local mute - stops sending audio from the source
    2. Server mute - server stops forwarding audio
    
    Server mute is used when you need to mute another participant
    (like muting a caller from the agent side).
    """
    
    def __init__(self, room_service: RoomServiceClient):
        self.room_service = room_service
    
    async def mute_participant(
        self,
        room_name: str,
        participant_identity: str,
        muted: bool = True,
    ) -> bool:
        """
        Mute or unmute a participant's audio track.
        
        Uses server-side muting which works even if the
        participant's client doesn't cooperate.
        
        Args:
            room_name: Room containing the participant
            participant_identity: Identity of participant to mute
            muted: True to mute, False to unmute
            
        Returns:
            True if successful
        """
        try:
            # Get participant info to find their track
            participants = await self.room_service.list_participants(room_name)
            
            for p in participants:
                if p.identity == participant_identity:
                    # Find audio track
                    for track in p.tracks:
                        if track.type == "audio":
                            await self.room_service.mute_published_track(
                                room=room_name,
                                identity=participant_identity,
                                track_sid=track.sid,
                                muted=muted,
                            )
                            
                            logger.info(
                                f"{'Muted' if muted else 'Unmuted'} "
                                f"{participant_identity} in {room_name}"
                            )
                            return True
            
            logger.warning(f"No audio track found for {participant_identity}")
            return False
            
        except Exception as e:
            logger.error(f"Failed to mute participant: {e}")
            return False
    
    @staticmethod
    async def local_mute(track: rtc.LocalAudioTrack, muted: bool = True):
        """
        Locally mute/unmute your own track.
        
        This is simpler but only works for your own tracks.
        
        Args:
            track: Your local audio track
            muted: True to mute, False to unmute
        """
        await track.set_muted(muted)

Section 39: LiveKit Webhooks

39.1 Room Started

When Room Started Fires

The room_started webhook fires when a new LiveKit room is created and ready for participants.
"""
Room started webhook handler.
"""

async def handle_room_started(event: WebhookEvent):
    """
    Handle room started event.
    
    This is our opportunity to:
    - Start billing timers
    - Log call start for analytics
    - Trigger agent dispatch
    - Update dashboards
    
    Args:
        event: Parsed webhook event
    """
    room_name = event.room_name
    room_sid = event.room_sid
    
    # Parse room name to get tenant and call info
    parsed = RoomNaming.parse(room_name)
    if not parsed:
        logger.warning(f"Could not parse room name: {room_name}")
        return
    
    logger.info(
        f"Room started: {room_name}",
        extra={
            "room_sid": room_sid,
            "tenant_id": parsed.tenant_id,
            "room_type": parsed.room_type.value,
        }
    )
    
    # Start billing timer
    await billing_service.start_call(
        tenant_id=parsed.tenant_id,
        call_id=parsed.call_id,
        room_name=room_name,
    )
    
    # Log for analytics
    await analytics_service.log_event(
        event_type="call_started",
        tenant_id=parsed.tenant_id,
        call_id=parsed.call_id,
        data={
            "room_name": room_name,
            "room_type": parsed.room_type.value,
        }
    )

39.2 Room Finished

When Room Finished Fires

The room_finished webhook fires when a room closes, either because all participants have left and the empty timeout has expired, or because the room was explicitly deleted.
"""
Room finished webhook handler.
"""

async def handle_room_finished(event: WebhookEvent):
    """
    Handle room finished event.
    
    This is when we:
    - Stop billing timers
    - Calculate final call duration
    - Store call summary
    - Clean up resources
    - Trigger post-call processing
    
    Args:
        event: Parsed webhook event
    """
    room_name = event.room_name
    
    parsed = RoomNaming.parse(room_name)
    if not parsed:
        return
    
    logger.info(
        f"Room finished: {room_name}",
        extra={
            "tenant_id": parsed.tenant_id,
            "call_id": parsed.call_id,
        }
    )
    
    # Stop billing and get duration
    call_duration = await billing_service.end_call(
        tenant_id=parsed.tenant_id,
        call_id=parsed.call_id,
    )
    
    # Log final analytics
    await analytics_service.log_event(
        event_type="call_ended",
        tenant_id=parsed.tenant_id,
        call_id=parsed.call_id,
        data={
            "duration_seconds": call_duration,
            "room_name": room_name,
        }
    )
    
    # Trigger post-call processing (summarization, etc.)
    await post_call_service.process_call(
        tenant_id=parsed.tenant_id,
        call_id=parsed.call_id,
    )

39.3 Participant Joined

Webhook Payload Structure

{
  "event": "participant_joined",
  "room": {
    "name": "call-acme-550e8400...",
    "sid": "RM_xxxxx"
  },
  "participant": {
    "identity": "ai_agent:acme:agent-001",
    "sid": "PA_xxxxx",
    "name": "AI Assistant",
    "metadata": "{\"agent_id\":\"agent-001\"}"
  },
  "createdAt": 1704067200
}
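
The fields the handler below relies on map out of this payload as follows. A minimal sketch using a plain dict (the actual `WebhookEvent` parsing lives in the webhook receiver; the concrete identity value is hypothetical):

```python
import json

# The participant_joined payload shown above, as a Python dict.
payload = {
    "event": "participant_joined",
    "room": {"name": "call-acme-550e8400", "sid": "RM_xxxxx"},
    "participant": {
        "identity": "ai_agent:acme:agent-001",
        "sid": "PA_xxxxx",
        "name": "AI Assistant",
        "metadata": '{"agent_id":"agent-001"}',
    },
    "createdAt": 1704067200,
}

room_name = payload["room"]["name"]
identity = payload["participant"]["identity"]

# metadata is a JSON string nested inside the JSON payload,
# so it needs a second parse.
metadata = json.loads(payload["participant"]["metadata"])

# The identity encodes type:tenant:id, e.g. "ai_agent:acme:agent-001".
participant_type, tenant_id, participant_id = identity.split(":")
```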

Handler Implementation

"""
Participant joined webhook handler.
"""

async def handle_participant_joined(event: WebhookEvent):
    """
    Handle participant joined event.
    
    Triggered when any participant (caller, agent, supervisor)
    successfully joins a room.
    
    Args:
        event: Parsed webhook event
    """
    room_name = event.room_name
    identity = event.participant_identity
    
    # Parse to understand who joined
    parsed_identity = ParticipantIdentity.parse(identity)
    if not parsed_identity:
        logger.warning(f"Could not parse identity: {identity}")
        return
    
    logger.info(
        f"Participant joined: {identity} in {room_name}",
        extra={
            "participant_type": parsed_identity.participant_type.value,
        }
    )
    
    # Update participant tracking
    await participant_manager.handle_participant_joined(
        room_name=room_name,
        participant_identity=identity,
        participant_sid=event.participant_sid,
        participant_name=event.raw_data.get("participant", {}).get("name", ""),
    )
    
    # Type-specific handling
    if parsed_identity.participant_type == ParticipantType.SIP_CALLER:
        # Caller joined - make sure agent is dispatched
        await ensure_agent_dispatched(room_name, parsed_identity.tenant_id)
        
    elif parsed_identity.participant_type == ParticipantType.AI_AGENT:
        # Agent joined - call is now ready
        await update_call_status(room_name, "in_progress")

39.4 Participant Left

"""
Participant left webhook handler.
"""

async def handle_participant_left(event: WebhookEvent):
    """
    Handle participant left event.
    
    Triggered when a participant disconnects from a room.
    
    Args:
        event: Parsed webhook event
    """
    room_name = event.room_name
    identity = event.participant_identity
    
    parsed_identity = ParticipantIdentity.parse(identity)
    
    logger.info(
        f"Participant left: {identity} from {room_name}",
        extra={
            "participant_type": parsed_identity.participant_type.value if parsed_identity else "unknown",
        }
    )
    
    # Update tracking
    await participant_manager.handle_participant_left(
        room_name=room_name,
        participant_identity=identity,
    )
    
    # Check if call should end
    if parsed_identity and parsed_identity.participant_type == ParticipantType.SIP_CALLER:
        # Caller left - call is over
        await end_call(room_name)

39.5 Track Published/Unpublished

Track Events

"""
Track webhook handlers.
"""

async def handle_track_published(event: WebhookEvent):
    """
    Handle track published event.
    
    Fired when a participant starts publishing audio/video.
    This confirms media is flowing.
    
    Args:
        event: Parsed webhook event
    """
    room_name = event.room_name
    identity = event.participant_identity
    track_sid = event.track_sid
    
    # Get track type from raw data
    track_info = event.raw_data.get("track", {})
    track_type = track_info.get("type", "unknown")  # "audio" or "video"
    
    logger.info(
        f"Track published: {track_type} by {identity}",
        extra={
            "room_name": room_name,
            "track_sid": track_sid,
        }
    )
    
    # Update participant tracking
    await participant_manager.handle_track_published(
        room_name=room_name,
        participant_identity=identity,
        track_sid=track_sid,
        track_type=track_type,
    )


async def handle_track_unpublished(event: WebhookEvent):
    """
    Handle track unpublished event.
    
    Fired when a participant stops publishing.
    Could indicate mute, disconnect, or intentional stop.
    """
    room_name = event.room_name
    identity = event.participant_identity
    track_sid = event.track_sid
    
    logger.info(
        f"Track unpublished: {track_sid} by {identity}",
        extra={"room_name": room_name}
    )
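
The handlers in this section are typically wired together through a single dispatch table keyed on the event name. A minimal sketch with stub handlers standing in for the real ones (the webhook receiver verifies the signature and parses the body before dispatching):

```python
import asyncio

# Stubs standing in for the real handlers defined in Sections 39.1-39.5.
async def handle_room_started(payload: dict) -> str:
    return f"started {payload['room']['name']}"

async def handle_room_finished(payload: dict) -> str:
    return f"finished {payload['room']['name']}"

# Event name -> handler coroutine. The full table would also map
# participant_joined/left and track_published/unpublished.
WEBHOOK_HANDLERS = {
    "room_started": handle_room_started,
    "room_finished": handle_room_finished,
}

async def dispatch_webhook(payload: dict):
    """Route a verified, parsed webhook payload to its handler."""
    handler = WEBHOOK_HANDLERS.get(payload.get("event"))
    if handler is None:
        return None  # Unknown event types are ignored, not treated as errors
    return await handler(payload)

result = asyncio.run(
    dispatch_webhook({"event": "room_started", "room": {"name": "call-acme-1"}})
)
```

Ignoring unknown event types (rather than erroring) keeps the receiver forward-compatible when LiveKit adds new webhook events.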

Section 40: Recording with Egress

40.1 Egress Types

What is Egress?

Egress is LiveKit’s term for extracting media from a room for recording, streaming, or other processing. There are several types:
| Type | Purpose | Output |
|---|---|---|
| Room Composite | Record entire room as single file | MP4/WebM video or audio-only |
| Track Composite | Record specific tracks | Audio/video file |
| Participant Egress | Record specific participant | Audio/video file |
| Web Egress | Render a web page with room content | Video file |

For Voice by aiConnected, we primarily use Room Composite for call recordings.

Egress Configuration

"""
Egress (recording) configuration and types.
"""
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EgressOutputType(Enum):
    """Output format for recordings."""
    MP4 = "mp4"           # Common video format
    OGG = "ogg"           # Open audio format  
    WEBM = "webm"         # Web-optimized video
    FILE = "file"         # Generic file output
    STREAM = "stream"     # RTMP stream to external service


@dataclass
class EgressConfig:
    """
    Configuration for call recording.
    
    Defines where and how recordings are stored.
    """
    # Output format
    output_type: EgressOutputType = EgressOutputType.OGG
    
    # Audio settings
    audio_bitrate: int = 128000   # 128 kbps
    audio_frequency: int = 48000  # 48 kHz
    
    # Storage settings
    s3_bucket: str = ""
    s3_region: str = "us-west-2"
    s3_prefix: str = "recordings/"
    
    # File naming
    filename_template: str = "{room_name}_{time}.{ext}"
    
    @classmethod
    def for_compliance_recording(cls) -> "EgressConfig":
        """
        Configuration for compliance/archival recordings.
        
        Higher quality, longer retention.
        """
        return cls(
            output_type=EgressOutputType.OGG,
            audio_bitrate=128000,
            s3_prefix="compliance/",
        )
    
    @classmethod
    def for_training_data(cls) -> "EgressConfig":
        """
        Configuration for AI training data.
        
        Consistent format for processing.
        """
        return cls(
            output_type=EgressOutputType.OGG,
            audio_bitrate=64000,
            audio_frequency=16000,  # Lower for STT processing
            s3_prefix="training/",
        )

40.2 Starting Recording

Recording Service Implementation

"""
Recording service using LiveKit Egress.
"""
from livekit.api import EgressServiceClient, RoomCompositeEgressRequest
from dataclasses import dataclass
from typing import Optional, Dict
import logging

logger = logging.getLogger(__name__)


@dataclass
class RecordingInfo:
    """Information about an active recording."""
    egress_id: str
    room_name: str
    tenant_id: str
    started_at: float
    status: str = "active"
    output_url: Optional[str] = None


class RecordingService:
    """
    Service for managing call recordings.
    
    Uses LiveKit Egress to record calls to S3 storage.
    
    Usage:
        service = RecordingService(egress_client, config)
        recording = await service.start_recording("call-acme-xxx")
        # ... call happens ...
        result = await service.stop_recording(recording.egress_id)
    """
    
    def __init__(
        self,
        egress_service: EgressServiceClient,
        config: EgressConfig,
    ):
        self.egress_service = egress_service
        self.config = config
        
        # Track active recordings
        self._active_recordings: Dict[str, RecordingInfo] = {}
    
    async def start_recording(
        self,
        room_name: str,
        tenant_id: str,
        custom_filename: Optional[str] = None,
    ) -> RecordingInfo:
        """
        Start recording a room.
        
        Args:
            room_name: Room to record
            tenant_id: Tenant for organizing storage
            custom_filename: Optional custom filename
            
        Returns:
            RecordingInfo with egress ID for stopping later
        """
        try:
            # Build output path
            import time
            timestamp = int(time.time())
            
            if custom_filename:
                filename = custom_filename
            else:
                filename = self.config.filename_template.format(
                    room_name=room_name,
                    time=timestamp,
                    ext="ogg" if self.config.output_type == EgressOutputType.OGG else "mp4",
                )
            
            # Build S3 path
            s3_path = f"{self.config.s3_prefix}{tenant_id}/{filename}"
            
            # Start egress
            request = RoomCompositeEgressRequest(
                room_name=room_name,
                audio_only=True,  # Voice calls don't need video
                file_outputs=[
                    {
                        "file_type": "ogg",
                        "filepath": s3_path,
                        "s3": {
                            "bucket": self.config.s3_bucket,
                            "region": self.config.s3_region,
                        }
                    }
                ]
            )
            
            egress_info = await self.egress_service.start_room_composite_egress(
                request
            )
            
            # Track recording
            recording = RecordingInfo(
                egress_id=egress_info.egress_id,
                room_name=room_name,
                tenant_id=tenant_id,
                started_at=timestamp,
            )
            
            self._active_recordings[egress_info.egress_id] = recording
            
            logger.info(
                f"Started recording for {room_name}",
                extra={
                    "egress_id": egress_info.egress_id,
                    "tenant_id": tenant_id,
                    "output_path": s3_path,
                }
            )
            
            return recording
            
        except Exception as e:
            logger.error(f"Failed to start recording for {room_name}: {e}")
            raise RecordingError(f"Failed to start recording: {e}") from e

40.3 Stopping Recording

    async def stop_recording(self, egress_id: str) -> RecordingInfo:
        """
        Stop an active recording.
        
        Args:
            egress_id: ID of the egress to stop
            
        Returns:
            Updated RecordingInfo with final status and output URL
        """
        try:
            # Stop the egress
            egress_info = await self.egress_service.stop_egress(egress_id)
            
            # Update our tracking
            if egress_id in self._active_recordings:
                recording = self._active_recordings[egress_id]
                recording.status = "completed"
                
                # Extract output URL if available
                if egress_info.file_results:
                    recording.output_url = egress_info.file_results[0].location
                
                del self._active_recordings[egress_id]
                
                logger.info(
                    "Stopped recording",
                    extra={
                        "egress_id": egress_id,
                        "output_url": recording.output_url,
                    }
                )
                
                return recording
            
            return RecordingInfo(
                egress_id=egress_id,
                room_name="unknown",
                tenant_id="unknown",
                started_at=0,
                status="completed",
            )
            
        except Exception as e:
            logger.error(f"Failed to stop recording {egress_id}: {e}")
            raise RecordingError(f"Failed to stop recording: {e}") from e


class RecordingError(Exception):
    """Raised when recording operations fail."""
    pass

40.4 Storage Configuration

S3 Configuration

LiveKit Egress can output directly to Amazon S3 (or S3-compatible storage like DigitalOcean Spaces, MinIO, etc.).
"""
Storage configuration for recordings.
"""
from dataclasses import dataclass
from typing import Optional
import os


@dataclass
class S3StorageConfig:
    """
    S3 storage configuration for recordings.
    
    LiveKit needs credentials to write to your bucket.
    The bucket should have a lifecycle policy to manage
    retention and costs.
    """
    bucket: str
    region: str
    access_key_id: str
    secret_access_key: str
    endpoint: Optional[str] = None  # For S3-compatible services
    
    @classmethod
    def from_environment(cls) -> "S3StorageConfig":
        """Load from environment variables."""
        return cls(
            bucket=os.environ["RECORDING_S3_BUCKET"],
            region=os.environ.get("RECORDING_S3_REGION", "us-west-2"),
            access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
            secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
            endpoint=os.environ.get("RECORDING_S3_ENDPOINT"),
        )


# Environment variables to set:
# RECORDING_S3_BUCKET=your-bucket-name
# RECORDING_S3_REGION=us-west-2
# AWS_ACCESS_KEY_ID=AKIAXXXXXXXX
# AWS_SECRET_ACCESS_KEY=your-secret-key

Bucket Structure

your-bucket/
├── recordings/
│   ├── tenant-acme/
│   │   ├── call-acme-xxx_1704067200.ogg
│   │   └── call-acme-yyy_1704067500.ogg
│   └── tenant-bigco/
│       └── call-bigco-zzz_1704068000.ogg
├── compliance/
│   └── ... (long-term retention)
└── training/
    └── ... (AI training data)
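
The storage key for each recording follows directly from this layout. A small illustrative helper (the production path is built in `start_recording()` from `s3_prefix`, the tenant ID, and the filename template; note that the tree above shows a `tenant-` folder prefix while `start_recording()` joins the bare tenant ID, so pick one convention):

```python
def recording_key(prefix: str, tenant_id: str, room_name: str, ts: int) -> str:
    """Build an S3 key matching the bucket layout above (illustrative only)."""
    return f"{prefix}tenant-{tenant_id}/{room_name}_{ts}.ogg"


key = recording_key("recordings/", "acme", "call-acme-xxx", 1704067200)
# -> "recordings/tenant-acme/call-acme-xxx_1704067200.ogg"
```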

40.5 Recording Retrieval

Retrieving Recordings

"""
Recording retrieval service.
"""
import boto3
from typing import Optional, List
from dataclasses import dataclass


@dataclass
class RecordingMetadata:
    """Metadata about a stored recording."""
    key: str
    tenant_id: str
    room_name: str
    size_bytes: int
    created_at: float
    presigned_url: Optional[str] = None


class RecordingRetrieval:
    """
    Service for retrieving stored recordings.
    
    Provides methods to list, find, and generate download
    URLs for recordings.
    """
    
    def __init__(self, storage_config: S3StorageConfig):
        self.config = storage_config
        
        # Initialize S3 client
        self.s3 = boto3.client(
            's3',
            aws_access_key_id=storage_config.access_key_id,
            aws_secret_access_key=storage_config.secret_access_key,
            region_name=storage_config.region,
            endpoint_url=storage_config.endpoint,
        )
    
    async def list_recordings(
        self,
        tenant_id: str,
        prefix: str = "recordings/",
        limit: int = 100,
    ) -> List[RecordingMetadata]:
        """
        List recordings for a tenant.
        
        Args:
            tenant_id: Tenant to list recordings for
            prefix: Storage prefix (recordings/, compliance/, etc.)
            limit: Maximum recordings to return
            
        Returns:
            List of RecordingMetadata
        """
        full_prefix = f"{prefix}{tenant_id}/"
        
        response = self.s3.list_objects_v2(
            Bucket=self.config.bucket,
            Prefix=full_prefix,
            MaxKeys=limit,
        )
        
        recordings = []
        for obj in response.get('Contents', []):
            # Parse room name from key
            # Key format: recordings/tenant/call-tenant-xxx_timestamp.ogg
            filename = obj['Key'].split('/')[-1]
            room_name = filename.split('_')[0] if '_' in filename else filename
            
            recordings.append(RecordingMetadata(
                key=obj['Key'],
                tenant_id=tenant_id,
                room_name=room_name,
                size_bytes=obj['Size'],
                created_at=obj['LastModified'].timestamp(),
            ))
        
        return recordings
    
    async def get_download_url(
        self,
        key: str,
        expiry_seconds: int = 3600,
    ) -> str:
        """
        Generate a presigned URL for downloading a recording.
        
        Args:
            key: S3 key of the recording
            expiry_seconds: How long the URL is valid
            
        Returns:
            Presigned URL for download
        """
        url = self.s3.generate_presigned_url(
            'get_object',
            Params={
                'Bucket': self.config.bucket,
                'Key': key,
            },
            ExpiresIn=expiry_seconds,
        )
        
        return url
    
    async def find_recording_for_call(
        self,
        tenant_id: str,
        call_id: str,
    ) -> Optional[RecordingMetadata]:
        """
        Find recording for a specific call.
        
        Args:
            tenant_id: Tenant ID
            call_id: Call ID (UUID)
            
        Returns:
            RecordingMetadata if found, None otherwise
        """
        recordings = await self.list_recordings(tenant_id)
        
        for recording in recordings:
            if call_id in recording.room_name:
                # Generate download URL
                recording.presigned_url = await self.get_download_url(
                    recording.key
                )
                return recording
        
        return None
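
`list_recordings()` recovers the room name from a stored key with a simple split, relying on the underscore that separates the room name from the timestamp. A worked example on a hypothetical key:

```python
# Hypothetical stored key, in the format noted in list_recordings():
# recordings/tenant/call-tenant-xxx_timestamp.ogg
key = "recordings/acme/call-acme-550e8400_1704067200.ogg"

filename = key.split("/")[-1]       # strip the prefix and tenant folder
room_name = filename.split("_")[0]  # everything before the timestamp
```

This only works because room names never contain underscores; the timestamp suffix is the first underscore-delimited segment after the name.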

Part 6 Summary

What You Learned

In this part, you learned about LiveKit integration for Voice by aiConnected:

| Section | Key Concepts |
|---|---|
| 34. LiveKit Cloud Setup | Account creation, API credentials, webhooks |
| 35. Room Management | Naming conventions, creation, lifecycle |
| 36. Participant Management | Types, permissions, identity format |
| 37. Token Generation | JWT structure, claims, grants, refresh |
| 38. Audio Track Handling | Publication, subscription, quality, muting |
| 39. LiveKit Webhooks | Room/participant/track events |
| 40. Recording with Egress | Recording types, storage, retrieval |

Key Takeaways

  1. LiveKit is the central hub where all participants meet
  2. Tokens control access - they encode identity and permissions
  3. Webhooks provide real-time updates about what’s happening
  4. Rooms have lifecycles - create, use, destroy
  5. Audio tracks are the core - publishing and subscribing is how voice flows
  6. Egress enables recording - important for compliance and training

Next Steps

In Part 7, you’ll learn about the Voice AI Pipeline:
  • Deepgram speech-to-text integration
  • Voice Activity Detection
  • Claude LLM integration
  • Chatterbox text-to-speech

Quick Reference

Environment Variables

# LiveKit
LIVEKIT_API_KEY=APIxxxxxxxxx
LIVEKIT_API_SECRET=your-secret
LIVEKIT_WS_URL=wss://aiconnected.livekit.cloud
LIVEKIT_API_URL=https://aiconnected.livekit.cloud

# Recording Storage
RECORDING_S3_BUCKET=your-bucket
RECORDING_S3_REGION=us-west-2
AWS_ACCESS_KEY_ID=AKIAXXXXXXXX
AWS_SECRET_ACCESS_KEY=your-secret

Common Operations

# Create a room
room = await room_service.create_room(RoomConfigFactory.for_inbound_call(context))

# Generate a token
token = await token_service.generate_agent_token(room_name, agent_id)

# Start recording
recording = await recording_service.start_recording(room_name, tenant_id)

# Stop recording
result = await recording_service.stop_recording(recording.egress_id)

Continue to Part 7 for Voice AI Pipeline details…

Junior Developer PRD — Part 7A: Pipeline Architecture & Deepgram STT

Document Version: 1.0
Last Updated: January 25, 2026
Part: 7A of 10 (Sub-part 1 of 3)
Sections: 41-42
Audience: Junior developers with no prior context
Estimated Reading Time: 20 minutes

How to Use This Document

This is Part 7A of the PRD series, the first of three sub-parts covering the Voice AI Pipeline. Part 7 is split into sub-parts because of its length:
  • Part 7A (this document): Pipeline Architecture + Deepgram STT
  • Part 7B: Voice Activity Detection + Claude LLM Integration
  • Part 7C: Chatterbox TTS + Barge-In Handling + State Management
Prerequisites: Parts 1-6 of the PRD series.

Section 41: Pipeline Architecture

41.1 What is the Voice Pipeline?

The voice pipeline is the heart of Voice by aiConnected. It’s the processing chain that transforms a caller’s spoken words into AI responses and back to synthesized speech. Think of it like a relay race with four runners:
  1. VAD (Voice Activity Detection) — Detects when someone is speaking
  2. STT (Speech-to-Text) — Converts speech to text
  3. LLM (Large Language Model) — Generates a response
  4. TTS (Text-to-Speech) — Converts the response back to speech
Each runner passes the baton to the next as fast as possible. The total time from when the caller stops speaking to when they hear the AI’s response is our latency. Our target is under 1 second.

Why Latency Matters

In human conversation, we naturally expect responses within 200-400ms. Here’s how different latencies feel:
| Latency | User Perception |
|---|---|
| < 500ms | Feels instant, like talking to a human |
| 500-1000ms | Feels responsive, acceptable |
| 1000-1500ms | Noticeable delay, still usable |
| 1500-2000ms | Awkward pause, frustrating |
| > 2000ms | Feels broken, users hang up |

Our target is < 1000ms mouth-to-ear.

41.2 High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    VOICE PIPELINE OVERVIEW                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   CALLER                                                        │
│     │                                                           │
│     ▼                                                           │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│   │   INPUT     │    │  PROCESSING │    │   OUTPUT    │        │
│   │   STAGE     │───▶│    STAGE    │───▶│   STAGE     │        │
│   └─────────────┘    └─────────────┘    └─────────────┘        │
│         │                  │                  │                 │
│         ▼                  ▼                  ▼                 │
│   • LiveKit audio    • Context assembly  • Sentence buffer     │
│   • VAD detection    • Claude LLM        • Chatterbox TTS      │
│   • Deepgram STT     • Tool execution    • Audio playback      │
│                                                                 │
│   ◀─────────────── BARGE-IN DETECTION ──────────────────▶      │
│   (VAD monitors for caller interruptions during playback)      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Detailed Flow

┌─────────────────────────────────────────────────────────────────┐
│                         INPUT STAGE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   LiveKit Room                                                  │
│        │                                                        │
│        ▼                                                        │
│   Raw PCM Audio (48kHz stereo)                                  │
│        │                                                        │
│        ▼                                                        │
│   Resample to 16kHz mono                                        │
│        │                                                        │
│        ├───────────────┐                                        │
│        ▼               ▼                                        │
│   [Silero VAD]    [Deepgram STT]                                │
│        │               │                                        │
│        ▼               ▼                                        │
│   Speech detected? Interim transcripts                          │
│        │               │                                        │
│        ▼               ▼                                        │
│   Endpointing     Final transcript                              │
│   (silence = done)     │                                        │
│        │               │                                        │
│        └───────┬───────┘                                        │
│                ▼                                                │
│        Ready for processing                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│                      PROCESSING STAGE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   [Context Assembly]                                            │
│        │                                                        │
│        ├── System prompt (personality, instructions)            │
│        ├── Conversation history (last N turns)                  │
│        ├── Knowledge base context (RAG results)                 │
│        └── Tool definitions (available functions)               │
│        │                                                        │
│        ▼                                                        │
│   [Claude LLM] ──▶ Streaming tokens                             │
│        │                                                        │
│        ▼                                                        │
│   [Response Router]                                             │
│        │                                                        │
│        ├── Speech response ──▶ Continue to output               │
│        └── Tool call ──▶ Execute ──▶ Return to LLM              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│                        OUTPUT STAGE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   [Sentence Buffer]                                             │
│        │                                                        │
│        ▼ (accumulate until sentence boundary)                   │
│                                                                 │
│   [Chatterbox TTS] ──▶ Audio chunks (streaming)                 │
│        │                                                        │
│        ▼                                                        │
│   [Audio Queue] ──▶ LiveKit Room ──▶ Caller hears response      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

41.3 Component Summary

| Component | Technology | Purpose | Latency |
|---|---|---|---|
| Audio Transport | LiveKit | Real-time audio streaming | ~40ms |
| VAD | Silero VAD | Detect speech activity | ~10ms |
| STT | Deepgram Nova-2 | Transcribe speech | ~200-350ms |
| LLM | Claude Sonnet | Generate responses | ~300-500ms |
| TTS | Chatterbox | Synthesize speech | ~100-200ms |
| State Manager | Redis | Track conversation state | ~5ms |

41.4 Latency Budget

┌─────────────────────────────────────────────────────────────────┐
│                      LATENCY BUDGET                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Component                         Target      P50      P95    │
│   ───────────────────────────────────────────────────────────   │
│   1. Endpointing delay              200ms      150ms    250ms   │
│   2. STT finalization               100ms       80ms    150ms   │
│   3. Context assembly                20ms       15ms     30ms   │
│   4. Network to LLM                  30ms       20ms     50ms   │
│   5. LLM TTFB (first token)         200ms      150ms    300ms   │
│   6. Sentence accumulation          100ms       80ms    150ms   │
│   7. Network to TTS                  20ms       15ms     30ms   │
│   8. TTS TTFB (first audio)         150ms      100ms    200ms   │
│   9. Return path                     70ms       50ms    100ms   │
│   ───────────────────────────────────────────────────────────   │
│   TOTAL                             890ms      660ms   1260ms   │
│                                                                 │
│   Target: < 1000ms (P50)                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
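A budget like this is only useful if real calls are measured against it. Below is a minimal sketch of a per-stage latency tracker; the class and stage names are illustrative, not from the codebase:

```python
import time
from collections import defaultdict


class LatencyTracker:
    """Accumulates per-stage durations (in ms) for conversational turns."""

    def __init__(self):
        self._starts = {}
        self.samples = defaultdict(list)  # stage name -> [duration_ms, ...]

    def start(self, stage: str) -> None:
        self._starts[stage] = time.monotonic()

    def stop(self, stage: str) -> float:
        elapsed_ms = (time.monotonic() - self._starts.pop(stage)) * 1000
        self.samples[stage].append(elapsed_ms)
        return elapsed_ms

    def p50(self, stage: str) -> float:
        """Median of recorded durations for one stage."""
        values = sorted(self.samples[stage])
        return values[len(values) // 2]


# Wrap each pipeline stage in start/stop calls:
tracker = LatencyTracker()
tracker.start("stt_finalization")
# ... stage work happens here ...
tracker.stop("stt_finalization")
```

In production you would export these samples to your metrics system rather than keep them in memory.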

41.5 Latency Optimization Strategies

Strategy 1: Streaming Everything

Instead of waiting for complete results, we stream at every stage:
TRADITIONAL (SLOW):
  User speaks ──▶ [Wait for full transcription] ──▶ [Wait for full LLM response] ──▶ [Wait for full audio]
  Total: ~3000ms

STREAMING (FAST):
  User speaks ──▶ [Stream transcription] ──▶ [Stream LLM tokens] ──▶ [Stream audio chunks]
  First audio at: ~890ms

Strategy 2: Sentence-Level TTS

We don’t wait for the entire LLM response. As soon as we have a complete sentence, we send it to TTS:
LLM output: "Hello! How can I help you today?"

Traditional:
  Wait for full response ──▶ TTS ──▶ Play
  [──────── 800ms ────────][─200ms─][─300ms─]
  Total: 1300ms

Our approach:
  "Hello!" ──▶ TTS ──▶ Play    "How can I..." ──▶ TTS ──▶ Play
  [─100ms─][─100ms─][─150ms─]  [──── continues in parallel ────]
  First audio at: 350ms

Strategy 3: Warm Connections

Keep connections to external services pre-established:
# BAD: Cold connection on every request
async def transcribe(audio):
    client = await DeepgramClient.connect()  # 50-100ms overhead
    result = await client.transcribe(audio)
    await client.disconnect()
    return result

# GOOD: Reuse warm connection
class TranscriptionService:
    def __init__(self):
        self.client = None  # Connected on startup
    
    async def start(self):
        self.client = await DeepgramClient.connect()  # Once at startup
    
    async def transcribe(self, audio):
        return await self.client.transcribe(audio)  # No connection overhead

41.6 Pipeline States

The pipeline operates as a state machine:
┌─────────────────────────────────────────────────────────────────┐
│                    PIPELINE STATE MACHINE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                         ┌─────────┐                             │
│                         │  IDLE   │                             │
│                         └────┬────┘                             │
│                              │ Audio received                   │
│                              ▼                                  │
│                       ┌──────────────┐                          │
│         ┌─────────────│  LISTENING   │◀────────────┐            │
│         │             └──────┬───────┘             │            │
│         │                    │ Speech detected     │            │
│  Silence timeout             ▼                     │            │
│  (no speech)          ┌──────────────┐             │            │
│         │             │  CAPTURING   │             │            │
│         │             └──────┬───────┘             │            │
│         │                    │ Endpoint detected   │            │
│         │                    ▼                     │            │
│         │             ┌──────────────┐             │            │
│         │             │  PROCESSING  │             │            │
│         │             └──────┬───────┘             │            │
│         │                    │ First audio ready   │            │
│         │                    ▼                     │            │
│         │             ┌──────────────┐    Barge-in │            │
│         │             │   SPEAKING   │─────────────┘            │
│         │             └──────┬───────┘                          │
│         │                    │ Response complete                │
│         │                    ▼                                  │
│         └──────────────▶ LISTENING                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

State Definitions

| State | Description | Entry Condition |
|---|---|---|
| IDLE | Pipeline initialized, waiting | Call connected |
| LISTENING | Waiting for speech | Ready for input |
| CAPTURING | Recording utterance | VAD detected speech |
| PROCESSING | Generating response | User finished speaking |
| SPEAKING | Playing AI response | TTS audio ready |
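The states and transitions above can be sketched as a small guard class. The transition table is read directly off the diagram in 41.6; the class names are illustrative:

```python
from enum import Enum, auto


class PipelineState(Enum):
    IDLE = auto()
    LISTENING = auto()
    CAPTURING = auto()
    PROCESSING = auto()
    SPEAKING = auto()


# Legal transitions from the state machine diagram.
ALLOWED = {
    PipelineState.IDLE: {PipelineState.LISTENING},
    # Silence timeout loops LISTENING back to itself.
    PipelineState.LISTENING: {PipelineState.CAPTURING, PipelineState.LISTENING},
    PipelineState.CAPTURING: {PipelineState.PROCESSING},
    PipelineState.PROCESSING: {PipelineState.SPEAKING},
    # SPEAKING returns to LISTENING on completion or barge-in.
    PipelineState.SPEAKING: {PipelineState.LISTENING},
}


class PipelineStateMachine:
    def __init__(self):
        self.state = PipelineState.IDLE

    def transition(self, new_state: PipelineState) -> None:
        """Move to new_state, rejecting transitions the diagram forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"Illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

Rejecting illegal transitions at runtime catches pipeline bugs (for example, trying to speak before processing finishes) early and loudly.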

41.7 Data Flow Example

When a caller says “What are your business hours?”:
Timeline (milliseconds):
─────────────────────────────────────────────────────────────────

0ms     Caller starts speaking "What..."
        └── VAD: Speech probability > 0.5 → CAPTURING
        └── STT: Connection open, receiving audio

100ms   "What are..."
        └── STT interim: "what are"

350ms   "What are your business..."
        └── STT interim: "what are your business"

500ms   "What are your business hours?"
        └── STT interim: "what are your business hours"

600ms   Caller stops speaking
        └── VAD: Speech probability drops
        └── Endpointing: Start silence timer (200ms)

800ms   Silence threshold reached
        └── STT final: "What are your business hours?"
        └── Transition to PROCESSING

820ms   Context assembly
        └── Load system prompt
        └── Fetch conversation history
        └── Query knowledge base

850ms   LLM request sent (streaming)

1050ms  First LLM token: "Our"
        └── Sentence buffer: "Our"

1200ms  Sentence complete: "Our business hours are Monday 
        through Friday, 9 AM to 5 PM."
        └── Send to TTS immediately

1280ms  First TTS audio chunk ready
        └── Transition to SPEAKING
        └── Caller hears "Our..."

        TTFB achieved: 480ms from end of speech ✓

2500ms  Full response complete
        └── Transition to LISTENING

41.8 Error Handling Strategy

Error Categories

| Category | Example | Recovery |
|---|---|---|
| Transient | Network timeout | Retry with backoff |
| Provider | Deepgram API error | Failover to backup |
| Fatal | Invalid configuration | End call gracefully |

Fallback Chains

STT Fallback:
  Deepgram Nova-2 (primary)
       ↓ on failure
  Deepgram Nova-1 (fallback model)
       ↓ on failure
  Play "I'm having trouble hearing you"

LLM Fallback:
  Claude Sonnet (primary)
       ↓ on failure/timeout > 5s
  Claude Haiku (faster, less capable)
       ↓ on failure
  Play "Let me transfer you to a human"

TTS Fallback:
  Chatterbox (primary, self-hosted)
       ↓ on failure
  Cartesia Sonic (cloud backup)
       ↓ on failure
  Deepgram Aura (secondary backup)
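All three chains follow the same pattern: try providers in order, falling through on failure or timeout. A generic sketch of that pattern (the function name and signature are illustrative, not from the codebase):

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


async def run_with_fallbacks(
    providers: List[Tuple[str, Callable[[], Awaitable[T]]]],
    timeout_s: float = 5.0,
) -> T:
    """Try each (name, provider) in order; return the first success."""
    last_error: Optional[Exception] = None
    for name, provider in providers:
        try:
            return await asyncio.wait_for(provider(), timeout=timeout_s)
        except Exception as e:  # timeout, network error, provider error
            last_error = e
            # A real implementation would log `name` and emit a metric here.
    raise RuntimeError("All providers in the chain failed") from last_error
```

For the STT chain this might look like `run_with_fallbacks([("nova-2", primary), ("nova-1", backup)])`, with the "I'm having trouble hearing you" prompt played when the call raises.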

Section 42: Deepgram STT Integration

42.1 What is Deepgram?

Deepgram is a speech-to-text (STT) service that converts spoken audio into written text. We chose Deepgram because:
  1. Low Latency: ~200ms for streaming transcription
  2. High Accuracy: 95%+ word accuracy
  3. Streaming API: Real-time results as the user speaks
  4. Interim Results: Preview of transcription before final result
  5. Automatic Punctuation: Adds periods, commas, question marks

Provider Comparison

| Provider | Latency | Streaming | Cost/min | Notes |
|---|---|---|---|---|
| Deepgram Nova-2 | ~200ms | Yes | $0.0043 | Our choice |
| Google Speech | ~300ms | Yes | $0.006 | Higher latency |
| AWS Transcribe | ~500ms | Yes | $0.024 | Too slow |
| Whisper (OpenAI) | ~1000ms | No | $0.006 | No streaming |
| AssemblyAI | ~300ms | Yes | $0.0065 | Backup option |

42.2 Account Setup

Step 1: Create Deepgram Account

  1. Go to https://console.deepgram.com
  2. Sign up with email or Google
  3. Verify your email

Step 2: Create API Key

  1. In the console, go to API Keys
  2. Click Create New Key
  3. Name it: voice-aiconnected-production
  4. Select permissions:
    • usage:read
    • keys:read
    • transcription:read
  5. Copy the key (you won’t see it again)

Step 3: Configure Environment

# .env
DEEPGRAM_API_KEY=your_api_key_here
DEEPGRAM_MODEL=nova-2
DEEPGRAM_LANGUAGE=en-US
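At startup these variables need to be read and validated before the pipeline can connect. A minimal sketch (the function name is illustrative; its output feeds the `DeepgramConfig` dataclass defined in 42.3):

```python
import os


def load_deepgram_settings() -> dict:
    """Read Deepgram settings from the environment, with defaults."""
    api_key = os.environ.get("DEEPGRAM_API_KEY")
    if not api_key:
        # Fail fast: a missing key would otherwise surface mid-call.
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return {
        "api_key": api_key,
        "model": os.environ.get("DEEPGRAM_MODEL", "nova-2"),
        "language": os.environ.get("DEEPGRAM_LANGUAGE", "en-US"),
    }
```

Failing fast on a missing API key at startup is much easier to debug than a WebSocket authentication error during a live call.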

42.3 Configuration

"""
Deepgram STT configuration for Voice by aiConnected.

File: services/agent-service/config/deepgram.py
"""
from dataclasses import dataclass, field
from typing import List, Optional
from enum import Enum


class DeepgramModel(Enum):
    """Available Deepgram models."""
    NOVA_2 = "nova-2"      # Latest, most accurate
    NOVA_1 = "nova-1"      # Previous generation (fallback)
    ENHANCED = "enhanced"  # Older model
    BASE = "base"          # Fastest, least accurate


@dataclass
class DeepgramConfig:
    """
    Configuration for Deepgram STT.
    
    Attributes:
        api_key: Deepgram API key
        model: Which model to use
        language: Language code (e.g., "en-US")
        sample_rate: Audio sample rate in Hz
        channels: Number of audio channels (1 for mono)
        encoding: Audio encoding format
        punctuate: Add automatic punctuation
        smart_format: Format numbers, dates, etc.
        interim_results: Return results before utterance ends
        utterance_end_ms: Silence duration to end utterance
        endpointing: Milliseconds of silence to finalize
        keywords: Words to boost recognition accuracy
    """
    
    # API credentials
    api_key: str
    
    # Model selection
    model: DeepgramModel = DeepgramModel.NOVA_2
    
    # Language
    language: str = "en-US"
    
    # Audio format (must match what we send)
    sample_rate: int = 16000  # 16kHz
    channels: int = 1         # Mono
    encoding: str = "linear16"  # PCM 16-bit
    
    # Transcription options
    punctuate: bool = True
    smart_format: bool = True
    diarize: bool = False  # Speaker identification (not needed for 1:1)
    
    # Streaming options
    interim_results: bool = True
    utterance_end_ms: int = 1000
    vad_events: bool = True
    
    # Endpointing
    endpointing: int = 300  # ms of silence to finalize
    
    # Keywords boost (tenant-specific)
    keywords: List[str] = field(default_factory=list)
    
    def to_query_params(self) -> dict:
        """Convert to Deepgram WebSocket query parameters."""
        params = {
            "model": self.model.value,
            "language": self.language,
            "sample_rate": self.sample_rate,
            "channels": self.channels,
            "encoding": self.encoding,
            "punctuate": str(self.punctuate).lower(),
            "smart_format": str(self.smart_format).lower(),
            "diarize": str(self.diarize).lower(),
            "interim_results": str(self.interim_results).lower(),
            "utterance_end_ms": self.utterance_end_ms,
            "vad_events": str(self.vad_events).lower(),
            "endpointing": self.endpointing,
        }
        
        if self.keywords:
            params["keywords"] = ",".join(self.keywords)
        
        return params


# Default configuration
DEFAULT_DEEPGRAM_CONFIG = DeepgramConfig(
    api_key="",  # Set from environment
    model=DeepgramModel.NOVA_2,
    language="en-US",
    sample_rate=16000,
    punctuate=True,
    smart_format=True,
    interim_results=True,
    endpointing=300,
)

42.4 WebSocket Connection Flow

Deepgram uses WebSocket for real-time streaming:
┌─────────────────────────────────────────────────────────────────┐
│                   DEEPGRAM WEBSOCKET FLOW                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Agent Service                           Deepgram API          │
│        │                                       │                │
│        │  1. Open WebSocket                    │                │
│        │     wss://api.deepgram.com/v1/listen  │                │
│        │     ?model=nova-2&language=en-US...   │                │
│        │ ─────────────────────────────────────▶│                │
│        │                                       │                │
│        │  2. Connection accepted               │                │
│        │◀───────────────────────────────────── │                │
│        │                                       │                │
│        │  3. Send audio chunks (binary)        │                │
│        │     [PCM 16-bit, 16kHz, mono]         │                │
│        │ ─────────────────────────────────────▶│                │
│        │ ─────────────────────────────────────▶│                │
│        │ ─────────────────────────────────────▶│                │
│        │                                       │                │
│        │  4. Receive interim results (JSON)    │                │
│        │     {"is_final": false, ...}          │                │
│        │◀───────────────────────────────────── │                │
│        │                                       │                │
│        │  5. Receive final result (JSON)       │                │
│        │     {"is_final": true, ...}           │                │
│        │◀───────────────────────────────────── │                │
│        │                                       │                │
│        │  6. Send CloseStream                  │                │
│        │     {"type": "CloseStream"}           │                │
│        │ ─────────────────────────────────────▶│                │
│        │                                       │                │
│        │  7. Connection closed                 │                │
│        │◀───────────────────────────────────── │                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

42.5 Deepgram Client Implementation

"""
Deepgram STT client for Voice by aiConnected.

File: services/agent-service/integrations/deepgram_stt.py
"""
import asyncio
import json
import logging
from typing import Optional, Callable, Awaitable
from dataclasses import dataclass
import websockets
from websockets.client import WebSocketClientProtocol

from config.deepgram import DeepgramConfig

logger = logging.getLogger(__name__)


@dataclass
class TranscriptResult:
    """
    A transcription result from Deepgram.
    
    Attributes:
        text: The transcribed text
        is_final: Whether this result won't change
        speech_final: Whether the utterance is complete
        confidence: Confidence score (0-1)
        words: Word-level timing and confidence
        start: Start time in seconds
        duration: Duration in seconds
    """
    text: str
    is_final: bool
    speech_final: bool
    confidence: float
    words: list
    start: float
    duration: float


class DeepgramSTTClient:
    """
    Streaming STT client for Deepgram.
    
    Example:
        async def handle_transcript(result: TranscriptResult):
            if result.is_final:
                print(f"Final: {result.text}")
            else:
                print(f"Interim: {result.text}")
        
        client = DeepgramSTTClient(config, on_transcript=handle_transcript)
        await client.connect()
        
        # Send audio chunks
        await client.send_audio(audio_chunk)
        
        # When done
        await client.close()
    """
    
    DEEPGRAM_URL = "wss://api.deepgram.com/v1/listen"
    
    def __init__(
        self,
        config: DeepgramConfig,
        on_transcript: Optional[Callable[[TranscriptResult], Awaitable[None]]] = None,
        on_error: Optional[Callable[[Exception], Awaitable[None]]] = None,
        on_close: Optional[Callable[[], Awaitable[None]]] = None,
    ):
        self.config = config
        self.on_transcript = on_transcript
        self.on_error = on_error
        self.on_close = on_close
        
        self._ws: Optional[WebSocketClientProtocol] = None
        self._receive_task: Optional[asyncio.Task] = None
        self._connected = False
    
    async def connect(self) -> None:
        """Open connection to Deepgram."""
        # Build URL with query parameters
        params = self.config.to_query_params()
        query_string = "&".join(f"{k}={v}" for k, v in params.items())
        url = f"{self.DEEPGRAM_URL}?{query_string}"
        
        # Connect with API key in header
        headers = {
            "Authorization": f"Token {self.config.api_key}"
        }
        
        try:
            self._ws = await websockets.connect(
                url,
                extra_headers=headers,
                ping_interval=20,
                ping_timeout=10,
            )
            self._connected = True
            
            # Start receiving messages
            self._receive_task = asyncio.create_task(self._receive_loop())
            
            logger.info("Connected to Deepgram STT")
            
        except Exception as e:
            logger.error(f"Failed to connect to Deepgram: {e}")
            raise
    
    async def send_audio(self, audio_data: bytes) -> None:
        """
        Send audio chunk to Deepgram.
        
        Args:
            audio_data: Raw PCM audio bytes (16-bit, mono, 16kHz)
        """
        if not self._connected or not self._ws:
            logger.warning("Cannot send audio: not connected")
            return
        
        try:
            await self._ws.send(audio_data)
        except Exception as e:
            logger.error(f"Error sending audio: {e}")
            if self.on_error:
                await self.on_error(e)
    
    async def close(self) -> None:
        """Close the connection gracefully."""
        if not self._connected:
            return
        
        self._connected = False
        
        if self._ws:
            try:
                await self._ws.send(json.dumps({"type": "CloseStream"}))
                await self._ws.close()
            except Exception as e:
                logger.warning(f"Error closing connection: {e}")
        
        if self._receive_task:
            self._receive_task.cancel()
            try:
                await self._receive_task
            except asyncio.CancelledError:
                pass
        
        self._ws = None
        self._receive_task = None
        
        if self.on_close:
            await self.on_close()
        
        logger.info("Disconnected from Deepgram STT")
    
    async def _receive_loop(self) -> None:
        """Background task that receives messages."""
        try:
            async for message in self._ws:
                try:
                    data = json.loads(message)
                    await self._handle_message(data)
                except json.JSONDecodeError:
                    logger.warning(f"Invalid JSON: {message[:100]}")
                    
        except websockets.ConnectionClosed as e:
            logger.info(f"Connection closed: {e.code} {e.reason}")
        except Exception as e:
            logger.error(f"Error in receive loop: {e}")
            if self.on_error:
                await self.on_error(e)
    
    async def _handle_message(self, data: dict) -> None:
        """Handle a message from Deepgram."""
        msg_type = data.get("type")
        
        if msg_type == "Results":
            channel = data.get("channel", {})
            alternatives = channel.get("alternatives", [])
            
            if alternatives:
                alt = alternatives[0]
                
                result = TranscriptResult(
                    text=alt.get("transcript", ""),
                    is_final=data.get("is_final", False),
                    speech_final=data.get("speech_final", False),
                    confidence=alt.get("confidence", 0.0),
                    words=alt.get("words", []),
                    start=data.get("start", 0.0),
                    duration=data.get("duration", 0.0),
                )
                
                if self.on_transcript:
                    await self.on_transcript(result)
        
        elif msg_type == "Metadata":
            logger.debug(f"Deepgram metadata: {data}")
        
        elif msg_type == "SpeechStarted":
            logger.debug("Deepgram: Speech started")
        
        elif msg_type == "UtteranceEnd":
            logger.debug("Deepgram: Utterance ended")
        
        elif msg_type == "Error":
            error_msg = data.get("message", "Unknown error")
            logger.error(f"Deepgram error: {error_msg}")
            if self.on_error:
                await self.on_error(Exception(error_msg))
    
    @property
    def is_connected(self) -> bool:
        """Whether the client is connected."""
        return self._connected

42.6 Interim vs Final Results

Deepgram sends two types of results:

Interim Results (is_final=False)

  • Sent while the user is still speaking
  • May change as more audio is processed
  • Use for displaying live transcription
  • Don’t send to LLM

Final Results (is_final=True)

  • Sent when Deepgram is confident the text won’t change
  • May still be mid-utterance
  • Use for building the complete transcript

Speech Final (speech_final=True)

  • Indicates the user has stopped speaking
  • Time to send to LLM
User says: "What are your business hours?"

Timeline:
─────────────────────────────────────────────────────────────────
100ms   interim: "what"
200ms   interim: "what are"
300ms   interim: "what are your"
400ms   interim: "what are your business"
500ms   interim: "what are your business hours"
600ms   final:   "What are your business hours?"  (is_final=true)
800ms   speech_final=true  ← Send to LLM now

Transcript Accumulator

"""
Transcript accumulator for handling interim/final results.

File: services/agent-service/pipeline/transcript_accumulator.py
"""
from dataclasses import dataclass, field
from typing import Optional, Callable, Awaitable, List

from integrations.deepgram_stt import TranscriptResult


@dataclass
class TranscriptAccumulator:
    """
    Accumulates transcript results into complete utterances.
    
    Handles interim vs final results, building a coherent
    transcript from streaming results.
    """
    
    # Callbacks
    on_interim: Optional[Callable[[str], Awaitable[None]]] = None
    on_final: Optional[Callable[[str], Awaitable[None]]] = None
    
    # Internal state
    _final_text: str = ""
    _interim_text: str = ""
    _words: List[dict] = field(default_factory=list)
    
    # TranscriptResult is defined alongside the Deepgram client (Section 42);
    # the string annotation avoids importing it here.
    async def process_result(self, result: "TranscriptResult") -> None:
        """Process a transcript result from Deepgram."""
        if result.is_final:
            # Final result - append to accumulated text
            self._final_text += result.text
            self._words.extend(result.words)
            self._interim_text = ""
            
            if result.speech_final:
                # User is done speaking
                full_text = self._final_text.strip()
                
                if full_text and self.on_final:
                    await self.on_final(full_text)
                
                # Reset for next utterance
                self._final_text = ""
                self._words = []
        else:
            # Interim result - update preview
            self._interim_text = result.text
            
            if self.on_interim:
                full_preview = (self._final_text + self._interim_text).strip()
                await self.on_interim(full_preview)
    
    def get_current_transcript(self) -> str:
        """Get current transcript including interim text."""
        return (self._final_text + self._interim_text).strip()
    
    def clear(self) -> None:
        """Clear accumulated transcript."""
        self._final_text = ""
        self._interim_text = ""
        self._words = []

42.7 Audio Format Requirements

Deepgram expects audio in a specific format:
| Parameter | Value | Notes |
| --- | --- | --- |
| Sample Rate | 16000 Hz | Optimal for speech |
| Channels | 1 (mono) | Stereo wastes bandwidth |
| Encoding | linear16 | 16-bit signed PCM |
| Byte Order | Little-endian | Standard |

Audio Conversion

LiveKit outputs 48kHz stereo. We need to convert:
"""
Audio resampling for Deepgram STT.

File: services/agent-service/pipeline/audio_utils.py
"""
import numpy as np
from scipy import signal


def resample_for_stt(
    audio: np.ndarray,
    input_rate: int = 48000,
    output_rate: int = 16000,
) -> np.ndarray:
    """
    Resample audio for Deepgram STT.
    
    Args:
        audio: Input audio (can be stereo or mono)
        input_rate: Input sample rate (48kHz from LiveKit)
        output_rate: Output sample rate (16kHz for Deepgram)
    
    Returns:
        Resampled mono audio as int16 numpy array
    """
    # Convert to mono if stereo
    if len(audio.shape) > 1 and audio.shape[1] == 2:
        audio = np.mean(audio, axis=1)
    
    # Resample
    if input_rate != output_rate:
        num_samples = int(len(audio) * output_rate / input_rate)
        audio = signal.resample(audio, num_samples)
    
    # Convert to int16
    if audio.dtype != np.int16:
        if audio.dtype in (np.float32, np.float64):
            audio = np.clip(audio, -1.0, 1.0)
            audio = (audio * 32767).astype(np.int16)
        else:
            audio = audio.astype(np.int16)
    
    return audio


def audio_to_bytes(audio: np.ndarray) -> bytes:
    """Convert numpy audio array to bytes."""
    return audio.tobytes()
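As a sanity check on throughput, the format in the table above works out to modest bandwidth (plain arithmetic, no external dependencies):

```python
SAMPLE_RATE = 16_000       # Hz (Deepgram input, per the table above)
BYTES_PER_SAMPLE = 2       # linear16 = 16-bit signed PCM
FRAME_MS = 30              # frame size used elsewhere in this pipeline

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000      # 480 samples
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE  # 960 bytes per frame
bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE       # 32,000 bytes/s
```

At ~31 KiB/s, a single call's upstream audio is negligible next to typical server bandwidth; the latency budget, not throughput, is the constraint.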

42.8 Error Handling and Retry

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| 401 Unauthorized | Invalid API key | Check DEEPGRAM_API_KEY |
| 429 Too Many Requests | Rate limited | Back off; check plan |
| Connection dropped | Network issue | Reconnect |
| Empty transcript | Silence or noise | Check audio input |

Retry Logic

"""
Retry logic for Deepgram connection.

File: services/agent-service/integrations/deepgram_retry.py
"""
import asyncio
import logging

logger = logging.getLogger(__name__)


async def connect_with_retry(
    client: "DeepgramSTTClient",  # the client class from Section 42
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> bool:
    """
    Connect to Deepgram with exponential backoff.
    
    Args:
        client: The Deepgram client
        max_retries: Maximum retry attempts
        base_delay: Initial delay (doubles each retry)
    
    Returns:
        True if connected, False otherwise
    """
    for attempt in range(max_retries):
        try:
            await client.connect()
            return True
        except Exception as e:
            if attempt == max_retries - 1:
                break  # no point sleeping after the final attempt
            delay = base_delay * (2 ** attempt)
            logger.warning(
                f"Attempt {attempt + 1} failed: {e}. "
                f"Retrying in {delay}s..."
            )
            await asyncio.sleep(delay)
    
    logger.error(f"Failed after {max_retries} attempts")
    return False

42.9 Integration with Pipeline

Here’s how Deepgram integrates with the voice pipeline:
"""
Example: Deepgram integration in the pipeline.

File: services/agent-service/pipeline/stt_handler.py
"""
import asyncio
import logging
from typing import Optional

# Note: DeepgramConfig, DeepgramSTTClient, TranscriptAccumulator, and
# connect_with_retry are defined in earlier sections of this Part;
# their imports are omitted here for brevity.

logger = logging.getLogger(__name__)


class STTHandler:
    """
    Handles STT integration in the voice pipeline.
    
    Responsibilities:
    - Manage Deepgram connection lifecycle
    - Process audio frames
    - Accumulate transcripts
    - Signal when utterance is complete
    """
    
    def __init__(
        self,
        config: DeepgramConfig,
        on_utterance_complete: callable,
        on_interim_transcript: callable = None,
    ):
        self.config = config
        self.on_utterance_complete = on_utterance_complete
        self.on_interim_transcript = on_interim_transcript
        
        self._client: Optional[DeepgramSTTClient] = None
        self._accumulator = TranscriptAccumulator(
            on_interim=self._handle_interim,
            on_final=self._handle_final,
        )
    
    async def start(self) -> None:
        """Start the STT handler."""
        self._client = DeepgramSTTClient(
            config=self.config,
            on_transcript=self._accumulator.process_result,
            on_error=self._handle_error,
        )
        
        success = await connect_with_retry(self._client)
        if not success:
            raise RuntimeError("Failed to connect to Deepgram")
        
        logger.info("STT handler started")
    
    async def stop(self) -> None:
        """Stop the STT handler."""
        if self._client:
            await self._client.close()
        logger.info("STT handler stopped")
    
    async def process_audio(self, audio_bytes: bytes) -> None:
        """Process an audio chunk."""
        if self._client and self._client.is_connected:
            await self._client.send_audio(audio_bytes)
    
    async def _handle_interim(self, text: str) -> None:
        """Handle interim transcript."""
        if self.on_interim_transcript:
            await self.on_interim_transcript(text)
    
    async def _handle_final(self, text: str) -> None:
        """Handle final transcript (utterance complete)."""
        logger.info(f"Utterance complete: {text}")
        await self.on_utterance_complete(text)
    
    async def _handle_error(self, error: Exception) -> None:
        """Handle STT error."""
        logger.error(f"STT error: {error}")
        # Attempt reconnection
        if self._client:
            await self._client.close()
            await asyncio.sleep(1)
            await self.start()

Summary: What You’ve Learned in Part 7A

Section 41: Pipeline Architecture

  • The voice pipeline transforms speech → text → AI response → speech
  • Target latency: <1000ms mouth-to-ear
  • Key optimization: streaming at every stage
  • Pipeline states: IDLE → LISTENING → CAPTURING → PROCESSING → SPEAKING

Section 42: Deepgram STT Integration

  • Deepgram Nova-2 provides low-latency streaming transcription
  • WebSocket connection for real-time audio streaming
  • Interim results for preview, final results for processing
  • Audio format: 16kHz, mono, 16-bit PCM

What’s Next

In Part 7B, you’ll learn:
  • Voice Activity Detection (VAD) with Silero
  • Claude LLM integration for conversation AI
  • System prompt design for voice
  • Function calling (tools) in voice context

Document Metadata

| Field | Value |
| --- | --- |
| Document ID | PRD-007A |
| Title | Junior Developer PRD — Part 7A |
| Version | 1.0 |
| Status | Complete |

End of Part 7A — Continue to Part 7B

Junior Developer PRD — Part 7B: VAD & Claude LLM Integration

Document Version: 1.0
Last Updated: January 25, 2026
Part: 7B of 10 (Sub-part 2 of 3)
Sections: 43-44
Audience: Junior developers with no prior context
Estimated Reading Time: 20 minutes

How to Use This Document

This is Part 7B—the second of three sub-parts covering the Voice AI Pipeline:
  • Part 7A: Pipeline Architecture + Deepgram STT ✓
  • Part 7B (this document): VAD + Claude LLM Integration
  • Part 7C: Chatterbox TTS + Barge-In + State Management
Prerequisites: Parts 1-6 and Part 7A.

Section 43: Voice Activity Detection (VAD)

43.1 What is VAD?

Voice Activity Detection (VAD) determines when someone is speaking versus when there’s silence or background noise. It’s crucial for:
  1. Knowing when to transcribe: Don’t waste resources on silence
  2. Endpointing: Detecting when the user finished speaking
  3. Interruption handling: Detecting barge-in during TTS playback

Why Use Separate VAD (Not Just Deepgram)?

Deepgram has built-in VAD, but we use our own because:
  1. Lower latency: Local VAD is faster than waiting for Deepgram
  2. Interruption detection: Need VAD running during TTS playback
  3. More control: Tune sensitivity for our use case
  4. Redundancy: Don’t rely on single provider

43.2 Silero VAD

We use Silero VAD, a lightweight neural network that runs locally:
  • Fast: ~10ms per frame on CPU
  • Accurate: 95%+ accuracy
  • Small: ~2MB model size
  • Open source: MIT license

How It Works

Silero VAD outputs a probability from 0 to 1 for each audio frame:
| Probability | Meaning |
| --- | --- |
| 0.0 - 0.3 | Definitely not speech (silence, noise) |
| 0.3 - 0.5 | Uncertain (background noise, breathing) |
| 0.5 - 0.7 | Likely speech |
| 0.7 - 1.0 | Definitely speech |
We use a threshold (typically 0.5) to make the binary decision.
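In practice the per-frame probabilities are jittery, so a short smoothing window is averaged before thresholding (the production implementation appears in 43.4; this is a minimal standalone sketch):

```python
from collections import deque

def smoothed_decisions(probs, window=3, threshold=0.5):
    """Average the last `window` probabilities, then apply the threshold."""
    buf = deque(maxlen=window)
    out = []
    for p in probs:
        buf.append(p)
        out.append(sum(buf) / len(buf) >= threshold)
    return out

# A lone 0.9 spike (a door slam, say) is averaged away, while
# sustained speech from the sixth frame onward is flagged.
smoothed_decisions([0.2, 0.1, 0.2, 0.9, 0.2, 0.6, 0.7, 0.8])
# → [False, False, False, False, False, True, True, True]
```

The trade-off: a larger window rejects more noise but delays speech-start detection by roughly `window × frame_duration_ms`.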

43.3 VAD Configuration

"""
VAD configuration for Voice by aiConnected.

File: services/agent-service/config/vad.py
"""
from dataclasses import dataclass
from enum import Enum


class VADSensitivity(Enum):
    """VAD sensitivity presets."""
    LOW = "low"       # Ignores noise, may miss quiet speech
    MEDIUM = "medium" # Balanced (default)
    HIGH = "high"     # Catches quiet speech, may trigger on noise


@dataclass
class VADConfig:
    """
    Configuration for Voice Activity Detection.
    
    Attributes:
        threshold: Speech detection threshold (0-1)
        min_speech_duration_ms: Minimum speech to consider valid
        min_silence_duration_ms: Silence duration to end utterance
        frame_duration_ms: Audio frame duration (30ms optimal)
        sample_rate: Expected sample rate (16kHz)
        smoothing_window: Frames to average for smoothing
        speech_pad_ms: Padding before/after detected speech
    """
    
    # Speech detection threshold (0-1)
    threshold: float = 0.5
    
    # Minimum speech duration (ms) - ignores coughs, clicks
    min_speech_duration_ms: int = 250
    
    # Silence duration to end utterance (ms)
    min_silence_duration_ms: int = 300
    
    # Frame duration (Silero works best with 30ms)
    frame_duration_ms: int = 30
    
    # Sample rate
    sample_rate: int = 16000
    
    # Smoothing window (number of frames)
    smoothing_window: int = 3
    
    # Padding before/after speech (ms)
    speech_pad_ms: int = 100
    
    @classmethod
    def from_sensitivity(cls, sensitivity: VADSensitivity) -> "VADConfig":
        """Create config from sensitivity preset."""
        if sensitivity == VADSensitivity.LOW:
            return cls(
                threshold=0.7,
                min_speech_duration_ms=300,
                min_silence_duration_ms=400,
            )
        elif sensitivity == VADSensitivity.HIGH:
            return cls(
                threshold=0.3,
                min_speech_duration_ms=150,
                min_silence_duration_ms=200,
            )
        else:  # MEDIUM
            return cls()


# Sensitivity settings for different scenarios
VAD_PRESETS = {
    "quiet_office": VADConfig(threshold=0.5, min_silence_duration_ms=300),
    "noisy_environment": VADConfig(threshold=0.7, min_silence_duration_ms=400),
    "fast_conversation": VADConfig(threshold=0.4, min_silence_duration_ms=200),
    "elderly_callers": VADConfig(threshold=0.4, min_silence_duration_ms=500),
}

43.4 VAD Implementation

"""
Voice Activity Detection using Silero VAD.

File: services/agent-service/pipeline/vad.py
"""
import numpy as np
import torch
import logging
from typing import Optional
from dataclasses import dataclass
from collections import deque

logger = logging.getLogger(__name__)


@dataclass
class VADEvent:
    """
    An event from the VAD processor.
    
    Attributes:
        event_type: "speech_start" or "speech_end"
        timestamp_ms: When the event occurred
        probability: Speech probability (0-1)
        duration_ms: Duration of speech (for speech_end)
    """
    event_type: str
    timestamp_ms: float
    probability: float = 0.0
    duration_ms: float = 0.0


class SileroVAD:
    """
    Voice Activity Detection using Silero VAD model.
    
    Example:
        vad = SileroVAD(config)
        vad.load_model()
        
        for frame in audio_frames:
            event = vad.process_frame(frame)
            if event and event.event_type == "speech_end":
                # User finished speaking
                process_utterance()
    """
    
    def __init__(self, config: VADConfig):
        self.config = config
        
        # Model (loaded lazily)
        self._model = None
        self._model_loaded = False
        
        # State tracking
        self._is_speaking = False
        self._speech_start_time: Optional[float] = None
        self._silence_start_time: Optional[float] = None
        self._current_time_ms: float = 0
        
        # Smoothing buffer
        self._probability_buffer: deque = deque(
            maxlen=config.smoothing_window
        )
        
        # Frame size in samples
        self._frame_samples = int(
            config.sample_rate * config.frame_duration_ms / 1000
        )
    
    def load_model(self) -> None:
        """Load the Silero VAD model (~2MB download on first run)."""
        if self._model_loaded:
            return
        
        try:
            self._model, _ = torch.hub.load(
                repo_or_dir='snakers4/silero-vad',
                model='silero_vad',
                force_reload=False,
                onnx=False,
            )
            self._model.eval()
            self._model_loaded = True
            logger.info("Silero VAD model loaded")
            
        except Exception as e:
            logger.error(f"Failed to load Silero VAD: {e}")
            raise
    
    def process_frame(self, audio_frame: np.ndarray) -> Optional[VADEvent]:
        """
        Process a single audio frame.
        
        Args:
            audio_frame: Audio samples (int16 or float32, mono, 16kHz)
        
        Returns:
            VADEvent if speech_start or speech_end detected
        """
        if not self._model_loaded:
            raise RuntimeError("Call load_model() first")
        
        # Convert to float32 tensor
        if audio_frame.dtype == np.int16:
            audio = audio_frame.astype(np.float32) / 32768.0
        else:
            audio = audio_frame.astype(np.float32)
        
        tensor = torch.from_numpy(audio)
        
        # Get speech probability
        with torch.no_grad():
            probability = self._model(tensor, self.config.sample_rate).item()
        
        # Smooth probability
        self._probability_buffer.append(probability)
        smoothed = sum(self._probability_buffer) / len(self._probability_buffer)
        
        # Update time
        self._current_time_ms += self.config.frame_duration_ms
        
        # Detect transitions
        return self._detect_transitions(smoothed)
    
    def _detect_transitions(self, probability: float) -> Optional[VADEvent]:
        """Detect speech start/end transitions."""
        is_speech = probability >= self.config.threshold
        
        if is_speech:
            self._silence_start_time = None
            
            if not self._is_speaking:
                # Potential speech start
                if self._speech_start_time is None:
                    self._speech_start_time = self._current_time_ms
                
                # Check duration threshold
                duration = self._current_time_ms - self._speech_start_time
                if duration >= self.config.min_speech_duration_ms:
                    self._is_speaking = True
                    logger.debug(f"Speech started at {self._speech_start_time}ms")
                    
                    return VADEvent(
                        event_type="speech_start",
                        timestamp_ms=self._speech_start_time,
                        probability=probability,
                    )
        else:
            if self._is_speaking:
                # Potential speech end
                if self._silence_start_time is None:
                    self._silence_start_time = self._current_time_ms
                
                # Check silence threshold
                silence_duration = self._current_time_ms - self._silence_start_time
                if silence_duration >= self.config.min_silence_duration_ms:
                    speech_duration = self._silence_start_time - self._speech_start_time
                    
                    self._is_speaking = False
                    self._speech_start_time = None
                    self._silence_start_time = None
                    
                    logger.debug(f"Speech ended. Duration: {speech_duration}ms")
                    
                    return VADEvent(
                        event_type="speech_end",
                        timestamp_ms=self._current_time_ms,
                        probability=probability,
                        duration_ms=speech_duration,
                    )
            else:
                self._speech_start_time = None
        
        return None
    
    def reset(self) -> None:
        """Reset VAD state for a new utterance."""
        self._is_speaking = False
        self._speech_start_time = None
        self._silence_start_time = None
        self._probability_buffer.clear()
        
        if self._model is not None:
            self._model.reset_states()
    
    @property
    def is_speaking(self) -> bool:
        """Whether speech is currently detected."""
        return self._is_speaking

43.5 Endpointing Strategies

“Endpointing” means detecting when the user finished their utterance.

Strategy 1: Fixed Silence Timeout

Simple: wait for N milliseconds of silence.
min_silence_duration_ms = 300  # matches VADConfig.min_silence_duration_ms

def should_endpoint(silence_duration_ms: float) -> bool:
    """End the utterance once silence has lasted the full timeout."""
    return silence_duration_ms >= min_silence_duration_ms
Pros: Simple, predictable
Cons: Cuts off slow speakers, waits too long for fast speakers

Strategy 2: Adaptive Endpointing

Adjust timeout based on speech patterns:
"""
Adaptive endpointing.

File: services/agent-service/pipeline/endpointing.py
"""
from dataclasses import dataclass


@dataclass
class AdaptiveEndpointer:
    """
    Adapts silence timeout based on speech patterns.
    
    Fast speakers with short pauses → shorter timeout
    Slow speakers with long pauses → longer timeout
    """
    
    base_timeout_ms: float = 300
    min_timeout_ms: float = 200
    max_timeout_ms: float = 800
    adaptation_rate: float = 0.3
    
    _avg_pause_duration: float = 300
    _pause_count: int = 0
    
    def get_timeout(self) -> float:
        """Get current silence timeout."""
        if self._pause_count < 3:
            return self.base_timeout_ms
        
        timeout = self._avg_pause_duration * 1.5
        return max(self.min_timeout_ms, min(self.max_timeout_ms, timeout))
    
    def record_pause(self, pause_duration_ms: float) -> None:
        """Record a mid-utterance pause to adapt timeout."""
        self._pause_count += 1
        
        # Exponential moving average
        self._avg_pause_duration = (
            self._avg_pause_duration * (1 - self.adaptation_rate) +
            pause_duration_ms * self.adaptation_rate
        )
    
    def reset(self) -> None:
        """Reset for a new call."""
        self._avg_pause_duration = self.base_timeout_ms
        self._pause_count = 0
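The exponential moving average in record_pause pulls the estimate toward the caller's typical pause length over a few samples. A standalone sketch of just the update rule (same 0.3 adaptation rate as above):

```python
def ema_update(avg: float, sample: float, rate: float = 0.3) -> float:
    """One exponential-moving-average step: blend the new sample in."""
    return avg * (1 - rate) + sample * rate

avg = 300.0  # start from the base timeout
for pause_ms in (500.0, 500.0, 500.0):  # caller consistently pauses ~500ms
    avg = ema_update(avg, pause_ms)
# avg has moved from 300 toward 500 (≈431.4 after three samples)
```

Because each step only moves 30% of the remaining distance, one unusually long pause nudges the timeout rather than resetting it.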

Strategy 3: Semantic Endpointing (Advanced)

Use transcript content to help determine completion:
"""
Semantic endpointing using transcript content.

File: services/agent-service/pipeline/semantic_endpointing.py
"""
import re


def is_likely_complete(transcript: str) -> bool:
    """
    Check if transcript appears complete.
    
    Heuristic - not perfect, but reduces awkward cutoffs.
    """
    transcript = transcript.strip()
    
    if not transcript:
        return False
    
    # Sentence-ending punctuation
    if transcript[-1] in '.!?':
        return True
    
    # Trailing conjunctions = probably not done
    trailing_patterns = [
        r'\b(and|or|but|so|because|if|when|while)\s*$',
        r'\b(the|a|an|my|your)\s*$',
    ]
    for pattern in trailing_patterns:
        if re.search(pattern, transcript, re.IGNORECASE):
            return False
    
    # At least 2 words = probably complete
    return len(transcript.split()) >= 2

43.6 VAD Settings by Scenario

| Scenario | Threshold | Min Speech | Min Silence |
| --- | --- | --- | --- |
| Quiet office | 0.5 | 250ms | 300ms |
| Noisy environment | 0.7 | 300ms | 400ms |
| Fast conversation | 0.4 | 150ms | 200ms |
| Elderly callers | 0.4 | 200ms | 500ms |
| Call center | 0.5 | 250ms | 350ms |

Section 44: Claude LLM Integration

44.1 What is Claude?

Claude is Anthropic’s large language model—the “brain” of our voice AI. It understands the caller’s request and generates appropriate responses. We use Claude Sonnet for the best balance of:
  • Speed: Fast enough for real-time conversation
  • Quality: Intelligent, coherent responses
  • Cost: Reasonable per-token pricing

Model Comparison

| Model | TTFB | Quality | Cost (in / out per 1M tokens) | Use Case |
| --- | --- | --- | --- | --- |
| Claude Sonnet | ~300ms | Excellent | $3 / $15 | Primary |
| Claude Haiku | ~150ms | Good | $0.25 / $1.25 | Fallback |
| GPT-4o | ~350ms | Excellent | $5 / $15 | Alternative |
| GPT-4o-mini | ~200ms | Good | $0.15 / $0.60 | Backup |

44.2 API Setup

Step 1: Create Anthropic Account

  1. Go to https://console.anthropic.com
  2. Sign up with email
  3. Verify email

Step 2: Create API Key

  1. Go to API Keys in console
  2. Click Create Key
  3. Name it: voice-aiconnected-production
  4. Copy the key

Step 3: Configure Environment

# .env
ANTHROPIC_API_KEY=sk-ant-...your-key
ANTHROPIC_MODEL=claude-sonnet-4-20250514
ANTHROPIC_MAX_TOKENS=1024
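At startup these variables can be pulled into a plain dict before building the config object (a sketch; `load_llm_env` is a hypothetical helper name, not part of the codebase):

```python
import os

def load_llm_env() -> dict:
    """Read the Claude settings above from the environment, with defaults."""
    return {
        "api_key": os.environ["ANTHROPIC_API_KEY"],  # fail fast if missing
        "model": os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
        "max_tokens": int(os.environ.get("ANTHROPIC_MAX_TOKENS", "1024")),
    }
```

Using `os.environ[...]` (not `.get`) for the API key means a misconfigured deployment fails at boot rather than on the first call.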

44.3 LLM Configuration

"""
Claude LLM configuration for Voice by aiConnected.

File: services/agent-service/config/llm.py
"""
from dataclasses import dataclass, field
from typing import Optional, List
from enum import Enum


class ClaudeModel(Enum):
    """Available Claude models."""
    OPUS = "claude-opus-4-20250514"
    SONNET = "claude-sonnet-4-20250514"
    HAIKU = "claude-haiku-4-20250514"


@dataclass
class LLMConfig:
    """
    Configuration for Claude LLM.
    
    Attributes:
        api_key: Anthropic API key
        model: Which model to use
        max_tokens: Maximum response length
        temperature: Creativity (0=deterministic, 1=creative)
        stream: Whether to stream responses (always True for voice)
        timeout_seconds: Maximum wait time
        max_retries: Retry attempts on failure
    """
    
    api_key: str
    model: ClaudeModel = ClaudeModel.SONNET
    max_tokens: int = 1024
    temperature: float = 0.7
    top_p: float = 0.9
    stream: bool = True
    timeout_seconds: float = 30.0
    max_retries: int = 2
    retry_delay_seconds: float = 0.5
    stop_sequences: List[str] = field(default_factory=list)


DEFAULT_LLM_CONFIG = LLMConfig(
    api_key="",
    model=ClaudeModel.SONNET,
    max_tokens=1024,
    temperature=0.7,
    stream=True,
)

44.4 System Prompt Design for Voice

Voice conversations need special system prompt considerations:
"""
System prompt templates for voice AI.

File: services/agent-service/prompts/voice_system_prompt.py
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceSystemPrompt:
    """
    System prompt optimized for voice conversations.
    """
    
    business_name: str
    business_type: str
    personality_traits: List[str]
    speaking_style: str
    knowledge_context: str = ""
    available_tools: Optional[List[str]] = None
    current_time: str = ""
    
    def build(self) -> str:
        """Build the complete system prompt."""
        
        parts = []
        
        # Core identity and voice-specific rules
        parts.append(f"""You are a voice AI assistant for {self.business_name}, a {self.business_type}.

CRITICAL: This is a VOICE conversation over the phone. The caller CANNOT see text, links, or formatting.

VOICE RULES:
1. SPEAK NATURALLY: Use conversational language. No bullet points, lists, or formatting.
2. BE CONCISE: Keep responses to 1-3 sentences unless more detail is requested.
3. SPELL OUT: Say "nine one one" not "911". Say "dollar" not "$".
4. CONFIRM: Periodically check if the caller understood.
5. HANDLE INTERRUPTIONS: If interrupted, stop immediately and listen.
6. SOUND HUMAN: Use occasional fillers ("Well,", "Let me see,"). Vary sentences.""")
        
        # Personality
        traits = ", ".join(self.personality_traits)
        parts.append(f"""
PERSONALITY: You are {traits}.
SPEAKING STYLE: {self.speaking_style}""")
        
        # Knowledge
        if self.knowledge_context:
            parts.append(f"""
BUSINESS INFORMATION:
{self.knowledge_context}

Use this to answer questions. If unsure, say so honestly.""")
        
        # Tools
        if self.available_tools:
            tools = ", ".join(self.available_tools)
            parts.append(f"""
AVAILABLE ACTIONS: You can {tools}.""")
        
        # Time context
        if self.current_time:
            parts.append(f"""
CURRENT TIME: {self.current_time}""")
        
        # Conversation guidelines
        parts.append("""
GUIDELINES:
- Start with a brief, friendly greeting
- Listen carefully to caller's needs
- Provide helpful, accurate information
- If you cannot help, offer to transfer to a human
- End calls politely when the caller is satisfied""")
        
        return "\n".join(parts)


# Example usage
def create_dental_office_prompt(knowledge: str, current_time: str) -> str:
    """Create prompt for a dental office."""
    return VoiceSystemPrompt(
        business_name="Smile Dental",
        business_type="dental office",
        personality_traits=["friendly", "professional", "patient"],
        speaking_style="warm and reassuring",
        knowledge_context=knowledge,
        available_tools=[
            "check appointment availability",
            "schedule appointments",
            "transfer to office manager",
        ],
        current_time=current_time,
    ).build()

44.5 Conversation History Management

Claude needs conversation history for context, but we must manage it carefully:
"""
Conversation history management for Claude.

File: services/agent-service/pipeline/conversation_history.py
"""
from dataclasses import dataclass, field
from typing import List, Dict
from enum import Enum


class MessageRole(Enum):
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class ConversationMessage:
    """A single message in the conversation."""
    role: MessageRole
    content: str
    timestamp: float = 0.0
    token_count: int = 0


@dataclass
class ConversationHistory:
    """
    Manages conversation history with token limits.
    
    Automatically trims old messages to stay within limits.
    """
    
    max_history_tokens: int = 4000
    max_turns: int = 20
    messages: List[ConversationMessage] = field(default_factory=list)
    
    def add_user_message(self, content: str) -> None:
        """Add a user message."""
        token_count = self._estimate_tokens(content)
        self.messages.append(ConversationMessage(
            role=MessageRole.USER,
            content=content,
            token_count=token_count,
        ))
        self._trim_if_needed()
    
    def add_assistant_message(self, content: str) -> None:
        """Add an assistant message."""
        token_count = self._estimate_tokens(content)
        self.messages.append(ConversationMessage(
            role=MessageRole.ASSISTANT,
            content=content,
            token_count=token_count,
        ))
        self._trim_if_needed()
    
    def get_messages_for_api(self) -> List[Dict[str, str]]:
        """Get messages in Claude API format."""
        return [
            {"role": msg.role.value, "content": msg.content}
            for msg in self.messages
        ]
    
    def get_total_tokens(self) -> int:
        """Get total token count."""
        return sum(msg.token_count for msg in self.messages)
    
    def _estimate_tokens(self, text: str) -> int:
        """Estimate token count (~4 chars per token)."""
        return len(text) // 4
    
    def _trim_if_needed(self) -> None:
        """Trim history to stay within limits."""
        # Trim by turn count
        while len(self.messages) > self.max_turns * 2:
            self.messages.pop(0)
        
        # Trim by token count
        while self.get_total_tokens() > self.max_history_tokens and len(self.messages) > 2:
            self.messages.pop(0)
    
    def clear(self) -> None:
        """Clear history."""
        self.messages.clear()

44.6 Streaming Responses

For voice, we must stream LLM responses:
"""
Streaming Claude client for Voice by aiConnected.

File: services/agent-service/integrations/claude_llm.py
"""
import anthropic
import logging
from typing import AsyncIterator, Optional, List, Dict, Any
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class StreamingChunk:
    """A chunk from the streaming response."""
    text: str
    is_complete: bool = False
    stop_reason: Optional[str] = None


class ClaudeLLM:
    """
    Streaming Claude client optimized for voice.
    
    Example:
        llm = ClaudeLLM(config)
        
        async for chunk in llm.generate_streaming(
            system_prompt=prompt,
            messages=history,
        ):
            send_to_tts(chunk.text)
    """
    
    def __init__(self, config: LLMConfig):
        self.config = config
        self._client = anthropic.AsyncAnthropic(api_key=config.api_key)
    
    async def generate_streaming(
        self,
        system_prompt: str,
        messages: List[Dict[str, str]],
        tools: Optional[List[Dict[str, Any]]] = None,
    ) -> AsyncIterator[StreamingChunk]:
        """
        Generate a streaming response.
        
        Args:
            system_prompt: The system prompt
            messages: Conversation history
            tools: Optional tool definitions
        
        Yields:
            StreamingChunk objects as they arrive
        """
        try:
            request_kwargs = {
                "model": self.config.model.value,
                "max_tokens": self.config.max_tokens,
                "temperature": self.config.temperature,
                "system": system_prompt,
                "messages": messages,
                # messages.stream() implies streaming; passing stream=True
                # here would raise a TypeError from the SDK.
            }
            
            if tools:
                request_kwargs["tools"] = tools
            
            if self.config.stop_sequences:
                request_kwargs["stop_sequences"] = self.config.stop_sequences
            
            async with self._client.messages.stream(**request_kwargs) as stream:
                async for event in stream:
                    if event.type == "content_block_delta":
                        if hasattr(event.delta, "text"):
                            yield StreamingChunk(
                                text=event.delta.text,
                                is_complete=False,
                            )
                    
                    elif event.type == "message_stop":
                        yield StreamingChunk(
                            text="",
                            is_complete=True,
                            stop_reason="end_turn",
                        )
        
        except anthropic.APITimeoutError:
            logger.error("Claude API timeout")
            yield StreamingChunk(
                text="I'm sorry, I'm having trouble right now.",
                is_complete=True,
                stop_reason="timeout",
            )
        
        except anthropic.APIError as e:
            logger.error(f"Claude API error: {e}")
            yield StreamingChunk(
                text="I apologize, I encountered an error. Let me transfer you.",
                is_complete=True,
                stop_reason="error",
            )

44.7 Function Calling (Tools)

Claude can request tool calls during conversations; our executor runs them and returns the results:
"""
Tool definitions for voice AI.

File: services/agent-service/tools/definitions.py
"""
from typing import List, Dict, Any


def get_voice_ai_tools() -> List[Dict[str, Any]]:
    """Get tool definitions for Claude API."""
    return [
        {
            "name": "transfer_to_human",
            "description": "Transfer call to human agent when caller requests or you cannot help.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reason": {
                        "type": "string",
                        "description": "Reason for transfer"
                    },
                    "department": {
                        "type": "string",
                        "enum": ["general", "sales", "support", "billing"],
                    }
                },
                "required": ["reason"]
            }
        },
        {
            "name": "check_availability",
            "description": "Check appointment availability.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "string",
                        "description": "Date (YYYY-MM-DD)"
                    },
                    "time_preference": {
                        "type": "string",
                        "enum": ["morning", "afternoon", "evening", "any"],
                    }
                },
                "required": ["date"]
            }
        },
        {
            "name": "schedule_appointment",
            "description": "Schedule an appointment.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "time": {"type": "string"},
                    "service_type": {"type": "string"},
                    "caller_name": {"type": "string"},
                    "notes": {"type": "string"}
                },
                "required": ["date", "time", "caller_name"]
            }
        },
        {
            "name": "send_sms",
            "description": "Send SMS to caller with info.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "message": {"type": "string"}
                },
                "required": ["message"]
            }
        },
        {
            "name": "end_call",
            "description": "End call politely.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "farewell_message": {"type": "string"}
                }
            }
        }
    ]

Tool Executor

"""
Tool execution handler.

File: services/agent-service/tools/executor.py
"""
import logging
from typing import Dict, Any, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ToolResult:
    """Result from executing a tool."""
    success: bool
    result: Any
    error: Optional[str] = None


class ToolExecutor:
    """Executes tools requested by Claude."""
    
    def __init__(self, call_id: str, tenant_id: str, webhook_client):
        self.call_id = call_id
        self.tenant_id = tenant_id
        self.webhook_client = webhook_client
    
    async def execute(self, tool_name: str, tool_input: Dict[str, Any]) -> ToolResult:
        """Execute a tool by name."""
        methods = {
            "transfer_to_human": self._transfer_to_human,
            "check_availability": self._check_availability,
            "schedule_appointment": self._schedule_appointment,
            "send_sms": self._send_sms,
            "end_call": self._end_call,
        }
        
        method = methods.get(tool_name)
        if not method:
            return ToolResult(False, None, f"Unknown tool: {tool_name}")
        
        try:
            result = await method(tool_input)
            return ToolResult(True, result)
        except Exception as e:
            logger.error(f"Tool error: {tool_name} - {e}")
            return ToolResult(False, None, str(e))
    
    async def _transfer_to_human(self, params: Dict) -> Dict:
        """Transfer to human agent."""
        await self.webhook_client.trigger(
            event="transfer_requested",
            data={
                "call_id": self.call_id,
                "reason": params.get("reason"),
                "department": params.get("department", "general"),
            }
        )
        return {"status": "transfer_initiated"}
    
    async def _check_availability(self, params: Dict) -> Dict:
        """Check calendar availability."""
        return await self.webhook_client.call(
            endpoint="check_availability",
            data={"tenant_id": self.tenant_id, **params}
        )
    
    async def _schedule_appointment(self, params: Dict) -> Dict:
        """Schedule appointment."""
        return await self.webhook_client.call(
            endpoint="schedule_appointment",
            data={"tenant_id": self.tenant_id, **params}
        )
    
    async def _send_sms(self, params: Dict) -> Dict:
        """Send SMS."""
        return await self.webhook_client.call(
            endpoint="send_sms",
            data={"call_id": self.call_id, **params}
        )
    
    async def _end_call(self, params: Dict) -> Dict:
        """End call."""
        await self.webhook_client.trigger(
            event="call_end_requested",
            data={
                "call_id": self.call_id,
                "farewell": params.get("farewell_message", "Goodbye!"),
            }
        )
        return {"status": "ending"}
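
When Claude's streamed response contains a `tool_use` block, the pipeline calls `ToolExecutor.execute()` and must send the outcome back to Claude as a `tool_result` content block on the next user turn. A minimal sketch of that formatting step (the `ToolResult` dataclass mirrors the one above; the `toolu_...` ids are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Mirrors the executor's result type above."""
    success: bool
    result: Any
    error: Optional[str] = None


def to_tool_result_block(tool_use_id: str, result: ToolResult) -> dict:
    """Format an executor result as the tool_result block Claude expects back."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": str(result.result if result.success else result.error),
    }
    if not result.success:
        # Flags the failure so Claude can apologize or try another approach
        block["is_error"] = True
    return block
```

The block is appended to `messages` as `{"role": "user", "content": [block]}` before the next `generate_streaming()` call.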

44.8 Sentence Accumulator

We send text to TTS sentence-by-sentence for lowest latency:
"""
Sentence accumulator for LLM to TTS streaming.

File: services/agent-service/pipeline/sentence_accumulator.py
"""
import re
from typing import Optional, Callable, Awaitable
from dataclasses import dataclass


@dataclass
class SentenceAccumulator:
    """
    Accumulates LLM tokens into complete sentences for TTS.
    
    As tokens stream in, this class:
    1. Buffers until a sentence boundary
    2. Calls callback with each complete sentence
    3. Handles abbreviations (Mr., Dr., etc.)
    """
    
    on_sentence: Optional[Callable[[str], Awaitable[None]]] = None
    min_sentence_length: int = 10
    _buffer: str = ""
    
    _abbreviations = (
        "Mr.", "Mrs.", "Ms.", "Dr.", "Jr.", "Sr.", "vs.", "etc.",
        "i.e.", "e.g.", "St.", "Ave.", "Inc.", "Corp.",
    )
    
    async def add_token(self, token: str) -> Optional[str]:
        """Add a token, return sentence if complete."""
        self._buffer += token
        
        sentence = self._extract_sentence()
        if sentence and self.on_sentence:
            await self.on_sentence(sentence)
        
        return sentence
    
    async def flush(self) -> Optional[str]:
        """Flush remaining text as final sentence."""
        if self._buffer.strip():
            sentence = self._buffer.strip()
            self._buffer = ""
            
            if self.on_sentence:
                await self.on_sentence(sentence)
            
            return sentence
        return None
    
    def _extract_sentence(self) -> Optional[str]:
        """Extract complete sentence from buffer."""
        for i, char in enumerate(self._buffer):
            if char in '.!?':
                potential = self._buffer[:i+1]
                
                if len(potential.strip()) < self.min_sentence_length:
                    continue
                
                if self._is_abbreviation(potential):
                    continue
                
                # Check next char isn't lowercase
                if i + 1 < len(self._buffer):
                    next_char = self._buffer[i + 1]
                    if next_char.isalpha() and next_char.islower():
                        continue
                
                sentence = potential.strip()
                self._buffer = self._buffer[i+1:].lstrip()
                return sentence
        
        return None
    
    def _is_abbreviation(self, text: str) -> bool:
        """Check if text ends with abbreviation."""
        for abbr in self._abbreviations:
            if text.rstrip().endswith(abbr):
                return True
        return False
    
    def clear(self) -> None:
        """Clear buffer."""
        self._buffer = ""
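
The three boundary rules can be seen in isolation with a standalone helper (a simplified, synchronous sketch of `_extract_sentence`, using a trimmed abbreviation list):

```python
ABBREVIATIONS = ("Mr.", "Mrs.", "Ms.", "Dr.", "Jr.", "Sr.", "etc.", "e.g.", "i.e.")


def first_sentence(buffer: str, min_len: int = 10):
    """Return (sentence, remainder), or (None, buffer) if no boundary yet.

    Applies the same three rules as SentenceAccumulator: minimum length,
    abbreviation suppression, and no split before a lowercase letter.
    """
    for i, ch in enumerate(buffer):
        if ch not in ".!?":
            continue
        candidate = buffer[: i + 1]
        if len(candidate.strip()) < min_len:
            continue  # too short to send to TTS
        if any(candidate.rstrip().endswith(a) for a in ABBREVIATIONS):
            continue  # "Dr." is not a sentence end
        if i + 1 < len(buffer) and buffer[i + 1].isalpha() and buffer[i + 1].islower():
            continue  # mid-sentence period (e.g. a decimal or typo)
        return candidate.strip(), buffer[i + 1:].lstrip()
    return None, buffer
```

Note that the period after "Dr." is correctly skipped, so the first complete sentence includes the honorific.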

44.9 Token Tracking

"""
Token usage tracking.

File: services/agent-service/pipeline/token_tracking.py
"""
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Track token usage for billing."""
    
    call_id: str
    tenant_id: str
    input_tokens: int = 0
    output_tokens: int = 0
    
    # Cost per million tokens (Sonnet)
    input_cost_per_million: float = 3.0
    output_cost_per_million: float = 15.0
    
    def add_usage(self, input_tokens: int, output_tokens: int) -> None:
        """Add usage from a request."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
    
    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
    
    @property
    def estimated_cost(self) -> float:
        """Cost in USD."""
        input_cost = (self.input_tokens / 1_000_000) * self.input_cost_per_million
        output_cost = (self.output_tokens / 1_000_000) * self.output_cost_per_million
        return input_cost + output_cost
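
A quick sanity check of the cost arithmetic at the configured Sonnet rates ($3 / $15 per million input / output tokens); the token counts below are illustrative, not measured:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_per_million: float = 3.0, out_per_million: float = 15.0) -> float:
    """USD cost of one call at per-million-token rates."""
    return ((input_tokens / 1_000_000) * in_per_million
            + (output_tokens / 1_000_000) * out_per_million)


# A short voice call: history is re-sent every turn, so input dominates.
cost = estimate_cost(10_000, 2_000)  # 0.03 + 0.03 ≈ $0.06
```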

Summary: What You’ve Learned in Part 7B

Section 43: Voice Activity Detection

  • Silero VAD provides local, low-latency speech detection
  • Key parameters: threshold, min_speech_duration, min_silence_duration
  • Endpointing strategies: fixed timeout, adaptive, semantic

Section 44: Claude LLM Integration

  • Claude Sonnet is the primary model for voice conversations
  • Voice-optimized system prompts: concise, conversational, no formatting
  • Streaming responses essential for low latency
  • Sentence accumulation for TTS optimization
  • Function calling enables actions during conversation

What’s Next

In Part 7C, you’ll learn:
  • Chatterbox TTS integration (self-hosted on RunPod)
  • Barge-in handling (interruption detection)
  • Conversation state management with Redis

Document Metadata

| Field | Value |
|---|---|
| Document ID | PRD-007B |
| Title | Junior Developer PRD — Part 7B |
| Version | 1.0 |
| Status | Complete |

End of Part 7B — Continue to Part 7C

Junior Developer PRD — Part 7C: TTS, Barge-In & State Management

Document Version: 1.0
Last Updated: January 25, 2026
Part: 7C of 10 (Sub-part 3 of 3)
Sections: 45-47


Section 45: Chatterbox TTS Integration

45.1 What is Chatterbox?

Chatterbox is an open-source TTS engine that produces remarkably natural speech:
  • Voice Quality: Nearly indistinguishable from human
  • Self-Hosted: No per-minute API costs
  • Open Source: MIT license

TTS Comparison

| Provider | TTFB | Quality | Cost | Notes |
|---|---|---|---|---|
| Chatterbox | ~150ms | Excellent | Self-hosted | Primary |
| Cartesia Sonic | ~40ms | Excellent | $30/1M chars | Backup |
| Deepgram Aura | ~100ms | Good | $30/1M chars | Backup |

45.2 RunPod Deployment

Chatterbox requires GPU. Deploy on RunPod:
# infrastructure/runpod/handler.py
import runpod
import torch
import torchaudio
import base64
import io
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

def handler(event):
    input_data = event.get("input", {})
    text = input_data.get("text", "")
    voice_id = input_data.get("voice_id", "default")
    
    if not text:
        return {"error": "No text"}
    
    with torch.no_grad():
        audio = model.synthesize(text=text, voice=voice_id)
    
    buffer = io.BytesIO()
    torchaudio.save(buffer, audio, 24000, format="wav")
    buffer.seek(0)
    
    return {
        "audio_base64": base64.b64encode(buffer.read()).decode(),
        "duration_ms": (audio.shape[1] / 24000) * 1000,
        "sample_rate": 24000,
    }

runpod.serverless.start({"handler": handler})

45.3 TTS Configuration

# services/agent-service/config/tts.py
from dataclasses import dataclass
from enum import Enum

class TTSVoice(Enum):
    DEFAULT = "default"
    PROFESSIONAL_FEMALE = "professional_female"
    PROFESSIONAL_MALE = "professional_male"

@dataclass
class ChatterboxConfig:
    api_key: str
    endpoint_url: str
    voice: TTSVoice = TTSVoice.PROFESSIONAL_FEMALE
    speed: float = 1.0
    sample_rate: int = 24000
    timeout_seconds: float = 10.0

45.4 TTS Client

# services/agent-service/integrations/chatterbox_tts.py
import aiohttp
import base64
from typing import AsyncIterator
from dataclasses import dataclass

@dataclass
class TTSAudioChunk:
    audio_data: bytes
    sample_rate: int
    duration_ms: float
    is_final: bool
    text: str

class ChatterboxTTSClient:
    def __init__(self, config: ChatterboxConfig):
        self.config = config
        self._session = None
    
    async def synthesize(self, text: str) -> TTSAudioChunk:
        if not self._session:
            self._session = aiohttp.ClientSession()
        
        payload = {
            "input": {
                "text": text,
                "voice_id": self.config.voice.value,
                "speed": self.config.speed,
            }
        }
        headers = {"Authorization": f"Bearer {self.config.api_key}"}
        
        async with self._session.post(
            self.config.endpoint_url, json=payload, headers=headers
        ) as resp:
            result = await resp.json()
            audio = base64.b64decode(result["output"]["audio_base64"])
            
            return TTSAudioChunk(
                audio_data=audio,
                sample_rate=self.config.sample_rate,
                duration_ms=result["output"]["duration_ms"],
                is_final=True,
                text=text,
            )
    
    async def synthesize_streaming(self, text: str) -> AsyncIterator[TTSAudioChunk]:
        """Split into sentences and synthesize each."""
        import re
        sentences = re.split(r'(?<=[.!?])\s+', text)
        
        for i, sentence in enumerate(sentences):
            if sentence.strip():
                chunk = await self.synthesize(sentence)
                chunk.is_final = (i == len(sentences) - 1)
                yield chunk

45.5 TTS Fallback

# services/agent-service/integrations/tts_fallback.py
import logging

logger = logging.getLogger(__name__)

class TTSWithFallback:
    """Fallback chain: Chatterbox → Cartesia → Deepgram"""
    
    def __init__(self, chatterbox, cartesia=None, deepgram=None):
        self._chatterbox = chatterbox
        self._cartesia = cartesia
        self._deepgram = deepgram
    
    async def synthesize_streaming(self, text: str):
        # Try Chatterbox (primary). If it fails mid-stream, some audio may
        # already have played; the fallback restarts the text from the top.
        try:
            async for chunk in self._chatterbox.synthesize_streaming(text):
                yield chunk
            return
        except Exception as e:
            logger.warning(f"Chatterbox TTS failed, falling back: {e}")
        
        # Try Cartesia
        if self._cartesia:
            try:
                async for chunk in self._cartesia.synthesize_streaming(text):
                    yield chunk
                return
            except Exception as e:
                logger.warning(f"Cartesia TTS failed, falling back: {e}")
        
        # Try Deepgram
        if self._deepgram:
            async for chunk in self._deepgram.synthesize_streaming(text):
                yield chunk

Section 46: Barge-In Handling

46.1 What is Barge-In?

Barge-in occurs when the caller interrupts the AI while it is speaking. It is natural in human conversation and must be handled gracefully:

| Scenario | Example | Response |
|---|---|---|
| Correction | AI: "Tuesday—" Caller: "No, Wednesday" | Stop, process |
| Agreement | AI: "Would you—" Caller: "Yes" | Stop, continue |
| Frustration | AI: long explanation Caller: "Transfer me" | Stop, transfer |

46.2 Barge-In Detection

# services/agent-service/pipeline/barge_in.py
from dataclasses import dataclass

@dataclass
class BargeInConfig:
    enabled: bool = True
    min_speech_duration_ms: int = 150
    vad_threshold: float = 0.6
    cooldown_ms: int = 500

class BargeInDetector:
    def __init__(self, config, vad, on_barge_in=None):
        self.config = config
        self.vad = vad
        self.on_barge_in = on_barge_in
        self._monitoring = False
        self._speech_start = None
        self._last_barge_in = 0
        self._time_ms = 0
    
    def start_monitoring(self):
        self._monitoring = True
        self._speech_start = None
    
    def stop_monitoring(self):
        self._monitoring = False
    
    async def process_frame(self, audio) -> bool:
        if not self._monitoring:
            return False
        
        # Advance the clock first: `audio` is a frame of samples at 16kHz,
        # so len(audio) samples correspond to len(audio) / 16 milliseconds.
        self._time_ms += len(audio) / 16
        
        # Suppress re-triggering inside the cooldown window
        if self._time_ms - self._last_barge_in < self.config.cooldown_ms:
            return False
        
        self.vad.config.threshold = self.config.vad_threshold
        self.vad.process_frame(audio)
        
        if self.vad.is_speaking:
            if self._speech_start is None:
                self._speech_start = self._time_ms
            
            duration = self._time_ms - self._speech_start
            if duration >= self.config.min_speech_duration_ms:
                self._last_barge_in = self._time_ms
                self._speech_start = None
                
                if self.on_barge_in:
                    await self.on_barge_in()
                return True
        else:
            self._speech_start = None
        
        return False

46.3 Barge-In Handler

# services/agent-service/pipeline/barge_in_handler.py
import asyncio

class BargeInHandler:
    def __init__(self, audio_player, state_machine):
        self.audio_player = audio_player
        self.state_machine = state_machine
        self._llm_task = None
    
    async def handle_barge_in(self):
        # 1. Stop TTS
        await self.audio_player.stop_immediately()
        
        # 2. Clear queue
        self.audio_player.clear_queue()
        
        # 3. Cancel LLM
        if self._llm_task:
            self._llm_task.cancel()
        
        # 4. Switch to listening (PipelineState enum, not a raw string)
        await self.state_machine.transition_to(PipelineState.LISTENING)

class AudioPlayer:
    def __init__(self):
        self._queue = asyncio.Queue()
        self._stop_event = asyncio.Event()
    
    async def play(self, chunk):
        await self._queue.put(chunk)
    
    async def stop_immediately(self):
        self._stop_event.set()
    
    def clear_queue(self):
        while not self._queue.empty():
            self._queue.get_nowait()
        self._stop_event.clear()

Section 47: Conversation State Management

47.1 Why State Management?

During a call we must track four things: pipeline state, conversation history, call context, and tool results.

47.2 Redis Data Model

call:{call_id}:state      → Pipeline state (Hash)
call:{call_id}:history    → Conversation turns (List)
call:{call_id}:context    → Call metadata (Hash)
tenant:{tenant_id}:calls  → Active calls (Set)

47.3 State Models

# services/agent-service/state/models.py
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class PipelineState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    CAPTURING = "capturing"
    PROCESSING = "processing"
    SPEAKING = "speaking"
    ENDED = "ended"

@dataclass
class CallContext:
    call_id: str
    tenant_id: str
    caller_phone: str
    started_at: datetime = field(default_factory=datetime.utcnow)

@dataclass
class CallState:
    call_id: str
    pipeline_state: PipelineState = PipelineState.IDLE
    turn_count: int = 0
    barge_in_count: int = 0

@dataclass
class ConversationTurn:
    role: str  # "user" or "assistant"
    content: str
    timestamp: datetime = field(default_factory=datetime.utcnow)

47.4 Redis State Manager

# services/agent-service/state/manager.py
import json
import redis.asyncio as redis

class CallStateManager:
    def __init__(self, redis_url: str):
        self.redis_url = redis_url
        self._redis = None
    
    async def connect(self):
        self._redis = await redis.from_url(self.redis_url)
    
    async def create_call(self, context: CallContext):
        key = f"call:{context.call_id}:context"
        await self._redis.hset(key, mapping={
            "call_id": context.call_id,
            "tenant_id": context.tenant_id,
            "caller_phone": context.caller_phone,
            "started_at": context.started_at.isoformat(),
        })
        
        state_key = f"call:{context.call_id}:state"
        await self._redis.hset(state_key, mapping={
            "pipeline_state": "idle",
            "turn_count": 0,
        })
        
        tenant_key = f"tenant:{context.tenant_id}:calls"
        await self._redis.sadd(tenant_key, context.call_id)
    
    async def get_state(self, call_id: str) -> CallState:
        key = f"call:{call_id}:state"
        data = await self._redis.hgetall(key)
        return CallState(
            call_id=call_id,
            pipeline_state=PipelineState(data["pipeline_state"]),
            turn_count=int(data.get("turn_count", 0)),
        )
    
    async def transition_pipeline(self, call_id: str, new_state: PipelineState):
        key = f"call:{call_id}:state"
        await self._redis.hset(key, "pipeline_state", new_state.value)
    
    async def add_turn(self, call_id: str, turn: ConversationTurn):
        key = f"call:{call_id}:history"
        await self._redis.rpush(key, json.dumps({
            "role": turn.role,
            "content": turn.content,
            "timestamp": turn.timestamp.isoformat(),
        }))
        
        state_key = f"call:{call_id}:state"
        await self._redis.hincrby(state_key, "turn_count", 1)
    
    async def get_history(self, call_id: str, limit: int = 20):
        key = f"call:{call_id}:history"
        data = await self._redis.lrange(key, -limit, -1)
        return [json.loads(item) for item in data]
    
    async def end_call(self, call_id: str):
        await self.transition_pipeline(call_id, PipelineState.ENDED)
        
        # Set 24h TTL
        for pattern in ["state", "context", "history"]:
            await self._redis.expire(f"call:{call_id}:{pattern}", 86400)

47.5 State Machine

# services/agent-service/state/state_machine.py
class PipelineStateMachine:
    TRANSITIONS = {
        PipelineState.IDLE: {PipelineState.LISTENING, PipelineState.ENDED},
        PipelineState.LISTENING: {PipelineState.CAPTURING, PipelineState.ENDED},
        PipelineState.CAPTURING: {PipelineState.PROCESSING, PipelineState.LISTENING},
        PipelineState.PROCESSING: {PipelineState.SPEAKING, PipelineState.LISTENING},
        PipelineState.SPEAKING: {PipelineState.LISTENING, PipelineState.CAPTURING},  # Barge-in
        PipelineState.ENDED: set(),
    }
    
    def __init__(self, call_id: str, state_manager: CallStateManager):
        self.call_id = call_id
        self.state_manager = state_manager
        self._current = PipelineState.IDLE
    
    def can_transition(self, new_state: PipelineState) -> bool:
        return new_state in self.TRANSITIONS.get(self._current, set())
    
    async def transition_to(self, new_state: PipelineState) -> bool:
        if not self.can_transition(new_state):
            return False
        
        self._current = new_state
        await self.state_manager.transition_pipeline(self.call_id, new_state)
        return True
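
Exercising the transition table standalone confirms the intended paths: barge-in (SPEAKING → CAPTURING) is allowed, while ENDED is terminal. A self-contained sketch that inlines the enum and skips Redis:

```python
from enum import Enum


class PipelineState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    CAPTURING = "capturing"
    PROCESSING = "processing"
    SPEAKING = "speaking"
    ENDED = "ended"


# Same table as PipelineStateMachine.TRANSITIONS above
TRANSITIONS = {
    PipelineState.IDLE: {PipelineState.LISTENING, PipelineState.ENDED},
    PipelineState.LISTENING: {PipelineState.CAPTURING, PipelineState.ENDED},
    PipelineState.CAPTURING: {PipelineState.PROCESSING, PipelineState.LISTENING},
    PipelineState.PROCESSING: {PipelineState.SPEAKING, PipelineState.LISTENING},
    PipelineState.SPEAKING: {PipelineState.LISTENING, PipelineState.CAPTURING},
    PipelineState.ENDED: set(),
}


def can_transition(current: PipelineState, new: PipelineState) -> bool:
    """True if the state machine permits current → new."""
    return new in TRANSITIONS.get(current, set())
```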

Part 7 Summary

| Sub-Part | Content |
|---|---|
| 7A | Pipeline architecture, latency budget, Deepgram STT |
| 7B | Silero VAD, Claude LLM, function calling |
| 7C | Chatterbox TTS, barge-in, Redis state |

Key Metrics:
  • Target latency: <1000ms mouth-to-ear
  • STT: Deepgram Nova-2 (~200ms)
  • LLM: Claude Sonnet (~300ms TTFB)
  • TTS: Chatterbox (~150ms TTFB)

What’s Next

Part 8: Knowledge Base & RAG covers:
  • Document processing and chunking
  • Vector embeddings with pgvector
  • Retrieval-Augmented Generation
  • Context injection

End of Part 7C

Junior Developer PRD — Part 8A: Document Processing & Chunking

Document Version: 1.0
Last Updated: January 25, 2026
Part: 8A of 10 (Sub-part 1 of 3)
Sections: 48-49
Audience: Junior developers with no prior context
Estimated Reading Time: 20 minutes

How to Use This Document

This is Part 8A—the first of three sub-parts covering Knowledge Base & RAG:
  • Part 8A (this document): Document Processing & Chunking
  • Part 8B: Vector Embeddings & pgvector
  • Part 8C: RAG Pipeline & Context Injection
Prerequisites: Parts 1-7 of the PRD series.


Section 48: Knowledge Base Overview

48.1 What is a Knowledge Base?

A knowledge base is a collection of information that the AI can reference when answering questions. Without it, the AI only knows:
  1. What’s in its training data (general knowledge)
  2. What’s in the current conversation
With a knowledge base, the AI can answer questions about:
  • Business-specific information (hours, services, policies)
  • Product details and pricing
  • FAQs and common procedures
  • Historical data and records

Real-World Example

Without Knowledge Base:
Caller: "What are your Saturday hours?"
AI: "I don't have specific information about business hours. 
     Would you like me to transfer you to someone who can help?"
With Knowledge Base:
Caller: "What are your Saturday hours?"
AI: "We're open Saturday from 9 AM to 2 PM. Would you like 
     to schedule an appointment?"

48.2 RAG: Retrieval-Augmented Generation

RAG is the technique that connects the knowledge base to the AI:
┌─────────────────────────────────────────────────────────────────┐
│                         RAG OVERVIEW                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   User Question                                                 │
│        │                                                        │
│        ▼                                                        │
│   ┌─────────────┐                                               │
│   │  RETRIEVAL  │  ← Find relevant documents                    │
│   └──────┬──────┘                                               │
│          │                                                      │
│          ▼                                                      │
│   ┌─────────────┐                                               │
│   │ AUGMENTATION│  ← Add documents to prompt                    │
│   └──────┬──────┘                                               │
│          │                                                      │
│          ▼                                                      │
│   ┌─────────────┐                                               │
│   │ GENERATION  │  ← AI generates answer using context          │
│   └──────┬──────┘                                               │
│          │                                                      │
│          ▼                                                      │
│   Answer with accurate, specific information                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Why RAG Instead of Fine-Tuning?

| Approach | Pros | Cons |
|---|---|---|
| RAG | Easy to update, no training needed, cites sources | Retrieval can fail, adds latency |
| Fine-Tuning | Fast inference, deeply learned | Expensive, hard to update, can hallucinate |

We use RAG because:
  1. Tenants can update their knowledge anytime
  2. No GPU training required
  3. AI can cite specific sources
  4. Multi-tenant isolation is straightforward
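
The augmentation step is just prompt assembly: retrieved chunks are placed ahead of the question before generation. A minimal sketch (the prompt wording and the numbered-citation format are illustrative, not the production template):

```python
def build_rag_prompt(question: str, chunks: list) -> str:
    """Assemble the augmented prompt: retrieved chunks, then the question."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In production this string becomes part of the LLM request, alongside the tenant's system prompt and conversation history.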

48.3 Knowledge Base Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  KNOWLEDGE BASE ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   INGESTION PIPELINE                                            │
│   ─────────────────                                             │
│                                                                 │
│   Documents    ┌─────────────┐    ┌─────────────┐               │
│   (PDF, DOCX,  │   PARSER    │    │  CHUNKER    │               │
│    TXT, HTML)  │             │───▶│             │               │
│       │        │ Extract text│    │ Split into  │               │
│       │        └─────────────┘    │ segments    │               │
│       │                           └──────┬──────┘               │
│       │                                  │                      │
│       │                                  ▼                      │
│       │                          ┌─────────────┐                │
│       │                          │  EMBEDDER   │                │
│       │                          │             │                │
│       │                          │ Convert to  │                │
│       │                          │ vectors     │                │
│       │                          └──────┬──────┘                │
│       │                                 │                       │
│       ▼                                 ▼                       │
│   ┌─────────────────────────────────────────────────────┐       │
│   │                    POSTGRESQL                        │       │
│   │  ┌─────────────┐    ┌─────────────┐                 │       │
│   │  │  documents  │    │   chunks    │                 │       │
│   │  │  (metadata) │    │ (text +     │                 │       │
│   │  │             │    │  embeddings)│                 │       │
│   │  └─────────────┘    └─────────────┘                 │       │
│   │                           │                          │       │
│   │                     pgvector                         │       │
│   │                     extension                        │       │
│   └─────────────────────────────────────────────────────┘       │
│                                                                 │
│   RETRIEVAL PIPELINE                                            │
│   ──────────────────                                            │
│                                                                 │
│   User Query    ┌─────────────┐    ┌─────────────┐              │
│       │         │  EMBEDDER   │    │  VECTOR     │              │
│       │────────▶│             │───▶│  SEARCH     │              │
│                 │ Query to    │    │             │              │
│                 │ vector      │    │ Find similar│              │
│                 └─────────────┘    │ chunks      │              │
│                                    └──────┬──────┘              │
│                                           │                     │
│                                           ▼                     │
│                                    ┌─────────────┐              │
│                                    │  RERANKER   │              │
│                                    │             │              │
│                                    │ Score &     │              │
│                                    │ filter      │              │
│                                    └──────┬──────┘              │
│                                           │                     │
│                                           ▼                     │
│                                    Relevant chunks              │
│                                    for LLM context              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

48.4 Database Schema

-- Knowledge base tables

-- Document metadata
-- Note: knowledge_bases (defined below) must be created first,
-- since kb_documents references it.
CREATE TABLE kb_documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id),
    
    -- File info
    filename VARCHAR(255) NOT NULL,
    file_type VARCHAR(50) NOT NULL,  -- pdf, docx, txt, html, md
    file_size_bytes INTEGER,
    file_hash VARCHAR(64),  -- SHA-256 for deduplication
    storage_key VARCHAR(1000),  -- Location of the original file in S3/MinIO
    
    -- Processing status
    status VARCHAR(50) DEFAULT 'pending',  -- pending, processing, ready, error
    error_message TEXT,
    chunk_count INTEGER DEFAULT 0,
    total_tokens INTEGER DEFAULT 0,
    
    -- Metadata
    title VARCHAR(500),
    description TEXT,
    source_url VARCHAR(2000),
    
    -- Timestamps
    uploaded_at TIMESTAMPTZ DEFAULT NOW(),
    processed_at TIMESTAMPTZ,
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Document chunks with embeddings
CREATE TABLE kb_chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID NOT NULL REFERENCES kb_documents(id) ON DELETE CASCADE,
    tenant_id UUID NOT NULL,  -- Denormalized for fast filtering
    
    -- Content
    content TEXT NOT NULL,
    content_hash VARCHAR(64),  -- For deduplication
    
    -- Position in document
    chunk_index INTEGER NOT NULL,
    start_char INTEGER,
    end_char INTEGER,
    
    -- Metadata
    section_title VARCHAR(500),
    page_number INTEGER,
    
    -- Vector embedding (1536 dimensions: text-embedding-3-small / ada-002)
    embedding vector(1536),
    
    -- Token count for context budgeting
    token_count INTEGER,
    
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for fast retrieval
CREATE INDEX idx_chunks_tenant ON kb_chunks(tenant_id);
CREATE INDEX idx_chunks_document ON kb_chunks(document_id);

-- Vector similarity index (IVFFlat for approximate nearest neighbor)
CREATE INDEX idx_chunks_embedding ON kb_chunks 
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Knowledge base configuration
CREATE TABLE knowledge_bases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    
    name VARCHAR(255) NOT NULL,
    description TEXT,
    
    -- Configuration
    chunk_size INTEGER DEFAULT 512,
    chunk_overlap INTEGER DEFAULT 50,
    embedding_model VARCHAR(100) DEFAULT 'text-embedding-3-small',
    
    -- Stats
    document_count INTEGER DEFAULT 0,
    chunk_count INTEGER DEFAULT 0,
    total_tokens INTEGER DEFAULT 0,
    
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
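
The IVFFlat index above is what serves similarity lookups. As a sketch of the query shape it accelerates, here is a tenant-scoped retrieval query using pgvector's `<=>` cosine-distance operator; the driver parameter style and names (`query_vec`, `top_k`) are illustrative, not part of the schema above:

```python
# Sketch of the retrieval query the IVFFlat index serves.
# `<=>` is pgvector's cosine-distance operator; 1 - distance = cosine similarity.
# Parameter style assumes a psycopg-like driver; names are illustrative.
SIMILARITY_QUERY = """
SELECT id, content, 1 - (embedding <=> %(query_vec)s) AS similarity
FROM kb_chunks
WHERE tenant_id = %(tenant_id)s          -- uses idx_chunks_tenant
ORDER BY embedding <=> %(query_vec)s     -- uses idx_chunks_embedding
LIMIT %(top_k)s
"""
```

Filtering on the denormalized `tenant_id` before ordering keeps one tenant's query from scanning another tenant's chunks.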

48.5 Supported Document Types

| File Type | Extension | Parser | Notes |
|-----------|-----------|--------|-------|
| PDF | .pdf | PyMuPDF | Text + tables, OCR optional |
| Word | .docx | python-docx | Preserves structure |
| Text | .txt | Native | Direct read |
| Markdown | .md | markdown-it | Preserves headers |
| HTML | .html | BeautifulSoup | Strips tags |
| CSV | .csv | pandas | Row-based chunks |

Section 49: Document Processing & Chunking

49.1 The Ingestion Pipeline

When a tenant uploads a document:
┌─────────────────────────────────────────────────────────────────┐
│                    INGESTION PIPELINE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   1. UPLOAD                                                     │
│      └── Receive file via API                                   │
│      └── Validate file type and size                            │
│      └── Store in S3/MinIO                                      │
│      └── Create kb_documents record (status: pending)           │
│                                                                 │
│   2. PARSE                                                      │
│      └── Download from storage                                  │
│      └── Extract text based on file type                        │
│      └── Extract metadata (title, pages, etc.)                  │
│      └── Update status: processing                              │
│                                                                 │
│   3. CHUNK                                                      │
│      └── Split text into segments                               │
│      └── Preserve context (overlap)                             │
│      └── Track position in original                             │
│                                                                 │
│   4. EMBED                                                      │
│      └── Convert chunks to vectors                              │
│      └── Batch API calls for efficiency                         │
│      └── Store in kb_chunks table                               │
│                                                                 │
│   5. INDEX                                                      │
│      └── Update vector index                                    │
│      └── Update document stats                                  │
│      └── Update status: ready                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

49.2 Document Parsers

"""
Document parsers for knowledge base ingestion.

File: services/kb-service/parsers/__init__.py
"""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional
import logging

logger = logging.getLogger(__name__)


@dataclass
class ParsedDocument:
    """Result of parsing a document."""
    text: str
    title: Optional[str] = None
    page_count: Optional[int] = None
    metadata: dict = None
    
    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}


class DocumentParser(ABC):
    """Base class for document parsers."""
    
    @abstractmethod
    def parse(self, file_path: str) -> ParsedDocument:
        """Parse a document and extract text."""
        pass
    
    @abstractmethod
    def supports(self, file_type: str) -> bool:
        """Check if parser supports this file type."""
        pass


class PDFParser(DocumentParser):
    """Parse PDF documents using PyMuPDF."""
    
    def supports(self, file_type: str) -> bool:
        return file_type.lower() in ('pdf', 'application/pdf')
    
    def parse(self, file_path: str) -> ParsedDocument:
        import fitz  # PyMuPDF
        
        doc = fitz.open(file_path)
        
        text_parts = []
        for page_num, page in enumerate(doc):
            text = page.get_text()
            if text.strip():
                text_parts.append(f"[Page {page_num + 1}]\n{text}")
        
        # Extract title from metadata or first content line
        title = doc.metadata.get('title')
        if not title and text_parts:
            first_line = text_parts[0].split('\n')[1] if '\n' in text_parts[0] else None
            if first_line and len(first_line) < 200:
                title = first_line.strip()
        
        page_count = len(doc)
        metadata = {
            'author': doc.metadata.get('author'),
            'created': doc.metadata.get('creationDate'),
        }
        doc.close()  # Release the file handle
        
        return ParsedDocument(
            text='\n\n'.join(text_parts),
            title=title,
            page_count=page_count,
            metadata=metadata,
        )


class DocxParser(DocumentParser):
    """Parse Word documents."""
    
    def supports(self, file_type: str) -> bool:
        return file_type.lower() in ('docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document')
    
    def parse(self, file_path: str) -> ParsedDocument:
        from docx import Document
        
        doc = Document(file_path)
        
        text_parts = []
        title = None
        
        for para in doc.paragraphs:
            text = para.text.strip()
            if not text:
                continue
            
            # Check if this is a heading
            if para.style.name.startswith('Heading'):
                if para.style.name == 'Heading 1' and not title:
                    title = text
                text_parts.append(f"\n## {text}\n")
            else:
                text_parts.append(text)
        
        # Also extract tables
        for table in doc.tables:
            table_text = []
            for row in table.rows:
                row_text = ' | '.join(cell.text.strip() for cell in row.cells)
                table_text.append(row_text)
            text_parts.append('\n'.join(table_text))
        
        return ParsedDocument(
            text='\n\n'.join(text_parts),
            title=title,
            metadata={'paragraph_count': len(doc.paragraphs)}
        )


class TextParser(DocumentParser):
    """Parse plain text and markdown."""
    
    def supports(self, file_type: str) -> bool:
        return file_type.lower() in ('txt', 'md', 'text/plain', 'text/markdown')
    
    def parse(self, file_path: str) -> ParsedDocument:
        with open(file_path, 'r', encoding='utf-8') as f:
            text = f.read()
        
        # Try to extract title from first line
        lines = text.split('\n')
        title = None
        if lines:
            first_line = lines[0].strip()
            # Check for markdown heading
            if first_line.startswith('# '):
                title = first_line[2:].strip()
            elif len(first_line) < 200 and not first_line.startswith(('*', '-', '1.')):
                title = first_line
        
        return ParsedDocument(
            text=text,
            title=title,
        )


class HTMLParser(DocumentParser):
    """Parse HTML documents."""
    
    def supports(self, file_type: str) -> bool:
        return file_type.lower() in ('html', 'htm', 'text/html')
    
    def parse(self, file_path: str) -> ParsedDocument:
        from bs4 import BeautifulSoup
        
        with open(file_path, 'r', encoding='utf-8') as f:
            html = f.read()
        
        soup = BeautifulSoup(html, 'html.parser')
        
        # Remove script and style elements
        for element in soup(['script', 'style', 'nav', 'footer', 'header']):
            element.decompose()
        
        # Extract title
        title = None
        title_tag = soup.find('title')
        if title_tag:
            title = title_tag.get_text().strip()
        
        # Get text content
        text = soup.get_text(separator='\n')
        
        # Clean up whitespace
        lines = [line.strip() for line in text.split('\n')]
        text = '\n'.join(line for line in lines if line)
        
        return ParsedDocument(
            text=text,
            title=title,
        )


class ParserFactory:
    """Factory for getting the right parser."""
    
    _parsers = [
        PDFParser(),
        DocxParser(),
        TextParser(),
        HTMLParser(),
    ]
    
    @classmethod
    def get_parser(cls, file_type: str) -> DocumentParser:
        """Get parser for file type."""
        for parser in cls._parsers:
            if parser.supports(file_type):
                return parser
        raise ValueError(f"Unsupported file type: {file_type}")
    
    @classmethod
    def parse(cls, file_path: str, file_type: str) -> ParsedDocument:
        """Parse a document."""
        parser = cls.get_parser(file_type)
        return parser.parse(file_path)

49.3 Chunking Strategies

Chunking is how we split documents into smaller pieces for embedding and retrieval.

Why Chunk?

  1. Embedding models have limits: Most handle ~8000 tokens max
  2. Precise retrieval: Smaller chunks = more specific matches
  3. Context efficiency: Don’t waste LLM context on irrelevant text
  4. Cost: Embedding fewer tokens is cheaper
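
These numbers combine into simple arithmetic: the first chunk covers `chunk_size` tokens, and each later chunk advances by `(chunk_size - overlap)` tokens. A quick sketch, assuming the 512/50 defaults used later in this section:

```python
import math

def estimate_chunk_count(total_tokens: int, chunk_size: int = 512, overlap: int = 50) -> int:
    """Rough chunk count: the first chunk covers chunk_size tokens,
    each later chunk advances by (chunk_size - overlap)."""
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((total_tokens - chunk_size) / step)

estimate_chunk_count(10_000)   # a 10k-token document -> about 22 chunks
```

The estimate is approximate because the real chunker also respects sentence boundaries, but it is useful for capacity and cost planning.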

Chunking Strategies

┌─────────────────────────────────────────────────────────────────┐
│                   CHUNKING STRATEGIES                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   1. FIXED SIZE                                                 │
│      ────────────                                               │
│      Split every N characters/tokens                            │
│                                                                 │
│      "The quick brown fox jumps over the lazy dog..."           │
│       │──── chunk 1 ────│──── chunk 2 ────│                     │
│                                                                 │
│      ✓ Simple, predictable                                      │
│      ✗ Splits mid-sentence, loses context                       │
│                                                                 │
│   2. SENTENCE-BASED                                             │
│      ───────────────                                            │
│      Split on sentence boundaries                               │
│                                                                 │
│      "The fox jumps. The dog sleeps. The cat watches."          │
│       │─ chunk 1 ─│─ chunk 2 ─│─ chunk 3 ─│                     │
│                                                                 │
│      ✓ Complete thoughts                                        │
│      ✗ Uneven sizes, may be too small                           │
│                                                                 │
│   3. PARAGRAPH/SECTION-BASED                                    │
│      ───────────────────────                                    │
│      Split on structural boundaries                             │
│                                                                 │
│      "# Hours                    # Services                     │
│       We are open 9-5.           We offer X, Y, Z."             │
│       │──── chunk 1 ────│        │──── chunk 2 ────│            │
│                                                                 │
│      ✓ Preserves document structure                             │
│      ✗ Sections may be too large                                │
│                                                                 │
│   4. SEMANTIC (OUR APPROACH)                                    │
│      ────────────────────────                                   │
│      Combine strategies:                                        │
│      - Respect sentence boundaries                              │
│      - Target size with tolerance                               │
│      - Overlap for context                                      │
│                                                                 │
│      ✓ Best of all worlds                                       │
│      ✓ Configurable per use case                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Overlap: Why It Matters

Overlap ensures context isn’t lost at chunk boundaries:
WITHOUT OVERLAP:
"...our Saturday hours are | 9 AM to 2 PM. We are closed..."
        chunk 1 ends ─────┘ └───── chunk 2 starts

Query: "Saturday hours" → chunk 1 matches, but answer is in chunk 2!

WITH OVERLAP (50 tokens):
"...our Saturday hours are 9 AM to 2 PM. We are closed..."
        chunk 1 ──────────────────────────┘
                     └────────────────────── chunk 2 (overlaps)

Query: "Saturday hours" → Both chunks contain the answer!
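
The same effect is easy to demonstrate at word granularity (the real chunker below works in tokens, but the sliding-window logic is identical). A toy sketch:

```python
def split_with_overlap(words, size, overlap):
    """Split a word list into windows of `size` words, each new window
    re-including the last `overlap` words of the previous one."""
    chunks = []
    step = size - overlap
    start = 0
    while start < len(words):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
        start += step
    return chunks

words = "our Saturday hours are 9 AM to 2 PM we are closed".split()
chunks = split_with_overlap(words, size=6, overlap=2)
# "9 AM" lands in both chunks[0] and chunks[1], so a query about
# Saturday hours can match either side of the boundary.
```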

49.4 Chunking Implementation

"""
Text chunking for knowledge base.

File: services/kb-service/chunking/chunker.py
"""
from dataclasses import dataclass, field
from typing import List, Optional
import re
import tiktoken


@dataclass
class Chunk:
    """A chunk of text from a document."""
    content: str
    index: int
    start_char: int
    end_char: int
    token_count: int
    section_title: Optional[str] = None
    page_number: Optional[int] = None
    
    def __str__(self):
        preview = self.content[:50] + "..." if len(self.content) > 50 else self.content
        return f"Chunk {self.index}: {preview}"


@dataclass
class ChunkerConfig:
    """Configuration for text chunking."""
    
    # Target chunk size in tokens
    chunk_size: int = 512
    
    # Overlap between chunks in tokens
    chunk_overlap: int = 50
    
    # Minimum chunk size (don't create tiny chunks)
    min_chunk_size: int = 100
    
    # Maximum chunk size (hard limit)
    max_chunk_size: int = 1000
    
    # Tokenizer encoding (for counting)
    tokenizer_model: str = "cl100k_base"  # tiktoken encoding used by GPT-4 and text-embedding-3
    
    # Separators in order of preference
    separators: List[str] = field(default_factory=lambda: [
        "\n\n",      # Paragraph
        "\n",        # Line
        ". ",        # Sentence
        "? ",        # Question
        "! ",        # Exclamation
        "; ",        # Semicolon
        ", ",        # Comma
        " ",         # Word
    ])


class SemanticChunker:
    """
    Semantic text chunker that respects boundaries.
    
    Example:
        chunker = SemanticChunker(config)
        chunks = chunker.chunk(document_text)
        
        for chunk in chunks:
            print(f"Chunk {chunk.index}: {chunk.token_count} tokens")
    """
    
    def __init__(self, config: ChunkerConfig = None):
        self.config = config or ChunkerConfig()
        self._tokenizer = tiktoken.get_encoding(self.config.tokenizer_model)
    
    def chunk(self, text: str, metadata: dict = None) -> List[Chunk]:
        """
        Split text into chunks.
        
        Args:
            text: The text to chunk
            metadata: Optional metadata (page numbers, sections)
        
        Returns:
            List of Chunk objects
        """
        if not text.strip():
            return []
        
        # Clean text
        text = self._clean_text(text)
        
        # Extract sections if present
        sections = self._extract_sections(text)
        
        chunks = []
        chunk_index = 0
        
        for section_title, section_text, section_start in sections:
            section_chunks = self._chunk_section(
                section_text,
                start_offset=section_start,
                section_title=section_title,
                start_index=chunk_index,
            )
            chunks.extend(section_chunks)
            chunk_index += len(section_chunks)
        
        return chunks
    
    def _clean_text(self, text: str) -> str:
        """Clean and normalize text."""
        # Collapse runs of spaces/tabs, but keep newlines so paragraph
        # boundaries and headers survive for section extraction
        text = re.sub(r'[ \t]+', ' ', text)
        # Collapse 3+ consecutive newlines into a paragraph break
        text = re.sub(r'\n{3,}', '\n\n', text)
        return text.strip()
    
    def _extract_sections(self, text: str) -> List[tuple]:
        """
        Extract sections from text.
        
        Returns list of (title, content, start_position) tuples.
        """
        # Look for markdown-style headers
        header_pattern = r'^(#{1,3})\s+(.+)$'
        matches = list(re.finditer(header_pattern, text, re.MULTILINE))
        
        # If no headers found, treat the entire text as one section
        if not matches:
            return [(None, text, 0)]
        
        sections = []
        
        # Text before the first header is an untitled section
        if matches[0].start() > 0:
            prior_text = text[:matches[0].start()].strip()
            if prior_text:
                sections.append((None, prior_text, 0))
        
        # Each header owns the text up to the next header
        for i, match in enumerate(matches):
            title = match.group(2).strip()
            content_start = match.end()
            content_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            content = text[content_start:content_end].strip()
            if content:
                sections.append((title, content, match.start()))
        
        return sections
    
    def _chunk_section(
        self,
        text: str,
        start_offset: int = 0,
        section_title: str = None,
        start_index: int = 0,
    ) -> List[Chunk]:
        """Chunk a single section of text."""
        chunks = []
        
        # Split by separators
        segments = self._split_by_separators(text)
        
        current_chunk_text = ""
        current_chunk_start = start_offset
        chunk_index = start_index
        pos = start_offset  # Running character position in the original text
        
        for segment in segments:
            segment_tokens = self._count_tokens(segment)
            current_tokens = self._count_tokens(current_chunk_text)
            
            # Check if adding this segment exceeds target
            if current_tokens + segment_tokens > self.config.chunk_size:
                # Save current chunk if it meets minimum
                if current_tokens >= self.config.min_chunk_size:
                    chunks.append(Chunk(
                        content=current_chunk_text.strip(),
                        index=chunk_index,
                        start_char=current_chunk_start,
                        end_char=current_chunk_start + len(current_chunk_text),
                        token_count=current_tokens,
                        section_title=section_title,
                    ))
                    chunk_index += 1
                    
                    # Start new chunk with overlap; it begins roughly where
                    # the overlap text starts in the original
                    overlap_text = self._get_overlap_text(current_chunk_text)
                    current_chunk_text = overlap_text + segment
                    current_chunk_start = pos - len(overlap_text)
                else:
                    # Chunk too small, keep adding
                    current_chunk_text += segment
            else:
                current_chunk_text += segment
            
            pos += len(segment)
        
        # Don't forget the last chunk
        if current_chunk_text.strip():
            chunks.append(Chunk(
                content=current_chunk_text.strip(),
                index=chunk_index,
                start_char=current_chunk_start,
                end_char=current_chunk_start + len(current_chunk_text),
                token_count=self._count_tokens(current_chunk_text),
                section_title=section_title,
            ))
        
        return chunks
    
    def _split_by_separators(self, text: str) -> List[str]:
        """Split text by separators, keeping separators."""
        segments = [text]
        
        for separator in self.config.separators:
            new_segments = []
            for segment in segments:
                if separator in segment:
                    parts = segment.split(separator)
                    for i, part in enumerate(parts):
                        if i > 0:
                            new_segments.append(separator + part)
                        elif part:
                            new_segments.append(part)
                else:
                    new_segments.append(segment)
            segments = new_segments
        
        return segments
    
    def _get_overlap_text(self, text: str) -> str:
        """Get overlap text from end of chunk."""
        if self.config.chunk_overlap <= 0:
            return ""  # tokens[-0:] would return the entire list
        tokens = self._tokenizer.encode(text)
        overlap_tokens = tokens[-self.config.chunk_overlap:]
        return self._tokenizer.decode(overlap_tokens)
    
    def _count_tokens(self, text: str) -> int:
        """Count tokens in text."""
        return len(self._tokenizer.encode(text))


# Convenience function
def chunk_document(text: str, config: ChunkerConfig = None) -> List[Chunk]:
    """Chunk a document with default settings."""
    chunker = SemanticChunker(config)
    return chunker.chunk(text)
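
One property worth noting about `_split_by_separators`: because each later part keeps its leading separator, the split is lossless. A standalone sketch of the same logic, showing that concatenation reproduces the input:

```python
def split_keeping_separators(text, separators):
    """Recursively split text, prefixing each later part with its
    separator so that ''.join(result) == text."""
    segments = [text]
    for sep in separators:
        out = []
        for seg in segments:
            parts = seg.split(sep)
            for i, part in enumerate(parts):
                if i > 0:
                    out.append(sep + part)
                elif part:
                    out.append(part)
        segments = out
    return segments

text = "Hello world. How are you? Fine."
segs = split_keeping_separators(text, [". ", "? "])
# Segments keep their separators, so joining restores the original text.
```

This matters for `start_char`/`end_char` tracking: no characters are dropped between segments.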

49.5 Chunking Configuration by Document Type

Different document types benefit from different chunking strategies:
| Document Type | Chunk Size | Overlap | Notes |
|---------------|------------|---------|-------|
| FAQ | 256 tokens | 25 | Small, focused answers |
| Policy docs | 512 tokens | 50 | Balanced |
| Technical docs | 768 tokens | 100 | Preserve context |
| Conversations | 256 tokens | 50 | Turn-based |
| Product specs | 512 tokens | 75 | Detailed info |
"""
Chunking presets for different document types.

File: services/kb-service/chunking/presets.py
"""
from .chunker import ChunkerConfig, SemanticChunker

CHUNKING_PRESETS = {
    "faq": ChunkerConfig(
        chunk_size=256,
        chunk_overlap=25,
        min_chunk_size=50,
        separators=["\n\n", "\n", "? ", ". "],
    ),
    
    "policy": ChunkerConfig(
        chunk_size=512,
        chunk_overlap=50,
        min_chunk_size=100,
    ),
    
    "technical": ChunkerConfig(
        chunk_size=768,
        chunk_overlap=100,
        min_chunk_size=150,
    ),
    
    "conversation": ChunkerConfig(
        chunk_size=256,
        chunk_overlap=50,
        min_chunk_size=50,
        separators=["\n\n", "\n", ": "],
    ),
}


def get_chunker(document_type: str) -> SemanticChunker:
    """Get chunker with preset for document type."""
    config = CHUNKING_PRESETS.get(document_type, ChunkerConfig())
    return SemanticChunker(config)

49.6 Processing Pipeline

"""
Document processing pipeline.

File: services/kb-service/pipeline/processor.py
"""
import hashlib
import logging
from dataclasses import dataclass
from typing import Optional

from parsers import ParserFactory
from chunking.chunker import ChunkerConfig, SemanticChunker

logger = logging.getLogger(__name__)


@dataclass
class ProcessingResult:
    """Result of document processing."""
    document_id: str
    chunk_count: int
    total_tokens: int
    success: bool
    error: Optional[str] = None


class DocumentProcessor:
    """
    Processes documents through the ingestion pipeline.
    
    Example:
        processor = DocumentProcessor(
            storage=s3_client,
            db=database,
            embedder=embedding_service,
        )
        
        result = await processor.process(document_id)
    """
    
    def __init__(self, storage, db, embedder):
        self.storage = storage
        self.db = db
        self.embedder = embedder
    
    async def process(self, document_id: str) -> ProcessingResult:
        """Process a document through the pipeline."""
        try:
            # 1. Get document record
            doc = await self.db.get_document(document_id)
            if not doc:
                raise ValueError(f"Document not found: {document_id}")
            
            # 2. Update status
            await self.db.update_document_status(document_id, "processing")
            
            # 3. Download file
            file_path = await self.storage.download(doc.storage_key)
            
            # 4. Parse document
            parsed = ParserFactory.parse(file_path, doc.file_type)
            
            # 5. Get chunker config
            kb = await self.db.get_knowledge_base(doc.knowledge_base_id)
            config = ChunkerConfig(
                chunk_size=kb.chunk_size,
                chunk_overlap=kb.chunk_overlap,
            )
            
            # 6. Chunk document
            chunker = SemanticChunker(config)
            chunks = chunker.chunk(parsed.text)
            
            logger.info(f"Document {document_id}: {len(chunks)} chunks")
            
            # 7. Generate embeddings (batch)
            texts = [chunk.content for chunk in chunks]
            embeddings = await self.embedder.embed_batch(texts)
            
            # 8. Store chunks
            chunk_records = []
            for chunk, embedding in zip(chunks, embeddings):
                chunk_records.append({
                    "document_id": document_id,
                    "tenant_id": doc.tenant_id,
                    "content": chunk.content,
                    "content_hash": hashlib.sha256(chunk.content.encode()).hexdigest(),
                    "chunk_index": chunk.index,
                    "start_char": chunk.start_char,
                    "end_char": chunk.end_char,
                    "section_title": chunk.section_title,
                    "embedding": embedding,
                    "token_count": chunk.token_count,
                })
            
            await self.db.insert_chunks(chunk_records)
            
            # 9. Update document status
            total_tokens = sum(c.token_count for c in chunks)
            await self.db.update_document_status(
                document_id,
                "ready",
                chunk_count=len(chunks),
                total_tokens=total_tokens,
            )
            
            # 10. Update knowledge base stats
            await self.db.update_kb_stats(doc.knowledge_base_id)
            
            return ProcessingResult(
                document_id=document_id,
                chunk_count=len(chunks),
                total_tokens=total_tokens,
                success=True,
            )
        
        except Exception as e:
            logger.error(f"Processing failed for {document_id}: {e}")
            await self.db.update_document_status(
                document_id,
                "error",
                error_message=str(e),
            )
            return ProcessingResult(
                document_id=document_id,
                chunk_count=0,
                total_tokens=0,
                success=False,
                error=str(e),
            )
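
The pipeline moves each document through pending → processing → ready (or error). If you add retries, it helps to guard those transitions explicitly; a minimal sketch (the error → processing retry edge is an assumption, not defined above):

```python
# Status lifecycle for kb_documents.status.
# error -> processing models a retry and is a hypothetical addition.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"ready", "error"},
    "error": {"processing"},
    "ready": set(),
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a document may legally move between these states."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

A guard like this prevents, for example, a stale worker from flipping a ready document back to pending.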

Summary: What You’ve Learned in Part 8A

Section 48: Knowledge Base Overview

  • Knowledge bases store business-specific information
  • RAG = Retrieval + Augmentation + Generation
  • Architecture: Documents → Chunks → Embeddings → Vector DB

Section 49: Document Processing & Chunking

  • Parsers extract text from PDF, DOCX, TXT, HTML, MD
  • Chunking splits documents into embeddable segments
  • Semantic chunking respects sentence/paragraph boundaries
  • Overlap prevents context loss at chunk boundaries
  • Different document types need different chunk sizes

What’s Next

In Part 8B, you’ll learn:
  • Vector embeddings and embedding models
  • pgvector extension for PostgreSQL
  • Similarity search algorithms
  • Index optimization

Document Metadata

| Field | Value |
|-------|-------|
| Document ID | PRD-008A |
| Title | Junior Developer PRD — Part 8A |
| Version | 1.0 |
| Status | Complete |

End of Part 8A — Continue to Part 8B

Junior Developer PRD — Part 8B: Vector Embeddings & pgvector

Document Version: 1.0
Last Updated: January 25, 2026
Part: 8B of 10 (Sub-part 2 of 3)
Sections: 50-51
Audience: Junior developers with no prior context
Estimated Reading Time: 20 minutes

How to Use This Document

This is Part 8B—the second of three sub-parts covering Knowledge Base & RAG:
  • Part 8A: Document Processing & Chunking ✓
  • Part 8B (this document): Vector Embeddings & pgvector
  • Part 8C: RAG Pipeline & Context Injection
Prerequisites: Parts 1-7 and Part 8A.

Table of Contents


Section 50: Vector Embeddings

50.1 What Are Embeddings?

Embeddings are numerical representations of text that capture semantic meaning. They convert words and sentences into vectors (lists of numbers) that computers can compare mathematically.

The Key Insight

Similar meanings → Similar vectors → Close in vector space
┌─────────────────────────────────────────────────────────────────┐
│                    EMBEDDING CONCEPT                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   TEXT                          VECTOR (simplified)             │
│   ────                          ──────────────────              │
│                                                                 │
│   "dog"          ───────▶       [0.2, 0.8, 0.1, ...]           │
│   "puppy"        ───────▶       [0.2, 0.7, 0.1, ...]  ← Similar│
│   "cat"          ───────▶       [0.3, 0.6, 0.2, ...]           │
│   "car"          ───────▶       [0.9, 0.1, 0.4, ...]  ← Different│
│                                                                 │
│   In vector space:                                              │
│                                                                 │
│              puppy •  • dog                                     │
│                    •                                            │
│                   cat                                           │
│                                                                 │
│                                                                 │
│                                                                 │
│                        car •                                    │
│                                                                 │
│   "dog" and "puppy" are close because they're semantically     │
│   related. "car" is far away because it's unrelated.           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Traditional keyword search fails when words don’t match exactly:
KEYWORD SEARCH:
Document: "Our business hours are 9 AM to 5 PM"
Query: "When are you open?"
Result: NO MATCH ❌ (no word overlap)

SEMANTIC SEARCH (with embeddings):
Document embedding: [0.2, 0.5, 0.8, ...]  ← "business hours"
Query embedding:    [0.2, 0.4, 0.7, ...]  ← "when are you open"
Similarity: 0.92 (very similar!)
Result: MATCH ✓
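
To make the comparison concrete, here is a toy cosine-similarity calculation using the made-up three-dimensional vectors from the diagram above (real embeddings have hundreds or thousands of dimensions; the numbers here are invented for illustration):

```python
# Toy illustration only: vectors are invented, not real model output
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

dog   = [0.2, 0.8, 0.1]
puppy = [0.2, 0.7, 0.1]
car   = [0.9, 0.1, 0.4]

print(cosine(dog, puppy))  # close to 1.0 -> semantically similar
print(cosine(dog, car))    # much lower  -> unrelated
```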

50.3 Embedding Models

| Model | Dimensions | Max Tokens | Cost/1M | Speed | Quality |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | $0.02 | Fast | Good |
| text-embedding-3-large | 3072 | 8191 | $0.13 | Medium | Excellent |
| text-embedding-ada-002 | 1536 | 8191 | $0.10 | Fast | Good |
| Cohere embed-v3 | 1024 | 512 | $0.10 | Fast | Good |
| Voyage-2 | 1024 | 4000 | $0.10 | Medium | Excellent |
Our choice: text-embedding-3-small
  • Best price/performance ratio
  • 1536 dimensions (good balance)
  • Fast inference
  • Works well with pgvector

50.4 Embedding Service Implementation

"""
Embedding service for knowledge base.

File: services/kb-service/embeddings/service.py
"""
import asyncio
import logging
from typing import List, Optional
from dataclasses import dataclass
import openai

logger = logging.getLogger(__name__)


@dataclass
class EmbeddingConfig:
    """Configuration for embedding service."""
    api_key: str
    model: str = "text-embedding-3-small"
    dimensions: int = 1536
    batch_size: int = 100  # Max texts per API call
    max_retries: int = 3
    retry_delay: float = 1.0


class EmbeddingService:
    """
    Service for generating text embeddings.
    
    Example:
        service = EmbeddingService(config)
        
        # Single text
        embedding = await service.embed("Hello world")
        
        # Batch
        embeddings = await service.embed_batch(["Hello", "World"])
    """
    
    def __init__(self, config: EmbeddingConfig):
        self.config = config
        self._client = openai.AsyncOpenAI(api_key=config.api_key)
    
    async def embed(self, text: str) -> List[float]:
        """
        Generate embedding for a single text.
        
        Args:
            text: Text to embed
        
        Returns:
            List of floats (embedding vector)
        """
        embeddings = await self.embed_batch([text])
        return embeddings[0]
    
    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        """
        Generate embeddings for multiple texts.
        
        Automatically batches requests to stay within API limits.
        
        Args:
            texts: List of texts to embed
        
        Returns:
            List of embedding vectors
        """
        if not texts:
            return []
        
        # Clean texts
        texts = [self._prepare_text(t) for t in texts]
        
        # Process in batches
        all_embeddings = []
        
        for i in range(0, len(texts), self.config.batch_size):
            batch = texts[i:i + self.config.batch_size]
            batch_embeddings = await self._embed_with_retry(batch)
            all_embeddings.extend(batch_embeddings)
        
        return all_embeddings
    
    async def _embed_with_retry(self, texts: List[str]) -> List[List[float]]:
        """Embed with retry logic."""
        last_error = None
        
        for attempt in range(self.config.max_retries):
            try:
                response = await self._client.embeddings.create(
                    model=self.config.model,
                    input=texts,
                    dimensions=self.config.dimensions,
                )
                
                # Sort by index to maintain order
                sorted_data = sorted(response.data, key=lambda x: x.index)
                return [item.embedding for item in sorted_data]
            
            except openai.RateLimitError as e:
                logger.warning(f"Rate limited, waiting... ({attempt + 1})")
                await asyncio.sleep(self.config.retry_delay * (attempt + 1))
                last_error = e
            
            except openai.APIError as e:
                logger.error(f"API error: {e}")
                last_error = e
                await asyncio.sleep(self.config.retry_delay)
        
        raise last_error or Exception("Embedding failed")
    
    def _prepare_text(self, text: str) -> str:
        """Prepare text for embedding."""
        # Truncate if too long (model max is ~8000 tokens)
        if len(text) > 30000:  # Rough char limit
            text = text[:30000]
        
        # Clean whitespace
        text = " ".join(text.split())
        
        return text


# Query embedding with prefix (improves retrieval)
class QueryEmbeddingService(EmbeddingService):
    """
    Embedding service optimized for queries.
    
    Some models work better with a query prefix.
    """
    
    query_prefix: str = "search_query: "
    
    async def embed_query(self, query: str) -> List[float]:
        """Embed a search query."""
        prefixed = self.query_prefix + query
        return await self.embed(prefixed)

50.5 Similarity Metrics

How do we measure if two vectors are similar?

Cosine Similarity (Our Choice)

Measures the angle between vectors. Ignores magnitude, focuses on direction.
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Range: -1 to 1
  1 = identical direction (same meaning)
  0 = perpendicular (unrelated)
 -1 = opposite direction (opposite meaning)

Other Metrics

| Metric | Formula | Best For |
|---|---|---|
| Cosine | angle between vectors | Semantic similarity |
| Euclidean | straight-line distance | Dense vectors |
| Dot Product | raw multiplication | Normalized vectors |
| Manhattan | sum of differences | Sparse vectors |

Why Cosine for RAG?

  1. Normalized: Length doesn’t matter (short and long texts comparable)
  2. Intuitive: Higher = more similar
  3. Fast: Optimized in pgvector
  4. Standard: Most embedding models are trained for cosine
"""
Similarity calculation utilities.

File: services/kb-service/embeddings/similarity.py
"""
import numpy as np
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """
    Calculate cosine similarity between two vectors.
    
    Args:
        a: First vector
        b: Second vector
    
    Returns:
        Similarity score between -1 and 1
    """
    a = np.array(a)
    b = np.array(b)
    
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    
    if norm_a == 0 or norm_b == 0:
        return 0.0
    
    return dot_product / (norm_a * norm_b)


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Calculate Euclidean distance (L2)."""
    a = np.array(a)
    b = np.array(b)
    return np.linalg.norm(a - b)


def find_most_similar(
    query_embedding: List[float],
    embeddings: List[List[float]],
    top_k: int = 5,
) -> List[tuple]:
    """
    Find most similar embeddings to query.
    
    Returns list of (index, similarity) tuples.
    """
    similarities = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(embeddings)
    ]
    
    # Sort by similarity (descending)
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return similarities[:top_k]

Section 51: pgvector Integration

51.1 What is pgvector?

pgvector is a PostgreSQL extension that adds vector data types and similarity search. It lets us store embeddings directly in our database and perform efficient similarity queries.

Why pgvector?

| Option | Pros | Cons |
|---|---|---|
| pgvector | Integrated with PostgreSQL, ACID, familiar | Scaling limits |
| Pinecone | Managed, scalable | Separate service, cost |
| Weaviate | Feature-rich | Complex setup |
| Qdrant | Fast, open source | Another database |
| Milvus | Highly scalable | Operational overhead |

Our choice: pgvector because:
  1. We already use PostgreSQL
  2. No additional infrastructure
  3. Joins with other data
  4. Simpler architecture
  5. Good enough for our scale

51.2 pgvector Setup

Install Extension

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';

Create Vector Column

-- Add vector column to chunks table
ALTER TABLE kb_chunks 
ADD COLUMN embedding vector(1536);

-- Or create table with vector column
CREATE TABLE kb_chunks (
    id UUID PRIMARY KEY,
    document_id UUID NOT NULL,
    tenant_id UUID NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),  -- 1536 dimensions
    token_count INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

51.3 Vector Indexes

Without an index, similarity search scans all rows (slow). pgvector offers two index types:

IVFFlat Index

Inverted File Flat - clusters vectors, searches relevant clusters.
-- Create IVFFlat index
CREATE INDEX idx_chunks_embedding_ivf ON kb_chunks 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- lists = number of clusters
-- Rule of thumb (pgvector docs): rows/1000 for up to 1M rows, sqrt(rows) above 1M
Pros: Fast to build, good for < 1M vectors
Cons: Approximate (may miss some results)
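
The `lists` guideline can be captured in a small helper. This is a sketch (`recommended_lists` is our own name, not a pgvector API); the thresholds follow the pgvector README, which suggests `rows/1000` up to 1M rows and `sqrt(rows)` above that:

```python
import math

def recommended_lists(row_count: int) -> int:
    """Heuristic starting point for the IVFFlat 'lists' parameter."""
    # pgvector guideline: rows/1000 up to 1M rows, sqrt(rows) above 1M
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, int(math.sqrt(row_count)))

print(recommended_lists(100_000))      # small table
print(recommended_lists(100_000_000))  # large table
```

Treat the output as a starting point; benchmark recall and latency on your own data before settling on a value.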

HNSW Index

Hierarchical Navigable Small World - graph-based.
-- Create HNSW index
CREATE INDEX idx_chunks_embedding_hnsw ON kb_chunks 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- m = connections per node (higher = more accurate, slower)
-- ef_construction = build-time search width
Pros: More accurate, faster queries
Cons: Slower to build, more memory

Index Comparison

| Index | Build Time | Query Time | Accuracy | Memory |
|---|---|---|---|---|
| None | - | O(n) | 100% | Low |
| IVFFlat | Fast | ~10ms | 95%+ | Medium |
| HNSW | Slow | ~5ms | 99%+ | High |

Our choice: IVFFlat for initial deployment, HNSW when accuracy matters more.

51.4 Similarity Search Queries

-- Find 5 most similar chunks to a query embedding
SELECT 
    id,
    content,
    1 - (embedding <=> $1::vector) AS similarity
FROM kb_chunks
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 5;

-- <=> is cosine distance (1 - similarity)
-- Lower distance = more similar

Distance Operators

| Operator | Metric | Usage |
|---|---|---|
| <=> | Cosine distance | ORDER BY embedding <=> query |
| <-> | Euclidean (L2) | ORDER BY embedding <-> query |
| <#> | Negative inner product | ORDER BY embedding <#> query |

-- Search within specific knowledge base
SELECT 
    c.id,
    c.content,
    c.section_title,
    d.filename,
    1 - (c.embedding <=> $1::vector) AS similarity
FROM kb_chunks c
JOIN kb_documents d ON c.document_id = d.id
WHERE 
    c.tenant_id = $2
    AND d.knowledge_base_id = $3
    AND d.status = 'ready'
ORDER BY c.embedding <=> $1::vector
LIMIT 10;

Similarity Threshold

-- Only return chunks above similarity threshold
SELECT 
    id,
    content,
    1 - (embedding <=> $1::vector) AS similarity
FROM kb_chunks
WHERE 
    tenant_id = $2
    AND (1 - (embedding <=> $1::vector)) > 0.7  -- 70% similarity threshold
ORDER BY embedding <=> $1::vector
LIMIT 10;

51.5 Vector Repository Implementation

"""
Vector repository for knowledge base.

File: services/kb-service/repositories/vector_repository.py
"""
import logging
from typing import List, Optional
from dataclasses import dataclass
import asyncpg

logger = logging.getLogger(__name__)


@dataclass
class ChunkMatch:
    """A chunk that matched a similarity search."""
    chunk_id: str
    document_id: str
    content: str
    similarity: float
    section_title: Optional[str] = None
    filename: Optional[str] = None
    page_number: Optional[int] = None


@dataclass
class SearchParams:
    """Parameters for similarity search."""
    tenant_id: str
    knowledge_base_id: Optional[str] = None
    top_k: int = 5
    min_similarity: float = 0.5
    max_tokens: Optional[int] = None


class VectorRepository:
    """
    Repository for vector similarity searches.
    
    Example:
        repo = VectorRepository(pool)
        
        matches = await repo.search(
            query_embedding=embedding,
            params=SearchParams(tenant_id="123", top_k=5)
        )
        
        for match in matches:
            print(f"{match.similarity:.2f}: {match.content[:50]}...")
    """
    
    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool
    
    async def search(
        self,
        query_embedding: List[float],
        params: SearchParams,
    ) -> List[ChunkMatch]:
        """
        Search for similar chunks.
        
        Args:
            query_embedding: Query vector
            params: Search parameters
        
        Returns:
            List of matching chunks sorted by similarity
        """
        # Build query
        query = """
            SELECT 
                c.id AS chunk_id,
                c.document_id,
                c.content,
                c.section_title,
                c.token_count,
                d.filename,
                1 - (c.embedding <=> $1::vector) AS similarity
            FROM kb_chunks c
            JOIN kb_documents d ON c.document_id = d.id
            WHERE 
                c.tenant_id = $2
                AND d.status = 'ready'
                AND (1 - (c.embedding <=> $1::vector)) > $3
        """
        
        args = [
            str(query_embedding),  # $1
            params.tenant_id,      # $2
            params.min_similarity, # $3
        ]
        
        # Add knowledge base filter if specified
        if params.knowledge_base_id:
            query += " AND d.knowledge_base_id = $4"
            args.append(params.knowledge_base_id)
        
        # Order and limit (coerce top_k to int so the f-string stays injection-safe)
        query += f"""
            ORDER BY c.embedding <=> $1::vector
            LIMIT {int(params.top_k)}
        """
        
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(query, *args)
        
        # Convert to ChunkMatch objects
        matches = []
        total_tokens = 0
        
        for row in rows:
            # Check token budget
            if params.max_tokens:
                if total_tokens + row['token_count'] > params.max_tokens:
                    break
                total_tokens += row['token_count']
            
            matches.append(ChunkMatch(
                chunk_id=str(row['chunk_id']),
                document_id=str(row['document_id']),
                content=row['content'],
                similarity=float(row['similarity']),
                section_title=row['section_title'],
                filename=row['filename'],
            ))
        
        return matches
    
    async def search_hybrid(
        self,
        query_embedding: List[float],
        query_text: str,
        params: SearchParams,
    ) -> List[ChunkMatch]:
        """
        Hybrid search combining vector and keyword.
        
        Uses RRF (Reciprocal Rank Fusion) to combine results.
        """
        # Vector search
        vector_query = """
            SELECT 
                c.id AS chunk_id,
                c.document_id,
                c.content,
                c.section_title,
                d.filename,
                1 - (c.embedding <=> $1::vector) AS similarity,
                ROW_NUMBER() OVER (ORDER BY c.embedding <=> $1::vector) AS vector_rank
            FROM kb_chunks c
            JOIN kb_documents d ON c.document_id = d.id
            WHERE c.tenant_id = $2 AND d.status = 'ready'
            ORDER BY c.embedding <=> $1::vector
            LIMIT 20
        """
        
        # Keyword search using full-text
        keyword_query = """
            SELECT 
                c.id AS chunk_id,
                c.document_id,
                c.content,
                c.section_title,
                d.filename,
                ts_rank(to_tsvector('english', c.content), plainto_tsquery('english', $3)) AS text_rank,
                ROW_NUMBER() OVER (ORDER BY ts_rank(to_tsvector('english', c.content), plainto_tsquery('english', $3)) DESC) AS keyword_rank
            FROM kb_chunks c
            JOIN kb_documents d ON c.document_id = d.id
            WHERE 
                c.tenant_id = $2 
                AND d.status = 'ready'
                AND to_tsvector('english', c.content) @@ plainto_tsquery('english', $3)
            ORDER BY text_rank DESC
            LIMIT 20
        """
        
        # Combine with RRF
        combined_query = f"""
            WITH vector_results AS ({vector_query}),
                 keyword_results AS ({keyword_query})
            SELECT 
                COALESCE(v.chunk_id, k.chunk_id) AS chunk_id,
                COALESCE(v.document_id, k.document_id) AS document_id,
                COALESCE(v.content, k.content) AS content,
                COALESCE(v.section_title, k.section_title) AS section_title,
                COALESCE(v.filename, k.filename) AS filename,
                COALESCE(v.similarity, 0) AS similarity,
                -- RRF score: 1/(k + rank) for each result type
                (1.0 / (60 + COALESCE(v.vector_rank, 1000))) + 
                (1.0 / (60 + COALESCE(k.keyword_rank, 1000))) AS rrf_score
            FROM vector_results v
            FULL OUTER JOIN keyword_results k ON v.chunk_id = k.chunk_id
            ORDER BY rrf_score DESC
            LIMIT $4
        """
        
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(
                combined_query,
                str(query_embedding),
                params.tenant_id,
                query_text,
                params.top_k,
            )
        
        return [
            ChunkMatch(
                chunk_id=str(row['chunk_id']),
                document_id=str(row['document_id']),
                content=row['content'],
                similarity=float(row['similarity']),
                section_title=row['section_title'],
                filename=row['filename'],
            )
            for row in rows
        ]
    
    async def insert_chunk(
        self,
        chunk_data: dict,
    ) -> str:
        """Insert a chunk with embedding."""
        query = """
            INSERT INTO kb_chunks (
                document_id, tenant_id, content, content_hash,
                chunk_index, start_char, end_char, section_title,
                embedding, token_count
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9::vector, $10)
            RETURNING id
        """
        
        async with self.pool.acquire() as conn:
            result = await conn.fetchrow(
                query,
                chunk_data['document_id'],
                chunk_data['tenant_id'],
                chunk_data['content'],
                chunk_data['content_hash'],
                chunk_data['chunk_index'],
                chunk_data['start_char'],
                chunk_data['end_char'],
                chunk_data.get('section_title'),
                str(chunk_data['embedding']),
                chunk_data['token_count'],
            )
        
        return str(result['id'])
    
    async def delete_document_chunks(self, document_id: str) -> int:
        """Delete all chunks for a document."""
        query = "DELETE FROM kb_chunks WHERE document_id = $1"
        
        async with self.pool.acquire() as conn:
            result = await conn.execute(query, document_id)
        
        # Extract count from "DELETE N"
        return int(result.split()[-1])
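
The RRF formula embedded in `search_hybrid` above can be illustrated on its own. The constant `k = 60` and the sentinel rank of 1000 for unmatched results mirror the SQL; `rrf_score` is an illustrative helper, not part of the repository:

```python
def rrf_score(vector_rank=None, keyword_rank=None, k=60, missing=1000):
    """Reciprocal Rank Fusion: sum of 1/(k + rank) over both rankings.

    A result absent from one ranking gets a large sentinel rank, so its
    contribution from that ranking is close to zero.
    """
    vr = vector_rank if vector_rank is not None else missing
    kr = keyword_rank if keyword_rank is not None else missing
    return 1.0 / (k + vr) + 1.0 / (k + kr)

# A chunk ranked highly by both searches beats one found by only one search
print(rrf_score(1, 1))     # top of both rankings
print(rrf_score(1, None))  # top of vector ranking only
```

Because only ranks matter, RRF needs no score normalization between the cosine similarities and the `ts_rank` values, which is why it is a popular default for hybrid search.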

51.6 Index Maintenance

-- Rebuild index after bulk inserts
REINDEX INDEX CONCURRENTLY idx_chunks_embedding_ivf;

-- Analyze table for query planner
ANALYZE kb_chunks;

-- Check index size
SELECT 
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes 
WHERE tablename = 'kb_chunks';

-- Monitor index usage
SELECT 
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'kb_chunks';

51.7 Performance Tuning

Index Parameters

-- For IVFFlat: more lists = faster queries, less accurate
-- Guideline (from pgvector docs): lists = rows/1000 for up to 1M rows,
-- lists = sqrt(rows) for > 1M rows

-- For HNSW: higher m = more accurate, slower builds
-- m = 16 is good default
-- ef_construction = 64 (higher = more accurate index)

Query Tuning

-- Set probes for IVFFlat (trade speed for accuracy)
SET ivfflat.probes = 10;  -- Default is 1

-- Set ef_search for HNSW
SET hnsw.ef_search = 40;  -- Default is 40

Memory Configuration

-- Increase work memory for vector operations
SET work_mem = '256MB';

-- Increase maintenance work memory for index builds
SET maintenance_work_mem = '1GB';

Summary: What You’ve Learned in Part 8B

Section 50: Vector Embeddings

  • Embeddings convert text to numerical vectors
  • Similar meanings → similar vectors
  • We use OpenAI text-embedding-3-small (1536 dimensions)
  • Cosine similarity measures vector closeness

Section 51: pgvector Integration

  • pgvector adds vector support to PostgreSQL
  • IVFFlat index for fast approximate search
  • Distance operators: <=> (cosine), <-> (L2)
  • Hybrid search combines vector + keyword
  • Index tuning critical for performance

What’s Next

In Part 8C, you’ll learn:
  • Complete RAG pipeline
  • Query processing and reranking
  • Context assembly for LLM
  • Prompt injection with retrieved context

Document Metadata

| Field | Value |
|---|---|
| Document ID | PRD-008B |
| Title | Junior Developer PRD — Part 8B |
| Version | 1.0 |
| Status | Complete |

End of Part 8B — Continue to Part 8C

Junior Developer PRD — Part 8C: RAG Pipeline & Context Injection

Document Version: 1.0
Last Updated: January 25, 2026
Part: 8C of 10 (Sub-part 3 of 3)
Sections: 52-53

Section 52: RAG Pipeline

52.1 Complete RAG Flow

USER QUERY → Query Processing → Embedding → Vector Search 
          → Reranking → Context Assembly → LLM Generation → RESPONSE

52.2 Query Processing

# services/kb-service/rag/query_processor.py
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProcessedQuery:
    original: str
    normalized: str
    keywords: List[str]
    embedding: Optional[List[float]] = None

class QueryProcessor:
    STOP_WORDS = {'what', 'when', 'where', 'how', 'is', 'are', 'the', 'a', 'an'}
    
    def __init__(self, embedding_service):
        self.embedding_service = embedding_service
    
    async def process(self, query: str) -> ProcessedQuery:
        normalized = query.lower().strip()
        keywords = [w for w in normalized.split() if w not in self.STOP_WORDS]
        embedding = await self.embedding_service.embed_query(normalized)
        
        return ProcessedQuery(
            original=query,
            normalized=normalized,
            keywords=keywords,
            embedding=embedding,
        )

52.3 Reranking

Reranking improves relevance beyond vector similarity:
# services/kb-service/rag/reranker.py
from dataclasses import dataclass
from typing import List
import cohere

@dataclass
class RankedChunk:
    chunk_id: str
    content: str
    relevance_score: float
    filename: str = None

class Reranker:
    def __init__(self, api_key: str):
        # Note: cohere.Client is synchronous, so rerank() blocks the event
        # loop; consider the async client or a thread executor in production
        self._client = cohere.Client(api_key)
    
    async def rerank(self, query: str, chunks: List, top_k: int = 5) -> List[RankedChunk]:
        documents = [c.content for c in chunks]
        
        results = self._client.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=documents,
            top_n=top_k,
        )
        
        ranked = []
        for r in results.results:
            if r.relevance_score > 0.3:
                original = chunks[r.index]
                ranked.append(RankedChunk(
                    chunk_id=original.chunk_id,
                    content=original.content,
                    relevance_score=r.relevance_score,
                    filename=original.filename,
                ))
        
        return ranked

52.4 Complete RAG Service

# services/kb-service/rag/service.py
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RAGResult:
    chunks: List[RankedChunk]
    context_text: str
    total_tokens: int

class RAGService:
    def __init__(self, vector_repo, embedding_service, reranker=None):
        self.vector_repo = vector_repo
        self.embedding_service = embedding_service
        self.reranker = reranker
        self.query_processor = QueryProcessor(embedding_service)
    
    async def retrieve(
        self,
        query: str,
        tenant_id: str,
        knowledge_base_id: Optional[str] = None,
        top_k: int = 5,
    ) -> RAGResult:
        # 1. Process query
        processed = await self.query_processor.process(query)
        
        # 2. Vector search
        candidates = await self.vector_repo.search(
            query_embedding=processed.embedding,
            params=SearchParams(
                tenant_id=tenant_id,
                knowledge_base_id=knowledge_base_id,
                top_k=20,
            ),
        )
        
        if not candidates:
            return RAGResult(chunks=[], context_text="", total_tokens=0)
        
        # 3. Rerank (or fall back to vector-similarity order, converting
        # ChunkMatch results into RankedChunk so the return type is consistent)
        if self.reranker:
            ranked = await self.reranker.rerank(query, candidates, top_k)
        else:
            ranked = [
                RankedChunk(
                    chunk_id=c.chunk_id,
                    content=c.content,
                    relevance_score=c.similarity,
                    filename=c.filename,
                )
                for c in candidates[:top_k]
            ]
        
        # 4. Assemble context
        context_text = self._assemble_context(ranked)
        total_tokens = len(context_text) // 4  # rough estimate: ~4 chars per token
        
        return RAGResult(
            chunks=ranked,
            context_text=context_text,
            total_tokens=total_tokens,
        )
    
    def _assemble_context(self, chunks: List[RankedChunk]) -> str:
        parts = []
        for chunk in chunks:
            source = chunk.filename or "Knowledge Base"
            parts.append(f"[{source}]\n{chunk.content}")
        return "\n\n---\n\n".join(parts)

Section 53: Context Injection

53.1 Injecting Context into Prompts

# services/agent-service/prompts/context_injection.py
from typing import List

# Note: RankedChunk is defined in the kb-service reranker module (Section 52.3)

class ContextInjector:
    TEMPLATE = """
<knowledge_base>
Use this information to answer questions:

{context}

RULES:
- Only use information from above for business questions
- If not found, say "I don't have that information"
- Speak naturally - don't mention "knowledge base" to caller
- Keep answers concise for voice
</knowledge_base>
"""
    
    def inject(self, base_prompt: str, chunks: List[RankedChunk]) -> str:
        if not chunks:
            context_section = "<knowledge_base>No relevant information found.</knowledge_base>"
        else:
            context = "\n\n".join([
                f"[{c.filename or 'KB'}]\n{c.content}" for c in chunks
            ])
            context_section = self.TEMPLATE.format(context=context)
        
        return base_prompt + "\n\n" + context_section

53.2 Voice Pipeline with RAG

# services/agent-service/pipeline/rag_integration.py

class VoicePipelineWithRAG:
    def __init__(self, rag_service, llm_client, context_injector, base_prompt):
        self.rag_service = rag_service
        self.llm_client = llm_client
        self.context_injector = context_injector
        self.base_prompt = base_prompt
    
    async def generate_response(
        self,
        user_message: str,
        tenant_id: str,
        knowledge_base_id: str,
        conversation_history: list,
    ) -> str:
        # 1. Retrieve context
        rag_result = await self.rag_service.retrieve(
            query=user_message,
            tenant_id=tenant_id,
            knowledge_base_id=knowledge_base_id,
        )
        
        # 2. Inject into prompt
        enhanced_prompt = self.context_injector.inject(
            self.base_prompt,
            rag_result.chunks,
        )
        
        # 3. Generate response
        messages = conversation_history + [{"role": "user", "content": user_message}]
        
        response = ""
        async for chunk in self.llm_client.generate_streaming(
            system_prompt=enhanced_prompt,
            messages=messages,
        ):
            response += chunk.text
        
        return response

53.3 Caching RAG Results

# services/kb-service/rag/cache.py
import hashlib
import json
import redis.asyncio as redis

class RAGCache:
    def __init__(self, redis_client, ttl_seconds=3600):
        self.redis = redis_client
        self.ttl = ttl_seconds
    
    def _key(self, tenant_id: str, kb_id: str, query: str) -> str:
        h = hashlib.md5(query.lower().encode()).hexdigest()[:16]
        return f"rag:{tenant_id}:{kb_id}:{h}"
    
    async def get(self, tenant_id, kb_id, query):
        data = await self.redis.get(self._key(tenant_id, kb_id, query))
        return json.loads(data) if data else None
    
    async def set(self, tenant_id, kb_id, query, result):
        await self.redis.setex(
            self._key(tenant_id, kb_id, query),
            self.ttl,
            json.dumps(result),
        )
    
    async def invalidate(self, tenant_id, kb_id):
        # KEYS blocks Redis on large keyspaces; prefer scan_iter in production
        keys = await self.redis.keys(f"rag:{tenant_id}:{kb_id}:*")
        if keys:
            await self.redis.delete(*keys)

53.4 Handling Edge Cases

# No results
if not rag_result.chunks:
    return "I don't have specific information about that. Would you like me to transfer you to someone who can help?"

# Low confidence
if max(c.relevance_score for c in chunks) < 0.5:
    return "I found some information but I'm not certain it answers your question directly..."

# Multiple conflicting sources
if detect_conflicts(chunks):
    return "I found different information from different sources. Let me connect you with someone who can clarify."

Part 8 Complete Summary

| Sub-Part | Sections | Key Topics |
|---|---|---|
| 8A | 48-49 | Document parsing, chunking, ingestion |
| 8B | 50-51 | Embeddings, pgvector, similarity search |
| 8C | 52-53 | RAG pipeline, reranking, context injection |

RAG Pipeline Summary:
  1. Parse documents into text
  2. Chunk into ~512 token segments
  3. Embed chunks using text-embedding-3-small
  4. Store in pgvector
  5. Search with vector similarity
  6. Rerank for better relevance
  7. Inject context into LLM prompt
  8. Generate grounded response

What’s Next

Part 9: Testing & Deployment will cover:
  • Unit and integration testing
  • End-to-end voice testing
  • CI/CD pipelines
  • Production deployment

End of Part 8C

Junior Developer PRD — Part 9A: Unit Testing & Integration Testing

Document Version: 1.0
Last Updated: January 25, 2026
Part: 9A of 10 (Sub-part 1 of 3)
Sections: 54-55
Audience: Junior developers with no prior context
Estimated Reading Time: 25 minutes

How to Use This Document

This is Part 9A—the first of three sub-parts covering Testing & Quality Assurance:
  • Part 9A (this document): Unit Testing & Integration Testing
  • Part 9B: End-to-End Testing & Voice Testing
  • Part 9C: Performance Testing & CI/CD
Prerequisites: Parts 1-8 of the PRD series.

Section 54: Unit Testing Fundamentals

54.1 Why Testing Matters

Testing isn’t optional—it’s how we ensure the system works correctly and continues to work as we make changes.

The Cost of Bugs

| Stage Found | Cost to Fix | Example |
|---|---|---|
| During coding | 1x | Developer catches typo |
| Unit test | 2x | Test fails, fix immediately |
| Integration test | 5x | Multiple components involved |
| QA/Staging | 10x | Full deployment needed |
| Production | 100x | Customer impact, urgent fix |

Testing Philosophy for Voice AI

Voice AI has unique testing challenges:
  1. Real-time constraints: Latency matters
  2. Non-deterministic: AI responses vary
  3. External dependencies: STT, TTS, LLM APIs
  4. Audio processing: Binary data, timing
  5. State management: Conversation context
Our approach:
  • Unit tests: Fast, isolated, deterministic
  • Integration tests: Component interactions
  • E2E tests: Full pipeline with mocks
  • Voice tests: Audio-specific scenarios

54.2 Testing Stack

| Tool | Purpose | Why We Chose It |
|------|---------|-----------------|
| pytest | Test framework | Industry standard, powerful fixtures |
| pytest-asyncio | Async testing | Our code is async |
| pytest-cov | Coverage | Track test completeness |
| factory_boy | Test data | Generate realistic fixtures |
| faker | Fake data | Random but realistic values |
| respx | HTTP mocking | Mock external APIs |
| pytest-mock | Mocking | Flexible mock/patch |
| testcontainers | Database testing | Real PostgreSQL in Docker |

Installation

# requirements-test.txt
pytest==8.0.0
pytest-asyncio==0.23.3
pytest-cov==4.1.0
pytest-mock==3.12.0
pytest-timeout==2.2.0
factory-boy==3.3.0
faker==22.0.0
respx==0.20.2
httpx==0.26.0
testcontainers==3.7.1
freezegun==1.2.2

54.3 Project Test Structure

services/
├── api-gateway/
│   ├── src/
│   └── tests/
│       ├── __init__.py
│       ├── conftest.py          # Shared fixtures
│       ├── unit/
│       │   ├── __init__.py
│       │   ├── test_auth.py
│       │   ├── test_routes.py
│       │   └── test_middleware.py
│       ├── integration/
│       │   ├── __init__.py
│       │   ├── test_database.py
│       │   └── test_redis.py
│       └── fixtures/
│           ├── __init__.py
│           ├── factories.py
│           └── sample_data.py

├── agent-service/
│   ├── src/
│   └── tests/
│       ├── conftest.py
│       ├── unit/
│       │   ├── test_vad.py
│       │   ├── test_stt.py
│       │   ├── test_llm.py
│       │   ├── test_tts.py
│       │   └── test_pipeline.py
│       ├── integration/
│       │   ├── test_deepgram.py
│       │   ├── test_anthropic.py
│       │   └── test_chatterbox.py
│       └── fixtures/
│           ├── audio_samples/
│           │   ├── hello.wav
│           │   ├── silence.wav
│           │   └── noise.wav
│           └── factories.py

└── kb-service/
    ├── src/
    └── tests/
        ├── conftest.py
        ├── unit/
        │   ├── test_chunking.py
        │   ├── test_embedding.py
        │   └── test_rag.py
        ├── integration/
        │   ├── test_pgvector.py
        │   └── test_document_processing.py
        └── fixtures/
            ├── documents/
            │   ├── sample.pdf
            │   ├── sample.docx
            │   └── sample.txt
            └── factories.py

54.4 pytest Configuration

# pytest.ini
[pytest]
asyncio_mode = auto
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = 
    -v
    --tb=short
    --strict-markers
    -ra
markers =
    unit: Unit tests (fast, isolated)
    integration: Integration tests (may need external services)
    slow: Slow tests (> 1 second)
    voice: Voice-specific tests
    requires_gpu: Tests requiring GPU
filterwarnings =
    ignore::DeprecationWarning
timeout = 30

# pyproject.toml (alternative)
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
markers = [
    "unit: Unit tests",
    "integration: Integration tests", 
    "slow: Slow tests",
    "voice: Voice-specific tests",
]

[tool.coverage.run]
source = ["src"]
branch = true
omit = ["*/tests/*", "*/__init__.py"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]
fail_under = 80

54.5 Shared Fixtures (conftest.py)

"""
Shared test fixtures for all tests.

File: services/agent-service/tests/conftest.py
"""
import pytest
import asyncio
from typing import AsyncGenerator, Generator
from unittest.mock import AsyncMock, MagicMock
import numpy as np
from datetime import datetime

# ============================================================
# ASYNC EVENT LOOP
# ============================================================

@pytest.fixture(scope="session")
def event_loop() -> Generator[asyncio.AbstractEventLoop, None, None]:
    """
    Create a session-scoped event loop for async tests.

    Note: pytest-asyncio 0.23+ deprecates overriding this fixture;
    if you see deprecation warnings, prefer its loop-scope settings.
    """
    loop = asyncio.new_event_loop()
    yield loop
    loop.close()


# ============================================================
# CONFIGURATION FIXTURES
# ============================================================

@pytest.fixture
def deepgram_config():
    """Deepgram configuration for testing."""
    from config.deepgram import DeepgramConfig, DeepgramModel
    
    return DeepgramConfig(
        api_key="test-api-key",
        model=DeepgramModel.NOVA_2,
        language="en-US",
        sample_rate=16000,
        channels=1,
        punctuate=True,
        interim_results=True,
    )


@pytest.fixture
def vad_config():
    """VAD configuration for testing."""
    from config.vad import VADConfig
    
    return VADConfig(
        threshold=0.5,
        min_speech_duration_ms=250,
        min_silence_duration_ms=300,
        frame_duration_ms=30,
        sample_rate=16000,
    )


@pytest.fixture
def llm_config():
    """LLM configuration for testing."""
    from config.llm import LLMConfig, ClaudeModel
    
    return LLMConfig(
        api_key="test-anthropic-key",
        model=ClaudeModel.SONNET,
        max_tokens=1024,
        temperature=0.7,
        stream=True,
    )


@pytest.fixture
def tts_config():
    """TTS configuration for testing."""
    from config.tts import ChatterboxConfig, TTSVoice
    
    return ChatterboxConfig(
        api_key="test-runpod-key",
        endpoint_url="https://test.runpod.ai/v2/test/runsync",
        voice=TTSVoice.PROFESSIONAL_FEMALE,
        speed=1.0,
        sample_rate=24000,
    )


# ============================================================
# AUDIO FIXTURES
# ============================================================

@pytest.fixture
def silence_audio() -> np.ndarray:
    """1 second of silence at 16kHz."""
    return np.zeros(16000, dtype=np.int16)


@pytest.fixture
def speech_audio() -> np.ndarray:
    """
    Simulated speech audio (sine wave with envelope).
    Not real speech, but triggers VAD.
    """
    sample_rate = 16000
    duration = 1.0
    t = np.linspace(0, duration, int(sample_rate * duration))
    
    # 200Hz tone with amplitude envelope
    frequency = 200
    envelope = np.sin(np.pi * t / duration)  # Fade in/out
    audio = envelope * np.sin(2 * np.pi * frequency * t)
    
    # Convert to int16
    audio = (audio * 32767 * 0.5).astype(np.int16)
    return audio


@pytest.fixture
def noise_audio() -> np.ndarray:
    """Random noise (should not trigger VAD)."""
    return (np.random.randn(16000) * 1000).astype(np.int16)


@pytest.fixture
def audio_frame_30ms() -> np.ndarray:
    """30ms audio frame (480 samples at 16kHz)."""
    return np.zeros(480, dtype=np.int16)


@pytest.fixture
def speech_frames(speech_audio) -> list:
    """Speech audio split into 30ms frames."""
    frame_size = 480  # 30ms at 16kHz
    frames = []
    for i in range(0, len(speech_audio), frame_size):
        frame = speech_audio[i:i + frame_size]
        if len(frame) == frame_size:
            frames.append(frame)
    return frames


# ============================================================
# MOCK FIXTURES
# ============================================================

@pytest.fixture
def mock_deepgram_client():
    """Mocked Deepgram STT client."""
    client = AsyncMock()
    client.connect = AsyncMock()
    client.send_audio = AsyncMock()
    client.close = AsyncMock()
    client.is_connected = True
    return client


@pytest.fixture
def mock_anthropic_client():
    """Mocked Anthropic Claude client."""
    client = AsyncMock()
    
    # messages.stream(...) is entered directly with "async with", so the
    # mock must return the context manager itself, not a coroutine.
    def mock_stream(*args, **kwargs):
        class MockStream:
            async def __aenter__(self):
                return self
            
            async def __aexit__(self, *args):
                pass
            
            async def __aiter__(self):
                # Yield mock events
                yield MagicMock(type="content_block_delta", delta=MagicMock(text="Hello"))
                yield MagicMock(type="content_block_delta", delta=MagicMock(text=" there!"))
                yield MagicMock(type="message_stop")
        
        return MockStream()
    
    client.messages.stream = mock_stream
    return client


@pytest.fixture
def mock_tts_client():
    """Mocked TTS client."""
    import base64
    
    client = AsyncMock()
    
    # Create fake WAV data
    fake_audio = np.zeros(24000, dtype=np.int16).tobytes()
    fake_base64 = base64.b64encode(fake_audio).decode()
    
    client.synthesize = AsyncMock(return_value={
        "audio_base64": fake_base64,
        "duration_ms": 1000,
        "sample_rate": 24000,
    })
    
    return client


@pytest.fixture
def mock_redis():
    """Mocked Redis client."""
    redis = AsyncMock()
    redis.get = AsyncMock(return_value=None)
    redis.set = AsyncMock()
    redis.setex = AsyncMock()
    redis.delete = AsyncMock()
    redis.hget = AsyncMock(return_value=None)
    redis.hset = AsyncMock()
    redis.hgetall = AsyncMock(return_value={})
    redis.rpush = AsyncMock()
    redis.lrange = AsyncMock(return_value=[])
    redis.sadd = AsyncMock()
    redis.smembers = AsyncMock(return_value=set())
    redis.expire = AsyncMock()
    redis.hincrby = AsyncMock()
    return redis
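
A quick usage sketch for fixtures like `mock_redis`: AsyncMock records awaits, so a test can assert both the return path and the exact call. The `get_or_default` helper here is illustrative, not part of the codebase:

```python
import asyncio
from unittest.mock import AsyncMock

redis = AsyncMock()
redis.get = AsyncMock(return_value=None)  # simulate a cache miss

async def get_or_default(r, key, default="fallback"):
    # Hypothetical helper: fall back when the key is absent.
    val = await r.get(key)
    return val if val is not None else default

assert asyncio.run(get_or_default(redis, "missing")) == "fallback"
redis.get.assert_awaited_once_with("missing")
```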


# ============================================================
# DATABASE FIXTURES
# ============================================================

@pytest.fixture
def mock_db_pool():
    """Mocked database connection pool."""
    pool = AsyncMock()
    
    # Mock connection context manager
    conn = AsyncMock()
    conn.fetch = AsyncMock(return_value=[])
    conn.fetchrow = AsyncMock(return_value=None)
    conn.execute = AsyncMock(return_value="OK")
    
    # acquire() is used as "async with pool.acquire() as conn", so it must
    # return an async context manager that yields the mocked connection.
    pool.acquire = MagicMock(return_value=AsyncContextManager(conn))
    return pool


class AsyncContextManager:
    """Helper for async context manager mocking."""
    def __init__(self, return_value):
        self.return_value = return_value
    
    async def __aenter__(self):
        return self.return_value
    
    async def __aexit__(self, *args):
        pass


# ============================================================
# CALL CONTEXT FIXTURES
# ============================================================

@pytest.fixture
def sample_call_context():
    """Sample call context for testing."""
    from state.models import CallContext, CallDirection
    
    return CallContext(
        call_id="test-call-123",
        tenant_id="test-tenant-456",
        agency_id="test-agency-789",
        direction=CallDirection.INBOUND,
        caller_phone="+15551234567",
        caller_name="John Doe",
        started_at=datetime.utcnow(),
        goto_call_id="goto-call-abc",
        livekit_room_name="room-xyz",
    )


@pytest.fixture
def sample_call_state():
    """Sample call state for testing."""
    from state.models import CallState, PipelineState
    
    return CallState(
        call_id="test-call-123",
        pipeline_state=PipelineState.LISTENING,
        is_speaking=False,
        is_processing=False,
        tts_playing=False,
        turn_count=0,
    )


# ============================================================
# TRANSCRIPT FIXTURES
# ============================================================

@pytest.fixture
def sample_transcript_result():
    """Sample Deepgram transcript result."""
    from integrations.deepgram_stt import TranscriptResult
    
    return TranscriptResult(
        text="What are your business hours?",
        is_final=True,
        speech_final=True,
        confidence=0.95,
        words=[
            {"word": "What", "start": 0.0, "end": 0.2, "confidence": 0.98},
            {"word": "are", "start": 0.2, "end": 0.3, "confidence": 0.97},
            {"word": "your", "start": 0.3, "end": 0.5, "confidence": 0.96},
            {"word": "business", "start": 0.5, "end": 0.8, "confidence": 0.95},
            {"word": "hours", "start": 0.8, "end": 1.0, "confidence": 0.94},
        ],
        start=0.0,
        duration=1.0,
    )


# ============================================================
# RAG FIXTURES
# ============================================================

@pytest.fixture
def sample_chunks():
    """Sample RAG chunks for testing."""
    from rag.reranker import RankedChunk
    
    return [
        RankedChunk(
            chunk_id="chunk-1",
            content="Our business hours are Monday through Friday, 9 AM to 5 PM.",
            relevance_score=0.92,
            original_rank=0,
            section_title="Hours",
            filename="business_info.pdf",
        ),
        RankedChunk(
            chunk_id="chunk-2",
            content="We are closed on weekends and major holidays.",
            relevance_score=0.78,
            original_rank=1,
            section_title="Hours",
            filename="business_info.pdf",
        ),
        RankedChunk(
            chunk_id="chunk-3",
            content="Saturday hours are 9 AM to 2 PM by appointment only.",
            relevance_score=0.65,
            original_rank=2,
            section_title="Weekend Hours",
            filename="faq.txt",
        ),
    ]

54.6 Writing Effective Unit Tests

Test Structure: Arrange-Act-Assert (AAA)

"""
Example unit tests with AAA pattern.

File: services/agent-service/tests/unit/test_vad.py
"""
import pytest
import numpy as np
from pipeline.vad import SileroVAD, VADEvent


class TestSileroVAD:
    """Unit tests for Voice Activity Detection."""
    
    # --------------------------------------------------------
    # INITIALIZATION TESTS
    # --------------------------------------------------------
    
    def test_init_creates_vad_with_config(self, vad_config):
        """Test VAD initializes with provided config."""
        # Arrange
        config = vad_config
        
        # Act
        vad = SileroVAD(config)
        
        # Assert
        assert vad.config == config
        assert vad.config.threshold == 0.5
        assert vad.config.sample_rate == 16000
        assert not vad.is_speaking
    
    def test_init_uses_default_config(self):
        """Test VAD uses defaults when no config provided."""
        # Arrange & Act
        from config.vad import VADConfig
        vad = SileroVAD(VADConfig())
        
        # Assert
        assert vad.config.threshold == 0.5
        assert vad.config.min_speech_duration_ms == 250
    
    # --------------------------------------------------------
    # MODEL LOADING TESTS
    # --------------------------------------------------------
    
    def test_load_model_success(self, vad_config):
        """Test VAD model loads successfully."""
        # Arrange
        vad = SileroVAD(vad_config)
        
        # Act
        vad.load_model()
        
        # Assert
        assert vad._model_loaded is True
        assert vad._model is not None
    
    def test_process_frame_before_load_raises_error(self, vad_config, audio_frame_30ms):
        """Test processing before loading model raises error."""
        # Arrange
        vad = SileroVAD(vad_config)
        
        # Act & Assert
        with pytest.raises(RuntimeError, match="Call load_model"):
            vad.process_frame(audio_frame_30ms)
    
    # --------------------------------------------------------
    # SPEECH DETECTION TESTS
    # --------------------------------------------------------
    
    @pytest.mark.slow
    def test_detect_speech_start(self, vad_config, speech_frames):
        """Test VAD detects start of speech."""
        # Arrange
        vad = SileroVAD(vad_config)
        vad.load_model()
        events = []
        
        # Act
        for frame in speech_frames[:20]:  # First 600ms
            event = vad.process_frame(frame)
            if event:
                events.append(event)
        
        # Assert
        speech_starts = [e for e in events if e.event_type == "speech_start"]
        assert len(speech_starts) >= 1
        assert speech_starts[0].timestamp_ms > 0
    
    @pytest.mark.slow
    def test_detect_speech_end_after_silence(self, vad_config, speech_frames, silence_audio):
        """Test VAD detects end of speech when silence follows."""
        # Arrange
        vad = SileroVAD(vad_config)
        vad.load_model()
        
        # Process speech frames
        for frame in speech_frames:
            vad.process_frame(frame)
        
        # Act - split silence into exact 480-sample (30ms) frames.
        # Note: np.array_split would yield 484/485-sample sections here,
        # which the frame-size guard below would silently skip.
        frame_size = 480
        silence_frames = [
            silence_audio[i:i + frame_size]
            for i in range(0, len(silence_audio), frame_size)
        ]
        events = []
        for frame in silence_frames[:20]:  # 600ms of silence
            if len(frame) == frame_size:
                event = vad.process_frame(frame.astype(np.int16))
                if event:
                    events.append(event)
        
        # Assert
        speech_ends = [e for e in events if e.event_type == "speech_end"]
        assert len(speech_ends) >= 1
    
    def test_silence_does_not_trigger_speech(self, vad_config, silence_audio):
        """Test silence does not trigger speech detection."""
        # Arrange
        vad = SileroVAD(vad_config)
        vad.load_model()
        
        # Act - split into exact 480-sample (30ms) frames
        frame_size = 480
        frames = [
            silence_audio[i:i + frame_size]
            for i in range(0, len(silence_audio), frame_size)
        ]
        events = []
        for frame in frames:
            if len(frame) == frame_size:
                event = vad.process_frame(frame.astype(np.int16))
                if event:
                    events.append(event)
        
        # Assert
        assert len(events) == 0
        assert not vad.is_speaking
    
    # --------------------------------------------------------
    # RESET TESTS
    # --------------------------------------------------------
    
    def test_reset_clears_state(self, vad_config, speech_frames):
        """Test reset clears VAD state."""
        # Arrange
        vad = SileroVAD(vad_config)
        vad.load_model()
        
        # Process some speech to set state
        for frame in speech_frames[:10]:
            vad.process_frame(frame)
        
        # Act
        vad.reset()
        
        # Assert
        assert not vad.is_speaking
        assert vad._speech_start_time is None
        assert vad._silence_start_time is None
    
    # --------------------------------------------------------
    # THRESHOLD TESTS
    # --------------------------------------------------------
    
    @pytest.mark.parametrize("threshold,expected_sensitivity", [
        (0.3, "high"),    # Low threshold = high sensitivity
        (0.5, "medium"),
        (0.7, "low"),     # High threshold = low sensitivity
    ])
    def test_threshold_affects_sensitivity(self, threshold, expected_sensitivity):
        """Test different thresholds affect detection sensitivity."""
        # Arrange
        from config.vad import VADConfig
        config = VADConfig(threshold=threshold)
        vad = SileroVAD(config)
        
        # Assert
        assert vad.config.threshold == threshold
        # Note: Actual sensitivity testing would require more complex audio

Testing Async Code

"""
Testing async functions.

File: services/agent-service/tests/unit/test_stt.py
"""
import asyncio
import pytest
from unittest.mock import AsyncMock, patch


class TestDeepgramSTTClient:
    """Unit tests for Deepgram STT client."""
    
    @pytest.mark.asyncio
    async def test_connect_success(self, deepgram_config):
        """Test successful connection to Deepgram."""
        # Arrange
        from integrations.deepgram_stt import DeepgramSTTClient
        
        with patch('websockets.connect', new_callable=AsyncMock) as mock_connect:
            mock_ws = AsyncMock()
            mock_ws.__aiter__.return_value = []  # no incoming messages
            mock_connect.return_value = mock_ws
            
            client = DeepgramSTTClient(deepgram_config)
            
            # Act
            await client.connect()
            
            # Assert
            assert client.is_connected
            mock_connect.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_send_audio_when_connected(self, deepgram_config):
        """Test sending audio when connected."""
        # Arrange
        from integrations.deepgram_stt import DeepgramSTTClient
        
        with patch('websockets.connect', new_callable=AsyncMock) as mock_connect:
            mock_ws = AsyncMock()
            mock_ws.__aiter__.return_value = []  # no incoming messages
            mock_connect.return_value = mock_ws
            
            client = DeepgramSTTClient(deepgram_config)
            await client.connect()
            
            audio_data = b'\x00' * 1024
            
            # Act
            await client.send_audio(audio_data)
            
            # Assert
            mock_ws.send.assert_called_once_with(audio_data)
    
    @pytest.mark.asyncio
    async def test_send_audio_when_disconnected_logs_warning(
        self, deepgram_config, caplog
    ):
        """Test sending audio when disconnected logs warning."""
        # Arrange
        from integrations.deepgram_stt import DeepgramSTTClient
        
        client = DeepgramSTTClient(deepgram_config)
        audio_data = b'\x00' * 1024
        
        # Act
        await client.send_audio(audio_data)
        
        # Assert
        assert "not connected" in caplog.text.lower()
    
    @pytest.mark.asyncio
    async def test_transcript_callback_called(self, deepgram_config):
        """Test transcript callback is called with results."""
        # Arrange
        from integrations.deepgram_stt import DeepgramSTTClient
        import json
        
        callback = AsyncMock()
        
        # Mock WebSocket message
        mock_message = json.dumps({
            "type": "Results",
            "is_final": True,
            "speech_final": True,
            "channel": {
                "alternatives": [{
                    "transcript": "Hello world",
                    "confidence": 0.95,
                    "words": [],
                }]
            },
            "start": 0.0,
            "duration": 1.0,
        })
        
        with patch('websockets.connect', new_callable=AsyncMock) as mock_connect:
            mock_ws = AsyncMock()
            
            # Make async iteration yield our message
            mock_ws.__aiter__.return_value = [mock_message]
            mock_connect.return_value = mock_ws
            
            client = DeepgramSTTClient(
                deepgram_config,
                on_transcript=callback,
            )
            
            # Act
            await client.connect()
            # Give time for receive loop
            await asyncio.sleep(0.1)
            
            # Assert
            callback.assert_called_once()
            result = callback.call_args[0][0]
            assert result.text == "Hello world"
            assert result.is_final is True

54.7 Test Factories

"""
Test data factories using factory_boy.

File: services/agent-service/tests/fixtures/factories.py
"""
import factory
from factory import fuzzy
from datetime import datetime, timedelta
import uuid


class TenantFactory(factory.Factory):
    """Factory for creating test tenants."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    name = factory.Faker('company')
    slug = factory.LazyAttribute(lambda o: o.name.lower().replace(' ', '-'))
    plan = fuzzy.FuzzyChoice(['free', 'starter', 'professional', 'enterprise'])
    is_active = True
    created_at = factory.LazyFunction(datetime.utcnow)


class AgencyFactory(factory.Factory):
    """Factory for creating test agencies."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    tenant_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    name = factory.Faker('company')
    phone_number = factory.Faker('phone_number')
    timezone = 'America/New_York'
    is_active = True


class CallFactory(factory.Factory):
    """Factory for creating test calls."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    tenant_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    agency_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    direction = fuzzy.FuzzyChoice(['inbound', 'outbound'])
    caller_phone = factory.Faker('phone_number')
    caller_name = factory.Faker('name')
    status = 'active'
    started_at = factory.LazyFunction(datetime.utcnow)
    ended_at = None
    duration_seconds = None
    
    class Params:
        completed = factory.Trait(
            status='completed',
            ended_at=factory.LazyAttribute(
                lambda o: o.started_at + timedelta(minutes=5)
            ),
            duration_seconds=300,
        )


class ConversationTurnFactory(factory.Factory):
    """Factory for creating conversation turns."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    call_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    role = fuzzy.FuzzyChoice(['user', 'assistant'])
    content = factory.Faker('sentence')
    timestamp = factory.LazyFunction(datetime.utcnow)
    duration_ms = fuzzy.FuzzyFloat(500, 3000)


class DocumentFactory(factory.Factory):
    """Factory for creating test documents."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    tenant_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    knowledge_base_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    filename = factory.Faker('file_name', extension='pdf')
    file_type = 'pdf'
    file_size_bytes = fuzzy.FuzzyInteger(1000, 1000000)
    status = 'ready'
    title = factory.Faker('sentence', nb_words=5)
    created_at = factory.LazyFunction(datetime.utcnow)


class ChunkFactory(factory.Factory):
    """Factory for creating test chunks."""
    
    class Meta:
        model = dict
    
    id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    document_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    tenant_id = factory.LazyFunction(lambda: str(uuid.uuid4()))
    content = factory.Faker('paragraph')
    chunk_index = factory.Sequence(lambda n: n)
    token_count = fuzzy.FuzzyInteger(100, 500)
    section_title = factory.Faker('sentence', nb_words=3)
    embedding = factory.LazyFunction(
        lambda: [0.0] * 1536  # Mock embedding
    )


# Usage examples:
# tenant = TenantFactory()
# call = CallFactory(tenant_id=tenant['id'])
# completed_call = CallFactory(completed=True)
# turns = ConversationTurnFactory.create_batch(5, call_id=call['id'])

54.8 Mocking External Services

"""
Mocking external API calls.

File: services/agent-service/tests/unit/test_llm.py
"""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
import anthropic


class TestClaudeLLM:
    """Unit tests for Claude LLM client."""
    
    @pytest.mark.asyncio
    async def test_generate_streaming_yields_chunks(self, llm_config):
        """Test streaming generation yields text chunks."""
        # Arrange
        from integrations.claude_llm import ClaudeLLM, StreamingChunk
        
        # Mock the Anthropic client
        mock_client = MagicMock()
        
        # Create mock stream; stream(...) is entered with "async with", so
        # return the context manager directly (not a coroutine) and accept
        # whatever arguments the client passes.
        def mock_stream_context(*args, **kwargs):
            class MockStream:
                async def __aenter__(self):
                    return self
                
                async def __aexit__(self, *args):
                    pass
                
                async def __aiter__(self):
                    # Yield content deltas
                    yield MagicMock(
                        type="content_block_delta",
                        delta=MagicMock(text="Hello")
                    )
                    yield MagicMock(
                        type="content_block_delta",
                        delta=MagicMock(text=" there")
                    )
                    yield MagicMock(
                        type="content_block_delta",
                        delta=MagicMock(text="!")
                    )
                    yield MagicMock(type="message_stop")
            
            return MockStream()
        
        mock_client.messages.stream = mock_stream_context
        
        with patch.object(anthropic, 'AsyncAnthropic', return_value=mock_client):
            llm = ClaudeLLM(llm_config)
            llm._client = mock_client
            
            # Act
            chunks = []
            async for chunk in llm.generate_streaming(
                system_prompt="You are helpful.",
                messages=[{"role": "user", "content": "Hi"}],
            ):
                chunks.append(chunk)
            
            # Assert
            assert len(chunks) == 4  # 3 text + 1 stop
            assert chunks[0].text == "Hello"
            assert chunks[1].text == " there"
            assert chunks[2].text == "!"
            assert chunks[3].is_complete is True
    
    @pytest.mark.asyncio
    async def test_timeout_returns_error_chunk(self, llm_config):
        """Test timeout returns error message chunk."""
        # Arrange
        from integrations.claude_llm import ClaudeLLM
        
        mock_client = MagicMock()
        # stream(...) is called synchronously before "async with", so raise
        # at call time with MagicMock rather than AsyncMock (an AsyncMock
        # would defer the error to an await that never happens).
        mock_client.messages.stream = MagicMock(
            side_effect=anthropic.APITimeoutError(request=MagicMock())
        )
        
        with patch.object(anthropic, 'AsyncAnthropic', return_value=mock_client):
            llm = ClaudeLLM(llm_config)
            llm._client = mock_client
            
            # Act
            chunks = []
            async for chunk in llm.generate_streaming(
                system_prompt="Test",
                messages=[{"role": "user", "content": "Test"}],
            ):
                chunks.append(chunk)
            
            # Assert
            assert len(chunks) == 1
            assert chunks[0].is_complete is True
            assert "trouble" in chunks[0].text.lower()


class TestTTSClient:
    """Unit tests for TTS client."""
    
    @pytest.mark.asyncio
    async def test_synthesize_returns_audio(self, tts_config):
        """Test synthesize returns audio chunk."""
        # Arrange
        from integrations.chatterbox_tts import ChatterboxTTSClient
        import base64
        
        # Create fake response
        fake_audio = b'\x00' * 48000  # 1 second at 24kHz
        fake_response = {
            "output": {
                "audio_base64": base64.b64encode(fake_audio).decode(),
                "duration_ms": 1000,
                "sample_rate": 24000,
            }
        }
        
        with patch('aiohttp.ClientSession') as mock_session_class:
            mock_session = AsyncMock()
            mock_response = AsyncMock()
            mock_response.json = AsyncMock(return_value=fake_response)
            mock_response.raise_for_status = MagicMock()
            
            mock_session.post = MagicMock(
                return_value=AsyncContextManager(mock_response)
            )
            mock_session_class.return_value = mock_session
            
            client = ChatterboxTTSClient(tts_config)
            client._session = mock_session
            
            # Act
            result = await client.synthesize("Hello world")
            
            # Assert
            assert result.audio_data == fake_audio
            assert result.duration_ms == 1000
            assert result.sample_rate == 24000
    
    @pytest.mark.asyncio
    async def test_synthesize_streaming_splits_sentences(self, tts_config):
        """Test streaming synthesize splits text into sentences."""
        # Arrange
        from integrations.chatterbox_tts import ChatterboxTTSClient
        
        client = ChatterboxTTSClient(tts_config)
        
        # Mock synthesize to track calls
        call_texts = []
        
        async def mock_synthesize(text):
            call_texts.append(text)
            from integrations.chatterbox_tts import TTSAudioChunk
            return TTSAudioChunk(
                audio_data=b'\x00' * 1000,
                sample_rate=24000,
                duration_ms=100,
                is_final=False,
                text=text,
            )
        
        client.synthesize = mock_synthesize
        
        # Act
        chunks = []
        async for chunk in client.synthesize_streaming(
            "Hello there. How are you? I'm doing well."
        ):
            chunks.append(chunk)
        
        # Assert
        assert len(chunks) == 3
        assert "Hello there." in call_texts
        assert "How are you?" in call_texts
        assert "I'm doing well." in call_texts


class AsyncContextManager:
    """Helper for mocking async context managers."""
    def __init__(self, return_value):
        self.return_value = return_value
    
    async def __aenter__(self):
        return self.return_value
    
    async def __aexit__(self, *args):
        pass

54.9 Running Unit Tests

# Run all unit tests
pytest tests/unit/ -v

# Run with coverage
pytest tests/unit/ --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_vad.py -v

# Run specific test class
pytest tests/unit/test_vad.py::TestSileroVAD -v

# Run specific test
pytest tests/unit/test_vad.py::TestSileroVAD::test_detect_speech_start -v

# Run tests matching pattern
pytest tests/unit/ -k "speech" -v

# Run only fast tests (exclude slow marker)
pytest tests/unit/ -m "not slow" -v

# Run with parallel execution
pytest tests/unit/ -n auto

# Run with verbose output and stop on first failure
pytest tests/unit/ -vvs -x

# Generate JUnit XML for CI
pytest tests/unit/ --junitxml=test-results.xml
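A few of the commands above assume extra setup: `-n auto` requires the pytest-xdist plugin, and custom markers like `slow`, `voice`, and `e2e` must be registered or pytest emits `PytestUnknownMarkWarning`. A minimal `conftest.py` sketch (marker names and descriptions are illustrative):

```python
# conftest.py (sketch): register the custom markers used in this part
# so filters like `-m "not slow"` run without unknown-marker warnings.
# (`-n auto` additionally requires the pytest-xdist plugin.)

def pytest_configure(config):
    config.addinivalue_line("markers", "slow: long-running tests excluded from fast runs")
    config.addinivalue_line("markers", "voice: audio/VAD pipeline tests")
    config.addinivalue_line("markers", "e2e: end-to-end tests against a running stack")
```

The same markers can alternatively be declared under `[tool.pytest.ini_options]` in `pyproject.toml`.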

Section 55: Integration Testing

55.1 What is Integration Testing?

Integration tests verify that components work together correctly. They test:
  • Database operations with real PostgreSQL
  • Redis operations with real Redis
  • API endpoints with real HTTP
  • Multiple services communicating

Unit vs Integration Tests

| Aspect       | Unit Tests            | Integration Tests    |
|--------------|-----------------------|----------------------|
| Scope        | Single function/class | Multiple components  |
| Dependencies | Mocked                | Real (or containers) |
| Speed        | Fast (ms)             | Slower (seconds)     |
| Isolation    | Complete              | Partial              |
| Flakiness    | Low                   | Higher               |
| Purpose      | Logic correctness     | System correctness   |

55.2 Database Integration Tests

"""
Database integration tests using testcontainers.

File: services/api-gateway/tests/integration/test_database.py
"""
import pytest
import asyncio
import asyncpg
from testcontainers.postgres import PostgresContainer


# ============================================================
# FIXTURES
# ============================================================

@pytest.fixture(scope="module")
def postgres_container():
    """Start PostgreSQL container for tests."""
    with PostgresContainer("postgres:15") as postgres:
        yield postgres


@pytest.fixture(scope="module")
def postgres_url(postgres_container):
    """Get PostgreSQL connection URL."""
    return postgres_container.get_connection_url().replace(
        "postgresql+psycopg2://",
        "postgresql://"
    )


@pytest.fixture
async def db_pool(postgres_url):
    """Create connection pool."""
    # Parse URL for asyncpg
    pool = await asyncpg.create_pool(
        postgres_url.replace("postgresql://", "postgres://"),
        min_size=1,
        max_size=5,
    )
    
    # Run migrations
    await run_migrations(pool)
    
    yield pool
    
    await pool.close()


async def run_migrations(pool):
    """Run database migrations for tests."""
    async with pool.acquire() as conn:
        # Enable pgvector
        await conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        
        # Create tables
        await conn.execute("""
            CREATE TABLE IF NOT EXISTS tenants (
                id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
                name VARCHAR(255) NOT NULL,
                slug VARCHAR(255) UNIQUE NOT NULL,
                plan VARCHAR(50) DEFAULT 'free',
                is_active BOOLEAN DEFAULT true,
                created_at TIMESTAMPTZ DEFAULT NOW()
            )
        """)
        
        await conn.execute("""
            CREATE TABLE IF NOT EXISTS agencies (
                id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
                tenant_id UUID NOT NULL REFERENCES tenants(id),
                name VARCHAR(255) NOT NULL,
                phone_number VARCHAR(50),
                is_active BOOLEAN DEFAULT true,
                created_at TIMESTAMPTZ DEFAULT NOW()
            )
        """)
        
        await conn.execute("""
            CREATE TABLE IF NOT EXISTS kb_chunks (
                id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
                tenant_id UUID NOT NULL,
                document_id UUID NOT NULL,
                content TEXT NOT NULL,
                embedding vector(1536),
                token_count INTEGER,
                created_at TIMESTAMPTZ DEFAULT NOW()
            )
        """)
        
        # Create vector index
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_chunks_embedding 
            ON kb_chunks USING ivfflat (embedding vector_cosine_ops)
            WITH (lists = 10)
        """)


# ============================================================
# TESTS
# ============================================================

class TestTenantRepository:
    """Integration tests for tenant repository."""
    
    @pytest.mark.asyncio
    async def test_create_tenant(self, db_pool):
        """Test creating a tenant in database."""
        # Arrange
        from repositories.tenant_repository import TenantRepository
        
        repo = TenantRepository(db_pool)
        
        # Act
        tenant_id = await repo.create(
            name="Test Company",
            slug="test-company",
            plan="professional",
        )
        
        # Assert
        assert tenant_id is not None
        
        # Verify in database
        async with db_pool.acquire() as conn:
            row = await conn.fetchrow(
                "SELECT * FROM tenants WHERE id = $1",
                tenant_id
            )
        
        assert row is not None
        assert row['name'] == "Test Company"
        assert row['slug'] == "test-company"
        assert row['plan'] == "professional"
    
    @pytest.mark.asyncio
    async def test_get_tenant_by_slug(self, db_pool):
        """Test retrieving tenant by slug."""
        # Arrange
        from repositories.tenant_repository import TenantRepository
        
        repo = TenantRepository(db_pool)
        
        # Create tenant first
        tenant_id = await repo.create(
            name="Slug Test",
            slug="slug-test",
        )
        
        # Act
        tenant = await repo.get_by_slug("slug-test")
        
        # Assert
        assert tenant is not None
        assert str(tenant['id']) == str(tenant_id)
        assert tenant['name'] == "Slug Test"
    
    @pytest.mark.asyncio
    async def test_update_tenant(self, db_pool):
        """Test updating tenant."""
        # Arrange
        from repositories.tenant_repository import TenantRepository
        
        repo = TenantRepository(db_pool)
        
        tenant_id = await repo.create(
            name="Original Name",
            slug="original-slug",
        )
        
        # Act
        await repo.update(
            tenant_id,
            name="Updated Name",
            plan="enterprise",
        )
        
        # Assert
        tenant = await repo.get_by_id(tenant_id)
        assert tenant['name'] == "Updated Name"
        assert tenant['plan'] == "enterprise"
    
    @pytest.mark.asyncio
    async def test_delete_tenant_soft_delete(self, db_pool):
        """Test soft delete sets is_active to false."""
        # Arrange
        from repositories.tenant_repository import TenantRepository
        
        repo = TenantRepository(db_pool)
        
        tenant_id = await repo.create(
            name="To Delete",
            slug="to-delete",
        )
        
        # Act
        await repo.delete(tenant_id)
        
        # Assert
        async with db_pool.acquire() as conn:
            row = await conn.fetchrow(
                "SELECT is_active FROM tenants WHERE id = $1",
                tenant_id
            )
        
        assert row['is_active'] is False


class TestVectorRepository:
    """Integration tests for vector similarity search."""
    
    @pytest.mark.asyncio
    async def test_insert_chunk_with_embedding(self, db_pool):
        """Test inserting chunk with vector embedding."""
        # Arrange
        from repositories.vector_repository import VectorRepository
        import numpy as np
        
        repo = VectorRepository(db_pool)
        
        # Create fake embedding
        embedding = np.random.randn(1536).tolist()
        
        # Act
        chunk_id = await repo.insert_chunk({
            "document_id": "doc-123",
            "tenant_id": "tenant-456",
            "content": "Test content for embedding",
            "content_hash": "abc123",
            "chunk_index": 0,
            "start_char": 0,
            "end_char": 100,
            "embedding": embedding,
            "token_count": 10,
        })
        
        # Assert
        assert chunk_id is not None
    
    @pytest.mark.asyncio
    async def test_similarity_search(self, db_pool):
        """Test vector similarity search returns relevant results."""
        # Arrange
        from repositories.vector_repository import VectorRepository, SearchParams
        import numpy as np
        
        repo = VectorRepository(db_pool)
        tenant_id = "search-test-tenant"
        
        # Insert test chunks with known embeddings
        # Chunk 1: Similar to query
        similar_embedding = [1.0] + [0.0] * 1535
        await repo.insert_chunk({
            "document_id": "doc-1",
            "tenant_id": tenant_id,
            "content": "This is similar content",
            "content_hash": "hash1",
            "chunk_index": 0,
            "start_char": 0,
            "end_char": 50,
            "embedding": similar_embedding,
            "token_count": 5,
        })
        
        # Chunk 2: Different from query
        different_embedding = [0.0] * 1535 + [1.0]
        await repo.insert_chunk({
            "document_id": "doc-2",
            "tenant_id": tenant_id,
            "content": "This is different content",
            "content_hash": "hash2",
            "chunk_index": 0,
            "start_char": 0,
            "end_char": 50,
            "embedding": different_embedding,
            "token_count": 5,
        })
        
        # Query embedding similar to chunk 1
        query_embedding = [0.9] + [0.1] * 1535
        
        # Act
        results = await repo.search(
            query_embedding=query_embedding,
            params=SearchParams(
                tenant_id=tenant_id,
                top_k=2,
                min_similarity=0.0,
            ),
        )
        
        # Assert
        assert len(results) == 2
        assert results[0].content == "This is similar content"
        assert results[0].similarity > results[1].similarity
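The ordering asserted above follows directly from cosine similarity (the metric behind pgvector's `vector_cosine_ops`). A quick numpy check of the test's hand-built embeddings, for intuition (values are approximate):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.9] + [0.1] * 1535      # query embedding from the test
similar = [1.0] + [0.0] * 1535    # chunk 1's embedding
different = [0.0] * 1535 + [1.0]  # chunk 2's embedding

print(round(cosine(query, similar), 3))    # ≈ 0.224
print(round(cosine(query, different), 3))  # ≈ 0.025
```

Chunk 1 scores roughly 9x higher, so it must come back as `results[0]` with the larger similarity.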

55.3 Redis Integration Tests

"""
Redis integration tests.

File: services/agent-service/tests/integration/test_redis.py
"""
import pytest
import json
from datetime import datetime
from testcontainers.redis import RedisContainer


@pytest.fixture(scope="module")
def redis_container():
    """Start Redis container for tests."""
    with RedisContainer() as redis:
        yield redis


@pytest.fixture
async def redis_client(redis_container):
    """Create Redis client."""
    import redis.asyncio as redis
    
    host = redis_container.get_container_host_ip()
    port = redis_container.get_exposed_port(6379)
    
    # redis.asyncio.from_url() is synchronous (it returns a client
    # immediately), so it must not be awaited
    client = redis.from_url(
        f"redis://{host}:{port}",
        encoding="utf-8",
        decode_responses=True,
    )
    
    yield client
    
    # Cleanup
    await client.flushdb()
    await client.close()


class TestCallStateManager:
    """Integration tests for call state management."""
    
    @pytest.mark.asyncio
    async def test_create_and_retrieve_call_state(self, redis_client):
        """Test creating and retrieving call state."""
        # Arrange
        from state.manager import CallStateManager
        from state.models import CallContext, CallState, CallDirection, PipelineState
        
        # Inject Redis client
        manager = CallStateManager.__new__(CallStateManager)
        manager._redis = redis_client
        
        context = CallContext(
            call_id="test-call-001",
            tenant_id="tenant-001",
            agency_id="agency-001",
            direction=CallDirection.INBOUND,
            caller_phone="+15551234567",
        )
        
        # Act
        await manager.create_call(context)
        
        # Assert
        state = await manager.get_state("test-call-001")
        assert state is not None
        assert state.call_id == "test-call-001"
        assert state.pipeline_state == PipelineState.IDLE
    
    @pytest.mark.asyncio
    async def test_pipeline_state_transitions(self, redis_client):
        """Test pipeline state transitions are persisted."""
        # Arrange
        from state.manager import CallStateManager
        from state.models import CallContext, PipelineState, CallDirection
        
        manager = CallStateManager.__new__(CallStateManager)
        manager._redis = redis_client
        
        context = CallContext(
            call_id="test-call-002",
            tenant_id="tenant-001",
            agency_id="agency-001",
            direction=CallDirection.INBOUND,
            caller_phone="+15551234567",
        )
        await manager.create_call(context)
        
        # Act
        await manager.transition_pipeline("test-call-002", PipelineState.LISTENING)
        await manager.transition_pipeline("test-call-002", PipelineState.CAPTURING)
        await manager.transition_pipeline("test-call-002", PipelineState.PROCESSING)
        
        # Assert
        state = await manager.get_state("test-call-002")
        assert state.pipeline_state == PipelineState.PROCESSING
    
    @pytest.mark.asyncio
    async def test_conversation_history(self, redis_client):
        """Test conversation history storage and retrieval."""
        # Arrange
        from state.manager import CallStateManager
        from state.models import CallContext, ConversationTurn, CallDirection
        
        manager = CallStateManager.__new__(CallStateManager)
        manager._redis = redis_client
        
        context = CallContext(
            call_id="test-call-003",
            tenant_id="tenant-001",
            agency_id="agency-001",
            direction=CallDirection.INBOUND,
            caller_phone="+15551234567",
        )
        await manager.create_call(context)
        
        # Act
        await manager.add_turn(
            "test-call-003",
            ConversationTurn(role="user", content="Hello")
        )
        await manager.add_turn(
            "test-call-003",
            ConversationTurn(role="assistant", content="Hi there!")
        )
        await manager.add_turn(
            "test-call-003",
            ConversationTurn(role="user", content="What are your hours?")
        )
        
        # Assert
        history = await manager.get_history("test-call-003")
        assert len(history) == 3
        assert history[0].role == "user"
        assert history[0].content == "Hello"
        assert history[2].content == "What are your hours?"
    
    @pytest.mark.asyncio
    async def test_end_call_sets_ttl(self, redis_client):
        """Test ending call sets TTL on keys."""
        # Arrange
        from state.manager import CallStateManager
        from state.models import CallContext, CallDirection
        
        manager = CallStateManager.__new__(CallStateManager)
        manager._redis = redis_client
        manager.CALL_DATA_TTL = 60  # Short TTL for test
        
        context = CallContext(
            call_id="test-call-004",
            tenant_id="tenant-001",
            agency_id="agency-001",
            direction=CallDirection.INBOUND,
            caller_phone="+15551234567",
        )
        await manager.create_call(context)
        
        # Act
        await manager.end_call("test-call-004")
        
        # Assert
        ttl = await redis_client.ttl("call:test-call-004:state")
        assert ttl > 0
        assert ttl <= 60

55.4 API Integration Tests

"""
API endpoint integration tests.

File: services/api-gateway/tests/integration/test_api.py
"""
import pytest
from httpx import AsyncClient, ASGITransport
from unittest.mock import patch, AsyncMock


@pytest.fixture
async def app():
    """Create test application."""
    from main import create_app
    
    app = create_app()
    yield app


@pytest.fixture
async def client(app):
    """Create test HTTP client."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        yield client


@pytest.fixture
def auth_headers():
    """Create authenticated headers."""
    return {"Authorization": "Bearer test-token"}


class TestHealthEndpoints:
    """Tests for health check endpoints."""
    
    @pytest.mark.asyncio
    async def test_health_check(self, client):
        """Test basic health check."""
        response = await client.get("/health")
        
        assert response.status_code == 200
        data = response.json()
        assert data["status"] == "healthy"
    
    @pytest.mark.asyncio
    async def test_ready_check(self, client):
        """Test readiness check."""
        response = await client.get("/ready")
        
        assert response.status_code == 200
        data = response.json()
        assert "database" in data
        assert "redis" in data


class TestTenantEndpoints:
    """Tests for tenant management endpoints."""
    
    @pytest.mark.asyncio
    async def test_create_tenant(self, client, auth_headers):
        """Test creating a tenant."""
        # Arrange
        payload = {
            "name": "Test Company",
            "slug": "test-company",
            "plan": "professional",
        }
        
        # Act
        response = await client.post(
            "/api/v1/tenants",
            json=payload,
            headers=auth_headers,
        )
        
        # Assert
        assert response.status_code == 201
        data = response.json()
        assert data["name"] == "Test Company"
        assert "id" in data
    
    @pytest.mark.asyncio
    async def test_create_tenant_duplicate_slug(self, client, auth_headers):
        """Test creating tenant with duplicate slug fails."""
        # Arrange
        payload = {"name": "First", "slug": "duplicate"}
        await client.post("/api/v1/tenants", json=payload, headers=auth_headers)
        
        # Act
        response = await client.post(
            "/api/v1/tenants",
            json={"name": "Second", "slug": "duplicate"},
            headers=auth_headers,
        )
        
        # Assert
        assert response.status_code == 409
        assert "already exists" in response.json()["detail"].lower()
    
    @pytest.mark.asyncio
    async def test_get_tenant(self, client, auth_headers):
        """Test retrieving a tenant."""
        # Arrange
        create_response = await client.post(
            "/api/v1/tenants",
            json={"name": "Get Test", "slug": "get-test"},
            headers=auth_headers,
        )
        tenant_id = create_response.json()["id"]
        
        # Act
        response = await client.get(
            f"/api/v1/tenants/{tenant_id}",
            headers=auth_headers,
        )
        
        # Assert
        assert response.status_code == 200
        assert response.json()["name"] == "Get Test"
    
    @pytest.mark.asyncio
    async def test_get_tenant_not_found(self, client, auth_headers):
        """Test getting non-existent tenant returns 404."""
        response = await client.get(
            "/api/v1/tenants/00000000-0000-0000-0000-000000000000",
            headers=auth_headers,
        )
        
        assert response.status_code == 404


class TestCallEndpoints:
    """Tests for call management endpoints."""
    
    @pytest.mark.asyncio
    async def test_initiate_outbound_call(self, client, auth_headers):
        """Test initiating outbound call."""
        # Arrange
        payload = {
            "tenant_id": "tenant-123",
            "agency_id": "agency-456",
            "to_number": "+15551234567",
            "from_number": "+15559876543",
        }
        
        # Mock GoTo Connect
        with patch('services.goto_service.initiate_call', new_callable=AsyncMock) as mock:
            mock.return_value = {"call_id": "goto-call-789"}
            
            # Act
            response = await client.post(
                "/api/v1/calls/outbound",
                json=payload,
                headers=auth_headers,
            )
        
        # Assert
        assert response.status_code == 201
        data = response.json()
        assert "call_id" in data
        assert data["direction"] == "outbound"
    
    @pytest.mark.asyncio
    async def test_get_call_status(self, client, auth_headers):
        """Test getting call status."""
        # Arrange
        call_id = "test-call-123"
        
        with patch('state.manager.CallStateManager.get_state', new_callable=AsyncMock) as mock:
            from state.models import CallState, PipelineState
            mock.return_value = CallState(
                call_id=call_id,
                pipeline_state=PipelineState.SPEAKING,
            )
            
            # Act
            response = await client.get(
                f"/api/v1/calls/{call_id}/status",
                headers=auth_headers,
            )
        
        # Assert
        assert response.status_code == 200
        data = response.json()
        assert data["pipeline_state"] == "speaking"


class TestKnowledgeBaseEndpoints:
    """Tests for knowledge base endpoints."""
    
    @pytest.mark.asyncio
    async def test_upload_document(self, client, auth_headers):
        """Test uploading document to knowledge base."""
        # Arrange
        files = {
            "file": ("test.txt", b"Test document content", "text/plain")
        }
        data = {
            "knowledge_base_id": "kb-123",
        }
        
        # Act
        response = await client.post(
            "/api/v1/knowledge-bases/kb-123/documents",
            files=files,
            data=data,
            headers=auth_headers,
        )
        
        # Assert
        assert response.status_code == 202  # Accepted for processing
        data = response.json()
        assert "document_id" in data
        assert data["status"] == "pending"
    
    @pytest.mark.asyncio
    async def test_search_knowledge_base(self, client, auth_headers):
        """Test searching knowledge base."""
        # Arrange
        payload = {
            "query": "business hours",
            "top_k": 5,
        }
        
        with patch('rag.service.RAGService.retrieve', new_callable=AsyncMock) as mock:
            from rag.service import RAGResult
            from rag.reranker import RankedChunk
            
            mock.return_value = RAGResult(
                chunks=[
                    RankedChunk(
                        chunk_id="chunk-1",
                        content="Hours are 9-5",
                        relevance_score=0.9,
                        original_rank=0,
                    ),
                ],
                context_text="Hours are 9-5",
                total_tokens=5,
            )
            
            # Act
            response = await client.post(
                "/api/v1/knowledge-bases/kb-123/search",
                json=payload,
                headers=auth_headers,
            )
        
        # Assert
        assert response.status_code == 200
        data = response.json()
        assert len(data["results"]) == 1
        assert data["results"][0]["relevance_score"] == 0.9

Summary: What You’ve Learned in Part 9A

Section 54: Unit Testing Fundamentals

  • Testing pyramid: unit → integration → E2E
  • pytest with async support (pytest-asyncio)
  • AAA pattern: Arrange, Act, Assert
  • Fixtures for test setup and teardown
  • Factory Boy for test data generation
  • Mocking external services

Section 55: Integration Testing

  • Testcontainers for real databases
  • PostgreSQL with pgvector testing
  • Redis integration testing
  • API endpoint testing with httpx
  • Test isolation and cleanup

What’s Next

In Part 9B, you’ll learn:
  • End-to-end testing strategies
  • Voice-specific testing
  • Audio simulation
  • Pipeline testing

Document Metadata

| Field       | Value                          |
|-------------|--------------------------------|
| Document ID | PRD-009A                       |
| Title       | Junior Developer PRD — Part 9A |
| Version     | 1.0                            |
| Status      | Complete                       |

End of Part 9A — Continue to Part 9B

Junior Developer PRD — Part 9B: End-to-End Testing & Voice Testing

Document Version: 1.0
Last Updated: January 25, 2026
Part: 9B of 10 (Sub-part 2 of 3)
Sections: 56-57

Table of Contents

  • Section 56: End-to-End Testing
  • Section 57: Voice-Specific Testing

Section 56: End-to-End Testing

56.1 E2E Test Environment

# docker-compose.e2e.yaml
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg15
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: voiceai_test
    ports: ["5432:5432"]
  
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
  
  api-gateway:
    build: ./services/api-gateway
    environment:
      DATABASE_URL: postgres://test:test@postgres:5432/voiceai_test
      REDIS_URL: redis://redis:6379
    depends_on: [postgres, redis]
    ports: ["8000:8000"]

56.2 E2E Test Client

# tests/e2e/framework.py
import httpx
import asyncio
from dataclasses import dataclass

@dataclass
class E2EConfig:
    api_base_url: str = "http://localhost:8000"
    timeout: float = 30.0

class E2ETestClient:
    def __init__(self, config: E2EConfig | None = None):
        self.config = config or E2EConfig()
        self._http = None
    
    async def __aenter__(self):
        self._http = httpx.AsyncClient(
            base_url=self.config.api_base_url,
            timeout=self.config.timeout,
        )
        return self
    
    async def __aexit__(self, *args):
        await self._http.aclose()
    
    async def create_tenant(self, name: str) -> dict:
        response = await self._http.post("/api/v1/tenants", json={
            "name": name,
            "slug": name.lower().replace(" ", "-"),
        })
        return response.json()
    
    async def upload_document(self, kb_id: str, filename: str, content: bytes) -> dict:
        files = {"file": (filename, content, "text/plain")}
        response = await self._http.post(f"/api/v1/knowledge-bases/{kb_id}/documents", files=files)
        return response.json()
    
    async def wait_for_document_ready(self, doc_id: str, timeout: float = 60.0) -> dict:
        import time
        start = time.time()
        while time.time() - start < timeout:
            response = await self._http.get(f"/api/v1/documents/{doc_id}")
            if response.json()["status"] == "ready":
                return response.json()
            await asyncio.sleep(0.5)
        raise TimeoutError(f"Document {doc_id} not ready after {timeout}s")
    
    async def simulate_inbound_call(self, agency_id: str) -> dict:
        response = await self._http.post("/webhooks/goto/call", json={
            "event": "call.incoming",
            "agency_id": agency_id,
            "caller_phone": "+15551234567",
        })
        return response.json()

56.3 E2E Test Scenarios

# tests/e2e/test_scenarios.py
import pytest

class TestTenantOnboarding:
    @pytest.mark.e2e
    @pytest.mark.asyncio
    async def test_complete_tenant_setup(self, e2e_client):
        # 1. Create tenant
        tenant = await e2e_client.create_tenant("Test Corp")
        assert tenant["id"] is not None
        
        # 2. Create knowledge base
        kb = await e2e_client.create_knowledge_base(tenant["id"])
        
        # 3. Upload document
        doc = await e2e_client.upload_document(
            kb["id"], "info.txt",
            b"Business hours: Monday-Friday 9 AM to 5 PM"
        )
        
        # 4. Wait for processing
        doc = await e2e_client.wait_for_document_ready(doc["document_id"])
        assert doc["status"] == "ready"
        
        # 5. Search knowledge base
        results = await e2e_client.search_knowledge_base(kb["id"], "hours")
        assert len(results["results"]) > 0

class TestInboundCallFlow:
    @pytest.mark.e2e
    @pytest.mark.asyncio
    async def test_simple_call(self, e2e_client, test_tenant):
        # Setup agency
        agency = await e2e_client.create_agency(test_tenant["id"])
        
        # Simulate call
        call = await e2e_client.simulate_inbound_call(agency["id"])
        
        # Wait for greeting
        await e2e_client.wait_for_call_state(call["call_id"], "listening")
        
        # Send utterance
        await e2e_client.send_utterance(call["call_id"], "What are your hours?")
        
        # Verify response
        history = await e2e_client.get_conversation_history(call["call_id"])
        assert len(history["turns"]) >= 2

Section 57: Voice-Specific Testing

57.1 Audio Test Utilities

# tests/voice/audio_utils.py
import numpy as np
import wave
import io

class AudioSegment:
    def __init__(self, samples: np.ndarray, sample_rate: int = 16000):
        self.samples = samples
        self.sample_rate = sample_rate
    
    @property
    def duration_ms(self) -> float:
        return len(self.samples) / self.sample_rate * 1000
    
    def to_bytes(self) -> bytes:
        return self.samples.astype(np.int16).tobytes()
    
    def split_frames(self, frame_duration_ms: int = 30) -> list:
        frame_samples = int(self.sample_rate * frame_duration_ms / 1000)
        return [
            AudioSegment(self.samples[i:i + frame_samples], self.sample_rate)
            for i in range(0, len(self.samples), frame_samples)
            if len(self.samples[i:i + frame_samples]) == frame_samples
        ]
    
    @classmethod
    def silence(cls, duration_ms: int, sample_rate: int = 16000):
        samples = np.zeros(int(sample_rate * duration_ms / 1000), dtype=np.int16)
        return cls(samples, sample_rate)
    
    @classmethod
    def speech_like(cls, duration_ms: int, sample_rate: int = 16000):
        """Generate speech-like audio for VAD testing."""
        t = np.linspace(0, duration_ms / 1000, int(sample_rate * duration_ms / 1000))
        signal = 0.3 * np.sin(2 * np.pi * 150 * t) + 0.2 * np.sin(2 * np.pi * 300 * t)
        envelope = np.sin(np.pi * t / (duration_ms / 1000)) ** 0.5
        samples = (signal * envelope * 32767 * 0.5).astype(np.int16)
        return cls(samples, sample_rate)
    
    def concatenate(self, other):
        return AudioSegment(np.concatenate([self.samples, other.samples]), self.sample_rate)

def create_utterance_audio(speech_ms=1500, silence_before=100, silence_after=500):
    return (AudioSegment.silence(silence_before)
            .concatenate(AudioSegment.speech_like(speech_ms))
            .concatenate(AudioSegment.silence(silence_after)))
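A quick sanity check of the frame math above: at 16 kHz, a 30 ms frame is 480 samples, and `split_frames` drops any trailing partial frame. This sketch re-states just enough of `AudioSegment` (as `_Segment`) to stand alone:

```python
import numpy as np

class _Segment:
    """Minimal stand-in mirroring AudioSegment above, for the arithmetic only."""
    def __init__(self, samples, sample_rate=16000):
        self.samples = samples
        self.sample_rate = sample_rate

    @property
    def duration_ms(self):
        return len(self.samples) / self.sample_rate * 1000

    def split_frames(self, frame_duration_ms=30):
        n = int(self.sample_rate * frame_duration_ms / 1000)  # 480 samples per 30 ms frame
        return [_Segment(self.samples[i:i + n], self.sample_rate)
                for i in range(0, len(self.samples), n)
                if len(self.samples[i:i + n]) == n]

one_second = _Segment(np.zeros(16000, dtype=np.int16))  # 1 s of silence
frames = one_second.split_frames(30)
print(len(frames))            # 33 full frames; the 160-sample remainder is dropped
print(frames[0].duration_ms)  # 30.0
```

Keeping only full frames matters for VAD: Silero-style models expect fixed-size windows, so a ragged final frame would otherwise need padding.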

57.2 VAD Testing

# tests/voice/test_vad.py
import pytest
from voice.audio_utils import AudioSegment, create_utterance_audio

# Module-level fixture so both TestVADDetection and TestVADPerformance can use it
@pytest.fixture
def vad(vad_config):
    from pipeline.vad import SileroVAD
    vad = SileroVAD(vad_config)
    vad.load_model()
    return vad

class TestVADDetection:
    
    @pytest.mark.voice
    def test_detects_speech_start(self, vad):
        audio = AudioSegment.silence(200).concatenate(AudioSegment.speech_like(1000))
        frames = audio.split_frames(30)
        
        speech_detected = False
        for frame in frames:
            event = vad.process_frame(frame.samples)
            if event and event.event_type == "speech_start":
                speech_detected = True
                break
        
        assert speech_detected
    
    @pytest.mark.voice
    def test_detects_speech_end(self, vad):
        audio = create_utterance_audio(speech_ms=1000, silence_after=500)
        frames = audio.split_frames(30)
        
        events = []
        for frame in frames:
            event = vad.process_frame(frame.samples)
            if event:
                events.append(event)
        
        event_types = [e.event_type for e in events]
        assert "speech_start" in event_types
        assert "speech_end" in event_types
    
    @pytest.mark.voice
    def test_ignores_short_sounds(self, vad):
        # 50ms speech (shorter than min_speech_duration)
        audio = AudioSegment.silence(500).concatenate(
            AudioSegment.speech_like(50)
        ).concatenate(AudioSegment.silence(500))
        
        events = [vad.process_frame(f.samples) for f in audio.split_frames(30)]
        events = [e for e in events if e]
        
        assert len(events) == 0  # Should not trigger

class TestVADPerformance:
    # The `vad` fixture must be visible here too -- move it from
    # TestVADDetection into tests/voice/conftest.py so both classes share it.
    @pytest.mark.voice
    def test_processing_under_10ms(self, vad, benchmark):
        frame = AudioSegment.speech_like(30)
        # benchmark() returns process_frame's result; timing stats live on
        # the pytest-benchmark fixture itself, not the return value.
        benchmark(vad.process_frame, frame.samples)
        assert benchmark.stats.stats.mean < 0.010

57.3 Pipeline Testing

# tests/voice/test_pipeline.py
import pytest
from unittest.mock import AsyncMock, MagicMock
from voice.audio_utils import AudioSegment, create_utterance_audio
from pipeline.orchestrator import VoicePipelineOrchestrator

class TestVoicePipeline:
    @pytest.mark.voice
    @pytest.mark.asyncio
    async def test_complete_turn_cycle(self, mock_stt, mock_llm, mock_tts):
        from pipeline.orchestrator import VoicePipelineOrchestrator
        
        mock_stt.transcribe = AsyncMock(return_value="What are your hours?")
        
        async def mock_stream(*args):
            yield MagicMock(text="We are open 9 to 5.", is_complete=True)
        mock_llm.generate_streaming = mock_stream
        
        pipeline = VoicePipelineOrchestrator(
            stt_client=mock_stt,
            llm_client=mock_llm,
            tts_client=mock_tts,
        )
        
        audio = create_utterance_audio()
        responses = []
        async for chunk in pipeline.process_audio_stream(audio.split_frames(30)):
            if chunk:
                responses.append(chunk)
        
        mock_stt.transcribe.assert_called()
        mock_tts.synthesize.assert_called()
        assert len(responses) > 0
    
    @pytest.mark.voice
    @pytest.mark.asyncio
    async def test_barge_in_stops_tts(self):
        # Simulate interruption during TTS playback
        tts_stopped = False
        
        async def mock_stop():
            nonlocal tts_stopped
            tts_stopped = True
        
        pipeline = VoicePipelineOrchestrator(...)
        pipeline.stop_tts = mock_stop
        pipeline._state = "speaking"
        
        # Send interruption audio
        interrupt_audio = AudioSegment.speech_like(500)
        for frame in interrupt_audio.split_frames(30):
            await pipeline.process_frame_during_playback(frame.samples)
            if tts_stopped:
                break
        
        assert tts_stopped
    
    @pytest.mark.voice
    @pytest.mark.asyncio
    async def test_latency_under_1_second(self, mock_stt, mock_llm, mock_tts):
        import time
        
        mock_stt.transcribe = AsyncMock(return_value="Hello")
        async def fast_llm(*args):
            yield MagicMock(text="Hi!", is_complete=True)
        mock_llm.generate_streaming = fast_llm
        
        pipeline = VoicePipelineOrchestrator(...)
        audio = create_utterance_audio(speech_ms=500, silence_after=300)
        
        start = time.perf_counter()
        first_audio = None
        
        async for chunk in pipeline.process_audio_stream(audio.split_frames(30)):
            if chunk and not first_audio:
                first_audio = time.perf_counter()
        
        latency = first_audio - start
        assert latency < 1.0, f"Latency {latency}s exceeds budget"

57.4 Audio Quality Testing

# tests/voice/test_audio_quality.py
import pytest
import numpy as np

class TestAudioResampling:
    @pytest.mark.voice
    def test_48k_to_16k(self):
        from pipeline.audio_utils import resample_for_stt
        
        audio_48k = np.zeros(48000, dtype=np.int16)  # 1 second
        audio_16k = resample_for_stt(audio_48k, 48000, 16000)
        
        assert len(audio_16k) == 16000
    
    @pytest.mark.voice
    def test_stereo_to_mono(self):
        from pipeline.audio_utils import resample_for_stt
        
        stereo = np.zeros((16000, 2), dtype=np.int16)
        mono = resample_for_stt(stereo, 16000, 16000)
        
        assert len(mono.shape) == 1

Summary

Section 56: End-to-End Testing

  • Docker Compose environment for full system testing
  • E2E client with retry/wait logic
  • Critical user journey tests

Section 57: Voice-Specific Testing

  • Audio utilities for test data generation
  • VAD accuracy and timing tests
  • Pipeline integration tests
  • Barge-in and latency verification

What’s Next

Part 9C: Performance Testing & CI/CD covers:
  • Load testing with Locust
  • GitHub Actions pipelines
  • Deployment strategies

End of Part 9B

Junior Developer PRD — Part 9C: Performance Testing & CI/CD

Document Version: 1.0
Last Updated: January 25, 2026
Part: 9C of 10 (Sub-part 3 of 3)
Sections: 58-59
Audience: Junior developers with no prior context
Estimated Reading Time: 25 minutes

Section 58: Performance Testing

58.1 Why Performance Testing?

Voice AI has strict performance requirements:
| Metric | Target | Impact if Exceeded |
|---|---|---|
| Latency (P50) | < 1000ms | Unnatural conversation |
| Latency (P95) | < 1500ms | User frustration |
| Latency (P99) | < 2000ms | Call abandonment |
| Concurrent calls | 100+ per instance | Service degradation |
| Error rate | < 0.1% | Trust loss |
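
These budgets can be centralized in code so every performance test asserts against one definition. A minimal sketch (the helper and constant names are hypothetical, not part of the codebase):

```python
# Hypothetical helper encoding the latency budgets from the table above.
import statistics

LATENCY_BUDGETS_MS = {"p50": 1000, "p95": 1500, "p99": 2000}

def check_latency_budgets(latencies_ms, budgets=LATENCY_BUDGETS_MS):
    """Return {percentile: (observed_ms, budget_ms, within_budget)}."""
    observed = {
        "p50": statistics.median(latencies_ms),
        # quantiles(n=20)[18] is the 95th percentile; (n=100)[98] the 99th
        "p95": statistics.quantiles(latencies_ms, n=20)[18],
        "p99": statistics.quantiles(latencies_ms, n=100)[98],
    }
    return {k: (observed[k], budgets[k], observed[k] <= budgets[k]) for k in budgets}
```

The same `statistics.quantiles` indexing appears in the `LatencyMeasurement` class in Section 58.4.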

58.2 Load Testing with Locust

"""
Load testing with Locust.

File: tests/performance/locustfile.py
"""
from locust import HttpUser, task, between, events
import json
import time
import random


class VoiceAIUser(HttpUser):
    """Simulates a user making API calls."""
    
    wait_time = between(1, 3)
    
    def on_start(self):
        """Setup: Create tenant and get auth token."""
        # Login
        response = self.client.post("/api/v1/auth/token", json={
            "username": "loadtest@example.com",
            "password": "loadtest123",
        })
        self.token = response.json()["access_token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}
        
        # Get or create test tenant
        self.tenant_id = "loadtest-tenant"
        self.kb_id = "loadtest-kb"
    
    @task(10)
    def search_knowledge_base(self):
        """Most common operation: KB search."""
        queries = [
            "What are your business hours?",
            "How much does it cost?",
            "Where are you located?",
            "Do you accept insurance?",
            "How do I make an appointment?",
        ]
        
        self.client.post(
            f"/api/v1/knowledge-bases/{self.kb_id}/search",
            json={"query": random.choice(queries), "top_k": 5},
            headers=self.headers,
            name="/kb/search",
        )
    
    @task(5)
    def get_call_status(self):
        """Check call status."""
        call_id = f"call-{random.randint(1, 1000)}"
        self.client.get(
            f"/api/v1/calls/{call_id}/status",
            headers=self.headers,
            name="/calls/status",
        )
    
    @task(2)
    def get_conversation_history(self):
        """Retrieve conversation history."""
        call_id = f"call-{random.randint(1, 1000)}"
        self.client.get(
            f"/api/v1/calls/{call_id}/history",
            headers=self.headers,
            name="/calls/history",
        )
    
    @task(1)
    def simulate_webhook(self):
        """Simulate incoming webhook."""
        self.client.post(
            "/webhooks/goto/call",
            json={
                "event": "call.status",
                "call_id": f"goto-{time.time_ns()}",
                "status": "ringing",
            },
            name="/webhooks/call",
        )


class VoicePipelineUser(HttpUser):
    """Simulates voice pipeline operations."""
    
    wait_time = between(0.1, 0.5)
    
    @task
    def process_audio_chunk(self):
        """Simulate sending audio chunk."""
        # This would connect to WebSocket in real test
        self.client.post(
            "/api/v1/pipeline/audio",
            data=b'\x00' * 960,  # 30ms of audio
            headers={"Content-Type": "application/octet-stream"},
            name="/pipeline/audio",
        )


# Custom metrics tracking
@events.request.add_listener
def track_latency(request_type, name, response_time, **kwargs):
    """Track latency percentiles."""
    # This data goes to Locust's built-in stats
    pass


@events.test_stop.add_listener
def generate_report(environment, **kwargs):
    """Generate performance report."""
    stats = environment.stats
    
    print("\n" + "=" * 60)
    print("PERFORMANCE TEST RESULTS")
    print("=" * 60)
    
    for name, entry in stats.entries.items():
        print(f"\n{name}:")
        print(f"  Requests: {entry.num_requests}")
        print(f"  Failures: {entry.num_failures}")
        print(f"  Median: {entry.median_response_time}ms")
        print(f"  P95: {entry.get_response_time_percentile(0.95)}ms")
        print(f"  P99: {entry.get_response_time_percentile(0.99)}ms")

58.3 Running Load Tests

# Start Locust web UI
locust -f tests/performance/locustfile.py --host=http://localhost:8000

# Run headless with specific parameters
locust -f tests/performance/locustfile.py \
    --host=http://localhost:8000 \
    --users=100 \
    --spawn-rate=10 \
    --run-time=5m \
    --headless \
    --csv=results/loadtest

# Distributed load testing
# Master
locust -f locustfile.py --master

# Workers (run on multiple machines)
locust -f locustfile.py --worker --master-host=<master-ip>
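
The `--csv=results/loadtest` flag above writes a `loadtest_stats.csv` file that can gate CI on latency budgets. A sketch of such a check; the header names (`Name`, `95%`) match recent Locust 2.x output but may differ across versions, so treat them as an assumption:

```python
# Parse a Locust stats CSV and flag endpoints whose P95 exceeds the budget.
import csv
import io

def failing_endpoints(stats_csv_text: str, p95_budget_ms: float = 1500.0):
    """Return (name, p95_ms) pairs whose 95th percentile exceeds the budget."""
    failures = []
    for row in csv.DictReader(io.StringIO(stats_csv_text)):
        if row["Name"] == "Aggregated":  # skip the roll-up row Locust appends
            continue
        if float(row["95%"]) > p95_budget_ms:
            failures.append((row["Name"], float(row["95%"])))
    return failures

sample = """Type,Name,Request Count,Failure Count,95%
POST,/kb/search,1000,0,180
GET,/calls/status,500,2,1700
"""
print(failing_endpoints(sample))  # -> [('/calls/status', 1700.0)]
```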

58.4 Latency Testing

"""
Latency measurement tests.

File: tests/performance/test_latency.py
"""
import pytest
import asyncio
import time
import statistics
from typing import List


class LatencyMeasurement:
    """Measure and analyze latency."""
    
    def __init__(self):
        self.measurements: List[float] = []
    
    def record(self, latency_ms: float):
        self.measurements.append(latency_ms)
    
    @property
    def p50(self) -> float:
        return statistics.median(self.measurements)
    
    @property
    def p95(self) -> float:
        return statistics.quantiles(self.measurements, n=20)[18]
    
    @property
    def p99(self) -> float:
        return statistics.quantiles(self.measurements, n=100)[98]
    
    @property
    def mean(self) -> float:
        return statistics.mean(self.measurements)


class TestAPILatency:
    """Test API endpoint latency."""
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_kb_search_latency(self, e2e_client):
        """Knowledge base search should respond quickly."""
        latency = LatencyMeasurement()
        
        for _ in range(100):
            start = time.perf_counter()
            await e2e_client.search_knowledge_base("kb-id", "test query")
            latency.record((time.perf_counter() - start) * 1000)
        
        print(f"\nKB Search Latency:")
        print(f"  P50: {latency.p50:.1f}ms")
        print(f"  P95: {latency.p95:.1f}ms")
        print(f"  P99: {latency.p99:.1f}ms")
        
        assert latency.p50 < 100, f"P50 {latency.p50}ms exceeds 100ms"
        assert latency.p95 < 200, f"P95 {latency.p95}ms exceeds 200ms"
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_concurrent_requests(self, e2e_client):
        """Test handling concurrent requests."""
        async def make_request():
            start = time.perf_counter()
            await e2e_client.search_knowledge_base("kb-id", "test")
            return (time.perf_counter() - start) * 1000
        
        # 50 concurrent requests
        tasks = [make_request() for _ in range(50)]
        latencies = await asyncio.gather(*tasks)
        
        p95 = statistics.quantiles(latencies, n=20)[18]
        assert p95 < 500, f"Concurrent P95 {p95}ms exceeds 500ms"


class TestPipelineLatency:
    """Test voice pipeline latency."""
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_stt_latency(self, mock_deepgram):
        """STT should respond within budget."""
        from integrations.deepgram_stt import DeepgramSTTClient
        
        latency = LatencyMeasurement()
        audio = b'\x00' * 32000  # 1 second of 16 kHz 16-bit PCM (2 bytes/sample)
        
        for _ in range(20):
            start = time.perf_counter()
            await mock_deepgram.transcribe(audio)
            latency.record((time.perf_counter() - start) * 1000)
        
        assert latency.p95 < 300, f"STT P95 {latency.p95}ms exceeds 300ms"
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_llm_ttfb(self, mock_anthropic):
        """LLM time-to-first-token should be fast."""
        from integrations.claude_llm import ClaudeLLM
        
        latency = LatencyMeasurement()
        
        for _ in range(10):
            start = time.perf_counter()
            async for chunk in mock_anthropic.generate_streaming("Test", []):
                latency.record((time.perf_counter() - start) * 1000)
                break  # Only measure first token
        
        assert latency.p95 < 500, f"LLM TTFB P95 {latency.p95}ms exceeds 500ms"
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_tts_latency(self, mock_chatterbox):
        """TTS should synthesize quickly."""
        latency = LatencyMeasurement()
        
        for _ in range(20):
            start = time.perf_counter()
            await mock_chatterbox.synthesize("Hello, how can I help you?")
            latency.record((time.perf_counter() - start) * 1000)
        
        assert latency.p95 < 200, f"TTS P95 {latency.p95}ms exceeds 200ms"
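
Taken together, the per-component budgets asserted in this class trace back to the end-to-end target: STT (300ms) + LLM time-to-first-token (500ms) + TTS (200ms) sum exactly to the 1000ms P50 turn budget from Section 58.1. A sketch of that arithmetic:

```python
# Component P95 budgets from the tests above (ms). They sum to the 1000ms
# end-to-end P50 target with no slack -- in practice STT, LLM, and TTS
# overlap via streaming, so observed turn latency is usually lower.
COMPONENT_BUDGETS_MS = {"stt": 300, "llm_ttfb": 500, "tts": 200}

total = sum(COMPONENT_BUDGETS_MS.values())
print(total)  # -> 1000
```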

58.5 Database Performance Testing

"""
Database performance tests.

File: tests/performance/test_database.py
"""
import pytest
import asyncio
import time
import numpy as np


class TestVectorSearchPerformance:
    """Test pgvector search performance."""
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_vector_search_scaling(self, db_pool):
        """Test search performance as data grows."""
        from repositories.vector_repository import VectorRepository, SearchParams
        
        repo = VectorRepository(db_pool)
        
        # Insert chunks in batches
        batch_sizes = [100, 1000, 10000]
        results = {}
        
        for size in batch_sizes:
            # Insert test data
            for i in range(size):
                embedding = np.random.randn(1536).tolist()
                await repo.insert_chunk({
                    "document_id": f"doc-{i}",
                    "tenant_id": "perf-test",
                    "content": f"Test content {i}",
                    "content_hash": f"hash-{i}",
                    "chunk_index": 0,
                    "start_char": 0,
                    "end_char": 100,
                    "embedding": embedding,
                    "token_count": 10,
                })
            
            # Measure search time
            query_embedding = np.random.randn(1536).tolist()
            latencies = []
            
            for _ in range(50):
                start = time.perf_counter()
                await repo.search(
                    query_embedding=query_embedding,
                    params=SearchParams(tenant_id="perf-test", top_k=5),
                )
                latencies.append((time.perf_counter() - start) * 1000)
            
            results[size] = {
                "p50": np.median(latencies),
                "p95": np.percentile(latencies, 95),
            }
        
        print("\nVector Search Scaling:")
        for size, metrics in results.items():
            print(f"  {size} chunks: P50={metrics['p50']:.1f}ms, P95={metrics['p95']:.1f}ms")
        
        # Search should be < 50ms even with 10k chunks
        assert results[10000]["p95"] < 50
    
    @pytest.mark.performance
    @pytest.mark.asyncio
    async def test_redis_state_operations(self, redis_client):
        """Test Redis state operation performance."""
        latencies = []
        
        for i in range(1000):
            start = time.perf_counter()
            
            # Simulate call state operations
            await redis_client.hset(f"call:{i}:state", mapping={
                "pipeline_state": "listening",
                "turn_count": 0,
            })
            await redis_client.hget(f"call:{i}:state", "pipeline_state")
            
            latencies.append((time.perf_counter() - start) * 1000)
        
        p95 = np.percentile(latencies, 95)
        assert p95 < 5, f"Redis P95 {p95}ms exceeds 5ms"

58.6 Stress Testing

"""
Stress testing for system limits.

File: tests/performance/test_stress.py
"""
import pytest
import asyncio
import time


class TestStressLimits:
    """Test system under stress."""
    
    @pytest.mark.stress
    @pytest.mark.asyncio
    async def test_max_concurrent_calls(self, e2e_client):
        """Find maximum concurrent call capacity."""
        max_calls = 200
        successful = 0
        failed = 0
        
        async def simulate_call(call_num):
            try:
                call = await e2e_client.simulate_inbound_call(f"agency-{call_num % 10}")
                await e2e_client.wait_for_call_state(call["call_id"], "listening", timeout=10)
                return True
            except Exception:
                return False
        
        # Ramp up calls
        for batch in range(0, max_calls, 20):
            tasks = [simulate_call(i) for i in range(batch, batch + 20)]
            results = await asyncio.gather(*tasks)
            successful += sum(results)
            failed += len(results) - sum(results)
            
            if failed > max_calls * 0.1:  # > 10% failure
                break
        
        print(f"\nStress Test Results:")
        print(f"  Successful: {successful}")
        print(f"  Failed: {failed}")
        print(f"  Max capacity: ~{successful} concurrent calls")
        
        assert successful >= 100, "Should handle at least 100 concurrent calls"
    
    @pytest.mark.stress
    @pytest.mark.asyncio
    async def test_sustained_load(self, e2e_client):
        """Test sustained load over time."""
        duration_seconds = 60
        requests_per_second = 50
        
        start_time = time.time()
        total_requests = 0
        errors = 0
        
        while time.time() - start_time < duration_seconds:
            batch_start = time.time()
            
            tasks = [
                e2e_client.search_knowledge_base("kb-id", "test")
                for _ in range(requests_per_second)
            ]
            
            results = await asyncio.gather(*tasks, return_exceptions=True)
            
            total_requests += len(results)
            errors += sum(1 for r in results if isinstance(r, Exception))
            
            # Wait for next second
            elapsed = time.time() - batch_start
            if elapsed < 1:
                await asyncio.sleep(1 - elapsed)
        
        error_rate = errors / total_requests * 100
        print(f"\nSustained Load Results:")
        print(f"  Duration: {duration_seconds}s")
        print(f"  Total requests: {total_requests}")
        print(f"  Error rate: {error_rate:.2f}%")
        
        assert error_rate < 1, f"Error rate {error_rate}% exceeds 1%"

Section 59: CI/CD Pipelines

59.1 GitHub Actions Workflow

# .github/workflows/ci.yaml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  PYTHON_VERSION: "3.11"
  NODE_VERSION: "20"

jobs:
  # ============================================================
  # LINT AND TYPE CHECK
  # ============================================================
  lint:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      
      - name: Install dependencies
        run: |
          pip install ruff mypy
          pip install -r requirements.txt
      
      - name: Run Ruff (linting)
        run: ruff check .
      
      - name: Run Ruff (formatting)
        run: ruff format --check .
      
      - name: Run MyPy (type checking)
        run: mypy services/ --ignore-missing-imports

  # ============================================================
  # UNIT TESTS
  # ============================================================
  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [api-gateway, agent-service, kb-service]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      
      - name: Install dependencies
        run: |
          cd services/${{ matrix.service }}
          pip install -r requirements.txt
          pip install -r requirements-test.txt
      
      - name: Run unit tests
        run: |
          cd services/${{ matrix.service }}
          pytest tests/unit/ -v --cov=src --cov-report=xml -m "not slow"
      
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: services/${{ matrix.service }}/coverage.xml
          flags: ${{ matrix.service }}

  # ============================================================
  # INTEGRATION TESTS
  # ============================================================
  integration-tests:
    name: Integration Tests
    runs-on: ubuntu-latest
    needs: [lint, unit-tests]
    
    services:
      postgres:
        image: pgvector/pgvector:pg15
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: voiceai_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt
      
      - name: Run database migrations
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/voiceai_test
        run: |
          python scripts/migrate.py
      
      - name: Run integration tests
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/voiceai_test
          REDIS_URL: redis://localhost:6379
        run: |
          pytest tests/integration/ -v -m integration

  # ============================================================
  # BUILD DOCKER IMAGES
  # ============================================================
  build:
    name: Build Docker Images
    runs-on: ubuntu-latest
    needs: [integration-tests]
    if: github.event_name == 'push'
    
    strategy:
      matrix:
        service: [api-gateway, agent-service, kb-service]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: services/${{ matrix.service }}
          push: true
          tags: |
            ghcr.io/${{ github.repository }}/${{ matrix.service }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}/${{ matrix.service }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ============================================================
  # DEPLOY TO STAGING
  # ============================================================
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: [build]
    if: github.ref == 'refs/heads/develop'
    environment: staging
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
          # `export` does not persist across steps; publish via GITHUB_ENV
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"
      
      - name: Deploy to staging
        run: |
          kubectl set image deployment/api-gateway \
            api-gateway=ghcr.io/${{ github.repository }}/api-gateway:${{ github.sha }}
          kubectl set image deployment/agent-service \
            agent-service=ghcr.io/${{ github.repository }}/agent-service:${{ github.sha }}
          kubectl set image deployment/kb-service \
            kb-service=ghcr.io/${{ github.repository }}/kb-service:${{ github.sha }}
          kubectl rollout status deployment/api-gateway
          kubectl rollout status deployment/agent-service
          kubectl rollout status deployment/kb-service
      
      - name: Run smoke tests
        run: |
          ./scripts/smoke-tests.sh staging

  # ============================================================
  # DEPLOY TO PRODUCTION
  # ============================================================
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [build]
    if: github.ref == 'refs/heads/main'
    environment: production
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_PROD }}" | base64 -d > kubeconfig
          # `export` does not persist across steps; publish via GITHUB_ENV
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"
      
      - name: Deploy with rolling update
        run: |
          # Update images
          kubectl set image deployment/api-gateway \
            api-gateway=ghcr.io/${{ github.repository }}/api-gateway:${{ github.sha }}
          
          # Wait for rollout
          kubectl rollout status deployment/api-gateway --timeout=300s
          
          # Run health check
          ./scripts/health-check.sh production
          
          # Continue with other services
          kubectl set image deployment/agent-service \
            agent-service=ghcr.io/${{ github.repository }}/agent-service:${{ github.sha }}
          kubectl rollout status deployment/agent-service --timeout=300s
          
          kubectl set image deployment/kb-service \
            kb-service=ghcr.io/${{ github.repository }}/kb-service:${{ github.sha }}
          kubectl rollout status deployment/kb-service --timeout=300s
      
      - name: Notify on success
        if: success()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "✅ Production deployment successful: ${{ github.sha }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
      
      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ Production deployment failed: ${{ github.sha }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

59.2 Deployment Scripts

#!/bin/bash
# scripts/deploy.sh

set -euo pipefail

ENVIRONMENT=${1:-staging}
IMAGE_TAG=${2:-latest}

echo "Deploying to $ENVIRONMENT with tag $IMAGE_TAG"

# Validate environment
if [[ ! "$ENVIRONMENT" =~ ^(staging|production)$ ]]; then
    echo "Invalid environment: $ENVIRONMENT"
    exit 1
fi

# Load environment config
source "config/$ENVIRONMENT.env"

# Pre-deployment checks
echo "Running pre-deployment checks..."
./scripts/pre-deploy-checks.sh $ENVIRONMENT

# Backup current state
echo "Backing up current deployment state..."
kubectl get deployments -o yaml > "backups/deployment-$ENVIRONMENT-$(date +%Y%m%d-%H%M%S).yaml"

# Deploy services
SERVICES=(api-gateway agent-service kb-service)

for SERVICE in "${SERVICES[@]}"; do
    echo "Deploying $SERVICE..."
    
    kubectl set image deployment/$SERVICE \
        $SERVICE=ghcr.io/voiceai/$SERVICE:$IMAGE_TAG \
        --namespace=$NAMESPACE
    
    # Wait for rollout
    if ! kubectl rollout status deployment/$SERVICE --namespace=$NAMESPACE --timeout=300s; then
        echo "Rollout failed for $SERVICE, initiating rollback..."
        kubectl rollout undo deployment/$SERVICE --namespace=$NAMESPACE
        exit 1
    fi
    
    # Health check
    echo "Running health check for $SERVICE..."
    if ! ./scripts/health-check.sh $ENVIRONMENT $SERVICE; then
        echo "Health check failed for $SERVICE, initiating rollback..."
        kubectl rollout undo deployment/$SERVICE --namespace=$NAMESPACE
        exit 1
    fi
done

echo "Deployment complete!"

#!/bin/bash
# scripts/health-check.sh

ENVIRONMENT=$1
SERVICE=${2:-all}
MAX_RETRIES=10
RETRY_DELAY=5

check_service() {
    local service=$1
    local url=$2
    
    for i in $(seq 1 $MAX_RETRIES); do
        echo "Health check attempt $i for $service..."
        
        if curl -sf "$url/health" > /dev/null; then
            echo "✓ $service is healthy"
            return 0
        fi
        
        sleep $RETRY_DELAY
    done
    
    echo "✗ $service health check failed"
    return 1
}

# Get service URLs based on environment
if [ "$ENVIRONMENT" == "staging" ]; then
    API_URL="https://api.staging.voiceai.com"
elif [ "$ENVIRONMENT" == "production" ]; then
    API_URL="https://api.voiceai.com"
else
    echo "Unknown environment: $ENVIRONMENT"
    exit 1
fi

if [ "$SERVICE" == "all" ] || [ "$SERVICE" == "api-gateway" ]; then
    # Propagate failure so callers (e.g. deploy.sh) can trigger a rollback
    check_service "api-gateway" "$API_URL" || exit 1
fi

echo "All health checks passed!"

59.3 Rollback Procedures

#!/bin/bash
# scripts/rollback.sh

set -euo pipefail

ENVIRONMENT=$1
SERVICE=${2:-all}
REVISION=${3:-1}  # How many revisions to roll back

# Load environment config (provides NAMESPACE, as in deploy.sh)
source "config/$ENVIRONMENT.env"

echo "Rolling back $SERVICE in $ENVIRONMENT by $REVISION revision(s)"

rollback_service() {
    local service=$1
    
    echo "Rolling back $service..."
    
    # Get current revision
    CURRENT=$(kubectl rollout history deployment/$service --namespace=$NAMESPACE | tail -2 | head -1 | awk '{print $1}')
    TARGET=$((CURRENT - REVISION))
    
    echo "Current revision: $CURRENT, Target revision: $TARGET"
    
    # Rollback
    kubectl rollout undo deployment/$service --to-revision=$TARGET --namespace=$NAMESPACE
    
    # Wait for rollout
    kubectl rollout status deployment/$service --namespace=$NAMESPACE --timeout=300s
}

if [ "$SERVICE" == "all" ]; then
    for svc in api-gateway agent-service kb-service; do
        rollback_service $svc
    done
else
    rollback_service $SERVICE
fi

# Verify health
./scripts/health-check.sh $ENVIRONMENT $SERVICE

echo "Rollback complete!"

59.4 Database Migrations

"""
Database migration script.

File: scripts/migrate.py
"""
import asyncio
import asyncpg
import os
from pathlib import Path


async def run_migrations():
    """Run all pending migrations."""
    database_url = os.environ["DATABASE_URL"]
    migrations_dir = Path("migrations")
    
    conn = await asyncpg.connect(database_url)
    
    try:
        # Create migrations table if not exists
        await conn.execute("""
            CREATE TABLE IF NOT EXISTS schema_migrations (
                version VARCHAR(255) PRIMARY KEY,
                applied_at TIMESTAMPTZ DEFAULT NOW()
            )
        """)
        
        # Get applied migrations
        applied = await conn.fetch("SELECT version FROM schema_migrations")
        applied_versions = {row["version"] for row in applied}
        
        # Get migration files
        migration_files = sorted(migrations_dir.glob("*.sql"))
        
        for migration_file in migration_files:
            version = migration_file.stem
            
            if version in applied_versions:
                print(f"Skipping {version} (already applied)")
                continue
            
            print(f"Applying {version}...")
            
            sql = migration_file.read_text()
            
            async with conn.transaction():
                await conn.execute(sql)
                await conn.execute(
                    "INSERT INTO schema_migrations (version) VALUES ($1)",
                    version,
                )
            
            print(f"Applied {version}")
        
        print("All migrations complete!")
    
    finally:
        await conn.close()


if __name__ == "__main__":
    asyncio.run(run_migrations())

Part 9 Summary

Sub-Part  Sections  Key Topics
9A        54-55     Unit tests, integration tests, fixtures, mocking
9B        56-57     E2E tests, voice testing, audio utilities
9C        58-59     Load testing, latency testing, CI/CD pipelines
Testing Pyramid:
  • Many unit tests (fast, isolated)
  • Some integration tests (component interaction)
  • Few E2E tests (critical paths)
  • Voice-specific tests (audio, latency, barge-in)
CI/CD Pipeline:
  • Lint → Unit Tests → Integration Tests → Build → Deploy
  • Rolling deployments with health checks
  • Automatic rollback on failure

What’s Next

Part 10: Operations & Monitoring will cover:
  • Logging and observability
  • Metrics and alerting
  • Incident response
  • Scaling strategies

End of Part 9C

Junior Developer PRD — Part 10A: Logging & Observability

Document Version: 1.0
Last Updated: January 25, 2026
Part: 10A of 10 (Sub-part 1 of 3)
Sections: 60-61
Audience: Junior developers with no prior context
Estimated Reading Time: 25 minutes

How to Use This Document

This is Part 10A—the first of three sub-parts covering Operations & Monitoring:
  • Part 10A (this document): Logging & Observability
  • Part 10B: Metrics & Alerting
  • Part 10C: Operations & Scaling

Prerequisites: Parts 1-9 of the PRD series.

Table of Contents
  • Section 60: Structured Logging
  • Section 61: Distributed Tracing

Section 60: Structured Logging

60.1 Why Logging Matters

In production, you can't attach a debugger. Logs are your primary tool for understanding what's happening:

Scenario            Without Good Logging     With Good Logging
Call dropped        "Something failed"       "STT timeout after 5s, call_id=abc123, tenant=xyz"
Slow response       "It's slow sometimes"    "LLM TTFB P95 = 2.3s, model=sonnet, prompt_tokens=4521"
Customer complaint  Hours of investigation   Query by call_id, see full timeline

60.2 Structured Logging Principles

Structured logs use key-value pairs instead of free-form text:

Bad: Unstructured

logger.info(f"Processing call {call_id} for tenant {tenant_id}")

Good: Structured

logger.info(
    "Processing call",
    extra={
        "call_id": call_id,
        "tenant_id": tenant_id,
        "event": "call_processing_started",
    },
)
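The payoff of structured output: once every log line is JSON, filtering by field is a one-liner instead of a regex guessing game. A minimal sketch (the log lines and field names are illustrative):

```python
import json

# Three JSON log lines as they might appear on stdout (illustrative values)
raw_lines = [
    '{"level": "INFO", "message": "Call started", "call_id": "abc123"}',
    '{"level": "ERROR", "message": "STT timeout", "call_id": "abc123"}',
    '{"level": "INFO", "message": "Call started", "call_id": "def456"}',
]

# Find every log entry for one call -- unreliable with free-form text logs
records = [json.loads(line) for line in raw_lines]
abc_logs = [r for r in records if r["call_id"] == "abc123"]
print(len(abc_logs))  # 2
```

Log aggregators (Loki, Elasticsearch, CloudWatch Logs Insights) do the same field-level filtering at scale.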

Benefits:
  • Searchable: Find all logs for a specific call_id
  • Aggregatable: Count errors by tenant
  • Parseable: Automated analysis and alerting

60.3 Logging Configuration

"""
Centralized logging configuration.

File: shared/logging_config.py
"""
import logging
import json
import sys
import os
from datetime import datetime
from typing import Any, Dict
class JSONFormatter(logging.Formatter):
    """
    Format logs as JSON for structured logging.

Output example:
{
    "timestamp": "2026-01-25T10:30:00.123Z",
    "level": "INFO",
    "logger": "agent-service.pipeline",
    "message": "Processing audio frame",
    "service": "agent-service",
    "environment": "production",
    "call_id": "abc123",
    "frame_duration_ms": 30
}
"""

def __init__(self, service_name: str):
    super().__init__()
    self.service_name = service_name
    self.environment = os.getenv("ENVIRONMENT", "development")

def format(self, record: logging.LogRecord) -> str:
    log_data = {
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "level": record.levelname,
        "logger": record.name,
        "message": record.getMessage(),
        "service": self.service_name,
        "environment": self.environment,
    }
    
    # Add location info for errors
    if record.levelno >= logging.ERROR:
        log_data["location"] = {
            "file": record.filename,
            "line": record.lineno,
            "function": record.funcName,
        }
    
    # Add exception info if present
    if record.exc_info:
        log_data["exception"] = {
            "type": record.exc_info[0].__name__,
            "message": str(record.exc_info[1]),
            "traceback": self.formatException(record.exc_info),
        }
    
    # Add extra fields
    if hasattr(record, "__dict__"):
        for key, value in record.__dict__.items():
            if key not in (
                "name", "msg", "args", "created", "filename",
                "funcName", "levelname", "levelno", "lineno",
                "module", "msecs", "pathname", "process",
                "processName", "relativeCreated", "stack_info",
                "exc_info", "exc_text", "thread", "threadName",
                "message",
            ):
                log_data[key] = value
    
    return json.dumps(log_data, default=str)
class DevelopmentFormatter(logging.Formatter):
    """Human-readable format for development."""

COLORS = {
    "DEBUG": "\033[36m",    # Cyan
    "INFO": "\033[32m",     # Green
    "WARNING": "\033[33m",  # Yellow
    "ERROR": "\033[31m",    # Red
    "CRITICAL": "\033[35m", # Magenta
}
RESET = "\033[0m"

def format(self, record: logging.LogRecord) -> str:
    color = self.COLORS.get(record.levelname, "")
    
    # Build base message
    msg = f"{color}{record.levelname:8}{self.RESET} "
    msg += f"{record.name}: {record.getMessage()}"
    
    # Add extra fields
    extras = []
    for key, value in record.__dict__.items():
        if key.startswith("_") or key in (
            "name", "msg", "args", "created", "filename",
            "funcName", "levelname", "levelno", "lineno",
            "module", "msecs", "pathname", "process",
            "processName", "relativeCreated", "stack_info",
            "exc_info", "exc_text", "thread", "threadName",
            "message", "taskName",
        ):
            continue
        extras.append(f"{key}={value}")
    
    if extras:
        msg += f" [{', '.join(extras)}]"
    
    return msg
def configure_logging(
    service_name: str,
    level: str = "INFO",
    json_output: bool = None,
) -> None:
    """
    Configure logging for a service.

Args:
    service_name: Name of the service (e.g., "agent-service")
    level: Log level (DEBUG, INFO, WARNING, ERROR)
    json_output: Force JSON output (auto-detected if None)

Example:
    configure_logging("agent-service", level="DEBUG")
    logger = logging.getLogger("agent-service.pipeline")
    logger.info("Starting pipeline", extra={"call_id": "123"})
"""
# Auto-detect: use JSON in production, human-readable in dev
if json_output is None:
    json_output = os.getenv("ENVIRONMENT") in ("production", "staging")

# Create formatter
if json_output:
    formatter = JSONFormatter(service_name)
else:
    formatter = DevelopmentFormatter()

# Configure root handler
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)

# Configure root logger
root_logger = logging.getLogger()
root_logger.setLevel(getattr(logging, level.upper()))
root_logger.handlers = [handler]

# Reduce noise from libraries
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("asyncio").setLevel(logging.WARNING)
logging.getLogger("websockets").setLevel(logging.WARNING)
def get_logger(name: str) -> logging.Logger:
    """Get a logger with the given name."""
    return logging.getLogger(name)

60.4 Context-Aware Logging

"""
Context managers for adding context to logs.

File: shared/logging_context.py
"""
import asyncio  # needed by with_log_context's coroutine check
import logging
import contextvars
from typing import Any, Dict, Optional
from contextlib import contextmanager
from functools import wraps

# Context variable for log context
_log_context: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar(
    "log_context",
    default={},
)

class ContextualLogger:
    """
    Logger that automatically includes context.

Example:
    logger = ContextualLogger("agent-service.pipeline")
    
    with log_context(call_id="abc123", tenant_id="xyz"):
        logger.info("Processing started")
        # Output includes call_id and tenant_id automatically
"""

def __init__(self, name: str):
    self._logger = logging.getLogger(name)

def _log(self, level: int, msg: str, *args, **kwargs):
    # Merge context into extra
    extra = kwargs.pop("extra", {})
    context = _log_context.get()
    merged_extra = {**context, **extra}
    
    self._logger.log(level, msg, *args, extra=merged_extra, **kwargs)

def debug(self, msg: str, *args, **kwargs):
    self._log(logging.DEBUG, msg, *args, **kwargs)

def info(self, msg: str, *args, **kwargs):
    self._log(logging.INFO, msg, *args, **kwargs)

def warning(self, msg: str, *args, **kwargs):
    self._log(logging.WARNING, msg, *args, **kwargs)

def error(self, msg: str, *args, **kwargs):
    self._log(logging.ERROR, msg, *args, **kwargs)

def exception(self, msg: str, *args, **kwargs):
    kwargs["exc_info"] = True
    self._log(logging.ERROR, msg, *args, **kwargs)
@contextmanager
def log_context(**kwargs):
    """
    Add context to all logs within this block.
Example:
    with log_context(call_id="abc123"):
        logger.info("Processing")  # Includes call_id
        do_something()  # All logs inside include call_id
"""
current = _log_context.get()
new_context = {**current, **kwargs}
token = _log_context.set(new_context)
try:
    yield
finally:
    _log_context.reset(token)
def with_log_context(**context_kwargs):
    """
    Decorator to add context to all logs in a function.

Example:
    @with_log_context(component="vad")
    async def process_vad(frame):
        logger.info("Processing frame")  # Includes component="vad"
"""
def decorator(func):
    @wraps(func)
    async def async_wrapper(*args, **kwargs):
        with log_context(**context_kwargs):
            return await func(*args, **kwargs)
    
    @wraps(func)
    def sync_wrapper(*args, **kwargs):
        with log_context(**context_kwargs):
            return func(*args, **kwargs)
    
    if asyncio.iscoroutinefunction(func):
        return async_wrapper
    return sync_wrapper

return decorator

# Convenience function to set call context
def set_call_context(
    call_id: str,
    tenant_id: str,
    agency_id: str = None,
    **extra,
):
    """
    Set logging context for a call.

Example:
    with set_call_context("call-123", "tenant-456"):
        # All logs include call_id and tenant_id
        process_call()
"""
context = {
    "call_id": call_id,
    "tenant_id": tenant_id,
}
if agency_id:
    context["agency_id"] = agency_id
context.update(extra)

return log_context(**context)
60.5 Standard Log Events

"""
Standard log events for consistency.

File: shared/log_events.py
"""
from enum import Enum
from dataclasses import dataclass
from typing import Optional, Any, Dict
import time
class LogEvent(Enum):
    """Standard event types for structured logging."""

# Call lifecycle
CALL_STARTED = "call.started"
CALL_ANSWERED = "call.answered"
CALL_ENDED = "call.ended"
CALL_FAILED = "call.failed"
CALL_TRANSFERRED = "call.transferred"

# Pipeline events
PIPELINE_STATE_CHANGE = "pipeline.state_change"
VAD_SPEECH_START = "vad.speech_start"
VAD_SPEECH_END = "vad.speech_end"
STT_TRANSCRIPT = "stt.transcript"
LLM_REQUEST = "llm.request"
LLM_RESPONSE = "llm.response"
TTS_SYNTHESIZE = "tts.synthesize"
BARGE_IN_DETECTED = "barge_in.detected"

# RAG events
RAG_SEARCH = "rag.search"
RAG_NO_RESULTS = "rag.no_results"

# Error events
ERROR_STT = "error.stt"
ERROR_LLM = "error.llm"
ERROR_TTS = "error.tts"
ERROR_TIMEOUT = "error.timeout"

# Performance events
LATENCY_MEASUREMENT = "latency.measurement"
@dataclass
class CallStartedEvent:
    """Log event for call started."""
    call_id: str
    tenant_id: str
    agency_id: str
    direction: str
    caller_phone: str
def log(self, logger):
    logger.info(
        "Call started",
        extra={
            "event": LogEvent.CALL_STARTED.value,
            **self.__dict__,
        }
    )
@dataclass
class LatencyEvent:
    """Log event for latency measurements."""
    call_id: str
    component: str  # stt, llm, tts, e2e
    latency_ms: float
def log(self, logger):
    logger.info(
        f"Latency measurement: {self.component}",
        extra={
            "event": LogEvent.LATENCY_MEASUREMENT.value,
            **self.__dict__,
        }
    )
class LatencyTimer:
    """
    Context manager for timing operations.

Example:
    with LatencyTimer("llm", call_id, logger) as timer:
        response = await llm.generate(...)
    # Automatically logs latency
"""

def __init__(self, component: str, call_id: str, logger):
    self.component = component
    self.call_id = call_id
    self.logger = logger
    self.start_time = None
    self.latency_ms = None

def __enter__(self):
    self.start_time = time.perf_counter()
    return self

def __exit__(self, *args):
    self.latency_ms = (time.perf_counter() - self.start_time) * 1000
    LatencyEvent(
        call_id=self.call_id,
        component=self.component,
        latency_ms=self.latency_ms,
    ).log(self.logger)

async def __aenter__(self):
    return self.__enter__()

async def __aexit__(self, *args):
    return self.__exit__(*args)
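The mechanism `LatencyTimer` relies on is just two `time.perf_counter()` reads around the operation; `perf_counter` is monotonic, so it's immune to wall-clock adjustments. A minimal, logger-free sketch of the same pattern:

```python
import time

class Timer:
    """Minimal latency timer: measures elapsed wall time in milliseconds."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        self.latency_ms = (time.perf_counter() - self.start) * 1000

with Timer() as t:
    time.sleep(0.05)  # simulate a ~50 ms operation

print(round(t.latency_ms))  # roughly 50
```

The full `LatencyTimer` does exactly this, then emits a `LatencyEvent` so the measurement lands in the structured logs.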
60.6 Logging in Practice

"""
Example usage in voice pipeline.

File: services/agent-service/pipeline/example_logging.py
"""
from shared.logging_config import configure_logging, get_logger
from shared.logging_context import ContextualLogger, set_call_context, log_context
from shared.log_events import LogEvent, LatencyTimer, CallStartedEvent

# Configure logging at service startup
configure_logging("agent-service", level="INFO")

# Create contextual logger
logger = ContextualLogger("agent-service.pipeline")


async def handle_incoming_call(call_context):
    """Handle an incoming call with proper logging."""
# Set call context for all subsequent logs
with set_call_context(
    call_id=call_context.call_id,
    tenant_id=call_context.tenant_id,
    agency_id=call_context.agency_id,
):
    # Log call started
    CallStartedEvent(
        call_id=call_context.call_id,
        tenant_id=call_context.tenant_id,
        agency_id=call_context.agency_id,
        direction=call_context.direction.value,
        caller_phone=call_context.caller_phone,
    ).log(logger)
    
    try:
        # Initialize pipeline
        logger.info("Initializing voice pipeline")
        
        # Process call
        await process_call(call_context)
        
        # Log success
        logger.info(
            "Call completed successfully",
            extra={
                "event": LogEvent.CALL_ENDED.value,
                "duration_seconds": call_context.duration,
                "turn_count": call_context.turn_count,
            }
        )
    
    except TimeoutError as e:
        logger.error(
            "Call failed due to timeout",
            extra={
                "event": LogEvent.CALL_FAILED.value,
                "error_type": "timeout",
                "error_message": str(e),
            }
        )
        raise
    
    except Exception as e:
        logger.exception(
            "Call failed with unexpected error",
            extra={
                "event": LogEvent.CALL_FAILED.value,
                "error_type": type(e).__name__,
            }
        )
        raise
async def process_stt(audio_data, call_id):
    """Process speech-to-text with logging."""
with log_context(component="stt"):
    async with LatencyTimer("stt", call_id, logger):
        transcript = await stt_client.transcribe(audio_data)
    
    logger.info(
        "Transcript received",
        extra={
            "event": LogEvent.STT_TRANSCRIPT.value,
            "transcript_length": len(transcript.text),
            "confidence": transcript.confidence,
            "is_final": transcript.is_final,
        }
    )
    
    return transcript
async def process_llm(messages, call_id):
    """Process LLM generation with logging."""
with log_context(component="llm"):
    # Log request
    logger.info(
        "LLM request started",
        extra={
            "event": LogEvent.LLM_REQUEST.value,
            "message_count": len(messages),
            "prompt_tokens": sum(len(m["content"]) // 4 for m in messages),
        }
    )
    
    async with LatencyTimer("llm_ttfb", call_id, logger):
        first_chunk = True
        response_text = ""
        
        async for chunk in llm_client.generate_streaming(messages):
            if first_chunk:
                first_chunk = False
                # TTFB logged by timer
            
            response_text += chunk.text
    
    logger.info(
        "LLM response complete",
        extra={
            "event": LogEvent.LLM_RESPONSE.value,
            "response_length": len(response_text),
            "response_tokens": len(response_text) // 4,
        }
    )
    
    return response_text
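The `len(text) // 4` expressions above are a rough chars-per-token heuristic (about four characters per token for English text). It is fine for log fields; use the provider's actual tokenizer when the count matters for billing or context-window budgeting. As a tiny illustration:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    Good enough for logging; not accurate enough for billing."""
    return len(text) // 4

print(estimate_tokens("Hello, how can I help you today?"))  # 8
```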
Section 61: Distributed Tracing

61.1 What is Distributed Tracing?

Distributed tracing tracks requests as they flow through multiple services:

DISTRIBUTED TRACE

Trace ID: abc-123-xyz

├── api-gateway (50ms)
│   └── auth.validate_token (5ms)
│
├── agent-service (850ms)
│   ├── vad.detect_speech (30ms)
│   ├── stt.transcribe (150ms)
│   │   └── deepgram.api_call (140ms)
│   ├── rag.retrieve (80ms)
│   │   └── pgvector.search (60ms)
│   ├── llm.generate (450ms)
│   │   └── anthropic.api_call (440ms)
│   └── tts.synthesize (140ms)
│       └── chatterbox.api_call (130ms)
│
└── kb-service (80ms)
    └── embedding.generate (70ms)

Total: 980ms

61.2 OpenTelemetry Setup

"""
OpenTelemetry configuration for distributed tracing.

File: shared/tracing.py
"""
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
from opentelemetry.instrumentation.asyncpg import AsyncPGInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
def configure_tracing(service_name: str) -> trace.Tracer:
    """
    Configure OpenTelemetry tracing.

Args:
    service_name: Name of the service

Returns:
    Configured tracer

Example:
    tracer = configure_tracing("agent-service")
    
    with tracer.start_as_current_span("process_audio") as span:
        span.set_attribute("call_id", call_id)
        process_audio(...)
"""
# Create resource with service info
resource = Resource.create({
    "service.name": service_name,
    "service.version": os.getenv("SERVICE_VERSION", "unknown"),
    "deployment.environment": os.getenv("ENVIRONMENT", "development"),
})

# Create tracer provider
provider = TracerProvider(resource=resource)

# Configure exporter (to Jaeger/Tempo/etc.)
otlp_endpoint = os.getenv("OTLP_ENDPOINT", "http://localhost:4317")
exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)

# Add batch processor for efficiency
processor = BatchSpanProcessor(exporter)
provider.add_span_processor(processor)

# Set as global provider
trace.set_tracer_provider(provider)

# Configure propagation (for cross-service traces)
set_global_textmap(B3MultiFormat())

# Auto-instrument libraries
HTTPXClientInstrumentor().instrument()
RedisInstrumentor().instrument()
AsyncPGInstrumentor().instrument()

return trace.get_tracer(service_name)
def instrument_fastapi(app):
    """Instrument FastAPI application."""
    FastAPIInstrumentor.instrument_app(app)

def get_tracer(name: str) -> trace.Tracer:
    """Get a tracer for a component."""
    return trace.get_tracer(name)

61.3 Custom Spans

"""
Custom span helpers for voice pipeline.

File: shared/tracing_helpers.py
"""
import asyncio
from contextlib import contextmanager
from functools import wraps

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode, SpanKind


@contextmanager
def trace_span(
    tracer: trace.Tracer,
    name: str,
    attributes: dict = None,
    kind: SpanKind = SpanKind.INTERNAL,
):
    """
    Context manager for creating spans.
Example:
    with trace_span(tracer, "process_audio", {"call_id": "123"}):
        process_audio()
"""
with tracer.start_as_current_span(name, kind=kind) as span:
    if attributes:
        for key, value in attributes.items():
            span.set_attribute(key, value)
    
    try:
        yield span
    except Exception as e:
        span.set_status(Status(StatusCode.ERROR, str(e)))
        span.record_exception(e)
        raise
def traced(tracer: trace.Tracer, name: str = None, attributes: dict = None):
    """
    Decorator for tracing functions.

Example:
    @traced(tracer, "stt.transcribe")
    async def transcribe(audio):
        ...
"""
def decorator(func):
    span_name = name or f"{func.__module__}.{func.__name__}"
    
    @wraps(func)
    async def async_wrapper(*args, **kwargs):
        with trace_span(tracer, span_name, attributes):
            return await func(*args, **kwargs)
    
    @wraps(func)
    def sync_wrapper(*args, **kwargs):
        with trace_span(tracer, span_name, attributes):
            return func(*args, **kwargs)
    
    if asyncio.iscoroutinefunction(func):
        return async_wrapper
    return sync_wrapper

return decorator
class CallTracer:
    """
    Tracer for voice call spans.

Example:
    call_tracer = CallTracer(tracer, call_id, tenant_id)
    
    with call_tracer.span("vad.process") as span:
        span.set_attribute("frame_count", 100)
        process_vad()
"""

def __init__(self, tracer: trace.Tracer, call_id: str, tenant_id: str):
    self.tracer = tracer
    self.call_id = call_id
    self.tenant_id = tenant_id
    self._root_span = None

def start_call_trace(self):
    """Start the root span for a call."""
    self._root_span = self.tracer.start_span(
        "call.process",
        kind=SpanKind.SERVER,
    )
    self._root_span.set_attribute("call_id", self.call_id)
    self._root_span.set_attribute("tenant_id", self.tenant_id)
    return self._root_span

def end_call_trace(self, success: bool = True, error: str = None):
    """End the root span for a call."""
    if self._root_span:
        if success:
            self._root_span.set_status(Status(StatusCode.OK))
        else:
            self._root_span.set_status(Status(StatusCode.ERROR, error))
        self._root_span.end()

@contextmanager
def span(self, name: str, attributes: dict = None):
    """Create a child span for this call."""
    with self.tracer.start_as_current_span(name) as span:
        span.set_attribute("call_id", self.call_id)
        span.set_attribute("tenant_id", self.tenant_id)
        
        if attributes:
            for key, value in attributes.items():
                span.set_attribute(key, value)
        
        try:
            yield span
        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise
61.4 Tracing in Practice

"""
Example tracing in voice pipeline.

File: services/agent-service/pipeline/traced_pipeline.py
"""
from opentelemetry import trace  # used for trace.use_span below

from shared.tracing import configure_tracing, get_tracer
from shared.tracing_helpers import CallTracer, traced, trace_span

# Configure at startup
tracer = configure_tracing("agent-service")


class TracedVoicePipeline:
    """Voice pipeline with distributed tracing."""

def __init__(self, stt_client, llm_client, tts_client):
    self.stt = stt_client
    self.llm = llm_client
    self.tts = tts_client
    self.tracer = get_tracer("agent-service.pipeline")

async def process_call(self, call_context):
    """Process a call with full tracing."""
    
    call_tracer = CallTracer(
        self.tracer,
        call_context.call_id,
        call_context.tenant_id,
    )
    
    root_span = call_tracer.start_call_trace()
    
    try:
        with trace.use_span(root_span):
            # Process audio through pipeline
            await self._process_pipeline(call_tracer, call_context)
        
        call_tracer.end_call_trace(success=True)
    
    except Exception as e:
        call_tracer.end_call_trace(success=False, error=str(e))
        raise

async def _process_pipeline(self, call_tracer, call_context):
    """Process through VAD → STT → LLM → TTS."""
    
    # VAD processing
    with call_tracer.span("vad.process", {"threshold": 0.5}) as span:
        speech_audio = await self._collect_speech(call_context)
        span.set_attribute("audio_duration_ms", len(speech_audio) / 16)
    
    # STT processing
    with call_tracer.span("stt.transcribe") as span:
        transcript = await self.stt.transcribe(speech_audio)
        span.set_attribute("transcript", transcript.text[:100])
        span.set_attribute("confidence", transcript.confidence)
    
    # RAG retrieval
    with call_tracer.span("rag.retrieve") as span:
        context = await self._retrieve_context(transcript.text)
        span.set_attribute("chunks_retrieved", len(context.chunks))
    
    # LLM generation
    with call_tracer.span("llm.generate") as span:
        span.set_attribute("model", "claude-sonnet")
        response = await self._generate_response(transcript.text, context)
        span.set_attribute("response_length", len(response))
    
    # TTS synthesis
    with call_tracer.span("tts.synthesize") as span:
        audio = await self.tts.synthesize(response)
        span.set_attribute("audio_duration_ms", audio.duration_ms)
    
    return audio
61.5 Trace Propagation

"""
Propagating traces across services.

File: shared/trace_propagation.py
"""
from opentelemetry import trace
from opentelemetry.propagate import inject, extract
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
def inject_trace_headers(headers: dict) -> dict:
    """
    Inject trace context into HTTP headers.

Example:
    headers = {"Content-Type": "application/json"}
    headers = inject_trace_headers(headers)
    response = await client.post(url, headers=headers)
"""
inject(headers)
return headers
def extract_trace_context(headers: dict):
    """
    Extract trace context from incoming request headers.

Example:
    @app.post("/webhook")
    async def webhook(request: Request):
        ctx = extract_trace_context(dict(request.headers))
        with tracer.start_as_current_span("webhook", context=ctx):
            process_webhook()
"""
return extract(headers)

# FastAPI middleware for automatic propagation
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware


class TracePropagationMiddleware(BaseHTTPMiddleware):
    """Middleware to propagate trace context."""

async def dispatch(self, request: Request, call_next):
    # Extract trace context from incoming headers
    ctx = extract_trace_context(dict(request.headers))
    
    # Start span with extracted context
    tracer = trace.get_tracer("api-gateway")
    
    with tracer.start_as_current_span(
        f"{request.method} {request.url.path}",
        context=ctx,
        kind=trace.SpanKind.SERVER,
    ) as span:
        span.set_attribute("http.method", request.method)
        span.set_attribute("http.url", str(request.url))
        
        response = await call_next(request)
        
        span.set_attribute("http.status_code", response.status_code)
        return response
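What actually travels between services when W3C Trace Context propagation is used (the `TraceContextTextMapPropagator` imported above) is a single `traceparent` header: `version-traceid-spanid-flags`, with the trace ID as 32 hex digits and the span ID as 16. `inject`/`extract` build and parse this for you; the sketch below constructs one by hand purely to make the wire format concrete (the `build_traceparent` helper is illustrative, not part of the OpenTelemetry API):

```python
def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """W3C traceparent: version(2 hex) - trace_id(32 hex) - span_id(16 hex) - flags(2 hex)."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"

header = build_traceparent(0xabc123, 0x42)
print(header)  # 00-00000000000000000000000000abc123-0000000000000042-01

# The receiving side splits the header back into its four fields
version, trace_hex, span_hex, flags = header.split("-")
```

Note that `shared/tracing.py` sets `B3MultiFormat()` as the global propagator, which uses separate `X-B3-TraceId`/`X-B3-SpanId` headers instead; pick one format and use it on both sides of every service boundary.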
61.6 Log Correlation

"""
Correlate logs with traces.

File: shared/log_trace_correlation.py
"""
import logging

from opentelemetry import trace
class TraceInjectingFilter(logging.Filter):
    """
    Logging filter that adds trace context to log records.

This allows logs to be correlated with traces in observability tools.
"""

def filter(self, record: logging.LogRecord) -> bool:
    span = trace.get_current_span()
    
    if span and span.is_recording():
        ctx = span.get_span_context()
        record.trace_id = format(ctx.trace_id, "032x")
        record.span_id = format(ctx.span_id, "016x")
    else:
        record.trace_id = "0" * 32
        record.span_id = "0" * 16
    
    return True
def configure_log_trace_correlation():
    """Add trace IDs to all logs."""
    root_logger = logging.getLogger()
    root_logger.addFilter(TraceInjectingFilter())
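To see the filter mechanism in isolation, here is a toy stand-in that stamps a fixed trace ID on every record instead of reading it from the current span — same `logging.Filter` hook, same 32-hex-digit encoding (`format(trace_id, "032x")`) that `TraceInjectingFilter` uses, but self-contained with no OpenTelemetry dependency:

```python
import logging

class StaticTraceFilter(logging.Filter):
    """Toy stand-in for TraceInjectingFilter: stamps a fixed trace_id on every record."""
    def __init__(self, trace_id: int):
        super().__init__()
        # Same zero-padded 32-hex encoding used for OpenTelemetry trace IDs
        self.trace_id = format(trace_id, "032x")

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id
        return True  # never drop the record, only enrich it

logger = logging.getLogger("correlation-demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(message)s trace_id=%(trace_id)s"))
logger.addHandler(handler)
logger.addFilter(StaticTraceFilter(0xabc))
logger.warning("hello")  # -> hello trace_id=00000000000000000000000000000abc
```

With the real filter installed, every JSON log line carries `trace_id`/`span_id`, so observability tools can jump from a log entry straight to its trace.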

Summary: What You've Learned in Part 10A

Section 60: Structured Logging
  • JSON-formatted logs for searchability
  • Contextual logging with automatic field injection
  • Standard log events for consistency
  • Latency timers for performance tracking

Section 61: Distributed Tracing
  • OpenTelemetry for cross-service tracing
  • Custom spans for voice pipeline
  • Trace propagation across services
  • Log-trace correlation

What's Next

In Part 10B, you'll learn:
  • Prometheus metrics
  • Grafana dashboards
  • Alert rules and thresholds
  • On-call alerting

End of Part 10A — Continue to Part 10B

Junior Developer PRD — Part 10B: Metrics & Alerting

Document Version: 1.0
Last Updated: January 25, 2026
Part: 10B of 10 (Sub-part 2 of 3)
Sections: 62-63
Audience: Junior developers with no prior context
Estimated Reading Time: 25 minutes

Table of Contents
  • Section 62: Prometheus Metrics
  • Section 63: Alerting

Section 62: Prometheus Metrics

62.1 Metrics Overview

Metrics are numerical measurements collected over time:

Metric Type  Description             Example
Counter      Only goes up            Total requests, errors
Gauge        Can go up or down       Active calls, queue size
Histogram    Distribution of values  Latency percentiles
Summary      Similar to histogram    Request durations

62.2 Metrics Configuration

"""
Prometheus metrics configuration.

File: shared/metrics.py
"""
import time
from functools import wraps
from contextlib import contextmanager

from prometheus_client import Counter, Gauge, Histogram, Info, CollectorRegistry
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST

# Create custom registry
REGISTRY = CollectorRegistry()

# ============================================================
# SERVICE INFO
# ============================================================

SERVICE_INFO = Info(
    "voiceai_service",
    "Service information",
    registry=REGISTRY,
)
def set_service_info(name: str, version: str, environment: str):
    """Set service metadata."""
    SERVICE_INFO.info({
        "name": name,
        "version": version,
        "environment": environment,
    })

# ============================================================
# CALL METRICS
# ============================================================

CALLS_TOTAL = Counter(
    "voiceai_calls_total",
    "Total number of calls",
    ["tenant_id", "direction", "status"],
    registry=REGISTRY,
)

CALLS_ACTIVE = Gauge(
    "voiceai_calls_active",
    "Currently active calls",
    ["tenant_id"],
    registry=REGISTRY,
)

CALL_DURATION_SECONDS = Histogram(
    "voiceai_call_duration_seconds",
    "Call duration in seconds",
    ["tenant_id", "direction"],
    buckets=[30, 60, 120, 300, 600, 1200, 1800, 3600],
    registry=REGISTRY,
)

============================================================

LATENCY METRICS

============================================================

Latency buckets optimized for voice (in seconds)

LATENCY_BUCKETS = [0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 5.0] STT_LATENCY = Histogram( “voiceai_stt_latency_seconds”, “Speech-to-text latency”, [“provider”], buckets=LATENCY_BUCKETS, registry=REGISTRY, ) LLM_TTFB = Histogram( “voiceai_llm_ttfb_seconds”, “LLM time to first byte”, [“model”], buckets=LATENCY_BUCKETS, registry=REGISTRY, ) LLM_TOTAL_LATENCY = Histogram( “voiceai_llm_total_latency_seconds”, “LLM total generation time”, [“model”], buckets=[0.5, 1, 2, 3, 5, 10, 15, 20, 30], registry=REGISTRY, ) TTS_LATENCY = Histogram( “voiceai_tts_latency_seconds”, “Text-to-speech latency”, [“provider”], buckets=LATENCY_BUCKETS, registry=REGISTRY, ) E2E_LATENCY = Histogram( “voiceai_e2e_latency_seconds”, “End-to-end turn latency (mouth to ear)”, [“tenant_id”], buckets=LATENCY_BUCKETS, registry=REGISTRY, ) RAG_LATENCY = Histogram( “voiceai_rag_latency_seconds”, “RAG retrieval latency”, buckets=[0.05, 0.1, 0.2, 0.3, 0.5, 1.0], registry=REGISTRY, )

============================================================

ERROR METRICS

============================================================

ERRORS_TOTAL = Counter( “voiceai_errors_total”, “Total errors by type”, [“service”, “error_type”], registry=REGISTRY, ) STT_ERRORS = Counter( “voiceai_stt_errors_total”, “STT errors”, [“provider”, “error_type”], registry=REGISTRY, ) LLM_ERRORS = Counter( “voiceai_llm_errors_total”, “LLM errors”, [“model”, “error_type”], registry=REGISTRY, ) TTS_ERRORS = Counter( “voiceai_tts_errors_total”, “TTS errors”, [“provider”, “error_type”], registry=REGISTRY, )

============================================================

PIPELINE METRICS

============================================================

PIPELINE_STATE = Gauge( “voiceai_pipeline_state”, “Current pipeline state (encoded)”, [“call_id”], registry=REGISTRY, ) VAD_DETECTIONS = Counter( “voiceai_vad_detections_total”, “VAD detection events”, [“event_type”], # speech_start, speech_end registry=REGISTRY, ) BARGE_INS = Counter( “voiceai_barge_ins_total”, “Barge-in events”, [“tenant_id”], registry=REGISTRY, ) TURNS_TOTAL = Counter( “voiceai_turns_total”, “Conversation turns”, [“tenant_id”, “role”], # user, assistant registry=REGISTRY, )

============================================================

RESOURCE METRICS

============================================================

DB_CONNECTIONS_ACTIVE = Gauge( “voiceai_db_connections_active”, “Active database connections”, registry=REGISTRY, ) REDIS_CONNECTIONS_ACTIVE = Gauge( “voiceai_redis_connections_active”, “Active Redis connections”, registry=REGISTRY, ) QUEUE_SIZE = Gauge( “voiceai_queue_size”, “Queue size by queue name”, [“queue_name”], registry=REGISTRY, )

============================================================

TOKEN METRICS

============================================================

LLM_TOKENS = Counter( “voiceai_llm_tokens_total”, “LLM tokens used”, [“tenant_id”, “model”, “type”], # type: input, output registry=REGISTRY, ) EMBEDDING_TOKENS = Counter( “voiceai_embedding_tokens_total”, “Embedding tokens used”, [“tenant_id”], registry=REGISTRY, )

============================================================

HELPER FUNCTIONS

============================================================

@contextmanager def measure_latency(histogram, labels: dict = None): """ Context manager to measure latency.
Example:
    with measure_latency(STT_LATENCY, {"provider": "deepgram"}):
        result = await stt.transcribe(audio)
"""
labels = labels or {}
start = time.perf_counter()
try:
    yield
finally:
    duration = time.perf_counter() - start
    histogram.labels(**labels).observe(duration)
def timed(histogram, labels: dict \= None): """ Decorator to measure function latency.

Example:
    @timed(STT_LATENCY, {"provider": "deepgram"})
    async def transcribe(audio):
        ...
"""
def decorator(func):
    @wraps(func)
    async def async_wrapper(*args, **kwargs):
        with measure_latency(histogram, labels):
            return await func(*args, **kwargs)
    
    @wraps(func)
    def sync_wrapper(*args, **kwargs):
        with measure_latency(histogram, labels):
            return func(*args, **kwargs)
    
    if asyncio.iscoroutinefunction(func):
        return async_wrapper
    return sync_wrapper

return decorator
def get\_metrics() \-\> bytes: """Get metrics in Prometheus format.""" return generate\_latest(REGISTRY)

def get\_metrics\_content\_type() \-\> str: """Get content type for metrics response.""" return CONTENT\_TYPE\_LATEST 62.3 FastAPI Metrics Endpoint """ FastAPI metrics endpoint.

File: services/api-gateway/routes/metrics.py """ from fastapi import APIRouter, Response from shared.metrics import get_metrics, get_metrics_content_type router = APIRouter() @router.get(“/metrics”) async def metrics(): """Prometheus metrics endpoint.""" return Response( content=get_metrics(), media_type=get_metrics_content_type(), ) 62.4 Recording Metrics in Code """ Example metrics recording in voice pipeline. File: services/agent-service/pipeline/metrics_example.py """ from shared.metrics import ( CALLS_TOTAL, CALLS_ACTIVE, CALL_DURATION_SECONDS, STT_LATENCY, LLM_TTFB, TTS_LATENCY, E2E_LATENCY, ERRORS_TOTAL, BARGE_INS, TURNS_TOTAL, LLM_TOKENS, measure_latency, ) import time
class MetricsRecorder: """Records metrics for a call."""

def __init__(self, tenant_id: str, direction: str):
    self.tenant_id = tenant_id
    self.direction = direction
    self.call_start = None
    self.turn_start = None

def call_started(self):
    """Record call start."""
    self.call_start = time.time()
    CALLS_ACTIVE.labels(tenant_id=self.tenant_id).inc()

def call_ended(self, status: str):
    """Record call end."""
    # Duration
    if self.call_start:
        duration = time.time() - self.call_start
        CALL_DURATION_SECONDS.labels(
            tenant_id=self.tenant_id,
            direction=self.direction,
        ).observe(duration)
    
    # Count
    CALLS_TOTAL.labels(
        tenant_id=self.tenant_id,
        direction=self.direction,
        status=status,
    ).inc()
    
    # Decrement active
    CALLS_ACTIVE.labels(tenant_id=self.tenant_id).dec()

def turn_started(self):
    """Start timing a conversation turn."""
    self.turn_start = time.time()

def turn_completed(self, role: str):
    """Record turn completion."""
    TURNS_TOTAL.labels(
        tenant_id=self.tenant_id,
        role=role,
    ).inc()
    
    # E2E latency (for assistant turns)
    if role == "assistant" and self.turn_start:
        latency = time.time() - self.turn_start
        E2E_LATENCY.labels(tenant_id=self.tenant_id).observe(latency)

def record_stt(self, provider: str, latency_seconds: float):
    """Record STT metrics."""
    STT_LATENCY.labels(provider=provider).observe(latency_seconds)

def record_llm(self, model: str, ttfb: float, total: float, input_tokens: int, output_tokens: int):
    """Record LLM metrics."""
    LLM_TTFB.labels(model=model).observe(ttfb)
    LLM_TOTAL_LATENCY.labels(model=model).observe(total)
    LLM_TOKENS.labels(
        tenant_id=self.tenant_id,
        model=model,
        type="input",
    ).inc(input_tokens)
    LLM_TOKENS.labels(
        tenant_id=self.tenant_id,
        model=model,
        type="output",
    ).inc(output_tokens)

def record_tts(self, provider: str, latency_seconds: float):
    """Record TTS metrics."""
    TTS_LATENCY.labels(provider=provider).observe(latency_seconds)

def record_barge_in(self):
    """Record barge-in event."""
    BARGE_INS.labels(tenant_id=self.tenant_id).inc()

def record_error(self, error_type: str):
    """Record error."""
    ERRORS_TOTAL.labels(
        service="agent-service",
        error_type=error_type,
    ).inc()

Usage in pipeline

async def process_turn(audio, metrics: MetricsRecorder): """Process a conversation turn with metrics."""
metrics.turn_started()

# STT
with measure_latency(STT_LATENCY, {"provider": "deepgram"}):
    transcript = await stt.transcribe(audio)

# LLM
llm_start = time.time()
first_token_time = None
response = ""

async for chunk in llm.generate(transcript.text):
    if first_token_time is None:
        first_token_time = time.time()
        LLM_TTFB.labels(model="claude-sonnet").observe(first_token_time - llm_start)
    response += chunk.text

LLM_TOTAL_LATENCY.labels(model="claude-sonnet").observe(time.time() - llm_start)

# TTS
with measure_latency(TTS_LATENCY, {"provider": "chatterbox"}):
    audio = await tts.synthesize(response)

metrics.turn_completed("assistant")
return audio
62.5 Grafana Dashboards { "dashboard": { "title": "Voice AI \- Overview", "panels": \[ { "title": "Active Calls", "type": "stat", "targets": \[ { "expr": "sum(voiceai\_calls\_active)", "legendFormat": "Active Calls" } \] }, { "title": "Calls per Minute", "type": "graph", "targets": \[ { "expr": "sum(rate(voiceai\_calls\_total\[5m\])) \* 60", "legendFormat": "Calls/min" } \] }, { "title": "E2E Latency (P95)", "type": "gauge", "targets": \[ { "expr": "histogram\_quantile(0.95, sum(rate(voiceai\_e2e\_latency\_seconds\_bucket\[5m\])) by (le))", "legendFormat": "P95 Latency" } \], "thresholds": { "steps": \[ {"color": "green", "value": 0}, {"color": "yellow", "value": 1}, {"color": "red", "value": 1.5} \] } }, { "title": "Component Latencies", "type": "graph", "targets": \[ { "expr": "histogram\_quantile(0.95, sum(rate(voiceai\_stt\_latency\_seconds\_bucket\[5m\])) by (le))", "legendFormat": "STT P95" }, { "expr": "histogram\_quantile(0.95, sum(rate(voiceai\_llm\_ttfb\_seconds\_bucket\[5m\])) by (le))", "legendFormat": "LLM TTFB P95" }, { "expr": "histogram\_quantile(0.95, sum(rate(voiceai\_tts\_latency\_seconds\_bucket\[5m\])) by (le))", "legendFormat": "TTS P95" } \] }, { "title": "Error Rate", "type": "graph", "targets": \[ { "expr": "sum(rate(voiceai\_errors\_total\[5m\])) by (error\_type)", "legendFormat": "{{error\_type}}" } \] }, { "title": "Call Success Rate", "type": "stat", "targets": \[ { "expr": "sum(rate(voiceai\_calls\_total{status='completed'}\[1h\])) / sum(rate(voiceai\_calls\_total\[1h\])) \* 100", "legendFormat": "Success Rate" } \] } \] } }
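The P95 panels lean on Prometheus's `histogram_quantile()`, which finds the cumulative bucket containing the target rank and interpolates linearly inside it. To make that concrete, here is a stdlib-only sketch of the same arithmetic (the bucket boundaries and counts are illustrative, not from production data):

```python
def histogram_quantile(q, upper_bounds, cumulative_counts):
    """Approximate a quantile from cumulative histogram buckets, the way
    Prometheus's histogram_quantile() does: locate the bucket containing
    rank q * total, then interpolate linearly within that bucket."""
    total = cumulative_counts[-1]
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    i = next(i for i, c in enumerate(cumulative_counts) if c >= rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    prev = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - prev
    if in_bucket == 0:
        return upper_bounds[i]
    return lower + (upper_bounds[i] - lower) * (rank - prev) / in_bucket


# 100 observations: 60 under 0.5s, 90 under 1.0s, all under 2.0s
p95 = histogram_quantile(0.95, [0.5, 1.0, 2.0], [60, 90, 100])
print(p95)  # → 1.5 (rank 95 falls halfway into the (1.0, 2.0] bucket)
```

This is also why bucket choice matters: the quantile can never be more precise than the width of the bucket it lands in, which is what motivates the tight `LATENCY_BUCKETS` spacing around the 0.5-1.5s range.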

## Section 63: Alerting

### 63.1 Alert Philosophy

Good alerts should be:

- **Actionable:** Someone can do something about it
- **Urgent:** Requires immediate attention
- **Clear:** Obvious what's wrong and what to do
- **Rare:** Alert fatigue kills effectiveness

### 63.2 Prometheus Alert Rules

```yaml
# alerts/voice-ai-alerts.yaml
groups:
  - name: voice-ai-critical
    rules:
      # ============================================================
      # LATENCY ALERTS
      # ============================================================
      - alert: HighE2ELatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(voiceai_e2e_latency_seconds_bucket[5m])) by (le)
          ) > 1.5
        for: 5m
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "E2E latency exceeds 1.5s"
          description: "P95 end-to-end latency is {{ $value | humanizeDuration }}"
          runbook: "https://wiki.internal/runbooks/high-latency"

      - alert: HighSTTLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(voiceai_stt_latency_seconds_bucket[5m])) by (le, provider)
          ) > 0.5
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "STT latency exceeds 500ms"
          description: "STT P95 latency is {{ $value | humanizeDuration }} for {{ $labels.provider }}"

      - alert: HighLLMTTFB
        expr: |
          histogram_quantile(0.95,
            sum(rate(voiceai_llm_ttfb_seconds_bucket[5m])) by (le, model)
          ) > 1.0
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "LLM TTFB exceeds 1s"
          description: "LLM TTFB P95 is {{ $value | humanizeDuration }} for {{ $labels.model }}"

      # ============================================================
      # ERROR ALERTS
      # ============================================================
      - alert: HighErrorRate
        expr: |
          sum(rate(voiceai_errors_total[5m]))
            / sum(rate(voiceai_calls_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "Error rate exceeds 5%"
          description: "Error rate is {{ $value | humanizePercentage }}"
          runbook: "https://wiki.internal/runbooks/high-errors"

      - alert: STTErrorsHigh
        expr: sum(rate(voiceai_stt_errors_total[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "STT errors elevated"
          description: "STT error rate: {{ $value }} errors/sec"

      - alert: LLMErrorsHigh
        expr: sum(rate(voiceai_llm_errors_total[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "LLM errors elevated"
          description: "LLM error rate: {{ $value }} errors/sec"

      # ============================================================
      # AVAILABILITY ALERTS
      # ============================================================
      - alert: ServiceDown
        expr: up{job=~"voiceai-.*"} == 0
        for: 1m
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "Service {{ $labels.job }} is down"
          description: "Service has been unreachable for 1 minute"
          runbook: "https://wiki.internal/runbooks/service-down"

      - alert: NoActiveCalls
        expr: |
          sum(voiceai_calls_active) == 0
            and hour() >= 9 and hour() <= 17
            and day_of_week() >= 1 and day_of_week() <= 5
        for: 30m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "No active calls during business hours"
          description: "No calls for 30 minutes during peak hours"

      # ============================================================
      # CAPACITY ALERTS
      # ============================================================
      - alert: HighCallVolume
        expr: sum(voiceai_calls_active) > 80
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "High call volume"
          description: "{{ $value }} active calls (threshold: 80)"

      - alert: CriticalCallVolume
        expr: sum(voiceai_calls_active) > 95
        for: 2m
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "Critical call volume - near capacity"
          description: "{{ $value }} active calls, approaching limit"
          runbook: "https://wiki.internal/runbooks/scale-up"

      - alert: DatabaseConnectionsHigh
        expr: voiceai_db_connections_active > 80
        for: 5m
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "Database connections high"
          description: "{{ $value }} active connections"

      # ============================================================
      # COST ALERTS
      # ============================================================
      - alert: HighTokenUsage
        expr: sum(increase(voiceai_llm_tokens_total[1h])) > 1000000
        for: 1h
        labels:
          severity: warning
          team: voice-platform
        annotations:
          summary: "High LLM token usage"
          description: "{{ $value | humanize }} tokens used in the last hour"

  - name: voice-ai-slo
    rules:
      # ============================================================
      # SLO ALERTS
      # ============================================================
      - alert: SLOLatencyBreach
        expr: |
          (
            sum(rate(voiceai_e2e_latency_seconds_bucket{le="1.0"}[1h]))
              / sum(rate(voiceai_e2e_latency_seconds_count[1h]))
          ) < 0.95
        for: 15m
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "SLO breach: <95% of calls under 1s latency"
          description: "Only {{ $value | humanizePercentage }} of calls are under 1s latency"

      - alert: SLOAvailabilityBreach
        expr: |
          (
            sum(rate(voiceai_calls_total{status="completed"}[1d]))
              / sum(rate(voiceai_calls_total[1d]))
          ) < 0.999
        for: 1h
        labels:
          severity: critical
          team: voice-platform
        annotations:
          summary: "SLO breach: <99.9% call success rate"
          description: "Call success rate is {{ $value | humanizePercentage }}"
```

### 63.3 Alertmanager Configuration
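The `SLOLatencyBreach` rule divides the `le="1.0"` bucket rate by the total observation count — i.e., "what fraction of turns finished under one second." Restated in plain Python (the numbers below are illustrative):

```python
def latency_slo_ok(under_threshold: int, total: int, objective: float = 0.95) -> bool:
    """Mirror of the SLOLatencyBreach expression: the fraction of turns
    under the latency threshold must stay at or above the objective."""
    if total == 0:
        return True  # no traffic means no breach to report
    return under_threshold / total >= objective


print(latency_slo_ok(970, 1000))  # True: 97% of turns under 1s
print(latency_slo_ok(940, 1000))  # False: only 94% under 1s
```

The `for: 15m` clause in the rule adds a second condition this sketch omits: the ratio must stay below the objective continuously for 15 minutes before the alert fires.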

```yaml
# alertmanager/config.yaml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'

route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Critical alerts → PagerDuty + Slack
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      continue: true
    - match:
        severity: critical
      receiver: 'slack-critical'

    # Warning alerts → Slack only
    - match:
        severity: warning
      receiver: 'slack-warning'
      group_wait: 5m
      repeat_interval: 12h

receivers:
  - name: 'default'
    slack_configs:
      - channel: '#voice-ai-alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: ''
        severity: critical
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'

  - name: 'slack-critical'
    slack_configs:
      - channel: '#voice-ai-critical'
        color: 'danger'
        title: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
        text: |
          *Summary:* {{ .CommonAnnotations.summary }}
          *Description:* {{ .CommonAnnotations.description }}
          {{ if .CommonAnnotations.runbook }}*Runbook:* {{ .CommonAnnotations.runbook }}{{ end }}
        actions:
          - type: button
            text: 'Acknowledge'
            url: '{{ .ExternalURL }}/#/alerts'

  - name: 'slack-warning'
    slack_configs:
      - channel: '#voice-ai-alerts'
        color: 'warning'
        title: '⚠️ Warning: {{ .GroupLabels.alertname }}'
        text: |
          {{ .CommonAnnotations.summary }}
          {{ .CommonAnnotations.description }}

# Don't alert on warnings if critical is firing
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
```

### 63.4 Alert Response Procedures

```python
"""
Alert response automation.

File: ops/alert_handlers.py
"""
import asyncio
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

import httpx


@dataclass
class AlertAction:
    """An automated action to take for an alert."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]
    cooldown_minutes: int = 10


class AlertHandler:
    """
    Handles automated responses to alerts.

    Example:
        handler = AlertHandler()
        handler.register(AlertAction(
            name="scale_up_on_high_volume",
            condition=lambda a: a["alertname"] == "CriticalCallVolume",
            action=scale_up_agent_service,
            cooldown_minutes=15,
        ))
    """

    def __init__(self):
        self.actions: List[AlertAction] = []
        self._last_executed: Dict[str, float] = {}

    def register(self, action: AlertAction):
        """Register an alert action."""
        self.actions.append(action)

    async def handle_webhook(self, payload: dict):
        """Handle incoming alert webhook from Alertmanager."""
        for alert in payload.get("alerts", []):
            await self._process_alert(alert)

    async def _process_alert(self, alert: dict):
        """Process a single alert."""
        for action in self.actions:
            if action.condition(alert):
                # Check cooldown
                last = self._last_executed.get(action.name, 0)
                if time.time() - last < action.cooldown_minutes * 60:
                    continue

                # Execute action
                try:
                    await action.action(alert)
                    self._last_executed[action.name] = time.time()
                except Exception as e:
                    logger.error(f"Alert action failed: {action.name}", exc_info=e)


# Example actions

async def scale_up_agent_service(alert: dict):
    """Scale up agent service replicas."""
    async with httpx.AsyncClient() as client:
        await client.patch(
            "https://k8s-api.internal/apis/apps/v1/namespaces/voiceai/deployments/agent-service",
            json={"spec": {"replicas": 10}},
            headers={"Authorization": f"Bearer {K8S_TOKEN}"},
        )

    await notify_slack("#voice-ai-alerts", "Scaled up agent-service to 10 replicas")


async def restart_unhealthy_pods(alert: dict):
    """Restart pods that are unhealthy."""
    pod_name = alert.get("labels", {}).get("pod")
    if pod_name:
        async with httpx.AsyncClient() as client:
            await client.delete(
                f"https://k8s-api.internal/api/v1/namespaces/voiceai/pods/{pod_name}",
                headers={"Authorization": f"Bearer {K8S_TOKEN}"},
            )


async def notify_on_call(alert: dict):
    """Send urgent notification to on-call engineer."""
    await send_pagerduty_event(
        summary=alert["annotations"]["summary"],
        severity="critical",
        source=alert["labels"].get("instance", "unknown"),
    )
```

### 63.5 On-Call Runbooks
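Alertmanager's webhook payload is JSON with an `alerts` list, each entry carrying `labels` and `annotations` maps — the shape `handle_webhook` iterates over. A minimal, synchronous sketch of routing that payload by severity (the sample alerts are illustrative):

```python
def triage(payload: dict):
    """Split an Alertmanager webhook payload into critical and warning
    alert names, mirroring how a handler would dispatch them."""
    critical, warning = [], []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "unknown")
        # Anything not explicitly critical falls through to the warning lane
        (critical if labels.get("severity") == "critical" else warning).append(name)
    return critical, warning


sample = {
    "alerts": [
        {"labels": {"alertname": "HighE2ELatency", "severity": "critical"},
         "annotations": {"summary": "E2E latency exceeds 1.5s"}},
        {"labels": {"alertname": "STTErrorsHigh", "severity": "warning"},
         "annotations": {"summary": "STT errors elevated"}},
    ]
}
print(triage(sample))  # → (['HighE2ELatency'], ['STTErrorsHigh'])
```

In production the routing decision lives in Alertmanager's `route` tree; a webhook handler like `AlertHandler` above only needs to match on the labels it cares about.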

#### Runbook: High E2E Latency

**Alert:** `HighE2ELatency` — P95 end-to-end latency exceeds 1.5 seconds

**Impact**

- Users experience delayed AI responses
- Conversation feels unnatural
- Potential call abandonment

**Investigation Steps**

1. **Check component latencies:**

   ```promql
   # STT latency
   histogram_quantile(0.95, sum(rate(voiceai_stt_latency_seconds_bucket[5m])) by (le))

   # LLM TTFB
   histogram_quantile(0.95, sum(rate(voiceai_llm_ttfb_seconds_bucket[5m])) by (le))

   # TTS latency
   histogram_quantile(0.95, sum(rate(voiceai_tts_latency_seconds_bucket[5m])) by (le))
   ```

2. **Identify the bottleneck:**
   - If STT > 300ms → check Deepgram status
   - If LLM TTFB > 500ms → check Anthropic status and prompt length
   - If TTS > 200ms → check Chatterbox/RunPod

3. **Check external services:**
   - Deepgram: https://status.deepgram.com
   - Anthropic: https://status.anthropic.com
   - RunPod: https://status.runpod.io

4. **Check system resources:**

   ```bash
   kubectl top pods -n voiceai
   kubectl describe pod <pod-name> -n voiceai
   ```

**Mitigation**

Immediate actions:

- If an external service is degraded → enable the fallback provider
- If load is high → scale up replicas
- If a specific tenant is affected → contact the tenant

Commands:

```bash
# Enable STT fallback
kubectl set env deployment/agent-service STT_FALLBACK_ENABLED=true

# Scale up
kubectl scale deployment/agent-service --replicas=10

# Check logs
kubectl logs -l app=agent-service --tail=100 | grep -i error
```

**Escalation**

- After 15 minutes: Page senior engineer
- After 30 minutes: Page engineering manager
- After 1 hour: Incident commander
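Encoding the escalation ladder makes paging automation and humans agree on who is next. A tiny sketch (the role names are illustrative, not from a real paging config):

```python
def escalation_target(minutes_elapsed: int) -> str:
    """Who to page for an unresolved incident, per the ladder above.
    Thresholds are checked from most to least senior."""
    if minutes_elapsed >= 60:
        return "incident-commander"
    if minutes_elapsed >= 30:
        return "engineering-manager"
    if minutes_elapsed >= 15:
        return "senior-engineer"
    return "on-call"


print(escalation_target(20))  # → senior-engineer
```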


---

## Summary: What You've Learned in Part 10B

### Section 62: Prometheus Metrics
- Counter, Gauge, Histogram metric types
- Voice-specific metrics (latency, calls, errors)
- Recording metrics in code
- Grafana dashboard concepts

### Section 63: Alerting
- Alert rule design principles
- Prometheus alerting rules
- Alertmanager configuration
- Automated alert responses
- On-call runbooks

---

## What's Next

In **Part 10C**, you'll learn:
- Incident response procedures
- Scaling strategies
- Operational runbooks
- Post-incident reviews

---

*End of Part 10B — Continue to Part 10C*

# Junior Developer PRD — Part 10C: Operations & Scaling

**Document Version:** 1.0
**Last Updated:** January 25, 2026
**Part:** 10C of 10 (FINAL)
**Sections:** 64-65

## Table of Contents

- Section 64: Incident Response
- Section 65: Scaling Strategies
- PRD Conclusion


## Section 64: Incident Response

### 64.1 Severity Levels

| Severity | Description | Response Time | Example |
|----------|-------------|---------------|---------|
| SEV1 | Complete outage | 5 min | All calls failing |
| SEV2 | Major degradation | 15 min | >10% error rate |
| SEV3 | Minor issue | 1 hour | Single tenant affected |
| SEV4 | Low impact | 4 hours | Dashboard bug |
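The response times in the table are acknowledgement SLAs, so tooling can compute a deadline the moment an incident is opened. A sketch (the `RESPONSE_SLA` mapping simply transcribes the table):

```python
from datetime import datetime, timedelta

# Acknowledgement SLA per severity, from the table above
RESPONSE_SLA = {
    "SEV1": timedelta(minutes=5),
    "SEV2": timedelta(minutes=15),
    "SEV3": timedelta(hours=1),
    "SEV4": timedelta(hours=4),
}


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Latest acceptable acknowledgement time for an incident."""
    return detected_at + RESPONSE_SLA[severity]


t0 = datetime(2026, 1, 25, 14, 30)
print(response_deadline("SEV2", t0))  # → 2026-01-25 14:45:00
```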

### 64.2 Incident Response Flow

1. **Detection** → Alert fires or customer reports
2. **Triage** → Identify scope, check recent deploys
3. **Mitigation** → Rollback, enable fallbacks, scale
4. **Resolution** → Fix root cause, verify fix
5. **Post-mortem** → Document, create action items

### 64.3 Common Runbooks

**Service Down**

```bash
# Check pods
kubectl get pods -n voiceai -l app=<service>
kubectl logs <pod> -n voiceai --previous

# Rollback if recent deploy
kubectl rollout undo deployment/<service> -n voiceai

# Force restart
kubectl rollout restart deployment/<service> -n voiceai
```

**High Error Rate**

```bash
# Check logs for errors
kubectl logs -l app=agent-service --tail=500 | grep -i error

# Enable fallbacks
kubectl set env deployment/agent-service \
  STT_FALLBACK_ENABLED=true \
  TTS_FALLBACK_ENABLED=true
```

**Database Issues**

```sql
-- Check connections
SELECT count(*) FROM pg_stat_activity;

-- Find slow queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity WHERE state != 'idle'
ORDER BY duration DESC LIMIT 10;

-- Kill queries running longer than 5 minutes
-- (recompute the duration here: pg_stat_activity has no "duration" column)
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE state != 'idle' AND now() - query_start > interval '5 minutes';
```

### 64.4 Post-Incident Review Template

```markdown
## Incident: [Title]
- **ID:** INC-YYYYMMDD
- **Severity:** SEV2
- **Duration:** 45 minutes

## Timeline
| Time | Event |
|------|-------|
| 14:30 | Alert fired |
| 14:32 | On-call acknowledged |
| 14:45 | Mitigation applied |
| 15:15 | Resolved |

## Root Cause
[Description of what caused the incident]

## Action Items
| Task | Owner | Due |
|------|-------|-----|
| Add monitoring | Alice | 2026-01-27 |
```


## Section 65: Scaling Strategies

### 65.1 Horizontal Pod Autoscaler

```yaml
# k8s/hpa/agent-service.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: voiceai_calls_active
        target:
          type: AverageValue
          averageValue: "10"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
```
### 65.2 Database Scaling

Route reads to replicas and writes to the primary:

```python
# Read replica routing
class ReplicaAwarePool:
    def __init__(self, primary_pool, replica_pool):
        self._primary_pool = primary_pool
        self._replica_pool = replica_pool

    def acquire(self, read_only=True):
        # Reads can tolerate slight replication lag; writes must hit the primary
        if read_only:
            return self._replica_pool.acquire()
        return self._primary_pool.acquire()
```
### 65.3 Caching Strategy

```python
import json

# Multi-level cache: Memory → Redis → Database
class MultiLevelCache:
    async def get_or_fetch(self, key, fetch_fn, ttl):
        # L1: in-process memory (fastest, per-pod)
        if key in self.l1_cache:
            return self.l1_cache[key]

        # L2: Redis (shared across pods)
        value = await self.redis.get(key)
        if value:
            self.l1_cache[key] = json.loads(value)
            return self.l1_cache[key]

        # Miss: fetch from the source (e.g., database) and populate both levels
        value = await fetch_fn()
        await self.redis.setex(key, ttl, json.dumps(value))
        self.l1_cache[key] = value
        return value
```
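As written, the in-memory L1 dict grows without bound. A minimal bounded-LRU sketch using `collections.OrderedDict` that could back `l1_cache` (the capacity value is illustrative):

```python
from collections import OrderedDict


class LRUCache:
    """Bounded in-memory L1 cache: evicts the least-recently-used
    entry once capacity is exceeded."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes oldest
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # → None
```

Note that the L1 has no TTL — each pod's memory copy can go slightly stale relative to Redis, which is usually acceptable for read-heavy config and knowledge-base lookups.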
### 65.4 Capacity Planning

| Metric | Per Pod | 10 Pods | 20 Pods |
|--------|---------|---------|---------|
| Concurrent calls | 15 | 150 | 300 |
| Calls/minute | 30 | 300 | 600 |
| Target utilization | 70% | 70% | 70% |


## PRD Conclusion

### Complete System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    VOICE AI PLATFORM                        │
├─────────────────────────────────────────────────────────────┤
│  INFRASTRUCTURE: Kubernetes, PostgreSQL, Redis, MinIO       │
│  SERVICES: API Gateway, Agent Service, KB Service           │
│  AI PIPELINE: VAD → STT → RAG → LLM → TTS                  │
│  INTEGRATIONS: GoToConnect, Deepgram, Claude, Chatterbox   │
│  OPERATIONS: Prometheus, Grafana, Alertmanager, Tracing    │
└─────────────────────────────────────────────────────────────┘
```

### PRD Document Index

| Part | Content |
|------|---------|
| 1-6 | Foundation, Architecture, Database |
| 7A-C | Voice Pipeline (VAD, STT, LLM, TTS) |
| 8A-C | Knowledge Base & RAG |
| 9A-C | Testing & CI/CD |
| 10A-C | Operations & Monitoring |

### Key Metrics

| Metric | Target |
|--------|--------|
| E2E Latency P95 | < 1000ms |
| Success Rate | > 99.9% |
| STT Accuracy | > 95% |
| Error Rate | < 0.1% |
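These targets are easy to encode as a release-gate check — compare each measured value against its direction and threshold (the metric keys and sample measurements below are illustrative):

```python
# Target per metric: (comparison direction, threshold), from the table above
TARGETS = {
    "e2e_latency_p95_ms": ("<", 1000),
    "success_rate_pct": (">", 99.9),
    "stt_accuracy_pct": (">", 95.0),
    "error_rate_pct": ("<", 0.1),
}


def violated(measured: dict) -> list:
    """Return the names of metrics that miss their targets."""
    misses = []
    for name, (op, target) in TARGETS.items():
        value = measured[name]
        ok = value < target if op == "<" else value > target
        if not ok:
            misses.append(name)
    return misses


measured = {
    "e2e_latency_p95_ms": 850,
    "success_rate_pct": 99.95,
    "stt_accuracy_pct": 96.2,
    "error_rate_pct": 0.3,
}
print(violated(measured))  # → ['error_rate_pct']
```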

### Implementation Timeline

| Week | Focus |
|------|-------|
| 1-2 | Infrastructure setup |
| 3-4 | Core services |
| 5-6 | AI provider integration |
| 7-8 | Voice pipeline |
| 9-10 | RAG & knowledge base |
| 11-12 | Testing & hardening |
| 13+ | Production deployment |




**Congratulations!** You now have a complete understanding of building a production Voice AI system.

Remember:

- Start simple, iterate
- Measure everything
- Plan for failure
- Test thoroughly
- Monitor actively

Good luck building! 🚀



End of Junior Developer PRD Series — 15 documents, 65 sections

Last modified on April 18, 2026