Voice by aiConnected — System Architecture Overview \
| Field | Value |
|---|---|
| Document ID | ARCH-001 |
| Version | 1.0 |
| Last Updated | 2026-01-16 |
| Status | Draft |
| Owner | Engineering |
Table of Contents \
Voice by aiConnected — System Architecture Overview
Document Information
Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Architecture Principles
1.4 Terminology
2. System Overview
2.1 What the System Does
2.2 High-Level Architecture Diagram
2.3 Component Summary
3. Component Architecture
3.1 API Gateway
3.1.1 Overview
3.1.2 Responsibilities
3.1.3 Architecture
3.1.4 Key Endpoints
3.1.5 Configuration
3.2 WebRTC Bridge
3.2.1 Overview
3.2.2 Responsibilities
3.2.3 Architecture
3.2.4 Audio Flow
3.2.5 Call State Machine
3.2.6 Configuration
3.3 Agent Service
3.3.1 Overview
3.3.2 Responsibilities
3.3.3 Architecture
3.3.4 Voice Pipeline Detail
3.3.5 Configuration
3.4 Worker Service
3.4.1 Overview
3.4.2 Responsibilities
3.4.3 Architecture
3.4.4 Task Definitions
3.4.5 Configuration
3.5 Chatterbox TTS Service
3.5.1 Overview
3.5.2 Responsibilities
3.5.3 Architecture
3.5.4 API Endpoints
3.5.5 Configuration
4. Data Flow Architecture
4.1 Inbound Call Flow
4.2 Outbound Call Flow
4.3 Transfer Flow
4.4 Tool Calling Flow
5. Service Boundaries
5.1 Service Responsibility Matrix
5.2 Service Communication
5.3 Event Catalog
5.4 API Contracts Between Services
5.4.1 WebRTC Bridge → Agent Service
5.4.2 Agent Service → WebRTC Bridge
5.4.3 Agent Service → Chatterbox TTS
6. Network Topology
6.1 Network Diagram
6.2 Port Matrix
6.3 Firewall Rules
6.4 DNS Configuration
7. External Service Dependencies
7.1 Dependency Map
7.2 Service Level Objectives
7.3 Authentication and Credentials
7.4 Rate Limits
8. Internal Service Architecture
8.1 Service Template
8.2 Shared Libraries
8.3 Configuration Management
9. Data Architecture
9.1 Database Schema Overview
9.2 Core Tables
tenants
agents
calls
transcripts
9.3 Redis Data Structures
9.4 Data Retention Policy
10. Security Architecture
10.1 Security Layers
10.2 Authentication Flow
10.3 Data Encryption
11. Scalability Architecture
11.1 Horizontal Scaling Strategy
11.2 Capacity Planning
11.3 Auto-Scaling Configuration
12. Failure Modes and Recovery
12.1 Failure Scenarios
12.2 Circuit Breaker Configuration
12.3 Graceful Degradation Hierarchy
13. Monitoring and Observability
13.1 Metrics Architecture
13.2 Key Metrics
13.3 Logging Strategy
13.4 Alerting Rules
14. Deployment Architecture
14.1 Container Architecture
14.2 Dokploy Configuration
14.3 Environment Promotion
15. Architecture Decision Records
ADR-001: Use GoToConnect for Telephony
ADR-002: Use LiveKit for Real-Time Audio
ADR-003: Self-Host TTS on RunPod
ADR-004: Use Redis for Call State
ADR-005: PostgreSQL for Persistent Data
Appendix A: Glossary
Appendix B: Document History
1. Introduction \
1.1 Purpose \
This document provides a comprehensive technical overview of the Voice by aiConnected platform architecture. It serves as the authoritative reference for understanding how the system is structured, how components interact, and the rationale behind key architectural decisions.
This document is intended for:
- Engineers implementing the system
- Technical reviewers evaluating the architecture
- Operations teams deploying and maintaining the platform
- Future maintainers who need to understand the system design
1.2 Scope \
This document covers:
- High-level system architecture and component relationships
- Detailed data flows for all major operations
- Service boundaries and responsibilities
- Network topology and communication patterns
- Integration points with external services
- Scalability and reliability considerations
This document does not cover:
- Detailed API specifications (see Document ARCH-023: API Specification)
- Implementation-level code design (see individual service documents)
- Operational procedures (see Document OPS-025: Deployment Runbook)
1.3 Architecture Principles \
The Voice by aiConnected architecture is guided by the following principles:
1. Latency is King: Every architectural decision prioritizes minimizing end-to-end latency. Voice conversations require sub-second response times to feel natural. We stream everything, avoid batching, and minimize network hops.
2. Graceful Degradation: The system must continue operating when components fail. Each service has fallback behaviors, and partial functionality is preferred over complete failure.
3. Horizontal Scalability: The system scales by adding instances, not by making instances larger. State is externalized to shared stores (PostgreSQL, Redis) so any instance can handle any request.
4. Tenant Isolation: Multiple businesses share the same infrastructure, but their data and configurations are strictly isolated. A failure or misconfiguration for one tenant must not affect others.
5. Observable by Default: Every component emits metrics, logs, and traces. We can understand system behavior in production without deploying debugging code.
6. Infrastructure Ownership Where It Matters: We own infrastructure for components where it provides cost or capability advantages (TTS), but use managed services where operational burden outweighs benefits (telephony routing, real-time audio).
1.4 Terminology \
| Term | Definition |
|---|---|
| Tenant | A business customer using the platform |
| Agent | An AI configuration for a specific use case (e.g., “Appointment Scheduler”) |
| Call | A single phone conversation, inbound or outbound |
| Session | The runtime state of an active call |
| Pipeline | The STT → LLM → TTS processing chain |
| Bridge | The component connecting GoToConnect to LiveKit |
| Room | A LiveKit virtual space where call participants connect |
| Turn | One speaker’s contribution to a conversation |
| Barge-in | When a caller interrupts the AI mid-speech |
2. System Overview \
2.1 What the System Does \
Voice by aiConnected is a multi-tenant Voice AI platform that enables businesses to deploy AI agents capable of handling phone conversations. The system:
- Receives phone calls via integration with GoToConnect PBX
- Transcribes speech using Deepgram’s streaming STT
- Generates responses using Anthropic’s Claude LLM
- Synthesizes speech using self-hosted Chatterbox TTS
- Executes actions via webhook-based tool calling
- Transfers calls to human agents when appropriate
- Tracks usage for billing and analytics
2.2 High-Level Architecture Diagram \
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ │
│ VOICE BY AICONNECTED │
│ SYSTEM ARCHITECTURE │
│ │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ EXTERNAL LAYER │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ PSTN │ │ GoTo │ │ LiveKit │ │Deepgram │ │Anthropic│ │ │
│ │ │Callers │───▶│Connect │ │ Cloud │ │ API │ │ API │ │ │
│ │ └─────────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │
│ └───────────────────────┼──────────────┼──────────────┼──────────────┼──────────┘ │
│ │ │ │ │ │
│ ┌───────────────────────┼──────────────┼──────────────┼──────────────┼──────────┐ │
│ │ │ PLATFORM LAYER │ │ │ │
│ │ │ │ │ │ │ │
│ │ ┌───────────────────▼──────────────▼──────────────┴──────────────┴────┐ │ │
│ │ │ │ │ │
│ │ │ DIGITALOCEAN / DOKPLOY │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ API │ │ WebRTC │ │ Agent │ │ │ │
│ │ │ │ Gateway │ │ Bridge │ │ Service │ │ │ │
│ │ │ │ (FastAPI) │ │ (aiortc) │ │ (LiveKit) │ │ │ │
│ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ ┌──────▼─────────────────▼─────────────────▼───────┐ │ │ │
│ │ │ │ EVENT BUS │ │ │ │
│ │ │ │ (Redis) │ │ │ │
│ │ │ └──────┬─────────────────┬─────────────────┬───────┘ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ ┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐ │ │ │
│ │ │ │ Worker │ │ PostgreSQL │ │ Redis │ │ │ │
│ │ │ │ Service │ │ (Database) │ │ (Cache) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ └──────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────────────┐ │
│ │ GPU LAYER (RUNPOD) │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ CHATTERBOX TTS │ │ │
│ │ │ (RTX A5000) │ │ │
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────────────┐ │
│ │ INTEGRATION LAYER │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ n8n │ │Knowledge│ │ CRM │ │Calendar │ │ │
│ │ │Webhooks │ │ Base │ │ APIs │ │ APIs │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
2.3 Component Summary \
| Component | Technology | Location | Purpose |
|---|---|---|---|
| API Gateway | FastAPI | DigitalOcean | Public API, authentication, routing |
| WebRTC Bridge | Python/aiortc | DigitalOcean | GoToConnect ↔ LiveKit audio bridging |
| Agent Service | Python/LiveKit SDK | DigitalOcean | AI conversation management |
| Worker Service | Python/Celery | DigitalOcean | Background job processing |
| PostgreSQL | PostgreSQL 15 | DigitalOcean | Relational data storage |
| Redis | Redis 7 | DigitalOcean | Cache, state, pub/sub |
| Chatterbox TTS | Python/PyTorch | RunPod (A5000) | Speech synthesis |
| GoToConnect | SaaS | External | Telephony/PBX |
| LiveKit Cloud | SaaS | External | Real-time audio infrastructure |
| Deepgram | SaaS | External | Speech-to-text |
| Anthropic | SaaS | External | LLM (Claude) |
3. Component Architecture \
3.1 API Gateway \
3.1.1 Overview \
The API Gateway is the public-facing entry point for all HTTP traffic. It handles authentication, request routing, and rate limiting, and serves as the control plane for tenant and agent management.
3.1.2 Responsibilities \
- Authentication: Validate API keys, issue and verify JWT tokens
- Authorization: Enforce tenant-scoped access control
- Request Routing: Direct requests to appropriate internal services
- Rate Limiting: Protect against abuse and ensure fair resource allocation
- Request Validation: Validate payloads against OpenAPI schemas
- Response Formatting: Ensure consistent API response structure
- Audit Logging: Record all API operations for compliance
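The rate-limiting responsibility can be illustrated with a minimal sketch. It assumes a fixed one-minute window held in process memory and keyed by tenant ID; the class name `TenantRateLimiter` and its `allow` method are hypothetical, and a multi-instance deployment would more plausibly keep these counters in a shared store such as Redis.

```python
import time
from collections import defaultdict


class TenantRateLimiter:
    """Fixed-window request counter keyed by tenant ID (illustrative only)."""

    def __init__(self, limit_per_minute=100, clock=time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock  # injectable so the window can be tested without sleeping
        # tenant_id -> (window_index, requests_seen_in_that_window)
        self._windows = defaultdict(lambda: (0, 0))

    def allow(self, tenant_id):
        """Return True if the tenant is still under its per-minute limit."""
        window = int(self.clock() // 60)
        last_window, count = self._windows[tenant_id]
        if last_window != window:
            count = 0  # a new minute started, so the counter resets
        if count >= self.limit:
            self._windows[tenant_id] = (window, count)
            return False
        self._windows[tenant_id] = (window, count + 1)
        return True
```

Because each tenant has its own counter, one tenant exhausting its allowance does not affect requests from another, which is the behavior the tenant-isolation principle asks for.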
3.1.3 Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ API GATEWAY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ MIDDLEWARE STACK │ │
│ │ │ │
│ │ Request ──▶ [CORS] ──▶ [Auth] ──▶ [RateLimit] ──▶ [Tenant] ──▶ ... │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ROUTERS │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ /tenants │ │ /agents │ │ /calls │ │ /webhooks│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ /voices │ │ /numbers │ │ /usage │ │ /health │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEPENDENCIES │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ DB │ │ Redis │ │ Event │ │ Config │ │ │
│ │ │ Session │ │ Client │ │ Bus │ │ Store │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.1.4 Key Endpoints \
| Endpoint | Method | Purpose |
|---|---|---|
| `/v1/agents` | GET, POST | List and create AI agents |
| `/v1/agents/{id}` | GET, PUT, DELETE | Manage specific agent |
| `/v1/calls` | GET, POST | List calls and initiate outbound calls |
| `/v1/calls/{id}` | GET | Get call details and transcript |
| `/v1/calls/{id}/transfer` | POST | Initiate call transfer |
| `/v1/numbers` | GET, POST | Manage phone number assignments |
| `/v1/voices` | GET, POST | Manage voice configurations |
| `/v1/usage` | GET | Retrieve usage metrics for billing |
| `/v1/webhooks` | GET, POST | Configure webhook endpoints |
| `/health` | GET | Health check endpoint |
3.1.5 Configuration \
api_gateway:
  host: 0.0.0.0
  port: 8000
  workers: 4
  cors:
    allowed_origins:
      - "https://app.aiconnected.io"
      - "https://admin.aiconnected.io"
    allowed_methods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
    allowed_headers: ["Authorization", "Content-Type", "X-Request-ID"]
  rate_limiting:
    default_limit: 100  # requests per minute
    burst_limit: 20     # concurrent requests
    by_tenant: true     # limits applied per tenant
  authentication:
    api_key_header: "X-API-Key"
    jwt_algorithm: "HS256"
    jwt_expiry_minutes: 60
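To make the `jwt_algorithm` and `jwt_expiry_minutes` settings concrete, the sketch below issues and verifies an HS256 token using only the Python standard library. The function names, the `sub` claim carrying a tenant ID, and the secret handling are assumptions for illustration; a production gateway would more likely rely on a maintained JWT library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(tenant_id, secret, expiry_minutes=60, now=None):
    """Build a signed HS256 token that expires after expiry_minutes."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": tenant_id, "iat": now, "exp": now + expiry_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token, secret, now=None):
    """Return the payload if the signature and expiry check out, else None."""
    now = int(time.time()) if now is None else int(now)
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes via timing
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None
    if header.get("alg") != "HS256" or payload.get("exp", 0) <= now:
        return None
    return payload
```

The signature is checked before the payload is parsed, so claims are only read from tokens that were signed with the gateway's secret.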
3.2 WebRTC Bridge \
3.2.1 Overview \
The WebRTC Bridge is the critical component that connects the traditional telephone network (via GoToConnect) to the real-time AI processing infrastructure (via LiveKit). It handles bidirectional audio streaming, protocol translation, and call lifecycle management.
3.2.2 Responsibilities \
- WebRTC Signaling: Handle SDP offer/answer exchange with GoToConnect
- Audio Reception: Receive audio frames from GoToConnect WebRTC connection
- Audio Transmission: Send synthesized audio back to GoToConnect
- LiveKit Integration: Publish and subscribe to audio tracks in LiveKit rooms
- Call Control: Execute transfers, holds, and other call control operations
- State Management: Maintain call state and handle state transitions
3.2.3 Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEBRTC BRIDGE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GOTOCONNECT INTERFACE │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ WebRTC │ │ Call Event │ │ Call Control │ │ │
│ │ │ Signaling │ │ Subscriber │ │ Client │ │ │
│ │ │ Handler │ │ (WebSocket) │ │ (REST) │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ └─────────┼─────────────────┼─────────────────┼────────────────────────┘ │
│ │ │ │ │
│ ┌─────────▼─────────────────▼─────────────────▼────────────────────────┐ │
│ │ BRIDGE CORE │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Connection │ │ Audio │ │ State │ │ │
│ │ │ Manager │ │ Pipeline │ │ Machine │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Codec │ │ Resampler │ │ Buffer │ │ │
│ │ │ Handler │ │ │ │ Manager │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────┬─────────────────┬─────────────────┬────────────────────────┘ │
│ │ │ │ │
│ ┌─────────▼─────────────────▼─────────────────▼────────────────────────┐ │
│ │ LIVEKIT INTERFACE │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Room │ │ Track │ │ Participant│ │ │
│ │ │ Manager │ │ Publisher │ │ Manager │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.2.4 Audio Flow \
┌─────────────────────────────────────────────────────────────────────────────┐
│ AUDIO FLOW DETAIL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ INBOUND (Caller → AI): │
│ │
│ GoToConnect Bridge LiveKit Agent │
│ │ │ │ │ │
│ │ Opus/48kHz │ │ │ │
│ │───────────────▶│ │ │ │
│ │ │ Decode Opus │ │ │
│ │ │ ────────────▶ │ │ │
│ │ │ Resample if │ │ │
│ │ │ needed │ │ │
│ │ │ ────────────▶ │ │ │
│ │ │ Encode Opus │ │ │
│ │ │ ────────────▶ │ │ │
│ │ │ │ │ │
│ │ │ Publish Track │ │ │
│ │ │──────────────────▶│ │ │
│ │ │ │ Subscribe │ │
│ │ │ │────────────────▶│ │
│ │ │ │ │ │
│ │
│ OUTBOUND (AI → Caller): │
│ │
│ Agent LiveKit Bridge GoToConnect │
│ │ │ │ │ │
│ │ Publish Track │ │ │ │
│ │───────────────▶│ │ │ │
│ │ │ Subscribe │ │ │
│ │ │──────────────────▶│ │ │
│ │ │ │ Decode Opus │ │
│ │ │ │ ────────────▶ │ │
│ │ │ │ Resample if │ │
│ │ │ │ needed │ │
│ │ │ │ ────────────▶ │ │
│ │ │ │ Encode Opus │ │
│ │ │ │ ────────────▶ │ │
│ │ │ │ │ │
│ │ │ │ Send via │ │
│ │ │ │ WebRTC │ │
│ │ │ │────────────────▶│ │
│ │ │ │ │ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
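The "Resample if needed" step can be sketched as follows. This is a naive linear-interpolation resampler over decoded mono PCM samples, chosen for readability; it is an assumption, not the bridge's actual resampler, and it applies no anti-aliasing filter, so a real pipeline would likely use a dedicated audio library for downsampling.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono PCM samples by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * ratio
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring source samples
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, audio synthesized at 24000 Hz would be passed through `resample_linear(pcm, 24000, 48000)` before being encoded for a 48 kHz track.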
3.2.5 Call State Machine \
┌─────────────────────────────────────────────────────────────────────────────┐
│ CALL STATE MACHINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────┐ │
│ │ INITIAL │ │
│ └─────┬─────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ │
│ │ RINGING │ │ DIALING │ │
│ │ (inbound) │ │ (outbound)│ │
│ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │
│ │ answer │ connect │
│ │ │ │
│ └───────────────┬───────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────┐ │
│ │ CONNECTED │ │
│ └─────┬─────┘ │
│ │ │
│ │ agent_joined │
│ │ │
│ ▼ │
│ ┌────────────┐ │
│ │ CONVERSING │◀──────────────────┐ │
│ └──────┬─────┘ │ │
│ │ │ │
│ ┌───────────────────┼───────────────────┐ │ │
│ │ │ │ │ │
│ ▼ ▼ ▼ │ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐│ │
│ │ ON_HOLD │ │TRANSFERRING │ ERROR ││ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘│ │
│ │ │ │ │ │
│ │ resume │ │ │ │
│ │ │ │ │ │
│ └───────────────────┴───────────────────┘ │ │
│ │ │ │
│ │ transfer_complete │ │
│ │ (to different agent) │ │
│ │ │ │
│ └──────────────────────────┘ │
│ │
│ │
│ All states can transition to ENDED: │
│ │
│ ┌───────────┐ │
│ │ ENDED │ │
│ └───────────┘ │
│ │
│ Triggers: hangup, timeout, error, transfer_to_human │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
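The state machine above can be expressed as a transition table. The sketch below is one possible encoding: the events `answer`, `connect`, `agent_joined`, `resume`, and `transfer_complete` and the four terminal triggers come from the diagram, while `incoming`, `dial`, `hold`, `transfer`, `fault`, and `recovered` are hypothetical names for edges the diagram leaves unlabeled.

```python
from enum import Enum


class CallState(Enum):
    INITIAL = "initial"
    RINGING = "ringing"
    DIALING = "dialing"
    CONNECTED = "connected"
    CONVERSING = "conversing"
    ON_HOLD = "on_hold"
    TRANSFERRING = "transferring"
    ERROR = "error"
    ENDED = "ended"


# Triggers that end the call from any non-terminal state
TERMINAL_TRIGGERS = {"hangup", "timeout", "error", "transfer_to_human"}

TRANSITIONS = {
    (CallState.INITIAL, "incoming"): CallState.RINGING,           # hypothetical event name
    (CallState.INITIAL, "dial"): CallState.DIALING,               # hypothetical event name
    (CallState.RINGING, "answer"): CallState.CONNECTED,
    (CallState.DIALING, "connect"): CallState.CONNECTED,
    (CallState.CONNECTED, "agent_joined"): CallState.CONVERSING,
    (CallState.CONVERSING, "hold"): CallState.ON_HOLD,            # hypothetical event name
    (CallState.CONVERSING, "transfer"): CallState.TRANSFERRING,   # hypothetical event name
    (CallState.CONVERSING, "fault"): CallState.ERROR,             # hypothetical event name
    (CallState.ON_HOLD, "resume"): CallState.CONVERSING,
    (CallState.TRANSFERRING, "transfer_complete"): CallState.CONVERSING,
    (CallState.ERROR, "recovered"): CallState.CONVERSING,         # hypothetical event name
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if state is CallState.ENDED:
        raise InvalidTransition("call has already ended")
    if event in TERMINAL_TRIGGERS:
        return CallState.ENDED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} is not valid in state {state.name}") from None
```

Keeping the transitions in a single table makes illegal moves fail loudly instead of silently corrupting call state.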
3.2.6 Configuration \
webrtc_bridge:
  host: 0.0.0.0
  port: 8001
  gotoconnect:
    api_base_url: "https://api.goto.com"
    websocket_url: "wss://realtime.goto.com"
    oauth:
      client_id: "${GOTO_CLIENT_ID}"
      client_secret: "${GOTO_CLIENT_SECRET}"
      scopes:
        - "webrtc.v1.write"
        - "call-events.v1.notifications.manage"
        - "call-control.v1.calls.write"
  livekit:
    url: "${LIVEKIT_URL}"
    api_key: "${LIVEKIT_API_KEY}"
    api_secret: "${LIVEKIT_API_SECRET}"
  audio:
    input_sample_rate: 48000
    output_sample_rate: 48000
    channels: 1
    frame_duration_ms: 20
    codec: "opus"
  timeouts:
    call_setup_timeout_seconds: 30
    idle_timeout_seconds: 300
    max_call_duration_seconds: 3600
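The audio settings imply fixed per-frame sizes that the buffer manager has to work with. The sketch below derives them; the 16-bit sample width for decoded PCM is an assumption, and `frame_layout` is a hypothetical helper name.

```python
def frame_layout(sample_rate_hz=48000, frame_duration_ms=20, channels=1, bytes_per_sample=2):
    """Derive per-frame sizes from the bridge's audio settings."""
    samples_per_frame = sample_rate_hz * frame_duration_ms // 1000
    return {
        "samples_per_frame": samples_per_frame,
        # Size of one decoded frame, assuming 16-bit PCM (2 bytes per sample)
        "pcm_bytes_per_frame": samples_per_frame * channels * bytes_per_sample,
        "frames_per_second": 1000 // frame_duration_ms,
    }
```

With the configured values (48000 Hz, 20 ms, mono), each frame carries 960 samples, or 1920 bytes of decoded audio, and the bridge handles 50 frames per second in each direction.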
3.3 Agent Service \
3.3.1 Overview \
The Agent Service hosts the AI agents that participate in phone conversations. It uses the LiveKit Agents SDK to manage the voice pipeline (STT → LLM → TTS) and handles conversation logic, tool calling, and transfer decisions.
3.3.2 Responsibilities \
- Agent Lifecycle: Spawn, manage, and terminate AI agent instances
- Voice Pipeline: Orchestrate STT, LLM, and TTS components
- Conversation Management: Maintain conversation context and history
- Tool Execution: Handle function calling and webhook dispatch
- Transfer Logic: Determine when and how to transfer to humans
- Interruption Handling: Manage barge-in and conversation flow
3.3.3 Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT SERVICE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT MANAGER │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Agent │ │ Agent │ │ Agent │ │ │
│ │ │ Factory │ │ Pool │ │ Registry │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT INSTANCE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ VOICE PIPELINE │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │
│ │ │ │ VAD │───▶│ STT │───▶│ LLM │───▶│ TTS │ │ │ │
│ │ │ │ │ │(Deepgram│ │(Claude) │ │(Chatter │ │ │ │
│ │ │ │ │ │ │ │ │ │ box) │ │ │ │
│ │ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ CONVERSATION ENGINE │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Context │ │ Tool │ │ Transfer │ │ │ │
│ │ │ │ Manager │ │ Handler │ │ Decision │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Knowledge │ │ Interrupt │ │ Greeting │ │ │ │
│ │ │ │ Base │ │ Handler │ │ Handler │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LIVEKIT INTEGRATION │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Room │ │ Track │ │ Event │ │ │
│ │ │ Handler │ │ Handler │ │ Handler │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.3.4 Voice Pipeline Detail \
┌─────────────────────────────────────────────────────────────────────────────┐
│ VOICE PIPELINE DETAIL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Audio Input (from LiveKit) │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VOICE ACTIVITY DETECTION │ │
│ │ │ │
│ │ - Silero VAD model │ │
│ │ - Detects speech start/end │ │
│ │ - Triggers pipeline stages │ │
│ │ - Handles barge-in detection │ │
│ │ │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SPEECH-TO-TEXT (Deepgram) │ │
│ │ │ │
│ │ - Streaming transcription │ │
│ │ - Interim results for early processing │ │
│ │ - Final results trigger LLM │ │
│ │ - Language: en-US │ │
│ │ - Model: nova-2 │ │
│ │ │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONTEXT ASSEMBLY │ │
│ │ │ │
│ │ - System prompt (agent configuration) │ │
│ │ - Knowledge base retrieval (RAG) │ │
│ │ - Conversation history │ │
│ │ - Tool definitions │ │
│ │ - Current user message │ │
│ │ │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LARGE LANGUAGE MODEL (Claude) │ │
│ │ │ │
│ │ - Streaming response generation │ │
│ │ - Function calling for tools │ │
│ │ - Model: claude-sonnet-4-20250514 │ │
│ │ - Temperature: 0.7 │ │
│ │ - Max tokens: 1024 │ │
│ │ │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ┌─────────────┴─────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────┐ ┌──────────────────────────┐ │
│ │ TEXT RESPONSE │ │ TOOL CALL │ │
│ │ │ │ │ │
│ │ - Token buffering │ │ - Extract function │ │
│ │ - Sentence detection │ │ - Execute via webhook │ │
│ │ - TTS dispatch │ │ - Inject result │ │
│ │ │ │ - Continue generation │ │
│ └────────────┬─────────────┘ └──────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TEXT-TO-SPEECH (Chatterbox) │ │
│ │ │ │
│ │ - Streaming synthesis │ │
│ │ - Voice cloning support │ │
│ │ - Paralinguistic tags ([laugh], [cough]) │ │
│ │ - Model: Chatterbox-Turbo │ │
│ │ │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Audio Output (to LiveKit) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
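The "Token buffering" and "Sentence detection" steps in the text-response branch can be sketched as a generator that accumulates streamed LLM tokens and emits each sentence as soon as it is complete. The punctuation-based boundary rule is an assumption for illustration; it would split incorrectly on abbreviations such as "Dr. Smith", so a real pipeline may use a more careful segmenter.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens):
    """Yield complete sentences as soon as the token stream contains them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be dispatched to TTS while the LLM is still generating the rest of the response, which is what keeps the time to first audio short.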
3.3.5 Configuration \
agent_service:
  host: 0.0.0.0
  port: 8002
  livekit:
    url: "${LIVEKIT_URL}"
    api_key: "${LIVEKIT_API_KEY}"
    api_secret: "${LIVEKIT_API_SECRET}"
  stt:
    provider: "deepgram"
    model: "nova-2"
    language: "en-US"
    interim_results: true
    punctuate: true
    smart_format: true
  llm:
    provider: "anthropic"
    model: "claude-sonnet-4-20250514"
    temperature: 0.7
    max_tokens: 1024
    streaming: true
  tts:
    provider: "chatterbox"
    endpoint: "${CHATTERBOX_URL}"
    model: "turbo"
    default_voice_id: "default_female_1"
  vad:
    model: "silero"
    threshold: 0.5
    min_speech_duration_ms: 250
    min_silence_duration_ms: 300
  conversation:
    max_history_tokens: 8000
    summarize_after_turns: 20
    greeting_enabled: true
    transfer_enabled: true
  tools:
    webhook_timeout_seconds: 10
    max_concurrent_tools: 3
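One way to honor `max_history_tokens` is to keep only the most recent turns that fit the budget. The sketch below assumes turns are dictionaries with a `content` field and uses a rough four-characters-per-token estimate instead of a real tokenizer; both are illustrative assumptions, and the summarization implied by `summarize_after_turns` is not shown.

```python
def estimate_tokens(text):
    """Rough token estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(turns, max_history_tokens=8000):
    """Keep the most recent turns whose estimated token total fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest turns are kept first
    for turn in reversed(turns):
        cost = estimate_tokens(turn["content"])
        if used + cost > max_history_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping from the oldest end preserves the immediate conversational context, which matters most for the next response.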
3.4 Worker Service \
3.4.1 Overview \
The Worker Service handles asynchronous background jobs that don’t need to happen in real time. These include usage aggregation, transcript processing, webhook retries, and scheduled tasks.
3.4.2 Responsibilities \
- Usage Aggregation: Compile per-tenant usage statistics for billing
- Transcript Processing: Post-process and store call transcripts
- Webhook Delivery: Retry failed webhook deliveries
- Scheduled Tasks: Execute periodic maintenance jobs
- Report Generation: Generate usage reports and analytics
3.4.3 Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKER SERVICE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TASK QUEUES │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ default │ │ webhooks │ │ reports │ │ │
│ │ │ queue │ │ queue │ │ queue │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ └─────────┼─────────────────┼─────────────────┼────────────────────────┘ │
│ │ │ │ │
│ ┌─────────▼─────────────────▼─────────────────▼────────────────────────┐ │
│ │ TASK HANDLERS │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Usage │ │ Webhook │ │ Transcript │ │ │
│ │ │ Aggregation │ │ Delivery │ │ Processing │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Report │ │ Cleanup │ │ Billing │ │ │
│ │ │ Generation │ │ Tasks │ │ Sync │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SCHEDULER │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ Cron Jobs: │ │ │
│ │ │ - Hourly usage aggregation │ │ │
│ │ │ - Daily report generation │ │ │
│ │ │ - Weekly cleanup of old sessions │ │ │
│ │ │ - Monthly billing sync │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.4.4 Task Definitions \
| Task | Queue | Schedule | Description |
|---|---|---|---|
| `aggregate_usage` | default | Hourly | Compile minute counts per tenant |
| `process_transcript` | default | On event | Store and index call transcript |
| `deliver_webhook` | webhooks | On event | Send webhook with retry |
| `generate_daily_report` | reports | Daily 00:00 UTC | Generate usage reports |
| `cleanup_sessions` | default | Weekly | Remove expired session data |
| `sync_billing` | default | Monthly | Sync usage to billing system |
3.4.5 Configuration \
worker_service:
  concurrency: 4
  queues:
    default:
      concurrency: 2
    webhooks:
      concurrency: 4
      rate_limit: 100/m
    reports:
      concurrency: 1
  retry:
    max_retries: 5
    backoff_base: 60  # seconds
    backoff_max: 3600
  scheduler:
    timezone: "UTC"
    jobs:
      - name: "aggregate_usage"
        cron: "0 * * * *"  # Every hour
      - name: "generate_daily_report"
        cron: "0 0 * * *"  # Midnight UTC
      - name: "cleanup_sessions"
        cron: "0 2 * * 0"  # Sunday 2am UTC
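The `retry` settings can be read as an exponential backoff schedule. The sketch below assumes the delay doubles on each attempt, starting at `backoff_base` and capped at `backoff_max`; the doubling factor is an assumption, since the configuration names only the base and the cap, and no jitter is applied.

```python
def retry_delays(max_retries=5, backoff_base=60, backoff_max=3600):
    """Return the wait in seconds before each retry attempt."""
    return [min(backoff_base * 2 ** attempt, backoff_max) for attempt in range(max_retries)]
```

Under these assumptions the five configured retries wait 60, 120, 240, 480, and 960 seconds, so a failing webhook is abandoned roughly 31 minutes after the first failure.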
3.5 Chatterbox TTS Service \
3.5.1 Overview \
The Chatterbox TTS Service runs on a dedicated GPU instance (RunPod RTX A5000) and provides speech synthesis for all agents. It exposes a simple HTTP API that the Agent Service calls to convert text to audio.
3.5.2 Responsibilities \
- Speech Synthesis: Convert text to natural-sounding speech
- Voice Management: Load and cache voice models
- Streaming Output: Support chunked audio output for low latency
- Paralinguistics: Process tags like `[laugh]`, `[cough]`
3.5.3 Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ CHATTERBOX TTS SERVICE │
│ (RunPod A5000) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ API LAYER │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ FastAPI │ │ Health │ │ Metrics │ │ │
│ │ │ Server │ │ Check │ │ Endpoint │ │ │
│ │ └──────┬───────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │
│ └─────────┼────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────▼────────────────────────────────────────────────────────────┐ │
│ │ SYNTHESIS ENGINE │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Text │ │ Model │ │ Audio │ │ │
│ │ │ Preprocessor │ │ Inference │ │ Encoder │ │ │
│ │ │ │ │ (Turbo) │ │ │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VOICE MANAGEMENT │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Voice │ │ Voice │ │ Reference │ │ │
│ │ │ Registry │ │ Cache │ │ Storage │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GPU RESOURCES │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ RTX A5000 (24GB VRAM) │ │ │
│ │ │ │ │ │
│ │ │ Model: ~4GB │ Inference: ~8GB │ Headroom │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.5.4 API Endpoints \
| Endpoint | Method | Purpose |
|---|---|---|
| `/synthesize` | POST | Generate speech from text |
| `/synthesize/stream` | POST | Stream audio chunks |
| `/voices` | GET | List available voices |
| `/voices/{id}` | GET | Get voice details |
| `/health` | GET | Health check |
| `/metrics` | GET | Prometheus metrics |
3.5.5 Configuration \
chatterbox_service:
  host: 0.0.0.0
  port: 8080
  model:
    name: "chatterbox-turbo"
    device: "cuda"
    precision: "float16"
  synthesis:
    sample_rate: 24000
    default_exaggeration: 0.5
    default_cfg_weight: 0.5
  voices:
    storage_path: "/data/voices"
    cache_size: 10  # voices in memory
  streaming:
    chunk_duration_ms: 100
    buffer_chunks: 3
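The streaming settings translate into concrete buffer sizes. The sketch below works through that arithmetic. It assumes 16-bit mono PCM output, which the configuration does not state, so `BYTES_PER_SAMPLE` and `CHANNELS` are assumptions rather than documented values.

```python
# Values taken from the chatterbox_service configuration.
SAMPLE_RATE_HZ = 24_000
CHUNK_DURATION_MS = 100
BUFFER_CHUNKS = 3

# Assumptions (not stated in the config): 16-bit samples, one channel.
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int) -> int:
    """Return the size in bytes of one chunk of raw PCM audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def buffered_audio_ms(chunk_ms: int, buffer_chunks: int) -> int:
    """Return how much audio the stream buffer holds, in milliseconds."""
    return chunk_ms * buffer_chunks


print(chunk_size_bytes(SAMPLE_RATE_HZ, CHUNK_DURATION_MS))  # 4800
print(buffered_audio_ms(CHUNK_DURATION_MS, BUFFER_CHUNKS))  # 300
```

Under these assumptions each 100 ms chunk is 4,800 bytes, and a three-chunk buffer holds 300 ms of audio.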
4. Data Flow Architecture \
4.1 Inbound Call Flow \
This section details the complete data flow for an inbound phone call, from the moment it arrives at GoToConnect to when the conversation ends.
┌─────────────────────────────────────────────────────────────────────────────┐
│ INBOUND CALL DATA FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. CALL ARRIVES │
│ ───────────── │
│ │
│ Caller ──PSTN──▶ GoToConnect │
│ │ │
│ │ WebSocket Event: "call.ringing" │
│ │ { │
│ │ "call_id": "abc123", │
│ │ "from": "+15551234567", │
│ │ "to": "+15559876543", │
│ │ "direction": "inbound" │
│ │ } │
│ ▼ │
│ WebRTC Bridge │
│ │ │
│ │ 1. Lookup tenant by phone number │
│ │ 2. Load agent configuration │
│ │ 3. Create call record in PostgreSQL │
│ │ 4. Store initial state in Redis │
│ │ │
│ │
│ 2. CALL ANSWERED │
│ ───────────── │
│ │
│ WebRTC Bridge │
│ │ │
│ │ POST /web-calls/v1/calls/{id}/answer │
│ │ │
│ ▼ │
│ GoToConnect │
│ │ │
│ │ Returns SDP offer │
│ │ │
│ ▼ │
│ WebRTC Bridge │
│ │ │
│ │ 1. Create RTCPeerConnection │
│ │ 2. Set remote description (offer) │
│ │ 3. Create answer │
│ │ 4. Set local description (answer) │
│ │ 5. Return SDP answer to GoToConnect │
│ │ │
│ │ Parallel: Create LiveKit room │
│ │ Room name: "call-{tenant_id}-{call_id}" │
│ │ │
│ │
│ 3. AUDIO STREAMING ESTABLISHED │
│ ──────────────────────────── │
│ │
│ GoToConnect ──WebRTC──▶ Bridge ──LiveKit──▶ Room │
│ │ │
│ │ Publish Event: │
│ │ "room.ready" │
│ │ │
│ ▼ │
│ Agent Service │
│ │ │
│ │ 1. Load agent config │
│ │ 2. Initialize pipeline│
│ │ 3. Join LiveKit room │
│ │ 4. Subscribe to audio │
│ │ │
│ │
│ 4. CONVERSATION LOOP │
│ ───────────────── │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Caller Audio ──▶ Bridge ──▶ LiveKit ──▶ Agent │ │
│ │ │ │ │ │
│ │ │ ┌──────────┴──────────┐ │ │
│ │ │ │ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────┐ │ │ │
│ │ │ │ STT │ │ │ │
│ │ │ │Deepgram │ │ │ │
│ │ │ └────┬────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ │ Transcript │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────┐ │ │ │
│ │ │ │ LLM │ │ │ │
│ │ │ │ Claude │──┐ │ │ │
│ │ │ └────┬────┘ │ │ │ │
│ │ │ │ │ Tool Call │ │ │
│ │ │ │ ▼ │ │ │
│ │ │ │ ┌─────────┐ │ │ │
│ │ │ │ │ Webhook │ │ │ │
│ │ │ │ │ (n8n) │ │ │ │
│ │ │ │ └────┬────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │◀──────┘ │ │ │
│ │ │ │ Response │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────┐ │ │ │
│ │ │ │ TTS │ │ │ │
│ │ │ │Chatterbox │ │ │
│ │ │ └────┬────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ │ Audio │ │ │
│ │ │ ▼ │ │ │
│ │ │ Agent ──▶ LiveKit ──▶ Bridge ──▶ Caller│ │
│ │ │ │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ (Repeat until call ends) │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ 5. CALL ENDS │
│ ───────── │
│ │
│ Trigger: Caller hangup / Agent transfer / Timeout │
│ │
│ WebRTC Bridge │
│ │ │
│ │ 1. Close WebRTC connection │
│ │ 2. Leave LiveKit room │
│ │ 3. Update call state to ENDED │
│ │ │
│ │ Publish Event: "call.ended" │
│ │ { │
│ │ "call_id": "abc123", │
│ │ "duration_seconds": 127, │
│ │ "end_reason": "caller_hangup" │
│ │ } │
│ ▼ │
│ Worker Service │
│ │ │
│ │ 1. Process transcript │
│ │ 2. Aggregate usage │
│ │ 3. Send completion webhook │
│ │ 4. Archive call data │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
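The room naming rule and the initial Redis state from steps 1 and 2 can be sketched as follows. The function names are hypothetical, and the state fields mirror the `call:{call_id}:state` hash described in section 9.3. The status value is passed in by the caller because the call state machine is defined elsewhere in this document.

```python
from datetime import datetime, timezone
from typing import Optional


def build_room_name(tenant_id: str, call_id: str) -> str:
    """Apply the naming pattern from the diagram: call-{tenant_id}-{call_id}."""
    return f"call-{tenant_id}-{call_id}"


def initial_call_state(
    tenant_id: str,
    agent_id: str,
    call_id: str,
    status: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build the hash fields stored in Redis when a call is first seen."""
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "status": status,
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "room_name": build_room_name(tenant_id, call_id),
        "started_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

With a Redis client, these fields would typically be written as a single hash at `call:{call_id}:state`.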
4.2 Outbound Call Flow \
┌─────────────────────────────────────────────────────────────────────────────┐
│ OUTBOUND CALL DATA FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. API REQUEST │
│ ─────────── │
│ │
│ Client ──HTTP POST──▶ API Gateway │
│ │ │
│ POST /v1/calls │ │
│ { │ │
│ "agent_id": "agent_1", │ │
│ "to": "+15551234567", │ │
│ "context": { │ │
│ "customer_name": "John", │
│ "appointment_id": "apt_123" │
│ } │ │
│ } │ │
│ │ │
│ │ 1. Validate request │
│ │ 2. Check tenant credits │
│ │ 3. Create call record │
│ │ 4. Enqueue call initiation │
│ │ │
│ │ Response: 202 Accepted │
│ │ { │
│ │ "call_id": "xyz789", │
│ │ "status": "initiating" │
│ │ } │
│ │
│ 2. CALL INITIATION │
│ ──────────────── │
│ │
│ WebRTC Bridge │
│ │ │
│ │ POST /web-calls/v1/calls │
│ │ { │
│ │ "dial_string": "tel:+15551234567", │
│ │ "caller_id": "+15559876543" │
│ │ } │
│ │ │
│ ▼ │
│ GoToConnect │
│ │ │
│ │ Initiates outbound call via PSTN │
│ │ │
│ │ WebSocket Event: "call.dialing" │
│ │ │
│ │
│ 3. CALL CONNECTED │
│ ────────────── │
│ │
│ Callee answers phone │
│ │ │
│ │ WebSocket Event: "call.connected" │
│ │ │
│ ▼ │
│ (Same flow as inbound call from step 2 onwards) │
│ │
│ 4. CALL NOT ANSWERED │
│ ────────────────── │
│ │
│ Timeout or voicemail detected │
│ │ │
│ │ WebSocket Event: "call.failed" │
│ │ { │
│ │ "reason": "no_answer" | "voicemail" | "busy" │
│ │ } │
│ │ │
│ ▼ │
│ Worker Service │
│ │ │
│ │ 1. Update call status │
│ │ 2. Send failure webhook │
│ │ 3. Optionally schedule retry │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
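Step 1 of the outbound flow (validate, check credits, respond with 202) might look like the sketch below. It uses only the standard library. The error status codes (422, 402), the E.164 check, and the function and class names are assumptions for illustration, not the gateway's documented behavior.

```python
import re
import uuid
from dataclasses import dataclass, field

# E.164: a plus sign followed by up to 15 digits, no leading zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


@dataclass
class OutboundCallRequest:
    agent_id: str
    to: str
    context: dict = field(default_factory=dict)


def accept_outbound_call(
    req: OutboundCallRequest, credits_remaining: int
) -> tuple[int, dict]:
    """Return (http_status, body) for a POST /v1/calls request."""
    # 1. Validate request
    if not req.agent_id or not E164.match(req.to):
        return 422, {"error": "invalid_request"}
    # 2. Check tenant credits
    if credits_remaining <= 0:
        return 402, {"error": "insufficient_credits"}
    # 3 and 4. Creating the call record and enqueueing call initiation
    # would happen here; this sketch only generates the identifier.
    call_id = uuid.uuid4().hex
    return 202, {"call_id": call_id, "status": "initiating"}
```

Returning 202 rather than 200 matches the diagram: the call has been accepted and queued, not yet placed.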
4.3 Transfer Flow \
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSFER DATA FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ BLIND TRANSFER │
│ ────────────── │
│ │
│ AI determines transfer is needed │
│ │ │
│ │ "Let me transfer you to our billing department." │
│ │ │
│ ▼ │
│ Agent Service │
│ │ │
│ │ Publish Event: "call.transfer_requested" │
│ │ { │
│ │ "type": "blind", │
│ │ "target": "ext:1001" │
│ │ } │
│ │ │
│ ▼ │
│ WebRTC Bridge │
│ │ │
│ │ POST /web-calls/v1/calls/{id}/blind-transfer │
│ │ { "dial_string": "ext:1001" } │
│ │ │
│ ▼ │
│ GoToConnect │
│ │ │
│ │ 1. Connects to extension 1001 │
│ │ 2. Bridges caller to new party │
│ │ 3. Disconnects AI │
│ │ │
│ │ WebSocket Event: "call.transferred" │
│ │ │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ WARM TRANSFER │
│ ───────────── │
│ │
│ AI determines transfer is needed │
│ │ │
│ │ "I'll connect you with a specialist. One moment please." │
│ │ │
│ ▼ │
│ Agent Service │
│ │ │
│ │ Publish Event: "call.transfer_requested" │
│ │ { │
│ │ "type": "warm", │
│ │ "target": "ext:1002", │
│ │ "context": "Customer John calling about billing issue #123" │
│ │ } │
│ │ │
│ ▼ │
│ WebRTC Bridge │
│ │ │
│ │ 1. PUT /web-calls/v1/calls/{id}/hold │
│ │ (Customer hears hold music) │
│ │ │
│ │ 2. POST /web-calls/v1/calls │
│ │ { "dial_string": "ext:1002" } │
│ │ (Call agent) │
│ │ │
│ │ 3. AI briefs agent: "Transferring John, billing issue #123" │
│ │ │
│ │ 4. Agent accepts transfer │
│ │ │
│ │ 5. POST /web-calls/v1/calls/{id}/warm-transfer │
│ │ { "refer_id": "{agent_call_id}" } │
│ │ │
│ ▼ │
│ GoToConnect │
│ │ │
│ │ 1. Connects customer to agent │
│ │ 2. Disconnects AI │
│ │ │
│ │ WebSocket Event: "call.transferred" │
│ │ │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ CONFERENCE (3-WAY) │
│ ────────────────── │
│ │
│ Supervisor wants to join call │
│ │ │
│ ▼ │
│ WebRTC Bridge │
│ │ │
│ │ 1. POST /web-calls/v1/calls │
│ │ { "dial_string": "ext:1003" } │
│ │ (Call supervisor) │
│ │ │
│ │ 2. POST /web-calls/v1/calls/{id}/merge │
│ │ { "refer_id": "{supervisor_call_id}" } │
│ │ │
│ ▼ │
│ GoToConnect │
│ │ │
│ │ All three parties (customer, AI, supervisor) in conference │
│ │ │
│ │ Supervisor can: │
│ │ - Listen silently │
│ │ - Coach AI (via separate channel) │
│ │ - Take over conversation │
│ │ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
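The blind and warm transfer sequences above can be expressed as an ordered list of API calls. This is a sketch only: `plan_transfer` is a hypothetical helper, and the paths are copied from the diagrams rather than verified against the GoToConnect API reference.

```python
def plan_transfer(
    call_id: str, transfer_type: str, target: str
) -> list[tuple[str, str, dict]]:
    """Return the (method, path, body) calls the bridge would issue, in order."""
    base = f"/web-calls/v1/calls/{call_id}"
    if transfer_type == "blind":
        return [("POST", f"{base}/blind-transfer", {"dial_string": target})]
    if transfer_type == "warm":
        return [
            # Place the customer on hold.
            ("PUT", f"{base}/hold", {}),
            # Dial the human agent on a second call leg.
            ("POST", "/web-calls/v1/calls", {"dial_string": target}),
            # The AI briefs the agent between these two calls (not an API call).
            # refer_id is a placeholder: the real value is the ID of the
            # second call, which is only known once that call exists.
            ("POST", f"{base}/warm-transfer", {"refer_id": "{agent_call_id}"}),
        ]
    raise ValueError(f"unsupported transfer type: {transfer_type}")


for method, path, body in plan_transfer("abc123", "warm", "ext:1002"):
    print(method, path, body)
```

Keeping the plan as data makes the sequence easy to log and to test without touching the telephony provider.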
4.4 Tool Calling Flow \
┌─────────────────────────────────────────────────────────────────────────────┐
│ TOOL CALLING DATA FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Caller: "Can you check if Dr. Smith has availability next Tuesday?" │
│ │
│ 1. LLM DECIDES TO USE TOOL │
│ ────────────────────────── │
│ │
│ Claude Response (streaming): │
│ { │
│ "type": "tool_use", │
│ "name": "check_availability", │
│ "input": { │
│ "provider": "Dr. Smith", │
│ "date": "2026-01-20" │
│ } │
│ } │
│ │
│ 2. TOOL EXECUTION │
│ ────────────── │
│ │
│ Agent Service │
│ │ │
│ │ 1. Extract tool call from LLM response │
│ │ 2. Validate against tool schema │
│ │ 3. Generate filler speech: "Let me check that for you..." │
│ │ 4. Send filler to TTS (non-blocking) │
│ │ │
│ │ Parallel execution: │
│ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ │
│ │ │ TTS: Filler │ │ Webhook Call │ │
│ │ │ "Let me check" │ │ │ │
│ │ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ │ ▼ ▼ │
│ │ LiveKit Room n8n Webhook │
│ │ │ │ │
│ │ ▼ │ │
│ │ Caller hears │ │
│ │ filler speech │ │
│ │ │ │
│ │ ▼ │
│ │ Calendar API │
│ │ │ │
│ │ │ { │
│ │ │ "available_slots": [ │
│ │ │ "9:00 AM", │
│ │ │ "2:00 PM", │
│ │ │ "4:30 PM" │
│ │ │ ] │
│ │ │ } │
│ │ │ │
│ │◀────────────────────────────────┘ │
│ │ │
│ │
│ 3. CONTINUE CONVERSATION │
│ ───────────────────── │
│ │
│ Agent Service │
│ │ │
│ │ Inject tool result into conversation: │
│ │ { │
│ │ "type": "tool_result", │
│ │ "content": "{\"available_slots\": [\"9:00 AM\", ...]}" │
│ │ } │
│ │ │
│ │ Continue LLM generation with result │
│ │ │
│ ▼ │
│ Claude │
│ │ │
│ │ "Dr. Smith has three openings on Tuesday: │
│ │ 9 AM, 2 PM, and 4:30 PM. Which works best for you?" │
│ │ │
│ ▼ │
│ TTS ──▶ LiveKit ──▶ Bridge ──▶ Caller │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
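The parallel execution in step 2 (filler speech alongside the webhook call) maps naturally onto `asyncio`. In the sketch below, `speak_filler` and `call_webhook` are hypothetical stand-ins for the TTS and n8n calls, and the webhook returns the sample payload from the diagram.

```python
import asyncio


async def speak_filler(text: str) -> str:
    """Stand-in for sending filler text to TTS (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"spoken: {text}"


async def call_webhook(tool_name: str, tool_input: dict) -> dict:
    """Stand-in for the n8n webhook call (hypothetical)."""
    await asyncio.sleep(0.02)
    return {"available_slots": ["9:00 AM", "2:00 PM", "4:30 PM"]}


async def execute_tool_with_filler(tool_name: str, tool_input: dict) -> dict:
    """Start the filler speech without blocking, then wait for the tool result."""
    filler_task = asyncio.create_task(speak_filler("Let me check that for you..."))
    result = await call_webhook(tool_name, tool_input)
    # Let the filler finish so the real answer does not talk over it.
    await filler_task
    return result


result = asyncio.run(
    execute_tool_with_filler(
        "check_availability", {"provider": "Dr. Smith", "date": "2026-01-20"}
    )
)
print(result["available_slots"])
```

Because the filler runs as a separate task, the caller hears speech while the webhook is still in flight, which hides most of the tool latency.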
5. Service Boundaries \
5.1 Service Responsibility Matrix \
| Service | Creates | Reads | Updates | Deletes |
|---|---|---|---|---|
| API Gateway | Tenants, Agents, Numbers, Voices, Webhooks | All | All (via API) | Soft delete |
| WebRTC Bridge | Calls, CallEvents | Tenants, Agents, Numbers | CallState | - |
| Agent Service | Transcripts, ToolCalls | Tenants, Agents, KnowledgeBase | Calls (status) | - |
| Worker Service | UsageRecords, Reports | Calls, Transcripts | Calls (archive) | Expired sessions |
| Chatterbox | - | Voices | - | - |
5.2 Service Communication \
┌─────────────────────────────────────────────────────────────────────────────┐
│ SERVICE COMMUNICATION MAP │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SYNCHRONOUS (HTTP/gRPC) │
│ ────────────────────── │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ API Gateway │────────▶│ WebRTC │────────▶│ GoToConnect │ │
│ │ │ │ Bridge │ │ API │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ LiveKit │ │
│ │ Cloud │ │
│ └─────────────┘ │
│ │ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Deepgram │◀────────│ Agent │────────▶│ Anthropic │ │
│ │ API │ │ Service │ │ API │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Chatterbox │ │
│ │ (RunPod) │ │
│ └─────────────┘ │
│ │
│ │
│ ASYNCHRONOUS (Redis Pub/Sub) │
│ ─────────────────────────── │
│ │
│ ┌───────────┐ │
│ │ Redis │ │
│ │ Pub/Sub │ │
│ └─────┬─────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ API Gateway │ │ WebRTC │ │ Worker │ │
│ │ │ │ Bridge │ │ Service │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ n8n │ │
│ │ (Webhooks) │ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
5.3 Event Catalog \
| Event | Publisher | Subscribers | Payload |
|---|---|---|---|
| call.ringing | WebRTC Bridge | Agent Service | call_id, tenant_id, from, to, direction |
| call.connected | WebRTC Bridge | Agent Service, Worker | call_id, tenant_id, answered_at |
| call.ended | WebRTC Bridge | Agent Service, Worker | call_id, duration_seconds, end_reason |
| call.transfer_requested | Agent Service | WebRTC Bridge | call_id, type, target, context |
| call.transferred | WebRTC Bridge | Worker | call_id, transferred_to |
| room.ready | WebRTC Bridge | Agent Service | room_name, call_id |
| agent.joined | Agent Service | WebRTC Bridge | call_id, agent_id |
| transcript.turn | Agent Service | Worker | call_id, speaker, text, timestamp |
| tool.called | Agent Service | Worker | call_id, tool_name, input, output |
| usage.minute | Agent Service | Worker | tenant_id, call_id, minute_count |
5.4 API Contracts Between Services \
5.4.1 WebRTC Bridge → Agent Service \
Event: room.ready
Channel: call:{call_id}:events
{
  "event": "room.ready",
  "timestamp": "2026-01-16T10:30:00Z",
  "data": {
    "room_name": "call-tenant123-call456",
    "call_id": "call456",
    "tenant_id": "tenant123",
    "agent_id": "agent789",
    "caller_number": "+15551234567",
    "context": {
      "customer_name": "John Doe",
      "account_id": "acct_123"
    }
  }
}
5.4.2 Agent Service → WebRTC Bridge \
Event: call.transfer_requested
Channel: call:{call_id}:events
{
  "event": "call.transfer_requested",
  "timestamp": "2026-01-16T10:35:00Z",
  "data": {
    "call_id": "call456",
    "type": "warm",
    "target": "ext:1001",
    "context": "Customer John asking about billing, issue #123",
    "reason": "customer_request"
  }
}
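Both contracts share the same envelope (`event`, `timestamp`, `data`) and the same per-call channel. A minimal sketch of building that envelope follows. The helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def call_channel(call_id: str) -> str:
    """Return the per-call Pub/Sub channel name."""
    return f"call:{call_id}:events"


def make_event(event: str, data: dict, now: Optional[datetime] = None) -> str:
    """Serialize an event using the envelope shown in the contracts."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    envelope = {
        "event": event,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,
    }
    return json.dumps(envelope)


message = make_event(
    "room.ready",
    {"room_name": "call-tenant123-call456", "call_id": "call456"},
)
print(call_channel("call456"), message)
```

With redis-py, the serialized message would be sent with `client.publish(call_channel(call_id), message)`.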
5.4.3 Agent Service → Chatterbox TTS \
POST /synthesize
Content-Type: application/json
{
  "text": "I'd be happy to help you with that [chuckle]. Let me check your account.",
  "voice_id": "voice_female_01",
  "options": {
    "exaggeration": 0.5,
    "cfg_weight": 0.5,
    "streaming": true
  }
}
Response (streaming):
Transfer-Encoding: chunked
Content-Type: audio/wav
[binary audio chunks]
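A client for this contract could be sketched with the standard library alone. The function names and the 4096-byte read size are assumptions. The payload fields follow the request shown above.

```python
import json
import urllib.request


def build_synthesize_request(
    base_url: str,
    text: str,
    voice_id: str,
    exaggeration: float = 0.5,
    cfg_weight: float = 0.5,
    streaming: bool = True,
) -> urllib.request.Request:
    """Build the POST /synthesize request without sending it."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "options": {
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
            "streaming": streaming,
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_audio_chunks(response, chunk_size: int = 4096):
    """Yield raw audio bytes from a file-like HTTP response as they arrive."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

In use, the request would be passed to `urllib.request.urlopen`, which decodes the chunked transfer encoding, and each chunk from `iter_audio_chunks` would be forwarded to the LiveKit room as it arrives.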
6. Network Topology \
6.1 Network Diagram \
┌─────────────────────────────────────────────────────────────────────────────┐
│ NETWORK TOPOLOGY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INTERNET │ │
│ └───────┬─────────────────┬─────────────────┬─────────────────┬───────┘ │
│ │ │ │ │ │
│ │ HTTPS │ HTTPS │ WebRTC │ HTTPS │
│ │ (API) │ (Webhooks) │ (Audio) │ (APIs) │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ CLOUDFLARE (CDN/WAF) │ │
│ │ │ │
│ │ - DDoS protection │ │
│ │ - SSL termination │ │
│ │ - Rate limiting │ │
│ │ - Geographic routing │ │
│ │ │ │
│ └───────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ │ HTTPS (internal) │
│ │ │
│ ┌───────────────────────────────▼───────────────────────────────────┐ │
│ │ DIGITALOCEAN VPC │ │
│ │ 10.0.0.0/16 │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ LOAD BALANCER │ │ │
│ │ │ 10.0.1.1 │ │ │
│ │ └──────────┬─────────────────┬─────────────────┬─────────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ API Gateway │ │ API Gateway │ │ API Gateway │ │ │
│ │ │ 10.0.2.1 │ │ 10.0.2.2 │ │ 10.0.2.3 │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │WebRTC Bridge │ │WebRTC Bridge │ │ │
│ │ │ 10.0.3.1 │ │ 10.0.3.2 │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │Agent Service │ │Agent Service │ │ │
│ │ │ 10.0.4.1 │ │ 10.0.4.2 │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │Worker Service│ │ │
│ │ │ 10.0.5.1 │ │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ DATA SUBNET │ │ │
│ │ │ 10.0.10.0/24 │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ PostgreSQL │ │ Redis │ │ DO Spaces │ │ │ │
│ │ │ │ 10.0.10.1 │ │ 10.0.10.2 │ │ (S3 API) │ │ │ │
│ │ │ │ (Primary) │ │ (Primary) │ │ │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ PostgreSQL │ │ Redis │ │ │ │
│ │ │ │ 10.0.10.3 │ │ 10.0.10.4 │ │ │ │
│ │ │ │ (Replica) │ │ (Replica) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ RUNPOD │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ CHATTERBOX TTS │ │ │
│ │ │ GPU: RTX A5000 │ │ │
│ │ │ Public IP: x.x.x.x │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ EXTERNAL SERVICES │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ GoToConnect │ │ LiveKit │ │ Deepgram │ │ Anthropic │ │ │
│ │ │api.goto.com │ │livekit.cloud│ │deepgram.com │ │anthropic.com│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
6.2 Port Matrix \
| Service | Internal Port | External Port | Protocol | Purpose |
|---|---|---|---|---|
| API Gateway | 8000 | 443 | HTTPS | Public API |
| WebRTC Bridge | 8001 | - | Internal | Service mesh |
| WebRTC Bridge | 10000-10100 | 10000-10100 | UDP | WebRTC media |
| Agent Service | 8002 | - | Internal | Service mesh |
| Worker Service | 8003 | - | Internal | Service mesh |
| PostgreSQL | 5432 | - | TCP | Database |
| Redis | 6379 | - | TCP | Cache/Pub-Sub |
| Chatterbox | 8080 | 443 | HTTPS | TTS API |
6.3 Firewall Rules \
firewall_rules:
  # Inbound to load balancer
  - name: "allow-https-inbound"
    direction: inbound
    protocol: tcp
    port: 443
    source: 0.0.0.0/0
    destination: load_balancer
  # WebRTC media (UDP)
  - name: "allow-webrtc-media"
    direction: inbound
    protocol: udp
    port: 10000-10100
    source: 0.0.0.0/0
    destination: webrtc_bridge
  # Internal VPC communication
  - name: "allow-vpc-internal"
    direction: both
    protocol: all
    source: 10.0.0.0/16
    destination: 10.0.0.0/16
  # Outbound to external services
  - name: "allow-outbound-https"
    direction: outbound
    protocol: tcp
    port: 443
    source: 10.0.0.0/16
    destination: 0.0.0.0/0
  # Block all other inbound
  - name: "deny-all-inbound"
    direction: inbound
    protocol: all
    source: 0.0.0.0/0
    action: deny
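To illustrate how the inbound rules interact, the sketch below evaluates a packet against them in order. First-match-wins ordering is an assumption, since the rule set does not state its evaluation semantics. The VPC-internal and outbound rules are left out for brevity, and the symbolic destinations (`load_balancer`, `webrtc_bridge`) are treated as plain labels.

```python
import ipaddress

# Inbound rules only, in the order they appear in the rule set.
INBOUND_RULES = [
    {"name": "allow-https-inbound", "protocol": "tcp", "ports": (443, 443),
     "source": "0.0.0.0/0", "destination": "load_balancer", "action": "allow"},
    {"name": "allow-webrtc-media", "protocol": "udp", "ports": (10000, 10100),
     "source": "0.0.0.0/0", "destination": "webrtc_bridge", "action": "allow"},
    {"name": "deny-all-inbound", "protocol": "all", "ports": None,
     "source": "0.0.0.0/0", "destination": None, "action": "deny"},
]


def evaluate_inbound(protocol: str, port: int, source_ip: str, destination: str):
    """Return (rule_name, action) for the first rule that matches the packet."""
    src = ipaddress.ip_address(source_ip)
    for rule in INBOUND_RULES:
        if rule["protocol"] not in ("all", protocol):
            continue
        if rule["ports"] is not None:
            lo, hi = rule["ports"]
            if not lo <= port <= hi:
                continue
        if src not in ipaddress.ip_network(rule["source"]):
            continue
        if rule["destination"] not in (None, destination):
            continue
        return rule["name"], rule["action"]
    return "implicit-deny", "deny"


print(evaluate_inbound("tcp", 443, "203.0.113.7", "load_balancer"))
print(evaluate_inbound("tcp", 5432, "203.0.113.7", "postgresql"))
```

The second call shows the intended effect of the final rule: a direct connection to the database port from the internet matches no allow rule and is denied.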
6.4 DNS Configuration \
dns_records:
  # Public endpoints
  - name: api.aiconnected.io
    type: A
    value: [cloudflare_proxy_ip]
    proxied: true
  - name: tts.aiconnected.io
    type: A
    value: [runpod_public_ip]
    proxied: false  # Direct for latency
  # Internal endpoints (private DNS)
  - name: db.internal.aiconnected.io
    type: A
    value: 10.0.10.1
    zone: internal
  - name: redis.internal.aiconnected.io
    type: A
    value: 10.0.10.2
    zone: internal
7. External Service Dependencies \
7.1 Dependency Map \
┌─────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL SERVICE DEPENDENCIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CRITICAL PATH │ │
│ │ (Required for call handling) │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ GoToConnect │ Telephony │ │
│ │ │ │ - WebRTC signaling │ │
│ │ │ │ - Call control │ │
│ │ │ │ - PSTN connectivity │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Cannot make/receive calls │ │
│ │ │ Fallback: None (critical) │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ LiveKit │ Real-time Audio │ │
│ │ │ Cloud │ - Room management │ │
│ │ │ │ - Audio routing │ │
│ │ │ │ - Participant management │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Cannot process calls │ │
│ │ │ Fallback: None (critical) │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ Deepgram │ Speech-to-Text │ │
│ │ │ │ - Streaming transcription │ │
│ │ │ │ - Interim results │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Cannot understand caller │ │
│ │ │ Fallback: Whisper (self-hosted, higher latency) │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ Anthropic │ Language Model │ │
│ │ │ (Claude) │ - Response generation │ │
│ │ │ │ - Tool calling │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Cannot generate responses │ │
│ │ │ Fallback: Cached responses, graceful transfer │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ Chatterbox │ Text-to-Speech │ │
│ │ │ (RunPod) │ - Speech synthesis │ │
│ │ │ │ - Voice cloning │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Cannot speak to caller │ │
│ │ │ Fallback: Resemble AI API, pre-recorded audio │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ NON-CRITICAL PATH │ │
│ │ (Required for full functionality) │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ n8n │ Webhook Automation │ │
│ │ │ │ - Tool execution │ │
│ │ │ │ - CRM integration │ │
│ │ │ │ - Calendar integration │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Tools unavailable │ │
│ │ │ Fallback: Inform caller, continue conversation │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ Knowledge │ Context Retrieval │ │
│ │ │ Base │ - RAG queries │ │
│ │ │ │ - FAQ lookup │ │
│ │ └─────────────┘ │ │
│ │ │ │ │
│ │ │ Failure Impact: Generic responses only │ │
│ │ │ Fallback: Base system prompt │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
7.2 Service Level Objectives \
| Service | Expected Uptime | Latency Target | Our SLA Impact |
|---|---|---|---|
| GoToConnect | 99.99% | <100ms API | Critical |
| LiveKit Cloud | 99.95% | <50ms routing | Critical |
| Deepgram | 99.9% | <300ms STT | High |
| Anthropic | 99.9% | <500ms TTFT | High |
| RunPod | 99.5% | <200ms TTS | High |
| n8n | 99.0% | <1000ms webhook | Medium |
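These targets compound. As a rough worked example, the sketch below multiplies the expected uptimes of the five critical-path dependencies and sums the per-stage latency targets for one conversational turn. It assumes failures are independent, every dependency is needed for every call, and no fallback kicks in, so it is a pessimistic bound rather than a forecast.

```python
import math

# Expected uptimes of the critical-path dependencies.
CRITICAL_PATH_UPTIME = {
    "GoToConnect": 0.9999,
    "LiveKit Cloud": 0.9995,
    "Deepgram": 0.999,
    "Anthropic": 0.999,
    "RunPod": 0.995,
}

# Serial dependency: the call path is up only when every service is up.
composite = math.prod(CRITICAL_PATH_UPTIME.values())
downtime_hours_per_30_days = (1 - composite) * 30 * 24

# Latency targets for one turn: STT + LLM time-to-first-token + TTS.
turn_latency_budget_ms = 300 + 500 + 200

print(f"composite availability: {composite:.2%}")           # roughly 99.24%
print(f"downtime per 30 days: {downtime_hours_per_30_days:.1f} h")  # roughly 5.5 h
print(f"turn latency budget: {turn_latency_budget_ms} ms")  # 1000 ms
```

The composite figure is dominated by the lowest target (RunPod at 99.5%), which is why the TTS fallbacks listed in section 7.1 matter most for overall availability.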
7.3 Authentication and Credentials \
| Service | Auth Method | Credential Storage | Rotation Policy |
|---|---|---|---|
| GoToConnect | OAuth 2.0 | Environment vars | Auto-refresh |
| LiveKit | API Key/Secret | Environment vars | Manual, quarterly |
| Deepgram | API Key | Environment vars | Manual, quarterly |
| Anthropic | API Key | Environment vars | Manual, quarterly |
| RunPod | API Key | Environment vars | Manual, quarterly |
7.4 Rate Limits \
| Service | Rate Limit | Our Expected Usage | Buffer |
|---|---|---|---|
| GoToConnect | 1000 req/min | ~100 req/min | 10x |
| LiveKit | Unlimited (paid) | N/A | N/A |
| Deepgram | 100 concurrent | ~50 concurrent | 2x |
| Anthropic | 4000 RPM | ~500 RPM | 8x |
| Chatterbox (self) | Hardware limited | ~100 concurrent | GPU-bound |
8. Internal Service Architecture \
8.1 Service Template \
All internal services follow a consistent structure:
service-name/
├── app/
│   ├── __init__.py
│   ├── main.py              # Application entry point
│   ├── config.py            # Configuration management
│   ├── dependencies.py      # Dependency injection
│   │
│   ├── api/                 # HTTP endpoints (if applicable)
│   │   ├── __init__.py
│   │   ├── routes.py
│   │   └── schemas.py
│   │
│   ├── core/                # Business logic
│   │   ├── __init__.py
│   │   └── [domain].py
│   │
│   ├── integrations/        # External service clients
│   │   ├── __init__.py
│   │   └── [service].py
│   │
│   └── models/              # Data models
│       ├── __init__.py
│       └── [entity].py
│
├── tests/
│   ├── unit/
│   ├── integration/
│   └── conftest.py
│
├── Dockerfile
├── requirements.txt
└── pyproject.toml
8.2 Shared Libraries \
shared/
├── database/
│   ├── __init__.py
│   ├── connection.py        # Connection pooling
│   ├── models.py            # SQLAlchemy models
│   └── migrations/          # Alembic migrations
│
├── cache/
│   ├── __init__.py
│   └── redis_client.py      # Redis client wrapper
│
├── events/
│   ├── __init__.py
│   ├── bus.py               # Event bus abstraction
│   └── schemas.py           # Event payload schemas
│
├── auth/
│   ├── __init__.py
│   ├── api_key.py           # API key validation
│   └── jwt.py               # JWT handling
│
├── observability/
│   ├── __init__.py
│   ├── logging.py           # Structured logging
│   ├── metrics.py           # Prometheus metrics
│   └── tracing.py           # Distributed tracing
│
└── utils/
    ├── __init__.py
    └── helpers.py           # Common utilities
8.3 Configuration Management \
# shared/config/base.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings loaded from environment variables and .env."""

    # pydantic-settings v2 style; replaces the deprecated inner `class Config`.
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    # Application
    app_name: str = "voice-aiconnected"
    environment: str = "development"
    debug: bool = False

    # Database
    database_url: str
    database_pool_size: int = 10
    database_max_overflow: int = 20

    # Redis
    redis_url: str
    redis_pool_size: int = 10

    # External Services
    gotoconnect_client_id: str
    gotoconnect_client_secret: str
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    deepgram_api_key: str
    anthropic_api_key: str
    chatterbox_url: str

    # Observability
    log_level: str = "INFO"
    metrics_enabled: bool = True
    tracing_enabled: bool = True


@lru_cache
def get_settings() -> Settings:
    """Return a cached Settings instance so the environment is read once."""
    return Settings()
9. Data Architecture \
9.1 Database Schema Overview \
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATABASE SCHEMA OVERVIEW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TENANT DOMAIN │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ tenants │──────▶│ agents │──────▶│ voices │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │phone_numbers │ │ webhooks │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CALL DOMAIN │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ calls │──────▶│ transcripts │ │ call_events │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌──────────────┐ │ │ │
│ │ │ tool_calls │◀─────────────────────────────────────┘ │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ BILLING DOMAIN │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │usage_records │──────▶│credit_buckets│ │ invoices │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
9.2 Core Tables \
tenants \
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) UNIQUE NOT NULL,
    status VARCHAR(50) DEFAULT 'active',
    settings JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    deleted_at TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_tenants_slug ON tenants(slug);
CREATE INDEX idx_tenants_status ON tenants(status);
agents \
CREATE TABLE agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    voice_id UUID REFERENCES voices(id),
    system_prompt TEXT NOT NULL,
    greeting_message TEXT,
    tools JSONB DEFAULT '[]',
    settings JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    deleted_at TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_agents_tenant ON agents(tenant_id);
CREATE INDEX idx_agents_status ON agents(status);
calls \
CREATE TABLE calls (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    agent_id UUID NOT NULL REFERENCES agents(id),
    direction VARCHAR(20) NOT NULL,  -- 'inbound' or 'outbound'
    status VARCHAR(50) NOT NULL,
    from_number VARCHAR(50),
    to_number VARCHAR(50),
    external_call_id VARCHAR(255),   -- GoToConnect call ID
    room_name VARCHAR(255),          -- LiveKit room
    started_at TIMESTAMP WITH TIME ZONE,
    answered_at TIMESTAMP WITH TIME ZONE,
    ended_at TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    end_reason VARCHAR(100),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_calls_tenant ON calls(tenant_id);
CREATE INDEX idx_calls_agent ON calls(agent_id);
CREATE INDEX idx_calls_status ON calls(status);
CREATE INDEX idx_calls_direction ON calls(direction);
CREATE INDEX idx_calls_started_at ON calls(started_at);
CREATE INDEX idx_calls_external_id ON calls(external_call_id);
transcripts \
CREATE TABLE transcripts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    call_id UUID NOT NULL REFERENCES calls(id),
    turn_number INTEGER NOT NULL,
    speaker VARCHAR(50) NOT NULL,  -- 'caller', 'agent', 'system'
    text TEXT NOT NULL,
    confidence FLOAT,
    started_at TIMESTAMP WITH TIME ZONE NOT NULL,
    ended_at TIMESTAMP WITH TIME ZONE,
    metadata JSONB DEFAULT '{}'
);
CREATE INDEX idx_transcripts_call ON transcripts(call_id);
CREATE INDEX idx_transcripts_call_turn ON transcripts(call_id, turn_number);
9.3 Redis Data Structures \
redis_structures:
  # Call State
  call:{call_id}:state:
    type: hash
    fields:
      status: "conversing"
      tenant_id: "tenant_123"
      agent_id: "agent_456"
      room_name: "call-tenant123-call456"
      started_at: "2026-01-16T10:30:00Z"
    ttl: 3600  # 1 hour after call ends
  # Session Context
  call:{call_id}:context:
    type: hash
    fields:
      conversation_history: "[{...}]"  # JSON array
      extracted_entities: "{...}"      # JSON object
      pending_tool_calls: "[...]"      # JSON array
    ttl: 3600
  # Active Calls per Tenant
  tenant:{tenant_id}:active_calls:
    type: set
    members:
      - "call_123"
      - "call_456"
    ttl: none
  # Rate Limiting
  ratelimit:{tenant_id}:{window}:
    type: string
    value: "42"  # request count
    ttl: 60      # window duration
  # Event Channels
  channels:
    - call:{call_id}:events
    - tenant:{tenant_id}:events
    - system:events
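The `ratelimit:{tenant_id}:{window}` key implies a fixed-window counter. The sketch below mimics that behavior in memory so it can run without a Redis server. The class name and the derivation of `{window}` from the clock (`now // 60`) are assumptions. In Redis the same logic would typically be an `INCR` on the key followed by an `EXPIRE` of 60 seconds when the counter is first created.

```python
import time
from typing import Optional


class FixedWindowLimiter:
    """In-memory stand-in for a Redis fixed-window rate limiter."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts: dict[str, int] = {}

    def key(self, tenant_id: str, now: float) -> str:
        """Build the key; all requests in the same window share one counter."""
        window = int(now // self.window_seconds)
        return f"ratelimit:{tenant_id}:{window}"

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        """Count the request and report whether it is within the limit."""
        now = time.time() if now is None else now
        k = self.key(tenant_id, now)
        # Equivalent to INCR; old windows would expire via the key's TTL.
        self._counts[k] = self._counts.get(k, 0) + 1
        return self._counts[k] <= self.limit


limiter = FixedWindowLimiter(limit=2)
print([limiter.allow("tenant_123", now=0.0) for _ in range(3)])  # [True, True, False]
```

A fixed window is simple and cheap, at the cost of allowing short bursts across a window boundary.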
9.4 Data Retention Policy \
| Data Type | Hot Storage | Warm Storage | Archive | Deletion |
|---|---|---|---|---|
| Call records | 30 days | 90 days | 2 years | 7 years |
| Transcripts | 30 days | 90 days | 2 years | 7 years |
| Audio recordings | 7 days | 30 days | 1 year | 1 year |
| Usage records | 90 days | 1 year | 7 years | 7 years |
| Session state | Call duration + 1h | - | - | Immediate |
| Audit logs | 90 days | 1 year | 7 years | 7 years |
10. Security Architecture \
10.1 Security Layers \
┌─────────────────────────────────────────────────────────────────────────────┐
│ SECURITY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PERIMETER SECURITY │ │
│ │ │ │
│ │ • Cloudflare DDoS protection │ │
│ │ • Web Application Firewall (WAF) │ │
│ │ • Rate limiting at edge │ │
│ │ • Geographic restrictions (optional) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TRANSPORT SECURITY │ │
│ │ │ │
│ │ • TLS 1.3 for all external connections │ │
│ │ • Certificate management via Let's Encrypt │ │
│ │ • HSTS enabled │ │
│ │ • Internal service mesh encryption │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ APPLICATION SECURITY │ │
│ │ │ │
│ │ • API key authentication │ │
│ │ • JWT for session management │ │
│ │ • Role-based access control (RBAC) │ │
│ │ • Input validation and sanitization │ │
│ │ • Output encoding │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DATA SECURITY │ │
│ │ │ │
│ │ • Encryption at rest (AES-256) │ │
│ │ • Database column encryption for PII │ │
│ │ • Tenant data isolation │ │
│ │ • Secure credential storage │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
10.2 Authentication Flow \
┌─────────────────────────────────────────────────────────────────────────────┐
│ API AUTHENTICATION FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Client Request │
│ │ │
│ │ Headers: │
│ │ X-API-Key: sk_live_xxxxx │
│ │ Content-Type: application/json │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ API GATEWAY │ │
│ │ │ │
│ │ 1. Extract API key from header │ │
│ │ 2. Hash and lookup in database │ │
│ │ 3. Verify key is active and not expired │ │
│ │ 4. Load tenant context from key │ │
│ │ 5. Check rate limits │ │
│ │ 6. Inject tenant context into request │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Request Context: │
│ │ tenant_id: "tenant_123" │
│ │ permissions: ["read", "write", "admin"] │
│ │ rate_limit_remaining: 95 │
│ │ │
│ ▼ │
│ Route Handler │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
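The six gateway steps above can be sketched in plain Python. This is an illustrative outline only: the in-memory `key_store` and `request_counts` dictionaries stand in for the database and the rate limiter, and `ApiKeyRecord`, `AuthError`, `hash_key`, `authenticate`, and the limit of 100 requests are all assumptions introduced for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ApiKeyRecord:
    tenant_id: str
    permissions: list
    active: bool
    expires_at: Optional[datetime]  # None means the key never expires


class AuthError(Exception):
    """Raised when a request cannot be authenticated or is rate limited."""


def hash_key(raw_key: str) -> str:
    # Only the SHA-256 digest is stored, never the raw key.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def authenticate(headers, key_store, request_counts, limit=100):
    """Walk gateway steps 1-6 and return the request context."""
    raw_key = headers.get("X-API-Key")              # 1. extract key from header
    if not raw_key:
        raise AuthError("missing API key")
    record = key_store.get(hash_key(raw_key))       # 2. hash and look up
    if record is None or not record.active:         # 3. verify key is active
        raise AuthError("invalid API key")
    now = datetime.now(timezone.utc)
    if record.expires_at is not None and record.expires_at <= now:
        raise AuthError("expired API key")          # 3. ...and not expired
    used = request_counts.get(record.tenant_id, 0)  # 4. tenant comes from the key
    if used >= limit:                               # 5. check rate limit
        raise AuthError("rate limit exceeded")
    request_counts[record.tenant_id] = used + 1
    return {                                        # 6. context for the handler
        "tenant_id": record.tenant_id,
        "permissions": record.permissions,
        "rate_limit_remaining": limit - (used + 1),
    }
```

Because only the digest is used for lookup, a leaked copy of the key table would not reveal usable keys, which matches the "hash only" entry in section 10.3.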
10.3 Data Encryption \
| Data Type | At Rest | In Transit | Key Management |
|---|---|---|---|
| API Keys | SHA-256 hashed | TLS 1.3 | Not stored (hash only) |
| User data | AES-256 | TLS 1.3 | AWS KMS / DO Spaces |
| Audio recordings | AES-256 | TLS 1.3 | Per-tenant keys |
| Database | Transparent encryption | TLS 1.3 | Managed PostgreSQL |
| Redis | Not encrypted | TLS | In-memory only |
11. Scalability Architecture \
11.1 Horizontal Scaling Strategy \
┌─────────────────────────────────────────────────────────────────────────────┐
│ HORIZONTAL SCALING STRATEGY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SCALE TRIGGER: Active calls > (instances × 50) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LOAD BALANCER │ │
│ │ │ │
│ │ • Round-robin distribution │ │
│ │ • Health check: /health every 10s │ │
│ │ • Sticky sessions: Not required (stateless) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ API Gateway │ │ API Gateway │ │ API Gateway │ │
│ │ Instance 1 │ │ Instance 2 │ │ Instance N │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ WebRTC Bridge │ │ WebRTC Bridge │ │ WebRTC Bridge │ │
│ │ Instance 1 │ │ Instance 2 │ │ Instance N │ │
│ │ (50 calls) │ │ (50 calls) │ │ (50 calls) │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Agent Service │ │ Agent Service │ │ Agent Service │ │
│ │ Instance 1 │ │ Instance 2 │ │ Instance N │ │
│ │ (50 calls) │ │ (50 calls) │ │ (50 calls) │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SHARED STATE (Redis Cluster) │ │
│ │ │ │
│ │ • Call state accessible from any instance │ │
│ │ • Event pub/sub for cross-instance communication │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
11.2 Capacity Planning \
| Component | Capacity per Instance | Scaling Metric | Scale Threshold |
|---|---|---|---|
| API Gateway | 1000 req/s | CPU utilization | >70% |
| WebRTC Bridge | 50 concurrent calls | Active connections | >80% |
| Agent Service | 50 concurrent calls | Active agents | >80% |
| Worker Service | 100 jobs/minute | Queue depth | >1000 |
| PostgreSQL | 500 connections | Connection count | >80% |
| Redis | 10,000 ops/s | Memory usage | >80% |
| Chatterbox | 100 concurrent synth | GPU utilization | >80% |
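As a worked example of the capacity figures, the sketch below estimates how many WebRTC Bridge or Agent Service instances a given call volume would need. It assumes the 80% threshold means an instance should carry at most 40 of its 50 calls in steady state, and it takes the floor of two instances from the auto-scaling configuration in 11.3. The function names are hypothetical.

```python
import math

CALLS_PER_INSTANCE = 50  # WebRTC Bridge and Agent Service rows
SCALE_THRESHOLD = 0.80   # scale-out threshold from the table


def instances_needed(peak_calls: int, minimum: int = 2) -> int:
    """Instances required to keep utilization at or below the threshold."""
    usable = CALLS_PER_INSTANCE * SCALE_THRESHOLD  # 40 calls per instance
    return max(minimum, math.ceil(peak_calls / usable))


def over_capacity(active_calls: int, instances: int) -> bool:
    """Scale trigger from section 11.1: active calls > instances x 50."""
    return active_calls > instances * CALLS_PER_INSTANCE
```

Under these assumptions, 200 concurrent calls would need five instances of each service, and a 201st call would push the estimate to six.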
11.3 Auto-Scaling Configuration \
autoscaling:
  api_gateway:
    min_instances: 2
    max_instances: 10
    target_cpu_utilization: 70
    scale_up_cooldown: 60
    scale_down_cooldown: 300
  webrtc_bridge:
    min_instances: 2
    max_instances: 20
    target_metric: active_connections
    target_value: 40
    scale_up_cooldown: 30
    scale_down_cooldown: 300
  agent_service:
    min_instances: 2
    max_instances: 20
    target_metric: active_agents
    target_value: 40
    scale_up_cooldown: 30
    scale_down_cooldown: 300
  worker_service:
    min_instances: 1
    max_instances: 5
    target_metric: queue_depth
    target_value: 500
    scale_up_cooldown: 60
    scale_down_cooldown: 300
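To make the interaction between target values, instance limits, and cooldowns concrete, here is a minimal sketch of how a scaler might turn this configuration into a decision. It is not the behavior of any particular orchestrator: it assumes `target_value` is the desired metric value per instance, that `metric_total` is summed across all instances, and that cooldowns are in seconds. `ScalingPolicy` and `desired_instances` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_instances: int
    max_instances: int
    target_value: float       # desired metric value per instance
    scale_up_cooldown: int    # seconds
    scale_down_cooldown: int  # seconds


def desired_instances(policy, current, metric_total, seconds_since_last_change):
    """Return the instance count the policy would request right now."""
    wanted = math.ceil(metric_total / policy.target_value)
    wanted = max(policy.min_instances, min(policy.max_instances, wanted))
    if wanted > current and seconds_since_last_change < policy.scale_up_cooldown:
        return current  # still cooling down after the last change
    if wanted < current and seconds_since_last_change < policy.scale_down_cooldown:
        return current
    return wanted


# Values taken from the webrtc_bridge block above.
webrtc_bridge = ScalingPolicy(
    min_instances=2,
    max_instances=20,
    target_value=40,
    scale_up_cooldown=30,
    scale_down_cooldown=300,
)
```

The asymmetric cooldowns (30 seconds up, 300 seconds down) mean the sketch adds capacity quickly when calls arrive but holds it for five minutes before releasing it, which avoids dropping capacity during short lulls.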
12. Failure Modes and Recovery \
12.1 Failure Scenarios \
| Scenario | Detection | Impact | Recovery |
|---|---|---|---|
| GoToConnect outage | Health check failure | No new calls | Wait for recovery, alert |
| LiveKit outage | Health check failure | Active calls drop | Reconnect, apologize |
| Deepgram outage | API error rate | Can’t transcribe | Fallback to Whisper |
| Claude outage | API error rate | Can’t generate | Cached responses, transfer |
| Chatterbox crash | Health check failure | Can’t speak | Fallback to Resemble |
| Database failure | Connection errors | Full outage | Failover to replica |
| Redis failure | Connection errors | State loss | Rebuild from events |
| Single instance crash | Health check failure | Minimal | Auto-restart, rebalance |
12.2 Circuit Breaker Configuration \
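# shared/resilience/circuit_breaker.py
from anthropic import AnthropicError  # base exception of the anthropic SDK
from circuitbreaker import CircuitBreaker

# Assumption: DeepgramError and ChatterboxError are project-defined exception
# types; adjust this import path to wherever they actually live.
from shared.exceptions import ChatterboxError, DeepgramError

# Open after 5 consecutive failures; allow a trial call after 30 seconds.
deepgram_breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=30,
    expected_exception=DeepgramError,
)

# The LLM breaker trips sooner and waits longer before retrying.
claude_breaker = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=60,
    expected_exception=AnthropicError,
)

chatterbox_breaker = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=30,
    expected_exception=ChatterboxError,
)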
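For readers unfamiliar with the pattern, the following standard-library sketch shows what `failure_threshold` and `recovery_timeout` mean in practice. It is a simplified illustration written for this document, not the implementation used by the `circuitbreaker` package; `SimpleCircuitBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold, recovery_timeout,
                 expected_exception=Exception, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.expected_exception = expected_exception
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None while the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # timeout elapsed, allow a trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open, skipping call")
        try:
            result = func(*args, **kwargs)
        except self.expected_exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already known to be unhealthy, which is what lets the agent move on to a fallback provider without adding latency to the call.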
12.3 Graceful Degradation Hierarchy \
┌─────────────────────────────────────────────────────────────────────────────┐
│ GRACEFUL DEGRADATION HIERARCHY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ STT DEGRADATION: │
│ │
│ Primary: Deepgram Nova-2 (streaming) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 1: Deepgram Nova-1 (streaming) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 2: Whisper (self-hosted, higher latency) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Final: "I'm having trouble hearing you. Please hold for an agent." │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ LLM DEGRADATION: │
│ │
│ Primary: Claude Sonnet (streaming) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 1: Claude Haiku (streaming, less capable) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 2: Cached responses for common queries │
│ │ │
│ │ If no match │
│ ▼ │
│ Final: "I apologize, let me transfer you to someone who can help." │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ TTS DEGRADATION: │
│ │
│ Primary: Chatterbox Turbo (self-hosted) │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 1: Resemble AI API │
│ │ │
│ │ If unavailable │
│ ▼ │
│ Fallback 2: Pre-recorded audio clips │
│ │ │
│ │ If no suitable clip │
│ ▼ │
│ Final: Transfer to human (cannot communicate) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
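Each of the three hierarchies follows the same shape: try providers in order, then fall back to a fixed final action. A minimal sketch of that shape is shown below. The `run_with_fallbacks` helper and the convention that a provider returns `None` for "no match" are assumptions made for illustration, not part of the platform's code.

```python
from typing import Callable, Optional, Sequence, Tuple

Provider = Tuple[str, Callable[[str], Optional[str]]]


def run_with_fallbacks(request: str, providers: Sequence[Provider],
                       final_message: str) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, result)."""
    for name, provider in providers:
        try:
            result = provider(request)
        except Exception:
            continue  # provider unavailable, move down the hierarchy
        if result is not None:  # None models "no match", e.g. a cache miss
            return name, result
    return "final", final_message
```

For the LLM hierarchy, the provider list would hold the Sonnet call, the Haiku call, and the cached-response lookup, with the transfer apology as `final_message`. In a real service each provider call would typically be wrapped in its circuit breaker so that an open circuit skips straight to the next entry.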
13. Monitoring and Observability \
13.1 Metrics Architecture \
┌─────────────────────────────────────────────────────────────────────────────┐
│ METRICS ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ API Gateway │ │WebRTC Bridge │ │Agent Service │ │ Worker │ │
│ │ │ │ │ │ │ │ │ │
│ │ /metrics │ │ /metrics │ │ /metrics │ │ /metrics │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │ │
│ └─────────────────┴─────────────────┴─────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Prometheus │ │
│ │ │ │
│ │ - Scraping │ │
│ │ - Storage │ │
│ │ - Alerting │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Grafana │ │
│ │ │ │
│ │ - Dashboards│ │
│ │ - Alerts │ │
│ │ - Reports │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
13.2 Key Metrics \
| Category | Metric | Type | Labels |
|---|---|---|---|
| Calls | calls_total | Counter | tenant, direction, status |
| | calls_active | Gauge | tenant |
| | call_duration_seconds | Histogram | tenant, direction |
| Latency | stt_latency_seconds | Histogram | tenant |
| | llm_latency_seconds | Histogram | tenant, model |
| | tts_latency_seconds | Histogram | tenant, voice |
| | e2e_latency_seconds | Histogram | tenant |
| Errors | errors_total | Counter | service, type |
| | circuit_breaker_state | Gauge | service |
| Resources | http_requests_total | Counter | method, path, status |
| | http_request_duration_seconds | Histogram | method, path |
| | db_connections_active | Gauge | - |
| | redis_connections_active | Gauge | - |
13.3 Logging Strategy \
# Structured logging format
{
  "timestamp": "2026-01-16T10:30:00.123Z",
  "level": "INFO",
  "service": "agent-service",
  "instance": "agent-service-abc123",
  "trace_id": "trace-xyz789",
  "span_id": "span-def456",
  "tenant_id": "tenant_123",
  "call_id": "call_456",
  "message": "LLM response generated",
  "data": {
    "model": "claude-sonnet",
    "tokens": 150,
    "latency_ms": 342
  }
}
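One way to produce log lines in this shape is a custom formatter for Python's standard `logging` module. The sketch below is an assumption about how a service might do it, not the platform's actual logging setup: `JsonFormatter` is a hypothetical class, and the trace, tenant, and call identifiers are expected to arrive through the `extra=` argument of the logging call.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render log records in the structured shape shown above."""

    CONTEXT_FIELDS = ("trace_id", "span_id", "tenant_id", "call_id")

    def __init__(self, service: str, instance: str):
        super().__init__()
        self.service = service
        self.instance = instance

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "service": self.service,
            "instance": self.instance,
        }
        # Context fields are only emitted when the caller supplied them.
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        entry["message"] = record.getMessage()
        entry["data"] = getattr(record, "data", {})
        return json.dumps(entry)
```

A handler configured with `JsonFormatter("agent-service", "agent-service-abc123")` would then accept calls such as `logger.info("LLM response generated", extra={"tenant_id": "tenant_123", "data": {"latency_ms": 342}})`.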
13.4 Alerting Rules \
alerts:
  - name: HighErrorRate
    condition: rate(errors_total[5m]) > 0.01
    severity: warning
    action: Notify on-call
  - name: CallDropRate
    condition: sum(rate(calls_total{status="error"}[5m])) / sum(rate(calls_total[5m])) > 0.05
    severity: critical
    action: Page on-call
  - name: HighLatency
    condition: histogram_quantile(0.95, sum by (le) (rate(e2e_latency_seconds_bucket[5m]))) > 2.0
    severity: warning
    action: Notify on-call
  - name: ServiceDown
    condition: up == 0
    for: 1m
    severity: critical
    action: Page on-call
  - name: DatabaseConnectionsHigh
    condition: db_connections_active > 400
    severity: warning
    action: Notify on-call
  - name: GPUMemoryHigh
    condition: gpu_memory_used_bytes / gpu_memory_total_bytes > 0.9
    severity: warning
    action: Notify on-call
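The CallDropRate rule compares errored calls with all calls over a five-minute window. The arithmetic behind it can be sketched from two samples of the `calls_total` counter taken at the start and end of the window. The function names and the sample-based interface are hypothetical; in production this evaluation would be done by the monitoring stack rather than application code.

```python
def call_drop_ratio(errors_start: int, errors_end: int,
                    total_start: int, total_end: int) -> float:
    """Fraction of calls in the window that ended with status="error"."""
    total = total_end - total_start
    if total <= 0:
        return 0.0  # no calls in the window, nothing to alert on
    return (errors_end - errors_start) / total


def drop_rate_alert(ratio: float, threshold: float = 0.05) -> bool:
    """True when the drop ratio exceeds the 5% threshold from the rule."""
    return ratio > threshold
```

With 100 calls in the window, six errored calls would trigger the page while five would not, because the rule uses a strict greater-than comparison.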
14. Deployment Architecture \
14.1 Container Architecture \
# Base image for all services
FROM python:3.11-slim AS base
WORKDIR /app
# Install common dependencies
RUN apt-get update && apt-get install -y \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Copy shared libraries
COPY shared/ /app/shared/
# Service-specific stage
FROM base AS service
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ /app/app/
# Non-root user
RUN useradd -m appuser
USER appuser
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
14.2 Dokploy Configuration \
# dokploy.yaml
version: "1"
services:
  api-gateway:
    image: registry.aiconnected.io/api-gateway:${VERSION}
    replicas: 2
    resources:
      cpu: "0.5"
      memory: "512Mi"
    healthcheck:
      path: /health
      interval: 10s
    env:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
  webrtc-bridge:
    image: registry.aiconnected.io/webrtc-bridge:${VERSION}
    replicas: 2
    resources:
      cpu: "1"
      memory: "1Gi"
    ports:
      - "10000-10100:10000-10100/udp"
    healthcheck:
      path: /health
      interval: 10s
    env:
      - GOTOCONNECT_CLIENT_ID=${GOTOCONNECT_CLIENT_ID}
      - LIVEKIT_URL=${LIVEKIT_URL}
  agent-service:
    image: registry.aiconnected.io/agent-service:${VERSION}
    replicas: 2
    resources:
      cpu: "1"
      memory: "2Gi"
    healthcheck:
      path: /health
      interval: 10s
    env:
      - DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - CHATTERBOX_URL=${CHATTERBOX_URL}
  worker-service:
    image: registry.aiconnected.io/worker-service:${VERSION}
    replicas: 1
    resources:
      cpu: "0.5"
      memory: "512Mi"
    env:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
14.3 Environment Promotion \
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENVIRONMENT PROMOTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Development │─────▶│ Staging │─────▶│ Production │ │
│ │ │ │ │ │ │ │
│ │ • Local │ │ • DO Region 1│ │ • DO Region 1│ │
│ │ • Docker │ │ • Full stack │ │ • Full stack │ │
│ │ • Mock APIs │ │ • Test data │ │ • Live data │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ PR merge │ Manual approval │ │
│ │ Auto deploy │ Deploy │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ CI/CD PIPELINE │ │
│ │ │ │
│ │ 1. Run tests │ │
│ │ 2. Build images │ │
│ │ 3. Push to registry │ │
│ │ 4. Deploy to target environment │ │
│ │ 5. Run smoke tests │ │
│ │ 6. Notify team │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
15. Architecture Decision Records \
### ADR-001: Use GoToConnect for Telephony {#adr-001:-use-gotoconnect-for-telephony}
Status: Accepted
Context: We need a telephony provider for PSTN connectivity and call control.
Decision: Use GoToConnect because:
- Existing grandfathered unlimited plan at $17/user
- Full WebRTC API with call control
- No per-minute charges
Consequences:
- Locked into GoToConnect infrastructure
- Need to build custom WebRTC bridge
- Dependent on GoToConnect API stability
### ADR-002: Use LiveKit for Real-Time Audio {#adr-002:-use-livekit-for-real-time-audio}
Status: Accepted
Context: We need infrastructure for real-time audio routing between the phone bridge and AI agents.
Decision: Use LiveKit Cloud because:
- Purpose-built Agents SDK for voice AI
- Handles WebRTC complexity
- Scalable managed infrastructure
Consequences:
- Monthly LiveKit costs (~$0.01/min)
- Dependent on LiveKit availability
- Need to integrate with their SDK
### ADR-003: Self-Host TTS on RunPod {#adr-003:-self-host-tts-on-runpod}
Status: Accepted
Context: TTS is a significant per-minute cost at scale.
Decision: Self-host Chatterbox on RunPod RTX A5000 because:
- Zero per-minute cost after fixed infrastructure
- MIT license, full control
- Competitive quality with paralinguistics
Consequences:
- Operational overhead for GPU management
- Need fallback provider (Resemble)
- Slightly higher latency than Cartesia
### ADR-004: Use Redis for Call State {#adr-004:-use-redis-for-call-state}
Status: Accepted
Context: Call state needs to be accessible from any service instance with low latency.
Decision: Use Redis because:
- Sub-millisecond access
- Built-in pub/sub for events
- Ephemeral data doesn’t need durability
Consequences:
- State lost on Redis failure (acceptable for call state)
- Need to handle reconnection gracefully
- Memory limits on state size
### ADR-005: PostgreSQL for Persistent Data {#adr-005:-postgresql-for-persistent-data}
Status: Accepted
Context: We need a database for tenant configuration, call history, and billing data.
Decision: Use PostgreSQL because:
- Relational model fits our data
- Excellent JSON support for flexible schemas
- Managed offering available on DigitalOcean
Consequences:
- Need to manage migrations
- Horizontal scaling more complex than NoSQL
- Connection pooling required
## Appendix A: Glossary {#appendix-a:-glossary}
| Term | Definition |
|---|---|
| Agent | An AI configuration that handles calls for a specific purpose |
| Barge-in | When a caller interrupts the AI mid-speech |
| Bridge | Component connecting GoToConnect to LiveKit |
| Call | A single phone conversation |
| Circuit Breaker | Pattern to prevent cascading failures |
| Context Window | The LLM’s working memory for a conversation |
| ICE | Interactive Connectivity Establishment (WebRTC) |
| LiveKit | Real-time audio/video infrastructure |
| LLM | Large Language Model (Claude) |
| PBX | Private Branch Exchange (phone system) |
| PSTN | Public Switched Telephone Network |
| Room | A LiveKit virtual space for participants |
| SDP | Session Description Protocol (WebRTC) |
| Session | Runtime state of an active call |
| SIP | Session Initiation Protocol (VoIP) |
| STT | Speech-to-Text |
| Tenant | A customer business using the platform |
| TTS | Text-to-Speech |
| Turn | One speaker’s contribution to a conversation |
| VAD | Voice Activity Detection |
| WebRTC | Web Real-Time Communication |
## Appendix B: Document History {#appendix-b:-document-history}
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-16 | Claude | Initial document |
End of Document