
Master Project Task List

Document Purpose

This document serves as the authoritative overview for the Voice by aiConnected platform build. It provides Claude Code with complete context about the project’s goals, architecture, infrastructure decisions, and the sequence of work required to deliver a production-ready Voice AI contact center platform. Read this document in full before beginning any implementation work.

Project Overview

What We Are Building

Voice by aiConnected is a white-label Voice AI contact center platform that enables businesses to deploy autonomous AI agents capable of handling inbound and outbound phone calls. The platform integrates with existing phone infrastructure (GoToConnect), leverages real-time audio processing (LiveKit), and delivers hyper-realistic conversational AI through a streaming pipeline of Speech-to-Text, Large Language Model, and Text-to-Speech services.

Business Context

  • Target Market: Small to medium-sized businesses needing 24/7 phone coverage, lead response, appointment scheduling, and customer service automation
  • Pricing Model: Fixed credit buckets plus per-minute overages
  • Competitive Advantage: 50-75% lower cost than competitors (Vapi, Retell, Bland AI) through infrastructure ownership and optimized provider selection
  • Parent Company: Oxford Pierpont Corporation (business development and digital marketing)

Core Capabilities

  1. Inbound Call Handling — AI answers calls, converses naturally, resolves inquiries or transfers to humans
  2. Outbound Call Automation — AI initiates calls for lead follow-up, appointment reminders, reactivation campaigns
  3. Human Handoff — Seamless transfer to live agents via blind transfer, warm transfer, or conference
  4. Tool Calling — AI executes business logic (CRM updates, calendar booking, data lookup) via webhooks/n8n
  5. Knowledge Base Integration — AI responses informed by client-specific business context (already built)
  6. Multi-Tenant Architecture — Single platform serves multiple clients with isolated configurations
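Tool calling and multi-tenancy meet where the agent turns a model tool call into a webhook request. A minimal sketch, assuming each tenant maps tool names to its own n8n webhook URL; `TenantToolRegistry`, the payload fields, and the example URL are hypothetical, not a defined contract.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TenantToolRegistry:
    """Hypothetical per-tenant mapping of tool name -> n8n webhook URL."""
    webhooks: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, tenant_id: str, tool_name: str, url: str) -> None:
        self.webhooks.setdefault(tenant_id, {})[tool_name] = url

    def build_request(self, tenant_id: str, call_id: str,
                      tool_name: str, arguments: dict) -> tuple[str, bytes]:
        """Return (url, JSON body) for one tool call, scoped to one tenant.

        Raises KeyError if this tenant has not configured the tool, so a
        tenant can never reach another tenant's webhooks.
        """
        url = self.webhooks[tenant_id][tool_name]
        body = json.dumps({
            "tenant_id": tenant_id,
            "call_id": call_id,
            "tool": tool_name,
            "arguments": arguments,
        }).encode("utf-8")
        return url, body


registry = TenantToolRegistry()
registry.register("acme", "book_appointment", "https://n8n.example.com/webhook/acme-book")
url, body = registry.build_request("acme", "call-123", "book_appointment", {"date": "2026-02-01"})
```

The request can then be POSTed with any HTTP client, ideally with a short timeout so a slow workflow does not stall the conversation.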

Architecture Summary

Voice Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│   PSTN ←→ GoToConnect PBX ←→ WebRTC Bridge ←→ LiveKit Room                 │
│                                    (aiortc)         │                       │
│                                                     ├── Deepgram STT        │
│                                                     │   (streaming)         │
│                                                     │                       │
│                                                     ├── Claude LLM          │
│                                                     │   (streaming)         │
│                                                     │                       │
│                                                     ├── Chatterbox TTS      │
│                                                     │   (streaming)         │
│                                                     │                       │
│                                                     └── Tool Webhooks       │
│                                                         (async, n8n)        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
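The pipeline only meets its latency budget if each stage starts before the previous one finishes. One common way to overlap the LLM and TTS stages is to buffer streamed tokens only until a sentence boundary, then hand each complete sentence to TTS immediately. A minimal sketch, assuming the LLM stream is exposed as an async iterator of text deltas; `chunk_sentences` and the boundary rule are illustrative, not the LiveKit Agents API.

```python
import asyncio
import re
from typing import AsyncIterator

# Sentence end: ".", "!" or "?" followed by whitespace. Deliberately simple:
# "$9.99" does not split, but abbreviations such as "Dr. " will.
_BOUNDARY = re.compile(r"([.!?])\s+")


async def chunk_sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield complete sentences from a stream of LLM text deltas."""
    buffer = ""
    async for delta in tokens:
        buffer += delta
        # Emit every complete sentence currently in the buffer.
        while (match := _BOUNDARY.search(buffer)):
            sentence = buffer[:match.end(1)].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush the trailing fragment when the stream ends.
    if buffer.strip():
        yield buffer.strip()


async def fake_llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for delta in ["Thanks for calling", ". How can ", "I help you today?", " We open at 9."]:
        await asyncio.sleep(0)
        yield delta


async def collect() -> list[str]:
    return [s async for s in chunk_sentences(fake_llm_stream())]


sentences = asyncio.run(collect())
```

The first sentence reaches TTS while the model is still generating the rest, which is what makes the time-to-first-byte target achievable.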

Latency Budget (Target: <1000ms mouth-to-ear)

Stage                      Target   Notes
Audio capture → STT        ~100ms   Streaming VAD
STT processing             ~300ms   Deepgram interim results
LLM time-to-first-token    ~350ms   Claude streaming
TTS time-to-first-byte     ~150ms   Chatterbox streaming
Return audio path          ~70ms    LiveKit → GoTo → PSTN
Total                      ~970ms   Achievable with optimization
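Because the budget is additive, it can be encoded directly and used to flag which stage overran its share on a given turn. A sketch; the stage keys and the `over_budget` helper are hypothetical names.

```python
# Per-stage targets from the latency budget table, in milliseconds.
LATENCY_BUDGET_MS = {
    "audio_capture_to_stt": 100,
    "stt_processing": 300,
    "llm_first_token": 350,
    "tts_first_byte": 150,
    "return_audio_path": 70,
}
TARGET_MS = 1000


def over_budget(measured_ms: dict[str, float]) -> dict[str, float]:
    """Return stages whose measured latency exceeds their budget, with the overrun in ms."""
    return {
        stage: measured_ms[stage] - budget
        for stage, budget in LATENCY_BUDGET_MS.items()
        if measured_ms.get(stage, 0.0) > budget
    }


total_budget = sum(LATENCY_BUDGET_MS.values())
assert total_budget < TARGET_MS  # 970 ms leaves 30 ms of headroom
```

Logging `over_budget` per turn shows which stage to optimize first when the total drifts past the target.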

Infrastructure Topology

┌─────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL SERVICES                                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   GoToConnect          LiveKit Cloud        RunPod                          │
│   (Telephony)          (Real-time Audio)    (Chatterbox GPU)                │
│        │                     │                   │                          │
│   Deepgram             Anthropic API        n8n Cloud/Self-hosted           │
│   (STT)                (Claude LLM)         (Webhooks)                      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ DIGITALOCEAN / DOKPLOY                                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│   │   API       │  │   WebRTC    │  │   Agent     │  │   Worker    │       │
│   │   Gateway   │  │   Bridge    │  │   Service   │  │   Service   │       │
│   └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │
│                                                                             │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                        │
│   │ PostgreSQL  │  │   Redis     │  │ DO Spaces   │                        │
│   │ (Database)  │  │   (Cache)   │  │ (Storage)   │                        │
│   └─────────────┘  └─────────────┘  └─────────────┘                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Infrastructure Decisions (Finalized)

These decisions have been made and should not be revisited during implementation:
Component          Decision                     Rationale
Telephony          GoToConnect                  Grandfathered $17/user unlimited plan; full call control API
Real-time Audio    LiveKit Cloud                Industry standard; Agents SDK for voice AI
STT                Deepgram Nova-2              Low latency streaming; phone audio optimized
LLM                Anthropic Claude (Sonnet)    Best reasoning; streaming support
TTS                Chatterbox-Turbo on RunPod   Zero per-minute cost; MIT license; paralinguistics
GPU                RunPod RTX A5000             Best value ($0.27/hr); 24GB VRAM sufficient
Platform Hosting   DigitalOcean + Dokploy       Existing infrastructure; container orchestration
Database           PostgreSQL                   Relational; proven; existing expertise
Cache/State        Redis                        Session state; call state machine
Object Storage     DO Spaces                    Voice samples; call recordings
Webhooks           n8n                          Tool calling; existing expertise
Knowledge Base     Existing system              Already built and integrated
Admin Dashboard    Existing system              Add service config page; UI polish is last priority

Cost Structure

Per-Minute Breakdown (at 50k min/month scale)

Component                    Cost
LiveKit Agent Session        $0.0100/min
GoToConnect Telephony        $0.0000/min (unlimited)
Deepgram STT                 $0.0043/min
Claude Sonnet LLM            $0.0080/min (estimated)
Chatterbox TTS (amortized)   $0.0040/min
Total                        ~$0.026/min
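The Chatterbox line is the only amortized figure: a fixed GPU cost spread over the monthly call volume. A sketch of that arithmetic, assuming an always-on A5000 at $0.27/hr and an average month of 730 hours (8760 / 12); the Claude figure remains the table's estimate.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = Decimal("730")    # assumption: average month, 8760 h / 12
MONTHLY_MINUTES = Decimal("50000")  # scale used by the per-minute table

# Fixed GPU cost amortized across all call minutes in the month.
runpod_monthly = Decimal("0.27") * HOURS_PER_MONTH      # $197.10
chatterbox_per_min = runpod_monthly / MONTHLY_MINUTES    # $0.003942 (table rounds to $0.0040)

per_minute = {
    "livekit": Decimal("0.0100"),
    "gotoconnect": Decimal("0.0000"),
    "deepgram": Decimal("0.0043"),
    "claude": Decimal("0.0080"),  # estimate carried over from the table
    "chatterbox": chatterbox_per_min,
}
total = sum(per_minute.values())
print(total.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP))  # prints 0.0262
```

Because the GPU cost is fixed, the amortized TTS cost per minute falls as volume grows; at lower volume it rises above the table's figure.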

Monthly Infrastructure

Service          Est. Cost
GoToConnect      $17/user
LiveKit Cloud    ~$50-100
RunPod A5000     ~$197
Deepgram         ~$50-100
Anthropic API    ~$100-300
DigitalOcean     ~$50-100
Total            ~$500-800/mo starting
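The starting total is the sum of the per-service ranges plus GoToConnect seats. A sketch that reproduces it; the function name is hypothetical and the ranges are copied from the table.

```python
# (low, high) monthly estimates in USD from the table; GoToConnect is billed per user.
FIXED_RANGES = {
    "livekit_cloud": (50, 100),
    "runpod_a5000": (197, 197),
    "deepgram": (50, 100),
    "anthropic_api": (100, 300),
    "digitalocean": (50, 100),
}
GOTOCONNECT_PER_USER = 17


def monthly_estimate(gotoconnect_users: int) -> tuple[int, int]:
    """Return the (low, high) starting monthly infrastructure estimate in USD."""
    low = sum(lo for lo, _ in FIXED_RANGES.values())
    high = sum(hi for _, hi in FIXED_RANGES.values())
    seats = GOTOCONNECT_PER_USER * gotoconnect_users
    return low + seats, high + seats
```

With one GoToConnect user this gives $464 to $814. These are starting figures: at the 50k-minute scale of the per-minute table, usage-priced services grow well past these ranges (LiveKit alone at $0.0100/min is $500/month).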

Build Phases

Phase 1: Foundation (Documents 1-6)

Goal: Development environment ready, architecture fully documented
#    Document                                 Purpose
1    System Architecture Overview             Complete technical blueprint
2    GoToConnect Integration Specification    Telephony API details
3    Voice Pipeline Architecture              STT→LLM→TTS streaming design
4    WebRTC Bridge Technical Design           GoTo↔LiveKit audio bridging
5    Development Environment Setup Guide      Local dev stack
6    Codebase Structure & Conventions         Repo organization
Deliverables:
  • Architecture diagrams finalized
  • All API contracts documented
  • Local development environment functional
  • Repository structure established

Phase 2: Core Infrastructure (Documents 7-11)

Goal: Database, state management, and service skeleton operational
#    Document                                 Purpose
7    Database Schema Design                   PostgreSQL tables, migrations
8    State Management Specification           Call state machine, Redis structures
9    Message Queue & Event Bus Design         Async communication patterns
10   Error Handling & Recovery Patterns       Resilience patterns
11   Core Services Implementation Guide       Service implementations
Deliverables:
  • Database migrations created and tested
  • Redis state management implemented
  • Event bus operational
  • Core services running (API gateway, bridge, agent, worker)
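Document 8 will define the actual call state machine. As a hedged illustration of the pattern, here is a transition table with guarded updates; the state names and allowed transitions are assumptions. In production the current state would live in Redis rather than in a Python object, and concurrent writers would need an atomic compare-and-set (a Lua script or WATCH/MULTI).

```python
from enum import Enum


class CallState(str, Enum):
    RINGING = "ringing"
    AI_ACTIVE = "ai_active"
    TRANSFERRING = "transferring"
    HUMAN_ACTIVE = "human_active"
    ENDED = "ended"


# Hypothetical allowed transitions; anything not listed is rejected.
TRANSITIONS: dict[CallState, set[CallState]] = {
    CallState.RINGING: {CallState.AI_ACTIVE, CallState.ENDED},
    CallState.AI_ACTIVE: {CallState.TRANSFERRING, CallState.ENDED},
    # A failed transfer returns the caller to the AI.
    CallState.TRANSFERRING: {CallState.HUMAN_ACTIVE, CallState.AI_ACTIVE, CallState.ENDED},
    CallState.HUMAN_ACTIVE: {CallState.ENDED},
    CallState.ENDED: set(),
}


class InvalidTransition(Exception):
    pass


class CallSession:
    def __init__(self, call_id: str) -> None:
        self.call_id = call_id
        self.state = CallState.RINGING
        self.history = [CallState.RINGING]

    def transition(self, new_state: CallState) -> None:
        """Move to new_state, or raise if the table does not allow it."""
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.call_id}: {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
```

Rejecting illegal transitions up front (for example, resuming AI audio after the call has ended) keeps the bridge, agent, and worker services from acting on stale state.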

Phase 3: Provider Integrations (Documents 12-16)

Goal: All external services connected and functional
#    Document                                 Purpose
12   LiveKit Integration Specification        Agents SDK, room management
13   Deepgram STT Integration Guide           Streaming transcription
14   Anthropic Claude Integration Guide       LLM streaming, tools
15   Chatterbox TTS Integration Guide         RunPod deployment, synthesis
16   Tool Calling & Webhook Specification     n8n integration
Deliverables:
  • LiveKit Agents pipeline functional
  • Deepgram streaming STT working
  • Claude streaming responses working
  • Chatterbox deployed on RunPod
  • Tool calling via webhooks operational
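Deepgram's live endpoint (`wss://api.deepgram.com/v1/listen`) takes its configuration as query parameters and authenticates with an `Authorization: Token <key>` header. A sketch of building that connection, assuming the bridge delivers mono 16-bit PCM at 16 kHz; the sample rate and endpointing value are assumptions to tune. If the pipeline uses the LiveKit Agents Deepgram plugin, the plugin builds this connection itself.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_url(sample_rate: int = 16000, endpointing_ms: int = 300) -> str:
    """Build a Deepgram live-transcription URL for mono 16-bit PCM audio."""
    params = {
        "model": "nova-2",
        "language": "en-US",            # English only for MVP
        "encoding": "linear16",         # raw 16-bit little-endian PCM
        "sample_rate": sample_rate,
        "channels": 1,
        "interim_results": "true",      # partial transcripts while the caller speaks
        "endpointing": endpointing_ms,  # trailing silence (ms) before a result is finalized
        "smart_format": "true",
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def deepgram_auth_headers(api_key: str) -> dict[str, str]:
    return {"Authorization": f"Token {api_key}"}
```

A lower `endpointing` value finalizes turns sooner but risks cutting off callers who pause mid-sentence.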

Phase 4: Call Features (Documents 17-20)

Goal: Complete call handling capabilities
#    Document                                 Purpose
17   Inbound Call Flow Specification          Answer, converse, resolve
18   Outbound Call Flow Specification         Dial, converse, resolve
19   Human Handoff Specification              Transfer patterns
20   Knowledge Base Integration Guide         Context injection
Deliverables:
  • Inbound calls answered by AI
  • Outbound calls initiated by AI
  • Transfers to human agents working
  • Knowledge base context in AI responses
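Context injection usually means placing retrieved knowledge-base snippets in the system prompt. A minimal sketch, assuming the existing Knowledge Base returns ranked text snippets (its API endpoint is still an open question); `build_system_prompt`, the wording, and the character budget are hypothetical.

```python
def build_system_prompt(agent_persona: str, business_name: str,
                        kb_snippets: list[str], max_chars: int = 4000) -> str:
    """Assemble a system prompt with retrieved knowledge-base context.

    Snippets are added in ranked order until the character budget is spent,
    keeping the prompt short to protect time-to-first-token.
    """
    parts = [
        agent_persona.strip(),
        f"You are answering phone calls for {business_name}.",
        "Use only the business context below for facts about the business. "
        "If the answer is not there, say you will have someone follow up.",
    ]
    context, used = [], 0
    for snippet in kb_snippets:
        snippet = snippet.strip()
        if used + len(snippet) > max_chars:
            break
        context.append(f"- {snippet}")
        used += len(snippet)
    if context:
        parts.append("Business context:\n" + "\n".join(context))
    return "\n\n".join(parts)
```

The resulting string would be passed as the `system` parameter of the Anthropic Messages API, with the conversation so far in `messages`.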

Phase 5: Platform (Documents 21-23)

Goal: Multi-tenant API complete
#    Document                                 Purpose
21   Tenant Configuration API Specification   Agent/voice/number management
22   Usage Metering & Billing Integration     Credit tracking, overages
23   API Specification (OpenAPI)              Public API documentation
Deliverables:
  • Tenant CRUD operations
  • Usage tracking per tenant
  • Billing hooks implemented
  • API documented and versioned
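The pricing model (fixed credit buckets plus per-minute overages) reduces to a small calculation per tenant per billing period. A sketch, assuming each call is billed in whole minutes rounded up; that rounding rule and the example prices are hypothetical.

```python
from decimal import Decimal


def monthly_invoice(call_seconds: list[int], bucket_minutes: int,
                    bucket_price: Decimal, overage_per_minute: Decimal) -> dict:
    """Bill a tenant under a credit-bucket-plus-overage plan."""
    # Hypothetical rule: each call rounds up to the next whole minute.
    billed_minutes = sum((s + 59) // 60 for s in call_seconds)
    overage_minutes = max(0, billed_minutes - bucket_minutes)
    return {
        "billed_minutes": billed_minutes,
        "overage_minutes": overage_minutes,
        "total": bucket_price + overage_per_minute * overage_minutes,
    }


invoice = monthly_invoice([30, 61, 600], bucket_minutes=10,
                          bucket_price=Decimal("49.00"), overage_per_minute=Decimal("0.12"))
```

Using `Decimal` rather than floats keeps invoice totals exact to the cent.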

Phase 6: Operations (Documents 24-27)

Goal: Production deployment with observability
#    Document                                 Purpose
24   Infrastructure Architecture              DO/Dokploy/RunPod topology
25   Deployment Runbook                       Step-by-step production deploy
26   CI/CD Pipeline Specification             Automated build/deploy
27   Monitoring & Observability Guide         Metrics, logs, alerts
Deliverables:
  • Production environment provisioned
  • Deployment automated
  • Monitoring dashboards operational
  • Alerting configured

Phase 7: Hardening (Documents 28-30)

Goal: Secure, tested, resilient system
#    Document                                 Purpose
28   Security Architecture Document           Auth, encryption, trust boundaries
29   Testing Strategy Document                Test coverage plan
30   Failure Mode Handling Guide              Failovers, fallbacks
Deliverables:
  • Security audit passed
  • Test suite comprehensive
  • Failure scenarios handled gracefully

Skills (Provider API Reference)

In addition to the 30 build documents, the following skills provide API reference material:
/mnt/skills/user/voice-platform/
├── SKILL.md                     # Overview, when to use each sub-skill
├── gotoconnect/
│   ├── SKILL.md                 # Auth, endpoints, code patterns
│   └── postman_collection.json  # Full API collection
├── livekit/
│   └── SKILL.md                 # Agents SDK, room management
├── deepgram/
│   └── SKILL.md                 # Streaming STT configuration
├── anthropic/
│   └── SKILL.md                 # Streaming, tool calling
├── chatterbox/
│   └── SKILL.md                 # RunPod, API wrapper, voice cloning
└── n8n/
    └── SKILL.md                 # Webhook patterns

Success Criteria

MVP Definition

The minimum viable product is achieved when:
  1. Inbound Call: A call to a GoToConnect number is answered by the AI, which holds a natural conversation and either resolves the inquiry or transfers to a human
  2. Outbound Call: The platform initiates a call via API trigger, AI converses with the recipient
  3. Human Handoff: AI successfully transfers a call (blind or warm) to a live agent
  4. Tool Execution: AI executes at least one tool call (e.g., CRM update, calendar check) during a conversation
  5. Multi-Tenant: Two separate clients can operate independent AI agents simultaneously
  6. Latency: Mouth-to-ear response time under 1.5 seconds for 90% of interactions
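The latency criterion is a 90th-percentile check: at least 90% of measured turns must come in under 1.5 seconds. A sketch using the nearest-rank percentile, assuming latency is recorded per conversational turn in milliseconds; the helper names are hypothetical.

```python
import math


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_mvp_latency(turn_latencies_ms: list[float]) -> bool:
    """MVP criterion: 90% of interactions respond in under 1.5 s."""
    return percentile_nearest_rank(turn_latencies_ms, 90) < 1500


samples = [800, 900, 950, 1000, 1100, 1200, 1250, 1300, 1400, 2000]
slow = [800, 900, 950, 1000, 1100, 1200, 1250, 1300, 1600, 2000]
```

Here `samples` passes (p90 is 1400 ms) while `slow` fails, because only 80% of its turns finish under 1.5 s.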

Quality Gates

Metric                          Target
Call completion rate            >95%
Transfer success rate           >99%
STT accuracy                    >90%
Average latency                 <1000ms
Concurrent calls (per tenant)   10+
Uptime                          99.5%
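The gates can be checked mechanically against whatever metrics the monitoring stack reports. A sketch, assuming metric names and percentage units as shown, reading the uptime target as a minimum, and treating STT accuracy as some percentage score (for example 100 minus word error rate); all of these are assumptions.

```python
import operator

# (comparison, threshold) per quality gate; units follow the table.
QUALITY_GATES = {
    "call_completion_rate_pct": (operator.gt, 95.0),
    "transfer_success_rate_pct": (operator.gt, 99.0),
    "stt_accuracy_pct": (operator.gt, 90.0),
    "average_latency_ms": (operator.lt, 1000.0),
    "concurrent_calls_per_tenant": (operator.ge, 10),
    "uptime_pct": (operator.ge, 99.5),
}


def failed_gates(metrics: dict[str, float]) -> list[str]:
    """Return the names of gates that are missing from metrics or not met."""
    return [
        name for name, (compare, threshold) in QUALITY_GATES.items()
        if name not in metrics or not compare(metrics[name], threshold)
    ]
```

Treating a missing metric as a failure keeps an unmeasured gate from silently passing.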

Constraints & Requirements

Technical Constraints

  • Python preferred for WebRTC bridge (aiortc ecosystem)
  • LiveKit Agents SDK is Python-native
  • PostgreSQL for relational data (existing expertise)
  • Redis for ephemeral state (call sessions)
  • Docker/Dokploy for container orchestration

Business Constraints

  • Timeline: MVP within 6-10 weeks
  • Budget: Minimize upfront costs; scale with usage
  • Team: Development via Claude Code with human oversight
  • Existing Systems: Must integrate with existing Knowledge Base and Admin Dashboard

Non-Goals (Out of Scope for MVP)

  • Custom voice cloning per client (use pre-set voices initially)
  • Multi-language support (English only for MVP)
  • SMS/chat channels (voice only)
  • Compliance certifications (SOC 2, HIPAA) — plan for later
  • Mobile app
  • Analytics dashboard beyond basic usage metrics

Document Dependency Map

┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: FOUNDATION                                                         │
│                                                                             │
│   [1] System Architecture ─────┬─────────────────────────────────────────┐ │
│            │                   │                                         │ │
│            ▼                   ▼                                         │ │
│   [2] GoToConnect    [3] Voice Pipeline    [5] Dev Environment          │ │
│            │                   │                     │                   │ │
│            └─────────┬─────────┘                     │                   │ │
│                      ▼                               │                   │ │
│            [4] WebRTC Bridge ◄───────────────────────┘                   │ │
│                      │                                                   │ │
│                      ▼                                                   │ │
│            [6] Codebase Structure                                        │ │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: CORE INFRASTRUCTURE                                                │
│                                                                             │
│   [7] Database Schema ◄─── [8] State Management ◄─── [9] Event Bus         │
│            │                        │                       │              │
│            └────────────────────────┼───────────────────────┘              │
│                                     ▼                                      │
│                          [10] Error Handling                               │
│                                     │                                      │
│                                     ▼                                      │
│                      [11] Core Services Implementation                     │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: PROVIDER INTEGRATIONS                                              │
│                                                                             │
│   [12] LiveKit ──┬── [13] Deepgram ──┬── [14] Claude ──┬── [15] Chatterbox │
│                  │                   │                 │                   │
│                  └───────────────────┴─────────────────┘                   │
│                                      │                                     │
│                                      ▼                                     │
│                            [16] Tool Calling                               │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 4: CALL FEATURES                                                      │
│                                                                             │
│   [17] Inbound ────┬──── [18] Outbound                                     │
│         │          │            │                                          │
│         │          ▼            │                                          │
│         │    [19] Human Handoff │                                          │
│         │          │            │                                          │
│         └──────────┼────────────┘                                          │
│                    ▼                                                       │
│         [20] Knowledge Base Integration                                    │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 5: PLATFORM                                                           │
│                                                                             │
│   [21] Tenant Config API ──── [22] Usage Metering ──── [23] OpenAPI Spec   │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 6: OPERATIONS                                                         │
│                                                                             │
│   [24] Infrastructure ── [25] Deployment ── [26] CI/CD ── [27] Monitoring  │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 7: HARDENING                                                          │
│                                                                             │
│   [28] Security ──── [29] Testing ──── [30] Failure Modes                  │
└─────────────────────────────────────────────────────────────────────────────┘
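A dependency map like this is a directed acyclic graph, so a valid build order is a topological sort. A sketch encoding the Phase 1 edges as drawn, using the standard library's `graphlib` (Python 3.9+); the other phases can be added the same way.

```python
from graphlib import TopologicalSorter

# Phase 1 edges as drawn: document -> documents it depends on.
PHASE_1_DEPENDENCIES = {
    2: {1},        # GoToConnect Integration
    3: {1},        # Voice Pipeline Architecture
    5: {1},        # Development Environment
    4: {2, 3, 5},  # WebRTC Bridge
    6: {4},        # Codebase Structure
}

# static_order() raises graphlib.CycleError if a circular dependency is introduced.
order = list(TopologicalSorter(PHASE_1_DEPENDENCIES).static_order())
```

Documents 2, 3, and 5 become ready together once Document 1 is done and may be written in any order among themselves; Document 4 waits for all three.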

How to Use This Document

For Claude Code

  1. Read this document completely before starting any implementation
  2. Follow the phase order — each phase builds on the previous
  3. Consult skills for API-specific implementation details
  4. Reference individual documents for detailed specifications
  5. Check deliverables at the end of each phase before proceeding

For Human Oversight

  1. Review completed phases before approving progression
  2. Test deliverables against success criteria
  3. Provide credentials and access as needed per phase
  4. Clarify requirements when documents reference “TBD” items

Open Questions (To Be Resolved)

Question                                            Owner   Status
GoToConnect OAuth credentials for dev environment   Human   Pending
LiveKit Cloud project setup                         Human   Pending
Deepgram API key                                    Human   Pending
Anthropic API key                                   Human   Pending
RunPod account and A5000 provisioning               Human   Pending
DigitalOcean/Dokploy access                         Human   Pending
n8n instance URL and credentials                    Human   Pending
Knowledge Base API endpoint                         Human   Pending
Existing Admin Dashboard repo access                Human   Pending

Version History

Version   Date         Author   Changes
1.0       2026-01-16   Claude   Initial document

Next Steps

  1. Human reviews and approves this Master Project Task List
  2. Human provides access credentials for open questions
  3. Claude Code proceeds to Document #1: System Architecture Overview
  4. Build proceeds phase by phase with human checkpoints

This document is the single source of truth for the Voice by aiConnected project. All implementation decisions should align with the specifications herein.

Pre-Build Checklist

✅ Infrastructure (Covered)

  • GoToConnect (telephony)
  • LiveKit Cloud (real-time audio)
  • RunPod A5000 (Chatterbox TTS)
  • Deepgram (STT)
  • Anthropic Claude (LLM)
  • DigitalOcean/Dokploy (platform)

⚠️ Technical (Needs Planning)

Item              Status     Notes
WebRTC Bridge     To build   Python/aiortc service connecting GoTo ↔ LiveKit
Database          Needed     PostgreSQL for tenants, configs, logs
Redis             Needed     Call state machine, session cache
Object Storage    Needed     Voice samples, call recordings
Knowledge Base    Needed     How clients upload business context for their agents
n8n / Webhooks    Needed     Tool calling (CRM, calendar, etc.)
Monitoring        Needed     Grafana/Prometheus or Datadog

⚠️ Business Logic (Needs Planning)

Item               Question to Answer
Billing/Metering   How do you charge clients? Per minute? Per seat? Flat rate?
Usage Tracking     How do you track minutes per tenant for billing?
Admin Dashboard    What can clients configure themselves?
Onboarding Flow    How do clients set up their first agent?
Voice Management   How do clients provide/record their brand voice?
Human Handoff      How do live agents get notified and take over?
Call Recording     Store recordings? How long? Client access?
Rate Limits        Max concurrent calls per client tier?
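Whatever tier limits are chosen, enforcement is a per-tenant counter checked before a call is accepted. A single-process sketch; the tier names and limits are hypothetical (10 echoes the "10+" concurrency gate). With several agent workers, the counter would move to Redis (INCR on accept, DECR on hang-up).

```python
from collections import defaultdict

# Hypothetical tier limits.
TIER_LIMITS = {"starter": 10, "growth": 25}


class ConcurrencyLimiter:
    """Per-tenant cap on simultaneous calls (single process, single event loop)."""

    def __init__(self) -> None:
        self._active: dict[str, int] = defaultdict(int)

    def try_acquire(self, tenant_id: str, tier: str) -> bool:
        # No await between check and increment, so this is atomic within one event loop.
        if self._active[tenant_id] >= TIER_LIMITS[tier]:
            return False  # caller could be queued or routed to voicemail
        self._active[tenant_id] += 1
        return True

    def release(self, tenant_id: str) -> None:
        self._active[tenant_id] = max(0, self._active[tenant_id] - 1)

    def active(self, tenant_id: str) -> int:
        return self._active[tenant_id]
```

Counting per tenant rather than globally means one busy client cannot consume another client's capacity.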

⚠️ Compliance/Legal (Critical)

Item                              Why It Matters
AI Disclosure                     Some states (CA, WA, etc.) require disclosure that the caller is speaking to an AI
TCPA Compliance                   Outbound calling rules, consent requirements
Call Recording Consent            Two-party consent states
Data Retention Policy             How long do you keep call data?
Privacy Policy                    Required for handling caller PII
Terms of Service                  Liability, acceptable use
DPA (Data Processing Agreement)   For B2B clients
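One concrete TCPA rule the outbound dialer must enforce is the federal calling-hours window for telephone solicitations: no calls before 8 a.m. or after 9 p.m. in the called party's local time. A sketch of that single check; consent tracking and stricter state rules are separate, and how the callee's time zone is determined (CRM address, area code) is an open design choice.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Federal window for telephone solicitations, in the called party's local time.
EARLIEST = time(8, 0)
LATEST = time(21, 0)


def within_calling_window(when: datetime, callee_tz: str) -> bool:
    """True if a call placed at `when` falls between 8 a.m. and 9 p.m. for the callee."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    local_time = when.astimezone(ZoneInfo(callee_tz)).time()
    # 9:00 p.m. itself is treated as outside the window (conservative).
    return EARLIEST <= local_time < LATEST


ok_to_dial = within_calling_window(datetime.now(timezone.utc), "America/Chicago")
```

Checking against the callee's zone, not the server's, matters because a campaign launched at 9 a.m. Eastern reaches Pacific numbers at 6 a.m.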

⚠️ Failure Modes (Needs Planning)

Scenario            Fallback Plan
LLM times out       Graceful “one moment please” + retry?
TTS fails           Pre-recorded fallback audio?
STT fails           Ask caller to repeat?
RunPod goes down    Failover to Resemble API?
Call volume spike   Queue management? Auto-scale?
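The LLM-timeout row can be sketched as a bounded retry: speak a filler line, retry once, then apologize and signal escalation. A minimal sketch; the timeout, retry count, wording, and the `generate`/`speak` callables are hypothetical. In a streaming pipeline the timeout would apply to the first token rather than the full reply.

```python
import asyncio
from typing import Awaitable, Callable, Optional

FILLER = "One moment please."
APOLOGY = "Sorry, I'm having trouble right now. Let me connect you with someone who can help."


async def respond_with_fallback(generate: Callable[[], Awaitable[str]],
                                speak: Callable[[str], Awaitable[None]],
                                timeout_s: float = 2.0,
                                retries: int = 1) -> Optional[str]:
    """Return the LLM reply, or None after all attempts time out (caller should hand off)."""
    for attempt in range(retries + 1):
        try:
            # wait_for cancels the pending LLM call when the timeout expires.
            return await asyncio.wait_for(generate(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt < retries:
                await speak(FILLER)  # keep the caller engaged while retrying
    await speak(APOLOGY)
    return None


async def _demo() -> tuple[Optional[str], list[str]]:
    spoken: list[str] = []
    attempts = 0

    async def speak(text: str) -> None:
        spoken.append(text)  # stand-in for TTS playback

    async def flaky_llm() -> str:
        nonlocal attempts
        attempts += 1
        if attempts == 1:
            await asyncio.sleep(1)  # first attempt stalls past the timeout
        return "We open at 9 a.m."

    reply = await respond_with_fallback(flaky_llm, speak, timeout_s=0.05)
    return reply, spoken


reply, spoken = asyncio.run(_demo())
```

Returning `None` rather than raising lets the call flow route straight into the human handoff path.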
Last modified on April 20, 2026