Hyperthyme Technical Architecture Document (TAD)
Version: 1.0
Author: Oxford Pierpont
Created: January 2026
Status: Draft
Part of the Neurigraph Product Family
What’s Included:
| Section | Content |
|---|---|
| 1. Document Overview | Purpose, scope, audience, definitions |
| 2. System Purpose & Scope | Problem statement, solution, design philosophy, boundaries |
| 3. Architecture Overview | High-level diagrams, component summary, data flows |
| 4. Component Specifications | API Gateway, Middleware, Logger, Retriever, Injector, KG Manager, Defining Memory Detector |
| 5. Data Models & Schema | Complete PostgreSQL schema, Recall File structure, Python dataclasses |
| 6. APIs & Interfaces | REST API spec, MCP server implementation, SDK examples |
| 7. Retrieval Pipeline | 5-stage cascade with code, performance optimization, caching |
| 8. Storage Management | Hot/Warm/Cold tiers, state transitions, file layout, storage estimates |
| 9. Security & Privacy | Auth, encryption, data isolation, audit logging, deletion |
| 10. Performance Requirements | Latency/throughput targets, availability, resource budgets |
| 11. Deployment Architecture | Infrastructure diagrams, Docker, Kubernetes configs |
| 12. Integration Patterns | Direct API, LangChain, MCP, webhooks |
| 13. Error Handling & Recovery | Error categories, retry logic, circuit breakers, data recovery |
| 14. Monitoring & Observability | Prometheus metrics, structured logging, tracing, alerting |
| 15. Future Considerations | Roadmap, migration, scalability path |
Table of Contents
- Document Overview
- System Purpose & Scope
- Architecture Overview
- Component Specifications
- Data Models & Schema
- APIs & Interfaces
- Retrieval Pipeline
- Storage Management
- Security & Privacy
- Performance Requirements
- Deployment Architecture
- Integration Patterns
- Error Handling & Recovery
- Monitoring & Observability
- Future Considerations
1. Document Overview
1.1 Purpose
This Technical Architecture Document (TAD) defines the complete system design for Hyperthyme, a persistent memory layer for AI systems. It provides the technical foundation required for implementation, serving as the authoritative reference for all development decisions.
1.2 Scope
This document covers:
- System architecture and component design
- Data models and storage strategies
- API specifications and integration patterns
- Performance, security, and operational requirements
This document does not cover:
- Business requirements (see PRD)
- User interface design
- Marketing or go-to-market strategy
- The broader Neurigraph ecosystem (Cognigraph, etc.)
1.3 Audience
- Software engineers implementing the system
- DevOps engineers deploying and operating the system
- Technical architects reviewing the design
- Integration partners building on the platform
1.4 Definitions
| Term | Definition |
|---|---|
| Recall File | A folder containing a complete conversation segment (~50K tokens) with summary, keywords, transcript, and artifacts |
| Knowledge Graph | A graph database storing relationships between topics, projects, and Recall Files |
| RAG | Retrieval-Augmented Generation - using vector similarity to find relevant content |
| Defining Memory | A flagged moment representing a decision, milestone, or significant event |
| Hot/Warm/Cold | Storage tiers based on access recency and retrieval speed requirements |
| Middleware | The Hyperthyme layer that sits between applications and AI models |
2. System Purpose & Scope
2.1 Problem Statement
Current AI systems (LLMs) operate statelessly: they have no persistent memory across sessions, so users must re-explain context repeatedly and valuable conversation history is lost.
2.2 Solution
Hyperthyme provides a persistent memory layer that:
- Archives complete conversations verbatim
- Organizes content via hierarchical knowledge graph
- Indexes content for fast semantic and keyword retrieval
- Retrieves relevant context and injects it into AI prompts
- Preserves significant moments as Defining Memories
2.3 Design Philosophy
Principle 1: Summaries are indexes, not storage
- We never discard original content in favor of summaries
- Summaries enable fast search; transcripts provide full context
- Knowledge Graph narrows search space before vector search
- This maintains performance at scale (millions of Recall Files)
- Storage is cheap; token context is expensive
- Store complete archives; inject only what’s relevant
- Works with any LLM (Claude, GPT, Gemini, open-source)
- Memory persists even when switching models
2.4 System Boundaries
In Scope:
- Conversation logging and archival
- Knowledge graph management
- Vector and keyword indexing
- Memory retrieval and context injection
- Defining Memory detection and indexing
- Storage lifecycle management
- API for integration
Out of Scope:
- The AI model itself (Hyperthyme wraps around it)
- User interface (provided by integrating applications)
- Real-time collaboration features
- Training or fine-tuning AI models
3. Architecture Overview
3.1 High-Level Architecture
3.2 Component Summary
| Component | Responsibility | Technology Options |
|---|---|---|
| API Gateway | Request routing, auth, rate limiting | Kong, Nginx, custom FastAPI |
| Middleware Orchestrator | Coordinates logging, retrieval, injection | Python (FastAPI) |
| Logger | Captures and stores conversations | Python async workers |
| Retriever | Finds relevant memories | Python with graph/vector clients |
| Injector | Builds context-enhanced prompts | Python |
| Knowledge Graph | Topic/project relationships | Neo4j, PostgreSQL with ltree |
| RAG (Vector Store) | Semantic similarity search | pgvector, Pinecone, Qdrant |
| Recall Files | Complete conversation archives | S3, local filesystem |
| Defining Memories | Significant moment index | PostgreSQL |
3.3 Data Flow
Write Path (Logging):
4. Component Specifications
4.1 API Gateway
Purpose: Single entry point for all client requests.
Responsibilities:
- Request authentication and authorization
- Rate limiting per user/tenant
- Request routing to appropriate handlers
- SSL/TLS termination
- Request/response logging
- API versioning
| Endpoint | Method | Purpose |
|---|---|---|
| /v1/chat | POST | Send message with memory-augmented context |
| /v1/search | POST | Search memory without sending to AI |
| /v1/recall-files | GET | List user’s Recall Files |
| /v1/recall-files/{id} | GET | Get specific Recall File content |
| /v1/defining-memories | GET | List user’s Defining Memories |
| /v1/graph/nodes | GET | Query Knowledge Graph nodes |
| /v1/graph/nodes | POST | Create new node |
| /v1/health | GET | System health check |
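The rate-limiting responsibility above can be illustrated with a per-user token bucket. This is a sketch only: the class name, rate, and burst values are assumptions, not part of the spec, and the clock is injectable for testing.

```python
import time

class TokenBucket:
    """Per-user token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate_per_sec=10, burst=20, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # user_id -> (available tokens, last refill time)

    def allow(self, user_id):
        tokens, last = self.buckets.get(user_id, (self.burst, self.clock()))
        now = self.clock()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[user_id] = (tokens, now)
            return False  # gateway would respond 429 Too Many Requests
        self.buckets[user_id] = (tokens - 1, now)
        return True
```

A real gateway (Kong, Nginx) would provide this out of the box; the sketch just shows the per-user accounting.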
4.2 Middleware Orchestrator
Purpose: Coordinates all memory operations for a request.
Responsibilities:
- Session management (tracking active conversations)
- Routing to Logger, Retriever, Injector
- Token budget management
- Error handling and fallbacks
- Metrics collection
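The routing and fallback responsibilities above can be sketched as a small coordinator class. The component interfaces (`search`, `build`, `append`) are assumptions for illustration, not the real Hyperthyme APIs.

```python
class MiddlewareOrchestrator:
    """Coordinates Logger, Retriever, and Injector for one request (sketch)."""

    def __init__(self, logger, retriever, injector, token_budget=4000):
        self.logger = logger
        self.retriever = retriever
        self.injector = injector
        self.token_budget = token_budget  # managed per request

    def handle(self, user_id: str, message: str) -> str:
        # Fallback: a retrieval failure degrades to a memory-free prompt
        # rather than failing the whole request.
        try:
            memories = self.retriever.search(user_id, message)
        except Exception:
            memories = []
        prompt = self.injector.build(message, memories, self.token_budget)
        # Log the user message so the active Recall File stays complete.
        self.logger.append(user_id, "user", message)
        return prompt
```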
4.3 Logger Component
Purpose: Captures, parses, and stores all conversation content.
Responsibilities:
- Append messages to active Recall File transcript
- Track token count for threshold detection
- Extract entities for Knowledge Graph updates
- Detect Defining Memory triggers
- Manage Recall File finalization
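The append/threshold/finalize flow above can be sketched as follows. The ~4-characters-per-token estimate and the class shape are assumptions; a real tokenizer and async finalization would be used in practice.

```python
TOKEN_THRESHOLD = 50_000  # ~50K-token segments, per the Recall File definition

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); an assumption for the sketch.
    return max(1, len(text) // 4)

class ConversationLogger:
    """Appends messages and finalizes a Recall File at the token threshold (sketch)."""

    def __init__(self, threshold: int = TOKEN_THRESHOLD):
        self.threshold = threshold
        self.transcript = []   # (role, text) pairs for the active Recall File
        self.tokens = 0
        self.finalized = []    # completed segments awaiting summarization

    def append(self, role: str, text: str) -> None:
        self.transcript.append((role, text))
        self.tokens += estimate_tokens(text)
        if self.tokens >= self.threshold:
            self._finalize()

    def _finalize(self) -> None:
        # The real system would write summary.md, keywords.txt, and
        # transcript.md, then update the Knowledge Graph.
        self.finalized.append(list(self.transcript))
        self.transcript.clear()
        self.tokens = 0
```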
4.4 Retriever Component
Purpose: Finds relevant memories for a given query.
Responsibilities:
- Execute multi-stage retrieval cascade
- Rank and filter results
- Load transcript content as needed
- Manage retrieval caching
4.5 Injector Component
Purpose: Builds context-enhanced prompts for AI models.
Responsibilities:
- Format memories for prompt injection
- Manage token budget
- Structure context for different models
- Handle prompt templates
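The token-budget responsibility above can be sketched as a greedy fill over pre-ranked memories. The prompt template and the character-based token estimate are assumptions for illustration.

```python
def build_prompt(message, memories, token_budget,
                 estimate=lambda t: max(1, len(t) // 4)):
    """Inject retrieved memories into a prompt without exceeding the budget.

    Sketch only: memories are expected pre-ranked, most relevant first."""
    used = estimate(message)
    kept = []
    for memory in memories:
        cost = estimate(memory)
        if used + cost > token_budget:
            break  # budget exhausted; drop the remaining lower-ranked memories
        kept.append(memory)
        used += cost
    context = "\n".join(f"[memory] {m}" for m in kept)
    return f"{context}\n\n[user] {message}" if kept else f"[user] {message}"
```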
4.6 Knowledge Graph Manager
Purpose: Maintains the hierarchical structure of user knowledge.
Responsibilities:
- Create and update nodes (projects, topics, concepts)
- Manage edges (relationships between nodes)
- Link Recall Files to nodes
- Support graph traversal queries
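The node, edge, link, and traversal responsibilities above can be sketched with an in-memory adjacency list. Production would use Neo4j or PostgreSQL with ltree (section 3.2); the shapes here are assumptions.

```python
from collections import deque

class KnowledgeGraph:
    """Minimal in-memory Knowledge Graph manager (illustrative sketch)."""

    def __init__(self):
        self.nodes = {}          # node_id -> {"type": ..., "name": ...}
        self.edges = {}          # node_id -> set of neighbor node_ids
        self.recall_links = {}   # node_id -> set of linked Recall File ids

    def add_node(self, node_id, node_type, name):
        self.nodes[node_id] = {"type": node_type, "name": name}
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected relationship between two nodes.
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def link_recall_file(self, node_id, recall_id):
        self.recall_links.setdefault(node_id, set()).add(recall_id)

    def related(self, start, depth=1):
        """Breadth-first traversal up to `depth` hops (backs related_to queries)."""
        seen, frontier = {start}, deque([(start, 0)])
        while frontier:
            node, d = frontier.popleft()
            if d == depth:
                continue
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        seen.discard(start)
        return seen
```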
4.7 Defining Memory Detector
Purpose: Identifies and indexes significant moments in conversations.
Detection Triggers:
5. Data Models & Schema
5.1 PostgreSQL Schema
5.2 Recall File Structure
Each Recall File is stored as a folder:
5.3 Object Models
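Section 5 calls for Python dataclasses; a hedged sketch of what a RecallFile model might look like, with fields inferred from the Recall File definition (1.4) and the file layout (8.4). The PostgreSQL schema in 5.1 remains authoritative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RecallFile:
    """One conversation-segment archive (illustrative field names)."""
    id: str
    user_id: str
    status: str = "active"               # active | finalized | archived
    summary: str = ""                    # summary.md (~500-1000 tokens)
    keywords: list = field(default_factory=list)   # keywords.txt (50-100 entries)
    transcript: str = ""                 # transcript.md (~50K tokens)
    artifacts: list = field(default_factory=list)  # code, files, attachments
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```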
6. APIs & Interfaces
6.1 REST API Specification
Base URL: https://api.hyperthyme.ai/v1
6.1.1 Chat Endpoint
POST /chat
Send a message with memory-augmented context.
Request:
6.1.2 Search Endpoint
POST /search
Search memories without sending to AI.
Request:
6.1.3 Recall Files Endpoints
GET /recall-files
List user’s Recall Files.
Query Parameters:
- status: Filter by status (active, finalized, archived)
- topic: Filter by topic (fuzzy match)
- limit: Max results (default 20, max 100)
- offset: Pagination offset
- sort: Sort field (created_at, updated_at, last_accessed_at)
- order: Sort order (asc, desc)
- include: Comma-separated list (summary, keywords, transcript, artifacts)
6.1.4 Defining Memories Endpoints
GET /defining-memories
List user’s Defining Memories.
Query Parameters:
- type: Filter by type (decision, milestone, event, turning_point)
- since: Filter by date (ISO 8601)
- limit: Max results
- offset: Pagination offset
6.1.5 Knowledge Graph Endpoints
GET /graph/nodes
Query Knowledge Graph nodes.
Query Parameters:
- type: Filter by node type
- name: Search by name (fuzzy)
- related_to: Find nodes related to a specific node ID
- depth: Traversal depth for related queries
6.2 MCP (Model Context Protocol) Interface
Hyperthyme exposes tools for MCP-compatible AI systems.
Tools Exposed:
6.3 SDK Interface
7. Retrieval Pipeline
7.1 Pipeline Overview
The retrieval pipeline executes a multi-stage cascade designed to find relevant memories efficiently while minimizing computational cost.
7.2 Stage Details
Stage 1: Defining Memory Check
Stage 2: Knowledge Graph Navigation
Stage 3: Keyword Filtering
Stage 4: Semantic Search
Stage 5: Content Loading
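The five stages above can be sketched end to end. The data shapes are assumptions: `graph` maps topic name to Recall File ids, `keyword_index` maps Recall File id to keywords, and `semantic_rank` / `load_transcript` stand in for the vector store and file storage.

```python
def retrieve(query_terms, defining_memories, graph, keyword_index,
             semantic_rank, load_transcript, top_k=3):
    """Illustrative sketch of the 5-stage retrieval cascade."""
    query = set(query_terms)
    results = []

    # Stage 1: Defining Memory check -- flagged moments are matched first.
    results += [m for m in defining_memories if query & set(m["keywords"])]

    # Stage 2: Knowledge Graph navigation -- narrow the search space to
    # Recall Files linked to topics the query mentions.
    candidates = set()
    for topic, recall_ids in graph.items():
        if topic in query:
            candidates |= set(recall_ids)

    # Stage 3: Keyword filtering -- drop candidates sharing no keywords.
    candidates = [rid for rid in candidates
                  if query & set(keyword_index.get(rid, ()))]

    # Stage 4: Semantic search -- rank the survivors by vector similarity.
    ranked = semantic_rank(query_terms, candidates)[:top_k]

    # Stage 5: Content loading -- fetch full transcripts only for final hits.
    results += [load_transcript(rid) for rid in ranked]
    return results
```

Each stage shrinks the candidate set before the next, more expensive stage runs, which is what keeps vector search scoped rather than global.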
7.3 Performance Optimization
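A common optimization here is a small in-memory cache with time-based expiry in front of the retrieval cascade. A minimal sketch (the TTL value and injectable clock are assumptions):

```python
import time

class TTLCache:
    """Small TTL cache for retrieval results (illustrative sketch)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self.clock() >= expires:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```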
Caching Strategy:
8. Storage Management
8.1 Storage Tiers
8.2 State Transitions
8.3 File Storage Layout
8.4 Storage Estimates
| Component | Size per Recall File | Notes |
|---|---|---|
| summary.md | ~2-5 KB | 500-1000 tokens |
| keywords.txt | ~0.5-1 KB | 50-100 keywords |
| transcript.md | ~150-200 KB | 50K tokens |
| artifacts (avg) | ~50-500 KB | Varies widely |
| Total (uncompressed) | ~200-700 KB | |
| Total (compressed) | ~50-200 KB | ~3:1 compression |
| Recall Files | Uncompressed | Compressed |
|---|---|---|
| 1,000 | 200-700 MB | 50-200 MB |
| 10,000 | 2-7 GB | 0.5-2 GB |
| 100,000 | 20-70 GB | 5-20 GB |
| 1,000,000 | 200-700 GB | 50-200 GB |
9. Security & Privacy
9.1 Authentication & Authorization
Authentication:
- API key authentication for server-to-server
- OAuth 2.0 / OIDC for user-facing applications
- JWT tokens for session management
Authorization:
- All data is scoped by user_id
- No cross-user data access
- Role-based access for admin functions
9.2 Data Encryption
At Rest:
- All stored files encrypted with AES-256-GCM
- Per-user encryption keys derived from master key
- Keys stored in separate key management system
In Transit:
- TLS 1.3 required for all connections
- Certificate pinning for mobile SDKs
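The per-user key derivation above can be sketched with the standard library. This is a sketch only: it uses a single HMAC-SHA256 step as the KDF and a hypothetical context label; production would use a vetted KDF (e.g. HKDF) with keys held in the key management system.

```python
import hashlib
import hmac

def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user AES-256 key from the master key (illustrative sketch).

    The "hyperthyme:user:" context label is an assumption; binding the
    user_id into the derivation keeps tenants' keys independent."""
    return hmac.new(master_key,
                    f"hyperthyme:user:{user_id}".encode(),
                    hashlib.sha256).digest()  # 32 bytes = AES-256 key size
```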
9.3 Data Isolation
Tenant Isolation:
- Logical isolation via user_id filtering on all queries
- Consider physical isolation (separate databases) for enterprise tier
9.4 Audit Logging
9.5 Data Retention & Deletion
Retention Policy:
- Default: Indefinite (user controls)
- Configurable per-user retention limits
- GDPR/CCPA compliant deletion on request
10. Performance Requirements
10.1 Latency Targets
| Operation | Target (P50) | Target (P99) | Notes |
|---|---|---|---|
| Chat (with memory) | 500ms | 2000ms | Includes retrieval + AI response |
| Memory search | 50ms | 200ms | Hot/warm storage |
| Memory search (cold) | 500ms | 1000ms | Includes decompression |
| Recall File creation | 100ms | 500ms | Async summary generation |
| Knowledge Graph query | 20ms | 100ms | Graph traversal |
| Vector search | 30ms | 100ms | Scoped search |
10.2 Throughput Targets
| Metric | Target | Notes |
|---|---|---|
| Requests per second (per node) | 100 RPS | Mix of read/write |
| Concurrent users (per node) | 1,000 | Active sessions |
| Messages logged per second | 500 | Across all users |
| Search queries per second | 200 | Per node |
10.3 Availability Targets
| Metric | Target |
|---|---|
| Uptime | 99.9% (8.76 hours/year downtime) |
| RTO (Recovery Time Objective) | < 1 hour |
| RPO (Recovery Point Objective) | < 5 minutes |
10.4 Scalability Requirements
Horizontal Scaling:
- API Gateway: Stateless, scale by adding instances
- Core Engine: Stateless workers behind load balancer
- PostgreSQL: Read replicas for query scaling
- Vector DB: Sharding by user_id range
Vertical Scaling:
- Start with reasonable instance sizes
- Scale up before scaling out for simplicity
- Document scaling thresholds
10.5 Resource Budgets
Per Request:
11. Deployment Architecture
11.1 Infrastructure Overview
11.2 Container Configuration
Dockerfile:
11.3 Kubernetes Configuration
Deployment:
11.4 Environment Configuration
12. Integration Patterns
12.1 Direct API Integration
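Direct integration can be sketched with the standard library against the chat endpoint from section 6.1. The request body field names (`message`, `session_id`) are assumptions; the authoritative request schema is in 6.1.1.

```python
import json
import urllib.request

API_BASE = "https://api.hyperthyme.ai/v1"  # base URL from section 6.1

def build_chat_request(api_key: str, message: str, session_id: str):
    """Build a POST /v1/chat request (illustrative sketch)."""
    body = json.dumps({"message": message, "session_id": session_id}).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (needs network access and a valid API key):
# with urllib.request.urlopen(build_chat_request(KEY, "Hi", "sess-1")) as resp:
#     reply = json.load(resp)
```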
12.2 LangChain Integration
12.3 MCP Server Implementation
12.4 Webhook Integration
13. Error Handling & Recovery
13.1 Error Categories
13.2 Error Response Format
13.3 Retry Logic
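The standard pattern here is exponential backoff with jitter; a minimal sketch (attempt counts and delays are assumptions, and `sleep` is injectable for testing):

```python
import random
import time

def with_retry(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a callable with exponential backoff and jitter (sketch)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            delay = base_delay * (2 ** (attempt - 1))
            # Jitter spreads retries out to avoid thundering-herd retries.
            sleep(delay + random.uniform(0, delay / 2))
```

In practice only transient errors (timeouts, 5xx) should be retried; validation errors should fail immediately.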
13.4 Circuit Breaker
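A circuit breaker trips open after repeated failures and allows a trial call after a cooldown. A minimal sketch (threshold and timeout values are assumptions, and the clock is injectable for testing):

```python
import time

class CircuitBreaker:
    """Closed -> open after N failures; half-open after a cooldown (sketch)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open")  # fail fast, spare the dependency
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip open
            raise
        self.failures = 0  # success closes the breaker
        return result
```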
13.5 Data Recovery
14. Monitoring & Observability
14.1 Metrics
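In production these would typically be prometheus_client Counter and Histogram objects scraped by Prometheus; this stdlib sketch only illustrates the request-count and latency-quantile bookkeeping worth exporting (metric names are assumptions):

```python
import math
from collections import defaultdict

class Metrics:
    """Minimal in-process metrics registry (illustrative sketch)."""

    def __init__(self):
        self.counters = defaultdict(int)       # (name, labels) -> count
        self.latencies = defaultdict(list)     # name -> observed seconds

    def inc(self, name, labels=(), amount=1):
        self.counters[(name, tuple(labels))] += amount

    def observe(self, name, seconds):
        self.latencies[name].append(seconds)

    def quantile(self, name, q):
        # Nearest-rank quantile over all samples (e.g. q=0.99 for P99).
        samples = sorted(self.latencies[name])
        if not samples:
            return None
        return samples[math.ceil(q * len(samples)) - 1]
```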
14.2 Logging
14.3 Tracing
14.4 Alerting
14.5 Health Checks
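A health endpoint typically aggregates per-dependency probes; a minimal sketch (the report shape and dependency names are assumptions):

```python
def health_check(checks):
    """Aggregate dependency probes into a health report (sketch).

    `checks` maps a dependency name to a zero-argument callable that
    raises on failure (e.g. a database ping)."""
    report = {}
    for name, probe in checks.items():
        try:
            probe()
            report[name] = "ok"
        except Exception as exc:
            report[name] = f"failed: {exc}"
    report["status"] = ("healthy"
                        if all(v == "ok" for v in list(report.values()))
                        else "degraded")
    return report
```

The /v1/health endpoint from section 4.1 could return this report, with "degraded" mapped to a non-200 status for load-balancer checks.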
15. Future Considerations
15.1 Planned Enhancements
Short-term (3-6 months):
- Multi-language support for summaries and keywords
- Custom embedding model fine-tuning
- Batch import/export functionality
- Advanced search filters (date ranges, sentiment, etc.)
Medium-term:
- Team/organization shared memories
- Memory sharing with privacy controls
- Real-time collaboration features
- Mobile SDK
Long-term:
- Federated memory across multiple Hyperthyme instances
- On-device memory (edge deployment)
- Integration with Cognigraph training system
- Memory compression and archival strategies
15.2 Migration Considerations
Database Schema Evolution:
- Use Alembic for schema migrations
- Maintain backward compatibility for 2 major versions
- Document breaking changes
API Versioning:
- URL-based versioning (/v1/, /v2/)
- Support previous version for 12 months after deprecation
- Provide migration guides
15.3 Scalability Roadmap
| Users | Architecture |
|---|---|
| 1-1,000 | Single instance, single PostgreSQL |
| 1,000-10,000 | Multiple API instances, PostgreSQL read replicas |
| 10,000-100,000 | Sharded PostgreSQL, dedicated vector DB |
| 100,000+ | Regional deployment, global load balancing |
Appendix A: Glossary
| Term | Definition |
|---|---|
| Context Window | The maximum amount of text an AI model can process at once |
| Defining Memory | A flagged significant moment (decision, milestone, event) |
| Embedding | A numerical vector representation of text for similarity search |
| Knowledge Graph | A graph database storing relationships between entities |
| RAG | Retrieval-Augmented Generation - enhancing AI with retrieved context |
| Recall File | A complete conversation archive with summary, keywords, and transcript |
Appendix B: Reference Links
Document Control:
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | January 2026 | Oxford Pierpont | Initial release |
Hyperthyme is part of the Neurigraph product family.
© 2026 Oxford Pierpont. All rights reserved.