Neurigraph Hyperthyme Artificial Memory Framework
Junior Developer Guide
By Oxford Pierpont
What Is Hyperthyme?
Hyperthyme is a memory system for AI. Right now, when you chat with an AI like ChatGPT or Claude, it forgets everything once the conversation ends. Hyperthyme solves this by creating a persistent memory layer that stores, organizes, and retrieves past conversations so the AI can “remember” what you’ve discussed—even months or years later.
Think of it like this: the AI is the brain, and Hyperthyme is the long-term memory that the brain can access whenever it needs to recall something.
The name comes from “hyperthymesia”—a rare condition where people remember every single day of their lives in perfect detail. We’re building that capability for AI.
The Problem We’re Solving
Context Windows
Every AI model has a “context window”—the amount of text it can see at once. For example:
- GPT-4 can see about 128,000 tokens (~100,000 words)
- Claude can see about 200,000 tokens (~150,000 words)
This seems like a lot, but it fills up fast. And once the conversation ends, it’s gone. The AI has no way to access previous conversations.
Current Solutions Are Incomplete
Some companies offer basic memory features, but they typically:
- Only store summaries (losing important details)
- Compress information (losing exact wording, code, files)
- Don’t scale to thousands of conversations
- Don’t organize information intelligently
Hyperthyme takes a different approach: store everything, organize it well, and retrieve only what’s needed.
How Hyperthyme Works: The Big Picture
```
┌─────────────────────────────────────────────────────────┐
│                          USER                           │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  HYPERTHYME MIDDLEWARE                  │
│                                                         │
│   Logger            saves every conversation            │
│   Retriever         finds past memories                 │
│   Context Injector  adds relevant memories to the prompt│
│                                                         │
│   STORAGE LAYER:                                        │
│   Knowledge Graph ←→ RAG Database ←→ Recall Files       │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        AI MODEL                         │
│                (Claude, GPT, Gemini, etc.)              │
└─────────────────────────────────────────────────────────┘
```
The middleware sits between the user and the AI. It:
- Logs every conversation as it happens
- Retrieves relevant past information when needed
- Injects that information into the AI’s context so it can “remember”
Core Components
1. Recall Files
The foundation of the system. A Recall File is a folder that contains a snapshot of a conversation segment.
When is a Recall File created? Every ~50,000 tokens (roughly 35,000-40,000 words), the system creates a new Recall File. This threshold is chosen because:
- It’s small enough to fit in most AI context windows when retrieved
- It’s large enough that you don’t create thousands of tiny files
- It represents roughly 1-3 substantial conversations
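The threshold logic above can be sketched in a few lines. This is an illustration, not the production logger: the 0.75 words-per-token ratio is a rough heuristic, and `estimate_tokens` and `segment_conversation` are names invented for this example.

```python
# Sketch: cut a stream of messages into Recall File segments once a
# segment crosses an approximate token threshold.

RECALL_FILE_TOKEN_LIMIT = 50_000

def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming ~0.75 words per token."""
    return int(len(text.split()) / 0.75)

def segment_conversation(messages):
    """Group messages into segments of at most ~50,000 tokens each."""
    segments, current, current_tokens = [], [], 0
    for msg in messages:
        tokens = estimate_tokens(msg)
        # Start a new segment when this message would push us past the limit
        if current and current_tokens + tokens > RECALL_FILE_TOKEN_LIMIT:
            segments.append(current)
            current, current_tokens = [], 0
        current.append(msg)
        current_tokens += tokens
    if current:
        segments.append(current)
    return segments
```

A real implementation would use the model's actual tokenizer rather than a word-count heuristic.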
What’s inside a Recall File?
```
recall-files/
└── ai-brain-memory-architecture-2025-01-11/
    ├── summary.md      # AI-generated summary of the conversation
    ├── keywords.txt    # Extracted keywords for fast searching
    ├── transcript.md   # Complete verbatim conversation log
    └── artifacts.zip   # Any files created during this conversation
```
File Breakdown:
| File | Purpose | Size |
|---|---|---|
| summary.md | Quick overview for search matching | Small (~500-1,000 words) |
| keywords.txt | Exact-match search terms | Tiny (~50-100 terms) |
| transcript.md | Full source of truth | Large (~50,000 tokens) |
| artifacts.zip | Code, documents, and images created | Variable |
Naming Convention:
```
{topic-key-subject}-{YYYY-MM-DD}/
```

Examples:

```
funnelchat-stripe-integration-2025-01-03/
ai-brain-memory-architecture-2025-01-11/
marketing-strategy-q1-planning-2025-01-08/
```
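The naming convention is simple enough to generate mechanically. Here is a minimal sketch; `generate_folder_name` is a hypothetical helper, not part of any existing Hyperthyme API.

```python
import re
from datetime import date

def generate_folder_name(topic: str, on: date) -> str:
    """Build a {topic-key-subject}-{YYYY-MM-DD}/ folder name.

    Hypothetical helper: lowercases the topic and collapses runs of
    non-alphanumeric characters into single hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{slug}-{on.isoformat()}/"
```

For example, `generate_folder_name("FunnelChat Stripe Integration", date(2025, 1, 3))` produces `funnelchat-stripe-integration-2025-01-03/`.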
2. Knowledge Graph
The Knowledge Graph is a database that stores relationships between topics. Think of it as a map of everything the user has discussed.
What it stores:
- Nodes: Topics, projects, concepts, people, entities
- Edges: Relationships between nodes
Example Structure:
```
[AI Brain] ──contains──► [Memory System]
    │                         │
    │                         ├──relates to──► [Hyperthyme]
    │                         │
    │                         └──discussed in──► [recall-file-2025-01-11]
    │
    ├──contains──► [Coherence Layer]
    │
    └──contains──► [Storage System]
```
Why it matters:
When the user asks about “the memory system,” the Knowledge Graph instantly knows:
- It’s part of the AI Brain project
- It relates to Hyperthyme
- The relevant Recall Files are from January 2025
This narrows the search space from potentially millions of files to just a handful.
Technology options:
- Neo4j (most popular graph database)
- Amazon Neptune
- PostgreSQL with graph extensions
- Lightweight: NetworkX (Python library) for prototyping
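Before reaching for any of these, the core idea can be shown with plain dictionaries: nodes, typed edges, and a traversal that returns the Recall Files linked to a topic. This is a toy stand-in, not NetworkX or Neo4j, and all names in it are illustrative.

```python
# Toy in-memory knowledge graph: just enough to show how a node lookup
# narrows the search to a handful of Recall Files.

class TinyGraph:
    def __init__(self):
        self.edges = {}  # node name -> list of (relationship, target)

    def add_edge(self, source, relationship, target):
        self.edges.setdefault(source, []).append((relationship, target))

    def recall_files_for(self, node):
        """Follow 'discussed_in' edges to find linked Recall Files."""
        return [target for rel, target in self.edges.get(node, [])
                if rel == "discussed_in"]

g = TinyGraph()
g.add_edge("AI Brain", "contains", "Memory System")
g.add_edge("Memory System", "relates_to", "Hyperthyme")
g.add_edge("Memory System", "discussed_in",
           "ai-brain-memory-architecture-2025-01-11/")
```

Asking the graph about "Memory System" immediately yields the one relevant Recall File, without scanning anything else.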
3. RAG Database (Vector Store)
RAG stands for “Retrieval-Augmented Generation.” It’s a technique where you:
- Convert text into numerical vectors (embeddings)
- Store those vectors in a specialized database
- Search by finding vectors that are “similar” to a query
How it works in Hyperthyme:
The summaries from Recall Files are embedded and stored in a vector database. When the user asks a question, the question is also embedded, and we find summaries that are semantically similar.
```
User query: "What was that thing about payment processing?"
        │
        ▼
[Generate Embedding]
        │
        ▼
[Search Vector DB]
        │
        ▼
Matches: "funnelchat-stripe-integration-2025-01-03"
         "payment-gateway-comparison-2024-12-15"
```
Why not just use keyword search?
Keyword search finds exact matches. RAG finds semantic matches.
- Keyword search for “payment processing” won’t find a document that only mentions “Stripe integration”
- RAG understands that “payment processing” and “Stripe integration” are related concepts
Technology options:
- Pinecone (managed, easy to start)
- Weaviate (open source)
- Chroma (lightweight, good for prototyping)
- pgvector (PostgreSQL extension)
- Qdrant (open source, performant)
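The ranking step behind all of these can be shown with cosine similarity. One caveat: real embedding models place related phrases like "payment processing" and "Stripe integration" near each other; the word-count vectors below only demonstrate the cosine-ranking mechanic and still need shared vocabulary. All function names here are invented for the sketch.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, summaries):
    """Return summaries ordered by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in summaries]
    return [s for score, s in sorted(scored, reverse=True)]
```

Swapping `embed` for a call to a real embedding model turns this into genuine semantic search; the ranking logic stays the same.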
4. Defining Memories
Not all memories are equal. Some conversations are routine; others are significant.
Defining Memories are flagged moments that represent:
- Decisions (“I’ve decided to focus on the AI marketplace”)
- Milestones (“We launched the beta today”)
- Life events (“I’m starting a new job”)
- Turning points (“This changes everything”)
How they’re detected:
The system looks for trigger patterns in conversations:
```python
DECISION_TRIGGERS = [
    "I've decided",
    "We're going with",
    "I'm committing to",
    "Let's do",
    "Final decision:",
]

MILESTONE_TRIGGERS = [
    "We launched",
    "It's done",
    "I finished",
    "Completed",
    "Shipped",
]

EVENT_TRIGGERS = [
    "I'm starting",
    "I got the job",
    "We closed the deal",
    "I'm getting married",
]
```
Defining Memory Structure:
```json
{
  "id": "dm-2025-01-11-001",
  "type": "decision",
  "date": "2025-01-11",
  "summary": "Committed to building Hyperthyme memory system",
  "context": "After discovering Mem0 raised $24M for a similar approach",
  "source_recall_file": "ai-brain-memory-architecture-2025-01-11/",
  "related_nodes": ["AI Brain", "Hyperthyme", "Memory System"],
  "tags": ["product", "commitment", "startup"]
}
```
Why separate Defining Memories?
When someone asks “When did I decide to start this project?” they don’t want to search through 10,000 conversations. They want to hit the Defining Memory index and get an instant answer.
Defining Memories are always “warm”—always in memory, always fast to access.
The Search Cascade
When the user asks something that requires memory, the system searches in layers:
```
QUERY: "What did we decide about the payment system?"
    │
    ▼
STEP 1: Knowledge Graph Navigation
    "payment system" → relates to → "funnelChat" project
    Result: scope the search to funnelChat-related Recall Files
    │
    ▼
STEP 2: Keyword Search
    Search keywords.txt files for: "payment", "stripe", "billing"
    Result: 3 Recall Files match
    │
    ▼
STEP 3: RAG Search on Summaries
    Embed the query, find similar summaries
    Result: ranked list of the most relevant Recall Files
    │
    ▼
STEP 4: Load Transcript
    Read the full transcript.md from the top-ranked Recall File
    Result: complete context available
    │
    ▼
STEP 5: Check Defining Memories
    Were there any decisions about payment systems?
    Result: "On Jan 3, decided to use Stripe Connect"
```
This cascade is fast because each step narrows the search space:
- Knowledge Graph: Millions of files → Thousands (scoped to project)
- Keywords: Thousands → Hundreds (exact matches)
- RAG: Hundreds → Tens (semantic relevance)
- Transcript: Load only what’s needed
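The cascade can be expressed as a pipeline of pluggable layers, each narrowing the candidate set before handing off to the next. In this sketch the layers are passed in as callables; their implementations (`graph_scope`, `keyword_index`, `rag_rank`) are stand-ins, not real Hyperthyme components.

```python
def search_cascade(query, graph_scope, keyword_index, rag_rank, top_k=1):
    """Run the layered search: graph -> keywords -> RAG -> load top files.

    graph_scope(query)        -> candidate Recall File names (step 1)
    keyword_index(query, c)   -> True if candidate c matches keywords (step 2)
    rag_rank(query, cands)    -> candidates ordered by relevance (step 3)
    """
    candidates = graph_scope(query)                       # millions -> thousands
    candidates = [c for c in candidates
                  if keyword_index(query, c)]             # thousands -> hundreds
    ranked = rag_rank(query, candidates)                  # hundreds -> tens
    return ranked[:top_k]                                 # load only what's needed
```

Because each layer is just a callable, a prototype can wire in trivial implementations first and swap in the real graph, keyword, and vector layers later.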
Storage States: Hot, Warm, Cold
Not all memories need to be instantly accessible. Hyperthyme uses a tiered storage system:
Hot (Active)
- Current conversation
- Currently loaded Recall Files
- Uncompressed, in working memory
Warm (Recent)
- Accessed in the last 7 days
- Same project/node as current conversation
- Uncompressed, ready to read
Cold (Long-term)
- Not accessed in 7+ days
- Artifacts are compressed (zipped)
- Keywords and summaries still indexed
- Takes slightly longer to retrieve
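The demotion rule described above is a simple comparison against the last-access timestamp. A minimal sketch, assuming the 7-day threshold from this section; `next_state` is an invented helper name.

```python
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=7)

def next_state(last_accessed: datetime, now: datetime) -> str:
    """Demote a Recall File to cold after 7+ days without access."""
    return "cold" if now - last_accessed >= COLD_AFTER else "warm"
```

A background job would run this over all warm files periodically, compressing artifacts for anything that comes back `"cold"`.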
Warming Process:
When the user starts discussing a topic, the system “warms” related memories:
```python
def warm_node(node_id):
    """When a topic is touched, warm all related Recall Files."""
    # Get all Recall Files linked to this node
    recall_files = knowledge_graph.get_files_for_node(node_id)

    for file in recall_files:
        if file.is_cold():
            # Decompress artifacts
            file.decompress_artifacts()
            # Pre-load transcript into cache
            file.cache_transcript()
            # Mark as warm
            file.set_state("warm")
```
This is predictive retrieval—if you’re asking about the AI Brain project, you’ll probably ask more AI Brain questions, so we prepare.
Making It Model-Agnostic
Hyperthyme works with any AI model. Here’s how:
The Middleware Pattern
Hyperthyme doesn’t modify the AI. It wraps around it:
```python
class HyperthymeMiddleware:
    def __init__(self, ai_client, memory_store):
        self.ai = ai_client        # Could be OpenAI, Anthropic, Google, etc.
        self.memory = memory_store

    def chat(self, user_message, user_id):
        # 1. Search for relevant memories
        relevant_memories = self.memory.search(
            query=user_message,
            user_id=user_id
        )

        # 2. Build enhanced prompt with memories
        enhanced_prompt = self.inject_memories(
            user_message,
            relevant_memories
        )

        # 3. Send to AI (any model works here)
        response = self.ai.generate(enhanced_prompt)

        # 4. Log the conversation
        self.memory.log(user_message, response, user_id)

        return response

    def inject_memories(self, message, memories):
        memory_context = "\n".join([
            f"[From {m.date}]: {m.summary}"
            for m in memories
        ])
        return f"""Relevant context from past conversations:
{memory_context}

Current message: {message}"""
```
Swapping Models
Because the middleware handles memory separately, you can swap AI models without losing memory:
```python
# Using Claude
claude_client = AnthropicClient(api_key="...")
hyperthyme = HyperthymeMiddleware(claude_client, memory_store)

# Switch to GPT; memory stays the same
openai_client = OpenAIClient(api_key="...")
hyperthyme = HyperthymeMiddleware(openai_client, memory_store)
```
MCP (Model Context Protocol)
MCP is an emerging standard that lets AI models call external tools. Hyperthyme can be exposed as an MCP server:
```python
@mcp_tool("search_memory")
def search_memory(query: str, user_id: str) -> list:
    """Search the user's conversation history."""
    return memory_store.search(query, user_id)

@mcp_tool("get_defining_memories")
def get_defining_memories(user_id: str) -> list:
    """Get the user's major decisions and milestones."""
    return memory_store.get_defining_memories(user_id)
```
Now any MCP-compatible AI can access Hyperthyme memory directly.
Database Schema (Simplified)
Here’s a starting point for the database design:
recall_files
```sql
CREATE TABLE recall_files (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    folder_name VARCHAR(255) NOT NULL,
    topic VARCHAR(255),
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    token_count INTEGER,
    state VARCHAR(20) DEFAULT 'warm', -- 'hot', 'warm', 'cold'
    summary_path TEXT,
    transcript_path TEXT,
    keywords_path TEXT,
    artifacts_path TEXT
);
```
knowledge_graph_nodes
```sql
CREATE TABLE knowledge_graph_nodes (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name VARCHAR(255) NOT NULL,
    node_type VARCHAR(50), -- 'project', 'topic', 'person', 'concept'
    created_at TIMESTAMP NOT NULL,
    last_accessed TIMESTAMP
);
```
knowledge_graph_edges
```sql
CREATE TABLE knowledge_graph_edges (
    id UUID PRIMARY KEY,
    source_node_id UUID REFERENCES knowledge_graph_nodes(id),
    target_node_id UUID REFERENCES knowledge_graph_nodes(id),
    relationship VARCHAR(100), -- 'contains', 'relates_to', 'discussed_in'
    created_at TIMESTAMP NOT NULL
);
```
recall_file_nodes (junction table)
```sql
CREATE TABLE recall_file_nodes (
    recall_file_id UUID REFERENCES recall_files(id),
    node_id UUID REFERENCES knowledge_graph_nodes(id),
    PRIMARY KEY (recall_file_id, node_id)
);
```
defining_memories
```sql
CREATE TABLE defining_memories (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    memory_type VARCHAR(50), -- 'decision', 'milestone', 'event', 'turning_point'
    summary TEXT NOT NULL,
    context TEXT,
    detected_at TIMESTAMP NOT NULL,
    source_recall_file_id UUID REFERENCES recall_files(id),
    tags TEXT[] -- Array of tags
);
```
summary_embeddings
```sql
-- For vector search (using pgvector)
CREATE TABLE summary_embeddings (
    id UUID PRIMARY KEY,
    recall_file_id UUID REFERENCES recall_files(id),
    embedding vector(1536), -- OpenAI embedding size
    created_at TIMESTAMP NOT NULL
);

-- Create index for fast similarity search
CREATE INDEX ON summary_embeddings
    USING ivfflat (embedding vector_cosine_ops);
```
Technology Stack Recommendations
For Prototyping (MVP)
| Component | Recommendation | Why |
|---|---|---|
| Language | Python | Fastest for AI development |
| Database | PostgreSQL + pgvector | One database for everything |
| File Storage | Local filesystem | Simple, no cloud dependency |
| Vector Search | pgvector | Integrated with main DB |
| Knowledge Graph | NetworkX (in-memory) | Fast prototyping |
| AI Integration | LangChain or direct API | Flexibility |
| API Framework | FastAPI | Modern, async, automatic docs |
For Production
| Component | Recommendation | Why |
|---|---|---|
| Language | Python + Go for performance-critical | Balance of speed and AI ecosystem |
| Database | PostgreSQL (primary) | Battle-tested, scalable |
| File Storage | S3 or equivalent | Scalable, cheap |
| Vector Search | Pinecone or Weaviate | Purpose-built, performant |
| Knowledge Graph | Neo4j | Industry standard |
| Caching | Redis | Fast warming/hot storage |
| API Framework | FastAPI behind Kong/Nginx | Production-ready |
| Orchestration | Kubernetes | Scalability |
Getting Started: Your First Task
If you’re building this, here’s what to tackle first:
Week 1: Basic Recall File Creation
```python
# Goal: Create Recall Files from conversations

def create_recall_file(conversation, user_id):
    # 1. Generate folder name
    folder_name = generate_folder_name(conversation)

    # 2. Save transcript
    save_transcript(folder_name, conversation)

    # 3. Generate and save summary (using AI)
    summary = generate_summary(conversation)
    save_summary(folder_name, summary)

    # 4. Extract and save keywords
    keywords = extract_keywords(conversation)
    save_keywords(folder_name, keywords)

    # 5. Register in database
    register_recall_file(folder_name, user_id)
```
Week 2: Basic Search
```python
# Goal: Find relevant Recall Files

def search_memory(query, user_id):
    # 1. Keyword search
    keyword_matches = search_keywords(query, user_id)

    # 2. Return matching Recall Files
    return load_recall_files(keyword_matches)
```
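The `search_keywords` step can be made concrete with a small overlap check against each file's keyword list. In the real system the keywords would be read from each Recall File's keywords.txt; here they are simulated with a dict, and the function shape is an assumption for the sketch.

```python
def search_keywords(query, keyword_index):
    """Return Recall File names whose keywords overlap the query terms.

    keyword_index maps a Recall File folder name to its keyword list,
    standing in for reading keywords.txt files from disk.
    """
    terms = set(query.lower().split())
    return [name for name, keywords in keyword_index.items()
            if terms & {k.lower() for k in keywords}]
```

This is the exact-match layer: it is fast and cheap, which is why the cascade runs it before the more expensive RAG search.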
Week 3: RAG Integration
```python
# Goal: Add semantic search

def search_memory_with_rag(query, user_id):
    # 1. Embed the query
    query_embedding = embed_text(query)

    # 2. Find similar summaries
    matches = vector_db.search(query_embedding, user_id)

    # 3. Load and return
    return load_recall_files(matches)
```
Week 4: Knowledge Graph
```python
# Goal: Add topic-based navigation

def search_memory_with_graph(query, user_id):
    # 1. Identify relevant nodes
    nodes = knowledge_graph.find_nodes(query, user_id)

    # 2. Get Recall Files for those nodes
    recall_files = []
    for node in nodes:
        recall_files.extend(node.get_recall_files())

    # 3. Rank and return
    return rank_by_relevance(recall_files, query)
```
Common Pitfalls to Avoid
1. Storing Too Much in Memory
Don’t try to keep all transcripts in RAM. Use the hot/warm/cold system. Only load what’s needed.
2. Ignoring Token Limits
When injecting memories into prompts, count tokens. Don’t overflow the AI’s context window.
```python
def inject_memories(message, memories, max_tokens=4000):
    injected = []
    token_count = 0

    for memory in memories:
        memory_tokens = count_tokens(memory.summary)
        if token_count + memory_tokens > max_tokens:
            break
        injected.append(memory)
        token_count += memory_tokens

    return injected
```
3. Not Handling Multiple Users
Always scope queries by user_id. Never let one user’s memories leak to another.
4. Synchronous Everything
Recall File creation, embedding generation, and cold storage compression should be async/background jobs. Don’t block the user.
5. No Backup Strategy
Memories are valuable. Implement backups from day one.
Summary
Hyperthyme is a memory layer for AI consisting of:
- Recall Files — Complete conversation snapshots with summaries, keywords, transcripts, and artifacts
- Knowledge Graph — Relationship map between topics for fast navigation
- RAG Database — Semantic search over summaries
- Defining Memories — Index of major decisions and milestones
- Middleware — Model-agnostic layer that handles logging and retrieval
The system uses a search cascade (Graph → Keywords → RAG → Transcript) to efficiently find relevant memories, and a tiered storage system (Hot → Warm → Cold) to balance speed and cost.
Start simple. Build the Recall File system first. Add intelligence layer by layer.