Clean-Room Specification: Lightweight Fact-Based AI Memory API
Purpose of This Document
This document specifies the architecture for a fact-based AI memory system that automatically extracts, stores, deduplicates, and retrieves discrete factual memories from conversations. Rather than storing raw conversation transcripts, the system uses an LLM to distill conversations into atomic facts (e.g., “User prefers dark mode,” “User works at Acme Corp”), stores them as vector embeddings for semantic retrieval, and maintains a full audit history of every memory operation. The system supports user/agent/session scoping, pluggable vector store backends, optional graph-based entity-relationship memory, and both synchronous and asynchronous APIs. This specification enables independent implementation from scratch.
1. System Overview
1.1 Core Concept
Traditional memory systems store raw conversation logs. This system takes a fundamentally different approach: it uses an LLM as a memory curator that reads conversations, extracts discrete facts, compares them against existing memories, and decides whether to ADD new facts, UPDATE existing ones, DELETE obsolete ones, or take NO action. The result is a clean, deduplicated factual memory store that grows smarter over time.
1.2 High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Client API │
│ Memory.add() / .search() / .get() / .get_all() / .update()│
│ .delete() / .delete_all() / .history() / .reset() │
├─────────────────────────────────────────────────────────────┤
│ Memory Pipeline │
│ ┌──────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Message │→│ LLM Fact │→│ Embed + Search │ │
│ │ Parser │ │ Extraction │ │ Existing Memories │ │
│ └──────────┘ └──────────────┘ └─────────┬───────────┘ │
│ │ │
│ ┌──────────────────────────────────────────▼───────────┐ │
│ │ LLM Memory Update Decision │ │
│ │ Compare new facts vs existing → ADD/UPDATE/DELETE │ │
│ └──────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Storage Layer │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ Vector DB │ │ SQLite │ │ Neo4j (optional) │ │
│ │ (memories) │ │ (history) │ │ (graph memory) │ │
│ └────────────┘ └────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
1.3 Data Flow Summary
- Client calls memory.add(messages, user_id=...) with conversation messages
- Message Parser normalizes input into a flat string
- LLM Fact Extraction sends conversation + system prompt → receives JSON array of discrete facts
- For each extracted fact:
a. Generate embedding vector
b. Search vector store for similar existing memories (top 5)
c. LLM Memory Update Decision compares new fact against existing memories → produces ADD/UPDATE/DELETE/NONE events
- Execute each event against the vector store
- Log every operation to SQLite history table
- Optionally extract entities and relationships to graph store
- Return list of memory events to the caller
2. Data Model
2.1 MemoryItem
The core data structure representing a single stored memory:
interface MemoryItem {
id: string; // UUID v4
memory: string; // The fact text, e.g. "User prefers Python over JavaScript"
hash: string; // MD5 hex digest of the memory text (for deduplication)
metadata: Record<string, any>; // Arbitrary key-value pairs
score?: number; // Similarity score (populated on search results only)
created_at: string; // ISO 8601 timestamp
updated_at: string; // ISO 8601 timestamp
}
Hash computation: hash = md5(memory_text).hexdigest(). Used to detect exact duplicate memories before insertion.
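The hash computation above can be sketched in a few lines using Node's built-in crypto module; the function name memoryHash is illustrative, not part of the API:

```typescript
import { createHash } from "crypto";

// MD5 hex digest of the raw fact text. Used only for exact-duplicate
// detection before insertion, not for any security purpose.
function memoryHash(memoryText: string): string {
  return createHash("md5").update(memoryText).digest("hex");
}
```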
2.2 MemoryEvent
Represents a single operation performed during an add() call:
interface MemoryEvent {
event: "ADD" | "UPDATE" | "DELETE" | "NONE";
id: string; // Memory ID affected
old_memory?: string; // Previous text (for UPDATE/DELETE)
new_memory?: string; // New text (for ADD/UPDATE)
metadata?: Record<string, any>;
}
2.3 Message Format
Input messages follow the standard chat message format:
type Message = {
role: "system" | "user" | "assistant";
content: string;
};
The `add()` method accepts either a single string or an array of `Message` objects. If a string is provided, it is wrapped as `[{ role: "user", content: str }]`.
2.4 Scoping Model
Every memory operation requires at least one scope identifier. These are used as metadata filters on the vector store to isolate memories:
interface MemoryScope {
user_id?: string; // Isolate memories per end-user
agent_id?: string; // Isolate memories per AI agent/persona
run_id?: string; // Isolate memories per conversation/session
}
Validation rule: At least one of user_id, agent_id, or run_id MUST be provided on every API call. If none are provided, raise an error: "At least one of user_id, agent_id, or run_id must be provided".
Filter construction: When scoping, build a metadata filter that matches ALL provided scope fields. For example, if both user_id="alice" and agent_id="helper" are provided, the vector store query filters for records where metadata.user_id == "alice" AND metadata.agent_id == "helper".
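A minimal sketch of the scope validation and filter construction described above; the helper name buildScopeFilter is an assumption, but the error message matches the validation rule:

```typescript
interface MemoryScope {
  user_id?: string;
  agent_id?: string;
  run_id?: string;
}

// Build a metadata filter that matches ALL provided scope fields.
// Throws if no scope identifier is present, per the validation rule.
function buildScopeFilter(scope: MemoryScope): Record<string, string> {
  const filter: Record<string, string> = {};
  if (scope.user_id) filter.user_id = scope.user_id;
  if (scope.agent_id) filter.agent_id = scope.agent_id;
  if (scope.run_id) filter.run_id = scope.run_id;
  if (Object.keys(filter).length === 0) {
    throw new Error("At least one of user_id, agent_id, or run_id must be provided");
  }
  return filter;
}
```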
3. Memory Class — Public API
3.1 Constructor
class Memory {
constructor(config?: MemoryConfig);
}
The constructor initializes up to five subsystems:
- Vector store — configured via config.vector_store
- LLM — configured via config.llm
- Embedder — configured via config.embedder
- History store — SQLite database (always initialized, path configurable)
- Graph store (optional) — Neo4j, configured via config.graph_store
If no config is provided, use sensible defaults:
- Vector store: In-memory (e.g., a simple array with brute-force cosine similarity)
- LLM: OpenAI gpt-4o-mini
- Embedder: OpenAI text-embedding-3-small (dimension 1536)
- History: SQLite at ~/.memory/history.db
3.2 Method: add(messages, ...scope, metadata?, filters?, prompt?)
Purpose: Extract facts from messages and store them as memories.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | string | Message[] | Yes | Conversation to extract facts from |
| user_id | string | See scope rules | User scope |
| agent_id | string | See scope rules | Agent scope |
| run_id | string | See scope rules | Session scope |
| metadata | Record<string, any> | No | Extra metadata to attach to each memory |
| filters | FilterExpression | No | Additional filters for searching existing memories |
| prompt | string | No | Custom system prompt override for fact extraction |
**Returns**: `{ results: MemoryEvent[] }` — list of all ADD/UPDATE/DELETE/NONE events.
Algorithm (detailed in Section 4):
- Parse messages into a flat conversation string
- Call LLM with fact extraction prompt → get JSON array of facts
- For each fact: embed → search existing (limit 5) → call LLM update decision → execute event
- Log all events to history
- If graph store configured, extract entities/relationships
- Return events
3.3 Method: search(query, ...scope, limit?, filters?)
Purpose: Retrieve memories semantically similar to a query.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language search query |
| user_id | string | See scope rules | User scope |
| agent_id | string | See scope rules | Agent scope |
| run_id | string | See scope rules | Session scope |
| limit | number | No | Max results (default 100) |
| filters | FilterExpression | No | Additional metadata filters |
**Returns**: `{ results: MemoryItem[] }` — sorted by descending similarity score.
Algorithm:
- Generate embedding for query text
- Build metadata filter from scope + any additional filters
- Query vector store: vectorStore.search(embedding, limit, filters)
- Return results with similarity scores
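The four steps above can be sketched as a pure function with the embedder and vector store injected as plain functions; the names searchMemories, embed, and vectorSearch are illustrative, not part of the specified API:

```typescript
type SearchHit = { id: string; score: number; payload: Record<string, any> };

// 1. embed the query, 2-3. run a scope-filtered vector search,
// 4. return hits sorted by descending similarity score.
async function searchMemories(
  query: string,
  scopeFilter: Record<string, string>,
  embed: (text: string) => Promise<number[]>,
  vectorSearch: (v: number[], limit: number, f: Record<string, string>) => Promise<SearchHit[]>,
  limit = 100
): Promise<SearchHit[]> {
  const embedding = await embed(query);
  const hits = await vectorSearch(embedding, limit, scopeFilter);
  return hits.sort((a, b) => b.score - a.score);
}
```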
3.4 Method: get(memory_id)
Purpose: Retrieve a single memory by its ID.
Returns: MemoryItem or null if not found.
3.5 Method: get_all(...scope, limit?)
Purpose: Retrieve all memories for a given scope.
Parameters: Same scope parameters. limit defaults to 100.
**Returns**: `{ results: MemoryItem[] }` — all memories matching the scope filters.
Algorithm: Query vector store with scope-based metadata filter, no embedding (list all matching records).
3.6 Method: update(memory_id, new_text)
Purpose: Directly overwrite a memory’s text.
Algorithm:
- Retrieve existing memory by ID
- Generate new embedding for new_text
- Compute new hash: md5(new_text)
- Update vector store record: text, embedding, hash, updated_at
- Log UPDATE event to history
3.7 Method: delete(memory_id)
Purpose: Remove a single memory.
Algorithm:
- Retrieve existing memory by ID (for history logging)
- Delete from vector store
- Log DELETE event to history
3.8 Method: delete_all(...scope)
Purpose: Remove all memories for a given scope.
Algorithm:
- Retrieve all memories for scope via get_all()
- Delete each from vector store
- Log DELETE event for each to history
3.9 Method: history(memory_id)
Purpose: Retrieve the full audit trail for a specific memory.
Returns: Array of history records, ordered by timestamp ascending:
interface HistoryRecord {
id: string; // History entry ID
memory_id: string; // The memory this event relates to
event: "ADD" | "UPDATE" | "DELETE";
old_value: string | null;
new_value: string | null;
timestamp: string; // ISO 8601
is_deleted: boolean; // Whether memory was deleted in this event
}
3.10 Method: reset()
Purpose: Delete ALL memories and history. Nuclear option.
Algorithm:
- Drop and recreate vector store collection
- Truncate history table (or drop and recreate)
4. LLM-Driven Memory Pipeline (Core Algorithm)
This is the heart of the system. The add() method orchestrates a multi-step pipeline that uses LLM calls to intelligently manage memories.
4.1 Step 1: Message Parsing
Convert input to a flat string for the LLM:
function parseMessages(input: string | Message[]): string {
if (typeof input === "string") return input;
return input
.map(m => `${m.role}: ${m.content}`)
.join("\n");
}
4.2 Step 2: LLM Fact Extraction
Send the conversation to the LLM with a system prompt that instructs it to extract discrete facts.
FACT_EXTRACTION_PROMPT (system message):
You are an expert at extracting structured, atomic facts from conversations.
Your task is to identify and extract key pieces of information from the given
conversation that would be useful to remember for future interactions.
Extract facts that fall into these categories:
1. Personal preferences (likes, dislikes, habits)
2. Biographical information (name, occupation, location, relationships)
3. Goals and intentions
4. Technical preferences and skills
5. Important dates, events, or milestones
6. Opinions and viewpoints
7. Project details and requirements
8. Communication preferences
Rules:
- Each fact must be a single, self-contained statement
- Be specific and include context where necessary
- Avoid duplicating information across facts
- Only extract information that is clearly stated or strongly implied
- Do not make assumptions beyond what is provided
- Format each fact as a concise, declarative sentence
- Use third person (e.g., "User prefers..." not "You prefer...")
Return a JSON array of strings. If no meaningful facts can be extracted,
return an empty array.
Example output:
["User's name is Alice", "User works as a software engineer at Acme Corp",
"User prefers Python for backend development"]
User message: The parsed conversation string.
LLM call configuration:
- Temperature: 0 (deterministic extraction)
- Response format: JSON mode (if available) or parse JSON from response text
Parse result: Extract JSON array from LLM response. If parsing fails, try to find JSON array pattern ([...]) in the response text. If still fails, return empty array.
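A defensive parser implementing the fallback rules above (direct parse, then first bracketed span, then empty array); the function name parseFactArray is an assumption:

```typescript
// Parse the LLM response into an array of fact strings.
// Falls back to scanning for a [...] span, then to [].
function parseFactArray(raw: string): string[] {
  try {
    const parsed = JSON.parse(raw);
    if (Array.isArray(parsed)) return parsed.filter(f => typeof f === "string");
  } catch (err) { /* fall through to pattern search */ }
  const match = raw.match(/\[[\s\S]*\]/); // first-to-last bracket, across newlines
  if (match) {
    try {
      const parsed = JSON.parse(match[0]);
      if (Array.isArray(parsed)) return parsed.filter(f => typeof f === "string");
    } catch (err) { /* ignore malformed span */ }
  }
  return [];
}
```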
Custom prompt support: If the caller provides a prompt parameter to add(), use that as the system message instead of FACT_EXTRACTION_PROMPT. This allows domain-specific fact extraction.
4.3 Step 3: Per-Fact Processing Loop
For each extracted fact string, execute the following sub-steps:
4.3.1 Generate Embedding
embedding = embedder.embed(fact_text)
4.3.2 Search Existing Memories
Query the vector store for the top 5 most similar existing memories within the current scope:
existing = vectorStore.search(
embedding = embedding,
limit = 5,
filters = buildScopeFilter(user_id, agent_id, run_id)
)
4.3.3 LLM Memory Update Decision
This is the critical decision-making step. Send the new fact AND the retrieved existing memories to the LLM, which decides what action to take.
UPDATE_MEMORY_PROMPT (system message):
You are a memory management system. You will be given:
1. A new piece of information (the "new fact")
2. A list of existing memories that are potentially related
Your job is to decide what memory operations to perform. For each operation,
return a JSON object.
Possible operations:
1. ADD — The new fact contains genuinely new information not captured by any
existing memory. Create a new memory.
{"event": "ADD", "data": "the fact text to store"}
2. UPDATE — The new fact updates, corrects, refines, or supersedes an existing
memory. Provide the existing memory ID and the new merged/updated text.
{"event": "UPDATE", "id": "<existing_memory_id>", "old_memory": "<current text>",
"data": "the updated fact text"}
3. DELETE — The new fact contradicts or invalidates an existing memory and
the existing memory should be removed entirely.
{"event": "DELETE", "id": "<existing_memory_id>", "old_memory": "<current text>"}
4. NONE — The new fact is already fully captured by existing memories and
no action is needed.
{"event": "NONE"}
Important rules:
- If the new fact contains information not present in ANY existing memory, use ADD
- If an existing memory says something similar but the new fact has updated info,
use UPDATE (merge the information, keeping the more recent/accurate version)
- If the new fact directly contradicts an existing memory (e.g., "moved from NYC
to SF" when existing says "lives in NYC"), UPDATE the existing memory
- If removing info is more appropriate than updating, use DELETE
- Only use NONE if the information is truly redundant
- You may return multiple operations if needed (e.g., UPDATE one memory AND ADD
a new one)
- Always preserve important context and nuance when merging
Return a JSON array of operation objects.
User message construction:
New fact: {fact_text}
Existing memories:
{for each existing memory:}
- ID: {memory.id}, Text: {memory.memory}
{end for}
{if no existing memories:}
No existing memories found.
{end if}
LLM call configuration:
- Temperature: 0
- Response format: JSON
Parse result: Extract JSON array of event objects from the LLM response.
4.4 Step 4: Execute Memory Events
For each event returned by the update decision LLM:
ADD event:
- Generate a new UUID v4 for the memory
- Compute embedding for the fact text
- Compute hash: md5(fact_text)
- Build metadata: `{ ...scope_fields, ...caller_metadata, hash: hash }`
- Insert into vector store: `vectorStore.insert(id, embedding, { memory: fact_text, ...metadata })`
- Log to history: historyStore.log(memory_id, "ADD", null, fact_text)
UPDATE event:
- Get the target memory ID from the event
- Compute new embedding for the updated text
- Compute new hash
- Update vector store record: `vectorStore.update(id, newEmbedding, { memory: updated_text, hash, updated_at })`
- Log to history: historyStore.log(memory_id, "UPDATE", old_text, new_text)
DELETE event:
- Get the target memory ID
- Delete from vector store: vectorStore.delete(id)
- Log to history: historyStore.log(memory_id, "DELETE", old_text, null, is_deleted=true)
NONE event: No action. Optionally log for analytics.
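The event execution can be reduced to a small dispatch. In this sketch the store is an in-memory map and the history log an array; ADD events are assumed to carry the freshly generated UUID, and the field names follow the decision-prompt output ("data" for new text):

```typescript
type MemoryEvent =
  | { event: "ADD"; id: string; data: string }
  | { event: "UPDATE"; id: string; old_memory: string; data: string }
  | { event: "DELETE"; id: string; old_memory: string }
  | { event: "NONE" };

// Apply one decision event to the store and record it in the history log.
function applyEvent(
  store: Map<string, string>,
  history: Array<{ event: string; old: string | null; next: string | null }>,
  e: MemoryEvent
): void {
  switch (e.event) {
    case "ADD":
      store.set(e.id, e.data);
      history.push({ event: "ADD", old: null, next: e.data });
      break;
    case "UPDATE":
      store.set(e.id, e.data);
      history.push({ event: "UPDATE", old: e.old_memory, next: e.data });
      break;
    case "DELETE":
      store.delete(e.id);
      history.push({ event: "DELETE", old: e.old_memory, next: null });
      break;
    case "NONE":
      break; // no store mutation; optionally log for analytics
  }
}
```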
4.5 Step 5: Graph Memory Extraction (Optional)
If a graph store is configured, additionally extract entities and relationships. Use an LLM tool call with the following tool definitions:
EXTRACT_ENTITIES_TOOL:
{
"name": "extract_entities",
"description": "Extract entities (people, organizations, concepts, locations, events) from the conversation",
"parameters": {
"type": "object",
"properties": {
"entities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "Entity name (normalized, title case)" },
"type": { "type": "string", "enum": ["person", "organization", "concept", "location", "event", "technology", "product"] },
"description": { "type": "string", "description": "Brief description of the entity" }
},
"required": ["name", "type"]
}
}
}
}
}
EXTRACT_RELATIONS_TOOL:
{
"name": "extract_relations",
"description": "Extract relationships between entities",
"parameters": {
"type": "object",
"properties": {
"relations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"source": { "type": "string", "description": "Source entity name" },
"relation": { "type": "string", "description": "Relationship type (e.g., works_at, located_in, uses, knows)" },
"target": { "type": "string", "description": "Target entity name" }
},
"required": ["source", "relation", "target"]
}
}
}
}
}
Graph Store Operations
For each extracted entity, perform an upsert in the graph database:
MERGE (e:Entity {name: $name})
SET e.type = $type, e.description = $description, e.updated_at = $now
For each extracted relationship:
MATCH (s:Entity {name: $source})
MATCH (t:Entity {name: $target})
MERGE (s)-[r:RELATES_TO {type: $relation}]->(t)
SET r.updated_at = $now
When searching with graph memory enabled, also query the graph for entities related to the search query and merge those results with vector search results. Use BM25 reranking if the graph store supports it to score relevance of graph-retrieved memories.
5. History Store (SQLite)
5.1 Schema
CREATE TABLE IF NOT EXISTS memory_history (
id TEXT PRIMARY KEY, -- UUID v4
memory_id TEXT NOT NULL, -- References the memory
event TEXT NOT NULL, -- 'ADD', 'UPDATE', 'DELETE'
old_value TEXT, -- Previous memory text (null for ADD)
new_value TEXT, -- New memory text (null for DELETE)
timestamp TEXT NOT NULL, -- ISO 8601
is_deleted INTEGER DEFAULT 0, -- 1 if this was a DELETE event
-- Scope fields for queryability
user_id TEXT,
agent_id TEXT,
run_id TEXT
);
CREATE INDEX IF NOT EXISTS idx_history_memory_id ON memory_history(memory_id);
CREATE INDEX IF NOT EXISTS idx_history_timestamp ON memory_history(timestamp);
5.2 Logging Function
function logHistory(memoryId, event, oldValue, newValue, scope, isDeleted = false):
insert into memory_history values (
uuid4(), memoryId, event, oldValue, newValue,
new Date().toISOString(), isDeleted ? 1 : 0,
scope.user_id, scope.agent_id, scope.run_id
)
5.3 Query Function
function getHistory(memoryId):
SELECT * FROM memory_history
WHERE memory_id = ?
ORDER BY timestamp ASC
6. Vector Store Abstraction
6.1 VectorStoreBase Interface
All vector store backends implement this interface:
interface VectorStoreBase {
// Collection management
createCollection(name: string, dimension: number): Promise<void>;
deleteCollection(name: string): Promise<void>;
listCollections(): Promise<string[]>;
getCollectionInfo(name: string): Promise<{ name: string; count: number; dimension: number }>;
// CRUD operations
insert(
collectionName: string,
id: string,
vector: number[],
payload: Record<string, any>
): Promise<void>;
search(
collectionName: string,
queryVector: number[],
limit: number,
filters?: FilterExpression
): Promise<Array<{ id: string; score: number; payload: Record<string, any> }>>;
get(collectionName: string, id: string): Promise<{ id: string; payload: Record<string, any> } | null>;
update(
collectionName: string,
id: string,
vector?: number[],
payload?: Record<string, any>
): Promise<void>;
delete(collectionName: string, id: string): Promise<void>;
list(
collectionName: string,
filters?: FilterExpression,
limit?: number
): Promise<Array<{ id: string; payload: Record<string, any> }>>;
reset(): Promise<void>;
}
6.2 In-Memory Vector Store (Default)
For development and testing, implement a simple in-memory store:
class InMemoryVectorStore implements VectorStoreBase {
private collections: Map<string, Map<string, { vector: number[]; payload: Record<string, any> }>>;
search(collectionName, queryVector, limit, filters?):
// For each record in collection:
// 1. If filters provided, check metadata matches
// 2. Compute cosine similarity: dot(a,b) / (norm(a) * norm(b))
// 3. Collect (id, score, payload)
// Sort by score descending, return top `limit`
}
Cosine similarity:
function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, normA = 0, normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
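Putting the pieces together, the brute-force search described in the pseudocode above might look like this; the cosineSimilarity helper is repeated so the sketch stands alone, and the matches predicate stands in for the metadata filter check:

```typescript
type Rec = { vector: number[]; payload: Record<string, any> };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every record that passes the filter, then return the top `limit`
// results sorted by descending cosine similarity.
function bruteForceSearch(
  records: Map<string, Rec>,
  queryVector: number[],
  limit: number,
  matches: (payload: Record<string, any>) => boolean = () => true
): Array<{ id: string; score: number; payload: Record<string, any> }> {
  const scored: Array<{ id: string; score: number; payload: Record<string, any> }> = [];
  for (const [id, rec] of records) {
    if (!matches(rec.payload)) continue; // metadata filter
    scored.push({ id, score: cosineSimilarity(rec.vector, queryVector), payload: rec.payload });
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, limit);
}
```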
6.3 Qdrant Backend
class QdrantVectorStore implements VectorStoreBase {
constructor(config: { host: string; port: number; apiKey?: string; onDisk?: boolean });
// Uses Qdrant REST API:
// PUT /collections/{name} — createCollection
// PUT /collections/{name}/points — insert (upsert)
// POST /collections/{name}/points/search — search
// GET /collections/{name}/points/{id} — get
// POST /collections/{name}/points/delete — delete
// Filter translation: Convert FilterExpression to Qdrant filter format
// { must: [{ key: "user_id", match: { value: "alice" } }] }
}
6.4 PostgreSQL/pgvector Backend
class PgVectorStore implements VectorStoreBase {
constructor(config: { connectionString: string; schema?: string });
createCollection(name, dimension):
// CREATE TABLE {name} (
// id TEXT PRIMARY KEY,
// vector vector({dimension}),
// payload JSONB,
// created_at TIMESTAMP DEFAULT NOW()
// );
// CREATE INDEX ON {name} USING ivfflat (vector vector_cosine_ops);
search(collectionName, queryVector, limit, filters?):
// SELECT id, payload, 1 - (vector <=> $1::vector) as score
// FROM {collection}
// WHERE {filter_clauses}
// ORDER BY vector <=> $1::vector
// LIMIT $2
// Filter translation: Convert FilterExpression to SQL WHERE clauses
// { field: "user_id", op: "eq", value: "alice" }
// → payload->>'user_id' = 'alice'
}
6.5 ChromaDB Backend
class ChromaVectorStore implements VectorStoreBase {
constructor(config: { host: string; port: number; path?: string });
// Uses ChromaDB client:
// client.createCollection(name) / getCollection(name)
// collection.add(ids, embeddings, metadatas, documents)
// collection.query(queryEmbeddings, nResults, where)
// collection.update(ids, embeddings, metadatas, documents)
// collection.delete(ids)
// Filter translation: Convert FilterExpression to ChromaDB where format
// { "$and": [{ "user_id": { "$eq": "alice" } }] }
}
6.6 Additional Backend Targets
The interface should support these backends (implementation details vary but all implement VectorStoreBase):
- Pinecone: REST API with namespaces for scoping
- Weaviate: GraphQL-based queries with class schemas
- Milvus: gRPC client with collection/partition model
- FAISS: Local file-based index with separate metadata store
- Elasticsearch: kNN search with dense_vector field type
- Azure AI Search: REST API with vector search profiles
- Redis: RediSearch with VECTOR field type (HNSW/FLAT)
7. Filter Expression System
7.1 Filter Syntax
Filters allow complex metadata queries across all vector store backends. The system defines a portable filter expression that is translated to each backend’s native syntax.
type FilterOperator = "eq" | "ne" | "gt" | "gte" | "lt" | "lte" |
"in" | "nin" | "contains" | "icontains";
type FilterCondition = {
field: string;
operator: FilterOperator;
value: any;
};
type FilterExpression =
| FilterCondition
| { AND: FilterExpression[] }
| { OR: FilterExpression[] }
| { NOT: FilterExpression };
7.2 Operator Semantics
| Operator | Meaning | Example |
|---|---|---|
| eq | Equals | `{ field: "user_id", operator: "eq", value: "alice" }` |
| ne | Not equals | `{ field: "status", operator: "ne", value: "archived" }` |
| gt | Greater than | `{ field: "score", operator: "gt", value: 0.8 }` |
| gte | Greater or equal | `{ field: "created_at", operator: "gte", value: "2024-01-01" }` |
| lt | Less than | `{ field: "priority", operator: "lt", value: 5 }` |
| lte | Less or equal | `{ field: "age", operator: "lte", value: 30 }` |
| in | Value in set | `{ field: "tag", operator: "in", value: ["work", "personal"] }` |
| nin | Value not in set | `{ field: "tag", operator: "nin", value: ["spam"] }` |
| contains | String contains (case-sensitive) | `{ field: "memory", operator: "contains", value: "Python" }` |
| icontains | String contains (case-insensitive) | `{ field: "memory", operator: "icontains", value: "python" }` |
7.3 Composition
// Example: Find memories for user "alice" that mention either "Python" or "JavaScript"
const filter: FilterExpression = {
AND: [
{ field: "user_id", operator: "eq", value: "alice" },
{ OR: [
{ field: "memory", operator: "icontains", value: "Python" },
{ field: "memory", operator: "icontains", value: "JavaScript" }
]}
]
};
7.4 Backend Translation
Each vector store backend implements a translateFilter(expr: FilterExpression) method that converts the portable expression to the backend’s native format. For example:
- **Qdrant**: `{ must: [{ key: "field", match: { value: "x" } }] }`
- **ChromaDB**: `{ "$and": [{ "field": { "$eq": "x" } }] }`
- **pgvector**: `WHERE payload->>'field' = 'x'`
- **Pinecone**: `{ "field": { "$eq": "x" } }`
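For the in-memory backend, "translation" can simply be direct evaluation. A sketch of a recursive evaluator over the portable expression (types repeated from Section 7.1 so the block stands alone):

```typescript
type FilterOperator = "eq" | "ne" | "gt" | "gte" | "lt" | "lte" |
                      "in" | "nin" | "contains" | "icontains";
type FilterCondition = { field: string; operator: FilterOperator; value: any };
type FilterExpression =
  | FilterCondition
  | { AND: FilterExpression[] }
  | { OR: FilterExpression[] }
  | { NOT: FilterExpression };

// Recursively evaluate a portable filter expression against a payload.
function matchesFilter(payload: Record<string, any>, expr: FilterExpression): boolean {
  if ("AND" in expr) return expr.AND.every(e => matchesFilter(payload, e));
  if ("OR" in expr) return expr.OR.some(e => matchesFilter(payload, e));
  if ("NOT" in expr) return !matchesFilter(payload, expr.NOT);
  const v = payload[expr.field];
  switch (expr.operator) {
    case "eq": return v === expr.value;
    case "ne": return v !== expr.value;
    case "gt": return v > expr.value;
    case "gte": return v >= expr.value;
    case "lt": return v < expr.value;
    case "lte": return v <= expr.value;
    case "in": return (expr.value as any[]).includes(v);
    case "nin": return !(expr.value as any[]).includes(v);
    case "contains": return typeof v === "string" && v.includes(expr.value);
    case "icontains":
      return typeof v === "string" &&
        v.toLowerCase().includes(String(expr.value).toLowerCase());
  }
}
```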
8. Configuration System
8.1 MemoryConfig
interface MemoryConfig {
// Vector store backend configuration
vector_store?: {
provider: "memory" | "qdrant" | "chroma" | "pgvector" | "pinecone" |
"weaviate" | "milvus" | "faiss" | "elasticsearch" | "redis";
config: Record<string, any>; // Provider-specific connection config
collection_name?: string; // Default: "memories"
};
// LLM configuration (for fact extraction and update decisions)
llm?: {
provider: "openai" | "anthropic" | "google" | "ollama" | "azure_openai";
config: {
model: string;
api_key?: string; // Falls back to env var (OPENAI_API_KEY, etc.)
temperature?: number; // Default: 0
max_tokens?: number; // Default: 2000
base_url?: string; // For custom endpoints
};
};
// Embedding model configuration
embedder?: {
provider: "openai" | "ollama" | "huggingface" | "azure_openai" | "google";
config: {
model: string; // e.g., "text-embedding-3-small"
api_key?: string;
dimensions?: number; // Output dimension (default: 1536 for OpenAI)
};
};
// Graph memory (optional)
graph_store?: {
provider: "neo4j";
config: {
url: string; // bolt://localhost:7687
username: string;
password: string;
};
};
// History store
history?: {
db_path?: string; // SQLite path, default: ~/.memory/history.db
};
// Custom prompts (override defaults)
custom_prompts?: {
fact_extraction?: string; // Override FACT_EXTRACTION_PROMPT
update_decision?: string; // Override UPDATE_MEMORY_PROMPT
};
// Versioning
version?: "v1.0" | "v1.1"; // API version, affects behavior
}
8.2 Environment Variable Fallbacks
The system checks environment variables as fallbacks for API keys and configuration:
| Env Variable | Purpose |
|---|---|
| OPENAI_API_KEY | OpenAI LLM and embedder |
| ANTHROPIC_API_KEY | Anthropic LLM |
| GOOGLE_API_KEY | Google LLM and embedder |
| QDRANT_HOST, QDRANT_PORT, QDRANT_API_KEY | Qdrant connection |
| CHROMA_HOST, CHROMA_PORT | ChromaDB connection |
| DATABASE_URL | PostgreSQL/pgvector connection |
| NEO4J_URL, NEO4J_USER, NEO4J_PASSWORD | Neo4j graph store |
| REDIS_URL | Redis vector store |
9. Embedder Abstraction
9.1 EmbedderBase Interface
interface EmbedderBase {
embed(text: string): Promise<number[]>;
embedBatch(texts: string[]): Promise<number[][]>;
getDimension(): number;
}
9.2 OpenAI Embedder
class OpenAIEmbedder implements EmbedderBase {
constructor(config: { model: string; apiKey: string; dimensions?: number });
async embed(text: string): Promise<number[]> {
// POST https://api.openai.com/v1/embeddings
// { model: this.model, input: text, dimensions: this.dimensions }
// Return response.data[0].embedding
}
async embedBatch(texts: string[]): Promise<number[][]> {
// Same endpoint accepts array input
// Return response.data.map(d => d.embedding)
}
}
9.3 Ollama Embedder (Local)
class OllamaEmbedder implements EmbedderBase {
constructor(config: { model: string; baseUrl?: string });
async embed(text: string): Promise<number[]> {
// POST http://localhost:11434/api/embeddings
// { model: this.model, prompt: text }
// Return response.embedding
}
}
10. LLM Abstraction
10.1 LLMBase Interface
interface LLMBase {
generate(
systemPrompt: string,
userMessage: string,
options?: { temperature?: number; maxTokens?: number; responseFormat?: "json" | "text"; tools?: ToolDef[] }
): Promise<string>;
generateWithToolCalls(
systemPrompt: string,
userMessage: string,
tools: ToolDef[],
options?: { temperature?: number }
): Promise<{ content?: string; toolCalls?: Array<{ name: string; arguments: Record<string, any> }> }>;
}
10.2 Provider Implementations
Each LLM provider maps to its respective API:
- **OpenAI**: `POST /v1/chat/completions` with `response_format: { type: "json_object" }` when JSON mode requested
- **Anthropic**: `POST /v1/messages` with tool use for structured extraction
- **Google**: Gemini API with JSON schema in `generationConfig`
- **Ollama**: `POST /api/chat` with local models
11. Async API
11.1 AsyncMemory Class
Provide an async variant that wraps the synchronous Memory class (or implements natively with async I/O):
class AsyncMemory {
constructor(config?: MemoryConfig);
async add(messages, ...scope): Promise<{ results: MemoryEvent[] }>;
async search(query, ...scope): Promise<{ results: MemoryItem[] }>;
async get(memoryId): Promise<MemoryItem | null>;
async getAll(...scope): Promise<{ results: MemoryItem[] }>;
async update(memoryId, newText): Promise<void>;
async delete(memoryId): Promise<void>;
async deleteAll(...scope): Promise<void>;
async history(memoryId): Promise<HistoryRecord[]>;
async reset(): Promise<void>;
}
In languages with native async (Python asyncio, JavaScript), the async class should use async HTTP clients (aiohttp, fetch) for LLM and vector store calls rather than blocking.
12. REST API Wrapper (Optional Server Mode)
For serving memory as a standalone service:
12.1 Endpoints
POST /v1/memories/ — Add memories (body: { messages, user_id?, agent_id?, run_id?, metadata? })
GET /v1/memories/search/ — Search (query: q, user_id, limit)
GET /v1/memories/:id/ — Get single memory
GET /v1/memories/ — Get all memories (query: user_id, agent_id, run_id, limit)
PUT /v1/memories/:id/ — Update memory (body: { text })
DELETE /v1/memories/:id/ — Delete memory
DELETE /v1/memories/ — Delete all (query: user_id, agent_id, run_id)
GET /v1/memories/:id/history/ — Get history
POST /v1/reset/ — Reset all
POST /v1/entities/ — Get graph entities for scope
GET /v1/entities/:name/relations/ — Get entity relationships
12.2 Authentication
Bearer token authentication via Authorization: Bearer <token> header. Tokens can be project-scoped API keys.
13. Usage Examples
13.1 Basic Usage
```typescript
const memory = new Memory();

// Add memories from a conversation
const result = await memory.add(
  [
    { role: "user", content: "Hi, I'm Alice. I work at Acme Corp as a data scientist." },
    { role: "assistant", content: "Nice to meet you, Alice! What kind of data science work do you do?" },
    { role: "user", content: "Mostly NLP and recommendation systems. I prefer PyTorch over TensorFlow." }
  ],
  { user_id: "alice" }
);

console.log(result.results);
// [
//   { event: "ADD", id: "abc-123", new_memory: "User's name is Alice" },
//   { event: "ADD", id: "def-456", new_memory: "User works at Acme Corp as a data scientist" },
//   { event: "ADD", id: "ghi-789", new_memory: "User specializes in NLP and recommendation systems" },
//   { event: "ADD", id: "jkl-012", new_memory: "User prefers PyTorch over TensorFlow" }
// ]

// Search memories
const searchResults = await memory.search("What does Alice do?", { user_id: "alice" });
// Returns sorted by relevance: work info, specialization, etc.

// Later conversation updates a memory
await memory.add(
  [{ role: "user", content: "I just switched jobs. I'm now at BigTech Inc." }],
  { user_id: "alice" }
);
// Result: { event: "UPDATE", id: "def-456",
//   old_memory: "User works at Acme Corp as a data scientist",
//   new_memory: "User works at BigTech Inc as a data scientist" }

// Check history
const history = await memory.history("def-456");
// Shows ADD (original) then UPDATE (job change)
```
13.2 Multi-Scope Usage
// Agent-specific memories
await memory.add(messages, { user_id: "alice", agent_id: "code-helper" });
// Session-scoped (ephemeral, per conversation)
await memory.add(messages, { user_id: "alice", run_id: "session-20240315" });
// Search across a specific agent's memories for a user
const results = await memory.search("Python frameworks", {
user_id: "alice",
agent_id: "code-helper"
});
13.3 Custom Configuration
```typescript
const memory = new Memory({
  vector_store: {
    provider: "qdrant",
    config: { host: "localhost", port: 6333 }
  },
  llm: {
    provider: "anthropic",
    config: { model: "claude-sonnet-4-20250514", api_key: process.env.ANTHROPIC_API_KEY }
  },
  embedder: {
    provider: "openai",
    config: { model: "text-embedding-3-small", dimensions: 1536 }
  },
  graph_store: {
    provider: "neo4j",
    config: { url: "bolt://localhost:7687", username: "neo4j", password: "password" }
  }
});
```
13.4 With Filters
// Search with metadata filters
const results = await memory.search("project deadlines", {
user_id: "alice",
filters: {
AND: [
{ field: "category", operator: "eq", value: "work" },
{ field: "created_at", operator: "gte", value: "2024-01-01" }
]
}
});
14. Error Handling
14.1 Error Types
```typescript
class MemoryError extends Error {
  constructor(message: string, public code: string);
}

// Specific errors
class ScopeError extends MemoryError {}       // Missing user_id/agent_id/run_id
class VectorStoreError extends MemoryError {} // Backend connection/query failures
class LLMError extends MemoryError {}         // LLM API failures
class EmbeddingError extends MemoryError {}   // Embedding API failures
class NotFoundError extends MemoryError {}    // Memory ID not found
```
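A concrete implementation of this hierarchy might look as follows. This is a sketch: the spec mandates only the class names and the `code` property, and only two subclasses are fleshed out here (the others follow the same pattern); the `describe` helper and the specific `code` strings are assumptions.

```typescript
// Runnable sketch of the error hierarchy from 14.1.
class MemoryError extends Error {
  constructor(message: string, public code: string) {
    super(message);
    this.name = new.target.name; // subclass name, e.g. "ScopeError"
  }
}

class ScopeError extends MemoryError {
  constructor(message = "At least one of user_id, agent_id, or run_id is required") {
    super(message, "SCOPE_ERROR"); // code string is an illustrative choice
  }
}

class NotFoundError extends MemoryError {
  constructor(id: string) {
    super(`Memory ${id} not found`, "NOT_FOUND");
  }
}

// Callers branch on the subtype, falling back to the base class:
function describe(err: unknown): string {
  if (err instanceof ScopeError) return `scope problem: ${err.code}`;
  if (err instanceof MemoryError) return `memory error: ${err.code}`;
  return "unknown error";
}
```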
14.2 Retry Logic
LLM and embedding calls should implement exponential backoff retry:
```typescript
// Retry with exponential backoff. Only rate-limit (transient) errors are
// retried; anything else propagates immediately. isRateLimit and sleep are
// provider-specific helpers.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      if (!isRateLimit(error)) throw error; // don't retry non-transient errors
      await sleep(baseDelay * 2 ** attempt);
    }
  }
}
```
14.3 Graceful Degradation
- If the fact-extraction LLM call fails, return empty results (don't crash)
- If the embedding call fails for one fact, skip that fact and continue with the others
- If the history DB is unavailable, log a warning but continue with memory operations
- If the graph store is unavailable, skip graph extraction but complete vector operations
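The rules above share one pattern: run an optional subsystem call and substitute a fallback value instead of failing the whole operation. A minimal sketch (the `tryOr` helper name is an assumption, and plain `console.warn` stands in for a real logger):

```typescript
// Sketch of the degradation rules: attempt an optional subsystem call,
// fall back to a default value and a logged warning instead of crashing.
async function tryOr<T>(label: string, fn: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    console.warn(`${label} unavailable, continuing without it:`, err);
    return fallback;
  }
}

// Synchronous variant for non-async subsystems (e.g. local history DB writes).
function tryOrSync<T>(label: string, fn: () => T, fallback: T): T {
  try {
    return fn();
  } catch (err) {
    console.warn(`${label} unavailable, continuing without it:`, err);
    return fallback;
  }
}

// Usage sketch: a graph-store outage must not block vector operations.
// const graphOps = await tryOr("graph store", () => graph.extract(facts), []);
```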
15. Behavioral Test Cases
Memory CRUD
1. **Add single fact** — `add("My name is Bob", { user_id: "bob" })` → returns one ADD event with memory text "User's name is Bob"
2. **Add conversation** — `add([{role:"user",content:"..."},{role:"assistant",content:"..."}])` → extracts multiple facts, returns multiple ADD events
3. **Add with empty input** — `add("hello", { user_id: "x" })` → may return empty results if no extractable facts
4. **Search by semantics** — After adding "User likes Python", `search("programming languages")` → returns the Python memory with score > 0.5
5. **Search with limit** — `search(query, { limit: 3 })` → returns at most 3 results
6. **Get by ID** — After ADD, `get(returned_id)` → returns the memory item
7. **Get nonexistent** — `get("fake-id")` → returns null
8. **Get all for scope** — After adding 3 memories for user "alice", `get_all({ user_id: "alice" })` → returns all 3
9. **Update overwrites** — `update(id, "new text")` → `get(id).memory` equals "new text"
10. **Update changes hash** — After update, hash should equal `md5("new text")`
11. **Delete removes** — `delete(id)` → `get(id)` returns null
12. **Delete all for scope** — `delete_all({ user_id: "alice" })` → `get_all({ user_id: "alice" })` returns empty
13. **Reset clears everything** — `reset()` → all collections and history are empty
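The hash expectation above (the stored hash equals the MD5 of the memory text) can be checked with Node's built-in `crypto` module. A sketch, assuming a Node runtime; the `memoryHash` helper name is illustrative:

```typescript
import { createHash } from "node:crypto";

// Each memory's hash is the MD5 hex digest of its text, so an update to
// "new text" must leave the record's hash equal to md5("new text").
function memoryHash(text: string): string {
  return createHash("md5").update(text).digest("hex");
}
```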
Memory Update Intelligence
- Deduplication — Add “User likes Python” then add “User likes Python” again → second call returns NONE event
- Update on contradiction — Add “User lives in NYC” then add “User moved to San Francisco” → returns UPDATE event changing NYC to SF
- Merge on refinement — Add “User works in tech” then add “User works at Google as a senior engineer” → returns UPDATE with merged, more specific memory
- Delete on negation — Add “User is vegetarian” then add “User started eating meat again” → returns DELETE or UPDATE removing vegetarian claim
- Multiple events per add — Single conversation may produce multiple ADD + UPDATE events in one call
Scoping
19. **Scope isolation** — Memories added with `user_id: "alice"` are NOT returned when searching with `user_id: "bob"`
20. **Multi-scope filter** — Memories added with `{ user_id: "alice", agent_id: "helper" }` require BOTH fields to match in queries
21. **Missing scope error** — Calling `add(msg, {})` with no scope fields → throws ScopeError
22. **Run ID isolation** — Memories for `run_id: "session-1"` are separate from `run_id: "session-2"`
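The scoping rules above suggest a small validation helper: reject an empty scope, and turn every provided scope field into a mandatory query filter. A sketch; the `buildScopeFilter` name is an assumption, and `ScopeError` is redeclared minimally here rather than imported from the error module in section 14.

```typescript
interface Scope {
  user_id?: string;
  agent_id?: string;
  run_id?: string;
}

class ScopeError extends Error {}

// At least one scope field is required (missing-scope test), and all provided
// fields become mandatory filters (multi-scope filter test).
function buildScopeFilter(scope: Scope): Record<string, string> {
  const entries = Object.entries(scope).filter(
    ([, v]) => typeof v === "string" && v.length > 0
  );
  if (entries.length === 0) {
    throw new ScopeError("At least one of user_id, agent_id, or run_id is required");
  }
  return Object.fromEntries(entries);
}
```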
History
23. **ADD creates history** — After `add()`, `history(memory_id)` returns one record with event "ADD"
24. **UPDATE appends history** — After `update()`, history has ADD then UPDATE records
25. **DELETE marks in history** — After `delete()`, history shows DELETE with `is_deleted: true`
26. **History ordered by time** — History records are returned in chronological order
Filters
27. **Equals filter** — `search(query, { filters: { field: "tag", operator: "eq", value: "work" } })` → only returns memories with tag "work"
28. **In filter** — `operator: "in", value: ["a","b"]` matches records where field is "a" or "b"
29. **AND composition** — Both conditions must match
30. **OR composition** — Either condition matches
31. **NOT negation** — Excludes matching records
32. **Contains string** — `operator: "contains", value: "Python"` matches "User likes Python for ML"
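For an in-memory backend, the filter grammar exercised by these tests can be evaluated with a small recursive function. A sketch: the condition shape follows the examples in 13.4, but the exact `NOT` wrapper shape and the `matches` helper name are assumptions.

```typescript
type Condition = {
  field: string;
  operator: "eq" | "in" | "gte" | "contains";
  value: any;
};
type Filter = Condition | { AND: Filter[] } | { OR: Filter[] } | { NOT: Filter };

// Recursively evaluate a filter expression against one record's metadata.
function matches(filter: Filter, record: Record<string, any>): boolean {
  if ("AND" in filter) return filter.AND.every((f) => matches(f, record));
  if ("OR" in filter) return filter.OR.some((f) => matches(f, record));
  if ("NOT" in filter) return !matches(filter.NOT, record);
  const v = record[filter.field];
  switch (filter.operator) {
    case "eq": return v === filter.value;
    case "in": return Array.isArray(filter.value) && filter.value.includes(v);
    case "gte": return v >= filter.value;
    case "contains": return typeof v === "string" && v.includes(filter.value);
  }
}
```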
Graph Memory
33. **Entity extraction** — After adding conversation about "Alice at Google", graph contains entities "Alice" (person) and "Google" (organization)
34. **Relationship extraction** — Graph contains relationship "Alice" —works_at—> "Google"
35. **Graph-enhanced search** — Search that matches a graph entity also returns related memories from connected entities
Error Handling
36. **LLM failure graceful** — If LLM API is down, `add()` returns empty results (no crash)
37. **Partial failure continues** — If embedding fails for one of 3 facts, the other 2 are still processed
38. **Invalid scope rejected** — Empty scope object throws descriptive error
Custom Configuration
39. **Custom extraction prompt** — Providing `prompt` parameter to `add()` changes the fact extraction behavior
40. **Custom LLM provider** — Memory works with Anthropic/Google/Ollama as LLM backend
41. **Custom vector store** — Memory works with Qdrant/pgvector/ChromaDB backends
42. **Default config works** — `new Memory()` with no config uses in-memory store and OpenAI defaults
16. Implementation Priorities
Phase 1: Core (MVP)
- Memory class with add/search/get/get_all/update/delete
- In-memory vector store
- OpenAI LLM + embedder
- SQLite history
- Fact extraction + update decision pipeline
Phase 2: Production Backends
- Qdrant vector store backend
- pgvector backend
- ChromaDB backend
- Filter expression system with backend translation
Phase 3: Advanced Features
- Graph memory (Neo4j)
- Async API
- REST server wrapper
- Additional LLM providers (Anthropic, Google, Ollama)
- Additional vector store backends
Phase 4: Optimization
- Batch embedding for multiple facts
- Connection pooling for vector stores
- LLM response caching for identical conversations
- Configurable concurrency for parallel fact processing
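The batch-embedding and concurrency items above can be sketched as a bounded parallel map. This is illustrative only: the `mapWithConcurrency` name, the worker-pool strategy, and the limit value are not part of the spec.

```typescript
// Process items in parallel with a configurable concurrency cap, e.g. for
// embedding many extracted facts at once. Results keep the input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Usage sketch (embedFact is hypothetical):
// const vectors = await mapWithConcurrency(facts, 4, embedFact);
```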