Clean-Room Specification: Full-Stack AI Memory Platform with Hybrid Search
Purpose of This Document
This document specifies the architecture for a full-stack AI memory platform that ingests, chunks, embeds, and retrieves content from multiple sources using hybrid search (combining vector similarity with full-text keyword matching and recency scoring). The platform includes a web application for managing memories and spaces, a browser extension for capturing content from web pages, and an MCP (Model Context Protocol) server for integration with AI assistants. The system handles diverse content types (text, markdown, HTML, PDFs, images, video, code), organizes memories into hierarchical spaces, supports memory versioning and auto-forgetting, and provides a REST API for programmatic access. This specification enables independent implementation from scratch.

1. System Overview
1.1 Core Concept
This platform acts as a second brain — users save content from anywhere (browser, API, integrations), the system processes and indexes it, and AI assistants can later recall relevant memories through natural language queries. The key differentiator is hybrid search: combining semantic vector similarity with traditional full-text search and time-based recency scoring for more accurate retrieval than vector-only approaches.

1.2 High-Level Architecture
1.3 Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Web App | Next.js (App Router) | User-facing dashboard |
| Browser Extension | WXT (cross-browser framework) | Content capture from web pages |
| MCP Server | Node.js / Cloudflare Workers | AI assistant integration |
| API | REST over HTTPS | Programmatic access |
| Relational DB | PostgreSQL | Users, documents, spaces, metadata |
| Vector DB | Qdrant | Embeddings and similarity search |
| Edge Cache | Key-Value store (Redis/KV) | Frequently accessed results |
| Embeddings | OpenAI text-embedding-3-small | Vector generation |
| LLM | OpenAI GPT-4o-mini | Summarization, metadata extraction |
2. Data Model
2.1 Core Entities
Organization
Project (formerly Space)
Projects organize memories into logical groups. Users can have multiple projects.

Document
The top-level content unit. A document represents a single piece of saved content (a web page, a note, an uploaded file).

Memory
A processed, searchable representation of a document or document section. Multiple memories can come from a single document (one per chunk).

Chunk (Vector Store Record)
Stored in Qdrant with the embedding vector.

2.2 PostgreSQL Schema
3. Ingestion Pipeline
3.1 Overview
When content enters the system (via API, browser extension, or integration sync), it flows through a multi-stage pipeline.

3.2 Content Extraction

Different content types require different extraction strategies:

| Content Type | Extraction Method |
|---|---|
| text/markdown | Pass through (strip excessive whitespace) |
| HTML | Parse with DOM parser, extract main content (strip nav, footer, scripts), convert to markdown |
| PDF | Extract text via PDF parser (pdfjs-dist or similar), preserve page boundaries |
| Image | OCR via vision model (send image to GPT-4o with “Extract all text from this image”) |
| Video | Transcription via Whisper API or similar speech-to-text |
| Code | Preserve as-is with language detection, optionally parse AST for structure |
| JSON | Pretty-print and extract human-readable fields |
| Tweet/Social | Extract text, author, date, engagement metrics from structured data |
For HTML, the cleanup steps are:

- Parse HTML into DOM
- Remove `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>` elements
- Attempt to find an `<article>` or `<main>` element — if found, use its content
- If no article/main, use `<body>` content
- Convert remaining HTML to markdown (preserve links, headings, lists, bold/italic)
- Collapse multiple blank lines into a single blank line
- Trim to reasonable length (configurable max, default 100,000 characters)
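The cleanup steps above can be sketched as follows. This dependency-free version uses regexes in place of a real DOM parser and skips the markdown-conversion step, so treat it as an illustration of the flow rather than a production extractor:

```typescript
// Simplified HTML cleanup: strip noisy subtrees, prefer <article>/<main>,
// collapse blank lines, and cap the length. A production implementation
// would use a real DOM parser and an HTML-to-markdown converter.
function cleanHtml(html: string, maxLen = 100_000): string {
  // Drop <script>/<style>/<nav>/<footer>/<header> subtrees entirely.
  let s = html.replace(/<(script|style|nav|footer|header)\b[\s\S]*?<\/\1>/gi, "");

  // Prefer <article> or <main> content; fall back to <body>, then the whole input.
  const article = s.match(/<(article|main)\b[^>]*>([\s\S]*?)<\/\1>/i);
  const body = s.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i);
  s = article ? article[2] : body ? body[1] : s;

  // Markdown conversion elided: just strip the remaining tags.
  s = s.replace(/<[^>]+>/g, " ");

  // Collapse runs of blank lines and trim to the configured maximum.
  s = s.replace(/\n{3,}/g, "\n\n").trim();
  return s.slice(0, maxLen);
}
```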
3.3 Deduplication
Before processing, check if content already exists:

- Compute SHA-256 hash of the cleaned content
- Query PostgreSQL: `SELECT id FROM documents WHERE content_hash = $1 AND project_id = $2`
- If a match is found:
  - If `updates_memory_id` is set, treat as a version update (link to the previous version)
  - Otherwise, skip ingestion and return the existing document ID
- If no match, proceed with ingestion
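A minimal sketch of the dedup check. The `db.query` interface here is a hypothetical stand-in for whatever PostgreSQL client the implementation uses (the query string itself comes from the steps above):

```typescript
import { createHash } from "node:crypto";

// SHA-256 over the cleaned content, hex-encoded for storage in content_hash.
function contentHash(cleaned: string): string {
  return createHash("sha256").update(cleaned, "utf8").digest("hex");
}

interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: { id: string }[] }>;
}

// Returns the existing document id on a hash match, or null to proceed with ingestion.
async function findDuplicate(db: Db, cleaned: string, projectId: string): Promise<string | null> {
  const { rows } = await db.query(
    "SELECT id FROM documents WHERE content_hash = $1 AND project_id = $2",
    [contentHash(cleaned), projectId],
  );
  return rows[0]?.id ?? null;
}
```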
3.4 Chunking
Split content into chunks suitable for embedding (target ~512 tokens per chunk). The chunker splits at natural boundaries — section headings and paragraph breaks for prose, construct keywords (function, class, def, fn, etc.) for code — rather than mid-sentence, with a small overlap between adjacent chunks for context continuity.
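A character-based sketch of the chunker. It approximates tokens at roughly 4 characters each and splits only at paragraph breaks with tail overlap; the 4-chars-per-token ratio and the paragraph-only boundary detection are simplifying assumptions (the full algorithm also honors headings and code constructs):

```typescript
// Greedy paragraph packing with tail overlap. Targets ~512 tokens per chunk
// using a rough 4-characters-per-token heuristic.
function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  const maxChars = maxTokens * 4;
  const overlapChars = overlapTokens * 4;
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = "";

  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      // Seed the next chunk with the tail of this one for context continuity.
      current = current.slice(-overlapChars);
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) chunks.push(current);
  // Note: a paragraph longer than maxChars is emitted whole here;
  // a production chunker would hard-split it at sentence boundaries.
  return chunks;
}
```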
3.5 Embedding Generation
For each chunk, generate an embedding vector with the embedding model (text-embedding-3-small).

3.6 Summarization

Use the LLM to generate a 2-3 sentence summary of the entire document.

3.7 Metadata Extraction

Use the LLM to extract structured metadata (tags, category, language) from the content.

3.8 Storage
After processing, store in all three data stores:

- PostgreSQL: Insert `document` and `memory` (one per chunk) records
- Qdrant: Upsert vectors with payload (one point per chunk)
- Edge Cache: Invalidate any cached results for the affected project
4. Hybrid Search Algorithm
4.1 Overview
The search system combines three signals:

- vector_score: Cosine similarity from Qdrant (0 to 1)
- text_score: Full-text keyword match score (0 to 1, normalized)
- recency_bonus: Time-decay bonus for newer content (0 to 0.1)
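Using the 0.6/0.4 weights fixed by the score-formula test case in section 13, the fusion of these three signals can be sketched as:

```typescript
interface ScoredChunk {
  documentId: string;
  vectorScore: number;  // cosine similarity from Qdrant, 0..1
  textScore: number;    // normalized full-text score, 0..1
  recencyBonus: number; // time-decay bonus, 0..0.1
}

// Normalize raw full-text scores against the best score in the batch.
function normalizeTextScores(raw: number[]): number[] {
  const max = Math.max(...raw, 0);
  return max === 0 ? raw.map(() => 0) : raw.map((s) => s / max);
}

// final_score = (vector × 0.6) + (text × 0.4) + recency_bonus
function finalScore(c: ScoredChunk): number {
  return c.vectorScore * 0.6 + c.textScore * 0.4 + c.recencyBonus;
}
```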
4.2 Vector Search
Query Qdrant with the embedding of the search query.

4.3 Full-Text Search
Query Qdrant’s built-in full-text search (or a separate text index) with the raw query string. Raw text scores are normalized within each result batch: `normalized = score / max_score_in_batch`.
4.4 Recency Bonus
Calculate a time-decay bonus that gives a slight edge to more recent content:

- Created today: +0.1 bonus
- Created 30 days ago: +0.037 bonus
- Created 90 days ago: +0.005 bonus
- Created 1 year ago: ~0 bonus
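The sample values above are consistent with exponential decay using a 30-day time constant. The exact curve is an assumption (the spec only pins the sample points), but this form reproduces them:

```typescript
// recency_bonus = 0.1 * e^(-age_days / 30)
// day 0 → 0.100, day 30 → ~0.037, day 90 → ~0.005, day 365 → ~0
function recencyBonus(ageDays: number): number {
  return 0.1 * Math.exp(-ageDays / 30);
}
```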
4.5 Score Fusion
Merge results from vector and text search: `final_score = (vector_score × 0.6) + (text_score × 0.4) + recency_bonus`.

4.6 Result Grouping
After scoring, group results by document to avoid returning multiple chunks from the same document; the best chunk score represents the document in the ranking.

4.7 Edge Caching
Cache frequently queried results in the edge key-value store; identical queries within the cache TTL (5 minutes) are served from cache.

5. REST API (v3)
5.1 Authentication
All API requests require a Bearer token in the Authorization header.

5.2 Endpoints
POST /v3/memory — Save Content
Save new content to the memory store. Request fields:

- `content`: required, string, min length 1, max length 500000
- `content_type`: optional, must be one of the ContentType enum values
- `project_id`: required if the user has multiple projects, otherwise uses the default project
- `forget_after`: optional, must be an ISO 8601 future date
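A client-side sketch of the call. The base URL is a placeholder and the request-builder shape is illustrative; only the path, method, Bearer auth, and field constraints come from the spec:

```typescript
interface SaveMemoryRequest {
  content: string;        // required, 1..500000 chars
  content_type?: string;  // optional ContentType enum value
  project_id?: string;    // required when the user has multiple projects
  forget_after?: string;  // optional ISO 8601 future date
}

function buildSaveRequest(
  apiKey: string,
  body: SaveMemoryRequest,
  baseUrl = "https://api.example.com", // placeholder host
) {
  if (body.content.length < 1 || body.content.length > 500_000) {
    throw new Error("content must be 1..500000 characters");
  }
  return {
    url: `${baseUrl}/v3/memory`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage: const { url, init } = buildSaveRequest(key, { content: "note" });
//        const res = await fetch(url, init);
```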
POST /v3/recall — Search Memories
Retrieve relevant memories using hybrid search.

GET /v3/projects — List Projects
POST /v3/projects — Create Project
GET /v3/documents/:id — Get Document
Returns the full document with all chunks.

DELETE /v3/documents/:id — Delete Document

Removes the document, all associated memories, and all Qdrant vectors.

GET /v3/memory-graph — Knowledge Graph View

Returns entity-relationship data for visualization.

GET /v3/whoami — Get Current User
6. MCP Server
6.1 Overview
The MCP server allows AI assistants (Claude, ChatGPT) to read and write memories through the Model Context Protocol. It runs as a separate process communicating via JSON-RPC over stdin/stdout.

6.2 Tool Definitions
memory — Save Content
recall — Search Memories
listProjects — List Available Projects
memory-graph — Get Knowledge Graph
whoAmI — Get User Info
6.3 MCP Server Implementation
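A minimal dispatcher sketch for the stdio JSON-RPC loop. It does not use the official MCP SDK, the handler bodies are stubs, and only the tool names come from section 6.2; a real server would proxy each tool call to the REST API from section 5:

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// Stub handlers keyed by the tool names from the tool definitions.
const tools: Record<string, ToolHandler> = {
  whoAmI: async () => JSON.stringify({ user: "demo", organization: "demo-org" }),
  listProjects: async () => JSON.stringify([{ name: "default", documents: 0 }]),
};

// Dispatch one JSON-RPC request to a tool, returning a JSON-RPC response object.
async function handleRequest(req: { id: number; method: string; params?: any }) {
  if (req.method === "tools/call") {
    const handler = tools[req.params?.name];
    if (!handler) {
      return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "unknown tool" } };
    }
    const text = await handler(req.params?.arguments ?? {});
    return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
  }
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "unknown method" } };
}

// Wiring to stdin/stdout (one message per line) would look like:
// readline.createInterface({ input: process.stdin }).on("line", async (line) =>
//   process.stdout.write(JSON.stringify(await handleRequest(JSON.parse(line))) + "\n"));
```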
6.4 MCP Configuration
Users configure the MCP server in their AI assistant’s settings.

7. Browser Extension
7.1 Overview
The browser extension lets users save web content to memory with one click. Built with WXT (cross-browser extension framework) targeting Chrome, Firefox, and Safari.

7.2 Architecture
7.3 Content Scripts
Platform-specific content scripts provide enhanced extraction:

| Platform | Script | Extraction Strategy |
|---|---|---|
| Twitter/X | twitter.content.ts | Extract tweet text, author, media, thread context |
| GitHub | github.content.ts | Extract README, issue/PR body, code files |
| YouTube | youtube.content.ts | Extract title, description, transcript (if available) |
| Google Docs | gdocs.content.ts | Extract document content via DOM |
| Default | generic.content.ts | Readability-based article extraction |
7.4 Popup UI
A small popup appears when the user clicks the extension icon.

7.5 Context Menu Integration

Add a right-click context menu item for saving the current selection.

8. Web Application
8.1 Overview
The web app provides a dashboard for managing memories, browsing projects, searching, and configuring integrations.

8.2 Route Structure (Next.js App Router)
8.3 Dashboard Features
- Recent memories: Last 20 saved items with titles, summaries, and timestamps
- Project list: All projects with document counts
- Search bar: Global hybrid search
- Memory graph: Interactive visualization of topic connections (using D3.js or react-force-graph)
- Stats: Total memories, memories this week, storage used
8.4 Memory Detail View
When viewing a single document:

- Full content with syntax highlighting for code
- Metadata sidebar (tags, source URL, content type, dates)
- Version history (if an `updates_memory_id` chain exists)
- Related memories (semantic neighbors)
- Edit/delete controls
9. Memory Versioning
9.1 Version Chain
When content at the same URL or with the same title is saved again, the system can create a version chain.

9.2 Version Navigation

The API returns the version chain when querying a document.

10. Auto-Forgetting
10.1 Mechanism
Documents with a `forget_after` timestamp are automatically deleted by a periodic background job.
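The expiry predicate the cleanup job applies can be sketched as follows (the job loop and the actual DELETE against PostgreSQL/Qdrant are elided):

```typescript
interface ForgettableDoc {
  id: string;
  forget_after: string | null; // ISO 8601 timestamp, or null for "keep forever"
}

// A document is expired once its forget_after timestamp is in the past.
function isExpired(doc: ForgettableDoc, now: Date): boolean {
  return doc.forget_after !== null && Date.parse(doc.forget_after) < now.getTime();
}

// The periodic job deletes every expired document (plus its memories and vectors).
function selectExpired(docs: ForgettableDoc[], now: Date): string[] {
  return docs.filter((d) => isExpired(d, now)).map((d) => d.id);
}
```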
10.2 User Controls
Users can set `forget_after` via:

- API: `"forget_after": "2025-06-01T00:00:00Z"` in POST /v3/memory
- Browser extension: “Auto-forget after 30 days” checkbox
- Web UI: Edit document settings
11. Platform Integrations
11.1 Connection Model
External platform integrations sync content into the memory store. Each integration uses OAuth2 for authentication and periodic syncing.

11.2 Supported Platforms
| Platform | Sync Strategy | Content Extracted |
|---|---|---|
| Google Drive | Incremental (changes API) | Document text, spreadsheet data |
| Notion | Incremental (search API) | Page content, database entries |
| GitHub | Webhook + periodic | README, issues, PRs, code files |
| Twitter/X | Bookmarks API | Bookmarked tweet text and threads |
| Slack | Saved messages API | Saved/bookmarked messages |
11.3 Sync Architecture
- Fetch new/changed items since `connection.last_synced_at`
- For each item, run through the ingestion pipeline
- Update `connection.last_synced_at`
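The three-step sync loop above, with the platform fetcher and the ingestion pipeline injected as callbacks (the function and parameter names are illustrative, not from the spec):

```typescript
interface Connection {
  platform: string;
  last_synced_at: string; // ISO 8601 watermark
}

interface SyncItem {
  id: string;
  content: string;
}

// Step 1: fetch changes since the watermark; step 2: ingest each item;
// step 3: advance the watermark. Returns how many items were processed.
async function runSync(
  conn: Connection,
  fetchChanges: (since: string) => Promise<SyncItem[]>,
  ingest: (item: SyncItem) => Promise<void>,
  now: () => string = () => new Date().toISOString(),
): Promise<number> {
  const items = await fetchChanges(conn.last_synced_at);
  for (const item of items) {
    await ingest(item); // runs the full ingestion pipeline from section 3
  }
  conn.last_synced_at = now();
  return items.length;
}
```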
12. Error Handling and Reliability
12.1 API Validation
All API inputs are validated with Zod schemas.

12.2 Retry Strategy

External API calls (embedding, LLM, Qdrant) use retry with exponential backoff.

12.3 Ingestion Queue

For high-volume ingestion, use a job queue (Bull/BullMQ with Redis, or a simple database-backed queue).

13. Behavioral Test Cases
Ingestion
- Save plain text — POST /v3/memory with text content → returns document with summary and chunks
- Save HTML — HTML content is cleaned, scripts/nav removed, converted to searchable text
- Save PDF — PDF content is extracted to text, chunked, and embedded
- Save code — Code is preserved with language metadata, chunked by function boundaries
- Deduplication — Saving identical content twice (same hash) → second call returns existing document
- Version chain — Saving updated content for same URL → creates linked version
- Chunking respects boundaries — Long document is split at section/paragraph breaks, not mid-sentence
- Chunk overlap — Adjacent chunks share ~50 tokens of overlap for context continuity
- Metadata extraction — LLM extracts tags, category, language from content automatically
- Summary generation — Every saved document gets an LLM-generated 2-3 sentence summary
Hybrid Search
- Vector-only match — Query semantically similar but no keyword overlap → returns results (vector score carries it)
- Keyword-only match — Query with exact keyword match but different semantic meaning → returns results (text score)
- Hybrid boost — Result with both vector AND text match scores higher than either alone
- Recency bonus — Between two equally relevant results, the newer one scores slightly higher
- Score formula — `final_score = (vector × 0.6) + (text × 0.4) + recency_bonus` is correctly computed
- Result grouping — Multiple chunks from same document are grouped, best chunk score used for ranking
- Project scoping — Search in project A does not return results from project B
- Cross-project search — Searching without project_id returns results from all user projects
- Empty query — Returns most recent memories (ordered by created_at desc)
- Filter by content type — Can filter search to only HTML, only code, etc.
Edge Caching
- Cache hit — Identical query within 5 minutes returns cached results (faster response)
- Cache invalidation — After adding new content to a project, cache is cleared for that project
- Cache miss — New query goes to vector + text search (slower response)
MCP Server
- memory tool — Saves content and returns confirmation with title and chunk count
- recall tool — Returns formatted search results with titles, scores, and content previews
- listProjects tool — Returns all user projects with names and document counts
- memory-graph tool — Returns nodes and edges for knowledge visualization
- whoAmI tool — Returns user info and organization
Browser Extension
- Save full page — Clicking extension icon saves entire page content
- Save selection — Right-click selected text → saves only selected text
- Twitter extraction — On Twitter, extracts tweet text, author, and thread context
- GitHub extraction — On GitHub, extracts README or issue body with formatting preserved
- Project selection — User can choose target project from popup dropdown
Memory Management
- Auto-forget — Document with `forget_after` in the past is automatically deleted by cleanup job
- Manual delete — DELETE /v3/documents/:id removes document, chunks, and vectors
- Version history — Document with updates_memory_id chain shows full version list
- Last accessed tracking — Search results update `last_accessed_at` on returned documents
API Validation
- Missing content — POST /v3/memory with empty content → 400 error with description
- Invalid project ID — Non-existent project_id → 404 error
- Rate limiting — More than 100 requests per minute → 429 Too Many Requests
- Auth required — Request without Bearer token → 401 Unauthorized
Error Recovery
- Embedding API failure — If OpenAI embedding fails, retry 3 times then queue for later
- Qdrant unavailable — If vector DB is down, save to PostgreSQL and queue for indexing when available
- Partial ingestion — If 3 of 5 chunks embed successfully, save those 3 and retry the other 2
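The retry behavior these cases describe (retry, then hand the job off to a queue) can be sketched as follows; the delay schedule and the injectable `sleep` parameter are illustrative choices, not mandated by the spec:

```typescript
// Retry an async operation with exponential backoff. After the final failure
// the error propagates so the caller can queue the job for later processing.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseMs = 250,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < retries) await sleep(baseMs * 2 ** attempt); // 250, 500, 1000 ms
    }
  }
  throw lastErr;
}
```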
14. Implementation Priorities
Phase 1: Core Platform (MVP)
- PostgreSQL schema + Qdrant collection setup
- Ingestion pipeline (extract → chunk → embed → store)
- Hybrid search with score fusion
- REST API (memory + recall + projects endpoints)
- Basic web app (dashboard, search, project list)
Phase 2: AI Integration
- MCP server (memory + recall tools)
- Summary and metadata extraction
- Knowledge graph view
Phase 3: Browser Extension
- WXT extension with popup UI
- Generic content extraction
- Platform-specific scripts (Twitter, GitHub)
- Context menu integration
Phase 4: Advanced Features
- Memory versioning chain
- Auto-forgetting cleanup job
- Edge caching layer
- Platform integrations (Google Drive, Notion, GitHub sync)
Phase 5: Scale & Polish
- Ingestion job queue for async processing
- Connection pooling and query optimization
- Full-text search index optimization
- Export/import functionality