# MemoryOS

A retrieval-focused personal memory system built with FastAPI, React, PostgreSQL, and pgvector.
## Overview
MemoryOS is a personal memory management system: users store notes, experiences, and knowledge fragments, and retrieve them via semantic search. The backend handles ingestion, embedding, vector storage, and ranked retrieval. The frontend provides a simple interface for capture and search. The core design goal was retrieval quality — not just returning results, but returning the right results with explainable relevance.
## Stack
- Backend: Python, FastAPI, SQLAlchemy
- Database: PostgreSQL + pgvector extension
- Embeddings: OpenAI `text-embedding-3-small` (swappable interface)
- Frontend: React, TypeScript
- Search: hybrid — vector similarity + keyword filter
- Deployment: Docker Compose
## Key Design Decisions
**pgvector over a dedicated vector DB** — Keeping embeddings in PostgreSQL alongside relational metadata avoids the operational complexity of a separate vector database. The ivfflat index handles search at this scale. If the dataset grows past a few million records, migrating to a dedicated store is straightforward because the embedding interface is abstracted.

**Hybrid retrieval** — Pure vector search returns semantically similar results but can miss exact-match keyword queries. Pure keyword search misses paraphrase and concept overlap. The system runs both in parallel and merges results by a weighted score, with the keyword filter acting as a hard gate for time-based or tag-based constraints.

**Chunking strategy** — Long notes are chunked by paragraph with 10% overlap. Each chunk stores a back-reference to the source document. Retrieval returns chunks, but the UI always surfaces the full document context so users aren't shown fragments without origin.

**Embedding model as a dependency** — The embedding client is injected at startup. Swapping from OpenAI to a local model (e.g., sentence-transformers) requires changing one config value, not refactoring call sites.
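As an illustration of this setup (not the project's actual migrations), a chunks table with a 1536-dimension embedding column — the output size of `text-embedding-3-small` — and an ivfflat index might look like:

```sql
-- Illustrative schema; table and column names are assumptions.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    note_id   bigint NOT NULL,
    content   text NOT NULL,
    embedding vector(1536)
);

-- Approximate nearest-neighbour index; `lists` is a tuning knob.
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
```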
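A minimal sketch of that weighted merge, assuming both retrievers return normalized scores. The names (`merge_results`, `alpha`, `hard_filter`) are illustrative, not MemoryOS's actual API:

```python
def merge_results(vector_hits, keyword_hits, alpha=0.7, hard_filter=None):
    """Merge two score maps (doc_id -> score in [0, 1]) into one ranking.

    alpha weights the vector score; (1 - alpha) weights the keyword score.
    hard_filter, if given, is a set of doc_ids that gates the candidates,
    e.g. ids matching a tag or time-range constraint.
    """
    doc_ids = set(vector_hits) | set(keyword_hits)
    if hard_filter is not None:
        doc_ids &= hard_filter  # hard gate: drop anything outside the filter
    merged = {
        doc_id: alpha * vector_hits.get(doc_id, 0.0)
        + (1 - alpha) * keyword_hits.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

A document found by only one retriever still ranks, just with a zero contribution from the other side; the hard gate runs before scoring so constraints can never be outweighed.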
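A sketch of that chunking step, under the assumption that paragraphs are separated by blank lines; `chunk_note` and its signature are illustrative, not the project's real interface:

```python
def chunk_note(note_id, text, overlap_ratio=0.1):
    """Split a note into paragraph chunks.

    Each chunk after the first is prefixed with the trailing ~10% of the
    previous paragraph, and every chunk keeps a back-reference to the
    source note so the UI can surface full document context.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for i, para in enumerate(paragraphs):
        if i > 0:
            prev = paragraphs[i - 1]
            # Carry over the last overlap_ratio of the previous paragraph.
            tail = prev[-max(1, int(len(prev) * overlap_ratio)):]
            para = tail + " " + para
        chunks.append({"source_id": note_id, "text": para})
    return chunks
```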
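One way to sketch that injection seam — `EmbeddingClient`, `FakeEmbeddings`, and `make_embedder` are hypothetical names; only the idea (call sites depend on a small interface, a config value picks the backend) comes from the text:

```python
from typing import Protocol


class EmbeddingClient(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class FakeEmbeddings:
    """Deterministic stand-in so the sketch runs offline; a real backend
    would wrap the OpenAI API or a sentence-transformers model."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 0.0] for t in texts]


def make_embedder(provider: str) -> EmbeddingClient:
    # One config value selects the backend; the app injects the result at
    # startup so call sites never branch on the provider themselves.
    if provider == "fake":
        return FakeEmbeddings()
    raise ValueError(f"unknown embedding provider: {provider!r}")
```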
## Challenges
The hardest retrieval problem was temporal relevance: a semantically close result from 3 years ago is often less useful than a moderately relevant result from last week. Adding a recency decay factor to the ranking score helped, but the right decay coefficient came from tuning against real user feedback rather than from a principled formula.
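A sketch of a recency-decayed ranking score. The half-life parameterisation and the 90-day default are illustrative — as noted above, the real coefficient was tuned on user feedback:

```python
def rank_score(similarity: float, age_days: float,
               half_life_days: float = 90.0) -> float:
    """Scale a similarity score in [0, 1] by an exponential recency decay
    that halves the score every half_life_days."""
    return similarity * 0.5 ** (age_days / half_life_days)
```

With these numbers, a 0.6-similarity result from last week outscores a 0.9-similarity result from three years ago, which is the behaviour the tuning aimed for.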
The second issue was chunking granularity. Chunks that were too small lost context; chunks that were too large diluted the embedding signal. Paragraph-level chunking with overlap was a reasonable default, but the system also exposes chunk size as a configurable parameter.
## Results
- Semantic search latency: <100ms p95 for collections up to ~50k chunks (pgvector ivfflat, single-node Postgres)
- Retrieval precision: qualitatively improved with hybrid search vs. vector-only on a test set of 200 queries
- Full stack running in Docker Compose with one `docker compose up`