Architecture
Memoturn runs entirely on Cloudflare Workers. There is no application server; everything below the load balancer is V8 isolates and edge-resident state. That’s what makes per-project write serialization, sub-second WebSocket fan-out, and global low latency tractable in the same system.
Data flow
    your agent → /v1/mcp ─┐
    your SDK   → /v1/mcp ─┤ edge worker ──▶ ProjectDO ──▶ ingest queue ──▶ ingest worker
    your CLI   → /v1/mcp ─┘                     │                                │
                                                ▼                                ▼
                                        WebSocket fan-out               vector + SQL stores

- The edge worker (api.memoturn.ai) authenticates the request and routes it to a project-scoped Durable Object.
- ProjectDO is the single point of write serialization for that project. It deduplicates on a content hash, hot-caches recent turns, fans broadcasts out over hibernating WebSockets, and enqueues durable work onto the ingest queue.
- The ingest worker drains the queue: generates a contextual synopsis via LLM (prepended to the embedding input for disambiguation), embeds the content, upserts the vector, writes the canonical row, extracts entities, and runs fact + candidate extraction via LLM (structured triples and durable insights staged for review).
- Rolling summaries run on a timer per session: every ~20 turns or 10 minutes the DO consolidates the session into a summary memory and broadcasts a consolidation_completed event.
- Daily cron (03:00 UTC) prunes old events, enforces retention, and runs memory consolidation: clustering old session summaries into durable semantic-tier reflections.
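To make the ProjectDO write path concrete, here is a minimal, language-neutral sketch in Python of deduplicating on a content hash, keeping a small hot cache, and enqueueing work. It is an illustration only, not Memoturn's implementation: the class name `ProjectWriteBuffer`, the use of SHA-256, the cache capacity, and the choice to deduplicate only against the hot window are all assumptions.

```python
import hashlib
from collections import OrderedDict, deque


class ProjectWriteBuffer:
    """Toy stand-in for one project's serialized write path (hypothetical)."""

    def __init__(self, hot_capacity: int = 3):
        self.hot = OrderedDict()          # content hash -> turn text, oldest first
        self.hot_capacity = hot_capacity  # assumed size of the hot cache
        self.queue = deque()              # stands in for the ingest queue

    def write(self, content: str) -> bool:
        """Return True if the turn was accepted, False if it was a duplicate."""
        # SHA-256 is an assumption; the docs only say "a content hash".
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self.hot:
            return False                  # same hash already in the hot window
        self.hot[digest] = content
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the oldest hot turn
        self.queue.append(digest)         # durable work happens downstream
        return True


buf = ProjectWriteBuffer()
accepted = [buf.write(t) for t in ["fix auth bug", "fix auth bug", "add retry"]]
print(accepted, len(buf.queue))  # prints [True, False, True] 2
```

Because a single object handles every write for a project, the check-then-insert above needs no locking; that is the property the Durable Object provides in the real system.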
Hybrid retrieval
search_memory runs five retrieval legs in parallel and fuses their rankings with reciprocal rank fusion (RRF):
| leg | wins on |
|---|---|
| dense (vector ANN) | semantic match: paraphrases, related concepts |
| lexical (Postgres FTS) | keyword and boolean queries |
| hot (DO storage scan) | very fresh writes, before ingest catches up |
| entity (structured lookup) | exact identifier / file path / error code matches |
| graph (entity co-occurrence BFS) | related concepts connected in the knowledge graph |
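
The fusion step can be sketched as follows. This is a generic reciprocal rank fusion over per-leg rankings, not Memoturn's code: the function name `rrf_fuse`, the memory ids, and the constant `k = 60` (the value commonly used for RRF) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(legs: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each leg adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in legs.values():
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from the five legs (best match first).
legs = {
    "dense": ["m1", "m2", "m3"],
    "lexical": ["m2", "m4"],
    "hot": ["m5"],
    "entity": ["m2"],
    "graph": [],
}
fused = rrf_fuse(legs)
print([memory_id for memory_id, _ in fused])  # prints ['m2', 'm1', 'm5', 'm4', 'm3']
```

Here m2 wins because three legs agree on it, even though only two of them rank it first. RRF uses ranks rather than raw scores, so legs with incomparable scoring scales can be combined without calibration.
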
Post-fusion, results are weighted by salience (0.0–1.0 per memory, decaying exponentially by kind-specific half-life, boosted on recall) and then rescored by a cross-encoder (@cf/baai/bge-reranker-v2-m3) that blends 60% reranker score with 40% normalized RRF. An in-memory query cache (60s TTL, 100 entries) on the ProjectDO collapses repeated identical queries from agent loops.
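As a rough sketch of the scoring arithmetic described above: salience halves once per half-life, and the final score is a 60/40 blend of the reranker score and the normalized RRF score. The function names, the multiplication of RRF by salience, normalizing by the maximum RRF score, and the assumption that the reranker score is already in [0, 1] are illustrative choices, not confirmed details.

```python
def decayed_salience(salience: float, age_days: float, half_life_days: float) -> float:
    """Exponential decay: salience halves every half_life_days."""
    return salience * 0.5 ** (age_days / half_life_days)


def blend(reranker_score: float, rrf_score: float, max_rrf: float) -> float:
    """60% cross-encoder score plus 40% RRF, with RRF scaled into [0, 1]."""
    normalized_rrf = rrf_score / max_rrf if max_rrf > 0 else 0.0
    return 0.6 * reranker_score + 0.4 * normalized_rrf


# A memory with salience 0.8 that is exactly one half-life old (hypothetical values).
s = decayed_salience(0.8, age_days=30, half_life_days=30)   # 0.4
weighted_rrf = 0.04 * s            # assumed: salience multiplies the fused score
final = blend(reranker_score=0.9, rrf_score=weighted_rrf, max_rrf=0.04)
print(round(s, 3), round(final, 3))  # prints 0.4 0.7
```

A recall boost would raise `salience` before the decay is applied; how large that boost is, and the half-life for each kind, are not specified here.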
Search modes (auto / chunks / summaries / entities / code / skills) push a kind filter into each leg so retrieval stays focused. code mode enables the entity leg for symbol-aware fusion.
Temporal knowledge graph
The facts table stores structured subject-predicate-object triples with valid_from / valid_to temporal windows. record_fact auto-supersedes active facts with the same subject+predicate. query_facts supports point-in-time queries (“what was true on date X?”). find_contradictions scans for conflicting active facts via LLM judge.
Facts are auto-extracted during ingest (LLM extracts up to 5 triples per turn) and can be manually recorded via record_fact. The execute_edge tool creates typed relationships (supersedes/contradicts/derives_from/same_as) with cascading side effects.
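The supersede and point-in-time behaviour can be illustrated with a small in-memory model. This is a sketch under assumptions, not the actual schema or tool signatures: the `Fact` and `FactStore` names, the `obj` field, the half-open window `[valid_from, valid_to)`, and the example triples are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still active


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def record_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> Fact:
        # Auto-supersede: close any active fact with the same subject+predicate.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.valid_to is None:
                fact.valid_to = at
        new_fact = Fact(subject, predicate, obj, valid_from=at)
        self.facts.append(new_fact)
        return new_fact

    def query_facts(self, subject: str, predicate: str, as_of: datetime) -> list[Fact]:
        # Point-in-time query: keep facts whose window contains as_of.
        return [
            f for f in self.facts
            if f.subject == subject and f.predicate == predicate
            and f.valid_from <= as_of
            and (f.valid_to is None or as_of < f.valid_to)
        ]


store = FactStore()
store.record_fact("billing-service", "deploy_target", "staging", datetime(2024, 1, 1))
store.record_fact("billing-service", "deploy_target", "production", datetime(2024, 6, 1))

then = store.query_facts("billing-service", "deploy_target", datetime(2024, 3, 1))
now = store.query_facts("billing-service", "deploy_target", datetime(2024, 7, 1))
print(then[0].obj, now[0].obj)  # prints staging production
```

Closing the old window instead of deleting the row is what keeps historical queries answerable after a fact has been superseded.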
Candidate staging
LLM-extracted memory proposals land in the candidates table as pending before promotion to durable memory. list_candidates shows pending proposals; review_candidate(accept) promotes a proposal to a full pinned memory with embedding and search indexing. This prevents noise from low-confidence extractions from polluting the semantic tier.
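The staging flow amounts to a small state machine, sketched below. Only the pending state and the accept decision come from the description above; the `reject` decision, the class and field names, and the shape of the promoted memory are assumptions, and embedding and search indexing are omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    id: int
    text: str
    status: str = "pending"


class CandidateStaging:
    def __init__(self) -> None:
        self.candidates: dict[int, Candidate] = {}
        self.memories: list[dict] = []   # stands in for durable memory

    def stage(self, text: str) -> Candidate:
        cand = Candidate(id=len(self.candidates) + 1, text=text)
        self.candidates[cand.id] = cand
        return cand

    def list_candidates(self) -> list[Candidate]:
        return [c for c in self.candidates.values() if c.status == "pending"]

    def review_candidate(self, candidate_id: int, decision: str) -> None:
        cand = self.candidates[candidate_id]
        if cand.status != "pending":
            raise ValueError("candidate already reviewed")
        if decision == "accept":
            cand.status = "accepted"
            # Promotion: the real system also embeds and indexes the memory.
            self.memories.append({"text": cand.text, "pinned": True})
        elif decision == "reject":       # assumed counterpart to accept
            cand.status = "rejected"
        else:
            raise ValueError(f"unknown decision: {decision}")


staging = CandidateStaging()
first = staging.stage("Prefers pnpm over npm")
second = staging.stage("Mentioned lunch plans")
staging.review_candidate(first.id, "accept")
staging.review_candidate(second.id, "reject")
print(len(staging.list_candidates()), staging.memories)
```

Nothing reaches `memories` without an explicit accept, which is the property that keeps low-confidence extractions out of durable memory.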
search_memory and list_recent_turns accept optional actor / tool / since / until filters. Every turn carries a tool provenance column (cursor, claude-code, cli/observe, …) so retrieval can be scoped to “everything Cursor wrote in the last 4 hours”. That’s the kind of cross-tool slice single-vendor memory can’t express.
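A sketch of how these optional filters compose. The filter names actor, tool, since, and until come from the text above; the `Turn` shape, the `created_at` field, the inclusive bounds, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Turn:
    actor: str
    tool: str
    created_at: datetime
    content: str


def filter_turns(
    turns: list[Turn],
    actor: Optional[str] = None,
    tool: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> list[Turn]:
    """Apply only the filters that were supplied; None means 'no constraint'."""
    return [
        t for t in turns
        if (actor is None or t.actor == actor)
        and (tool is None or t.tool == tool)
        and (since is None or t.created_at >= since)
        and (until is None or t.created_at <= until)
    ]


now = datetime(2025, 1, 1, 12, 0)
turns = [
    Turn("alice", "cursor", now - timedelta(hours=1), "refactor parser"),
    Turn("alice", "claude-code", now - timedelta(hours=2), "write tests"),
    Turn("bob", "cursor", now - timedelta(hours=6), "bump deps"),
]

# "Everything Cursor wrote in the last 4 hours"
recent_cursor = filter_turns(turns, tool="cursor", since=now - timedelta(hours=4))
print([t.content for t in recent_cursor])  # prints ['refactor parser']
```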