
Architecture

Memoturn runs entirely on Cloudflare Workers. There is no application server; everything below the load balancer is V8 isolates and edge-resident state. That’s what makes per-project write serialization, sub-second WebSocket fan-out, and global low latency tractable in the same system.

your agent → /v1/mcp ─┐
your SDK   → /v1/mcp ─┤ edge worker ──▶ ProjectDO ──▶ ingest queue ──▶ ingest worker
your CLI   → /v1/mcp ─┘                     │                                │
                                            ▼                                ▼
                                    WebSocket fan-out               vector + SQL stores
  1. The edge worker (api.memoturn.ai) authenticates the request and routes it to a project-scoped Durable Object.
  2. ProjectDO is the single point of write serialization for that project. It deduplicates on a content hash, hot-caches recent turns, fans broadcasts out over hibernating WebSockets, and enqueues durable work onto the ingest queue.
  3. The ingest worker drains the queue: generates a contextual synopsis via LLM (prepended to the embedding input for disambiguation), embeds the content, upserts the vector, writes the canonical row, extracts entities, and runs fact + candidate extraction via LLM (structured triples and durable insights staged for review).
  4. Rolling summaries run on a timer per session: every ~20 turns or 10 minutes the DO consolidates the session into a summary memory and broadcasts a consolidation_completed event.
  5. A daily cron job (03:00 UTC) prunes old events, enforces retention, and runs memory consolidation, which clusters old session summaries into durable semantic-tier reflections.
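
The rolling-summary trigger in step 4 can be sketched as a small pure function. This is an illustration only, not Memoturn's implementation: `SessionState`, `shouldConsolidate`, and the reading of "~20 turns or 10 minutes" as "whichever comes first" are assumptions made for the example.

```typescript
// Hypothetical sketch of the step-4 trigger; all names and thresholds are assumptions.
interface SessionState {
  turnsSinceSummary: number; // turns appended since the last rolling summary
  lastSummaryAt: number;     // epoch milliseconds of the last rolling summary
}

const TURN_THRESHOLD = 20;                 // "every ~20 turns"
const TIME_THRESHOLD_MS = 10 * 60 * 1000;  // "or 10 minutes"

// Returns true when the session is due for consolidation.
// A session with no new turns is never consolidated, even if the timer has elapsed.
function shouldConsolidate(state: SessionState, now: number): boolean {
  if (state.turnsSinceSummary === 0) return false;
  const elapsedMs = now - state.lastSummaryAt;
  return state.turnsSinceSummary >= TURN_THRESHOLD || elapsedMs >= TIME_THRESHOLD_MS;
}
```

In a Durable Object, a check like this would typically run both when a turn is appended and when a scheduled alarm fires, so that quiet sessions still get summarized.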

search_memory runs five retrieval legs in parallel and fuses their results with reciprocal rank fusion (RRF):

| leg | wins on |
| --- | --- |
| dense (vector ANN) | semantic match: paraphrases, related concepts |
| lexical (Postgres FTS) | keyword and boolean queries |
| hot (DO storage scan) | very fresh writes, before ingest catches up |
| entity (structured lookup) | exact identifier / file path / error code matches |
| graph (entity co-occurrence BFS) | related concepts connected in the knowledge graph |
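
Reciprocal rank fusion itself is compact enough to show in full. The sketch below is a generic version of the technique, not Memoturn's code: the function name, the memory ids, and the constant `k = 60` (a value commonly used with RRF) are assumptions. Each leg is assumed to return a ranked list of unique memory ids.

```typescript
// Generic reciprocal rank fusion: score(id) = sum over legs of 1 / (k + rank).
// k = 60 is a conventional default; whether Memoturn uses it is an assumption.
function reciprocalRankFusion(legs: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranked of legs) {
    ranked.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical results from three of the five legs.
const fused = reciprocalRankFusion([
  ["m1", "m2", "m3"], // dense
  ["m2", "m4"],       // lexical
  ["m2", "m1"],       // entity
]);
// "m2" ranks first because it appears near the top of all three lists.
```

Because RRF only looks at ranks, legs with incomparable raw scores (cosine similarity, FTS rank, graph distance) can be fused without calibrating them against each other.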

Post-fusion, results are weighted by salience (0.0–1.0 per memory, decaying exponentially by kind-specific half-life, boosted on recall) and then rescored by a cross-encoder (@cf/baai/bge-reranker-v2-m3) that blends 60% reranker score with 40% normalized RRF. An in-memory query cache (60s TTL, 100 entries) on the ProjectDO collapses repeated identical queries from agent loops.
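
The scoring arithmetic above can be sketched as three small functions. Only the exponential half-life decay and the 60/40 blend come from the description; the function names and the choice of min-max normalization for RRF scores are assumptions, and the recall boost is not modeled here.

```typescript
// Exponential decay: salience halves every `halfLifeMs` (half-life is kind-specific).
function decayedSalience(base: number, ageMs: number, halfLifeMs: number): number {
  return base * Math.pow(0.5, ageMs / halfLifeMs);
}

// Min-max normalization into [0, 1]. This is an assumed normalization scheme;
// a list whose scores are all equal maps to all 1s to avoid dividing by zero.
function normalizeScores(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

// Final score: 60% cross-encoder reranker score, 40% normalized RRF score.
function blendScore(rerankerScore: number, normalizedRrf: number): number {
  return 0.6 * rerankerScore + 0.4 * normalizedRrf;
}
```

For example, a memory with base salience 0.8 and a 30-day half-life would carry a salience of 0.4 after 30 days and 0.2 after 60, absent any recall boost.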

Search modes (auto / chunks / summaries / entities / code / skills) push a kind filter into each leg so retrieval stays focused. code mode enables the entity leg for symbol-aware fusion.

The facts table stores structured subject-predicate-object triples with valid_from / valid_to temporal windows. record_fact auto-supersedes active facts with the same subject+predicate. query_facts supports point-in-time queries (“what was true on date X?”). find_contradictions scans for conflicting active facts via LLM judge.
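
A minimal in-memory sketch of the supersede and point-in-time behavior follows. It is not the actual schema or tool implementation: the `FactStore` class, its method signatures, and the half-open window convention (valid_from inclusive, valid_to exclusive) are assumptions chosen for illustration.

```typescript
interface Fact {
  subject: string;
  predicate: string;
  object: string;
  validFrom: number;      // epoch ms, inclusive (mirrors valid_from)
  validTo: number | null; // epoch ms, exclusive (mirrors valid_to); null = still active
}

// Hypothetical in-memory stand-in for the facts table.
class FactStore {
  private facts: Fact[] = [];

  // Close every active fact with the same subject+predicate, then append the new one.
  recordFact(subject: string, predicate: string, object: string, at: number): void {
    for (const f of this.facts) {
      if (f.subject === subject && f.predicate === predicate && f.validTo === null) {
        f.validTo = at;
      }
    }
    this.facts.push({ subject, predicate, object, validFrom: at, validTo: null });
  }

  // Point-in-time query: facts about `subject` whose validity window contains `at`.
  queryFacts(subject: string, at: number): Fact[] {
    return this.facts.filter(
      (f) => f.subject === subject && f.validFrom <= at && (f.validTo === null || at < f.validTo)
    );
  }
}
```

Recording "api runs_on node-18" at time 100 and "api runs_on node-20" at time 200 leaves both rows in the store; a query at time 150 returns the first, and a query at time 250 returns the second.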

Facts are auto-extracted during ingest (LLM extracts up to 5 triples per turn) and can be manually recorded via record_fact. The execute_edge tool creates typed relationships (supersedes/contradicts/derives_from/same_as) with cascading side effects.

LLM-extracted memory proposals land in the candidates table as pending before promotion to durable memory. list_candidates shows pending proposals; review_candidate(accept) promotes a proposal to a full pinned memory with embedding and search indexing. This keeps low-confidence extractions from polluting the semantic tier with noise.

search_memory and list_recent_turns accept optional actor / tool / since / until filters. Every turn carries a tool provenance column (cursor, claude-code, cli/observe, …) so retrieval can be scoped to “everything Cursor wrote in the last 4 hours”. That’s the kind of cross-tool slice single-vendor memory can’t express.
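
The filter semantics can be illustrated with a plain predicate over turns. The `Turn` and `TurnFilter` shapes, the field names, and the inclusive-since / exclusive-until convention are assumptions for this sketch, not the documented API.

```typescript
// Hypothetical shapes; the real column and parameter types are not specified here.
interface Turn {
  id: string;
  actor: string;
  tool: string;      // provenance, e.g. "cursor", "claude-code", "cli/observe"
  createdAt: number; // epoch ms
}

interface TurnFilter {
  actor?: string;
  tool?: string;
  since?: number; // inclusive lower bound, epoch ms
  until?: number; // exclusive upper bound, epoch ms
}

// Every filter field is optional; an omitted field matches all turns.
function filterTurns(turns: Turn[], filter: TurnFilter): Turn[] {
  return turns.filter((t) =>
    (filter.actor === undefined || t.actor === filter.actor) &&
    (filter.tool === undefined || t.tool === filter.tool) &&
    (filter.since === undefined || t.createdAt >= filter.since) &&
    (filter.until === undefined || t.createdAt < filter.until));
}

// "Everything Cursor wrote in the last 4 hours", relative to a caller-supplied `now`.
function cursorLastFourHours(turns: Turn[], now: number): Turn[] {
  return filterTurns(turns, { tool: "cursor", since: now - 4 * 60 * 60 * 1000 });
}
```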