Architecture

Memoturn runs on Cloudflare Workers. There is no application server; everything below the load balancer is V8 isolates and edge-resident state.

```
your agent → /v1/mcp ─┐
your SDK   → /v1/mcp ─┤ edge worker ──▶ ProjectDO ──▶ ingest queue ──▶ ingest worker
your CLI   → /v1/mcp ─┘                     │                              │
                                            ▼                              ▼
                                    WebSocket fan-out         Vectorize + Neon Postgres
```
  1. The edge worker (api.memoturn.ai) authenticates the request and forwards it to a project-scoped Durable Object.
  2. ProjectDO is the single point of write serialization for that project. It deduplicates on a content hash, hot-caches the last 200 turns, fans broadcasts out over hibernating WebSockets, and enqueues durable work onto the ingest queue.
  3. The ingest worker drains the queue: it embeds via Workers AI (@cf/baai/bge-large-en-v1.5), upserts the vector to Vectorize, writes pointer rows to Neon Postgres through Hyperdrive, and extracts entities (symbols, files, error codes, commands) into the entities table.
  4. Rolling summaries run on a DO alarm — every ~20 turns or 10 minutes per session, the DO calls Llama 3.1 8B Instruct via Workers AI, writes a kind=consolidated memory, and broadcasts consolidation_completed.
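
Step 2's write path can be sketched as a plain in-memory model, stripped of the Cloudflare APIs. The class and method names (ProjectWriteModel, ingestQueue) are illustrative, not Memoturn's actual implementation:

```typescript
// Illustrative model of the ProjectDO write path: dedup on a content hash,
// keep a bounded hot cache of recent turns, enqueue durable ingest work.
import { createHash } from "node:crypto";

interface Turn { id: string; content: string; }

class ProjectWriteModel {
  private seen = new Set<string>();   // dedup hashes (DO storage in production)
  private hotCache: Turn[] = [];      // last N turns, newest last
  readonly ingestQueue: Turn[] = [];  // stands in for the Cloudflare Queue

  constructor(private readonly hotCacheSize = 200) {}

  /** Returns false when the turn is a duplicate and is dropped. */
  write(turn: Turn, metadata: Record<string, string> = {}): boolean {
    // SHA-256 over content + metadata, mirroring the idempotent-write scheme.
    const hash = createHash("sha256")
      .update(turn.content)
      .update(JSON.stringify(metadata))
      .digest("hex");
    if (this.seen.has(hash)) return false; // idempotent replay: no-op
    this.seen.add(hash);

    this.hotCache.push(turn);
    if (this.hotCache.length > this.hotCacheSize) this.hotCache.shift();

    this.ingestQueue.push(turn); // durable work drained by the ingest worker
    return true;
  }

  recentTurns(): Turn[] { return [...this.hotCache]; }
}
```

In the real Durable Object, the hash check and insert run under write serialization, so concurrent writes to the same project cannot race the dedup set.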

search_memory runs three retrieval legs in parallel and fuses with reciprocal rank fusion:

| leg | source | wins on |
| --- | --- | --- |
| dense | Vectorize ANN | semantic match |
| lexical | Postgres tsvector + ts_rank_cd | keyword + boolean queries |
| hot | DO storage substring scan | very fresh writes (before ingest catches up) |

Search modes (auto / chunks / summaries / entities / code) push a kindFilter into each leg so retrieval stays focused. code mode adds a fourth leg ranked by entity-token match count.
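
Reciprocal rank fusion itself is small enough to sketch. The constant k = 60 below is the conventional default from the RRF literature, not necessarily what Memoturn uses:

```typescript
// Reciprocal rank fusion: each leg contributes 1 / (k + rank) per document id,
// so an id that places well across several legs beats one that tops a single leg.
function rrfFuse(legs: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document missing from a leg simply contributes nothing for that leg, which is what makes the hot leg safe to fuse even when it returns only a handful of very recent ids.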

| binding | type | purpose |
| --- | --- | --- |
| PROJECT_DO | Durable Object | per-project write coordinator + WebSocket hub |
| VECTORIZE | Vectorize Index | dense embeddings (memoturn-memories, 1024-dim) |
| INGEST_QUEUE | Queue | async embed + Postgres + entity writes |
| HYPERDRIVE | Hyperdrive | connection pooling to Neon Postgres |
| AI | Workers AI | embeddings (@cf/baai/bge-large-en-v1.5) + summarization (@cf/meta/llama-3.1-8b-instruct) |
| API_KEYS | KV | hashed API key lookups |
| RATE_LIMITER | Rate Limit | 600 req/min per project |
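
In Worker code, bindings like these surface as fields on the Env interface. The following is a typings sketch using the binding type names from @cloudflare/workers-types; the shape of Memoturn's real Env is an assumption:

```typescript
// Hypothetical Env typings for the bindings above. Ambient types such as
// DurableObjectNamespace and KVNamespace come from @cloudflare/workers-types.
interface Env {
  PROJECT_DO: DurableObjectNamespace; // per-project write coordinator + WebSocket hub
  VECTORIZE: VectorizeIndex;          // dense embeddings, 1024-dim
  INGEST_QUEUE: Queue;                // async embed + Postgres + entity writes
  HYPERDRIVE: Hyperdrive;             // pooled connection string to Neon Postgres
  AI: Ai;                             // Workers AI: embeddings + summarization
  API_KEYS: KVNamespace;              // hashed API key lookups
  RATE_LIMITER: RateLimit;            // 600 req/min per project
}
```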
| store | role | what's in it |
| --- | --- | --- |
| Postgres (Neon + Hyperdrive) | source of truth | turns, memories, entities, embeddings, events, broadcasts, presence, usage_meters, auth tables |
| Vectorize | dense index | 1024-dim vectors with project_slug + kind + memory_kind metadata for filtering |
| DO storage | hot cache + write lock | last 200 turns, pinned + consolidated memories, broadcasts, summary state, dedup hashes |
| KV | API auth | apikey:${hash} for fast lookups |
| Queue | durable work | turn ingest, memory ingest |
  • UUIDv7 for turn / memory ids. Time-ordered, lex-comparable — chronological scans are free.
  • Content hashing (SHA-256 of content + metadata) for idempotent writes. Dedup is atomic via DO blockConcurrencyWhile.
  • Per-tool usage metering writes to usage_meters, keyed by (project_id, period), via a fire-and-forget ctx.waitUntil, so it never adds latency to the user response.
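
The UUIDv7 property relied on above (time-ordered, so lexicographic order equals chronological order) is easy to see in a toy generator. This sketch follows the RFC 9562 layout but uses Math.random filler rather than a production entropy source:

```typescript
// Toy UUIDv7: 48-bit big-endian unix-ms timestamp, then version/variant bits
// and random filler. Because the timestamp leads the string, string comparison
// of two ids agrees with their creation order.
function uuidv7(now: number = Date.now()): string {
  const ts = now.toString(16).padStart(12, "0"); // 48-bit ms timestamp as hex
  const randHex = (len: number) =>
    Array.from({ length: len }, () =>
      Math.floor(Math.random() * 16).toString(16)
    ).join("");
  const variant = (8 + Math.floor(Math.random() * 4)).toString(16); // 0b10xx
  return `${ts.slice(0, 8)}-${ts.slice(8, 12)}-7${randHex(3)}-${variant}${randHex(3)}-${randHex(12)}`;
}
```

That ordering is what makes chronological scans free: a range query over the id column in Postgres, or an ordered list of DO storage keys, already comes back in time order.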