Architecture

Memoturn runs on Cloudflare Workers. There is no application server; everything below the load balancer is V8 isolates and edge-resident state.

Data flow

your agent → /v1/mcp ─┐
your SDK   → /v1/mcp ─┤  edge worker  ──▶  ProjectDO  ──▶  ingest queue ──▶  ingest worker
your CLI   → /v1/mcp ─┘                       │                                    │
                                              ▼                                    ▼
                                       WebSocket fan-out                Vectorize + Neon Postgres

The edge worker (api.memoturn.ai) authenticates the request and forwards it to a project-scoped Durable Object.
ProjectDO is the single point of write serialization for that project. It deduplicates on a content hash, hot-caches the last 200 turns, fans broadcasts out over hibernating WebSockets, and enqueues durable work onto the ingest queue.
The ingest worker drains the queue. For each message it embeds the content via Workers AI (@cf/baai/bge-large-en-v1.5), upserts the vector to Vectorize, writes pointer rows to Neon Postgres through Hyperdrive, and extracts entities (symbols, files, error codes, commands) into the entities table.
Rolling summaries run on a DO alarm. Roughly every 20 turns or every 10 minutes per session, the DO calls Llama 3.1 8B Instruct via Workers AI, writes a kind=consolidated memory, and broadcasts consolidation_completed.
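As a rough illustration of the ProjectDO write path described above, the following TypeScript sketch models deduplication on a content hash, the 200-turn hot cache, and the hand-off to the ingest queue with plain in-memory structures. The class name `ProjectWriteCoordinator`, the `Turn` shape, and the array standing in for the queue are hypothetical; the real Durable Object presumably relies on DO storage and the `INGEST_QUEUE` binding instead.

```typescript
// Hypothetical in-memory model of the ProjectDO write path (not Memoturn's code).
interface Turn {
  id: string;
  contentHash: string; // assumed to be computed upstream, e.g. a SHA-256 hex digest
  content: string;
}

type WriteResult = { status: "accepted" | "duplicate"; queued: boolean };

class ProjectWriteCoordinator {
  private readonly hotCacheSize: number;
  private readonly seenHashes = new Set<string>();
  private readonly hotTurns: Turn[] = [];
  // Stand-in for the ingest queue; a real worker would send to a Queue binding.
  readonly ingestQueue: Turn[] = [];

  constructor(hotCacheSize: number = 200) {
    this.hotCacheSize = hotCacheSize;
  }

  write(turn: Turn): WriteResult {
    // Idempotency: a repeated content hash is acknowledged but not re-processed.
    if (this.seenHashes.has(turn.contentHash)) {
      return { status: "duplicate", queued: false };
    }
    this.seenHashes.add(turn.contentHash);

    // Hot cache: keep only the most recent N turns, evicting the oldest.
    this.hotTurns.push(turn);
    if (this.hotTurns.length > this.hotCacheSize) {
      this.hotTurns.shift();
    }

    // Durable work (embedding, Postgres writes, entity extraction) is deferred.
    this.ingestQueue.push(turn);
    return { status: "accepted", queued: true };
  }

  recentTurns(): readonly Turn[] {
    return this.hotTurns;
  }
}
```

Because a Durable Object instance handles one project, a single coordinator like this is enough to serialize that project's writes; the sketch leaves out persistence, WebSocket fan-out, and the summary alarm.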

Hybrid retrieval

search_memory runs three retrieval legs in parallel and fuses their ranked results with reciprocal rank fusion (RRF):

leg	source	wins on
dense	Vectorize ANN	semantic match
lexical	Postgres `tsvector` + `ts_rank_cd`	keyword + boolean queries
hot	DO storage substring scan	very fresh writes (before ingest catches up)
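To make the fusion step concrete, here is a minimal TypeScript sketch of reciprocal rank fusion over the three legs in the table above. Each leg contributes 1 / (k + rank) for every id it returns, and the contributions are summed per id. The function name, the `RankedLeg` type, and the constant k = 60 (a value commonly used for RRF) are assumptions; the source does not state which constant Memoturn uses.

```typescript
// Each leg returns memory ids ordered best-first, without duplicates.
type RankedLeg = { leg: string; ids: string[] };

// Reciprocal rank fusion: score(id) = sum over legs of 1 / (k + rank),
// where rank is 1-based. k = 60 is an assumed, conventional default.
function reciprocalRankFusion(
  legs: RankedLeg[],
  k: number = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const { ids } of legs) {
    ids.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties are broken by id for determinism.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const fused = reciprocalRankFusion([
  { leg: "dense", ids: ["m1", "m2", "m3"] },
  { leg: "lexical", ids: ["m2", "m4"] },
  { leg: "hot", ids: ["m5", "m2"] },
]);
```

In this example `m2` ranks first because all three legs return it, even though only one leg ranks it at the top. RRF rewards agreement across legs without needing their raw scores to be comparable.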

Search modes (auto / chunks / summaries / entities / code) push a kindFilter into each leg so retrieval stays focused. code mode adds a fourth leg ranked by entity-token match count.
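One possible reading of "ranked by entity-token match count" is sketched below in TypeScript: the query is split into tokens, and each memory is scored by how many of its extracted entities appear among those tokens. The tokenizer, the `EntityIndexedMemory` shape, and the function names are hypothetical, and the actual leg may match or weight entities differently.

```typescript
// Hypothetical shape: a memory id plus the entities extracted for it at ingest.
interface EntityIndexedMemory {
  id: string;
  entities: string[];
}

// Assumed tokenizer: lowercases and splits on anything that is not a letter,
// digit, or one of _ . / - so that paths like src/config.ts stay whole.
function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[^a-z0-9_.\/-]+/)
    .filter((token) => token.length > 0);
}

// Rank memories by the number of their entities that occur as query tokens.
function rankByEntityMatches(
  query: string,
  memories: EntityIndexedMemory[],
): Array<{ id: string; matches: number }> {
  const tokens = new Set(tokenize(query));
  return memories
    .map((memory) => ({
      id: memory.id,
      matches: memory.entities.filter((e) => tokens.has(e.toLowerCase())).length,
    }))
    .filter((result) => result.matches > 0) // drop memories with no entity hits
    .sort((a, b) => b.matches - a.matches || a.id.localeCompare(b.id));
}

const codeLeg = rankByEntityMatches("TypeError in parseConfig src/config.ts", [
  { id: "m1", entities: ["parseConfig", "src/config.ts"] },
  { id: "m2", entities: ["TypeError"] },
  { id: "m3", entities: ["npm install"] },
]);
```

The resulting best-first id list can be passed to the fusion step like any other leg.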

Bindings (Cloudflare)

binding	type	purpose
`PROJECT_DO`	Durable Object	per-project write coordinator + WebSocket hub
`VECTORIZE`	Vectorize Index	dense embeddings (`memoturn-memories`, 1024-dim)
`INGEST_QUEUE`	Queue	async embed + Postgres + entity writes
`HYPERDRIVE`	Hyperdrive	connection pooling to Neon Postgres
`AI`	Workers AI	embeddings (`@cf/baai/bge-large-en-v1.5`) + summarization (`@cf/meta/llama-3.1-8b-instruct`)
`API_KEYS`	KV	hashed API key lookups
`RATE_LIMITER`	Rate Limit	600 req/min per project

Storage layout

store	role	what’s in it
Postgres (Neon + Hyperdrive)	source of truth	`turns`, `memories`, `entities`, `embeddings`, `events`, `broadcasts`, `presence`, `usage_meters`, auth tables
Vectorize	dense index	1024-dim vectors with `project_slug` + `kind` + `memory_kind` metadata for filtering
DO storage	hot cache + write lock	last 200 turns, pinned + consolidated memories, broadcasts, summary state, dedup hashes
KV	API auth	`apikey:${hash}` for fast lookups
Queue	durable work	turn ingest, memory ingest

Patterns worth knowing

UUIDv7 for turn / memory ids. The ids are time-ordered and lexicographically comparable, so sorting by id yields chronological order with no extra timestamp lookup.
Content hashing (SHA-256 of content + metadata) for idempotent writes. Dedup is atomic via DO blockConcurrencyWhile.
Per-tool usage metering writes to usage_meters keyed (project_id, period) via a fire-and-forget ctx.waitUntil, so it never adds latency to the user response.
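The ordering property of UUIDv7 ids mentioned above can be seen in a small TypeScript sketch. It follows the UUIDv7 layout (48-bit big-endian millisecond timestamp, version nibble 7, variant bits 10, random bits elsewhere), but takes the timestamp and random bytes as arguments so the behavior is easy to test. The function name `uuidv7` and its signature are illustrative and are not taken from Memoturn.

```typescript
// Build a UUIDv7 string from a Unix timestamp (ms) and at least 10 random bytes.
function uuidv7(unixMs: number, random: Uint8Array): string {
  if (random.length < 10) {
    throw new Error("need at least 10 random bytes");
  }
  const bytes = new Uint8Array(16);

  // Bytes 0..5: 48-bit timestamp, most significant byte first.
  let remaining = unixMs;
  for (let i = 5; i >= 0; i--) {
    bytes[i] = remaining % 256;
    remaining = Math.floor(remaining / 256);
  }

  // Bytes 6..15: random data, then overwrite the version and variant bits.
  bytes.set(random.subarray(0, 10), 6);
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10xx

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Ids minted one millisecond apart compare in creation order as plain strings,
// regardless of the random bits.
const earlier = uuidv7(1_700_000_000_000, new Uint8Array(10).fill(0xff));
const later = uuidv7(1_700_000_000_001, new Uint8Array(10));
const inOrder = earlier < later;
```

In a Worker the call would look like `uuidv7(Date.now(), crypto.getRandomValues(new Uint8Array(10)))`. This sketch does not order ids minted within the same millisecond; a production generator might add a counter for that.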