# Architecture
Memoturn runs on Cloudflare Workers. There is no application server; everything below the load balancer is V8 isolates and edge-resident state.
## Data flow

```
your agent → /v1/mcp ─┐
your SDK  → /v1/mcp ─┤  edge worker ──▶ ProjectDO ──▶ ingest queue ──▶ ingest worker
your CLI  → /v1/mcp ─┘                      │                              │
                                            ▼                              ▼
                                   WebSocket fan-out            Vectorize + Neon Postgres
```

- The edge worker (`api.memoturn.ai`) authenticates the request and forwards it to a project-scoped Durable Object. `ProjectDO` is the single point of write serialization for that project. It deduplicates on a content hash, hot-caches the last 200 turns, fans broadcasts out over hibernating WebSockets, and enqueues durable work onto the ingest queue.
- The ingest worker drains the queue: it embeds via Workers AI (`@cf/baai/bge-large-en-v1.5`), upserts the vector to Vectorize, writes pointer rows to Neon Postgres through Hyperdrive, and extracts entities (symbols/files/error codes/commands) into the `entities` table.
- Rolling summaries run on a DO alarm: every ~20 turns or 10 minutes per session, the DO calls Llama 3.1 8B Instruct via Workers AI, writes a `kind=consolidated` memory, and broadcasts `consolidation_completed`.
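The write path through `ProjectDO` can be sketched as follows. This is an illustrative model, not the production code: the class, method names, and in-memory `Set` stand in for the real Durable Object storage APIs, but the dedup-then-cache-then-enqueue sequence matches the description above.

```typescript
import { createHash } from "node:crypto";

type Turn = { content: string; metadata: Record<string, string> };

// Illustrative stand-in for ProjectDO's write path: content-hash dedup,
// a 200-turn hot cache, and a handoff to the ingest queue.
class ProjectWriteCoordinator {
  private seen = new Set<string>(); // dedup hashes (DO storage in production)
  private hot: Turn[] = [];         // last 200 turns
  constructor(private enqueue: (t: Turn) => void) {}

  write(turn: Turn): boolean {
    const hash = createHash("sha256")
      .update(turn.content + JSON.stringify(turn.metadata))
      .digest("hex");
    if (this.seen.has(hash)) return false; // idempotent: duplicate dropped
    this.seen.add(hash);
    this.hot.push(turn);
    if (this.hot.length > 200) this.hot.shift();
    this.enqueue(turn); // durable work for the ingest worker
    return true;
  }
}

// Usage: a duplicate write is dropped, so only one message is enqueued.
const queue: Turn[] = [];
const coord = new ProjectWriteCoordinator((t) => queue.push(t));
const turn = { content: "fixed the bug", metadata: { session: "s1" } };
const first = coord.write(turn);
const second = coord.write(turn);
```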
## Hybrid retrieval

`search_memory` runs three retrieval legs in parallel and fuses the results with reciprocal rank fusion:
| leg | source | wins on |
|---|---|---|
| dense | Vectorize ANN | semantic match |
| lexical | Postgres tsvector + ts_rank_cd | keyword + boolean queries |
| hot | DO storage substring scan | very fresh writes (before ingest catches up) |
Search modes (`auto` / `chunks` / `summaries` / `entities` / `code`) push a `kindFilter` into each leg so retrieval stays focused. `code` mode adds a fourth leg ranked by entity-token match count.
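Reciprocal rank fusion scores each hit by summed `1 / (k + rank)` across legs, so an id returned by several legs outranks a single-leg hit regardless of per-leg score scales. A minimal sketch (the `k = 60` constant and leg ordering are assumptions, not taken from the Memoturn source):

```typescript
type Ranked = { id: string };

// Fuse per-leg result lists with reciprocal rank fusion.
function rrfFuse(legs: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.forEach((hit, rank) => {
      // Each leg contributes 1 / (k + rank + 1); multi-leg hits accumulate.
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: "m2" appears in both legs, so it outranks the single-leg hits.
const fused = rrfFuse([
  [{ id: "m1" }, { id: "m2" }], // dense leg
  [{ id: "m2" }, { id: "m3" }], // lexical leg
]);
```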
## Bindings (Cloudflare)

| binding | type | purpose |
|---|---|---|
| `PROJECT_DO` | Durable Object | per-project write coordinator + WebSocket hub |
| `VECTORIZE` | Vectorize Index | dense embeddings (`memoturn-memories`, 1024-dim) |
| `INGEST_QUEUE` | Queue | async embed + Postgres + entity writes |
| `HYPERDRIVE` | Hyperdrive | connection pooling to Neon Postgres |
| `AI` | Workers AI | embeddings (`@cf/baai/bge-large-en-v1.5`) + summarization (`@cf/meta/llama-3.1-8b-instruct`) |
| `API_KEYS` | KV | hashed API key lookups |
| `RATE_LIMITER` | Rate Limit | 600 req/min per project |
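For orientation, a `wrangler.toml` declaring these bindings would look roughly like this. Binding names match the table; the worker name, class name, queue name, and ids are placeholders, not Memoturn's actual configuration:

```toml
name = "memoturn-api"
main = "src/index.ts"

[[durable_objects.bindings]]
name = "PROJECT_DO"
class_name = "ProjectDO"

[[vectorize]]
binding = "VECTORIZE"
index_name = "memoturn-memories"

[[queues.producers]]
binding = "INGEST_QUEUE"
queue = "memoturn-ingest"

[[hyperdrive]]
binding = "HYPERDRIVE"
id = "<hyperdrive-config-id>"

[ai]
binding = "AI"

[[kv_namespaces]]
binding = "API_KEYS"
id = "<kv-namespace-id>"

# Rate limiting is currently exposed as an unsafe binding.
[[unsafe.bindings]]
name = "RATE_LIMITER"
type = "ratelimit"
namespace_id = "1001"
simple = { limit = 600, period = 60 }
```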
## Storage layout

| store | role | what's in it |
|---|---|---|
| Postgres (Neon + Hyperdrive) | source of truth | `turns`, `memories`, `entities`, `embeddings`, `events`, `broadcasts`, `presence`, `usage_meters`, auth tables |
| Vectorize | dense index | 1024-dim vectors with `project_slug` + `kind` + `memory_kind` metadata for filtering |
| DO storage | hot cache + write lock | last 200 turns, pinned + consolidated memories, broadcasts, summary state, dedup hashes |
| KV | API auth | `apikey:${hash}` for fast lookups |
| Queue | durable work | turn ingest, memory ingest |
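The KV auth row implies a key-derivation step: hash the presented API key and look it up under `apikey:${hash}`, so raw keys are never stored. A sketch under assumptions (SHA-256 with hex encoding is a guess; the source only specifies "hashed"):

```typescript
import { createHash } from "node:crypto";

// Derive the KV lookup key for an API key. The algorithm and encoding
// are assumptions; only the `apikey:${hash}` shape comes from the docs.
function apiKeyKvKey(rawKey: string): string {
  const hash = createHash("sha256").update(rawKey).digest("hex");
  return `apikey:${hash}`;
}

const kvKey = apiKeyKvKey("mt_live_example");
// Worker-side lookup would then be: await env.API_KEYS.get(kvKey)
```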
## Patterns worth knowing

- UUIDv7 for turn / memory ids. Time-ordered and lexicographically comparable, so chronological scans are free.
- Content hashing (SHA-256 of content + metadata) for idempotent writes. Dedup is atomic via the DO's `blockConcurrencyWhile`.
- Per-tool usage metering writes to `usage_meters`, keyed by `(project_id, period)`, via a fire-and-forget `ctx.waitUntil`, so it never adds latency to the user response.