Memory

ScalyClaw remembers. Not just within a conversation, but across all conversations and all channels, persisted as a structured knowledge store that accumulates over time. Every interaction is an opportunity to extract something durable — a fact, a preference, a decision, an event, a relationship between entities — and surface it later when it becomes relevant.

Memory is not a chat log. It is a typed, indexed, searchable store of extracted statements plus a knowledge graph of entities and relations. An automatic background job writes to it as conversations happen, and a hybrid retrieval pipeline pulls the most relevant entries into the system prompt before each reply.

How Memory Works

Memory has two active systems running alongside normal conversation: an extractor that decides what to remember, and a retriever that decides what to recall. Both run automatically; neither requires any user intervention.

Storage Layout

All memory data lives in a single SQLite database file using bun:sqlite, with the sqlite-vec extension loaded for vector similarity and FTS5 for full-text search. The schema:

  • memories — one row per memory record: id, type, subject, content, tags, source, importance, embedding (BLOB), ttl, access_count, last_accessed_at, consolidated_into, created_at, updated_at.
  • memory_tags — normalized tag index (memory_id, tag).
  • memory_entities — entity catalog: id, name, entity_type, first_seen, last_seen, mention_count.
  • memory_entity_mentions — link table: which memories mention which entity.
  • memory_relations — directed edges in the knowledge graph: source_id, relation, target_id, plus optional backreference to the memory that produced the relation.
  • memory_vec — vec0 virtual table (sqlite-vec). Holds one float32 vector per memory. Vector dimensions are parameterized at DB init from the selected embedding model.
  • memory_fts — FTS5 virtual table over subject, content, tags.

Embedding dimensions are locked at first boot

The memory_vec table is created with the dimension count reported by whichever embedding model is enabled on first boot. Switching later to a model with a different dimension count will trip the dimension guard inside generateEmbedding at runtime. There is currently no built-in bulk re-embed tool; to switch embedding models cleanly, either pick a model with the same dimension count or wipe the SQLite file and let ScalyClaw rebuild on next start.
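A minimal sketch of what that runtime guard looks like. The function name and error text here are illustrative, not ScalyClaw's actual generateEmbedding internals:

```typescript
// Illustrative dimension guard: reject vectors whose length does not match
// the dimension count memory_vec was created with at first boot.
function assertEmbeddingDims(vec: Float32Array, tableDims: number): void {
  if (vec.length !== tableDims) {
    throw new Error(
      `Embedding dimension mismatch: model produced ${vec.length} dims, ` +
        `but memory_vec was created with ${tableDims}. Pick a model with ` +
        `matching dims or wipe the DB to rebuild.`
    );
  }
}

// A 768-dim vector against a table locked at 1536 dims fails fast:
let failed = false;
try {
  assertEmbeddingDims(new Float32Array(768), 1536);
} catch {
  failed = true;
}
console.log(failed); // true — the guard rejects the mismatched model
```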

Record Fields

Every memory has these fields. Note: there is no confidence field. Certainty is expressed by the importance integer (1–10) plus the source enum (user-stated, inferred, observed).

  • id (string) — primary key (UUID).
  • type (string) — one of semantic, episodic, procedural (see Types below).
  • subject (string) — short label for listing and dedup.
  • content (string) — full, self-contained statement.
  • tags (optional) — comma-separated string; also indexed in memory_tags.
  • source (optional) — user-stated / inferred / observed / auto-extraction / consolidation.
  • importance (integer) — 1–10, default 5. Drives retrieval ranking (see scoring formula below).
  • embedding (BLOB) — float32 vector from the enabled embedding model. Mirrored into memory_vec.
  • ttl (optional) — ISO timestamp; expired rows are filtered from search results.
  • access_count (integer) — bumped on each successful retrieval (async, non-blocking).
  • last_accessed_at (optional) — updated alongside access_count.
  • consolidated_into (optional) — if set, this memory was merged into another and is excluded from search.
  • created_at / updated_at (string) — ISO timestamps.

Memory Types

Every memory is tagged with one of three top-level types. These are the canonical values used throughout the code; legacy strings (fact, conversation, analysis, research) that older clients might send are auto-mapped into these three at write time.

  • semantic — facts, knowledge, personal info, preferences, decisions, opinions, people: the durable stuff you want the model to treat as "known". Examples: "User works at Acme Corp as a senior backend engineer." / "User prefers TypeScript over JavaScript for code examples."
  • episodic — events, interactions, what happened, anchored in time: meetings, past experiences, specific incidents. Examples: "User shipped the 2.0 release on 2026-03-15." / "User mentioned a product review meeting scheduled for Friday."
  • procedural — patterns, workflows, how-to, routines, processes: the "this is how we do X here" knowledge. Example: "User deploys via bun run scalyclaw:worker start --name worker1 on each worker host."
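The legacy-type normalization mentioned above might look like the following sketch. The exact mapping table is an assumption on my part; only the three canonical values are confirmed by these docs:

```typescript
// Hypothetical legacy-type normalization. The mapping choices below are
// assumptions; only semantic / episodic / procedural are documented.
type MemoryType = "semantic" | "episodic" | "procedural";

const LEGACY_TYPE_MAP: Record<string, MemoryType> = {
  fact: "semantic",         // plain facts → durable knowledge (assumed)
  analysis: "semantic",     // conclusions → durable knowledge (assumed)
  research: "semantic",     // (assumed)
  conversation: "episodic", // "what happened in a chat" → time-anchored (assumed)
};

function normalizeType(raw: string): MemoryType {
  const t = raw.toLowerCase().trim();
  if (t === "semantic" || t === "episodic" || t === "procedural") return t;
  return LEGACY_TYPE_MAP[t] ?? "semantic"; // assumed fallback
}

console.log(normalizeType("fact"));     // semantic
console.log(normalizeType("episodic")); // episodic
```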

How Memories Are Written

There are two write paths. Both end at the same storeMemory function — the difference is who decides to call it.

Background extraction (primary)

  1. After the orchestrator produces a reply, the user message is pushed onto a per-channel buffer in Redis.
  2. A memory-extraction job is enqueued on the internal queue with a 30-second debounce delay. Further messages within that window are appended to the same buffer instead of enqueuing new jobs.
  3. When the job fires, it pulls the buffered user messages, runs them through the EXTRACTION_PROMPT (temperature 0) against the current chat model, and parses the JSON array of candidate memories.
  4. Each candidate is deduplicated against existing memories via vector search with similarity threshold 0.92. If a match is found, the candidate is skipped.
  5. Survivors are normalized (legacy type mapping), embedded, inserted into memories, memory_vec, and memory_fts, and any attached entities / relations are upserted into the graph tables.
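The dedup check in step 4 boils down to a cosine-similarity threshold. A self-contained sketch, with illustrative helper names:

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const DEDUP_THRESHOLD = 0.92; // documented similarity cutoff

// A candidate is skipped if any existing memory is at least 0.92 similar.
function isDuplicate(candidate: number[], existing: number[][]): boolean {
  return existing.some((e) => cosineSimilarity(candidate, e) >= DEDUP_THRESHOLD);
}

console.log(isDuplicate([1, 0], [[1, 0]])); // true — identical, skipped
console.log(isDuplicate([1, 0], [[0, 1]])); // false — orthogonal, kept
```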

Direct tool call (secondary)

The LLM has a memory_store tool available during any reply. It can call this directly when the user shares something obviously memorable ("I just got promoted to VP of Engineering") — that avoids waiting for the next debounce window. The tool enforces the same dedup check as the background path.

How Memories Are Recalled

Before the orchestrator builds the system prompt for a reply, it runs a retrieval pass. Top results are rendered as a "Relevant Memories" block at the end of the system prompt.

  1. Embed the incoming user message with the configured embedding model.
  2. Run a memory_vec cosine similarity search for the top K × 3 candidates (K = memory.topK, default 10).
  3. Optional tag filter (if the caller supplied tags, all must match). TTL filter (drop expired). Exclude rows where consolidated_into IS NOT NULL.
  4. Re-rank by composite score (see formula below) and return the top K.
  5. If embeddings aren't available or the vector search returned nothing, fall back to an FTS5 query over the same normalized text. Same filtering + scoring applies.
  6. Bump access_count / last_accessed_at on the surfaced rows, in a fire-and-forget async batch.
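Step 3's filtering can be sketched as a single pass over the candidate rows. The row shape and names here are illustrative:

```typescript
// Illustrative candidate filtering: all required tags must match, expired
// TTLs are dropped, and consolidated-away rows are excluded.
interface MemoryRow {
  id: string;
  tags: string[];
  ttl?: string;              // ISO timestamp; absent means "never expires"
  consolidatedInto?: string; // set when the row was merged into another
}

function filterCandidates(
  rows: MemoryRow[],
  requiredTags: string[],
  now: Date = new Date()
): MemoryRow[] {
  return rows.filter((r) => {
    if (r.consolidatedInto) return false;                 // merged away
    if (r.ttl && new Date(r.ttl) <= now) return false;    // expired
    return requiredTags.every((t) => r.tags.includes(t)); // all must match
  });
}

const rows: MemoryRow[] = [
  { id: "a", tags: ["work"] },
  { id: "b", tags: ["work"], ttl: "2000-01-01T00:00:00Z" }, // long expired
  { id: "c", tags: ["work"], consolidatedInto: "x" },
];
console.log(filterCandidates(rows, ["work"]).map((r) => r.id)); // [ "a" ]
```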

Composite Scoring

```text
final = w_semantic   · semanticScore
      + w_recency    · recencyWeight
      + w_importance · importanceWeight

semanticScore    = 1 - vectorDistance  (or FTS5 rank, normalized)
recencyWeight    = exp(-decayRate × daysSinceUpdate)
importanceWeight = importance / 10
```

Defaults: w_semantic=0.6, w_recency=0.2, w_importance=0.2, decayRate=0.05, topK=10, scoreThreshold=0.5.

All five knobs (weights.semantic, weights.recency, weights.importance, decayRate, topK, scoreThreshold) live in config.memory and can be tuned at runtime without a restart. The LLM can also override weights on a per-call basis via the memory_search tool's weights parameter — useful when it wants to deliberately bias toward older-but-important context or recent-but-unimportant chatter.
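The formula and defaults above, expressed as a standalone function. The parameter names are illustrative, not ScalyClaw's actual signature:

```typescript
// Composite retrieval score with the documented defaults.
interface ScoreWeights { semantic: number; recency: number; importance: number }

function compositeScore(
  vectorDistance: number,  // cosine distance from the vector search
  daysSinceUpdate: number, // age relative to updated_at
  importance: number,      // 1–10
  w: ScoreWeights = { semantic: 0.6, recency: 0.2, importance: 0.2 },
  decayRate = 0.05
): number {
  const semanticScore = 1 - vectorDistance;
  const recencyWeight = Math.exp(-decayRate * daysSinceUpdate);
  const importanceWeight = importance / 10;
  return (
    w.semantic * semanticScore +
    w.recency * recencyWeight +
    w.importance * importanceWeight
  );
}

// A perfect match, updated today, importance 10 scores exactly 1.0:
console.log(compositeScore(0, 0, 10)); // 1
```

Overriding `w` per call is what the memory_search tool's weights parameter amounts to.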

Consolidation

Over time a memory store accumulates near-duplicates and fragments. Consolidation is a scheduled background job that clusters similar memories and merges each cluster into a single comprehensive entry.

  • When: cron schedule in config.memory.consolidation.schedule (default 0 3 * * * — daily at 03:00 UTC). Can be disabled via config.memory.consolidation.enabled.
  • Clustering: pairwise vector similarity ≥ similarityThreshold (default 0.85), within the same type, up to maxClusterSize members per cluster (default 5).
  • Merging: the CONSOLIDATION_PROMPT is run against the selected chat model; the LLM produces a merged subject, content, importance, and tags.
  • Bookkeeping: the merged memory is stored with source: "consolidation", a fresh embedding, and importance = max(source importances, merged result). Original rows are updated with consolidated_into = newId, which excludes them from future search. Entity mentions are relinked from originals to the merged row.
  • Manual trigger: the LLM can call memory_reflect to force a consolidation pass, even if the cron is disabled.
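The clustering step can be sketched as a greedy single pass. The real job may group differently; this only illustrates the documented knobs (similarityThreshold, maxClusterSize, same-type constraint):

```typescript
// Greedy clustering sketch: a memory joins the first cluster of the same
// type where it is ≥ threshold similar to every member, capped in size.
interface Clusterable { id: string; type: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function cluster(
  memories: Clusterable[],
  similarityThreshold = 0.85,
  maxClusterSize = 5
): Clusterable[][] {
  const clusters: Clusterable[][] = [];
  for (const m of memories) {
    const home = clusters.find(
      (c) =>
        c.length < maxClusterSize &&
        c[0].type === m.type &&
        c.every((x) => cosine(x.embedding, m.embedding) >= similarityThreshold)
    );
    if (home) home.push(m);
    else clusters.push([m]);
  }
  // Only clusters with at least two members are worth merging.
  return clusters.filter((c) => c.length > 1);
}

const out = cluster([
  { id: "a", type: "semantic", embedding: [1, 0] },
  { id: "b", type: "semantic", embedding: [0.99, 0.05] },
  { id: "c", type: "episodic", embedding: [1, 0] },
]);
console.log(out.length); // 1 — a and b merge; c is a different type
```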

Entity Graph

Alongside the flat list of memories, ScalyClaw maintains a lightweight directed graph of the people, places, and things that appear across them. Entities come from two sources: (1) the background extraction job, which can emit an entities array for each memory, and (2) direct memory_store calls that include an entities field.

Entity types (enum): person, project, technology, place, organization, concept.

Each entity carries first_seen, last_seen, and mention_count, so the graph can distinguish recently-relevant entities from long-tail noise. Relations are stored as directed edges (source, relation_label, target) with an optional backreference to the memory that introduced them and an integer strength that increments every time the same edge is re-asserted. Stale entities (mention_count ≤ 1, not seen in 90 days) are pruned automatically.

At reply time, the top 5 most-mentioned entities are injected into the system prompt as a "Known Entities" block, so the model has a running picture of the user's world before it decides what to retrieve in detail. The LLM can also call memory_graph to traverse relations from a named entity up to a configurable depth.
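A memory_graph-style traversal amounts to a bounded BFS over the directed edges. Here is a sketch over an in-memory edge list; the edge shape is illustrative, and this version only follows edges in the source-to-target direction:

```typescript
// BFS from a named entity, collecting edges up to a depth limit.
interface Edge { source: string; relation: string; target: string }

function traverse(edges: Edge[], start: string, depth = 2): Edge[] {
  const seen = new Set<string>([start]);
  const found: Edge[] = [];
  let frontier = [start];
  for (let d = 0; d < depth && frontier.length > 0; d++) {
    const next: string[] = [];
    for (const name of frontier) {
      for (const e of edges) {
        if (e.source !== name) continue;
        found.push(e);
        if (!seen.has(e.target)) {
          seen.add(e.target);
          next.push(e.target);
        }
      }
    }
    frontier = next;
  }
  return found;
}

const edges: Edge[] = [
  { source: "Payments", relation: "part_of", target: "Acme Corp" },
  { source: "Acme Corp", relation: "employs", target: "User" },
];
console.log(traverse(edges, "Payments", 2).length); // 2 — both hops reached
```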

LLM Tools

The model has seven memory-related tools available during every turn. They are schema-validated at the orchestrator so malformed calls fail fast rather than silently corrupting the store.

  • memory_store — write a new memory (with dedup). Required: type, subject, content. Optional: tags[], importance (1–10), source, ttl (duration string, e.g. "7d"), entities[] (each with name, type, optional relations).
  • memory_search — composite-scored semantic search. Required: query. Optional: type, tags[], topK, weights (override semantic/recency/importance).
  • memory_recall — browse by ID, type, or tags (no semantic scoring). Optional: id, type, tags[], includeConsolidated.
  • memory_update — update a memory in place; the embedding regenerates if content changed. Required: id. Optional: subject, content, tags[], importance.
  • memory_delete — permanently remove a memory and its vector / FTS rows. Required: id.
  • memory_reflect — trigger consolidation on demand. Optional: force (run even if the cron job is disabled).
  • memory_graph — traverse the entity graph from a named entity. Required: entity. Optional: depth (default 2).

Example tool call

```json
// Storing a semantic memory with entities + relations in one call
{
  "name": "memory_store",
  "arguments": {
    "type": "semantic",
    "subject": "User's current employer",
    "content": "User works at Acme Corp as a senior backend engineer on the Payments team.",
    "tags": ["work", "employment"],
    "importance": 8,
    "source": "user-stated",
    "entities": [
      { "name": "Acme Corp", "type": "organization" },
      { "name": "Payments", "type": "project",
        "relations": [{ "relation": "part_of", "target": "Acme Corp" }] }
    ]
  }
}
```

Dashboard API

The same operations available to the LLM are exposed as REST endpoints for the dashboard. These endpoints are what the Memory page calls.

  • GET /api/memory — list recent memories (default 20, no filters).
  • GET /api/memory/search — search with query string ?q=…&topK=…&type=…&tags=….
  • GET /api/memory/:id — fetch a single memory.
  • POST /api/memory — create a memory (type, subject, content required).
  • PUT /api/memory/:id — update a memory (subject, content, tags, importance).
  • DELETE /api/memory/:id — remove a memory.
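For example, a search call from a TypeScript client might be assembled like this. buildSearchUrl is a hypothetical helper, not part of ScalyClaw, and comma-joined tags are an assumption:

```typescript
// Hypothetical client helper for GET /api/memory/search.
function buildSearchUrl(
  base: string,
  opts: { q: string; topK?: number; type?: string; tags?: string[] }
): string {
  const params = new URLSearchParams({ q: opts.q });
  if (opts.topK) params.set("topK", String(opts.topK));
  if (opts.type) params.set("type", opts.type);
  if (opts.tags?.length) params.set("tags", opts.tags.join(",")); // assumed format
  return `${base}/api/memory/search?${params}`;
}

const url = buildSearchUrl("http://localhost:3000", {
  q: "employer",
  topK: 5,
  type: "semantic",
});
console.log(url);
// http://localhost:3000/api/memory/search?q=employer&topK=5&type=semantic

// Then fetch it as usual:
//   const res = await fetch(url);
//   const memories = await res.json();
```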

Dashboard

The Memory page in the dashboard lets you inspect and maintain the store directly. It shows:

  • Recent memories table — each row shows the type badge (blue = semantic, purple = episodic, green = procedural), an importance tier badge (Critical ≥ 8, Important ≥ 6, Useful ≥ 4, Trivial below), the subject, tags, and creation date. Rows expand to reveal the full content.
  • Search form — runs the same /api/memory/search call used by the LLM; topK, type filter, and tag filter are supported.
  • New entry dialog — seed memories manually (type, subject, content, tags, importance). The new entry is embedded immediately and becomes retrievable on the next query.
  • Delete — permanently removes the entry and its vector/FTS rows.
  • Settings dialog — edit config.memory live: embedding model, topK, scoreThreshold, decay rate, weights, consolidation schedule / threshold / max cluster size. No restart needed.

Asking ScalyClaw to forget

You don't have to open the dashboard to remove a memory. Telling the assistant something like "forget that I work at Acme Corp" in any conversation will make it call memory_search to find matching entries and then memory_delete to remove them. Deletion is immediate and takes effect before the reply is built.

Config Reference

All defaults below come from CONFIG_DEFAULTS.memory. Everything in this block is hot-reloadable from the dashboard Settings dialog; no restart needed.

```json
{
  "memory": {
    "topK": 10,                  // results returned per retrieval pass
    "scoreThreshold": 0.5,       // vector results below this are dropped
    "embeddingModel": "auto",    // "auto" or explicit model id
    "weights": {
      "semantic": 0.6,
      "recency": 0.2,
      "importance": 0.2
    },
    "decayRate": 0.05,           // exp decay per day since updated_at
    "consolidation": {
      "enabled": true,
      "schedule": "0 3 * * *",   // cron — daily 03:00 UTC
      "similarityThreshold": 0.85,
      "maxClusterSize": 5
    }
  }
}
```

Memories are shared across all channels

The memory store is global, not per-channel. A fact learned during a Telegram conversation is immediately available in a Discord conversation, and vice versa. There is no channel_id column — recall is always system-wide. If you want isolated knowledge stores for different users, run separate ScalyClaw instances; the memory DB is just a SQLite file on disk, so each instance owns its own.