Memory
ScalyClaw remembers. Not just within a conversation, but across all conversations and all channels, persisted as a structured knowledge store that accumulates over time. Every interaction is an opportunity to extract something durable — a fact, a preference, a decision, an event, a relationship between entities — and surface it later when it becomes relevant.
Memory is not a chat log. It is a typed, indexed, searchable store of extracted statements plus a knowledge graph of entities and relations. An automatic background job writes to it as conversations happen, and a hybrid retrieval pipeline pulls the most relevant entries into the system prompt before each reply.
How Memory Works
Memory has two active systems running alongside normal conversation: an extractor that decides what to remember, and a retriever that decides what to recall. Both run automatically; neither requires any user intervention.
Storage Layout
All memory data lives in a single SQLite database file using bun:sqlite, with the sqlite-vec extension loaded for vector similarity and FTS5 for full-text search. The schema:
- `memories` — one row per memory record: `id`, `type`, `subject`, `content`, `tags`, `source`, `importance`, `embedding` (BLOB), `ttl`, `access_count`, `last_accessed_at`, `consolidated_into`, `created_at`, `updated_at`.
- `memory_tags` — normalized tag index (`memory_id`, `tag`).
- `memory_entities` — entity catalog: `id`, `name`, `entity_type`, `first_seen`, `last_seen`, `mention_count`.
- `memory_entity_mentions` — link table: which memories mention which entity.
- `memory_relations` — directed edges in the knowledge graph: `source_id`, `relation`, `target_id`, plus an optional backreference to the memory that produced the relation.
- `memory_vec` — `vec0` virtual table (sqlite-vec). Holds one float32 vector per memory. Vector dimensions are parameterized at DB init from the selected embedding model.
- `memory_fts` — FTS5 virtual table over `subject`, `content`, `tags`.
The memory_vec table is created with the dimension count reported by whichever embedding model is enabled on first boot. Switching later to a model with a different dimension count will trip the dimension guard inside generateEmbedding at runtime. There is currently no built-in bulk re-embed tool — to switch embedding models cleanly you must either pick a model with the same dimension count or wipe the SQLite file and let ScalyClaw rebuild on next start.
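A minimal sketch of the kind of dimension guard described above. The function name, signature, and error message are illustrative, not ScalyClaw's actual code:

```typescript
// Illustrative dimension guard: reject a vector whose length doesn't match
// the dimension memory_vec was created with. Names are hypothetical.
function assertEmbeddingDimension(
  vector: Float32Array,
  expectedDim: number, // the dimension memory_vec was initialized with
): Float32Array {
  if (vector.length !== expectedDim) {
    throw new Error(
      `Embedding dimension mismatch: model returned ${vector.length}, ` +
        `but memory_vec was initialized with ${expectedDim}. ` +
        `Pick a model with matching dimensions or wipe the memory DB.`,
    );
  }
  return vector;
}
```

Failing fast here is what prevents mismatched vectors from ever reaching the `vec0` table.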
Record Fields
Every memory has these fields. Note: there is no confidence field. Certainty is expressed by the importance integer (1–10) plus the source enum (user-stated, inferred, observed).
| Field | Type | Description |
|---|---|---|
| id | string | Primary key (UUID). |
| type | string | One of semantic, episodic, procedural (see Types below). |
| subject | string | Short label for listing and dedup. |
| content | string | Full, self-contained statement. |
| tags | optional | Comma-separated string; also indexed in memory_tags. |
| source | optional | user-stated / inferred / observed / auto-extraction / consolidation. |
| importance | integer | 1–10, default 5. Drives retrieval ranking (see scoring formula below). |
| embedding | BLOB | Float32 vector from the enabled embedding model. Mirrored into memory_vec. |
| ttl | optional | ISO timestamp; expired rows are filtered from search results. |
| access_count | integer | Bumped on each successful retrieval (async, non-blocking). |
| last_accessed_at | optional | Updated with access_count. |
| consolidated_into | optional | If set, this memory was merged into another and is excluded from search. |
| created_at / updated_at | string | ISO timestamps. |
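The field table above can be summarized as a TypeScript shape. This is a hypothetical mirror of the row, not ScalyClaw's actual type definitions:

```typescript
// Hypothetical shape of one memories row, mirroring the field table.
type MemoryType = "semantic" | "episodic" | "procedural";
type MemorySource =
  | "user-stated" | "inferred" | "observed"
  | "auto-extraction" | "consolidation";

interface MemoryRecord {
  id: string;                 // UUID primary key
  type: MemoryType;
  subject: string;            // short label for listing and dedup
  content: string;            // full, self-contained statement
  tags?: string;              // comma-separated; also indexed in memory_tags
  source?: MemorySource;
  importance: number;         // 1–10, default 5
  embedding?: Uint8Array;     // float32 BLOB, mirrored into memory_vec
  ttl?: string;               // ISO timestamp; expired rows filtered from search
  access_count: number;       // bumped on each successful retrieval
  last_accessed_at?: string;
  consolidated_into?: string; // if set, excluded from search
  created_at: string;
  updated_at: string;
}
```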
Memory Types
Every memory is tagged with one of three top-level types. These are the canonical values used throughout the code; legacy strings (fact, conversation, analysis, research) that older clients might send are auto-mapped into these three at write time.
| Type | What it holds | Example |
|---|---|---|
| semantic | Facts, knowledge, personal info, preferences, decisions, opinions, people. The durable stuff you want the model to treat as "known". | "User works at Acme Corp as a senior backend engineer." "User prefers TypeScript over JavaScript for code examples." |
| episodic | Events, interactions, what happened — anchored in time. Meetings, past experiences, specific incidents. | "User shipped the 2.0 release on 2026-03-15." "User mentioned a product review meeting scheduled for Friday." |
| procedural | Patterns, workflows, how-to, routines, processes. The "this is how we do X here" knowledge. | "User deploys via bun run scalyclaw:worker start --name worker1 on each worker host." |
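The legacy-string mapping mentioned above might look like the sketch below. The exact mapping is an assumption: "fact", "analysis", and "research" read most naturally as semantic, "conversation" as episodic — ScalyClaw's actual table may differ:

```typescript
// Assumed normalization of legacy type strings at write time.
type MemoryType = "semantic" | "episodic" | "procedural";

function normalizeMemoryType(raw: string): MemoryType {
  if (raw === "semantic" || raw === "episodic" || raw === "procedural") {
    return raw; // already canonical
  }
  const legacy: Record<string, MemoryType> = {
    fact: "semantic",         // assumed mapping
    conversation: "episodic", // assumed mapping
    analysis: "semantic",     // assumed mapping
    research: "semantic",     // assumed mapping
  };
  return legacy[raw] ?? "semantic"; // assumed fallback for unknown strings
}
```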
How Memories Are Written
There are two write paths. Both end at the same storeMemory function — the difference is who decides to call it.
Background extraction (primary)
- After the orchestrator produces a reply, the user message is pushed onto a per-channel buffer in Redis.
- A `memory-extraction` job is enqueued on the internal queue with a 30-second debounce delay. Further messages within that window are appended to the same buffer instead of enqueuing new jobs.
- When the job fires, it pulls the buffered user messages, runs them through the `EXTRACTION_PROMPT` (temperature 0) against the current chat model, and parses the JSON array of candidate memories.
- Each candidate is deduplicated against existing memories via vector search with similarity threshold `0.92`. If a match is found, the candidate is skipped.
- Survivors are normalized (legacy type mapping), embedded, inserted into `memories`, `memory_vec`, and `memory_fts`, and any attached entities / relations are upserted into the graph tables.
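The dedup step above reduces to a cosine-similarity check against the existing vectors. A minimal sketch, with helper names that are illustrative rather than ScalyClaw's actual functions:

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Skip a candidate if any existing embedding clears the 0.92 threshold.
function isDuplicate(
  candidate: number[],
  existing: number[][],
  threshold = 0.92,
): boolean {
  return existing.some((e) => cosineSimilarity(candidate, e) >= threshold);
}
```

In production this comparison runs inside sqlite-vec rather than in application code, but the decision rule is the same.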
Direct tool call (secondary)
The LLM has a memory_store tool available during any reply. It can call this directly when the user shares something obviously memorable ("I just got promoted to VP of Engineering") — that avoids waiting for the next debounce window. The tool enforces the same dedup check as the background path.
How Memories Are Recalled
Before the orchestrator builds the system prompt for a reply, it runs a retrieval pass. Top results are rendered as a "Relevant Memories" block at the end of the system prompt.
- Embed the incoming user message with the configured embedding model.
- Run a `memory_vec` cosine similarity search for the top `K × 3` candidates (K = `memory.topK`, default 10).
- Optional tag filter (if the caller supplied tags, all must match). TTL filter (drop expired). Exclude rows where `consolidated_into IS NOT NULL`.
- Re-rank by composite score (see formula below) and return the top `K`.
- If embeddings aren't available or the vector search returned nothing, fall back to an FTS5 query over the same normalized text. Same filtering + scoring applies.
- Bump `access_count` / `last_accessed_at` on the surfaced rows, in a fire-and-forget async batch.
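The filtering step in the pipeline above can be sketched as a single predicate. The row shape and names are hypothetical:

```typescript
// Illustrative candidate-row predicate for the recall pipeline's filters.
interface CandidateRow {
  tags: string[];            // already split from the comma-separated column
  ttl?: string;              // ISO expiry timestamp
  consolidated_into?: string;
}

function passesFilters(
  row: CandidateRow,
  requiredTags: string[],    // caller-supplied tags; empty means no tag filter
  now: Date,
): boolean {
  if (row.consolidated_into) return false;               // merged away
  if (row.ttl && new Date(row.ttl) <= now) return false; // expired
  return requiredTags.every((t) => row.tags.includes(t)); // ALL must match
}
```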
Composite Scoring
```text
final = w_semantic · semanticScore
      + w_recency · recencyWeight
      + w_importance · importanceWeight

semanticScore    = 1 - vectorDistance   (or FTS5 rank, normalized)
recencyWeight    = exp(-decayRate × daysSinceUpdate)
importanceWeight = importance / 10
```

Defaults: `w_semantic=0.6`, `w_recency=0.2`, `w_importance=0.2`, `decayRate=0.05`, `topK=10`, `scoreThreshold=0.5`.
All five knobs (weights.semantic, weights.recency, weights.importance, decayRate, topK, scoreThreshold) live in config.memory and can be tuned at runtime without a restart. The LLM can also override weights on a per-call basis via the memory_search tool's weights parameter — useful when it wants to deliberately bias toward older-but-important context or recent-but-unimportant chatter.
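The formula, written out with the defaults from the text. The function name and argument shape are illustrative; it assumes the caller has already computed days since `updated_at`:

```typescript
// Composite retrieval score with the documented default weights.
interface ScoreWeights {
  semantic: number;
  recency: number;
  importance: number;
}

function compositeScore(
  semanticScore: number,   // 1 - vectorDistance, or normalized FTS5 rank
  daysSinceUpdate: number, // derived from updated_at
  importance: number,      // 1–10
  w: ScoreWeights = { semantic: 0.6, recency: 0.2, importance: 0.2 },
  decayRate = 0.05,
): number {
  const recencyWeight = Math.exp(-decayRate * daysSinceUpdate);
  const importanceWeight = importance / 10;
  return (
    w.semantic * semanticScore +
    w.recency * recencyWeight +
    w.importance * importanceWeight
  );
}
```

A perfect semantic match updated today with importance 10 scores 0.6 + 0.2 + 0.2 = 1.0; the same match at importance 5, untouched for 20 days, decays to roughly 0.77.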
Consolidation
Over time a memory store accumulates near-duplicates and fragments. Consolidation is a scheduled background job that clusters similar memories and merges each cluster into a single comprehensive entry.
- When: cron schedule in `config.memory.consolidation.schedule` (default `0 3 * * *` — daily at 03:00 UTC). Can be disabled via `config.memory.consolidation.enabled`.
- Clustering: pairwise vector similarity ≥ `similarityThreshold` (default `0.85`), within the same type, up to `maxClusterSize` members per cluster (default `5`).
- Merging: the `CONSOLIDATION_PROMPT` is run against the selected chat model; the LLM produces a merged subject, content, importance, and tags.
- Bookkeeping: the merged memory is stored with `source: "consolidation"`, a fresh embedding, and importance = `max(source importances, merged result)`. Original rows are updated with `consolidated_into = newId`, which excludes them from future search. Entity mentions are relinked from originals to the merged row.
- Manual trigger: the LLM can call `memory_reflect` to force a consolidation pass, even if the cron is disabled.
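The clustering step could be implemented greedily along these lines. The greedy strategy is an assumption — ScalyClaw's actual clustering may differ — but it honors the documented threshold and cluster-size cap:

```typescript
// Greedy clustering sketch: group items whose pairwise similarity clears
// the threshold, capped at maxClusterSize, leaving singletons unmerged.
// Assumes all n items share the same memory type (per the docs).
function clusterBySimilarity(
  n: number,
  similarity: (i: number, j: number) => number,
  threshold = 0.85,
  maxClusterSize = 5,
): number[][] {
  const assigned = new Set<number>();
  const clusters: number[][] = [];
  for (let i = 0; i < n; i++) {
    if (assigned.has(i)) continue;
    const cluster = [i];
    assigned.add(i);
    for (let j = i + 1; j < n && cluster.length < maxClusterSize; j++) {
      if (assigned.has(j)) continue;
      // require the new member to be similar to every current member
      if (cluster.every((m) => similarity(m, j) >= threshold)) {
        cluster.push(j);
        assigned.add(j);
      }
    }
    if (cluster.length > 1) clusters.push(cluster); // singletons aren't merged
  }
  return clusters;
}
```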
Entity Graph
Alongside the flat list of memories, ScalyClaw maintains a lightweight directed graph of the people, places, and things that appear across them. Entities come from two sources: (1) the background extraction job, which can emit an entities array for each memory, and (2) direct memory_store calls that include an entities field.
Entity types (enum): person, project, technology, place, organization, concept.
Each entity carries first_seen, last_seen, and mention_count, so the graph can distinguish recently-relevant entities from long-tail noise. Relations are stored as directed edges (source, relation_label, target) with an optional backreference to the memory that introduced them and an integer strength that increments every time the same edge is re-asserted. Stale entities (mention_count ≤ 1, not seen in 90 days) are pruned automatically.
At reply time, the top 5 most-mentioned entities are injected into the system prompt as a "Known Entities" block, so the model has a running picture of the user's world before it decides what to retrieve in detail. The LLM can also call memory_graph to traverse relations from a named entity up to a configurable depth.
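A depth-limited traversal like `memory_graph`'s can be sketched as a breadth-first walk over the stored edges. The edge shape here is assumed from the `memory_relations` columns, and this sketch follows edges in their stored direction only:

```typescript
// Illustrative depth-limited BFS over directed relation edges.
interface RelationEdge {
  source: string;
  relation: string;
  target: string;
}

function traverseGraph(
  edges: RelationEdge[],
  start: string,
  depth = 2, // matches memory_graph's default
): RelationEdge[] {
  let frontier = new Set([start]);
  const visited = new Set([start]);
  const found: RelationEdge[] = [];
  for (let d = 0; d < depth && frontier.size > 0; d++) {
    const next = new Set<string>();
    for (const e of edges) {
      if (frontier.has(e.source)) {
        found.push(e);
        if (!visited.has(e.target)) {
          visited.add(e.target);
          next.add(e.target);
        }
      }
    }
    frontier = next;
  }
  return found;
}
```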
LLM Tools
The model has seven memory-related tools available during every turn. They are schema-validated at the orchestrator so malformed calls fail fast rather than silently corrupting the store.
| Tool | Purpose | Required | Optional |
|---|---|---|---|
| memory_store | Write a new memory (with dedup). | type, subject, content | tags[], importance (1–10), source, ttl (duration string, e.g. "7d"), entities[] (with name, type, optional relations) |
| memory_search | Composite-scored semantic search. | query | type, tags[], topK, weights (override semantic/recency/importance) |
| memory_recall | Browse by ID, type, or tags (no semantic scoring). | — | id, type, tags[], includeConsolidated |
| memory_update | Update a memory in place; embedding regenerates if content changed. | id | subject, content, tags[], importance |
| memory_delete | Permanently remove a memory and its vector / FTS rows. | id | — |
| memory_reflect | Trigger consolidation on demand. | — | force (run even if the cron job is disabled) |
| memory_graph | Traverse the entity graph from a named entity. | entity | depth (default 2) |
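The ttl parameter accepts duration strings like "7d". A hypothetical parser is sketched below — the accepted unit set (s/m/h/d/w) is an assumption, not confirmed by the source:

```typescript
// Hypothetical duration-string parser: "7d" → ISO expiry timestamp.
// Accepted units (s, m, h, d, w) are assumed, not ScalyClaw's spec.
function ttlToIso(duration: string, from: Date = new Date()): string {
  const match = /^(\d+)\s*([smhdw])$/.exec(duration.trim());
  if (!match) throw new Error(`Unparseable duration: "${duration}"`);
  const unitMs: Record<string, number> = {
    s: 1_000,
    m: 60_000,
    h: 3_600_000,
    d: 86_400_000,
    w: 604_800_000,
  };
  return new Date(from.getTime() + Number(match[1]) * unitMs[match[2]]).toISOString();
}
```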
Example tool call
```jsonc
// Storing a semantic memory with entities + relations in one call
{
  "name": "memory_store",
  "arguments": {
    "type": "semantic",
    "subject": "User's current employer",
    "content": "User works at Acme Corp as a senior backend engineer on the Payments team.",
    "tags": ["work", "employment"],
    "importance": 8,
    "source": "user-stated",
    "entities": [
      { "name": "Acme Corp", "type": "organization" },
      {
        "name": "Payments",
        "type": "project",
        "relations": [{ "relation": "part_of", "target": "Acme Corp" }]
      }
    ]
  }
}
```
Dashboard API
The same operations available to the LLM are exposed as REST endpoints for the dashboard. These endpoints are what the Memory page calls.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/memory | List recent memories (default 20, no filters). |
| GET | /api/memory/search | Search with query string ?q=…&topK=…&type=…&tags=…. |
| GET | /api/memory/:id | Fetch a single memory. |
| POST | /api/memory | Create a memory (type, subject, content required). |
| PUT | /api/memory/:id | Update a memory (subject, content, tags, importance). |
| DELETE | /api/memory/:id | Remove a memory. |
Dashboard
The Memory page in the dashboard lets you inspect and maintain the store directly. It shows:
- Recent memories table — each row shows the type badge (blue = semantic, purple = episodic, green = procedural), an importance tier badge (Critical ≥ 8, Important ≥ 6, Useful ≥ 4, Trivial below), the subject, tags, and creation date. Rows expand to reveal the full content.
- Search form — runs the same `/api/memory/search` call used by the LLM; topK, type filter, and tag filter are supported.
- New entry dialog — seed memories manually (type, subject, content, tags, importance). The new entry is embedded immediately and becomes retrievable on the next query.
- Delete — permanently removes the entry and its vector/FTS rows.
- Settings dialog — edit `config.memory` live: embedding model, topK, scoreThreshold, decay rate, weights, consolidation schedule / threshold / max cluster size. No restart needed.
You don't have to open the dashboard to remove a memory. Telling the assistant something like "forget that I work at Acme Corp" in any conversation will make it call memory_search to find matching entries and then memory_delete to remove them. Deletion is immediate and takes effect before the reply is built.
Config Reference
All defaults below come from CONFIG_DEFAULTS.memory. Everything in this block is hot-reloadable from the dashboard Settings dialog; no restart needed.
```jsonc
{
  "memory": {
    "topK": 10,                  // results returned per retrieval pass
    "scoreThreshold": 0.5,       // vector results below this are dropped
    "embeddingModel": "auto",    // "auto" or explicit model id
    "weights": {
      "semantic": 0.6,
      "recency": 0.2,
      "importance": 0.2
    },
    "decayRate": 0.05,           // exp decay per day since updated_at
    "consolidation": {
      "enabled": true,
      "schedule": "0 3 * * *",   // cron — daily 03:00 UTC
      "similarityThreshold": 0.85,
      "maxClusterSize": 5
    }
  }
}
```
The memory store is global, not per-channel. A fact learned during a Telegram conversation is immediately available in a Discord conversation, and vice versa. There is no channel_id column — recall is always system-wide. If you want isolated knowledge stores for different users, run separate ScalyClaw instances; the memory DB is just a SQLite file on disk, so each instance owns its own.