Working Memory
Working memory is Engram's term for the dynamically assembled context block that gets injected into an AI prompt before it is sent to the model. It is the bridge between the persistent memory store and a live AI conversation.
How working memory is assembled
Every time an AI makes a recall_context call (or the proxy intercepts a request), Engram builds a working memory snapshot in four stages: vector search over the embedding index, similarity-threshold filtering, knowledge-graph expansion, and final scoring and ranking.
Scoring formula
score = (similarity × 0.45)
+ (recency × 0.25)
+ (importance × 0.20)
+ (accessFreq × 0.10)
Where:
similarity = cosine similarity (0–1)
recency = exp(-ln(2) × age_seconds / 604800) // 7-day half-life
importance = stored importance value (0–1)
accessFreq = min(1.0, log10(access_count + 1) / 3)

The weights prioritise semantic relevance while giving a meaningful boost to recent and important memories. Access frequency rewards memories that are retrieved often, helping the system learn from usage patterns.
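Putting the formula and its four component definitions together, a scoring function could look like the sketch below. The names (`scoreMemory`, `Candidate`) are illustrative, not Engram's actual API; only the weights and component formulas come from the text above.

```typescript
interface Candidate {
  similarity: number;  // cosine similarity to the query, 0–1
  ageSeconds: number;  // time since the memory was created
  importance: number;  // stored importance value, 0–1
  accessCount: number; // how often the memory has been recalled
}

const WEEK_SECONDS = 604_800; // 7-day half-life window

function scoreMemory(c: Candidate): number {
  // exponential recency decay: exactly 0.5 at one week of age
  const recency = Math.exp((-Math.LN2 * c.ageSeconds) / WEEK_SECONDS);
  // log-scaled access frequency, capped at 1.0 (saturates at 999 accesses)
  const accessFreq = Math.min(1.0, Math.log10(c.accessCount + 1) / 3);
  return (
    c.similarity * 0.45 +
    recency * 0.25 +
    c.importance * 0.2 +
    accessFreq * 0.1
  );
}

// A maximally relevant, maximally important, week-old, never-recalled memory:
// 0.45 + (0.5 × 0.25) + 0.20 + 0 = 0.775
const example = scoreMemory({
  similarity: 1, ageSeconds: 604_800, importance: 1, accessCount: 0,
});
```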
Injection format
Working memory is injected as a structured block prepended to the system prompt. The format is designed to be human-readable for the model:
[NEURAL MEMORY CONTEXT]

[KNOWLEDGE]
• Tech stack: Fastify + SQLite WAL-mode + HNSW vector index

[PATTERNS & SKILLS]
• When schema changes → drizzle-kit generate → drizzle-kit migrate

[PAST EVENTS & CONVERSATIONS]
• 2025-03-18: User asked about embedding compression — discussed FP16 trade-offs

[END MEMORY CONTEXT]
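A formatter for this block could be sketched as follows. The section titles mirror the example above; the `Entry` type, category names, and the rule of omitting empty sections are assumptions for illustration.

```typescript
type MemoryKind = "knowledge" | "pattern" | "episode";

interface Entry {
  kind: MemoryKind;
  text: string;
}

// Section headers as they appear in the injected block
const SECTION_TITLES: Record<MemoryKind, string> = {
  knowledge: "[KNOWLEDGE]",
  pattern: "[PATTERNS & SKILLS]",
  episode: "[PAST EVENTS & CONVERSATIONS]",
};

function buildContextBlock(entries: Entry[]): string {
  const lines = ["[NEURAL MEMORY CONTEXT]"];
  for (const kind of ["knowledge", "pattern", "episode"] as const) {
    const items = entries.filter((e) => e.kind === kind);
    if (items.length === 0) continue; // skip empty sections (assumed behavior)
    lines.push(SECTION_TITLES[kind]);
    for (const item of items) lines.push(`• ${item.text}`);
  }
  lines.push("[END MEMORY CONTEXT]");
  return lines.join("\n");
}
```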
Context budget
Each working memory entry consumes tokens from the model's context window. Engram does not yet enforce a hard token budget, but the recallLimit config option controls the maximum number of entries injected.
| Parameter | Default | Effect |
|---|---|---|
| maxTokens | 2000 | Approximate token budget for assembled context (max 8000). |
| topK | 20 | Max candidate memories from vector search. |
| threshold | 0.3 | Cosine similarity floor — entries below this are excluded. |
| graphDepth | 2 | BFS depth for knowledge-graph expansion. |
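Wired up as an MCP tool call, a recall_context request with these parameters might look like the sketch below. Only the parameter names and defaults come from the table; the request shape and the example query are assumptions, not Engram's documented wire format.

```typescript
// Hypothetical recall_context tool-call payload
const recallRequest = {
  tool: "recall_context",
  arguments: {
    query: "how do we run database migrations?",
    maxTokens: 500,  // tighter than the 2000 default (max 8000)
    topK: 20,        // candidate pool from vector search
    threshold: 0.3,  // cosine-similarity floor
    graphDepth: 2,   // BFS depth for knowledge-graph expansion
  },
};
```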
Tip: set maxTokens to 500 in the recall_context call to keep injected context tight and highly relevant.

Auto-linking & neural graph
Every new memory automatically connects to its most similar neighbors. When you store a memory, Engram searches for the top 3 most similar existing memories (threshold ≥ 0.5) and creates bidirectional relates_to edges. The knowledge graph grows organically — no manual wiring needed.
During recall, graph expansion (step 3) traverses these edges to surface related facts the query didn't directly match. This mimics how human memory works: recalling one fact activates nearby neurons.
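The auto-linking step can be sketched as follows: on store, score every existing memory by cosine similarity, keep the top 3 at or above the 0.5 threshold, and add edges in both directions. The `Memory` type and the in-memory adjacency map are illustrative stand-ins for Engram's actual storage.

```typescript
interface Memory {
  id: string;
  embedding: number[];
}

// adjacency list: memory id → set of ids it relates_to
const edges = new Map<string, Set<string>>();

function addEdge(from: string, to: string): void {
  if (!edges.has(from)) edges.set(from, new Set());
  edges.get(from)!.add(to);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function autoLink(fresh: Memory, existing: Memory[]): void {
  existing
    .map((m) => ({ id: m.id, sim: cosine(fresh.embedding, m.embedding) }))
    .filter((n) => n.sim >= 0.5) // similarity threshold from the text
    .sort((a, b) => b.sim - a.sim)
    .slice(0, 3)                 // top 3 most similar neighbors
    .forEach((n) => {
      addEdge(fresh.id, n.id);   // bidirectional relates_to edges
      addEdge(n.id, fresh.id);
    });
}
```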
Consolidation
Like sleep consolidation in the human brain, Engram can merge clusters of similar episodic memories into stable semantic summaries. Call POST /api/consolidate to trigger it — or run it on a schedule.
Episodes that repeat a pattern (e.g., "user asked about deploys" appearing 5 times) are collapsed into a single semantic fact with boosted importance. The original episodes are archived, keeping the database lean while preserving knowledge.
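A minimal sketch of such a consolidation pass: episodes sharing a pattern key are collapsed into one semantic summary with boosted importance, and the originals are archived. The types, the repeat threshold of 5, and the +0.2 importance boost are assumptions; only the overall behavior comes from the text.

```typescript
interface Episode {
  id: string;
  pattern: string;     // e.g. "user asked about deploys"
  importance: number;  // 0–1
  archived: boolean;
}

interface SemanticFact {
  summary: string;
  importance: number;
  sourceIds: string[]; // the archived episodes this fact was built from
}

function consolidate(episodes: Episode[], minRepeats = 5): SemanticFact[] {
  // group live episodes by their repeated pattern
  const byPattern = new Map<string, Episode[]>();
  for (const e of episodes) {
    if (e.archived) continue;
    const bucket = byPattern.get(e.pattern);
    if (bucket) bucket.push(e);
    else byPattern.set(e.pattern, [e]);
  }

  const facts: SemanticFact[] = [];
  for (const [pattern, cluster] of byPattern) {
    if (cluster.length < minRepeats) continue;
    const maxImportance = Math.max(...cluster.map((e) => e.importance));
    facts.push({
      summary: pattern,
      importance: Math.min(1, maxImportance + 0.2), // boost amount assumed
      sourceIds: cluster.map((e) => e.id),
    });
    for (const e of cluster) e.archived = true; // archived, not deleted
  }
  return facts;
}
```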
Decay & retention
Decayed memories — those whose retention score has dropped below the archive threshold — are automatically excluded from vector search and recall. When you call recall_context, only live (non-archived) memories are returned.
The decay engine runs automatically every ENGRAM_DECAY_INTERVAL milliseconds (default: 1 hour) or can be triggered manually via POST /api/decay or the decay_sweep MCP tool. Each sweep:
- Evaluates all non-archived memories in batches
- Skips protected memories (high-importance semantics, recently accessed, pinned)
- Archives memories with retention score below threshold
- Progressively reduces importance on surviving memories
- Optionally triggers auto-consolidation of old episodic clusters
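One sweep over the steps above could be sketched like this. The retention formula (reusing the 7-day half-life from scoring), the archive threshold of 0.2, the one-day "recently accessed" window, and the 0.99 importance decay rate are all assumptions for illustration; the protection rules here are a simplification of the list above.

```typescript
interface Mem {
  importance: number;        // 0–1
  ageSeconds: number;        // time since creation
  lastAccessSeconds: number; // time since last recall
  pinned: boolean;
  archived: boolean;
}

const ARCHIVE_THRESHOLD = 0.2; // assumed
const RECENT_ACCESS = 86_400;  // accessed within a day → protected (assumed)

function retention(m: Mem): number {
  // same 7-day half-life shape as the recency term in scoring
  return m.importance * Math.exp((-Math.LN2 * m.ageSeconds) / 604_800);
}

function decaySweep(memories: Mem[]): void {
  for (const m of memories) {
    if (m.archived) continue; // already excluded from recall
    if (m.pinned || m.lastAccessSeconds < RECENT_ACCESS) continue; // protected
    if (retention(m) < ARCHIVE_THRESHOLD) {
      m.archived = true;      // drops out of vector search and recall
    } else {
      m.importance *= 0.99;   // progressive importance reduction (rate assumed)
    }
  }
}
```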
Auto-concepts
When a memory is stored without an explicit concept label, Engram extracts one automatically from the content (2–5 words). These labels appear as node names in the 3D visualization and help the knowledge graph stay navigable.
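Engram's actual extraction heuristic is not specified; a naive sketch might drop common stopwords and keep the first few content words as a short label:

```typescript
// Minimal stopword list, illustrative only
const STOPWORDS = new Set([
  "the", "a", "an", "is", "are", "to", "of", "and", "for", "in", "on", "with",
]);

function extractConcept(content: string, maxWords = 5): string {
  const words = content
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // strip punctuation
    .split(/\s+/)
    .filter((w) => w.length > 0 && !STOPWORDS.has(w));
  return words.slice(0, maxWords).join(" ");
}
```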