Engram Docs v0.1.0
Core Concepts

Working Memory

Working memory is Engram's term for the dynamically assembled context block that gets injected into an AI prompt before it is sent to the model. It is the bridge between the persistent memory store and a live AI conversation.

long-term store
(50 000 memories)
        │
        ▼
recall_context(query, limit=20)
        │
        ▼
working memory
(top 8–20 most relevant)
        │
        ▼
inject into system prompt
        │
        ▼
LLM context window

How working memory is assembled

Every time an AI makes a recall_context call (or the proxy intercepts a request), Engram builds a working memory snapshot in four stages:

1. Embed the query. The incoming query is converted to a 384-dimensional vector using the local ONNX model (~12 ms).
2. HNSW vector search. The vector index finds the top-K candidates by cosine similarity (<5 ms over 50k memories).
3. Graph expansion. For each retrieved memory, Engram follows knowledge-graph edges to pull in related semantic facts, capturing context the query didn't directly mention.
4. Score & rank. Each candidate receives a composite score that weights similarity, recency, importance, and access frequency (see the scoring formula below). The top N are selected for injection.

Scoring formula

score = (similarity × 0.45)
      + (recency   × 0.25)
      + (importance × 0.20)
      + (accessFreq × 0.10)

Where:
  similarity = cosine similarity (0–1)
  recency    = exp(–ln(2) × age_seconds / 604800)   // 7-day half-life
  importance = stored importance value (0–1)
  accessFreq = min(1.0, log10(access_count + 1) / 3)

The weights prioritise semantic relevance while giving a meaningful boost to recent and important memories. Access frequency rewards memories that are retrieved often, helping the system learn from usage patterns.
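The formula above can be sketched directly in TypeScript. The `Candidate` shape and the function names here are illustrative, not Engram's actual API:

```typescript
interface Candidate {
  similarity: number;   // cosine similarity, 0–1
  ageSeconds: number;   // time since the memory was created
  importance: number;   // stored importance value, 0–1
  accessCount: number;  // how many times the memory has been recalled
}

const WEEK_SECONDS = 604_800;

// Exponential recency with a 7-day half-life: a week-old memory scores 0.5.
function recency(ageSeconds: number): number {
  return Math.exp((-Math.LN2 * ageSeconds) / WEEK_SECONDS);
}

// Logarithmic access-frequency bonus that saturates at 1.0 (~999 accesses).
function accessFreq(accessCount: number): number {
  return Math.min(1.0, Math.log10(accessCount + 1) / 3);
}

function scoreMemory(c: Candidate): number {
  return (
    c.similarity * 0.45 +
    recency(c.ageSeconds) * 0.25 +
    c.importance * 0.2 +
    accessFreq(c.accessCount) * 0.1
  );
}
```

A brand-new, maximally important, heavily accessed memory with perfect similarity scores exactly 1.0, since the four weights sum to 1.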

Injection format

Working memory is injected as a structured block prepended to the system prompt. The format is designed to be human-readable for the model:

[NEURAL MEMORY CONTEXT]

[KNOWLEDGE]
• Tech stack: Fastify + SQLite WAL-mode + HNSW vector index

[PATTERNS & SKILLS]
• When schema changes → drizzle-kit generate → drizzle-kit migrate

[PAST EVENTS & CONVERSATIONS]
• 2025-03-18: User asked about embedding compression — discussed FP16 trade-offs

[END MEMORY CONTEXT]
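A minimal sketch of how such a block could be assembled from categorized entries (the `Entry` shape and `buildContextBlock` are hypothetical helpers, not part of Engram's documented API):

```typescript
type Category = "KNOWLEDGE" | "PATTERNS & SKILLS" | "PAST EVENTS & CONVERSATIONS";

interface Entry {
  category: Category;
  text: string;
}

function buildContextBlock(entries: Entry[]): string {
  const order: Category[] = ["KNOWLEDGE", "PATTERNS & SKILLS", "PAST EVENTS & CONVERSATIONS"];
  const lines: string[] = ["[NEURAL MEMORY CONTEXT]"];
  for (const cat of order) {
    const items = entries.filter((e) => e.category === cat);
    if (items.length === 0) continue; // skip empty sections entirely
    lines.push("", `[${cat}]`);
    for (const e of items) lines.push(`• ${e.text}`);
  }
  lines.push("", "[END MEMORY CONTEXT]");
  return lines.join("\n");
}
```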

Context budget

Each working memory entry consumes tokens from the model's context window. Engram does not yet enforce a hard token budget, but the recallLimit config option controls the maximum number of entries injected.

Parameter    Default   Effect
maxTokens    2000      Approximate token budget for assembled context (max 8000).
topK         20        Max candidate memories from vector search.
threshold    0.3       Cosine similarity floor — entries below this are excluded.
graphDepth   2         BFS depth for knowledge-graph expansion.
For models with small context windows (e.g., 4k tokens), set maxTokens to 500 in the recall_context call to keep injected context tight and highly relevant.
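As a sketch, a tight recall request for a small-context model might look like this. The parameter names come from the table above, but the `RecallParams` shape itself is an assumption about the call's payload:

```typescript
// Hypothetical shape of a recall_context request payload.
interface RecallParams {
  query: string;
  maxTokens?: number;  // approximate token budget (default 2000, max 8000)
  topK?: number;       // max candidates from vector search (default 20)
  threshold?: number;  // cosine similarity floor (default 0.3)
  graphDepth?: number; // BFS depth for graph expansion (default 2)
}

// Keep injected context tight for a 4k-context model.
const tightRecall: RecallParams = {
  query: "how do we run database migrations?",
  maxTokens: 500,
  threshold: 0.3,
};
```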

Auto-linking & neural graph

Every new memory automatically connects to its most similar neighbors. When you store a memory, Engram searches for the top 3 most similar existing memories (threshold ≥ 0.5) and creates bidirectional relates_to edges. The knowledge graph grows organically — no manual wiring needed.

During recall, graph expansion (step 3) traverses these edges to surface related facts the query didn't directly match. This mimics how human memory works: recalling one fact activates nearby neurons.
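Depth-limited expansion over relates_to edges amounts to a plain breadth-first search. The adjacency-map representation below is an illustrative simplification; Engram actually stores edges in SQLite:

```typescript
// Adjacency map: memory id → ids of memories linked by relates_to edges.
type Graph = Map<string, string[]>;

// Collect every memory reachable from the seed set within maxDepth hops.
function expand(graph: Graph, seeds: string[], maxDepth: number): Set<string> {
  const seen = new Set<string>(seeds);
  let frontier = seeds;
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of graph.get(id) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```

With graphDepth=2 (the default), memories two relates_to hops away from a vector-search hit are pulled into the candidate pool; anything further is left out.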

Consolidation

Like sleep consolidation in the human brain, Engram can merge clusters of similar episodic memories into stable semantic summaries. Call POST /api/consolidate to trigger it — or run it on a schedule.

Episodes that repeat a pattern (e.g., "user asked about deploys" appearing 5 times) are collapsed into a single semantic fact with boosted importance. The original episodes are archived, keeping the database lean while preserving knowledge.
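A toy sketch of the collapse step, assuming the repeated episodes have already been clustered by similarity (the `Episode` and `SemanticFact` shapes and the boost rule are illustrative, not Engram's documented behavior):

```typescript
interface Episode {
  text: string;
  importance: number; // 0–1
}

interface SemanticFact {
  summary: string;
  importance: number;
  sourceCount: number;
}

// Collapse a cluster of similar episodes into one semantic fact,
// boosting importance halfway from the strongest episode toward 1.0.
function consolidate(cluster: Episode[], summary: string): SemanticFact {
  const maxImportance = Math.max(...cluster.map((e) => e.importance));
  return {
    summary,
    importance: Math.min(1, maxImportance + (1 - maxImportance) / 2),
    sourceCount: cluster.length,
  };
}
```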

Decay & retention

Decayed memories — those whose retention score has dropped below the archive threshold — are automatically excluded from vector search and recall. When you call recall_context, only live (non-archived) memories are returned.

The decay engine runs automatically every ENGRAM_DECAY_INTERVAL milliseconds (default: 1 hour) or can be triggered manually via POST /api/decay or the decay_sweep MCP tool. Each sweep:

  • Evaluates all non-archived memories in batches
  • Skips protected memories (high-importance semantics, recently accessed, pinned)
  • Archives memories with retention score below threshold
  • Progressively reduces importance on surviving memories
  • Optionally triggers auto-consolidation of old episodic clusters
Decay keeps the vector index lean and recall quality high as the brain grows. Without it, old low-value memories accumulate and dilute search results over time.
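The sweep above can be sketched as follows. The retention scoring, the 0.99 decay factor, and the field names are simplified assumptions for illustration; the real engine batches its work and applies richer protection rules:

```typescript
interface Memory {
  id: string;
  importance: number; // 0–1, progressively reduced by decay
  ageSeconds: number;
  pinned: boolean;
  archived: boolean;
}

const ARCHIVE_THRESHOLD = 0.1;
const WEEK_SECONDS = 604_800;

// Retention falls off with age but is propped up by importance.
function retention(m: Memory): number {
  return m.importance * Math.exp((-Math.LN2 * m.ageSeconds) / WEEK_SECONDS);
}

function decaySweep(memories: Memory[]): void {
  for (const m of memories) {
    if (m.archived || m.pinned) continue; // protected or already archived
    if (retention(m) < ARCHIVE_THRESHOLD) {
      m.archived = true; // excluded from future vector search and recall
    } else {
      m.importance *= 0.99; // gentle progressive decay on survivors
    }
  }
}
```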

Auto-concepts

When a memory is stored without an explicit concept label, Engram extracts one automatically from the content (2–5 words). These labels appear as node names in the 3D visualization and help the knowledge graph stay navigable.
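A naive version of such extraction could simply pick the most frequent non-stopword tokens; this is purely illustrative, since Engram's actual heuristic is not documented here:

```typescript
const STOPWORDS = new Set([
  "the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on", "with",
]);

// Pick the 2–5 most frequent non-stopword tokens as a concept label.
function autoConcept(content: string): string {
  const counts = new Map<string, number>();
  for (const token of content.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token.length < 3 || STOPWORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  const labelLength = Math.max(2, Math.min(5, counts.size));
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, labelLength)
    .map(([word]) => word)
    .join(" ");
}
```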
