
Ollama Proxy

Engram ships a transparent Ollama proxy that sits between your Ollama client and the Ollama daemon. It intercepts every chat request, injects relevant memories into the system prompt, and forwards the enriched request to Ollama — all without any changes to your client code.

your app ──:11435──▶ engram proxy ──recall──▶ memory store
                          │
                    inject context
                          ▼
                     engram proxy ──:11434──▶ ollama daemon

Start the proxy

The proxy is part of the @engram-ai-memory/adapter-ollama package. Start it alongside your Engram server:

# From the Engram repository root
node adapters/ollama/dist/proxy.js

# Output:
# Engram × Ollama Proxy
#   Listening:     http://localhost:11435
#   Ollama target: http://localhost:11434
#   Engram:        http://localhost:4901
The Engram server must be running on port 4901 and Ollama on port 11434. The proxy listens on 11435 (one above Ollama's default).
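To confirm all three services are reachable on their documented ports, a quick check (a sketch) can probe each one. /api/tags is Ollama's model-list endpoint, which the proxy also forwards; probing Engram's root path for liveness is an assumption, not a documented health endpoint.

```typescript
// Probe the three documented ports: proxy (11435), Ollama (11434),
// Engram (4901). Any unreachable service is reported rather than thrown.
const endpoints = {
  proxy: 'http://localhost:11435/api/tags',
  ollama: 'http://localhost:11434/api/tags',
  engram: 'http://localhost:4901/', // assumed liveness path
}

for (const [name, url] of Object.entries(endpoints)) {
  try {
    const res = await fetch(url)
    console.log(`${name}: HTTP ${res.status}`)
  } catch {
    console.log(`${name}: unreachable`)
  }
}
```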

Environment variables

Variable            Default                  Description
OLLAMA_PROXY_PORT   11435                    Proxy listen port
OLLAMA_TARGET       http://localhost:11434   Real Ollama daemon URL
ENGRAM_API          http://localhost:4901    Engram REST API URL
ENGRAM_MAX_TOKENS   1500                     Max context tokens to inject
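The proxy presumably resolves these variables at startup, falling back to the defaults above. A minimal sketch of that resolution (the variable names come from the table; the parsing logic is an assumption, not the actual proxy source):

```typescript
// Resolve proxy configuration from the environment, with the documented
// defaults. Numeric values are coerced from their string form.
const config = {
  proxyPort: Number(process.env.OLLAMA_PROXY_PORT ?? 11435),
  ollamaTarget: process.env.OLLAMA_TARGET ?? 'http://localhost:11434',
  engramApi: process.env.ENGRAM_API ?? 'http://localhost:4901',
  maxTokens: Number(process.env.ENGRAM_MAX_TOKENS ?? 1500),
}

console.log(config)
```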

Configure your Ollama client

Point your client at the proxy port instead of the native Ollama port. Everything else stays the same:

Environment variable

export OLLAMA_HOST=http://localhost:11435
ollama run llama3

Python (ollama-python)

import ollama

client = ollama.Client(host='http://localhost:11435')
response = client.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'What did we discuss last week?'}]
)
# Engram silently injected relevant memories into the system prompt

JavaScript / TypeScript

import { Ollama } from 'ollama'

const ollama = new Ollama({ host: 'http://localhost:11435' })

const response = await ollama.chat({
  model: 'llama3',
  messages: [{ role: 'user', content: 'Continue where we left off' }]
})

Open WebUI

In Open WebUI settings, change the Ollama API URL from http://localhost:11434 to http://localhost:11435. All conversations will automatically gain memory.

How injection works

When the proxy receives a /api/chat or /api/generate request, it runs the following pipeline:

1. Extract query. Take the last user message from the request body.
2. Recall from Engram. Call POST /api/recall with a 3-second timeout; if Engram is unavailable, pass the request through unchanged.
3. Inject context. For /api/chat, prepend or append to the system message; for /api/generate, inject into the system field.
4. Forward to Ollama. Send the enriched request to the real Ollama daemon and stream back the response.
5. Store response. After the response completes, store the user query and assistant reply as an episodic memory (fire-and-forget).

The proxy degrades gracefully: if Engram is down, requests pass through to Ollama unchanged, with no errors and no latency penalty beyond the 3-second recall timeout.
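Steps 2 and 5 of the pipeline can be sketched as below. The documented behavior is the 3-second recall timeout, the pass-through on failure, and the fire-and-forget store; the request and response shapes ({ query, maxTokens }, a context string, and an /api/memories store path) are assumptions for illustration, not the proxy's actual wire format.

```typescript
const ENGRAM_API = process.env.ENGRAM_API ?? 'http://localhost:4901'

// Step 2: recall with a 3-second budget. Any failure (timeout, network
// error, non-2xx) degrades to null, so the chat request is forwarded to
// Ollama without injected context.
async function recallContext(query: string): Promise<string | null> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), 3000)
  try {
    const res = await fetch(`${ENGRAM_API}/api/recall`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, maxTokens: 1500 }), // assumed shape
      signal: controller.signal,
    })
    if (!res.ok) return null
    const data = (await res.json()) as { context?: string }
    return data.context ?? null
  } catch {
    return null // Engram down or timed out: pass through unchanged
  } finally {
    clearTimeout(timer)
  }
}

// Step 5: fire-and-forget store of the finished exchange. The endpoint
// path and payload here are assumptions; failures are swallowed so the
// chat response is never blocked.
function storeExchange(query: string, reply: string): void {
  fetch(`${ENGRAM_API}/api/memories`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'episodic', query, reply }),
  }).catch(() => {})
}
```

The AbortController pattern is what gives the hard 3-second ceiling: the fetch is cancelled rather than left to the platform's much longer default timeout.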

Running in production

Use PM2 or systemd to keep the proxy running:

# PM2 (set env vars inline; pm2's --env flag selects a profile from an
# ecosystem file rather than setting individual variables)
OLLAMA_PROXY_PORT=11435 pm2 start /path/to/engram/adapters/ollama/dist/proxy.js \
  --name engram-ollama

# Verify
curl http://localhost:11435/api/tags