Ollama Proxy
Engram ships a transparent Ollama proxy that sits between your Ollama client and the Ollama daemon. It intercepts each chat and generate request, injects relevant memories into the system prompt, and forwards the enriched request to Ollama. Your client code needs no changes.
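As a minimal illustration of what "injecting memories into the system prompt" can mean, the Python sketch below enriches an Ollama /api/chat request body. It assumes the relevant memories have already been fetched from Engram as plain strings. The function name inject_memories and the "Relevant memories:" wording are hypothetical and do not reflect the proxy's actual formatting.

```python
import copy


def inject_memories(body: dict, memories: list) -> dict:
    """Return a copy of an /api/chat body with memories merged into its system prompt.

    Hypothetical sketch; the real proxy's wording and placement may differ.
    """
    if not memories:
        return body  # nothing relevant found: forward the request unchanged
    enriched = copy.deepcopy(body)  # leave the client's original request untouched
    block = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    messages = enriched.setdefault("messages", [])
    if messages and messages[0].get("role") == "system":
        # Keep the client's own system prompt and append the memory block to it
        messages[0]["content"] = messages[0].get("content", "") + "\n\n" + block
    else:
        # No system prompt yet: insert one at the front of the conversation
        messages.insert(0, {"role": "system", "content": block})
    return enriched
```

Copying the body before editing keeps the caller's request object intact, and appending to an existing system prompt (rather than replacing it) preserves any instructions the client already set.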
Start the proxy
The proxy is part of the @engram-ai-memory/adapter-ollama package. Start it alongside your Engram server:
# From the Engram repository root
node adapters/ollama/dist/proxy.js

# Output:
# Engram × Ollama Proxy
# Listening: http://localhost:11435
# Ollama target: http://localhost:11434
# Engram: http://localhost:4901
By default the proxy listens on port 11435, one above Ollama's default port of 11434.
Environment variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_PROXY_PORT | 11435 | Proxy listen port |
| OLLAMA_TARGET | http://localhost:11434 | Real Ollama daemon URL |
| ENGRAM_API | http://localhost:4901 | Engram REST API URL |
| ENGRAM_MAX_TOKENS | 1500 | Maximum number of context tokens to inject |
| ENGRAM_TOOL_RETRY | true | Retry once when the model misses a tool call |
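ENGRAM_MAX_TOKENS caps how much memory context is injected. The Python sketch below shows one way such a cap could be enforced: it reads the variable with the documented default of 1500 and keeps memories, assumed to arrive sorted by relevance, until the budget is spent. Token counts are estimated with a rough four-characters-per-token heuristic, which is an assumption for illustration only; the proxy may count tokens differently. The helper names are hypothetical.

```python
import os


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, never less than 1
    return max(1, len(text) // 4)


def max_tokens_from_env(env=None) -> int:
    env = os.environ if env is None else env
    # Fall back to the documented default when the variable is unset
    return int(env.get("ENGRAM_MAX_TOKENS", "1500"))


def trim_to_budget(memories: list, max_tokens: int) -> list:
    """Keep memories in order (most relevant first) until the token budget is exhausted."""
    kept, used = [], 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > max_tokens:
            break  # stop at the first memory that would overflow the budget
        kept.append(memory)
        used += cost
    return kept
```

Stopping at the first memory that does not fit, instead of skipping ahead to smaller ones, keeps the injected context in strict relevance order.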
Configure your client
The proxy intercepts three endpoint paths: /api/chat, /api/generate (Ollama native), and /v1/chat/completions (OpenAI-compatible). Point any client at port 11435:
Environment variable
export OLLAMA_HOST=http://localhost:11435
ollama run llama3
Python (ollama-python)
import ollama
client = ollama.Client(host='http://localhost:11435')
response = client.chat(
model='llama3',
messages=[{'role': 'user', 'content': 'What did we discuss last week?'}]
)
# Engram silently injected relevant memories into the system prompt
JavaScript / TypeScript
import { Ollama } from 'ollama'
const ollama = new Ollama({ host: 'http://localhost:11435' })
const response = await ollama.chat({
model: 'llama3',
messages: [{ role: 'user', content: 'Continue where we left off' }]
})
OpenAI-compatible clients
# Any OpenAI-compatible SDK — point base_url at :11435/v1
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11435/v1", api_key="ollama")
response = client.chat.completions.create(
model="llama3",
messages=[{"role": "user", "content": "What did we discuss last week?"}]
)
# Memory context was automatically injected
Open WebUI
In Open WebUI settings, change the Ollama API URL from http://localhost:11434 to http://localhost:11435. All conversations will automatically gain memory.
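The three intercepted paths carry different request shapes: /api/chat and the OpenAI-compatible /v1/chat/completions both send a messages list, while Ollama's native /api/generate sends a single prompt with an optional system field. A dispatcher could therefore look like the hypothetical Python sketch below. The function name enrich_request is illustrative, and forwarding every other path (for example /api/tags) untouched is an assumption based on the proxy being transparent.

```python
CHAT_PATHS = {"/api/chat", "/v1/chat/completions"}  # bodies carry a "messages" list
GENERATE_PATHS = {"/api/generate"}                   # body carries "prompt" and optional "system"


def enrich_request(path: str, body: dict, memory_block: str) -> dict:
    """Return the body to forward to Ollama for this path (hypothetical sketch)."""
    if path in CHAT_PATHS:
        # Simplest option: prepend a dedicated system message, leaving the rest as sent
        messages = [{"role": "system", "content": memory_block}, *body.get("messages", [])]
        return {**body, "messages": messages}
    if path in GENERATE_PATHS:
        # /api/generate has no message list, so merge into the "system" field instead
        existing = body.get("system")
        system = f"{existing}\n\n{memory_block}" if existing else memory_block
        return {**body, "system": system}
    # Any other path is forwarded unchanged
    return body
```

Building new dictionaries with {**body, ...} avoids mutating the incoming request, so the original body remains available if the proxy needs to retry or log it.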
How injection works
When the proxy receives a /api/chat, /api/generate, or /v1/chat/completions request, it runs the following pipeline:
Running in production
Use PM2 or systemd to keep the proxy running:
# PM2: pm2 start picks up variables from the current shell, so set them inline
OLLAMA_PROXY_PORT=11435 pm2 start /path/to/engram/adapters/ollama/dist/proxy.js \
  --name engram-ollama

# Verify
curl http://localhost:11435/api/tags
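For monitoring, the curl check can also be scripted. The hypothetical Python helper below uses only the standard library and treats the proxy as healthy if GET /api/tags returns HTTP 200 with a JSON body containing a models list, which is the shape Ollama's own /api/tags endpoint returns. The proxy_is_healthy name and the fetch parameter are illustrative; fetch exists only so the check can be tested without a running proxy.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple


def _http_get(url: str, timeout: float) -> Tuple[int, bytes]:
    # Default fetcher: plain GET through the standard library
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def proxy_is_healthy(
    base_url: str = "http://localhost:11435",
    timeout: float = 2.0,
    fetch: Callable[[str, float], Tuple[int, bytes]] = _http_get,
) -> bool:
    """Return True if GET {base_url}/api/tags answers 200 with a 'models' list."""
    try:
        status, raw = fetch(base_url.rstrip("/") + "/api/tags", timeout)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False
    if status != 200:
        return False
    try:
        payload = json.loads(raw)
    except ValueError:
        return False  # body was not valid JSON
    return isinstance(payload, dict) and isinstance(payload.get("models"), list)
```

A process supervisor or cron job could call proxy_is_healthy() and restart the engram-ollama process when it returns False.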