
Ollama Proxy

Engram ships a transparent Ollama proxy that sits between your Ollama client and the Ollama daemon. It intercepts every chat request, injects relevant memories into the system prompt, and forwards the enriched request to Ollama — all without any changes to your client code.

your app ──:11435──▶ engram proxy ──recall──▶ memory store
                          │
                    inject context
                          ▼
                     engram proxy ──:11434──▶ ollama daemon

Start the proxy

The proxy is part of the @engram-ai-memory/adapter-ollama package. Start it alongside your Engram server:

# From the Engram repository root
node adapters/ollama/dist/proxy.js

# Output:
# Engram × Ollama Proxy
#   Listening:     http://localhost:11435
#   Ollama target: http://localhost:11434
#   Engram:        http://localhost:4901
The Engram server must be running on port 4901 and Ollama on port 11434. The proxy listens on 11435 (one above Ollama's default).
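To confirm all three services are reachable on their documented ports, a quick check (a sketch) can probe each one. /api/tags is Ollama's model-list endpoint, which the proxy also forwards; probing Engram's root path for liveness is an assumption, not a documented health endpoint.

```typescript
// Probe the three documented ports: proxy (11435), Ollama (11434),
// Engram (4901). Any unreachable service is reported rather than thrown.
const endpoints = {
  proxy: 'http://localhost:11435/api/tags',
  ollama: 'http://localhost:11434/api/tags',
  engram: 'http://localhost:4901/', // assumed liveness path
}

for (const [name, url] of Object.entries(endpoints)) {
  try {
    const res = await fetch(url)
    console.log(`${name}: HTTP ${res.status}`)
  } catch {
    console.log(`${name}: unreachable`)
  }
}
```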

Environment variables

Variable            Default                  Description
OLLAMA_PROXY_PORT   11435                    Proxy listen port
OLLAMA_TARGET       http://localhost:11434   Real Ollama daemon URL
ENGRAM_API          http://localhost:4901    Engram REST API URL
ENGRAM_MAX_TOKENS   1500                     Max context tokens to inject
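The proxy presumably resolves these variables at startup, falling back to the defaults above. A minimal sketch of that resolution (the variable names come from the table; the parsing logic is an assumption, not the actual proxy source):

```typescript
// Resolve proxy configuration from the environment, with the documented
// defaults. Numeric values are coerced from their string form.
const config = {
  proxyPort: Number(process.env.OLLAMA_PROXY_PORT ?? 11435),
  ollamaTarget: process.env.OLLAMA_TARGET ?? 'http://localhost:11434',
  engramApi: process.env.ENGRAM_API ?? 'http://localhost:4901',
  maxTokens: Number(process.env.ENGRAM_MAX_TOKENS ?? 1500),
}

console.log(config)
```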

Configure your Ollama client

Point your client at the proxy port instead of the native Ollama port. Everything else stays the same:

Environment variable

export OLLAMA_HOST=http://localhost:11435
ollama run llama3

Python (ollama-python)

import ollama

client = ollama.Client(host='http://localhost:11435')
response = client.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'What did we discuss last week?'}]
)
# Engram silently injected relevant memories into the system prompt

JavaScript / TypeScript

import { Ollama } from 'ollama'

const ollama = new Ollama({ host: 'http://localhost:11435' })

const response = await ollama.chat({
  model: 'llama3',
  messages: [{ role: 'user', content: 'Continue where we left off' }]
})

Open WebUI

In Open WebUI settings, change the Ollama API URL from http://localhost:11434 to http://localhost:11435. All conversations will automatically gain memory.

How injection works

When the proxy receives a /api/chat or /api/generate request, it runs the following pipeline:

1. Extract query. Take the last user message from the request body.
2. Recall from Engram. Call POST /api/recall with a 3-second timeout; if Engram is unavailable, pass the request through unchanged.
3. Inject context. For /api/chat, prepend or append to the system message; for /api/generate, inject into the system field.
4. Forward to Ollama. Send the enriched request to the real Ollama daemon and stream back the response.
5. Store response. After the response completes, store the user query and assistant reply as an episodic memory (fire-and-forget).

The proxy degrades gracefully: if Engram is down, requests pass through to Ollama unchanged, with no errors and no latency penalty beyond the 3-second recall timeout.
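Steps 2 and 5 of the pipeline can be sketched as below. The documented behavior is the 3-second recall timeout, the pass-through on failure, and the fire-and-forget store; the request and response shapes ({ query, maxTokens }, a context string, and an /api/memories store path) are assumptions for illustration, not the proxy's actual wire format.

```typescript
const ENGRAM_API = process.env.ENGRAM_API ?? 'http://localhost:4901'

// Step 2: recall with a 3-second budget. Any failure (timeout, network
// error, non-2xx) degrades to null, so the chat request is forwarded to
// Ollama without injected context.
async function recallContext(query: string): Promise<string | null> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), 3000)
  try {
    const res = await fetch(`${ENGRAM_API}/api/recall`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, maxTokens: 1500 }), // assumed shape
      signal: controller.signal,
    })
    if (!res.ok) return null
    const data = (await res.json()) as { context?: string }
    return data.context ?? null
  } catch {
    return null // Engram down or timed out: pass through unchanged
  } finally {
    clearTimeout(timer)
  }
}

// Step 5: fire-and-forget store of the finished exchange. The endpoint
// path and payload here are assumptions; failures are swallowed so the
// chat response is never blocked.
function storeExchange(query: string, reply: string): void {
  fetch(`${ENGRAM_API}/api/memories`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'episodic', query, reply }),
  }).catch(() => {})
}
```

The AbortController pattern is what gives the hard 3-second ceiling: the fetch is cancelled rather than left to the platform's much longer default timeout.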

Running in production

Use PM2 or systemd to keep the proxy running:

# PM2 (set env vars inline; pm2's --env flag selects a profile from an
# ecosystem file rather than setting individual variables)
OLLAMA_PROXY_PORT=11435 pm2 start /path/to/engram/adapters/ollama/dist/proxy.js \
  --name engram-ollama

# Verify
curl http://localhost:11435/api/tags