Multi-tenancy
The framework is multi-tenant by default. One `Agent` instance
serves N users, and the framework partitions memory, budgets,
permissions, and audit entries by `user_id`.
```python
from loomflow import Agent
from loomflow.governance.budget import BudgetConfig, StandardBudget

agent = Agent(
    "...",
    model="claude-opus-4-7",
    memory="postgres://...",
    budget=StandardBudget(BudgetConfig(per_user_max_tokens=100_000)),
    permissions=PerUserPermissions(policies=..., default=...),
    audit_log=FileAuditLog("./audit.jsonl"),
)

# Same Agent, different user: partitioned automatically
await agent.run("...", user_id="alice", session_id="conv_42")
await agent.run("...", user_id="bob", session_id="conv_99")
```

Alice’s memory never surfaces in Bob’s recall. Alice’s tokens count
against Alice’s budget. Audit entries land with `user_id="alice"`.
Permissions check against Alice’s policy.
What partitions by user_id
| Primitive | Partition behaviour |
|---|---|
| `Memory` | Episodes, working blocks, and facts all carry `user_id`. Recalls are scoped. |
| `Memory.facts` | Bi-temporal queries scoped to the user. |
| `StandardBudget` | Tracks tokens / cost / wall-clock both globally and per `user_id`. |
| `PerUserPermissions` | Routes the policy decision per `user_id`. |
| `FileAuditLog` | Every entry has a top-level `user_id`. HMAC covers it. |
| `OTelTelemetry` metrics | Every metric tagged with `user_id`. |
| `RunContext` (visible to tools) | `get_run_context().user_id`. |
| `approval_handler` | Receives the live `user_id` so Slack approvals can be addressed correctly. |
Memory partition contract
For every memory backend (InMemory / SQLite / Chroma / Postgres / Redis):
- Recalls are scoped by `user_id`. A `memory.recall(query, user_id="alice")` will never return Bob’s episodes, even if Bob’s episode is more semantically similar.
- Writes carry `user_id`. `memory.remember(episode)` reads the active `RunContext.user_id` (or the explicit kwarg) and stores it.
- GDPR ops are scoped. `memory.forget(user_id="alice")` deletes only Alice’s data. `memory.export(user_id="alice")` returns only Alice’s data.
- The anonymous bucket. When `user_id=None`, data lands in a reserved sentinel bucket (`__jeeves_anon_user__`). It’s a real partition just like any other; you can `forget(user_id=None)` for the anonymous data. Attempting to use the sentinel as a real `user_id` raises `ValueError`, a defense against impersonation.
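The contract above can be illustrated with a toy store. This is a minimal sketch, not the framework's `Memory` implementation: the class name `ToyPartitionedMemory`, the substring-based matching, and the exact method signatures are assumptions made for illustration. Only the sentinel name and the scoping rules come from the list above.

```python
from collections import defaultdict

# Sentinel name taken from the text above; everything else is hypothetical.
ANON_USER = "__jeeves_anon_user__"


class ToyPartitionedMemory:
    """Illustrative store only; not loomflow's Memory implementation."""

    def __init__(self):
        # One list of episodes per partition key.
        self._episodes = defaultdict(list)

    def _key(self, user_id):
        # Reject callers that pass the sentinel as if it were a real user.
        if user_id == ANON_USER:
            raise ValueError("reserved user_id")
        # None maps to the anonymous bucket, a partition like any other.
        return ANON_USER if user_id is None else user_id

    def remember(self, text, user_id=None):
        self._episodes[self._key(user_id)].append(text)

    def recall(self, query, user_id=None):
        # Only the caller's partition is searched, so a better match in
        # another tenant's data can never be returned.
        words = query.lower().split()
        partition = self._episodes.get(self._key(user_id), [])
        return [e for e in partition if any(w in e.lower() for w in words)]

    def export(self, user_id=None):
        return list(self._episodes.get(self._key(user_id), []))

    def forget(self, user_id=None):
        self._episodes.pop(self._key(user_id), None)


memory = ToyPartitionedMemory()
memory.remember("alice likes green tea", user_id="alice")
memory.remember("bob likes green tea and coffee", user_id="bob")
memory.remember("anonymous visitor asked about tea")

alice_hits = memory.recall("green tea", user_id="alice")
memory.forget(user_id="bob")  # removes Bob's partition only
```

Routing every method through a single `_key` helper keeps the sentinel check and the `None` mapping in one place, so no code path can skip them.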
Bounded in-process state
StandardBudget._by_user and InMemoryMemory._blocks hold per-user
state in-process. Without bounds, a runaway tenant or one-shot
user_id explosion grows the dict until OOM. Both default to
bounded state with LRU + idle-TTL eviction:
```python
from loomflow import InMemoryMemory
from loomflow.governance.budget import BudgetConfig, StandardBudget

# Defaults: 100k users, 24h idle TTL.
budget = StandardBudget(BudgetConfig())
memory = InMemoryMemory()

# Tune:
budget = StandardBudget(
    BudgetConfig(),
    max_users=10_000,
    user_idle_ttl_seconds=3_600,
)

# Disable bounding (single-tenant or fixed N tenants):
budget = StandardBudget(BudgetConfig(), max_users=None, user_idle_ttl_seconds=None)
```

Verifying isolation under load
bench/multi_tenant.py simulates N concurrent users × M turns
through one shared Agent and asserts:
- p50 / p99 latency stays bounded.
- RSS growth stays linear in tenants.
- Zero isolation violations. Alice’s data never surfaces in Bob’s recall.
- Zero budget mismatches. Per-user totals match the sum of charged calls.
```bash
python bench/multi_tenant.py --users 500 --turns 5
```

A smoke-test variant runs in CI as `tests/test_multi_tenant_load.py`.
See Load testing.
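The two correctness assertions (zero isolation violations, zero budget mismatches) can be sketched without the framework. This does not reproduce `bench/multi_tenant.py`. The dict-backed `store`, the `charged` counter, and `fake_turn` are hypothetical stand-ins for a shared agent, and latency and RSS are not measured here.

```python
import asyncio
from collections import defaultdict

# Hypothetical stand-ins for shared, per-user framework state.
store = defaultdict(list)    # user_id -> remembered items
charged = defaultdict(int)   # user_id -> tokens charged so far


async def fake_turn(user_id, turn):
    """Pretend agent turn: write one item, charge tokens, then recall."""
    await asyncio.sleep(0)                  # yield so tasks interleave
    store[user_id].append(f"{user_id}:turn{turn}")
    tokens = 10 + turn
    charged[user_id] += tokens
    recalled = list(store[user_id])         # recall scoped to this user
    return user_id, tokens, recalled


async def run_bench(users, turns):
    tasks = [fake_turn(f"user{u}", t)
             for u in range(users) for t in range(turns)]
    results = await asyncio.gather(*tasks)

    violations = 0
    expected = defaultdict(int)
    for user_id, tokens, recalled in results:
        expected[user_id] += tokens
        # A recalled item owned by another user is an isolation violation.
        violations += sum(1 for item in recalled
                          if not item.startswith(user_id + ":"))
    # Per-user totals must equal the sum of the charged calls.
    mismatches = sum(1 for u in expected if expected[u] != charged[u])
    return violations, mismatches


violations, mismatches = asyncio.run(run_bench(users=50, turns=5))
print(f"isolation violations={violations}, budget mismatches={mismatches}")
```

Tagging each stored item with its owner makes a violation detectable from the recall result alone, without inspecting the store's internals.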
Cross-tenant traps to avoid
Even though the framework partitions automatically, two patterns defeat it:
Bypassing user_id in a custom Memory
If you write a custom Memory impl, accept user_id= in every
method and use it as the partition key. The framework forwards it on
every call. Skipping it means cross-tenant leaks.
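A small check can catch this trap before a custom backend ships. The sketch below is illustrative: `LeakyMemory`, `ScopedMemory`, and `leaks_across_tenants` are hypothetical names, and the `remember` / `recall` signatures are simplified assumptions rather than the framework's actual `Memory` interface.

```python
class LeakyMemory:
    """Hypothetical custom backend that accepts user_id but ignores it."""

    def __init__(self):
        self._items = []

    def remember(self, text, user_id=None):
        self._items.append(text)

    def recall(self, query, user_id=None):
        return [t for t in self._items if query in t]


class ScopedMemory:
    """Same backend with user_id used as the partition key."""

    def __init__(self):
        self._items = []  # list of (user_id, text) pairs

    def remember(self, text, user_id=None):
        self._items.append((user_id, text))

    def recall(self, query, user_id=None):
        return [t for u, t in self._items if u == user_id and query in t]


def leaks_across_tenants(memory):
    """Return True if one user's recall can see another user's write."""
    memory.remember("secret-alpha", user_id="alice")
    return "secret-alpha" in memory.recall("secret", user_id="bob")


leaky_result = leaks_across_tenants(LeakyMemory())    # True: Bob sees Alice's data
scoped_result = leaks_across_tenants(ScopedMemory())  # False: recall is partitioned
```

Running a check like `leaks_across_tenants` against each method of a custom backend is one way to confirm that `user_id` is actually used and not just accepted.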
Caching across tenants in tools
```python
# WRONG: single dict, no partition
cache = {}

@tool
async def expensive_query(q: str) -> str:
    if q in cache:
        return cache[q]
    cache[q] = await api.query(q)
    return cache[q]
```

If the cache key doesn’t include `user_id`, two tenants see each
other’s responses. Use a `(user_id, q)` key:
```python
from loomflow import get_run_context

cache = {}  # keyed by (user_id, q), never by q alone

@tool
async def expensive_query(q: str) -> str:
    ctx = get_run_context()
    key = (ctx.user_id, q)
    if key in cache:
        return cache[key]
    cache[key] = await api.query(q, user_id=ctx.user_id)
    return cache[key]
```

`user_id` is a typed primitive, not a free-form string, and the
framework treats it consistently across all primitives. But if you
plumb tenant scope through your own structures (caches, DB queries,
external API calls), be deliberate about including `user_id` in
every key. The framework can’t help with cross-tenant traps in your
application code.
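One way to make the tenant key hard to forget is to build it into the cache itself. The following is a sketch under stated assumptions: `TenantCache` and the `current_user` context variable are hypothetical, with `current_user` standing in for the framework's run context. In a real tool you would read `get_run_context().user_id` instead. The LRU bound is included so the cache cannot grow without limit as new tenants appear.

```python
import contextvars
from collections import OrderedDict

# Hypothetical stand-in for the framework's run context.
current_user = contextvars.ContextVar("current_user", default=None)


class TenantCache:
    """LRU cache whose keys always include the active user_id."""

    def __init__(self, max_entries=1024):
        self._data = OrderedDict()
        self._max = max_entries

    def _key(self, key):
        # The tenant is added here, so callers cannot omit it.
        return (current_user.get(), key)

    def get(self, key, default=None):
        full = self._key(key)
        if full not in self._data:
            return default
        self._data.move_to_end(full)        # mark as most recently used
        return self._data[full]

    def set(self, key, value):
        full = self._key(key)
        self._data[full] = value
        self._data.move_to_end(full)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used


cache = TenantCache(max_entries=2)

current_user.set("alice")
cache.set("q1", "alice-result")

current_user.set("bob")
bob_view = cache.get("q1")       # None: Bob has no entry for q1
cache.set("q1", "bob-result")
cache.set("q2", "bob-result-2")  # third entry evicts Alice's, the oldest

current_user.set("alice")
alice_view = cache.get("q1")     # None: Alice's entry was evicted
```

Because the key is composed inside `_key`, a tool author who forgets about tenancy still gets a partitioned cache. The trade-off is that deliberately shared data needs its own, explicitly global cache.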