Multi-tenancy

The framework is multi-tenant by default. One `Agent` instance serves N users, and the framework partitions memory, budgets, permissions, audit entries, and telemetry by `user_id`.


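from loomflow import Agent
from loomflow.governance.budget import BudgetConfig, StandardBudget
# PerUserPermissions and FileAuditLog must also be imported; their module paths are not shown on this page.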
 
agent = Agent(
    "...",
    model="claude-opus-4-7",
    memory="postgres://...",
    budget=StandardBudget(BudgetConfig(per_user_max_tokens=100_000)),
    permissions=PerUserPermissions(policies=..., default=...),
    audit_log=FileAuditLog("./audit.jsonl"),
)
 
# Same Agent, different user — partitioned automatically
await agent.run("...", user_id="alice", session_id="conv_42")
await agent.run("...", user_id="bob",   session_id="conv_99")

Alice’s memory never surfaces in Bob’s recall. Alice’s tokens count against Alice’s budget. Audit entries are recorded with `user_id="alice"`. Permission checks run against Alice’s policy.

What partitions by `user_id`

| Primitive | Partition behaviour |
| --- | --- |
| `Memory` | Episodes, working blocks, and facts all carry `user_id`. Recalls are scoped. |
| `Memory.facts` | Bi-temporal queries scoped to the user. |
| `StandardBudget` | Tracks tokens / cost / wall-clock both globally and per `user_id`. |
| `PerUserPermissions` | Routes the policy decision per `user_id`. |
| `FileAuditLog` | Every entry has a top-level `user_id`. HMAC covers it. |
| `OTelTelemetry` metrics | Every metric tagged with `user_id`. |
| `RunContext` (visible to tools) | `get_run_context().user_id`. |
| `approval_handler` | Receives the live `user_id` so Slack approvals can be addressed correctly. |
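
To make the `RunContext` row concrete, here is a minimal sketch of how a per-run context can reach tool code without being passed as an argument. It uses Python's standard `contextvars` module; `ToyRunContext`, `toy_get_run_context`, `toy_run`, and `whoami_tool` are hypothetical names for illustration, and nothing here is claimed to match how the framework implements `get_run_context()`.

```python
import asyncio
import contextvars
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyRunContext:
    """Hypothetical stand-in for the per-run context seen by tools."""
    user_id: Optional[str]
    session_id: Optional[str]


# One context variable; each asyncio task works on its own copy of the context.
_current = contextvars.ContextVar("toy_run_context")


def toy_get_run_context() -> ToyRunContext:
    return _current.get()


async def whoami_tool() -> Optional[str]:
    # Yield to the event loop so the two runs below really interleave.
    await asyncio.sleep(0)
    return toy_get_run_context().user_id


async def toy_run(user_id: str, session_id: str) -> Optional[str]:
    token = _current.set(ToyRunContext(user_id, session_id))
    try:
        return await whoami_tool()
    finally:
        _current.reset(token)


async def demo() -> List[Optional[str]]:
    # gather() wraps each coroutine in a task, so the contexts stay separate.
    return await asyncio.gather(
        toy_run("alice", "conv_42"),
        toy_run("bob", "conv_99"),
    )


results = asyncio.run(demo())
```

Each concurrent run observes only its own `user_id`, even though both go through the same tool function.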

Memory partition contract

For every memory backend (InMemory / SQLite / Chroma / Postgres / Redis):

- Recalls are scoped by `user_id`. `memory.recall(query, user_id="alice")` never returns Bob’s episodes, even when one of Bob’s episodes is more semantically similar to the query.
- Writes carry `user_id`. `memory.remember(episode)` reads the active `RunContext.user_id` (or the explicit kwarg) and stores it with the episode.
- GDPR ops are scoped. `memory.forget(user_id="alice")` deletes only Alice’s data, and `memory.export(user_id="alice")` returns only Alice’s data.
- The anonymous bucket. When `user_id=None`, data lands in a reserved sentinel bucket (`__jeeves_anon_user__`). It is a real partition like any other, so `forget(user_id=None)` clears the anonymous data. Passing the sentinel itself as a `user_id` raises `ValueError`, as a defense against impersonation.
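
The contract above can be pictured with a toy in-process store. This is a sketch under stated assumptions, not any of the shipped backends: `ToyPartitionedMemory` is a hypothetical class, episodes are plain strings, and recall is a substring match rather than semantic search. Only the sentinel name is taken from the text.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Sentinel name copied from the contract above; everything else is illustrative.
ANON_SENTINEL = "__jeeves_anon_user__"


class ToyPartitionedMemory:
    """Hypothetical store that keeps one list of episodes per user bucket."""

    def __init__(self) -> None:
        self._episodes: Dict[str, List[str]] = defaultdict(list)

    def _bucket(self, user_id: Optional[str]) -> str:
        # Callers may not name the reserved bucket directly.
        if user_id == ANON_SENTINEL:
            raise ValueError("reserved sentinel cannot be used as a user_id")
        return ANON_SENTINEL if user_id is None else user_id

    def remember(self, text: str, user_id: Optional[str] = None) -> None:
        self._episodes[self._bucket(user_id)].append(text)

    def recall(self, query: str, user_id: Optional[str] = None) -> List[str]:
        # Only the caller's bucket is searched, whatever other buckets contain.
        bucket = self._episodes.get(self._bucket(user_id), [])
        return [e for e in bucket if query.lower() in e.lower()]

    def export(self, user_id: Optional[str] = None) -> List[str]:
        return list(self._episodes.get(self._bucket(user_id), []))

    def forget(self, user_id: Optional[str] = None) -> int:
        # Returns how many episodes were deleted from that one bucket.
        return len(self._episodes.pop(self._bucket(user_id), []))


memory = ToyPartitionedMemory()
memory.remember("Alice prefers window seats", user_id="alice")
memory.remember("Bob prefers window seats on trains", user_id="bob")
memory.remember("Anonymous visitor asked about seats")

alice_hits = memory.recall("window seats", user_id="alice")
```

Bob's episode also matches the query, but it never appears in `alice_hits` because the lookup starts from Alice's bucket.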

Bounded in-process state

`StandardBudget._by_user` and `InMemoryMemory._blocks` hold per-user state in-process. Without bounds, a runaway tenant or a burst of one-shot `user_id` values grows the dict until the process runs out of memory. Both default to bounded state with LRU + idle-TTL eviction:


from loomflow import InMemoryMemory
from loomflow.governance.budget import BudgetConfig, StandardBudget
 
# Defaults: 100k users, 24h idle TTL.
budget = StandardBudget(BudgetConfig())
memory = InMemoryMemory()
 
# Tune:
budget = StandardBudget(
    BudgetConfig(),
    max_users=10_000,
    user_idle_ttl_seconds=3_600,
)
 
# Disable bounding (single-tenant or fixed N tenants):
budget = StandardBudget(BudgetConfig(), max_users=None, user_idle_ttl_seconds=None)

See Bounded in-process state.
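
For intuition, LRU plus idle-TTL eviction over a per-user dict might look like the sketch below. `BoundedUserState` is a hypothetical helper, not the framework's internal structure; it reuses the `max_users` and `user_idle_ttl_seconds` names and the stated defaults (100k users, 24h) from the example above, and takes an injectable clock so the behaviour can be shown without waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Optional, TypeVar

V = TypeVar("V")


class BoundedUserState(Generic[V]):
    """Hypothetical per-user map with LRU and idle-TTL eviction."""

    def __init__(
        self,
        factory: Callable[[], V],
        max_users: Optional[int] = 100_000,
        user_idle_ttl_seconds: Optional[float] = 86_400,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_users = max_users
        self._ttl = user_idle_ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # user_id -> (last_seen, value)

    def get(self, user_id: str) -> V:
        now = self._clock()
        self._evict_idle(now)
        if user_id in self._entries:
            _, value = self._entries.pop(user_id)
        else:
            value = self._factory()
        # Re-inserting puts the user at the most-recently-used end.
        self._entries[user_id] = (now, value)
        if self._max_users is not None:
            while len(self._entries) > self._max_users:
                self._entries.popitem(last=False)  # drop least recently used
        return value

    def _evict_idle(self, now: float) -> None:
        if self._ttl is None:
            return
        # Entries are ordered by last access, so idle users sit at the front.
        while self._entries:
            user_id, (last_seen, _) = next(iter(self._entries.items()))
            if now - last_seen <= self._ttl:
                break
            del self._entries[user_id]

    def __contains__(self, user_id: str) -> bool:
        return user_id in self._entries

    def __len__(self) -> int:
        return len(self._entries)


now = [0.0]  # fake clock, advanced by hand
state = BoundedUserState(factory=dict, max_users=2,
                         user_idle_ttl_seconds=60, clock=lambda: now[0])
state.get("alice")["tokens"] = 10
state.get("bob")
state.get("alice")  # alice becomes most recently used
state.get("carol")  # over capacity: bob, the least recently used, is evicted
```

Passing `None` for either bound switches that check off, mirroring the "disable bounding" variant shown earlier.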

Verifying isolation under load

bench/multi_tenant.py simulates N concurrent users × M turns through one shared Agent and asserts:

- p50 / p99 latency stays bounded.
- RSS growth stays linear in the number of tenants.
- Zero isolation violations. Alice’s data never surfaces in Bob’s recall.
- Zero budget mismatches. Per-user totals match the sum of charged calls.


python bench/multi_tenant.py --users 500 --turns 5

A smoke-test variant runs in CI as tests/test_multi_tenant_load.py. See Load testing.
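
The "zero budget mismatches" assertion can be illustrated with a much smaller harness. The sketch below is not the contents of `bench/multi_tenant.py`; `ToyBudget`, `simulate_user`, and `run_load` are hypothetical names, and the token counts are made up. It only shows the shape of the check: keep an independent ledger of charged calls and compare it with the per-user totals afterwards.

```python
import asyncio
from collections import Counter
from typing import Dict, List, Tuple


class ToyBudget:
    """Hypothetical tracker with a global total and per-user totals."""

    def __init__(self, per_user_max_tokens: int) -> None:
        self.per_user_max_tokens = per_user_max_tokens
        self.by_user = Counter()
        self.total = 0

    def charge(self, user_id: str, tokens: int) -> bool:
        # Refuse the charge if it would push this user over their cap.
        if self.by_user[user_id] + tokens > self.per_user_max_tokens:
            return False
        self.by_user[user_id] += tokens
        self.total += tokens
        return True


async def simulate_user(budget: ToyBudget, user_id: str, turns: int,
                        ledger: Dict[str, List[int]]) -> None:
    for turn in range(turns):
        await asyncio.sleep(0)  # let other simulated users interleave
        tokens = 10 + turn      # made-up cost per turn
        if budget.charge(user_id, tokens):
            ledger[user_id].append(tokens)


async def run_load(users: int, turns: int) -> Tuple[ToyBudget, List[str]]:
    budget = ToyBudget(per_user_max_tokens=100_000)
    ledger: Dict[str, List[int]] = {f"user_{i}": [] for i in range(users)}
    await asyncio.gather(
        *(simulate_user(budget, uid, turns, ledger) for uid in ledger)
    )
    # A mismatch is any user whose tracked total differs from their ledger.
    mismatches = [uid for uid, charges in ledger.items()
                  if budget.by_user[uid] != sum(charges)]
    return budget, mismatches


budget, mismatches = asyncio.run(run_load(users=50, turns=5))
```

An empty `mismatches` list means every per-user total equals the sum of that user's charged calls.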

Cross-tenant traps to avoid

Even though the framework partitions automatically, two patterns defeat it:

Bypassing `user_id` in a custom Memory

If you write a custom `Memory` implementation, accept `user_id=` in every method and use it as the partition key. The framework forwards it on every call; ignoring it leaks data across tenants.
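
A quick way to catch this mistake is a contract check you run against your own backend. The sketch below assumes a simplified interface (`remember(text, user_id=)` and `recall(query, user_id=)` over plain strings), which may differ from the real `Memory` method signatures; `MemoryLike`, `LeakyMemory`, `KeyedMemory`, and `leaks_across_tenants` are hypothetical names.

```python
from typing import Dict, List, Optional, Protocol


class MemoryLike(Protocol):
    """Assumed minimal shape of a memory backend, for illustration only."""

    def remember(self, text: str, user_id: Optional[str] = None) -> None: ...
    def recall(self, query: str, user_id: Optional[str] = None) -> List[str]: ...


class LeakyMemory:
    """Accepts user_id but ignores it: the trap described above."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def remember(self, text: str, user_id: Optional[str] = None) -> None:
        self._items.append(text)

    def recall(self, query: str, user_id: Optional[str] = None) -> List[str]:
        return [t for t in self._items if query in t]


class KeyedMemory:
    """Uses user_id as the partition key on both write and read."""

    def __init__(self) -> None:
        self._items: Dict[Optional[str], List[str]] = {}

    def remember(self, text: str, user_id: Optional[str] = None) -> None:
        self._items.setdefault(user_id, []).append(text)

    def recall(self, query: str, user_id: Optional[str] = None) -> List[str]:
        return [t for t in self._items.get(user_id, []) if query in t]


def leaks_across_tenants(memory: MemoryLike) -> bool:
    # Write as one tenant, read as another; any hit is a cross-tenant leak.
    memory.remember("secret-for-alice", user_id="alice")
    return bool(memory.recall("secret", user_id="bob"))


leaky_result = leaks_across_tenants(LeakyMemory())
keyed_result = leaks_across_tenants(KeyedMemory())
```

The same write-as-one-tenant, read-as-another check can be run against any custom backend before it is wired into an agent.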

Caching across tenants in tools


# WRONG — single dict, no partition
cache = {}
 
@tool
async def expensive_query(q: str) -> str:
    if q in cache:
        return cache[q]
    cache[q] = await api.query(q)
    return cache[q]

If the cache key doesn’t include user_id, two tenants see each other’s responses. Use a (user_id, q) key:


from loomflow import get_run_context

# Shared cache, keyed by (user_id, query) so tenants never share entries.
cache = {}
 
@tool
async def expensive_query(q: str) -> str:
    ctx = get_run_context()
    key = (ctx.user_id, q)
    if key in cache:
        return cache[key]
    cache[key] = await api.query(q, user_id=ctx.user_id)
    return cache[key]

`user_id` is a typed primitive, not a free-form string, and the framework treats it consistently across all primitives. Once you plumb tenant scope through your own structures (caches, DB queries, external API calls), you must deliberately include `user_id` in every key. The framework cannot catch cross-tenant leaks in your application code.
