Production hardening

Everything here is opt-in: the framework’s defaults already work for single-tenant scripts and demos. The settings below are what you flip on when you’re putting the agent in front of real users on real infrastructure.

The whole section is organised around a single theme: multi-tenancy without footguns. One shared `Agent` (and one `Memory`, one `Budget`, one `AuditLog`) backing N users requires more than passing `user_id=` everywhere; it requires bounded state, per-user caps, scoped permissions, observable extraction, and pluggable secret resolution.

Topic	What it covers
Per-user budget caps	Tokens, cost, wall-clock. Globally and per `user_id`.
Per-user permissions	Route policy decisions per `user_id`. Staff in BYPASS, users gated.
Approval handlers	Slack / ticket / human gate for `Decision.ask_(...)`. No silent allows.
Bounded in-process state	LRU + idle-TTL for `StandardBudget._by_user` and `InMemoryMemory._blocks`.
Auto-extract observability	Latency, failure rate, and cost attribution for the default-on extractor.
Secrets resolution	Pluggable `Secrets` protocol. Env / Dict / Vault / your custom backend.
Audit log attribution	Top-level `user_id` on every entry; HMAC covers it.
Load testing isolation	`bench/multi_tenant.py`. N users × M turns, isolation + budget assertions.
Upgrading 0.9 → 0.10	What’s additive, what’s bounded by default, what to migrate.
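
To make "LRU + idle-TTL" concrete, here is a minimal, framework-independent sketch of a bounded per-user map. It is not the framework's implementation of `StandardBudget._by_user` or `InMemoryMemory._blocks`. The class name `BoundedUserState`, its methods, and the injectable `clock` are assumptions made for illustration. Only the names `max_users` and `user_idle_ttl_seconds` are borrowed from the tuning knobs this page mentions.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Optional, TypeVar

V = TypeVar("V")


class BoundedUserState(Generic[V]):
    """Hypothetical per-user map with an LRU size cap and an idle TTL."""

    def __init__(
        self,
        max_users: int,
        user_idle_ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_users = max_users
        self._ttl = user_idle_ttl_seconds
        self._clock = clock
        # Maps user_id -> (last_access_time, value), least recently used first.
        self._entries = OrderedDict()

    def _evict_idle(self, now: float) -> None:
        # Entries are ordered by last access, so idle users sit at the front.
        while self._entries:
            user_id, (last_seen, _) = next(iter(self._entries.items()))
            if now - last_seen < self._ttl:
                break
            del self._entries[user_id]

    def get(self, user_id: str) -> Optional[V]:
        now = self._clock()
        self._evict_idle(now)
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        # A read counts as activity: refresh the timestamp and LRU position.
        self._entries[user_id] = (now, entry[1])
        self._entries.move_to_end(user_id)
        return entry[1]

    def put(self, user_id: str, value: V) -> None:
        now = self._clock()
        self._evict_idle(now)
        self._entries[user_id] = (now, value)
        self._entries.move_to_end(user_id)
        # Enforce the size cap by dropping the least recently used users.
        while len(self._entries) > self._max_users:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

The point of the two bounds is that they cover different failure modes: the size cap limits memory under a burst of distinct users, while the idle TTL reclaims state for users who have simply gone away. In this sketch, eviction only happens on access, so a fully idle process keeps its entries until the next call.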

What you turn on, listed

Concern	Opt-in
Per-user quota	`BudgetConfig(per_user_max_*)`
Tenant-specific permissions	`PerUserPermissions(policies=, default=)`
Human-in-the-loop for destructive tools	`Agent(approval_handler=...)`
Bounded in-process state	Defaults active; tune `max_users` / `user_idle_ttl_seconds`
Vault-backed API keys	`Agent(tuning=Tuning(secrets=VaultSecrets(...)))`
Auto-extract metrics	`Agent(telemetry=OTelTelemetry(...))`
Per-user audit	`FileAuditLog(...)` (attribution is automatic)
Load-test isolation	`bench/multi_tenant.py`
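
As an illustration of what it means for an HMAC to cover a top-level `user_id`, the sketch below signs an audit entry so that changing the `user_id` afterwards invalidates the tag. It uses only the Python standard library and is not the on-disk format of `FileAuditLog`, which this page does not describe. The field names (`event`, `hmac`), the canonical-JSON encoding, and the helpers `sign_entry` and `verify_entry` are all hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(entry: dict) -> bytes:
    # Stable serialisation so the same entry always produces the same bytes.
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_entry(key: bytes, entry: dict) -> dict:
    """Return a copy of entry with an HMAC-SHA256 tag over every field."""
    tag = hmac.new(key, _canonical(entry), hashlib.sha256).hexdigest()
    return {**entry, "hmac": tag}


def verify_entry(key: bytes, signed: dict) -> bool:
    """Recompute the tag over all fields except the tag itself."""
    body = {k: v for k, v in signed.items() if k != "hmac"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, str(signed.get("hmac", "")))


# Hypothetical entry: user_id sits at the top level, so the tag covers it.
key = b"example-key-not-for-production"
signed = sign_entry(key, {"user_id": "alice", "event": "tool_call"})
tampered = {**signed, "user_id": "mallory"}
```

Here `verify_entry(key, signed)` succeeds, while `verify_entry(key, tampered)` fails because the reattributed `user_id` no longer matches the tag.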

Pair this with the production checklist for the broader operational concerns (durable runtime, persistent memory, sandbox, etc.).