Reasoning effort (0.9.36+)
Every provider ships a “think harder before answering” knob, each under a
different name and shape: OpenAI’s reasoning_effort, Anthropic’s
adaptive thinking with output_config.effort, the thinking.budget_tokens
integer on older Claude Sonnets, LiteLLM’s normalized passthrough, and
Gemini’s own integer budget. Loom unifies all of them behind one
enum:

```
effort = "minimal" | "low" | "medium" | "high" | "xhigh" | "max"
```

Pass it once; the framework picks the right provider-native shape for the model you’re talking to.
Quick start
```python
from loomflow import Agent

# Agent-level default. Every run thinks medium-hard.
agent = Agent("...", model="claude-opus-4-7", effort="medium")

# Per-call override. Wins over the agent default for this run.
# (Top-level await works in notebooks and "python -m asyncio";
# elsewhere, call this from inside an async function.)
result = await agent.run("A hard reasoning question", effort="high")
```

Two ways to wire it on an Agent
Pick whichever reads better at your call site. They do the same thing.
Explicit kwargs, good for short inline construction:
```python
agent = Agent("...", model="claude-opus-4-7", effort="high")
```

Dict form, good when you want everything model-related in one place:
```python
agent = Agent(
    "...",
    model={
        "name": "claude-opus-4-7",
        "effort": "high",
        "strict_effort": True,
    },
)
```

Same shape philosophy as audit_log={...}: one parameter, structured
config. The framework normalises both forms to the same internal state,
so they’re interchangeable.
If both are present, the explicit top-level kwarg wins. That’s useful when you’ve got a shared config dict but want to flip the dial per environment:
```python
agent = Agent(
    "...",
    model={"name": "claude-opus-4-7", "effort": "low"},
    effort="high",  # this wins; the dict's "low" is shadowed
)
```

Where each value lands
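| effort | OpenAI (o1/o3/o4/GPT-5) | Anthropic Opus 4.7 | Anthropic 4.6 (Opus/Sonnet) | Anthropic legacy (Sonnet 3.7/4/4.5, Opus 4.5) | LiteLLM | Gemini |
|---|---|---|---|---|---|---|
| minimal | reasoning_effort="minimal" | output_config.effort="minimal" | output_config.effort="low" | thinking.budget_tokens=1024 | reasoning_effort="minimal" | min budget |
| low | reasoning_effort="low" | output_config.effort="low" | output_config.effort="low" | thinking.budget_tokens=2048 | reasoning_effort="low" | low budget |
| medium | reasoning_effort="medium" | output_config.effort="medium" | output_config.effort="medium" | thinking.budget_tokens=4096 | reasoning_effort="medium" | mid budget |
| high | reasoning_effort="high" | output_config.effort="high" | output_config.effort="high" | thinking.budget_tokens=8192 | reasoning_effort="high" | high budget |
| xhigh | clamped to "high" | output_config.effort="xhigh" | clamped to "high" | thinking.budget_tokens=16384 | clamped to "high" | max budget |
| max | clamped to "high" | output_config.effort="max" | clamped to "high" | thinking.budget_tokens=32768 | clamped to "high" | max budget |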
Anthropic Opus 4.7 is the only regime that honours xhigh and
max natively. OpenAI, Anthropic 4.6 and LiteLLM clamp them to
"high", and Gemini to its max budget; the legacy Anthropic path
instead maps them to larger budgets (16384 and 32768 tokens).
That’s deliberate: xhigh / max exist on the enum so code
that targets Opus 4.7 doesn’t need a different vocabulary, and code
that targets the rest still runs (just at the provider’s ceiling).
By default, models that don’t support reasoning effort at all (Haiku,
older Claudes, base GPT-4) drop the kwarg and emit a one-time warning
per (model, effort) pair; strict mode turns this into an error.
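The clamping behaviour in this section amounts to “snap the requested tier into the range the provider accepts”. Here is a minimal sketch of that reading. The regime labels, the SUPPORTED ranges and clamp_effort are hypothetical names, and the ranges are read off the table above. The legacy budget path and Gemini are left out because they map to integers rather than enum strings.

```python
# Documented enum, lowest to highest.
LEVELS = ("minimal", "low", "medium", "high", "xhigh", "max")

# Inclusive (floor, ceiling) per enum-based regime, transcribed from the
# mapping table. Hypothetical labels, not Loom internals.
SUPPORTED = {
    "openai": ("minimal", "high"),
    "litellm": ("minimal", "high"),
    "anthropic-4.6": ("low", "high"),
    "anthropic-4.7": ("minimal", "max"),
}


def clamp_effort(effort: str, regime: str) -> str:
    """Snap effort into the regime's supported range (below floor -> floor, above ceiling -> ceiling)."""
    floor, ceiling = SUPPORTED[regime]
    idx = LEVELS.index(effort)
    lowest, highest = LEVELS.index(floor), LEVELS.index(ceiling)
    return LEVELS[min(max(idx, lowest), highest)]
```

For example, clamp_effort("xhigh", "openai") gives "high" and clamp_effort("minimal", "anthropic-4.6") gives "low", matching the table. This models the table; the adapters may implement it differently.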
Constructor + run signatures
```python
class Agent:
    def __init__(
        self,
        instructions: str,
        *,
        model: str | Model,
        effort: str | None = None,
        strict_effort: bool = False,
        # ... other params
    ) -> None: ...

    async def run(
        self,
        prompt: str,
        *,
        effort: str | None = None,
        # ... other params
    ) -> RunResult: ...
```

Resolution order (most specific wins):

1. agent.run(..., effort=): per-call override
2. Agent(effort=): agent default
3. None: the provider’s own default (usually “medium” for reasoning models, no thinking for non-reasoning models)
strict_effort is agent-level only. There’s no per-call
override. Whether a model can honour effort is a property of the
adapter, not of any single call.
Warn-and-drop vs strict mode
By default, wiring effort= to a model that can’t honour it is a
soft fallback. The kwarg is dropped, the model runs as it
would have without effort, and a UserWarning is emitted exactly
once per (model, effort) pair:
```python
agent = Agent("...", model="claude-haiku-3-5", effort="high")
await agent.run("hi")
# UserWarning: Model 'claude-haiku-3-5' does not support
# effort='high': Anthropic supports thinking only on Sonnet 3.7+...
# (The kwarg has been dropped for this and future calls;
# emitted once per (model, effort) pair.)
```

This is the right default for production: a model swap shouldn’t break a working agent. But it’s the wrong default during development. There, a dial that gets dropped with nothing more than a warning is exactly the bug you want surfaced.
Opt into strict mode to make the same situation a hard error:
```python
from loomflow import Agent
from loomflow.model._effort import EffortNotSupportedError

agent = Agent(
    "...",
    model="claude-haiku-3-5",
    effort="high",
    strict_effort=True,  # raise instead of warning
)

try:
    await agent.run("hi")
except EffortNotSupportedError as exc:
    print(exc)
    # Model 'claude-haiku-3-5' does not support effort='high':
    # ... Pass strict_effort=False on the Agent to downgrade
    # this to a warning + drop.
```

Wire strict_effort=True in CI / pre-prod so typos and model
mismatches surface immediately. Wire strict_effort=False (the
default) in production so a vendor outage that pushes you onto a
fallback model doesn’t take the whole agent down.
Provider-by-provider detail
OpenAI (o1 / o3 / o4 / GPT-5)
Maps directly to OpenAI’s native reasoning_effort request kwarg.
Model gating is prefix-matched against o1, o3, o4, gpt-5,
so future minor versions (e.g. o3-mini-2025-…) match
automatically.
OpenAI’s enum doesn’t accept xhigh or max, so those clamp to
"high". Everything else passes through unchanged.
Base GPT-4 and GPT-4.1 don’t honour reasoning effort. They drop with a warning.
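Here is a minimal sketch of that gate, assuming a plain str.startswith check against the four prefixes listed above. openai_supports_effort is a hypothetical name.

```python
# Prefixes from the text above; a model id gets effort only if it starts with one.
REASONING_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def openai_supports_effort(model: str) -> bool:
    """Prefix match (str.startswith accepts a tuple), so dated or suffixed variants match too."""
    return model.startswith(REASONING_PREFIXES)
```

Prefix matching means there is no list to maintain. The trade-off is that any future id sharing one of these prefixes is also treated as reasoning-capable.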
Anthropic: three regimes
The Anthropic adapter inspects the model name and picks one of three regimes:
| Regime | Models | Request shape |
|---|---|---|
| 4.7 (adaptive-only) | Opus 4.7, Mythos | thinking={"type": "adaptive"} + output_config={"effort": "<enum>"}. Full enum including xhigh / max |
| 4.6 (adaptive + enum) | Opus 4.6, Sonnet 4.6 | Same shape, enum clamped (minimal → "low", xhigh / max → "high") |
| legacy (budget_tokens) | Sonnet 3.7, Sonnet 4, Sonnet 4.5, Opus 4.5 | thinking={"type": "enabled", "budget_tokens": <int>}. Integer budget from the effort dial |
Haiku and older Claudes fall through to warn-and-drop / strict raise.
The legacy budget mapping is minimal=1024, low=2048,
medium=4096, high=8192, xhigh=16384, max=32768, doubling at each tier.
These aren’t provider-documented constants; they’re defensible defaults
chosen to track the perceived effort tiers. If you need an exact integer for a
benchmark, configure the adapter directly with thinking={"type": "enabled", "budget_tokens": N} instead of going through effort=.
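The three request shapes from the regime table, plus the legacy budgets, fit in one function. This sketch assumes the regime has already been picked from the model name; that step is not shown. anthropic_thinking_kwargs and the regime labels "4.7", "4.6" and "legacy" are hypothetical. The dict keys mirror the request shapes in the table. Supplying your own budget_tokens, as described above, simply bypasses LEGACY_BUDGETS.

```python
from typing import Any

# Documented enum, lowest to highest.
LEVELS = ("minimal", "low", "medium", "high", "xhigh", "max")

# Legacy budgets from the text: 1024 doubling per tier, so max = 1024 << 5 = 32768.
LEGACY_BUDGETS = {level: 1024 << i for i, level in enumerate(LEVELS)}


def anthropic_thinking_kwargs(regime: str, effort: str) -> dict[str, Any]:
    """Extra request kwargs for one effort tier (hypothetical helper, not Loom's adapter).

    regime is one of the hypothetical labels "4.7", "4.6" or "legacy".
    """
    if regime == "legacy":
        budget = LEGACY_BUDGETS[effort]
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    if regime == "4.6":
        # Clamp into low..high, matching the mapping table (minimal -> low, xhigh/max -> high).
        idx = LEVELS.index(effort)
        effort = LEVELS[min(max(idx, LEVELS.index("low")), LEVELS.index("high"))]
    elif regime != "4.7":
        raise ValueError(f"unknown regime {regime!r}")
    # 4.7 passes the full enum through; 4.6 arrives here already clamped.
    return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}
```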
LiteLLM
LiteLLM normalises every supported provider to OpenAI’s
reasoning_effort shape, so Loom forwards the dial via
{"reasoning_effort": ...} and lets LiteLLM handle the
provider-specific translation downstream. xhigh / max clamp to
"high" (the OpenAI ceiling).
This means any LiteLLM-supported reasoning model (Vertex AI Gemini, Mistral reasoning models, etc.) picks up effort automatically, without per-adapter wiring.
Gemini (via LiteLLM)
Routes through LiteLLM’s reasoning_effort normalisation. For
direct Gemini integration without LiteLLM, the integer
thinking_budget kwarg is the native shape. Pass it to the
provider client directly if you need precise control.
Example 14: see the dial actually do something
The framework ships examples/14_effort_dial.py
running the same hard reasoning question (the 3L / 5L jug puzzle)
at every tier on Claude Opus 4.7 and printing token usage at each:
```text
effort=low tokens= 127+ 312 turns=1
effort=medium tokens= 127+ 564 turns=1
effort=high tokens= 127+1148 turns=1
effort=xhigh tokens= 127+2387 turns=1
```

The output token count is the visible signal. Higher effort spends more on internal reasoning, which is what you’re paying for.
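The growth is easy to quantify from those four lines. Output tokens rise about 1.8x from low to medium and roughly double at each tier after that. This is one prompt on one model, so read it as an illustration rather than a rule. The snippet just redoes the arithmetic on the numbers printed above.

```python
# Output-token counts copied from the example run above (127 prompt tokens each time).
output_tokens = {"low": 312, "medium": 564, "high": 1148, "xhigh": 2387}

tiers = list(output_tokens)  # dicts keep insertion order: low, medium, high, xhigh
for prev, cur in zip(tiers, tiers[1:]):
    ratio = output_tokens[cur] / output_tokens[prev]
    print(f"{prev} -> {cur}: x{ratio:.2f}")
# low -> medium: x1.81
# medium -> high: x2.04
# high -> xhigh: x2.08
```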
Effort isn’t a magic correctness dial. xhigh makes the model
spend more thinking budget; it does not make it solve a problem
that’s outside its capability. Use higher effort for problems where
the model’s first-attempt answer is close-but-wrong (multi-step
arithmetic, constraint satisfaction, code review), not for
out-of-distribution requests where no amount of thinking will help.
See also
- Agent.effort: the constructor parameter reference
- Providers: which adapter handles which model
- RetryPolicy: a separate dial for transient errors