Reasoning effort (0.9.36+)

Every lab ships a “think harder before answering” knob under a different name and shape. OpenAI’s reasoning_effort, Anthropic’s adaptive thinking with output_config.effort, older Claude Sonnets’ thinking.budget_tokens integer, LiteLLM’s normalized passthrough, Gemini’s own integer budget. Loom unifies all of them behind one enum:


effort = "minimal" | "low" | "medium" | "high" | "xhigh" | "max"

Pass it once; the framework picks the right provider-native shape for the model you’re talking to.

Quick start


from loomflow import Agent
 
# Agent-level default. Every run thinks medium-hard.
agent = Agent("...", model="claude-opus-4-7", effort="medium")
 
# Per-call override. Wins over the agent default for this run.
result = await agent.run("A hard reasoning question", effort="high")

Two ways to wire it on an Agent

Pick whichever reads better at your call site. They do the same thing.

Explicit kwargs, good for short inline construction:


agent = Agent("...", model="claude-opus-4-7", effort="high")

Dict form, good when you want everything model-related in one place:


agent = Agent(
    "...",
    model={
        "name": "claude-opus-4-7",
        "effort": "high",
        "strict_effort": True,
    },
)

Same shape philosophy as audit_log={...}: one parameter, structured config. The framework normalises both forms to the same internal state, so they’re interchangeable.

If both are present, the explicit top-level kwarg wins. That’s useful when you’ve got a shared config dict but want to flip the dial per environment:


agent = Agent(
    "...",
    model={"name": "claude-opus-4-7", "effort": "low"},
    effort="high",   # this wins; the dict's "low" is shadowed
)

Where each value lands

effort	OpenAI (o1/o3/o4/GPT-5)	Anthropic Opus 4.7	Anthropic 4.6 (Opus/Sonnet)	Anthropic legacy (Sonnet 3.7/4/4.5)	LiteLLM	Gemini
`minimal`	`reasoning_effort="minimal"`	`output_config.effort="minimal"`	`output_config.effort="low"`	`thinking.budget_tokens=1024`	`reasoning_effort="minimal"`	min budget
`low`	`="low"`	`="low"`	`="low"`	`=2048`	`="low"`	low budget
`medium`	`="medium"`	`="medium"`	`="medium"`	`=4096`	`="medium"`	mid budget
`high`	`="high"`	`="high"`	`="high"`	`=8192`	`="high"`	high budget
`xhigh`	clamped to `"high"`	`="xhigh"`	clamped to `"high"`	`=16384`	clamped to `"high"`	max budget
`max`	clamped to `"high"`	`="max"`	clamped to `"high"`	`=32768`	clamped to `"high"`	max budget

Anthropic Opus 4.7 is the only regime that honours xhigh and max natively. Everyone else clamps to their highest legal value. That’s deliberate: xhigh / max exist on the enum so code that targets Opus 4.7 doesn’t need a different vocabulary, and code that targets the rest still runs (just at the provider’s ceiling).

Models that don’t support reasoning effort at all (Haiku, older Claudes, base GPT-4) drop the kwarg and emit a one-time warning per (model, effort) pair.

Constructor + run signatures


class Agent:
    def __init__(
        self,
        instructions: str,
        *,
        model: str | Model,
        effort: str | None = None,
        strict_effort: bool = False,
        # ... other params
    ) -> None: ...
 
    async def run(
        self,
        prompt: str,
        *,
        effort: str | None = None,
        # ... other params
    ) -> RunResult: ...

Resolution order (most specific wins):

agent.run(..., effort=). Per-call override
Agent(effort=). Agent default
None. Provider’s own default (usually “medium” for reasoning models, no thinking for non-reasoning)

strict_effort is agent-level only. There’s no per-call override. Whether a model can honour effort is a property of the adapter, not of any single call.

Warn-and-drop vs strict mode

By default, wiring effort= to a model that can’t honour it is a soft fallback. The kwarg is dropped, the model runs as it would have without effort, and a UserWarning is emitted exactly once per (model, effort) pair:


agent = Agent("...", model="claude-haiku-3-5", effort="high")
await agent.run("hi")
# UserWarning: Model 'claude-haiku-3-5' does not support
# effort='high': Anthropic supports thinking only on Sonnet 3.7+...
# (The kwarg has been dropped for this and future calls;
#  emitted once per (model, effort) pair.)

This is the right default for production. A model swap shouldn’t break a working agent. But it’s the wrong default during development, where silently dropping the dial is exactly the bug you want surfaced.

Opt into strict mode to make the same situation a hard error:


from loomflow import Agent
from loomflow.model._effort import EffortNotSupportedError
 
agent = Agent(
    "...",
    model="claude-haiku-3-5",
    effort="high",
    strict_effort=True,    # raise instead of warning
)
 
try:
    await agent.run("hi")
except EffortNotSupportedError as exc:
    print(exc)
    # Model 'claude-haiku-3-5' does not support effort='high':
    # ... Pass strict_effort=False on the Agent to downgrade
    # this to a warning + drop.

Wire strict_effort=True in CI / pre-prod so typos and model mismatches surface immediately. Wire strict_effort=False (the default) in production so a vendor outage that pushes you onto a fallback model doesn’t take the whole agent down.

Provider-by-provider detail

OpenAI (o1 / o3 / o4 / GPT-5)

Maps directly to OpenAI’s native reasoning_effort request kwarg. Model gating is prefix-matched against o1, o3, o4, gpt-5, so future minor versions (e.g. o3-mini-2025-…) match automatically.

OpenAI’s enum doesn’t accept xhigh or max, so those clamp to "high". Everything else passes through unchanged.

Base GPT-4 and GPT-4.1 don’t honour reasoning effort. They drop with a warning.

Anthropic. Three regimes

The Anthropic adapter inspects the model name and picks one of three regimes:

Regime	Models	Request shape
4.7 (adaptive-only)	Opus 4.7, Mythos	`thinking={"type": "adaptive"}` + `output_config={"effort": "<enum>"}`. Full enum including `xhigh` / `max`
4.6 (adaptive + enum)	Opus 4.6, Sonnet 4.6	Same shape, enum clamped (`xhigh` / `max` → `"high"`)
legacy (budget_tokens)	Sonnet 3.7, Sonnet 4, Sonnet 4.5, Opus 4.5	`thinking={"type": "enabled", "budget_tokens": <int>}`. Integer budget from the effort dial

Haiku and older Claudes fall through to warn-and-drop / strict raise.

The legacy budget mapping is minimal=1024, low=2048, medium=4096, high=8192, xhigh=16384, max=32768. These aren’t provider-documented constants. They’re defensible defaults tuned to match the perceived effort tiers. If you need an exact integer for a benchmark, configure the adapter directly with thinking={"type": "enabled", "budget_tokens": N} instead of going through effort=.

LiteLLM

LiteLLM normalises every supported provider to OpenAI’s reasoning_effort shape, so Loom forwards the dial via {"reasoning_effort": ...} and lets LiteLLM handle the provider-specific translation downstream. xhigh / max clamp to "high" (the OpenAI ceiling).

This means any LiteLLM-supported reasoning model. Vertex AI Gemini, Mistral reasoning models, etc. Picks up effort automatically without per-adapter wiring.

Gemini (via LiteLLM)

Routes through LiteLLM’s reasoning_effort normalisation. For direct Gemini integration without LiteLLM, the integer thinking_budget kwarg is the native shape. Pass it to the provider client directly if you need precise control.

Example 14. See the dial actually do something

The framework ships examples/14_effort_dial.py running the same hard reasoning question (the 3L / 5L jug puzzle) at every tier on Claude Opus 4.7 and printing token usage at each:


effort=low     tokens=  127+ 312    turns=1
effort=medium  tokens=  127+ 564    turns=1
effort=high    tokens=  127+1148    turns=1
effort=xhigh   tokens=  127+2387    turns=1

The output token count is the visible signal. Higher effort spends more on internal reasoning, which is what you’re paying for.

Effort isn’t a magic correctness dial. xhigh makes the model spend more thinking budget; it does not make it solve a problem that’s outside its capability. Use higher effort for problems where the model’s first-attempt answer is close-but-wrong (multi-step arithmetic, constraint satisfaction, code review). Not for out-of-distribution requests where no amount of thinking will help.