
Reasoning effort (0.9.36+)

Every lab ships a “think harder before answering” knob under a different name and shape: OpenAI’s reasoning_effort, Anthropic’s adaptive thinking with output_config.effort, the thinking.budget_tokens integer on older Claude Sonnets, LiteLLM’s normalized passthrough, and Gemini’s own integer budget. Loom unifies all of them behind one enum:

effort = "minimal" | "low" | "medium" | "high" | "xhigh" | "max"

Pass it once; the framework picks the right provider-native shape for the model you’re talking to.

Quick start

from loomflow import Agent

# Agent-level default. Every run thinks medium-hard.
agent = Agent("...", model="claude-opus-4-7", effort="medium")

# Per-call override. Wins over the agent default for this run.
result = await agent.run("A hard reasoning question", effort="high")

Two ways to wire it on an Agent

Pick whichever reads better at your call site. They do the same thing.

Explicit kwargs, good for short inline construction:

agent = Agent("...", model="claude-opus-4-7", effort="high")

Dict form, good when you want everything model-related in one place:

agent = Agent(
    "...",
    model={
        "name": "claude-opus-4-7",
        "effort": "high",
        "strict_effort": True,
    },
)

Same shape philosophy as audit_log={...}: one parameter, structured config. The framework normalises both forms to the same internal state, so they’re interchangeable.

If both are present, the explicit top-level kwarg wins. That’s useful when you’ve got a shared config dict but want to flip the dial per environment:

agent = Agent(
    "...",
    model={"name": "claude-opus-4-7", "effort": "low"},
    effort="high",  # this wins; the dict's "low" is shadowed
)
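The precedence rule can be sketched in a few lines. This is an illustration only, not Loom's internals: normalise_model_config is a hypothetical helper, and it assumes the dict form carries the keys shown above ("name", "effort", "strict_effort").

```python
from typing import Any, Mapping, Optional, Union


def normalise_model_config(
    model: Union[str, Mapping[str, Any]],
    effort: Optional[str] = None,
) -> dict:
    """Hypothetical helper: collapse both wiring forms into one config dict."""
    if isinstance(model, str):
        # Bare model name: nothing configured yet.
        config = {"name": model, "effort": None, "strict_effort": False}
    else:
        # Dict form: copy out the model-related keys, defaulting the optional ones.
        config = {
            "name": model["name"],
            "effort": model.get("effort"),
            "strict_effort": model.get("strict_effort", False),
        }
    # An explicit top-level kwarg shadows whatever the dict carried.
    if effort is not None:
        config["effort"] = effort
    return config
```

Under these assumptions both forms produce the same dict, and a top-level effort="high" replaces a dict-level "low".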

Where each value lands

| effort | OpenAI (o1/o3/o4/GPT-5) | Anthropic Opus 4.7 | Anthropic 4.6 (Opus/Sonnet) | Anthropic legacy (Sonnet 3.7/4/4.5) | LiteLLM | Gemini |
| --- | --- | --- | --- | --- | --- | --- |
| minimal | reasoning_effort="minimal" | output_config.effort="minimal" | output_config.effort="low" | thinking.budget_tokens=1024 | reasoning_effort="minimal" | min budget |
| low | ="low" | ="low" | ="low" | =2048 | ="low" | low budget |
| medium | ="medium" | ="medium" | ="medium" | =4096 | ="medium" | mid budget |
| high | ="high" | ="high" | ="high" | =8192 | ="high" | high budget |
| xhigh | clamped to "high" | ="xhigh" | clamped to "high" | =16384 | clamped to "high" | max budget |
| max | clamped to "high" | ="max" | clamped to "high" | =32768 | clamped to "high" | max budget |

Anthropic Opus 4.7 is the only regime that honours xhigh and max natively. Everyone else clamps to their highest legal value. That’s deliberate: xhigh / max exist on the enum so code that targets Opus 4.7 doesn’t need a different vocabulary, and code that targets the rest still runs (just at the provider’s ceiling).
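Clamping to a provider ceiling is easy to picture as an index comparison over the ordered enum. The sketch below is a hypothetical illustration (EFFORT_LEVELS and clamp_effort are not Loom names); it only assumes the six values are ordered as listed at the top of this page.

```python
# The six enum values, ordered from least to most effort.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh", "max")


def clamp_effort(effort: str, ceiling: str = "high") -> str:
    """Return effort unchanged, or the ceiling if effort ranks above it."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}")
    if ceiling not in EFFORT_LEVELS:
        raise ValueError(f"unknown ceiling {ceiling!r}")
    rank = min(EFFORT_LEVELS.index(effort), EFFORT_LEVELS.index(ceiling))
    return EFFORT_LEVELS[rank]
```

With the default ceiling this reproduces the "clamped to high" cells in the table; passing ceiling="max" models a regime that honours the full enum.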

For models that don’t support reasoning effort at all (Haiku, older Claudes, base GPT-4), the framework drops the kwarg and emits a one-time warning per (model, effort) pair.

Constructor + run signatures

class Agent:
    def __init__(
        self,
        instructions: str,
        *,
        model: str | Model,
        effort: str | None = None,
        strict_effort: bool = False,
        # ... other params
    ) -> None: ...

    async def run(
        self,
        prompt: str,
        *,
        effort: str | None = None,
        # ... other params
    ) -> RunResult: ...

Resolution order (most specific wins):

  1. agent.run(..., effort=): per-call override
  2. Agent(effort=): agent default
  3. None: the provider’s own default (usually “medium” for reasoning models, no thinking for non-reasoning models)
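As a rough sketch, the resolution order above amounts to a first-non-None lookup. resolve_effort is a hypothetical name used only for illustration; it is not part of the documented API.

```python
from typing import Optional


def resolve_effort(
    call_effort: Optional[str],
    agent_effort: Optional[str],
) -> Optional[str]:
    """Pick the most specific effort; None means 'leave it to the provider'."""
    if call_effort is not None:
        return call_effort  # 1. per-call override
    return agent_effort     # 2. agent default, or 3. None
```

Returning None in the last case mirrors step 3: nothing is sent, and the provider applies its own default.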

strict_effort is agent-level only; there is no per-call override, because whether a model can honour effort is a property of the adapter, not of any single call.

Warn-and-drop vs strict mode

By default, wiring effort= to a model that can’t honour it is a soft fallback. The kwarg is dropped, the model runs as it would have without effort, and a UserWarning is emitted exactly once per (model, effort) pair:

agent = Agent("...", model="claude-haiku-3-5", effort="high")
await agent.run("hi")
# UserWarning: Model 'claude-haiku-3-5' does not support
# effort='high': Anthropic supports thinking only on Sonnet 3.7+...
# (The kwarg has been dropped for this and future calls;
# emitted once per (model, effort) pair.)

This is the right default for production. A model swap shouldn’t break a working agent. But it’s the wrong default during development, where silently dropping the dial is exactly the bug you want surfaced.

Opt into strict mode to make the same situation a hard error:

from loomflow import Agent
from loomflow.model._effort import EffortNotSupportedError

agent = Agent(
    "...",
    model="claude-haiku-3-5",
    effort="high",
    strict_effort=True,  # raise instead of warning
)

try:
    await agent.run("hi")
except EffortNotSupportedError as exc:
    print(exc)
    # Model 'claude-haiku-3-5' does not support effort='high':
    # ... Pass strict_effort=False on the Agent to downgrade
    # this to a warning + drop.

Wire strict_effort=True in CI / pre-prod so typos and model mismatches surface immediately. Wire strict_effort=False (the default) in production so a vendor outage that pushes you onto a fallback model doesn’t take the whole agent down.
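The two modes can be modelled with the standard warnings module and a set of already-warned pairs. This is a standalone sketch under stated assumptions, not the framework's code: EffortNotSupported stands in for EffortNotSupportedError, and effort_or_drop and the supported flag are hypothetical.

```python
import warnings
from typing import Optional


class EffortNotSupported(Exception):
    """Stand-in for the framework's EffortNotSupportedError."""


# (model, effort) pairs that have already produced a warning.
_warned: set = set()


def effort_or_drop(
    model: str,
    effort: str,
    supported: bool,
    strict: bool = False,
) -> Optional[str]:
    """Return the effort to send, or None if it was dropped."""
    if supported:
        return effort
    message = f"Model {model!r} does not support effort={effort!r}"
    if strict:
        # Strict mode: surface the mismatch as a hard error.
        raise EffortNotSupported(message)
    if (model, effort) not in _warned:
        # Soft mode: warn once per pair, then stay quiet.
        _warned.add((model, effort))
        warnings.warn(message, UserWarning, stacklevel=2)
    return None
```

Keeping the warned set keyed on the (model, effort) pair means a later call with a different effort on the same model still warns once.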

Provider-by-provider detail

OpenAI (o1 / o3 / o4 / GPT-5)

Maps directly to OpenAI’s native reasoning_effort request kwarg. Model gating is prefix-matched against o1, o3, o4, gpt-5, so future minor versions (e.g. o3-mini-2025-…) match automatically.

OpenAI’s enum doesn’t accept xhigh or max, so those clamp to "high". Everything else passes through unchanged.

Base GPT-4 and GPT-4.1 don’t honour reasoning effort. They drop with a warning.
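A minimal sketch of the gating and clamping described in this section, assuming a plain str.startswith check against the four prefixes. The names below are hypothetical, and the warning path is left out to keep the example short.

```python
from typing import Optional

# Prefixes named in this section; anything else is treated as non-reasoning.
OPENAI_REASONING_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def supports_reasoning_effort(model: str) -> bool:
    """Prefix match, so dated or suffixed variants of a family also qualify."""
    return model.startswith(OPENAI_REASONING_PREFIXES)


def openai_request_kwargs(model: str, effort: Optional[str]) -> dict:
    """Build the extra request kwargs; an empty dict means 'send nothing'."""
    if effort is None or not supports_reasoning_effort(model):
        return {}
    # OpenAI's enum stops at "high", so the top two values clamp down.
    clamped = "high" if effort in ("xhigh", "max") else effort
    return {"reasoning_effort": clamped}
```

Under this check a name such as "gpt-4.1" yields an empty dict, while "o3-mini" with effort "max" yields {"reasoning_effort": "high"}.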

Anthropic: three regimes

The Anthropic adapter inspects the model name and picks one of three regimes:

| Regime | Models | Request shape |
| --- | --- | --- |
| 4.7 (adaptive-only) | Opus 4.7, Mythos | thinking={"type": "adaptive"} + output_config={"effort": "<enum>"}. Full enum including xhigh / max |
| 4.6 (adaptive + enum) | Opus 4.6, Sonnet 4.6 | Same shape, enum clamped (xhigh / max → "high") |
| legacy (budget_tokens) | Sonnet 3.7, Sonnet 4, Sonnet 4.5, Opus 4.5 | thinking={"type": "enabled", "budget_tokens": <int>}. Integer budget from the effort dial |

Haiku and older Claudes fall through to warn-and-drop / strict raise.

The legacy budget mapping is minimal=1024, low=2048, medium=4096, high=8192, xhigh=16384, max=32768. These aren’t provider-documented constants. They’re defensible defaults tuned to match the perceived effort tiers. If you need an exact integer for a benchmark, configure the adapter directly with thinking={"type": "enabled", "budget_tokens": N} instead of going through effort=.
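The three request shapes can be sketched as one function keyed on the regime. This is an illustration built from the tables on this page, not the adapter's actual code: anthropic_request_kwargs and the regime labels "4.7", "4.6" and "legacy" are hypothetical, and how the adapter derives the regime from a model name is not shown.

```python
# Legacy integer budgets, copied from the mapping described above.
LEGACY_BUDGETS = {
    "minimal": 1024,
    "low": 2048,
    "medium": 4096,
    "high": 8192,
    "xhigh": 16384,
    "max": 32768,
}

# 4.6 clamps per the "Where each value lands" table.
CLAMP_4_6 = {"minimal": "low", "xhigh": "high", "max": "high"}


def anthropic_request_kwargs(regime: str, effort: str) -> dict:
    """Translate (regime, effort) into the request kwargs for that regime."""
    if effort not in LEGACY_BUDGETS:
        raise ValueError(f"unknown effort {effort!r}")
    if regime == "4.7":
        # Full enum passes through untouched.
        return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}
    if regime == "4.6":
        # Same shape, but out-of-range values are clamped first.
        clamped = CLAMP_4_6.get(effort, effort)
        return {"thinking": {"type": "adaptive"}, "output_config": {"effort": clamped}}
    if regime == "legacy":
        # Older models take an integer thinking budget instead of an enum.
        return {"thinking": {"type": "enabled", "budget_tokens": LEGACY_BUDGETS[effort]}}
    raise ValueError(f"unknown regime {regime!r}")
```

Keeping the budgets in a plain dict makes it straightforward to swap in a different integer for a benchmark without touching the regime logic.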

LiteLLM

LiteLLM normalises every supported provider to OpenAI’s reasoning_effort shape, so Loom forwards the dial via {"reasoning_effort": ...} and lets LiteLLM handle the provider-specific translation downstream. xhigh / max clamp to "high" (the OpenAI ceiling).

This means any LiteLLM-supported reasoning model (Vertex AI Gemini, Mistral reasoning models, etc.) picks up effort automatically without per-adapter wiring.

Gemini (via LiteLLM)

Routes through LiteLLM’s reasoning_effort normalisation. For direct Gemini integration without LiteLLM, the integer thinking_budget kwarg is the native shape. Pass it to the provider client directly if you need precise control.

Example 14: see the dial actually do something

The framework ships examples/14_effort_dial.py running the same hard reasoning question (the 3L / 5L jug puzzle) at every tier on Claude Opus 4.7 and printing token usage at each:

effort=low     tokens= 127+ 312  turns=1
effort=medium  tokens= 127+ 564  turns=1
effort=high    tokens= 127+1148  turns=1
effort=xhigh   tokens= 127+2387  turns=1

The output token count is the visible signal. Higher effort spends more on internal reasoning, which is what you’re paying for.
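To put numbers on that signal, the sample output can be parsed and each tier compared against the lowest one. The sketch below assumes the tokens= field reads as input+output (the first number is constant across tiers, which is consistent with that reading); output_tokens and relative_cost are hypothetical helpers, not part of the example script.

```python
import re

# The sample output shown above, copied verbatim.
SAMPLE = """\
effort=low     tokens= 127+ 312  turns=1
effort=medium  tokens= 127+ 564  turns=1
effort=high    tokens= 127+1148  turns=1
effort=xhigh   tokens= 127+2387  turns=1
"""

ROW = re.compile(r"effort=(\w+)\s+tokens=\s*(\d+)\+\s*(\d+)")


def output_tokens(report: str) -> dict:
    """Map each effort tier to its output-token count (second number)."""
    rows = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group(1)] = int(match.group(3))
    return rows


def relative_cost(report: str, baseline: str = "low") -> dict:
    """Output tokens per tier as a multiple of the baseline tier."""
    rows = output_tokens(report)
    base = rows[baseline]
    return {tier: round(count / base, 2) for tier, count in rows.items()}


print(relative_cost(SAMPLE))
```

On this sample, each step up the dial roughly doubles the output tokens, and xhigh spends about 7.65 times what low does.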

Effort isn’t a magic correctness dial. xhigh makes the model spend more thinking budget; it does not make it solve a problem that’s outside its capability. Use higher effort for problems where the model’s first-attempt answer is close-but-wrong (multi-step arithmetic, constraint satisfaction, code review), not for out-of-distribution requests where no amount of thinking will help.
