Skip to Content
DocsModelsRetryPolicy + error taxonomy

RetryPolicy + error taxonomy

Network model adapters (AnthropicModel / OpenAIModel / LiteLLMModel) auto-wrap their stream() calls in a typed retry policy. You don’t write try / except for transient failures , the loop retries with exponential backoff and gives up cleanly on permanent errors.

Default policy

3 attempts · 1s → 2s → 4s exponential backoff capped at 30s · ±10% jitter · honours provider Retry-After

Roughly equivalent to:

from loomflow import Tuning from loomflow.governance import RetryPolicy agent = Agent( "...", model="claude-opus-4-7", tuning=Tuning(retry_policy=RetryPolicy.default()), )

For most users this is invisible. The agent just keeps working through provider blips.

Tuning the policy

from loomflow import Tuning from loomflow.governance import RetryPolicy # Aggressive — tolerates long provider outages agent = Agent("...", tuning=Tuning(retry_policy=RetryPolicy.aggressive())) # Disabled — handle errors yourself agent = Agent("...", tuning=Tuning(retry_policy=RetryPolicy.disabled())) # Custom agent = Agent("...", tuning=Tuning(retry_policy=RetryPolicy( max_attempts=5, base_delay_s=2.0, max_delay_s=60.0, jitter=0.2, honor_retry_after=True, )))
FieldDefaultEffect
max_attempts3Total attempts including the first.
base_delay_s1.0First backoff.
max_delay_s30.0Cap on the exponential growth.
jitter0.1±jitter fraction applied to each delay.
honor_retry_afterTrueUse the provider’s Retry-After header when present.

Error taxonomy

Adapters classify provider exceptions into a typed hierarchy:

LoomError ├── ModelError (base for any model issue) │ ├── TransientModelError (retried) │ │ ├── RateLimitError (429; retry-after honored) │ │ └── ... (5xx, network timeouts, connection resets) │ ├── PermanentModelError (NOT retried) │ │ ├── AuthenticationError (401) │ │ ├── InvalidRequestError (400 — bad prompt, missing field) │ │ ├── ContentFilterError (provider safety filter) │ │ └── ... │ └── OutputValidationError (output_schema= validation failed) └── ...

classify_model_error(exc) is the helper the adapters use; you can call it from your own code:

from loomflow.governance import classify_model_error try: ... except Exception as exc: typed = classify_model_error(exc) if isinstance(typed, RateLimitError): ...

What gets retried

ErrorRetried?
RateLimitError (429)yes, with Retry-After honored
TransientModelError (5xx, network blips, timeouts)yes
PermanentModelError (401, 400, content filter)no. Fail fast
OutputValidationError (schema validation failed)no. Handled separately

For OutputValidationError the framework follows a different path: it appends the validation message to the conversation and asks the model to retry, up to a separate output_schema_max_retries limit.

What about tool errors?

Tool errors are not retried at the model layer. Each tool’s exception is captured in its ToolResult(ok=False, error=...); the model sees the error in the next turn and can decide whether to retry. To retry at the framework level, wrap the tool body yourself:

@tool async def fetch(url: str) -> str: """Fetch a URL with up to 3 retries.""" for attempt in range(3): try: return await client.get(url) except httpx.NetworkError: if attempt == 2: raise await asyncio.sleep(2**attempt)

Observability

Retry attempts emit structured logs at WARN level:

WARN loomflow.model.retrying: retrying after RateLimitError; attempt 2/3, sleeping 4.2s (provider Retry-After=4.0).

When telemetry=OTelTelemetry(...) is wired, the retry count is attached to the loom.model.stream span as the loom.model.retries attribute.

Don’t double-retry. If you’ve configured your provider client with its own max_retries=3, set it to max_retries=0 and let the framework’s RetryPolicy own the retry loop. Otherwise you compound 3×3 = 9 attempts on a single call and the user-visible latency explodes.

Last updated on