
Rate Limiting

DeepFabric includes intelligent retry handling for API rate limits across all LLM providers.

Overview

The system provides:

  • Provider-aware defaults for OpenAI, Anthropic, Gemini, Ollama, and OpenRouter
  • Exponential backoff with jitter to prevent thundering herd
  • Retry-after header support when providers specify wait times
  • Fail-fast detection for non-retryable errors (e.g., daily quota exhaustion)
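
Together, these behave roughly like the retry loop sketched below. This is an illustration of the approach, not DeepFabric's internal code; the TransientError type and its fields are hypothetical stand-ins for whatever error the provider SDK raises.

import random
import time

class TransientError(Exception):
    """Hypothetical error type, used only for this sketch."""
    def __init__(self, retryable=True, retry_after=None):
        super().__init__("transient provider error")
        self.retryable = retryable        # False => fail fast
        self.retry_after = retry_after    # seconds, if the server sent one

def call_with_retries(request, max_retries=5, base_delay=1.0,
                      max_delay=60.0, exponential_base=2.0):
    """Sketch: exponential backoff with jitter, Retry-After support,
    and fail-fast for non-retryable errors."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except TransientError as err:
            if not err.retryable or attempt == max_retries:
                raise                                     # fail fast / retries exhausted
            delay = min(base_delay * exponential_base ** attempt, max_delay)
            delay *= random.uniform(0.75, 1.25)           # +/- 25% jitter
            if err.retry_after is not None:               # honor server-specified wait
                delay = max(delay, err.retry_after)
            time.sleep(delay)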

Provider Defaults

Each provider has optimized defaults:

Provider     Max Retries   Base Delay   Max Delay
OpenAI       5             1.0s         60s
Anthropic    5             1.0s         60s
Gemini       5             2.0s         120s
Ollama       2             0.5s         5s
OpenRouter   5             1.0s         60s

Configuration

Using Defaults

Omit the rate_limit block to use the provider defaults:

data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"
  # Rate limiting uses defaults automatically

Custom Configuration

data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"

  rate_limit:
    max_retries: 7
    base_delay: 3.0
    max_delay: 180.0
    backoff_strategy: "exponential_jitter"
    exponential_base: 2.0
    jitter: true
    respect_retry_after: true

Options Reference

Option                Type     Default              Description
max_retries           int      5                    Maximum retry attempts (0-20)
base_delay            float    1.0                  Initial delay in seconds (0.1-60)
max_delay             float    60.0                 Maximum delay cap in seconds (1-300)
backoff_strategy      string   exponential_jitter   See strategies below
exponential_base      float    2.0                  Multiplier for exponential backoff (1.1-10)
jitter                bool     true                 Add randomization to prevent synchronized retries
respect_retry_after   bool     true                 Honor server-specified wait times

Backoff Strategies

  • exponential_jitter (recommended): delay = base * (exp_base ^ attempt) ± 25%
  • exponential: delay = base * (exp_base ^ attempt)
  • linear: delay = base * attempt
  • constant: Always use base_delay
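
As a rough illustration, the strategies above map to a delay calculation like the following. The formulas mirror the list; whether the attempt counter starts at 0 or 1 inside DeepFabric is an assumption.

import random

def compute_delay(strategy, attempt, base=1.0, exp_base=2.0, max_delay=60.0):
    """Illustrative delay for a given retry attempt, mirroring the
    formulas above. The attempt indexing is an assumption."""
    if strategy in ("exponential", "exponential_jitter"):
        delay = base * exp_base ** attempt
        if strategy == "exponential_jitter":
            delay *= random.uniform(0.75, 1.25)   # +/- 25% jitter
    elif strategy == "linear":
        delay = base * attempt
    else:                                          # "constant"
        delay = base
    return min(delay, max_delay)                   # never exceed max_delay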

Provider-Specific Behavior

OpenAI

Monitors x-ratelimit-* headers and respects retry-after:

rate_limit:
  max_retries: 5
  respect_retry_after: true

Anthropic

Uses a token bucket algorithm with RPM, ITPM, and OTPM limits (requests, input tokens, and output tokens per minute):

rate_limit:
  max_retries: 5
  base_delay: 1.0

Gemini

Gemini does not send a retry-after header. The system detects daily quota exhaustion and fails fast:

rate_limit:
  max_retries: 5
  base_delay: 2.0      # Higher default
  max_delay: 120.0     # Longer for daily quota

Daily Quota: When Gemini's RPD (requests per day) limit is hit, the system fails fast rather than retrying since quota resets at midnight Pacific time.
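
The exact detection logic is internal to DeepFabric, but conceptually it amounts to treating a 429 RESOURCE_EXHAUSTED error whose details point at the per-day quota as non-retryable. A hedged sketch (the keyword checks are assumptions, not the library's actual heuristics):

def is_daily_quota_exhausted(error_message: str) -> bool:
    """Rough heuristic: a RESOURCE_EXHAUSTED error that mentions a
    per-day quota should not be retried. Keywords are assumptions."""
    msg = error_message.lower()
    return "resource_exhausted" in msg and (
        "per day" in msg or "perday" in msg or "daily" in msg
    )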

Ollama

Minimal retries for local deployment:

rate_limit:
  max_retries: 2
  base_delay: 0.5
  max_delay: 5.0

Python API

from deepfabric import DataSetGenerator

generator = DataSetGenerator(
    generation_system_prompt="You are a helpful assistant.",
    provider="gemini",
    model_name="gemini-2.0-flash-exp",
    rate_limit={
        "max_retries": 7,
        "base_delay": 3.0,
        "max_delay": 180.0,
        "backoff_strategy": "exponential_jitter",
    }
)
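
The rate_limit dictionary accepts the same keys as the YAML rate_limit block shown above.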

Retry Behavior

Retries on:

  • 429 (rate limit)
  • 500, 502, 503, 504 (server errors)
  • Timeout, connection, and other network errors

Does NOT retry:

  • 4xx errors other than 429
  • Authentication failures
  • Daily quota exhaustion (Gemini)
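
In code, the classification above amounts to something like this sketch (for illustration only, not the library's actual implementation):

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(status_code=None, network_error=False,
                 daily_quota_exhausted=False):
    """Mirror of the retry rules above, for illustration only."""
    if daily_quota_exhausted:            # Gemini RPD limit: fail fast
        return False
    if network_error:                    # timeouts, connection resets, etc.
        return True
    return status_code in RETRYABLE_STATUS_CODES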

Best Practices

High Volume (Paid Tier)

rate_limit:
  max_retries: 3
  base_delay: 0.5
  max_delay: 10.0

Free Tier (Strict Limits)

rate_limit:
  max_retries: 10
  base_delay: 5.0
  max_delay: 300.0

Combined with Batch Size

dataset:
  creation:
    batch_size: 2    # Smaller batches reduce rate limit pressure
    num_steps: 20

data_engine:
  rate_limit:
    max_retries: 5
    base_delay: 2.0

Troubleshooting

Still Hitting Rate Limits

  1. Reduce batch_size in dataset creation
  2. Increase base_delay
  3. Check your provider tier/quota

Daily Quota Exhausted (Gemini)

The system detects this and fails immediately:

ERROR - Failing fast for gemini: 429 RESOURCE_EXHAUSTED (daily_quota_exhausted=True)

Options:

  • Wait until the quota resets at midnight Pacific time
  • Upgrade your Gemini tier
  • Switch providers temporarily

Too Many Retries

Reduce max_retries to fail faster:

rate_limit:
  max_retries: 3