
Rate Limiting

DeepFabric includes intelligent retry handling for API rate limits across all LLM providers.

Overview

The system provides:

  • Provider-aware defaults for OpenAI, Anthropic, Gemini, Ollama, and OpenRouter
  • Exponential backoff with jitter to prevent thundering herd
  • Retry-after header support when providers specify wait times
  • Fail-fast detection for non-retryable errors (e.g., daily quota exhaustion)
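
Together, these behave roughly like the retry loop sketched below. This is an illustration of the approach, not DeepFabric's internal code; the TransientError type and its fields are hypothetical stand-ins for whatever error the provider SDK raises.

import random
import time

class TransientError(Exception):
    """Hypothetical error type, used only for this sketch."""
    def __init__(self, retryable=True, retry_after=None):
        super().__init__("transient provider error")
        self.retryable = retryable        # False => fail fast
        self.retry_after = retry_after    # seconds, if the server sent one

def call_with_retries(request, max_retries=5, base_delay=1.0,
                      max_delay=60.0, exponential_base=2.0):
    """Sketch: exponential backoff with jitter, Retry-After support,
    and fail-fast for non-retryable errors."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except TransientError as err:
            if not err.retryable or attempt == max_retries:
                raise                                     # fail fast / retries exhausted
            delay = min(base_delay * exponential_base ** attempt, max_delay)
            delay *= random.uniform(0.75, 1.25)           # +/- 25% jitter
            if err.retry_after is not None:               # honor server-specified wait
                delay = max(delay, err.retry_after)
            time.sleep(delay)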

Provider Defaults

Each provider has optimized defaults:

Provider     Max Retries   Base Delay   Max Delay
OpenAI       5             1.0s         60s
Anthropic    5             1.0s         60s
Gemini       5             2.0s         120s
Ollama       2             0.5s         5s
OpenRouter   5             1.0s         60s

Configuration

Using Defaults

Omit the rate_limit block to use the provider defaults:

data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"
  # Rate limiting uses defaults automatically

Custom Configuration

data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"

  rate_limit:
    max_retries: 7
    base_delay: 3.0
    max_delay: 180.0
    backoff_strategy: "exponential_jitter"
    exponential_base: 2.0
    jitter: true
    respect_retry_after: true

Options Reference

Option                Type     Default              Description
max_retries           int      5                    Maximum retry attempts (0-20)
base_delay            float    1.0                  Initial delay in seconds (0.1-60)
max_delay             float    60.0                 Maximum delay cap in seconds (1-300)
backoff_strategy      string   exponential_jitter   See strategies below
exponential_base      float    2.0                  Multiplier for exponential backoff (1.1-10)
jitter                bool     true                 Add randomization to prevent synchronized retries
respect_retry_after   bool     true                 Honor server-specified wait times

Backoff Strategies

  • exponential_jitter (recommended): delay = base * (exp_base ^ attempt) ± 25%
  • exponential: delay = base * (exp_base ^ attempt)
  • linear: delay = base * attempt
  • constant: Always use base_delay
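
As a rough illustration, the strategies above map to a delay calculation like the following. The formulas mirror the list; whether the attempt counter starts at 0 or 1 inside DeepFabric is an assumption.

import random

def compute_delay(strategy, attempt, base=1.0, exp_base=2.0, max_delay=60.0):
    """Illustrative delay for a given retry attempt, mirroring the
    formulas above. The attempt indexing is an assumption."""
    if strategy in ("exponential", "exponential_jitter"):
        delay = base * exp_base ** attempt
        if strategy == "exponential_jitter":
            delay *= random.uniform(0.75, 1.25)   # +/- 25% jitter
    elif strategy == "linear":
        delay = base * attempt
    else:                                          # "constant"
        delay = base
    return min(delay, max_delay)                   # never exceed max_delay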

Provider-Specific Behavior

OpenAI

Monitors x-ratelimit-* headers and respects retry-after:

rate_limit:
  max_retries: 5
  respect_retry_after: true

Anthropic

Uses a token bucket algorithm with RPM, ITPM, and OTPM limits (requests, input tokens, and output tokens per minute):

rate_limit:
  max_retries: 5
  base_delay: 1.0

Gemini

Gemini does not send a retry-after header. The system detects daily quota exhaustion and fails fast:

rate_limit:
  max_retries: 5
  base_delay: 2.0      # Higher default
  max_delay: 120.0     # Longer for daily quota

Daily Quota: When Gemini's RPD (requests per day) limit is hit, the system fails fast rather than retrying since quota resets at midnight Pacific time.
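
The exact detection logic is internal to DeepFabric, but conceptually it amounts to treating a 429 RESOURCE_EXHAUSTED error whose details point at the per-day quota as non-retryable. A hedged sketch (the keyword checks are assumptions, not the library's actual heuristics):

def is_daily_quota_exhausted(error_message: str) -> bool:
    """Rough heuristic: a RESOURCE_EXHAUSTED error that mentions a
    per-day quota should not be retried. Keywords are assumptions."""
    msg = error_message.lower()
    return "resource_exhausted" in msg and (
        "per day" in msg or "perday" in msg or "daily" in msg
    )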

Ollama

Minimal retries for local deployment:

rate_limit:
  max_retries: 2
  base_delay: 0.5
  max_delay: 5.0

Python API

from deepfabric import DataSetGenerator

generator = DataSetGenerator(
    generation_system_prompt="You are a helpful assistant.",
    provider="gemini",
    model_name="gemini-2.0-flash-exp",
    rate_limit={
        "max_retries": 7,
        "base_delay": 3.0,
        "max_delay": 180.0,
        "backoff_strategy": "exponential_jitter",
    }
)
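
The rate_limit dictionary accepts the same keys as the YAML rate_limit block shown above.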

Retry Behavior

Retries on:

  • 429 (rate limit)
  • 500, 502, 503, 504 (server errors)
  • Timeout, connection, and other network errors

Does NOT retry:

  • 4xx errors other than 429
  • Authentication failures
  • Daily quota exhaustion (Gemini)
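
In code, the classification above amounts to something like this sketch (for illustration only, not the library's actual implementation):

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(status_code=None, network_error=False,
                 daily_quota_exhausted=False):
    """Mirror of the retry rules above, for illustration only."""
    if daily_quota_exhausted:            # Gemini RPD limit: fail fast
        return False
    if network_error:                    # timeouts, connection resets, etc.
        return True
    return status_code in RETRYABLE_STATUS_CODES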

Best Practices

High Volume (Paid Tier)

rate_limit:
  max_retries: 3
  base_delay: 0.5
  max_delay: 10.0

Free Tier (Strict Limits)

rate_limit:
  max_retries: 10
  base_delay: 5.0
  max_delay: 300.0

Combined with Batch Size

dataset:
  creation:
    batch_size: 2    # Smaller batches reduce rate limit pressure
    num_steps: 20

data_engine:
  rate_limit:
    max_retries: 5
    base_delay: 2.0

Troubleshooting

Still Hitting Rate Limits

  1. Reduce batch_size in dataset creation
  2. Increase base_delay
  3. Check your provider tier/quota

Daily Quota Exhausted (Gemini)

The system detects this and fails immediately:

ERROR - Failing fast for gemini: 429 RESOURCE_EXHAUSTED (daily_quota_exhausted=True)

Options:

  • Wait until the quota resets at midnight Pacific time
  • Upgrade your Gemini tier
  • Switch providers temporarily

Too Many Retries

Reduce max_retries to fail faster:

rate_limit:
  max_retries: 3