Rate Limiting

DeepFabric includes intelligent retry handling for API rate limits across all LLM providers.

Overview

The system provides:

  • Provider-aware defaults for OpenAI, Anthropic, Gemini, Ollama, and OpenRouter
  • Exponential backoff with jitter to prevent thundering herd
  • Retry-after header support when providers specify wait times
  • Fail-fast detection for non-retryable errors (e.g., daily quota exhaustion)

Provider Defaults

Each provider has optimized defaults:

Provider     Max Retries   Base Delay   Max Delay
OpenAI       5             1.0s         60s
Anthropic    5             1.0s         60s
Gemini       5             2.0s         120s
Ollama       2             0.5s         5s
OpenRouter   5             1.0s         60s

Configuration

Using Defaults

Omit the rate_limit block to use the provider defaults:

config.yaml
generation:
  llm:
    provider: "gemini"
    model: "gemini-2.0-flash-exp"
  # Rate limiting uses defaults automatically

Custom Configuration

config.yaml
generation:
  llm:
    provider: "gemini"
    model: "gemini-2.0-flash-exp"

  rate_limit:
    max_retries: 7
    base_delay: 3.0
    max_delay: 180.0
    backoff_strategy: "exponential_jitter"
    exponential_base: 2.0
    jitter: true
    respect_retry_after: true

Options Reference

Option               Type     Default              Description
max_retries          int      5                    Maximum retry attempts (0-20)
base_delay           float    1.0                  Initial delay in seconds (0.1-60)
max_delay            float    60.0                 Maximum delay cap in seconds (1-300)
backoff_strategy     string   exponential_jitter   See strategies below
exponential_base     float    2.0                  Multiplier for exponential backoff (1.1-10)
jitter               bool     true                 Add randomization to prevent synchronized retries
respect_retry_after  bool     true                 Honor server-specified wait times

Backoff Strategies

Strategy             Formula                                       Use Case
exponential_jitter   delay = base * (exp_base ^ attempt) +/- 25%   Recommended default
exponential          delay = base * (exp_base ^ attempt)           Predictable timing
linear               delay = base * attempt                        Gentle increase
constant             delay = base_delay                            Fixed intervals
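
Read as code, these formulas combine roughly as in the sketch below. The function name and structure are illustrative only and do not mirror DeepFabric's internal implementation.

backoff_sketch.py
import random

def compute_delay(attempt, base_delay=1.0, max_delay=60.0,
                  strategy="exponential_jitter", exponential_base=2.0):
    # Illustrative delay calculation for the strategies in the table above.
    if strategy == "linear":
        delay = base_delay * attempt
    elif strategy == "constant":
        delay = base_delay
    else:  # exponential and exponential_jitter
        delay = base_delay * (exponential_base ** attempt)
    if strategy == "exponential_jitter":
        delay *= random.uniform(0.75, 1.25)  # +/- 25% jitter
    return min(delay, max_delay)  # never exceed the configured cap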

Provider-Specific Behavior

OpenAI

Monitors x-ratelimit-* headers and respects the retry-after header:

config.yaml
rate_limit:
  max_retries: 5
  respect_retry_after: true
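
Honoring a server-specified wait might look like the sketch below. This is a simplified illustration (it assumes a numeric retry-after value and ignores the HTTP-date form), not DeepFabric's actual code.

retry_after_sketch.py
import time

def wait_before_retry(headers, fallback_delay, max_delay=60.0, respect_retry_after=True):
    # Prefer the server-specified wait when respect_retry_after is enabled.
    retry_after = headers.get("retry-after")
    if respect_retry_after and retry_after is not None:
        delay = float(retry_after)
    else:
        delay = fallback_delay
    time.sleep(min(delay, max_delay))  # still bounded by max_delay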

Anthropic

Uses a token bucket algorithm with RPM/ITPM/OTPM limits (requests, input tokens, and output tokens per minute):

config.yaml
rate_limit:
  max_retries: 5
  base_delay: 1.0

Gemini

No retry-after header is provided. Detects daily quota exhaustion and fails fast:

config.yaml
rate_limit:
  max_retries: 5
  base_delay: 2.0      # Higher default
  max_delay: 120.0     # Longer for daily quota

Daily Quota

When Gemini's RPD (requests per day) limit is hit, the system fails fast rather than retrying, since the quota only resets at midnight Pacific time.
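
A minimal sketch of such a fail-fast check, assuming the error text carries the RESOURCE_EXHAUSTED status and a per-day quota marker (the exact strings are assumptions, not DeepFabric's internals):

daily_quota_sketch.py
def is_daily_quota_exhausted(error_message: str) -> bool:
    # Heuristic: treat per-day quota errors as non-retryable so the run fails fast.
    message = error_message.lower()
    return "resource_exhausted" in message and ("per day" in message or "perday" in message)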

Ollama

Minimal retries for local deployment:

config.yaml
rate_limit:
  max_retries: 2
  base_delay: 0.5
  max_delay: 5.0

Python API

rate_limit_example.py
from deepfabric import DataSetGenerator

generator = DataSetGenerator(
    generation_system_prompt="You are a helpful assistant.",
    provider="gemini",
    model_name="gemini-2.0-flash-exp",
    rate_limit={
        "max_retries": 7,
        "base_delay": 3.0,
        "max_delay": 180.0,
        "backoff_strategy": "exponential_jitter",
    }
)

Retry Behavior

Retries On

  • 429 (rate limit)
  • 500, 502, 503, 504 (server errors)
  • Timeout, connection, network errors

Does NOT Retry

  • 4xx errors (except 429)
  • Authentication failures
  • Daily quota exhaustion (Gemini)
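
Taken together, the two lists amount to a classification roughly like this sketch (illustrative only; the names and signature are not the library's actual API):

retry_decision_sketch.py
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def should_retry(status_code, is_auth_error=False, daily_quota_exhausted=False):
    # Authentication failures and Gemini daily quota exhaustion always fail fast.
    if is_auth_error or daily_quota_exhausted:
        return False
    # Retry rate limits and transient server errors; other 4xx codes fail immediately.
    # (Timeout, connection, and network errors are also retried.)
    return status_code in RETRYABLE_STATUS_CODES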

Best Practices

Paid tier config
rate_limit:
  max_retries: 3
  base_delay: 0.5
  max_delay: 10.0

Free tier config
rate_limit:
  max_retries: 10
  base_delay: 5.0
  max_delay: 300.0

Batch size optimization
output:
  batch_size: 2    # Smaller batches reduce rate limit pressure
  num_samples: 20

generation:
  rate_limit:
    max_retries: 5
    base_delay: 2.0
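
As a rough worked example for the generation settings above (base_delay 2.0, exponential base 2.0, max_retries 5, the default max_delay of 60, jitter ignored, and attempts numbered from zero, which is an assumption):

worst_case_backoff.py
# Worst-case cumulative backoff with base_delay=2.0, exponential_base=2.0,
# max_retries=5, and the default max_delay=60.0, ignoring jitter.
delays = [min(2.0 * (2.0 ** attempt), 60.0) for attempt in range(5)]
print(delays)       # [2.0, 4.0, 8.0, 16.0, 32.0]
print(sum(delays))  # 62.0 seconds of backoff before the request finally fails

A single sample can therefore spend about a minute in backoff before failing, which is worth keeping in mind when sizing batches.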

Troubleshooting

Still Hitting Rate Limits

Solutions

  1. Reduce batch_size in dataset creation
  2. Increase base_delay
  3. Check your provider tier/quota

Daily Quota Exhausted (Gemini)

The system detects this and fails immediately:

ERROR - Failing fast for gemini: 429 RESOURCE_EXHAUSTED (daily_quota_exhausted=True)

Options

  • Wait until midnight Pacific time
  • Upgrade Gemini tier
  • Switch providers temporarily

Too Many Retries

Reduce max_retries to fail faster:

rate_limit:
  max_retries: 3