Rate Limiting¶
DeepFabric includes intelligent retry handling for API rate limits across all LLM providers.
Overview¶
The system provides:
- Provider-aware defaults for OpenAI, Anthropic, Gemini, Ollama, and OpenRouter
- Exponential backoff with jitter to prevent thundering herd
- Retry-after header support when providers specify wait times
- Fail-fast detection for non-retryable errors (e.g., daily quota exhaustion)
Provider Defaults¶
Each provider has optimized defaults:
| Provider | Max Retries | Base Delay | Max Delay |
|---|---|---|---|
| OpenAI | 5 | 1.0s | 60s |
| Anthropic | 5 | 1.0s | 60s |
| Gemini | 5 | 2.0s | 120s |
| Ollama | 2 | 0.5s | 5s |
| OpenRouter | 5 | 1.0s | 60s |
Configuration¶
Using Defaults¶
Omit rate limiting config to use provider defaults:
data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"
  # Rate limiting uses defaults automatically
Custom Configuration¶
data_engine:
  provider: "gemini"
  model: "gemini-2.0-flash-exp"
  rate_limit:
    max_retries: 7
    base_delay: 3.0
    max_delay: 180.0
    backoff_strategy: "exponential_jitter"
    exponential_base: 2.0
    jitter: true
    respect_retry_after: true
Options Reference¶
| Option | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 5 | Maximum retry attempts (0-20) |
| base_delay | float | 1.0 | Initial delay in seconds (0.1-60) |
| max_delay | float | 60.0 | Maximum delay cap in seconds (1-300) |
| backoff_strategy | string | exponential_jitter | See strategies below |
| exponential_base | float | 2.0 | Multiplier for exponential backoff (1.1-10) |
| jitter | bool | true | Add randomization to prevent synchronized retries |
| respect_retry_after | bool | true | Honor server-specified wait times |
Backoff Strategies¶
- exponential_jitter (recommended): delay = base * (exp_base ^ attempt) ± 25%
- exponential: delay = base * (exp_base ^ attempt)
- linear: delay = base * attempt
- constant: always use base_delay

With the defaults (base_delay = 1.0, exponential_base = 2.0), the exponential strategies roughly double the wait on each successive attempt, capped at max_delay; jitter then randomizes each wait by ±25% so concurrent workers do not retry in lockstep.
Provider-Specific Behavior¶
OpenAI¶
Monitors x-ratelimit-* headers and respects retry-after when the server supplies one.
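For example, the OpenAI defaults from the table above can be written out explicitly (a sketch; omitting the block entirely gives the same values):

rate_limit:
  max_retries: 5
  base_delay: 1.0
  max_delay: 60.0
  respect_retry_after: true  # use the server-specified wait time when present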
Anthropic¶
Uses a token bucket algorithm with per-minute request (RPM), input token (ITPM), and output token (OTPM) limits.
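A sketch that keeps the Anthropic defaults from the table above but spells out jitter, which helps desynchronize concurrent requests drawing on the same RPM/ITPM/OTPM budget (illustrative, not required):

rate_limit:
  max_retries: 5
  base_delay: 1.0
  max_delay: 60.0
  jitter: true  # randomize delays so parallel retries don't land together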
Gemini¶
Gemini does not send a retry-after header. The system detects daily quota exhaustion and fails fast:
rate_limit:
  max_retries: 5
  base_delay: 2.0   # Higher default
  max_delay: 120.0  # Longer for daily quota
Daily Quota: When Gemini's RPD (requests per day) limit is hit, the system fails fast rather than retrying, since the quota resets at midnight Pacific time.
Ollama¶
Minimal retries for local deployments.
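Restating the Ollama defaults from the table above as a sketch (a local server usually does not enforce rate limits, so long backoffs buy little):

rate_limit:
  max_retries: 2
  base_delay: 0.5
  max_delay: 5.0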
Python API¶
from deepfabric import DataSetGenerator

generator = DataSetGenerator(
    generation_system_prompt="You are a helpful assistant.",
    provider="gemini",
    model_name="gemini-2.0-flash-exp",
    rate_limit={
        "max_retries": 7,
        "base_delay": 3.0,
        "max_delay": 180.0,
        "backoff_strategy": "exponential_jitter",
    },
)
Retry Behavior¶
Retries on:
- 429 (rate limit)
- 500, 502, 503, 504 (server errors)
- Timeout, connection, network errors
Does NOT retry:
- 4xx errors (except 429)
- Authentication failures
- Daily quota exhaustion (Gemini)
Best Practices¶
High Volume (Paid Tier)¶
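On paid tiers the defaults are usually sufficient; a sketch that restates them and leans on server-provided pacing (values mirror the provider-defaults table, not an official recommendation):

rate_limit:
  max_retries: 5
  base_delay: 1.0
  max_delay: 60.0
  respect_retry_after: true  # let the provider pace bursts via retry-after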
Free Tier (Aggressive Limits)¶
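Free tiers throttle aggressively, so a more conservative sketch (the same values as the custom-configuration example above, in line with the troubleshooting advice to raise base_delay) is a reasonable starting point:

rate_limit:
  max_retries: 7
  base_delay: 3.0    # wait longer between attempts
  max_delay: 180.0   # allow long pauses before giving up
  backoff_strategy: "exponential_jitter"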
Combined with Batch Size¶
dataset:
  creation:
    batch_size: 2  # Smaller batches reduce rate limit pressure
    num_steps: 20

data_engine:
  rate_limit:
    max_retries: 5
    base_delay: 2.0
Troubleshooting¶
Still Hitting Rate Limits¶
- Reduce batch_size in dataset creation
- Increase base_delay
- Check your provider tier/quota
Daily Quota Exhausted (Gemini)¶
The system detects this and fails immediately rather than retrying. Options:
- Wait until midnight Pacific time, when the quota resets
- Upgrade your Gemini tier
- Switch providers temporarily
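If you switch providers temporarily, only the provider and model need to change; dropping the rate_limit block falls back to that provider's defaults. A sketch assuming an OpenAI fallback (the model name is illustrative):

data_engine:
  provider: "openai"
  model: "gpt-4o-mini"  # illustrative; use any model you have access to
  # rate_limit omitted: OpenAI defaults apply (5 retries, 1.0s base, 60s cap)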
Too Many Retries¶
Reduce max_retries to fail faster.
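A sketch that surfaces errors after a couple of quick attempts instead of waiting out long backoffs (values are illustrative):

rate_limit:
  max_retries: 2
  base_delay: 1.0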