Rate Limiting
DeepFabric includes automatic retry handling for API rate limits across all supported LLM providers.
Overview
The system provides:
- Provider-aware defaults for OpenAI, Anthropic, Gemini, Ollama, OpenRouter
- Exponential backoff with jitter to prevent thundering herd
- Retry-after header support when providers specify wait times
- Fail-fast detection for non-retryable errors (e.g., daily quota exhaustion)
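The four behaviors above fit together in a single retry loop. The following is a minimal sketch under stated assumptions, not DeepFabric's actual implementation: RetryableError, NonRetryableError, and call_with_retries are hypothetical names, the first retry is counted as attempt 0, and capping a server-specified wait at max_delay is an assumption.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error that may carry a server-specified wait in seconds."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class NonRetryableError(Exception):
    """Hypothetical error for failures that retrying cannot fix."""


def call_with_retries(func, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call func, retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except NonRetryableError:
            raise  # fail fast: waiting will not help
        except RetryableError as err:
            if attempt == max_retries:
                raise  # retry budget exhausted
            if err.retry_after is not None:
                # Prefer the wait time the server asked for (capped: assumption)
                delay = min(err.retry_after, max_delay)
            else:
                backoff = base_delay * 2 ** attempt
                jitter = 0.75 + 0.5 * rng()  # scale by a factor in [0.75, 1.25)
                delay = min(backoff * jitter, max_delay)
            sleep(delay)
```

The sleep and rng parameters exist only so the loop can be exercised without real waiting or randomness.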
Provider Defaults
Each provider has its own defaults:
| Provider | Max Retries | Base Delay | Max Delay |
|---|---|---|---|
| OpenAI | 5 | 1.0s | 60s |
| Anthropic | 5 | 1.0s | 60s |
| Gemini | 5 | 2.0s | 120s |
| Ollama | 2 | 0.5s | 5s |
| OpenRouter | 5 | 1.0s | 60s |
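To get a feel for what these defaults mean in practice, the sketch below totals the longest possible wait per provider. It assumes the plain exponential formula with an exponential_base of 2.0, no jitter, and that the first retry counts as attempt 0. RetryDefaults and worst_case_wait are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryDefaults:
    max_retries: int
    base_delay: float  # seconds
    max_delay: float   # seconds


# Values copied from the table above; the structure itself is illustrative
PROVIDER_DEFAULTS = {
    "openai": RetryDefaults(5, 1.0, 60.0),
    "anthropic": RetryDefaults(5, 1.0, 60.0),
    "gemini": RetryDefaults(5, 2.0, 120.0),
    "ollama": RetryDefaults(2, 0.5, 5.0),
    "openrouter": RetryDefaults(5, 1.0, 60.0),
}


def worst_case_wait(defaults, exponential_base=2.0):
    """Sum the capped delay of every retry, ignoring jitter and request time."""
    return sum(
        min(defaults.base_delay * exponential_base ** attempt, defaults.max_delay)
        for attempt in range(defaults.max_retries)
    )


for name, defaults in PROVIDER_DEFAULTS.items():
    print(f"{name}: up to {worst_case_wait(defaults)}s of waiting")
```

Under those assumptions the totals are 31 seconds for OpenAI, Anthropic, and OpenRouter, 62 seconds for Gemini, and 1.5 seconds for Ollama.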
Configuration
Using Defaults
Omit rate limiting config to use provider defaults:
config.yaml
generation:
  llm:
    provider: "gemini"
    model: "gemini-2.0-flash-exp"
  # Rate limiting uses defaults automatically
Custom Configuration
config.yaml
generation:
  llm:
    provider: "gemini"
    model: "gemini-2.0-flash-exp"
  rate_limit:
    max_retries: 7
    base_delay: 3.0
    max_delay: 180.0
    backoff_strategy: "exponential_jitter"
    exponential_base: 2.0
    jitter: true
    respect_retry_after: true
Options Reference
| Option | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 5 | Maximum retry attempts (0-20) |
| base_delay | float | 1.0 | Initial delay in seconds (0.1-60) |
| max_delay | float | 60.0 | Maximum delay cap in seconds (1-300) |
| backoff_strategy | string | exponential_jitter | See strategies below |
| exponential_base | float | 2.0 | Multiplier for exponential backoff (1.1-10) |
| jitter | bool | true | Add randomization to prevent synchronized retries |
| respect_retry_after | bool | true | Honor server-specified wait times |
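The ranges in the table lend themselves to validation at load time. The sketch below shows one way to do that with a dataclass. RateLimitOptions and _check_range are hypothetical names rather than DeepFabric's own classes, and treating the ranges as inclusive is an assumption.

```python
from dataclasses import dataclass

STRATEGIES = ("exponential_jitter", "exponential", "linear", "constant")


def _check_range(name, value, low, high):
    """Raise ValueError unless low <= value <= high (inclusive: assumption)."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass
class RateLimitOptions:
    # Defaults and ranges mirror the options table above
    max_retries: int = 5
    base_delay: float = 1.0
    max_delay: float = 60.0
    backoff_strategy: str = "exponential_jitter"
    exponential_base: float = 2.0
    jitter: bool = True
    respect_retry_after: bool = True

    def __post_init__(self):
        _check_range("max_retries", self.max_retries, 0, 20)
        _check_range("base_delay", self.base_delay, 0.1, 60)
        _check_range("max_delay", self.max_delay, 1, 300)
        _check_range("exponential_base", self.exponential_base, 1.1, 10)
        if self.backoff_strategy not in STRATEGIES:
            raise ValueError(f"unknown backoff_strategy: {self.backoff_strategy!r}")


# Override a few options; the rest keep their defaults
options = RateLimitOptions(max_retries=7, base_delay=3.0, max_delay=180.0)
```

Rejecting out-of-range values up front surfaces a typo such as max_retries: 200 before any API call is made.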
Backoff Strategies
| Strategy | Formula | Use Case |
|---|---|---|
| exponential_jitter | delay = base * (exp_base ^ attempt) +/- 25% | Recommended default |
| exponential | delay = base * (exp_base ^ attempt) | Predictable timing |
| linear | delay = base * attempt | Gentle increase |
| constant | Always use base_delay | Fixed intervals |
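The formulas in the table translate directly into code. The sketch below is an illustration, not DeepFabric's implementation: compute_delay is a hypothetical helper, the source does not state whether the first retry counts as attempt 0 or 1, and applying the max_delay cap after jitter is an assumption.

```python
import random


def compute_delay(strategy, attempt, base_delay=1.0, max_delay=60.0,
                  exponential_base=2.0, rng=random.uniform):
    """Return the wait in seconds before the given retry attempt."""
    if strategy in ("exponential", "exponential_jitter"):
        delay = base_delay * exponential_base ** attempt
    elif strategy == "linear":
        delay = base_delay * attempt
    elif strategy == "constant":
        delay = base_delay
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "exponential_jitter":
        # Scale by a random factor in [0.75, 1.25], i.e. +/- 25%
        delay *= rng(0.75, 1.25)

    # Never wait longer than the configured cap
    return min(delay, max_delay)


for attempt in range(1, 5):
    print(attempt,
          compute_delay("exponential", attempt),
          compute_delay("linear", attempt),
          compute_delay("constant", attempt))
```

With base_delay 1.0 and exponential_base 2.0, attempt 3 yields 8 seconds for exponential, 3 seconds for linear, and 1 second for constant, while exponential_jitter lands somewhere between 6 and 10 seconds.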
Provider-Specific Behavior
OpenAI: The system monitors x-ratelimit-* headers and respects retry-after.
Anthropic: Uses a token bucket algorithm with RPM/ITPM/OTPM limits.
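For readers unfamiliar with the term, a token bucket refills at a steady rate up to a fixed capacity, and each request spends tokens from it. RPM, ITPM, and OTPM stand for requests, input tokens, and output tokens per minute, so one bucket per limit is a natural model. The class below is a generic sketch of the algorithm, not the provider's or DeepFabric's code; TokenBucket and its method names are hypothetical.

```python
import time


class TokenBucket:
    """Generic token bucket: refills continuously up to a fixed capacity."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last update, never exceeding capacity
        now = self.clock()
        earned = (now - self.updated) * self.refill_per_second
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now

    def try_consume(self, amount=1.0):
        """Spend tokens if enough are available; return whether it succeeded."""
        self._refill()
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False

    def seconds_until(self, amount=1.0):
        """Estimate how long to wait before amount tokens are available."""
        self._refill()
        missing = amount - self.tokens
        return max(0.0, missing / self.refill_per_second)


# Example: a limit of 50 requests per minute modeled as one bucket
requests_bucket = TokenBucket(capacity=50, refill_per_second=50 / 60)
```

Because the bucket refills continuously, a client that was throttled regains capacity gradually instead of waiting for a fixed window to reset.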
Gemini: Sends no retry-after header. The system detects daily quota exhaustion and fails fast:
config.yaml
rate_limit:
  max_retries: 5
  base_delay: 2.0   # Higher default
  max_delay: 120.0  # Longer for daily quota
Daily Quota
When Gemini's RPD (requests per day) limit is hit, the system fails fast rather than retrying, because the quota does not reset until midnight Pacific time.
Python API
rate_limit_example.py
from deepfabric import DataSetGenerator

generator = DataSetGenerator(
    generation_system_prompt="You are a helpful assistant.",
    provider="gemini",
    model_name="gemini-2.0-flash-exp",
    rate_limit={
        "max_retries": 7,
        "base_delay": 3.0,
        "max_delay": 180.0,
        "backoff_strategy": "exponential_jitter",
    },
)
Retry Behavior
Retries On
- 429 (rate limit)
- 500, 502, 503, 504 (server errors)
- Timeout, connection, network errors
Does NOT Retry
- 4xx errors (except 429)
- Authentication failures
- Daily quota exhaustion (Gemini)
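The two lists above amount to a small decision function. The sketch below is one possible encoding, with should_retry and its parameters being hypothetical names. It assumes the caller has already worked out whether a failure was a network error or a daily quota exhaustion, and it checks the quota first on the assumption that such a failure can arrive with an otherwise retryable status code.

```python
# Status codes the lists above name as retryable
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_retry(status_code=None, network_error=False, daily_quota_exhausted=False):
    """Decide whether a failed request is worth retrying."""
    if daily_quota_exhausted:
        # Waiting a few seconds cannot help; the quota resets once a day
        return False
    if network_error:
        # Timeouts and connection failures are treated as transient
        return True
    if status_code is None:
        return False
    # Everything else, including 401/403 and other 4xx errors, is not retried
    return status_code in RETRYABLE_STATUS
```

For example, should_retry(503) is True, should_retry(401) is False, and should_retry(429, daily_quota_exhausted=True) is False.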
Best Practices
Troubleshooting
Still Hitting Rate Limits
Solutions
- Reduce batch_size in dataset creation
- Increase base_delay
- Check your provider tier/quota
Daily Quota Exhausted (Gemini)
The system detects this and fails immediately.
Options
- Wait until midnight Pacific time
- Upgrade Gemini tier
- Switch providers temporarily
Too Many Retries
Reduce max_retries to fail faster.