Provider Model Fallback Configuration
This guide explains how to configure ZeroClaw to handle quota exhaustion and model failures gracefully using provider and model fallback chains.
Problem: Rate Limits and Model Quota
When using AI providers, you may encounter:
- Rate limits (429) - Your primary provider has no available quota
- Model-specific quota - A specific model (e.g., `gemini-2.0-flash-exp`) is exhausted, but other models from the same provider still work
- Model incompatibility - Fallback providers don't support the same model names
Solution 1: Provider Fallback with Model-Specific Defaults
Configure fallback providers with their own default models:
# config.toml
default_provider = "google"
default_model = "gemini-2.0-flash-exp"
[reliability]
# When Gemini fails, fall back to OpenAI Codex
fallback_providers = ["openai-codex:codex-1"]
# Provider-specific configuration
[[providers.openai_codex]]
profile = "codex-1"
model = "gpt-4o-mini" # Use this model when falling back to Codex
How it works:
- Primary request goes to Gemini with `gemini-2.0-flash-exp`
- If Gemini returns 429 (rate limited), ZeroClaw tries the fallback
- The fallback provider (OpenAI Codex) uses its configured model (`gpt-4o-mini`) instead of trying the Gemini-specific model
- No "400 Bad Request: model not supported" errors!
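The model-swapping behavior above can be sketched in Python. This is an illustrative model of the behavior, not ZeroClaw's actual implementation; the `RateLimitError` type, the function names, and the config-dict shape are assumptions for illustration.

```python
class RateLimitError(Exception):
    """Raised when a provider answers with HTTP 429."""

# Per-provider default models, mirroring the [[providers.*]] config above.
PROVIDER_DEFAULT_MODELS = {
    "openai-codex:codex-1": "gpt-4o-mini",
}

def complete_with_fallback(call, primary, model, fallback_providers):
    """Try the primary provider first; on 429, walk the fallback chain,
    swapping in each fallback provider's own default model."""
    try:
        return call(primary, model)
    except RateLimitError:
        pass
    for provider in fallback_providers:
        # Never forward a provider-specific model name to a different
        # provider; use the fallback provider's configured default instead.
        fallback_model = PROVIDER_DEFAULT_MODELS.get(provider, model)
        try:
            return call(provider, fallback_model)
        except RateLimitError:
            continue
    raise RateLimitError("all providers exhausted")
```

The key point is the `PROVIDER_DEFAULT_MODELS` lookup: the Gemini-specific model name never reaches a provider that doesn't know it.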
Solution 2: Model Fallback Within Same Provider
When quota for one model is exhausted but other models from the same provider work:
# config.toml
default_provider = "google"
default_model = "gemini-2.0-flash-exp"
[reliability]
# Try alternative Gemini models when quota is exhausted
[reliability.model_fallbacks]
"gemini-2.0-flash-exp" = ["gemini-1.5-pro", "gemini-1.5-flash"]
How it works:
- Request tries `gemini-2.0-flash-exp` first
- If it fails with a rate limit (429), retry with `gemini-1.5-pro`
- If that also fails, retry with `gemini-1.5-flash`
- All retries use the same provider (Gemini)
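The same-provider retry order above amounts to walking a list: the requested model first, then its configured fallbacks. A minimal sketch, assuming a `RateLimitError` type and a dict mirroring `[reliability.model_fallbacks]` (neither is ZeroClaw's actual code):

```python
class RateLimitError(Exception):
    """Raised when a provider answers with HTTP 429."""

# Mirrors the [reliability.model_fallbacks] table above.
MODEL_FALLBACKS = {
    "gemini-2.0-flash-exp": ["gemini-1.5-pro", "gemini-1.5-flash"],
}

def complete_with_model_fallback(call, provider, model):
    """Try the requested model, then each configured fallback,
    all on the same provider."""
    for candidate in [model, *MODEL_FALLBACKS.get(model, [])]:
        try:
            return call(provider, candidate)
        except RateLimitError:
            continue
    raise RateLimitError(f"all models exhausted on {provider}")
```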
Solution 3: Combined Provider + Model Fallback
For maximum reliability, combine both strategies:
# config.toml
default_provider = "google"
default_model = "gemini-2.0-flash-exp"
[reliability]
# Provider fallback chain
fallback_providers = ["anthropic", "openai-codex:codex-1"]
# Model fallback within each provider
[reliability.model_fallbacks]
"gemini-2.0-flash-exp" = ["gemini-1.5-pro"]
"claude-opus-4" = ["claude-sonnet-4"]
# Provider-specific models for fallbacks
[[providers.openai_codex]]
profile = "codex-1"
model = "gpt-4o-mini"
Fallback order:
1. `gemini-2.0-flash-exp` on Google
2. `gemini-1.5-pro` on Google (model fallback)
3. `claude-opus-4` on Anthropic (provider fallback)
4. `claude-sonnet-4` on Anthropic (model fallback)
5. `gpt-4o-mini` on OpenAI Codex (provider fallback with default model)
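The fallback order above can be derived by flattening the config: for each provider in the chain, try its default model, then that model's fallbacks, before moving to the next provider. A sketch of that expansion rule (the dicts mirror the config above; the function itself is illustrative, not ZeroClaw internals):

```python
# Mirrors [reliability.model_fallbacks] above.
MODEL_FALLBACKS = {
    "gemini-2.0-flash-exp": ["gemini-1.5-pro"],
    "claude-opus-4": ["claude-sonnet-4"],
}

# Each provider's default model (from default_model and [[providers.*]]).
PROVIDER_DEFAULT_MODELS = {
    "google": "gemini-2.0-flash-exp",
    "anthropic": "claude-opus-4",
    "openai-codex:codex-1": "gpt-4o-mini",
}

def attempt_order(primary, fallback_providers):
    """Flatten providers and their model fallbacks into one ordered list."""
    attempts = []
    for provider in [primary, *fallback_providers]:
        model = PROVIDER_DEFAULT_MODELS[provider]
        # Model fallbacks within a provider come before the next provider.
        for candidate in [model, *MODEL_FALLBACKS.get(model, [])]:
            attempts.append((provider, candidate))
    return attempts
```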
Retry Configuration
Fine-tune retry behavior:
[reliability]
provider_retries = 2 # Retry each provider 2 times before moving to the next
provider_backoff_ms = 500 # Initial wait between retries; grows exponentially
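The interaction of these two settings can be sketched as a retry loop with exponentially growing waits. The doubling factor and function shape are assumptions for illustration, not ZeroClaw's actual retry code:

```python
import time

def retry_with_backoff(call, retries=2, backoff_ms=500, sleep=time.sleep):
    """Call `call()` up to `retries` + 1 times, doubling the wait
    between attempts (500ms, 1000ms, 2000ms, ...)."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries; let the caller move to the next provider
            sleep(backoff_ms * (2 ** attempt) / 1000.0)
```

Injecting `sleep` makes the backoff schedule testable without real waiting.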
Real-World Example: Multi-Region Gemini Setup
# config.toml
default_provider = "google"
default_model = "gemini-2.0-flash-exp"
[reliability]
# Rotate through API keys (round-robin on 429)
api_keys = [
"sk-key-for-project-a",
"sk-key-for-project-b",
"sk-key-for-project-c"
]
# Model fallback within Gemini
[reliability.model_fallbacks]
"gemini-2.0-flash-exp" = [
"gemini-1.5-pro-latest",
"gemini-1.5-flash-8b"
]
# Provider fallback if all Gemini quota exhausted
fallback_providers = ["anthropic", "openai"]
What happens when quota runs out:
- Try `gemini-2.0-flash-exp` with `sk-key-for-project-a`
- 429 → rotate to `sk-key-for-project-b` and retry
- 429 → rotate to `sk-key-for-project-c` and retry
- Still failing → try `gemini-1.5-pro-latest` (model fallback)
- Still failing → try `gemini-1.5-flash-8b` (model fallback)
- Still failing → fall back to Anthropic Claude
- Still failing → fall back to OpenAI
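The escalation above nests three loops: API keys rotate innermost, then models, then providers. A sketch of that control flow, assuming keys rotate again for each fallback model and that fallback providers use their own configured key and default model (the names mirror the config; the logic is illustrative, not ZeroClaw internals):

```python
class RateLimitError(Exception):
    """Raised when a provider answers with HTTP 429."""

def complete(call, keys, models, fallback_providers):
    """Rotate keys innermost, then models, then fall back across providers."""
    for model in models:
        for key in keys:
            try:
                return call("google", model, key)
            except RateLimitError:
                continue  # rotate to the next key
    for provider in fallback_providers:
        try:
            # Fallback providers supply their own key and default model.
            return call(provider, None, None)
        except RateLimitError:
            continue
    raise RateLimitError("quota exhausted on every provider")
```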
Logging and Monitoring
When fallback occurs, ZeroClaw logs:
[INFO] Provider recovered (failover/retry)
provider="openai-codex"
model="gpt-4o-mini"
original_model="gemini-2.0-flash-exp"
requested_model="gemini-2.0-flash-exp"
attempt=1
Monitor these logs to:
- Detect when quotas are running low
- Identify which fallback paths are used most
- Optimize your provider and model configuration
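A small parser over these structured fields can feed that monitoring. The sketch below assumes each recovery event has been collapsed onto one line of `key="value"` pairs like the sample above; the parsing approach is an assumption, not a ZeroClaw tool:

```python
import re
from collections import Counter

# Matches key="value" pairs in the structured log fields shown above.
FIELD = re.compile(r'(\w+)="([^"]*)"')

def fallback_paths(log_lines):
    """Count (original_model, provider, model) fallback transitions."""
    paths = Counter()
    for line in log_lines:
        fields = dict(FIELD.findall(line))
        if "original_model" in fields and "model" in fields:
            key = (fields["original_model"], fields.get("provider"), fields["model"])
            paths[key] += 1
    return paths
```

The most common keys in the resulting counter are the fallback paths worth optimizing first.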
Best Practices
- Test your fallback chain - Ensure all providers in your chain have valid credentials
- Use model fallbacks first - It's cheaper to try different models from the same provider than to switch providers
- Set appropriate retry counts - Too many retries slow down responses; too few miss transient failures
- Monitor costs - Fallback models may have different pricing
- Keep provider-specific models updated - When adding new providers, configure their default models
See Also
- Config Reference - Full configuration schema
- Providers Reference - Supported providers and authentication
- Operations Runbook - Production deployment guide