- Two-tier response cache: in-memory LRU (hot) + SQLite (warm) with TTL-aware eviction
- Wire response cache into agent turn loop (`temp == 0.0`, text-only responses only)
- Parse Anthropic `cache_creation_input_tokens`/`cache_read_input_tokens`
- Parse OpenAI `prompt_tokens_details.cached_tokens`
- Add `cached_input_tokens` to `TokenUsage`, `prompt_caching` to `ProviderCapabilities`
- Add `CacheHit`/`CacheMiss` observer events with Prometheus counters
- Add `response_cache_hot_entries` config field (default: 256)
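The two-tier lookup path described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the warm tier is modeled here as an in-process map standing in for SQLite, and the names (`ResponseCache`, `get`, `insert`, `Entry`) are hypothetical. It shows the intended flow: check the hot LRU first, fall back to the warm tier, promote live warm entries back into the hot tier, and drop entries whose TTL has lapsed.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

struct Entry {
    value: String,
    expires_at: Instant, // TTL-aware eviction: entries past this point are dropped
}

struct ResponseCache {
    hot_cap: usize,               // cf. response_cache_hot_entries (default: 256)
    hot: HashMap<String, Entry>,  // in-memory LRU (hot tier)
    hot_order: VecDeque<String>,  // LRU order: front = least recently used
    warm: HashMap<String, Entry>, // stand-in for the SQLite warm tier
}

impl ResponseCache {
    fn new(hot_cap: usize) -> Self {
        Self { hot_cap, hot: HashMap::new(), hot_order: VecDeque::new(), warm: HashMap::new() }
    }

    fn touch(&mut self, key: &str) {
        self.hot_order.retain(|k| k != key);
        self.hot_order.push_back(key.to_string());
    }

    fn insert(&mut self, key: &str, value: &str, ttl: Duration) {
        // Evict the least-recently-used hot entry past capacity; demote it to warm.
        if !self.hot.contains_key(key) && self.hot.len() >= self.hot_cap {
            if let Some(old) = self.hot_order.pop_front() {
                if let Some(e) = self.hot.remove(&old) {
                    self.warm.insert(old, e);
                }
            }
        }
        self.hot.insert(key.to_string(), Entry { value: value.to_string(), expires_at: Instant::now() + ttl });
        self.touch(key);
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let now = Instant::now();
        // Hot tier first; discard the entry if its TTL has lapsed.
        if let Some(e) = self.hot.get(key) {
            if e.expires_at > now {
                let v = e.value.clone();
                self.touch(key);
                return Some(v);
            }
            self.hot.remove(key);
            self.hot_order.retain(|k| k != key);
        }
        // Warm tier: promote a still-live entry back into the hot tier.
        if let Some(e) = self.warm.remove(key) {
            if e.expires_at > now {
                let v = e.value.clone();
                self.insert(key, &v, e.expires_at - now);
                return Some(v);
            }
        }
        None
    }
}

fn main() {
    let mut cache = ResponseCache::new(2);
    cache.insert("a", "alpha", Duration::from_secs(60));
    cache.insert("b", "beta", Duration::from_secs(60));
    cache.insert("c", "gamma", Duration::from_secs(60)); // evicts "a" to warm
    assert_eq!(cache.get("a").as_deref(), Some("alpha")); // promoted from warm
    assert!(cache.get("absent").is_none());
    println!("hot lookup, warm promotion, and eviction ok");
}
```

Promoting on warm-tier hits keeps frequently reused responses in the fast path while the capacity bound (`response_cache_hot_entries`) caps memory use; the real SQLite tier would persist demoted entries across process restarts.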
Test directory contents:

- component/
- fixtures/
- integration/
- live/
- manual/
- support/
- system/
- test_component.rs
- test_integration.rs
- test_live.rs
- test_system.rs
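The cached-token accounting above (folding provider-reported cache usage into `cached_input_tokens` on `TokenUsage`) can be sketched like this. The provider-side field names mirror the Anthropic and OpenAI usage payloads named in the change list; the struct shapes and conversion functions are illustrative stand-ins for the deserialized JSON, not the project's actual types, and `cache_creation_input_tokens` would be parsed the same way.

```rust
// Unified usage record; cached_input_tokens is the field added by this change.
#[derive(Debug, Default, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    cached_input_tokens: u64,
}

// Subset of Anthropic's `usage` object (hypothetical deserialized shape).
struct AnthropicUsage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: Option<u64>,
}

// Subset of OpenAI's `usage` object (hypothetical deserialized shape).
struct OpenAiPromptTokensDetails {
    cached_tokens: Option<u64>,
}
struct OpenAiUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
    prompt_tokens_details: Option<OpenAiPromptTokensDetails>,
}

fn from_anthropic(u: &AnthropicUsage) -> TokenUsage {
    TokenUsage {
        input_tokens: u.input_tokens,
        output_tokens: u.output_tokens,
        // Tokens served from Anthropic's prompt cache.
        cached_input_tokens: u.cache_read_input_tokens.unwrap_or(0),
    }
}

fn from_openai(u: &OpenAiUsage) -> TokenUsage {
    TokenUsage {
        input_tokens: u.prompt_tokens,
        output_tokens: u.completion_tokens,
        // Nested detail object is optional, as is the cached_tokens field.
        cached_input_tokens: u
            .prompt_tokens_details
            .as_ref()
            .and_then(|d| d.cached_tokens)
            .unwrap_or(0),
    }
}

fn main() {
    let a = from_anthropic(&AnthropicUsage {
        input_tokens: 1200,
        output_tokens: 80,
        cache_read_input_tokens: Some(1000),
    });
    assert_eq!(a.cached_input_tokens, 1000);

    let o = from_openai(&OpenAiUsage {
        prompt_tokens: 900,
        completion_tokens: 50,
        prompt_tokens_details: Some(OpenAiPromptTokensDetails { cached_tokens: Some(640) }),
    });
    assert_eq!(o.cached_input_tokens, 640);
    println!("token usage normalization ok");
}
```

Defaulting the optional fields to zero keeps the unified type total: callers (and the Prometheus counters) never have to distinguish "provider reported no cache usage" from "provider predates the field".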