- Two-tier response cache: in-memory LRU (hot) + SQLite (warm) with TTL-aware eviction
- Wire response cache into agent turn loop (temp==0.0, text-only responses only)
- Parse Anthropic `cache_creation_input_tokens`/`cache_read_input_tokens`
- Parse OpenAI `prompt_tokens_details.cached_tokens`
- Add `cached_input_tokens` to `TokenUsage`, `prompt_caching` to `ProviderCapabilities`
- Add `CacheHit`/`CacheMiss` observer events with Prometheus counters
- Add `response_cache_hot_entries` config field (default: 256)
Test support modules:

- `assertions.rs`
- `helpers.rs`
- `mock_channel.rs`
- `mock_provider.rs`
- `mock_tools.rs`
- `mod.rs`
- `trace.rs`