- Two-tier response cache: in-memory LRU (hot) + SQLite (warm) with TTL-aware eviction
- Wire response cache into agent turn loop (temp == 0.0, text-only responses only)
- Parse Anthropic `cache_creation_input_tokens`/`cache_read_input_tokens`
- Parse OpenAI `prompt_tokens_details.cached_tokens`
- Add `cached_input_tokens` to `TokenUsage`, `prompt_caching` to `ProviderCapabilities`
- Add `CacheHit`/`CacheMiss` observer events with Prometheus counters
- Add `response_cache_hot_entries` config field (default: 256)
Files: `agent.rs`, `classifier.rs`, `dispatcher.rs`, `loop_.rs`, `memory_loader.rs`, `mod.rs`, `prompt.rs`, `tests.rs`
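The two-tier lookup described above (hot LRU in front of a warm persistent store, with TTL checks on read and promotion of warm hits) can be sketched roughly as below. This is a minimal, self-contained illustration, not the actual implementation: all names (`ResponseCache`, `Entry`, `put_hot`) are hypothetical, and a `HashMap` stands in for the SQLite warm tier so the example compiles without external crates.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

struct Entry {
    response: String,
    expires_at: Instant,
}

struct ResponseCache {
    hot: HashMap<String, Entry>,  // in-memory hot tier
    hot_order: VecDeque<String>,  // LRU order; front = least recently used
    hot_capacity: usize,          // cf. response_cache_hot_entries (default 256)
    warm: HashMap<String, Entry>, // stand-in for the SQLite warm tier
    ttl: Duration,
}

impl ResponseCache {
    fn new(hot_capacity: usize, ttl: Duration) -> Self {
        Self {
            hot: HashMap::new(),
            hot_order: VecDeque::new(),
            hot_capacity,
            warm: HashMap::new(),
            ttl,
        }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let now = Instant::now();
        if let Some(e) = self.hot.get(key) {
            if e.expires_at > now {
                // Hot hit: refresh LRU position.
                self.hot_order.retain(|k| k != key);
                self.hot_order.push_back(key.to_string());
                return Some(e.response.clone());
            }
            // TTL-aware eviction: drop the expired hot entry.
            self.hot.remove(key);
            self.hot_order.retain(|k| k != key);
        }
        if let Some(e) = self.warm.get(key) {
            if e.expires_at > now {
                let (resp, exp) = (e.response.clone(), e.expires_at);
                // Warm hit: promote into the hot tier.
                self.put_hot(key.to_string(), resp.clone(), exp);
                return Some(resp);
            }
            self.warm.remove(key);
        }
        None
    }

    fn put(&mut self, key: String, response: String) {
        let expires_at = Instant::now() + self.ttl;
        self.warm
            .insert(key.clone(), Entry { response: response.clone(), expires_at });
        self.put_hot(key, response, expires_at);
    }

    fn put_hot(&mut self, key: String, response: String, expires_at: Instant) {
        if self.hot.len() >= self.hot_capacity && !self.hot.contains_key(&key) {
            // Evict the least recently used hot entry; it survives in the warm tier.
            if let Some(old) = self.hot_order.pop_front() {
                self.hot.remove(&old);
            }
        }
        self.hot_order.retain(|k| k != &key);
        self.hot_order.push_back(key.clone());
        self.hot.insert(key, Entry { response, expires_at });
    }
}

fn main() {
    let mut cache = ResponseCache::new(2, Duration::from_secs(60));
    cache.put("prompt-a".into(), "answer-a".into());
    cache.put("prompt-b".into(), "answer-b".into());
    cache.put("prompt-c".into(), "answer-c".into()); // evicts "prompt-a" from hot
    assert_eq!(cache.get("prompt-a").as_deref(), Some("answer-a")); // warm hit, promoted
    assert_eq!(cache.get("prompt-missing"), None);
}
```

Keeping the warm copy on hot-tier eviction is what makes the hot LRU purely a latency optimization: capacity pressure never loses a cached response, only the TTL does.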