When the LLM hallucinates an invalid model ID through the
model_routing_config tool's set_default action, the invalid ID gets
persisted to config.toml. The channel hot-reload then picks it up, and
every subsequent message fails with a non-retryable 404, permanently
killing the connection with no recovery path for the user.
The fix adds two layers of defense:
1. Tool probe-and-rollback: after saving the new model, send a minimal
chat request to verify the model is accessible. If the API returns a
non-retryable error (404, auth failure, etc.), automatically restore
the previous config and return a failure notice to the LLM.
2. Channel safety net: in maybe_apply_runtime_config_update, reject
config reloads when warmup fails with a non-retryable error instead
of applying the broken config anyway.
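A minimal Python sketch of the two layers. Apart from
maybe_apply_runtime_config_update, every name here (set_default_model,
probe, warmup, the dict-based config) is illustrative; the real
implementation's error types and reload plumbing will differ.

```python
# Status codes treated as non-retryable: retrying cannot succeed, so the
# config change itself must be rejected. Illustrative set, not exhaustive.
NON_RETRYABLE = {401, 403, 404}

def set_default_model(config: dict, new_model: str, probe) -> tuple[dict, str]:
    """Layer 1 (probe-and-rollback): apply new_model, verify it with a
    minimal request, and restore the previous config on non-retryable
    failure. `probe(model)` stands in for the minimal chat request and
    returns an HTTP status code."""
    previous = dict(config)            # snapshot for rollback
    config["default_model"] = new_model
    status = probe(new_model)
    if status in NON_RETRYABLE:
        config.clear()
        config.update(previous)        # restore the last known-good config
        return config, f"rollback: model {new_model!r} failed with {status}"
    return config, "ok"

def maybe_apply_runtime_config_update(current: dict, candidate: dict, warmup) -> dict:
    """Layer 2 (channel safety net): if warmup of the reloaded config
    fails non-retryably, keep the last working config instead of
    applying the broken one."""
    if warmup(candidate) in NON_RETRYABLE:
        return current
    return candidate
```

Layer 2 matters even with layer 1 in place: it covers configs that become
invalid through any other path (manual edits, races with the probe), so the
channel never commits to a config it cannot serve.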
Co-authored-by: Christian Pojoni <christian.pojoni@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>