- Convert before deleting in action_write, action_update_block,
and write_single_cell to prevent data loss if markdown conversion fails
- Separate create POST from verification retry loop in action_create
to prevent duplicate document creation on retry
- Add resolve_doc_token to upload_image and upload_file so wiki
node_token resolution works for upload actions
- Add SSRF protection to download_media: validate URL scheme (http/https
only), block local/private hosts via existing url_validation module
- Guard empty credentials in mod.rs: skip FeishuDocTool registration
when app_id or app_secret is empty or whitespace-only
(cherry picked from commit feb1d46f41)
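The convert-before-delete ordering from the first bullet can be illustrated with a minimal sketch. Everything here (`Block`, `Doc`, `convert_markdown`, `write_doc`) is a hypothetical stand-in, not the actual FeishuDocTool code; the point is only that the fallible conversion runs before any existing content is removed.

```rust
/// Hypothetical stand-in for a converted document block.
#[derive(Debug, PartialEq)]
struct Block(String);

/// Hypothetical in-memory document; the real tool talks to a remote API.
struct Doc {
    blocks: Vec<Block>,
}

/// Hypothetical converter: fails on blank input, otherwise yields one block per non-empty line.
fn convert_markdown(md: &str) -> Result<Vec<Block>, String> {
    if md.trim().is_empty() {
        return Err("markdown conversion produced no blocks".to_string());
    }
    Ok(md
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| Block(line.to_string()))
        .collect())
}

/// Convert first, delete second: if conversion fails, `?` returns early
/// and the existing blocks are never touched.
fn write_doc(doc: &mut Doc, md: &str) -> Result<(), String> {
    let new_blocks = convert_markdown(md)?;
    doc.blocks.clear();
    doc.blocks.extend(new_blocks);
    Ok(())
}
```

With the opposite order (clear, then convert), a conversion error would leave the document empty.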
Summary
- Problem: Agent cannot read DOCX files — file_read returns garbled binary/XML, making Word documents inaccessible to the
agent
- Why it matters: DOCX is the most common business document format; without native extraction, users must manually convert
files, breaking autonomous workflows
- What changed: Added docx_read tool using zip (existing) + quick-xml (new) to extract plain text from DOCX Office Open XML
- What did not change: No changes to file_read, agent loop, security policy, config schema, or any existing tool behavior
Label Snapshot (required)
- Risk label: risk: low
- Size label: size: S
- Scope labels: tool
- Module labels: tool: docx_read
- If any auto-label is incorrect: N/A
Change Metadata
- Change type: feature
- Primary scope: tool
Linked Issue
- Closes #(issue number)
Validation Evidence (required)
cargo fmt --all -- --check # pass
cargo clippy --all-targets -- -D warnings # pass (zero new warnings)
cargo test docx_read # 14/14 passed
- Evidence provided: test results, manual verification with zeroclaw agent -m against real DOCX file
Security Impact (required)
- New permissions/capabilities? No (mirrors existing pdf_read security model exactly)
- New external network calls? No
- Secrets/tokens handling changed? No
- File system access scope changed? No
Privacy and Data Hygiene (required)
- Data-hygiene status: pass
- Redaction/anonymization notes: Test fixtures use neutral content ("Hello DOCX", "First", "Second")
- Neutral wording confirmation: Yes
Compatibility / Migration
- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No
i18n Follow-Through
- i18n follow-through triggered? No (no docs or user-facing wording changes)
Human Verification (required)
- Verified scenarios: zeroclaw agent -m "read the file test-test.docx and output the content" — model selected docx_read,
extracted text correctly
- Edge cases checked: invalid ZIP, missing word/document.xml, symlink escape, path traversal, rate limiting, truncation
- What was not verified: encrypted DOCX (out of scope), extremely large files (>50MB)
Side Effects / Blast Radius (required)
- Affected subsystems/workflows: Tool registry only — one new tool added
- Potential unintended effects: None — additive only, no existing behavior changed
- Guardrails/monitoring: Tool follows the same security chain as pdf_read
Rollback Plan (required)
- Fast rollback command/path: git revert <commit>
- Feature flags or config toggles: None needed (always-on, like pdf_read)
- Observable failure symptoms: docx_read tool missing from tool list
Risks and Mitigations
- Risk: the new quick-xml dependency adds to compile time
- Mitigation: quick-xml is lightweight pure Rust (~15K LOC), widely used (100M+ downloads), and will be shared when
XLSX/PPTX tools are added later
- report api_key_configured via provider credential resolution (env + overrides)
- set agent.compact_context default to true for new configs
- align docs and tests with the new default
Refs: #1983
Refs: #1984
Context: #1358
Co-authored-by: Argenis <144828210+theonlyhennygod@users.noreply.github.com>
Compute api_key_configured through provider credential resolution so env-variable credentials are reported correctly for scenarios and delegate agents.
Closes #1983
Fixes all 4 issues from CodeRabbit review:
1. Race condition in spawn: replaced separate running_count() check +
insert() with atomic try_insert(session, max) that holds the write
lock for both the count check and insertion.
2. UTF-8 byte slice panic in subagent_manage: output truncation now
uses char_indices().nth(500) to find a safe byte boundary.
3. UTF-8 byte slice panic in truncate_task: now uses chars().count()
for length check and char_indices().nth() for safe slicing.
Added truncate_task_multibyte_safe test with emoji input.
4. cast_unsigned() replaced with 'as u64' — standard Rust cast for
duration milliseconds.
Test count: 57 (56 + 1 new multibyte safety test).
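The char-boundary fix in items 2 and 3 can be sketched as follows. `truncate_chars` is a hypothetical helper name, not the function in the patch; it shows how `char_indices().nth(n)` yields a byte offset that is always safe to slice at.

```rust
/// Hypothetical helper: keep at most `max_chars` characters of `s`.
/// `char_indices().nth(max_chars)` returns the byte offset where the
/// character after the kept ones begins, which is always a valid UTF-8
/// boundary, so the slice cannot panic on multibyte input.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than or exactly `max_chars` characters: nothing to cut.
        None => s,
    }
}
```

A naive `&s[..max_chars]` counts bytes rather than characters and panics when the cut falls inside a multibyte character such as an emoji.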
Improve docstring coverage to meet the 80% threshold required
by CI. Adds //! module docs and /// item docs to all public
types and functions in the subagent tool modules.
Add background sub-agent orchestration tools that extend the existing
delegate tool with async execution, session tracking, and lifecycle
management.
New tools:
- subagent_spawn: Spawn delegate agents in background via tokio::spawn,
returns session_id immediately. Respects security policy, depth limits,
rate limits, and configurable concurrent session cap.
- subagent_list: List running/completed/failed/killed sessions with
status filtering. Read-only, allowed in all autonomy modes.
- subagent_manage: Kill running sessions via CancellationToken or
query status with partial output. Enforces Act policy for kill.
Shared state:
- SubAgentRegistry: Thread-safe session store using
Arc<parking_lot::RwLock<HashMap>> with lazy cleanup of sessions
older than 1 hour. Tracks session metadata, status, timing, and
results.
Test coverage: 56 tests across all 4 modules covering happy paths,
error handling, security enforcement, concurrency, parameter
validation, and edge cases.
No new dependencies added. No existing tests broken.
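A minimal sketch of a session registry with a concurrent session cap, under stated assumptions: it uses `std::sync::RwLock` instead of `parking_lot::RwLock`, stores only a status per session, and the `Registry`, `Status`, and `mark_completed` names are hypothetical. The count check and the insertion happen under a single write lock, so two concurrent spawns cannot both pass the check.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical session status; the real registry tracks more metadata.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Running,
    Completed,
}

/// Hypothetical simplified registry keyed by session id.
#[derive(Clone, Default)]
struct Registry {
    sessions: Arc<RwLock<HashMap<String, Status>>>,
}

impl Registry {
    /// Insert a running session only if fewer than `max_running` are active.
    /// The write lock is held across both the count and the insert.
    fn try_insert(&self, id: &str, max_running: usize) -> bool {
        let mut map = self.sessions.write().unwrap();
        let running = map.values().filter(|s| **s == Status::Running).count();
        if running >= max_running {
            return false;
        }
        map.insert(id.to_string(), Status::Running);
        true
    }

    /// Mark a session as completed, freeing a slot under the cap.
    fn mark_completed(&self, id: &str) {
        if let Some(status) = self.sessions.write().unwrap().get_mut(id) {
            *status = Status::Completed;
        }
    }
}
```

Checking the count under a read lock and inserting under a separate write lock would leave a window in which several callers see the same count and all proceed.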
Ports remaining changes from feat/unify-web-fetch-providers that were
not yet integrated into dev:
- config/schema.rs: add `user_agent` field (default "ZeroClaw/1.0") to
HttpRequestConfig, WebFetchConfig, and WebSearchConfig, with a shared
default_user_agent() helper. Field is serde-default so existing configs
remain backward compatible.
- tools/http_request.rs: accept user_agent in constructor; pass it to
reqwest::Client via .user_agent() replacing the implicit default.
- tools/web_fetch.rs: accept user_agent in constructor; replace hardcoded
"ZeroClaw/0.1 (web_fetch)" in build_http_client with the configured value.
- tools/web_search_tool.rs: accept user_agent in constructor; replace
hardcoded Chrome UA string in search_duckduckgo and add .user_agent()
to the Brave and Firecrawl client builders.
- tools/mod.rs: wire user_agent from each config struct into the
corresponding tool constructor (HttpRequestTool, WebFetchTool,
WebSearchTool).
- onboard/wizard.rs: add setup_web_tools() as wizard Step 6 "Web &
Internet Tools" (total steps bumped from 9 to 10). Configures
WebSearchConfig, WebFetchConfig, and HttpRequestConfig interactively
with provider selection and optional API key/URL prompts. Step 5
setup_tool_mode() http_request and web_search outputs are now discarded
(_, _) since step 6 owns that configuration. Uses dev's generic
api_key/api_url schema fields unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit fb83da8db021903cf5844852bdb67b9b259941d7)
`supports_vision` is currently hardcoded per-provider. The same Ollama instance can run `llava` (vision) or
`codellama` (no vision), but the code fixes vision support at the provider level with no user override.
This adds a top-level `model_support_vision: Option<bool>` config key — tri-state:
- **Unset (default):** provider's built-in value, zero behavior change
- **`true`:** force vision on (e.g. Ollama + llava)
- **`false`:** force vision off
Follows the exact same pattern as `reasoning_enabled`. Override is applied at the wrapper layer (`ReliableProvider` /
`RouterProvider`) — no concrete provider code is touched.
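The tri-state override at the wrapper layer might look roughly like the sketch below. The `Provider` trait shape, the `BaseProvider` type, and the generic wrapper are assumptions made for illustration; only the `vision_override`, `with_vision_override()`, and `supports_vision()` names come from this PR.

```rust
/// Assumed minimal trait; the real provider trait has many more methods.
trait Provider {
    fn supports_vision(&self) -> bool;
}

/// Hypothetical concrete provider with a hardcoded capability.
struct BaseProvider {
    vision: bool,
}

impl Provider for BaseProvider {
    fn supports_vision(&self) -> bool {
        self.vision
    }
}

/// Wrapper that optionally overrides the inner provider's answer.
struct ReliableProvider<P: Provider> {
    inner: P,
    vision_override: Option<bool>,
}

impl<P: Provider> ReliableProvider<P> {
    fn new(inner: P) -> Self {
        Self { inner, vision_override: None }
    }

    /// Builder-style setter: `None` keeps the provider's built-in value.
    fn with_vision_override(mut self, value: Option<bool>) -> Self {
        self.vision_override = value;
        self
    }
}

impl<P: Provider> Provider for ReliableProvider<P> {
    fn supports_vision(&self) -> bool {
        // Some(v) wins; None defers to the wrapped provider.
        self.vision_override
            .unwrap_or_else(|| self.inner.supports_vision())
    }
}
```

Because the decision lives in the wrapper, the concrete provider stays untouched and the unset case passes through unchanged.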
## Changes
**Config surface:**
- Top-level `model_support_vision` field in `Config` struct with `#[serde(default)]`
- Env override: `ZEROCLAW_MODEL_SUPPORT_VISION` / `MODEL_SUPPORT_VISION`
**Provider wrappers (core logic):**
- `ReliableProvider`: `vision_override` field + `with_vision_override()` builder + `supports_vision()` override
- `RouterProvider`: same pattern
**Wiring (1-line each):**
- `ProviderRuntimeOptions` struct + factory functions
- 5 construction sites: `loop_.rs`, `channels/mod.rs`, `gateway/mod.rs`, `tools/mod.rs`, `onboard/wizard.rs`
**Docs (i18n parity):**
- `config-reference.md` — Core Keys table
- `providers-reference.md` — new "Ollama Vision Override" section
- Vietnamese sync: `docs/i18n/vi/` + `docs/vi/` (4 files)
## Non-goals
- Does not change any concrete provider implementation
- Does not auto-detect model vision capability
## Test plan
- [x] `cargo fmt --all -- --check`
- [x] `cargo clippy --all-targets -- -D warnings` (no new errors)
- [x] 5 new tests passing:
- `model_support_vision_deserializes` — TOML parse + default None
- `env_override_model_support_vision` — env var override + invalid value ignored
- `vision_override_forces_true` — ReliableProvider override
- `vision_override_forces_false` — ReliableProvider override
- `vision_override_none_defers_to_provider` — passthrough behavior
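The "invalid value ignored" behavior exercised by `env_override_model_support_vision` can be sketched as below. Both function names are hypothetical, and the accepted spellings (`true`/`false`/`1`/`0`, case-insensitive) are an assumption; the PR does not state which values the real parser accepts.

```rust
/// Hypothetical parser: returns None for a missing or unrecognized value.
/// The accepted spellings are an assumption, not taken from the patch.
fn parse_bool_override(raw: Option<&str>) -> Option<bool> {
    match raw?.trim().to_ascii_lowercase().as_str() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}

/// Apply the env value on top of the configured one; an invalid or
/// absent env value leaves the configured value as it was.
fn apply_env_override(configured: Option<bool>, raw: Option<&str>) -> Option<bool> {
    parse_bool_override(raw).or(configured)
}
```

In real wiring the `raw` argument would come from something like `std::env::var("ZEROCLAW_MODEL_SUPPORT_VISION").ok()`, passed as `raw.as_deref()`.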
## Risk and Rollback
- **Risk:** Low. `None` default = zero behavior change for existing users.
- **Rollback:** Revert commit. Field is `#[serde(default)]` so old configs without it will deserialize fine.
(cherry picked from commit a1b8dee785)
Extract shared init logic (pragmas, schema creation, agent registration)
into IpcDb::init(), eliminating ~45 lines of duplication between open()
and open_with_id(). Extract SQL strings into PRAGMA_SQL and SCHEMA_SQL
constants for single source of truth. Remove unused (i64, Value) tuple
in AgentsInboxTool by collecting directly into Vec<Value>.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>