Commit Graph

63 Commits

Author SHA1 Message Date
Argenis
c41009d29f
fix(cron): persist allowed_tools for agent jobs (#3993)
Persist allowed_tools in cron_jobs table, threading it through CLI add/update and cron_add/cron_update tool APIs. Add regression coverage for store, tool, and CLI roundtrip paths.

Fixups over original PR #3929: add allowed_tools to all_overdue_jobs SELECT (merge gap), resolve merge conflicts.

Closes #3920
Supersedes #3929
2026-03-24 15:26:29 +03:00
Giulio V
49d68e55f2
fix(cron): add startup catch-up and drop login shell flag (#3948)
* fix(cron): add startup catch-up and drop login shell flag

Problems:
1. When ZeroClaw started after downtime (late boot, daemon restart),
   overdue jobs were picked up via `due_jobs()` but limited by
   `max_tasks` per poll cycle — with many overdue jobs, catch-up
   could take many cycles.
2. Cron shell jobs used `sh -lc` (login shell), which loads the
   full user profile on every execution — slow and may cause
   unexpected side effects.

Fixes:
- Add `all_overdue_jobs()` store query without `max_tasks` limit
- Add `catch_up_overdue_jobs()` startup phase that runs ALL overdue
  jobs once before entering the normal polling loop
- Extract `build_cron_shell_command()` helper using `sh -c` (non-login)
- Add structured tracing for catch-up progress
- Add tests for all new functions

* feat(cron): make catch-up configurable via API and control panel

Add `catch_up_on_startup` boolean to `[cron]` config (default: true).
When enabled, the scheduler runs all overdue jobs at startup before
entering the normal polling loop. Users can toggle this from:

- The Cron page toggle switch in the control panel
- PATCH /api/cron/settings { "catch_up_on_startup": false }
- The `[cron]` section of the TOML config editor

Also adds GET /api/cron/settings endpoint to read cron subsystem
settings without parsing the full config.

* fix(config): add catch_up_on_startup to CronConfig test constructors

The CI Lint job failed because the `cron_config_serde_roundtrip` test
constructs CronConfig directly and was missing the new field.
2026-03-24 15:26:27 +03:00
Argenis
e556ad3d3e
fix: handle double-serialized schedule in cron_add and cron_update (#3860) (#3905)
When LLMs pass the schedule parameter as a JSON string instead of a JSON
object, serde fails with "invalid type: string, expected internally
tagged enum Schedule". Add a deserialize_maybe_stringified helper that
detects stringified JSON values and parses the inner string before
deserializing, providing backward compatibility for both object and
string representations.

Fixes #3860
2026-03-24 15:17:32 +03:00
Argenis
88693dda59
fix(cron): prevent one-shot jobs from re-executing indefinitely (#3886)
Handle Schedule::At jobs in reschedule_after_run by disabling them
instead of rescheduling to a past timestamp. Also add a fallback in
persist_job_result to disable one-shot jobs if removal fails.

Closes #3868
2026-03-24 15:17:31 +03:00
Argenis
6105cc4c3d
fix(lint): Box::pin crate::agent::run calls to satisfy large_futures (#3675)
Wrap all crate::agent::run() calls with Box::pin() across scheduler,
daemon, gateway tests, and main.rs to satisfy clippy::large_futures.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:17:11 +03:00
Argenis
1bea48dba1
feat(security): add Nevis IAM integration for SSO/MFA authentication (#3651)
* feat(security): add Nevis IAM integration for SSO/MFA authentication

Add NevisAuthProvider supporting OAuth2/OIDC token validation (local JWKS +
remote introspection), FIDO2/passkey/OTP MFA verification, session management,
and health checks. Add IamPolicy engine mapping Nevis roles to ZeroClaw tool
and workspace permissions with deny-by-default enforcement and audit logging.

Add NevisConfig and NevisRoleMappingConfig to config schema with client_secret
wired through SecretStore encrypt/decrypt. All features disabled by default.

Rebased on latest master to resolve merge conflicts in security/mod.rs (redact
function) and config/schema.rs (test section).

Original work by @rareba. Supersedes #3593.

Co-Authored-By: rareba <5985289+rareba@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt Box::pin calls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: rareba <5985289+rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:17:11 +03:00
Argenis
7844ab371c
style: cargo fmt Box::pin calls in cron scheduler (#3667)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:17:11 +03:00
Argenis
cc582f4d86
fix(lint): Box::pin large futures in cron scheduler and cron_run tool (#3666)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:17:11 +03:00
argenis de la rosa
6fcb64489b
feat(security): add capability-based tool access control
Add an optional `allowed_tools` parameter that restricts which tools are
available to the agent. When `Some(list)`, only tools whose name appears
in the list are retained; when `None`, all tools remain available
(backward compatible). This enables fine-grained capability control for
cron jobs, heartbeat tasks, and CLI invocations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:16:07 +03:00
simianastronaut
8e204b1f9b
fix(cron): add --agent flag to CLI cron commands to bypass shell security validation
The CLI `cron add` command always routed the second positional argument
through shell security policy validation, which blocked natural language
prompts like "Check server health: disk space, memory, CPU load". This
adds an `--agent` flag to `cron add`, `cron add-at`, `cron add-every`,
and `cron once` so that natural language prompts are correctly stored as
agent jobs without shell command validation.

Closes #3563

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:16:04 +03:00
Argenis
8c0b491f33
fix: wire Signal channel into scheduled announcement delivery (#3511)
Add SignalChannel import and match arm in deliver_announcement() so
cron jobs with delivery.channel = "signal" are handled instead of
rejected as unsupported.

Closes #3476
2026-03-24 15:16:00 +03:00
Asuta
348c0c37b7
feat(agent): 支持交互会话状态持久化与恢复 (#3421)
Co-authored-by: Argenis <theonlyhennygod@gmail.com>
2026-03-13 18:55:42 -04:00
Marcelo Correa
2e2c1da4fa
fix(cron): skip unparseable job rows instead of aborting the scheduler (#3405)
A single cron job with a malformed `next_run` timestamp in the database
was silently stopping all scheduled jobs. The `due_jobs` query matched
rows whose `next_run` was lexicographically past-due (including
non-RFC3339 values like "2026-03-12 03:11:13" which sort before valid
RFC3339 strings), then `map_cron_job_row` failed to parse the timestamp,
the `row?` propagation caused `due_jobs` to return `Err`, and the
scheduler marked itself as `error` and skipped every subsequent tick —
taking down all other healthy jobs with it.

The fix changes the row iteration in `due_jobs` to log a warning and
skip unparseable rows rather than aborting the entire result set. Valid
jobs continue to fire; the broken row is surfaced in the logs without
collateral damage to the scheduler.

Co-authored-by: ZeroClaw <zeroclaw@users.noreply.github.com>
Co-authored-by: Argenis <theonlyhennygod@gmail.com>
2026-03-13 18:17:08 -04:00
Alix-007
e5e3761020
fix(cron): support Matrix announce delivery (#3373)
* fix(cron): support Matrix announce delivery

* fix(cron): expose Matrix delivery in tool schemas
2026-03-13 15:16:10 -04:00
Argenis
e03dc4bfce
fix(security): unify cron shell validation across API/CLI/scheduler (#3270)
Centralize cron shell command validation so all entrypoints enforce the
same security policy (allowlist + risk gate + approval) before
persistence and execution.

Changes:
- Add validate_shell_command() and validate_shell_command_with_security()
  as the single validation gate for all cron shell paths
- Add add_shell_job_with_approval() and update_shell_job_with_approval()
  that validate before persisting
- Add add_once_validated() and add_once_at_validated() for one-shot jobs
- Make raw add_shell_job/add_job/add_once/add_once_at pub(crate) to
  prevent unvalidated writes from outside the cron module
- Route gateway API through validated creation path
- Route schedule tool through validated helpers (single validation)
- Route cron_add/cron_update tools through validated helpers
- Unify scheduler execution validation via validate_shell_command_with_security
- CLI update handler uses full validate_command_execution instead of
  just is_command_allowed
- Add focused tests for validation parity across entrypoints
- Standardize error format to "blocked by security policy: {reason}"

Closes #2741
Closes #2742

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:48:13 +00:00
Sid Jain
7b7e08dc21 fix(slack): align cron workspace dir and socket retry backoff 2026-03-11 04:32:56 -04:00
Sid Jain
526d63fd75 feat: add slack file reading capability to zeroclaw 2026-03-11 04:32:56 -04:00
Chummy
921132575d
test: add regression coverage for provider parser cron and telegram 2026-02-24 16:03:01 +08:00
Kevin Syong
ae3f348a15
fix(scheduler): include failure reason in job failure warning
- Return output string from 'execute_and_persist_job' alongside job id and success flag.
- Include failure reason in 'tracing::warn' when a scheduler job fails.
- Makes failed cron job errors visible in logs without inspecting the database.
2026-02-24 16:03:00 +08:00
Chummy
1b131b5256
fix: route heartbeat outputs to configured channels 2026-02-24 16:02:59 +08:00
Chummy
4a2503605d
test(cron): add shell one-shot regression coverage 2026-02-24 16:02:59 +08:00
reidliu41
d6283d2bab
fix(cron): set delete_after_run for one-shot shell jobs 2026-02-24 16:02:59 +08:00
Chummy
e5bc9514a4 security: close shell path-policy bypasses 2026-02-21 22:35:52 +08:00
Allen Huang
7d81715b60 fix(agent): skip interactive approval in daemon/cron context
Daemon heartbeat and cron tasks called agent::run() which hardcoded
channel_name as "cli" and always created an ApprovalManager, causing
[Y]es / [N]o / [A]lways stdin prompts on the unattended daemon terminal.

Add interactive parameter to agent::run(): CLI passes true (preserving
approval flow), daemon/cron pass false (no ApprovalManager, channel
marked as "daemon").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:52:44 +08:00
Aleksandr Prilipko
2af6a25ac2 fix: resolve all compilation, test, and fmt errors on main
- Remove duplicate `chat` method in reliable.rs (E0201)
- Fix `futures` → `futures_util` imports in agent.rs and loop_.rs (E0433)
- Gate PostgresMemory behind `memory-postgres` feature in cli.rs (E0433)
- Fix regex backreference in XML tool parser (unsupported by regex crate)
- Add missing `skills_prompt_mode` argument in test
- Apply rustfmt to files with formatting issues on main
2026-02-21 12:09:06 +08:00
Le Song
645515145e test(cron): add tests for job_type SQL reading and validation 2026-02-21 02:35:54 +08:00
Le Song
42cab231e6 test(crom): add tests for JobType::try_from to handle case-insensitive and invalid values 2026-02-21 02:35:54 +08:00
Le Song
b45afa15fd fix(cron): map job_type via FromSql and standardize persistence 2026-02-21 02:35:54 +08:00
Le Song
7faff05dae fix(cron): align JobType conversions: add JobType <-> &str conversion via From/TryFrom 2026-02-21 02:35:54 +08:00
Chummy
c611ffa43b fix(scheduler): harden idle health heartbeat behavior 2026-02-20 21:39:52 +08:00
Will Sarg
a9a35d50d1
fix(ci): restore containerized validation on main (#1096) 2026-02-20 07:48:58 -05:00
Chummy
b26bf262b8 fix(doctor): prevent false scheduler/channel unhealthy states 2026-02-20 19:35:53 +08:00
fettpl
c649ced585
fix(security): enforce cron agent autonomy and rate gates (#626) 2026-02-20 05:23:20 -05:00
Edvard
8b4607a1ef feat(cron): add cron update CLI subcommand for in-place job updates
Add Update variant to CronCommands in both main.rs and lib.rs, with
handler in cron/mod.rs that constructs a CronJobPatch and calls
update_job(). Includes security policy check for command changes.

Fixes from review feedback:
- --tz alone now correctly updates timezone (fetches existing schedule)
- --expression alone preserves existing timezone instead of clearing it
- All-None patch (no flags) now returns an error
- Output uses consistent emoji prefix

Tests exercise handle_command directly to cover schedule construction.

Closes #809

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:11:37 +08:00
Chummy
275d3e7791 style: apply rustfmt to async fs updates 2026-02-19 14:52:29 +08:00
Jayson Reis
b9af601943 chore: Remove blocking read strings 2026-02-19 14:52:29 +08:00
Vernon Stinebaker
d97866a640 feat(mattermost): add mention_only config for @-mention filtering
Add mention_only support for the Mattermost channel, matching the existing
Discord implementation. When enabled, the bot only processes messages that
contain an @-mention of the bot username, reducing noise in busy channels.

- Add mention_only field to MattermostConfig schema (Option<bool>, default false)
- Rename get_bot_user_id() to get_bot_identity() returning (user_id, username)
- Add contains_bot_mention_mm() with case-insensitive word-boundary matching
  and metadata.mentions array support
- Add normalize_mattermost_content() to strip @-mentions from processed text
- Wire mention_only through channel and cron factory constructors
- Add 23 new tests covering mention detection, stripping, case-insensitivity,
  word boundaries, metadata mentions, empty-after-strip, and disabled passthrough
2026-02-18 21:25:28 +08:00
ZeroClaw Contributor
c0a80ad656 feat(channel): add mention_only option for Telegram groups
Adds mention_only config option to Telegram channel, allowing the bot
to only respond to messages that @-mention the bot in group chats.
Direct messages are always processed regardless of this setting.

Behavior:
- When mention_only = true: Bot only responds to group messages containing @botname
- When mention_only = false (default): Bot responds to all allowed messages
- DM/private chats always work regardless of mention_only setting

Implementation:
- Fetch and cache bot username from Telegram API on startup
- Check for @botname mention in group messages
- Strip mention from message content before processing

Config example:
[channels.telegram]
bot_token = "your_token"
mention_only = true

Changes:
- src/config/schema.rs: Add mention_only to TelegramConfig
- src/channels/telegram.rs: Implement mention_only logic + 6 new tests
- src/channels/mod.rs: Update factory calls
- src/cron/scheduler.rs: Update constructor call
- src/onboard/wizard.rs: Update wizard config
- src/daemon/mod.rs: Update test config
- src/integrations/registry.rs: Update test config
- TESTING_TELEGRAM.md: Add mention_only test section
- CHANGELOG.md: Document feature

Risk: medium
Backward compatible: Yes (default: false)
2026-02-18 19:51:42 +08:00
Chummy
c70d9b181d test: stabilize cron shell output capture and gemini warmup noop 2026-02-18 19:26:07 +08:00
Chummy
1bfd50bce9 fix(mattermost): preserve threaded default and docs 2026-02-18 17:46:19 +08:00
Vernon Stinebaker
58120b1c69 feat(mattermost): add thread_replies config and typing indicator
Add two Mattermost channel enhancements:

1. thread_replies config option (default: false)
   - When false, replies go to the channel root instead of threading.
   - When true, replies thread on the original post.
   - Existing thread replies always stay in-thread regardless of setting.

2. Typing indicator (start_typing/stop_typing)
   - Implements the Channel trait's typing methods for Mattermost.
   - Fires POST /api/v4/users/me/typing every 4s in a background task.
   - Supports parent_id for threaded typing indicators.
   - Aborts cleanly on stop_typing via JoinHandle.

Updated all MattermostChannel::new call sites (start_channels, scheduler)
and added 9 unit tests covering thread routing and edge cases.
2026-02-18 17:46:19 +08:00
Xiangjun Ma
118cd53922 feat(channel): stream LLM responses to Telegram via draft message edits
Wire the existing provider-layer streaming infrastructure through the
channel trait and agent loop so Telegram users see tokens arrive
progressively via editMessageText, instead of waiting for the full
response.

Changes:
- Add StreamMode enum (off/partial/block) and draft_update_interval_ms
  to TelegramConfig (backward-compatible defaults: off, 1000ms)
- Add supports_draft_updates/send_draft/update_draft/finalize_draft to
  Channel trait with no-op defaults (zero impact on existing channels)
- Implement draft methods on TelegramChannel using sendMessage +
  editMessageText with rate limiting and Markdown fallback
- Add on_delta mpsc::Sender<String> parameter to run_tool_call_loop
  (None preserves existing behavior)
- Wire streaming in process_channel_message: when channel supports
  drafts, send initial draft, spawn updater task, finalize on completion

Edge cases handled:
- 4096-char limit: finalize draft and fall back to chunked send
- Broken Markdown: use no parse_mode during streaming, apply on finalize
- Edit failures: fall back to sending complete response as new message
- Rate limiting: configurable draft_update_interval_ms (default 1s)
2026-02-18 16:33:33 +08:00
Chummy
431287184b style(tests): apply rustfmt to brittle-test hardening changes 2026-02-18 14:17:58 +08:00
Alex Gorevski
45cdd25b3d fix(tests): harden brittle tests for cross-platform stability and refactoring resilience
## Problem

The test suite contained several categories of latent brittleness
identified in docs/testing-brittle-tests.md that would surface during
refactoring or cross-platform (Windows) CI execution:

1. Hardcoded Unix paths: \Path::new("/tmp")\ and \PathBuf::from("/tmp")\
   used as workspace directories in agent tests, which fail on Windows
   where /tmp does not exist.

2. Exact string match assertions: ~20 \ssert_eq!(response, "exact text")\
   assertions in agent unit and e2e tests that break on any mock wording
   change, even when the underlying orchestration behavior is correct.

3. Fragile error message string matching: \.contains("specific message")\
   assertions coupled to internal error wording rather than testing the
   error category or behavioral outcome.

## What Changed

### Hardcoded paths → platform-agnostic temp dirs (4 files, 7 locations)
- \src/agent/tests.rs\: Replaced all 4 instances of \Path::new("/tmp")\
  and \PathBuf::from("/tmp")\ with \std::env::temp_dir()\ in
  \make_memory()\, \uild_agent_with()\, \uild_agent_with_memory()\,
  and \uild_agent_with_config()\ helpers.
- \	ests/agent_e2e.rs\: Replaced all 3 instances in \make_memory()\,
  \uild_agent()\, and \uild_agent_xml()\ helpers.

### Exact string assertions → behavioral checks (2 files, ~20 locations)
- \src/agent/tests.rs\: Converted 10 \ssert_eq!(response, "...")\ to
  \ssert!(!response.is_empty(), "descriptive message")\ across tests for
  text pass-through, tool execution, tool failure recovery, XML dispatch,
  mixed text+tool responses, multi-tool batch, and run_single delegation.
- \	ests/agent_e2e.rs\: Converted 9 exact-match assertions to behavioral
  checks. Multi-turn test now uses \ssert_ne!(r1, r2)\ to verify
  sequential responses are distinct without coupling to exact wording.
- Provider error propagation test simplified to \ssert!(result.is_err())\
  without asserting on the error message string.

### Fragile error message assertions → structural checks (2 files)
- \src/tools/git_operations.rs\: Replaced fragile OR-branch string match
  (\contains("git repository") || contains("Git command failed")\) with
  structural assertions: checks \!result.success\, error is non-empty,
  and error does NOT mention autonomy/read-only (verifying the failure
  is git-related, not permission-related).
- \src/cron/scheduler.rs\: Replaced \contains("agent job failed:")\ with
  \!success\ and \!output.is_empty()\ checks that verify failure behavior
  without coupling to exact log format.

## What Was NOT Changed (and why)
- \src/agent/loop_.rs\ parser tests: Exact string assertions are the
  contract for XML tool call parsing — the exact output IS the spec.
- \src/providers/reliable.rs\: Error message assertions test the error
  format contract (provider/model attribution in failure messages).
- \src/service/mod.rs\: Already platform-gated with \#[cfg]\; XML escape
  test is a formatting contract where exact match is appropriate.
- \src/config/schema.rs\: TOML test strings use /tmp as data values for
  deserialization tests, not filesystem access; HOME tests already use
  \std::env::temp_dir()\.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-18 14:17:58 +08:00
Alex Gorevski
21c5f58363 perf(cron): wrap record_run INSERT+DELETE in explicit transaction
Problem:
In record_run(), an INSERT into cron_runs followed by a pruning DELETE
ran as separate implicit transactions. If the INSERT succeeded but the
DELETE failed (e.g., due to disk pressure or lock contention), the run
table would grow unboundedly since the pruning step was lost while the
new row persisted.

Fix:
Wrap both statements in an explicit transaction using
conn.unchecked_transaction(). If either statement fails, the entire
transaction is rolled back, maintaining the invariant that the run
history stays bounded by max_run_history.

Ref: zeroclaw-labs/zeroclaw#710 (Item 5)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-18 14:07:31 +08:00
Alex Gorevski
9967eeb954 perf(cron): add composite index on cron_runs(job_id, started_at)
Problem:
The pruning query in record_run uses WHERE job_id = ?1 with
ORDER BY started_at DESC, but only single-column indexes exist
for job_id and started_at separately. SQLite must scan one index
and then sort or scan the other, which is suboptimal for the
combined filter + sort pattern used during pruning.

Fix:
Add a composite index CREATE INDEX IF NOT EXISTS
idx_cron_runs_job_started ON cron_runs(job_id, started_at).
This lets SQLite satisfy the WHERE job_id = ?1 ORDER BY
started_at DESC subquery in a single index scan without a
separate sort step. The existing single-column indexes are
retained for other queries that filter on only one column.

Ref: zeroclaw-labs/zeroclaw#710 (Item 7)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-18 14:06:59 +08:00
fettpl
7de052c7d2 fix(cron): add timeout and bounded execution for due jobs 2026-02-18 12:55:21 +08:00
Alex Gorevski
5f5cb27690
fix(cron): handle ALTER TABLE race condition in schema migration
Problem: add_column_if_missing() checks PRAGMA table_info for column existence, then issues ALTER TABLE ADD COLUMN if not found. When two concurrent processes both pass the check before either executes the ALTER, the second process fails with a 'duplicate column name' error.

Fix: Catch the 'duplicate column name' SQLite error after the ALTER TABLE and treat it as a benign no-op. Also explicitly drop statement/rows handles before ALTER to release locks.

Ref: #710 (Item 8)
2026-02-17 23:50:08 -05:00
Vernon Stinebaker
7e3f5ff497 feat(channels): add Mattermost integration for sovereign communication 2026-02-18 00:19:20 +08:00
Chummy
cd0dd13476 fix(channels): complete SendMessage migration after rebase 2026-02-17 23:28:08 +08:00