diff --git a/.github/workflows/ci-build-fast.yml b/.github/workflows/ci-build-fast.yml index 4af444802..da1486483 100644 --- a/.github/workflows/ci-build-fast.yml +++ b/.github/workflows/ci-build-fast.yml @@ -2,8 +2,8 @@ name: CI Build (Fast) # Optional accelerated release build using cargo-slicer. # Runs alongside the normal Build (Smoke) job — does not gate merges. -# Stubs 2,059 unreachable library functions to skip LLVM codegen, -# saving ~29% wall time on 2-vCPU runners. +# Stubs unreachable library functions to skip LLVM codegen, +# saving ~27% wall time on a 48-core server (more on fewer cores). # # See docs/cargo-slicer-speedup.md for benchmarks and details. @@ -76,8 +76,10 @@ jobs: - name: Pre-analyze workspace run: cargo-slicer pre-analyze - - name: Build release binary (virtual slicing) + - name: Build release binary (virtual slicing + MIR-precise) run: | - CARGO_SLICER_VIRTUAL=1 CARGO_SLICER_CODEGEN_FILTER=1 \ + CARGO_SLICER_MIR_PRECISE=1 \ + CARGO_SLICER_WORKSPACE_CRATES=zeroclaw,zeroclaw_robot_kit \ + CARGO_SLICER_VIRTUAL=1 CARGO_SLICER_CODEGEN_FILTER=1 \ RUSTC_WRAPPER=$(which cargo_slicer_dispatch) \ cargo +nightly build --release --verbose diff --git a/docs/cargo-slicer-speedup.md b/docs/cargo-slicer-speedup.md index 687d50591..42d1fb3e4 100644 --- a/docs/cargo-slicer-speedup.md +++ b/docs/cargo-slicer-speedup.md @@ -1,16 +1,16 @@ # Faster Builds with cargo-slicer -[cargo-slicer](https://github.com/nickel-org/cargo-slicer) is a `RUSTC_WRAPPER` that stubs unreachable library functions at the MIR level, skipping LLVM codegen for code the final binary never calls. It identified **2,059 unreachable functions** in ZeroClaw's workspace crates. +[cargo-slicer](https://github.com/nickel-org/cargo-slicer) is a `RUSTC_WRAPPER` that stubs unreachable library functions at the MIR level, skipping LLVM codegen for code the final binary never calls. ## Benchmark Results -| Environment | Baseline | With cargo-slicer | Wall-time savings | -|---|---|---|---| -| 48-core server (AMD EPYC) | 192.9 s | 170.4 s | **-11.7%** | -| Raspberry Pi 4 (4-core ARM) | 25m 03s | 17m 54s | **-28.6%** | -| 2-vCPU CI runner (estimated) | — | — | **~25-30%** | +| Environment | Mode | Baseline | With cargo-slicer | Wall-time savings | +|---|---|---|---|---| +| 48-core server | syn pre-analysis | 3m 52s | 3m 31s | **-9.1%** | +| 48-core server | MIR-precise | 3m 52s | 2m 49s | **-27.2%** | +| Raspberry Pi 4 | syn pre-analysis | 25m 03s | 17m 54s | **-28.6%** | -All measurements are clean `cargo build --release` on nightly. Fewer cores = larger relative improvement, because each crate's compile time is a bigger fraction of total wall time. The 2-vCPU CI runners should see savings similar to the Pi. +All measurements are clean `cargo +nightly build --release`. MIR-precise mode reads actual compiler MIR to build a more accurate call graph, stubbing 1,060 mono items vs 799 with syn-based analysis. ## CI Integration @@ -26,17 +26,26 @@ cargo +nightly install cargo-slicer --profile release-rustc \ --bin cargo-slicer-rustc --bin cargo_slicer_dispatch \ --features rustc-driver -# Build (from zeroclaw root) +# Build with syn pre-analysis (from zeroclaw root) cargo-slicer pre-analyze CARGO_SLICER_VIRTUAL=1 CARGO_SLICER_CODEGEN_FILTER=1 \ RUSTC_WRAPPER=$(which cargo_slicer_dispatch) \ cargo +nightly build --release + +# Build with MIR-precise analysis (more stubs, bigger savings) +# Step 1: generate .mir-cache (first build with MIR_PRECISE) +CARGO_SLICER_MIR_PRECISE=1 CARGO_SLICER_WORKSPACE_CRATES=zeroclaw,zeroclaw_robot_kit \ + CARGO_SLICER_VIRTUAL=1 CARGO_SLICER_CODEGEN_FILTER=1 \ + RUSTC_WRAPPER=$(which cargo_slicer_dispatch) \ + cargo +nightly build --release +# Step 2: subsequent builds automatically use .mir-cache ``` ## How It Works -1. **Pre-analysis** scans workspace sources via `syn` to build a cross-crate call graph (~5 s). +1. **Pre-analysis** scans workspace sources via `syn` to build a cross-crate call graph (~2 s). 2. **Cross-crate BFS** from `main()` identifies which public library functions are actually reachable. 3. **MIR stubbing** replaces unreachable bodies with `Unreachable` terminators — the mono collector finds no callees and prunes entire codegen subtrees. +4. **MIR-precise mode** (optional) reads actual compiler MIR from the binary crate's perspective, building a ground-truth call graph that identifies even more unreachable functions. No source files are modified. The output binary is functionally identical.