# polymech-cli
Cross-platform C++ CLI built with CMake.
## Prerequisites
| Tool | Version |
|------|---------|
| CMake | ≥ 3.20 |
| C++ compiler | C++17 (MSVC, GCC, or Clang) |
## Build
```bash
# Debug
cmake --preset dev
cmake --build --preset dev
# Release
cmake --preset release
cmake --build --preset release
```
## Usage
```bash
polymech-cli --help
polymech-cli --version
```
## Worker Mode & Gridsearch
The `worker` subcommand is designed to be spawned by the Node.js frontend orchestrator (`GridSearchUdsManager`) for background gridsearch execution. It accepts length-prefixed JSON frames over a Unix Domain Socket (UDS) or a local TCP port on Windows.
```bash
polymech-cli worker --uds <path_or_port> --daemon --user-uid <id> --config <path>
```
### IPC Resiliency and Logging
The C++ worker pipeline and its Node orchestrator include the following feedback, logging, and recovery mechanisms:
1. **Watchdog Heartbeats (`ping` / `pong`)**
- The Node orchestrator sweeps the active worker pool every 15 seconds. It logs each `ping` it sends and each `pong` (or other activity event such as `log`, `job_progress`, or `ack`) it receives.
- If a C++ worker sends no IPC events for 60 seconds (for example, because of a hung thread or a deadlock), the orchestrator kills it with `SIGKILL` and evicts it from the pool.
2. **Socket Traceability**
- The Node orchestrator registers explicit `error` event handlers on the socket to catch unexpected closures and TCP faults such as `ECONNRESET`. If the pipe breaks mid-job, the handler fails the job immediately and logs the stack trace, so the client UI does not hang indefinitely, which matters most during heavy re-runs.
3. **Persistent Crash Logging (`logs/uds.json`)**
- The C++ worker initializes a multi-sink logger (`logger::init_uds`). It writes standard logs to `stderr` and simultaneously appends them to a file trace at `server/logs/uds.json`.
- The file sink flushes to disk aggressively: every second, and immediately on `info` severity. If the worker process crashes or disappears, `uds.json` serves as a flight recorder for post-mortem debugging.
4. **Job Specification Transparency**
- Gridsearch payloads (including those from the `retry` and `expand` endpoints) log their input shape (the `guided` bounds flag and the `enrichers` subset) to the Node console before the work is passed to the C++ worker. This gives traceability from UI action → Node submission → C++ execution.
5. **Thread Safety & Frame Synchronization (Mutexes)**
- The socket carries asynchronous traffic in both directions. The background execution graph (powered by Taskflow) emits high-frequency events (`location`, `waypoint-start`) via `GridsearchCallbacks`. Concurrently, the Node.js orchestrator sends periodic commands (`ping`, `cancel`) that the C++ socket loop must acknowledge immediately.
- To keep frames from overlapping (which would corrupt the 4-byte `len` header), every write takes a global `g_uds_socket_mutex`. The mutex ensures that direct acknowledgments (`pong`, `cancel_ack`) and background output (`uds_sink` logs and Taskflow events) never interleave their `asio::write` calls on the pipe.
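The locking pattern from item 5 can be sketched without any networking. This is a minimal illustration, not the worker's actual code: `FramedPipe` and `run_concurrent_writers` are hypothetical names, and an in-memory string stands in for the asio socket. The point is that the header and the payload are appended under one lock, so two writer threads can never split each other's frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the socket: an in-memory byte stream.
class FramedPipe {
public:
    // Appends header and payload under a single lock so frames never interleave.
    void write_frame(const std::string& json) {
        const auto len = static_cast<std::uint32_t>(json.size());
        char header[4];
        for (int i = 0; i < 4; ++i) {
            header[i] = static_cast<char>((len >> (8 * i)) & 0xFF);  // little-endian
        }
        std::lock_guard<std::mutex> lock(mutex_);
        bytes_.append(header, 4);
        bytes_.append(json);
    }

    // Splits the stream back into payloads; returns false if a frame is truncated.
    bool read_all(std::vector<std::string>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t pos = 0;
        while (pos < bytes_.size()) {
            if (bytes_.size() - pos < 4) return false;
            std::uint32_t len = 0;
            for (int i = 0; i < 4; ++i) {
                const auto byte = static_cast<unsigned char>(bytes_[pos + i]);
                len |= static_cast<std::uint32_t>(byte) << (8 * i);
            }
            pos += 4;
            if (bytes_.size() - pos < len) return false;
            out.push_back(bytes_.substr(pos, len));
            pos += len;
        }
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::string bytes_;
};

// Two writers, mimicking the socket thread (pong) and the event callbacks (location).
std::vector<std::string> run_concurrent_writers(int per_thread) {
    FramedPipe pipe;
    std::thread control([&] {
        for (int i = 0; i < per_thread; ++i) pipe.write_frame("{\"type\":\"pong\"}");
    });
    std::thread events([&] {
        for (int i = 0; i < per_thread; ++i) pipe.write_frame("{\"type\":\"location\"}");
    });
    control.join();
    events.join();
    std::vector<std::string> frames;
    if (!pipe.read_all(frames)) frames.clear();  // empty result signals corruption
    return frames;
}
```

If the lock covered only the payload append and not the header, a second thread could write between the two, and the reader would interpret payload bytes as a length.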
### IPC Framing & Payload Protocol
All communication uses length-prefixed JSON frames. The length prefix lets the receiver reassemble complete messages even when TCP splits or coalesces writes during heavy event streams.
**Binary Frame Format:**
`[4-byte Unsigned Little-Endian Integer (Payload Length)] [UTF-8 JSON Object]`
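As a rough sketch of this frame format, the following encodes a payload and decodes frames incrementally from arbitrarily split chunks. `encode_frame` and `FrameDecoder` are illustrative names, not identifiers from the worker's source, and the sketch uses only the standard library rather than asio.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Encodes one payload as [4-byte little-endian length][payload bytes].
std::string encode_frame(const std::string& json) {
    const auto len = static_cast<std::uint32_t>(json.size());
    std::string frame;
    frame.reserve(4 + json.size());
    for (int shift = 0; shift < 32; shift += 8) {
        frame.push_back(static_cast<char>((len >> shift) & 0xFF));
    }
    frame += json;
    return frame;
}

// Hypothetical incremental decoder: feed it whatever bytes arrive,
// then call next() until it returns std::nullopt.
class FrameDecoder {
public:
    void feed(const char* data, std::size_t n) { buffer_.append(data, n); }

    std::optional<std::string> next() {
        if (buffer_.size() < 4) return std::nullopt;  // header incomplete
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i) {
            const auto byte = static_cast<unsigned char>(buffer_[i]);
            len |= static_cast<std::uint32_t>(byte) << (8 * i);
        }
        if (buffer_.size() - 4 < len) return std::nullopt;  // payload incomplete
        std::string payload = buffer_.substr(4, len);
        buffer_.erase(0, 4 + static_cast<std::size_t>(len));
        return payload;
    }

private:
    std::string buffer_;  // bytes received but not yet consumed
};
```

The decoder buffers partial input, so it does not matter where the transport cuts the stream. A production reader would likely also cap `len` to reject oversized or corrupt headers; that limit is omitted here.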
#### Control Commands (Node → C++)
If the JSON object contains an `"action"` field, it is handled synchronously on the socket thread:
- **Health Check:** `{"action": "ping"}`
*Replies:* `{"type": "pong", "data": {"memoryMb": 120, "cpuTimeMs": 4500}}`
- **Cancellation:** `{"action": "cancel", "jobId": "job_123"}`
→ The worker sets the atomic cancellation token to halt the target `taskflow` safely and immediately replies `{"type": "cancel_ack", "data": "job_123"}`.
- **Daemon Teardown:** `{"action": "stop"}`
→ Flushes all streams and exits cleanly.
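The cancellation path can be illustrated with a cooperative token. This is a sketch under assumptions: `JobState`, `request_cancel`, and `run_cells` are hypothetical, and a plain loop stands in for the Taskflow graph. The socket thread only flips a flag, and the running job checks it between units of work.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical per-job state; the worker's real types are not shown in this README.
struct JobState {
    std::atomic<bool> cancelled{false};
    std::size_t cells_done = 0;
};

// Called from the socket thread when {"action": "cancel"} arrives.
// It never blocks, so the cancel_ack reply can be sent right away.
void request_cancel(JobState& job) {
    job.cancelled.store(true, std::memory_order_release);
}

// Called from the execution side: processes cells until done or cancelled.
// Returns true only if every cell was processed.
bool run_cells(JobState& job, std::size_t total_cells) {
    for (std::size_t i = 0; i < total_cells; ++i) {
        if (job.cancelled.load(std::memory_order_acquire)) return false;
        ++job.cells_done;  // placeholder for the per-cell search work
    }
    return true;
}
```

Because the check happens between cells, cancellation takes effect at the next cell boundary rather than in the middle of a unit of work.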
#### Gridsearch Payload (Node → C++)
If no `"action"` field exists, the message is treated as a gridsearch spec and pushed into a lock-free `ConcurrentQueue` for the background execution graph:
```json
{
"jobId": "run_9a8bc7",
"configPath": "config/postgres.toml",
"cacheDir": "../packages/gadm/cache",
"enrich": true,
"guided": {
"areas": [{ "gid": "ESP.6_1", "level": 1 }],
"settings": { "gridMode": "hex", "cellSize": 5.0 }
},
"search": {
"types": ["restaurant"],
"limitPerArea": 500
}
}
```
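The split between control commands and gridsearch specs could look roughly like the sketch below. All names are hypothetical: `Message` represents an already-parsed frame whose `action` is set only when the JSON object has an `"action"` field, and a mutex-guarded `std::queue` stands in for the lock-free `ConcurrentQueue`.

```cpp
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical parsed frame; JSON parsing itself is out of scope here.
struct Message {
    std::optional<std::string> action;  // empty when no "action" field exists
    std::string raw_json;
};

enum class Route { Control, Enqueued };

// Stand-in for the lock-free ConcurrentQueue mentioned above.
class JobQueue {
public:
    void push(std::string spec) {
        std::lock_guard<std::mutex> lock(mutex_);
        specs_.push(std::move(spec));
    }

    std::optional<std::string> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (specs_.empty()) return std::nullopt;
        std::string spec = std::move(specs_.front());
        specs_.pop();
        return spec;
    }

private:
    std::mutex mutex_;
    std::queue<std::string> specs_;
};

// Control commands stay on the socket thread; everything else is queued
// for the background execution graph.
Route route(const Message& msg, JobQueue& jobs) {
    if (msg.action) return Route::Control;
    jobs.push(msg.raw_json);
    return Route::Enqueued;
}
```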
#### Event Streaming (C++ → Node)
As the gridsearch pipeline executes, `GridsearchCallbacks` emits length-prefixed events back over the active socket:
- **`ack`**: Acknowledges that the job was dequeued (`{"type": "ack", "data": {"jobId": "..."}}`).
- **`log`**: Passthrough of all internal C++ `spdlog` messages using the custom `uds_sink` adapter.
- **`location` / `node`**: Raw geolocation geometries and enriched contact details streamed incrementally.
- **`job_progress`**: Phase updates (Grid Generation → Search → Enrichment).
- **`job_result`**: The final statistical and timer summary (EnumMs, SearchMs, Total Emails, etc.).
- **`error`**: Unrecoverable boundary parsing or database initialization faults.
## License
BSD-3-Clause
## Requirements
- [https://github.com/taskflow/taskflow](https://github.com/taskflow/taskflow)
- [https://github.com/cameron314/concurrentqueue](https://github.com/cameron314/concurrentqueue)
- [https://github.com/chriskohlhoff/asio](https://github.com/chriskohlhoff/asio)