mono-cpp/docs/polymech.md
2026-03-24 22:23:13 +01:00

# Polymech C++ Gridsearch Worker — Design
## Goal
Port the [gridsearch-worker.ts](../src/products/locations/gridsearch-worker.ts) pipeline to native C++, running as a **CLI subcommand** (`polymech-cli gridsearch`) while keeping all logic in internal libraries under `packages/`. The worker communicates progress via the [IPC framing protocol](./packages/ipc/) and writes results to Supabase via the existing [postgres](./packages/postgres/) package.
---
## Status
| Package | Status | Tests | Assertions |
|---------|--------|-------|------------|
| `geo` | ✅ Done | 23 | 77 |
| `gadm_reader` | ✅ Done | 18 | 53 |
| `grid` | ✅ Done | 13 | 105 |
| `search` | ✅ Done | 8 | 13 |
| CLI `gridsearch` | ✅ Done | — | dry-run verified (3ms) |
| IPC `gridsearch` | 🔧 Stub | — | routes msg, TODO: parse payload |
| **Total** | | **62** | **248** |
---
## Existing C++ Inventory
| Package | Provides |
|---------|----------|
| `ipc` | Length-prefixed JSON over stdio |
| `postgres` | Supabase PostgREST: `query`, `insert` |
| `http` | libcurl `GET`/`POST` |
| `json` | RapidJSON validate/prettify |
| `logger` | spdlog (stdout or **stderr** in worker mode) |
| `html` | HTML parser |
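
The `ipc` row describes length-prefixed JSON over stdio. A minimal sketch of that kind of framing follows. The 4-byte big-endian header and the names `encode_frame` and `decode_frame` are assumptions for illustration; the actual header layout is defined in `packages/ipc` and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helpers. The 4-byte big-endian length header is an assumption,
// not taken from packages/ipc.

// Prefix a JSON payload with its byte length.
std::string encode_frame(const std::string& json) {
    const auto n = static_cast<std::uint32_t>(json.size());
    std::string frame;
    frame.reserve(4 + json.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    frame += json;
    return frame;
}

// Try to pop one complete frame from the front of `buffer`.
// Returns std::nullopt and leaves the buffer untouched if more bytes are needed.
std::optional<std::string> decode_frame(std::string& buffer) {
    if (buffer.size() < 4) return std::nullopt;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    const std::size_t total = 4 + static_cast<std::size_t>(n);
    if (buffer.size() < total) return std::nullopt;
    std::string payload = buffer.substr(4, n);
    buffer.erase(0, total);
    return payload;
}
```

Because the reader always knows how many bytes to expect, partial reads from stdin can simply be appended to the buffer until `decode_frame` yields a payload.
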
---
## TypeScript Pipeline (Reference)
```
GADM Resolve → Grid Generate → SerpAPI Search → Enrich → Supabase Upsert
```
| Phase | Input | Output | Heavy work |
|-------|-------|--------|------------|
| **1. GADM Resolve** | GID list + target level | `GridFeature[]` (GeoJSON polygons with GHS props) | Read pre-cached JSON files from `cache/gadm/boundary_{GID}_{LEVEL}.json` |
| **2. Grid Generate** | `GridFeature[]` + settings | `GridSearchHop[]` (waypoints: lat/lng/radius) | Centroid, bbox, distance, area, point-in-polygon, cell sorting |
| **3. Search** | Waypoints + query + SerpAPI key | Place results (JSON) | HTTP calls to `serpapi.com`, per-waypoint caching |
| **4. Enrich** | Place results | Enriched data (emails, pages) | HTTP scraping, **deferred** (see "Deferred (Phase 2)") |
| **5. Persist** | Enriched places | Supabase `places` + `grid_search_runs` | PostgREST upsert |
---
## Implemented Packages
### 1. `packages/geo` — Geometry primitives ✅
Header + `.cpp`, no external deps. Implements the **turf.js subset** used by the grid generator.
```cpp
namespace geo {
struct Coord { double lon, lat; };
struct BBox { double minLon, minLat, maxLon, maxLat; };
BBox bbox(const std::vector<Coord>& ring);
Coord centroid(const std::vector<Coord>& ring);
double area_sq_m(const std::vector<Coord>& ring);
double distance_km(Coord a, Coord b);
bool point_in_polygon(Coord pt, const std::vector<Coord>& ring);
std::vector<BBox> square_grid(BBox extent, double cellSizeKm);
std::vector<BBox> hex_grid(BBox extent, double cellSizeKm);
std::vector<Coord> buffer_circle(Coord center, double radiusKm, int steps = 6);
} // namespace geo
```
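
As an illustration of two of the declarations above, here is a minimal sketch of `distance_km` (haversine) and `bbox`. It lives in a hypothetical `geo_sketch` namespace, and the Earth radius constant is an assumption; the real `packages/geo` implementation may use different constants or edge-case handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

namespace geo_sketch {  // hypothetical namespace, not the real packages/geo

struct Coord { double lon, lat; };
struct BBox { double minLon, minLat, maxLon, maxLat; };

constexpr double kPi = 3.14159265358979323846;
constexpr double kEarthRadiusKm = 6371.0088;  // mean Earth radius (assumed)

// Great-circle distance in kilometres via the haversine formula.
double distance_km(Coord a, Coord b) {
    const double toRad = kPi / 180.0;
    const double dLat = (b.lat - a.lat) * toRad;
    const double dLon = (b.lon - a.lon) * toRad;
    const double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(a.lat * toRad) * std::cos(b.lat * toRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    // Clamp guards against rounding pushing sqrt(h) slightly above 1.
    return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(h)));
}

// Axis-aligned bounding box of a ring; the ring must be non-empty.
BBox bbox(const std::vector<Coord>& ring) {
    assert(!ring.empty());
    BBox b{ring[0].lon, ring[0].lat, ring[0].lon, ring[0].lat};
    for (const Coord& c : ring) {
        b.minLon = std::min(b.minLon, c.lon);
        b.minLat = std::min(b.minLat, c.lat);
        b.maxLon = std::max(b.maxLon, c.lon);
        b.maxLat = std::max(b.maxLat, c.lat);
    }
    return b;
}

}  // namespace geo_sketch
```

With this radius, one degree of latitude comes out at roughly 111.2 km, which is a convenient sanity check for grid cell sizes.
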
**Rationale**: about 200 lines of in-house code avoid pulling in GEOS or Boost.Geometry. The package adopts the `pip.h` ray-casting pattern from `packages/gadm/cpp/` without the GDAL/GEOS/PROJ dependency (~700MB).
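
The ray-casting pattern mentioned above can be sketched as follows. This is a generic even-odd implementation written for illustration, not a copy of `pip.h`; it treats the ring as a simple polygon and makes no promise about points exactly on an edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Coord { double lon, lat; };

// Even-odd ray casting: shoot a ray toward +lon from `pt` and count how many
// ring edges it crosses. An odd count means the point is inside.
bool point_in_polygon(Coord pt, const std::vector<Coord>& ring) {
    bool inside = false;
    const std::size_t n = ring.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Coord& a = ring[i];
        const Coord& b = ring[j];
        // Only edges that straddle the ray's latitude can be crossed.
        const bool straddles = (a.lat > pt.lat) != (b.lat > pt.lat);
        if (straddles) {
            // Longitude at which the edge crosses the ray's latitude.
            const double xCross =
                a.lon + (pt.lat - a.lat) * (b.lon - a.lon) / (b.lat - a.lat);
            if (pt.lon < xCross) inside = !inside;
        }
    }
    return inside;
}
```

The `straddles` test also guarantees `b.lat != a.lat`, so the division is safe for horizontal edges.
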
---
### 2. `packages/gadm_reader` — Boundary resolver ✅
Reads pre-cached GADM boundary JSON from disk. No network calls.
```cpp
namespace gadm {
struct Feature {
    std::string gid, name;
    int level;
    std::vector<std::vector<geo::Coord>> rings;
    double ghsPopulation, ghsBuiltWeight;
    geo::Coord ghsPopCenter, ghsBuiltCenter;
    std::vector<std::array<double, 3>> ghsPopCenters; // [lon, lat, weight]
    std::vector<std::array<double, 3>> ghsBuiltCenters;
    double areaSqKm;
};

BoundaryResult load_boundary(const std::string& gid, int targetLevel,
                             const std::string& cacheDir = "cache/gadm");
} // namespace gadm
```
Handles `Polygon` and `MultiPolygon` geometries, GHS enrichment fields, and fallback resolution by country-code prefix.
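
A minimal sketch of how a cache lookup could map a GID and level to a file, using the `boundary_{GID}_{LEVEL}.json` naming from the pipeline table. The helper names are hypothetical, and `country_prefix` is only one plausible reading of "country-code prefix"; the real fallback logic may differ.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: path of the cached boundary file for a GID and level.
fs::path boundary_cache_path(const std::string& gid, int level,
                             const fs::path& cacheDir = "cache/gadm") {
    return cacheDir / ("boundary_" + gid + "_" + std::to_string(level) + ".json");
}

// Hypothetical helper: true if the boundary file exists on disk.
bool boundary_is_cached(const std::string& gid, int level,
                        const fs::path& cacheDir = "cache/gadm") {
    std::error_code ec;  // non-throwing overload; errors count as "not cached"
    return fs::is_regular_file(boundary_cache_path(gid, level, cacheDir), ec);
}

// Assumed fallback key: everything before the first '.', e.g. "ESP.1.1_1" -> "ESP".
std::string country_prefix(const std::string& gid) {
    return gid.substr(0, gid.find('.'));
}
```

Under these assumptions, a loader could first try the exact GID and then retry with `country_prefix(gid)` when the file is missing.
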
---
### 3. `packages/grid` — Grid generator ✅
Direct port of [grid-generator.ts](../../shared/src/products/places/grid-generator.ts).
```cpp
namespace grid {
struct Waypoint { int step; double lng, lat, radius_km; };

struct GridOptions {
    std::string gridMode; // "hex", "square", "admin", "centers"
    double cellSize; // km
    double cellOverlap, centroidOverlap;
    int maxCellsLimit;
    double maxElevation, minDensity, minGhsPop, minGhsBuilt;
    std::string ghsFilterMode; // "AND" | "OR"
    bool allowMissingGhs, bypassFilters;
    std::string pathOrder; // "zigzag", "snake", "spiral-out", "spiral-in", "shortest"
    bool groupByRegion;
};

struct GridResult {
    std::vector<Waypoint> waypoints;
    int validCells, skippedCells;
    std::string error;
};

GridResult generate(const std::vector<gadm::Feature>& features, const GridOptions& opts);
} // namespace grid
```
- **4 modes**: `admin` (centroid + radius), `centers` (GHS deduplicated), `hex`, `square` (tessellation + PIP)
- **5 sort algorithms**: `zigzag`, `snake`, `spiral-out`, `spiral-in`, `shortest` (greedy NN)
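
To make the `shortest` (greedy NN) ordering concrete, here is a minimal sketch. The function name `order_shortest`, the choice of the first waypoint as the start, and the planar squared-degree metric are assumptions for illustration; the ported generator may start elsewhere or use a geodesic distance.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Waypoint { int step; double lng, lat, radius_km; };

// Squared planar distance in degrees. Assumed adequate for ranking
// neighbours inside one small region; not a geodesic distance.
static double dist2(const Waypoint& a, const Waypoint& b) {
    const double dx = a.lng - b.lng;
    const double dy = a.lat - b.lat;
    return dx * dx + dy * dy;
}

// Greedy nearest-neighbour ordering: keep the first waypoint as the start,
// then repeatedly move the closest unvisited waypoint into the next slot.
// `step` is renumbered from 1 to match the final order.
std::vector<Waypoint> order_shortest(std::vector<Waypoint> pts) {
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        std::size_t best = i + 1;
        double bestD = std::numeric_limits<double>::infinity();
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double d = dist2(pts[i], pts[j]);
            if (d < bestD) { bestD = d; best = j; }
        }
        std::swap(pts[i + 1], pts[best]);
    }
    for (std::size_t i = 0; i < pts.size(); ++i)
        pts[i].step = static_cast<int>(i) + 1;
    return pts;
}
```

The sketch is O(n²), which is usually acceptable when the cell count is bounded by a setting such as `maxCellsLimit`.
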
---
### 4. `packages/search` — SerpAPI client + config ✅
```cpp
namespace search {
struct Config {
    std::string serpapi_key, geocoder_key, bigdata_key;
    std::string postgres_url, supabase_url, supabase_service_key;
};

Config load_config(const std::string& path = "config/postgres.toml");

struct SearchOptions {
    std::string query;
    double lat, lng;
    int zoom = 13, limit = 20;
    std::string engine = "google_maps", hl = "en", google_domain = "google.com";
};

struct MapResult {
    std::string title, place_id, data_id, address, phone, website, type;
    std::vector<std::string> types;
    double rating;
    int reviews;
    GpsCoordinates gps;
};

SearchResult search_google_maps(const Config& cfg, const SearchOptions& opts);
} // namespace search
```
Reads `SERPAPI_KEY`, `GEO_CODER_KEY`, and `BIG_DATA_KEY` from the `[services]` table of `config/postgres.toml`. Result pages are fetched with `http::get()` and parsed with RapidJSON.
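
A minimal sketch of how one paginated request URL could be assembled from `SearchOptions`. The helper names `url_encode` and `build_search_url` are hypothetical. The query parameters (`engine`, `type`, `q`, `ll`, `hl`, `google_domain`, `start`, `api_key`) follow SerpAPI's public Google Maps API, but whether the `search` package builds its URL this way is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

struct SearchOptions {
    std::string query;
    double lat = 0.0, lng = 0.0;
    int zoom = 13, limit = 20;
    std::string engine = "google_maps", hl = "en", google_domain = "google.com";
};

// Percent-encode everything except RFC 3986 unreserved characters.
std::string url_encode(const std::string& in) {
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}

// Hypothetical helper: URL for one result page; `start` is the result offset.
std::string build_search_url(const SearchOptions& o, const std::string& apiKey,
                             int start = 0) {
    char ll[64];
    // SerpAPI expects the map centre as "@lat,lng,zoomz".
    std::snprintf(ll, sizeof ll, "@%.6f,%.6f,%dz", o.lat, o.lng, o.zoom);
    std::string url = "https://serpapi.com/search.json";
    url += "?engine=" + url_encode(o.engine);
    url += "&type=search";
    url += "&q=" + url_encode(o.query);
    url += "&ll=" + url_encode(ll);
    url += "&hl=" + url_encode(o.hl);
    url += "&google_domain=" + url_encode(o.google_domain);
    if (start > 0) url += "&start=" + std::to_string(start);
    url += "&api_key=" + url_encode(apiKey);
    return url;
}
```

A pagination loop would call this with `start` increasing by the page size until `limit` results are collected or a page comes back empty.
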
---
## CLI Subcommand: `gridsearch` ✅
```
polymech-cli gridsearch <GID> <QUERY> [OPTIONS]

Positionals:
  GID                     GADM GID (e.g. ESP.1.1_1)
  QUERY                   Search query (e.g. 'mecanizado cnc')

Options:
  -l, --level INT         Target GADM level (default: 0)
  -m, --mode TEXT         Grid mode: hex|square|admin|centers (default: hex)
  -s, --cell-size FLOAT   Cell size in km (default: 5.0)
      --limit INT         Max results per area (default: 20)
  -z, --zoom INT          Google Maps zoom (default: 13)
      --sort TEXT         Path order: snake|zigzag|spiral-out|spiral-in|shortest
  -c, --config TEXT       TOML config path (default: config/postgres.toml)
      --cache-dir TEXT    GADM cache directory (default: cache/gadm)
      --dry-run           Generate grid only, skip SerpAPI search
### Execution flow
```
1. load_config(configPath) → Config (TOML)
2. gadm::load_boundary(gid, level) → features[]
3. grid::generate(features, opts) → waypoints[]
4. --dry-run → output JSON array and exit
5. For each waypoint → search::search_google_maps(cfg, sopts)
6. Stream JSON summary to stdout
```
### Example
```bash
polymech-cli gridsearch ABW "recycling" --dry-run
# → [{"step":1,"lat":12.588582,"lng":-70.040465,"radius_km":3.540}, ...]
# [info] Dry-run complete in 3ms
```
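
The dry-run output shape above can be reproduced with a small serializer. This is a sketch only: it uses `snprintf` rather than RapidJSON to stay self-contained, the name `waypoints_to_json` is hypothetical, and the precision (6 decimals for coordinates, 3 for the radius) is inferred from the sample line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Waypoint { int step; double lng, lat, radius_km; };

// Hypothetical helper: serialize waypoints as a compact JSON array in the
// field order shown by the dry-run example (step, lat, lng, radius_km).
std::string waypoints_to_json(const std::vector<Waypoint>& wps) {
    std::string out = "[";
    char buf[160];
    for (std::size_t i = 0; i < wps.size(); ++i) {
        std::snprintf(buf, sizeof buf,
                      "{\"step\":%d,\"lat\":%.6f,\"lng\":%.6f,\"radius_km\":%.3f}",
                      wps[i].step, wps[i].lat, wps[i].lng, wps[i].radius_km);
        if (i > 0) out += ",";
        out += buf;
    }
    out += "]";
    return out;
}
```

Writing this array to stdout while logging to stderr keeps the output pipeable into other tools.
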
### IPC worker mode
The `worker` subcommand routes the `gridsearch` message type to a handler. The handler is currently a stub that echoes the payload. TODO: wire the full pipeline from the parsed JSON.
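
A minimal sketch of what routing by message type with an echo stub might look like. The `Router` class, `make_worker_router`, the error reply, and the sample payload fields are all hypothetical; the real worker's dispatch code is not shown in this document.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// A handler receives the raw JSON payload and returns the reply payload.
using Handler = std::function<std::string(const std::string&)>;

// Hypothetical message router keyed by the IPC message type.
class Router {
public:
    void on(const std::string& type, Handler h) { handlers_[type] = std::move(h); }

    // Returns the handler's reply, or an error object for unknown types.
    std::string dispatch(const std::string& type, const std::string& payload) const {
        const auto it = handlers_.find(type);
        if (it == handlers_.end()) return "{\"error\":\"unknown message type\"}";
        return it->second(payload);
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};

Router make_worker_router() {
    Router r;
    // Stub: echo the payload back until the full pipeline is wired in.
    r.on("gridsearch", [](const std::string& payload) { return payload; });
    return r;
}
```

Replacing the echo lambda with one that parses the payload and runs the load, generate, and search steps would complete the TODO without touching the routing.
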
---
## Build Integration
### Dependency graph
```
             ┌──────────┐
             │ polymech │ (the lib)
             │   -cli   │ (the binary)
             └────┬─────┘
     ┌────────────┼────────────────┐
     ▼            ▼                ▼
┌──────────┐ ┌──────────┐     ┌──────────┐
│  search  │ │   grid   │     │   ipc    │
└────┬─────┘ └────┬─────┘     └──────────┘
     │            │
     ▼            ▼
┌──────────┐ ┌───────────────┐
│   http   │ │  gadm_reader  │
└──────────┘ └────┬──────────┘
             ┌──────────┐
             │   geo    │ ← no deps (math only)
             └──────────┘
             ┌──────────┐
             │   json   │ ← RapidJSON
             └──────────┘
```
All packages depend on `logger` and `json` implicitly.
---
## Testing
### Unit tests (Catch2) — 62 tests, 248 assertions ✅
| Test file | Tests | Assertions | Validates |
|-----------|-------|------------|-----------|
| `test_geo.cpp` | 23 | 77 | Haversine, area, centroid, PIP, hex/square grid |
| `test_gadm_reader.cpp` | 18 | 53 | JSON parsing, GHS props, fallback resolution |
| `test_grid.cpp` | 13 | 105 | All 4 modes × 5 sorts, GHS filtering, PIP clipping |
| `test_search.cpp` | 8 | 13 | Config loading, key validation, error handling |
### Integration test (Node.js)
- Existing `orchestrator/test-ipc.mjs` validates spawn/lifecycle/ping/job
- TODO: `test-gridsearch.mjs` for full pipeline via IPC
---
## Deferred (Phase 2)
| Item | Reason |
|------|--------|
| Enrichment (email scraping) | Complex + browser-dependent; keep in Node.js |
| SerpAPI response caching | State store managed by orchestrator for now |
| Protobuf framing | JSON IPC sufficient for current throughput |
| Multi-threaded search | Sequential is fine for SerpAPI rate limits |
| GEOS integration | Custom geo is sufficient for grid math |
| IPC gridsearch payload parser | Currently a stub; wire full pipeline from JSON |
| Supabase upsert in CLI | Use postgres package for batch insert |