# Polymech C++ Gridsearch Worker — Design

## Goal

Port the [gridsearch-worker.ts](../src/products/locations/gridsearch-worker.ts) pipeline to native C++, running as a **CLI subcommand** (`polymech-cli gridsearch`) while keeping all logic in internal libraries under `packages/`. The worker communicates progress via the [IPC framing protocol](./packages/ipc/) and writes results to Supabase via the existing [postgres](./packages/postgres/) package.

---

## Status

| Package | Status | Tests | Assertions |
|---------|--------|-------|------------|
| `geo` | ✅ Done | 23 | 77 |
| `gadm_reader` | ✅ Done | 18 | 53 |
| `grid` | ✅ Done | 13 | 105 |
| `search` | ✅ Done | 8 | 13 |
| CLI `gridsearch` | ✅ Done | — | dry-run verified (3ms) |
| IPC `gridsearch` | 🔧 Stub | — | routes msg, TODO: parse payload |
| **Total** | | **62** | **248** |

---

## Existing C++ Inventory

| Package | Provides |
|---------|----------|
| `ipc` | Length-prefixed JSON over stdio |
| `postgres` | Supabase PostgREST: `query`, `insert` |
| `http` | libcurl `GET`/`POST` |
| `json` | RapidJSON validate/prettify |
| `logger` | spdlog (stdout or **stderr** in worker mode) |
| `html` | HTML parser |

---

## TypeScript Pipeline (Reference)

```
GADM Resolve → Grid Generate → SerpAPI Search → Enrich → Supabase Upsert
```

| Phase | Input | Output | Heavy work |
|-------|-------|--------|------------|
| **1. GADM Resolve** | GID list + target level | `GridFeature[]` (GeoJSON polygons with GHS props) | Read pre-cached JSON files from `cache/gadm/boundary_{GID}_{LEVEL}.json` |
| **2. Grid Generate** | `GridFeature[]` + settings | `GridSearchHop[]` (waypoints: lat/lng/radius) | Centroid, bbox, distance, area, point-in-polygon, cell sorting |
| **3. Search** | Waypoints + query + SerpAPI key | Place results (JSON) | HTTP calls to `serpapi.com`, per-waypoint caching |
| **4. Enrich** | Place results | Enriched data (emails, pages) | HTTP scraping — **defer to Phase 2** |
| **5. Persist** | Enriched places | Supabase `places` + `grid_search_runs` | PostgREST upsert |

---

## Implemented Packages

### 1. `packages/geo` — Geometry primitives ✅

Header + `.cpp`, no external deps. Implements the **turf.js subset** used by the grid generator.

```cpp
namespace geo {

struct Coord { double lon, lat; };
struct BBox  { double minLon, minLat, maxLon, maxLat; };

BBox   bbox(const std::vector<Coord>& ring);
Coord  centroid(const std::vector<Coord>& ring);
double area_sq_m(const std::vector<Coord>& ring);
double distance_km(Coord a, Coord b);
bool   point_in_polygon(Coord pt, const std::vector<Coord>& ring);

std::vector<Coord> square_grid(BBox extent, double cellSizeKm);
std::vector<Coord> hex_grid(BBox extent, double cellSizeKm);
std::vector<Coord> buffer_circle(Coord center, double radiusKm, int steps = 6);

} // namespace geo
```

**Rationale**: ~200 lines avoids pulling GEOS/Boost.Geometry. Adopts `pip.h` ray-casting pattern from `packages/gadm/cpp/` without the GDAL/GEOS/PROJ dependency (~700MB).

---

### 2. `packages/gadm_reader` — Boundary resolver ✅

Reads pre-cached GADM boundary JSON from disk. No network calls.

```cpp
namespace gadm {

struct Feature {
    std::string gid, name;
    int level;
    std::vector<std::vector<geo::Coord>> rings;
    double ghsPopulation, ghsBuiltWeight;
    geo::Coord ghsPopCenter, ghsBuiltCenter;
    std::vector<std::array<double, 3>> ghsPopCenters;   // [lon, lat, weight]
    std::vector<std::array<double, 3>> ghsBuiltCenters;
    double areaSqKm;
};

BoundaryResult load_boundary(const std::string& gid, int targetLevel,
                             const std::string& cacheDir = "cache/gadm");

} // namespace gadm
```

Handles `Polygon`/`MultiPolygon`, GHS enrichment fields, fallback resolution by country code prefix.

---

### 3. `packages/grid` — Grid generator ✅

Direct port of [grid-generator.ts](../../shared/src/products/places/grid-generator.ts).

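The grid generator leans on the `geo` primitives listed earlier, in particular the point-in-polygon test that the `geo` rationale describes as a ray-casting pattern. As a rough illustration only, the even-odd ray-casting idea can be sketched as follows. The `Coord` struct here is a local stand-in for `geo::Coord`, and `point_in_ring` is a hypothetical name; neither is taken from the actual `packages/geo` sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical local mirror of geo::Coord, for illustration only.
struct Coord { double lon, lat; };

// Even-odd ray casting: cast a ray from pt toward +lon and count how many
// ring edges it crosses. An odd count means the point is inside.
// Assumes a simple (non-self-intersecting) ring; the closing vertex may be
// repeated or omitted.
bool point_in_ring(Coord pt, const std::vector<Coord>& ring) {
    bool inside = false;
    const std::size_t n = ring.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Coord& a = ring[i];
        const Coord& b = ring[j];
        // Does edge (a, b) straddle the horizontal line through pt?
        const bool straddles = (a.lat > pt.lat) != (b.lat > pt.lat);
        if (!straddles) continue;
        // Longitude where the edge crosses that horizontal line.
        const double crossLon =
            a.lon + (pt.lat - a.lat) * (b.lon - a.lon) / (b.lat - a.lat);
        if (pt.lon < crossLon) inside = !inside;
    }
    return inside;
}
```

A caller would typically run a cheap bounding-box rejection first and only fall through to this test for points inside the box; holes and `MultiPolygon` handling are left out of the sketch.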
```cpp
namespace grid {

struct Waypoint { int step; double lng, lat, radius_km; };

struct GridOptions {
    std::string gridMode;       // "hex", "square", "admin", "centers"
    double cellSize;            // km
    double cellOverlap, centroidOverlap;
    int maxCellsLimit;
    double maxElevation, minDensity, minGhsPop, minGhsBuilt;
    std::string ghsFilterMode;  // "AND" | "OR"
    bool allowMissingGhs, bypassFilters;
    std::string pathOrder;      // "zigzag", "snake", "spiral-out", "spiral-in", "shortest"
    bool groupByRegion;
};

struct GridResult {
    std::vector<Waypoint> waypoints;
    int validCells, skippedCells;
    std::string error;
};

GridResult generate(const std::vector<gadm::Feature>& features, const GridOptions& opts);

} // namespace grid
```

**4 modes**: `admin` (centroid + radius), `centers` (GHS deduplicated), `hex`, `square` (tessellation + PIP)

**5 sort algorithms**: `zigzag`, `snake`, `spiral-out`, `spiral-in`, `shortest` (greedy NN)

---

### 4. `packages/search` — SerpAPI client + config ✅

```cpp
namespace search {

struct Config {
    std::string serpapi_key, geocoder_key, bigdata_key;
    std::string postgres_url, supabase_url, supabase_service_key;
};

Config load_config(const std::string& path = "config/postgres.toml");

struct SearchOptions {
    std::string query;
    double lat, lng;
    int zoom = 13, limit = 20;
    std::string engine = "google_maps", hl = "en", google_domain = "google.com";
};

struct MapResult {
    std::string title, place_id, data_id, address, phone, website, type;
    std::vector<std::string> types;
    double rating;
    int reviews;
    GpsCoordinates gps;
};

SearchResult search_google_maps(const Config& cfg, const SearchOptions& opts);

} // namespace search
```

Reads `[services].SERPAPI_KEY`, `GEO_CODER_KEY`, `BIG_DATA_KEY` from `config/postgres.toml`. HTTP pagination via `http::get()`, JSON parsing with RapidJSON.

---

## CLI Subcommand: `gridsearch` ✅

```
polymech-cli gridsearch [OPTIONS]

Positionals:
  GID                    GADM GID (e.g. ESP.1.1_1)
  QUERY                  Search query (e.g. 'mecanizado cnc')

Options:
  -l, --level INT        Target GADM level (default: 0)
  -m, --mode TEXT        Grid mode: hex|square|admin|centers (default: hex)
  -s, --cell-size FLOAT  Cell size in km (default: 5.0)
      --limit INT        Max results per area (default: 20)
  -z, --zoom INT         Google Maps zoom (default: 13)
      --sort TEXT        Path order: snake|zigzag|spiral-out|spiral-in|shortest
  -c, --config TEXT      TOML config path (default: config/postgres.toml)
      --cache-dir TEXT   GADM cache directory (default: cache/gadm)
      --dry-run          Generate grid only, skip SerpAPI search
```

### Execution flow

```
1. load_config(configPath)          → Config (TOML)
2. gadm::load_boundary(gid, level)  → features[]
3. grid::generate(features, opts)   → waypoints[]
4. --dry-run → output JSON array and exit
5. For each waypoint → search::search_google_maps(cfg, sopts)
6. Stream JSON summary to stdout
```

### Example

```bash
polymech-cli gridsearch ABW "recycling" --dry-run
# → [{"step":1,"lat":12.588582,"lng":-70.040465,"radius_km":3.540}, ...]
# [info] Dry-run complete in 3ms
```

### IPC worker mode

The `worker` subcommand routes `gridsearch` message type (currently echoes payload — TODO: wire full pipeline from parsed JSON).

---

## Build Integration

### Dependency graph

```
             ┌──────────┐
             │ polymech │  (the lib)
             │   -cli   │  (the binary)
             └────┬─────┘
     ┌────────────┼────────────────┐
     ▼            ▼                ▼
┌──────────┐ ┌──────────┐     ┌──────────┐
│  search  │ │   grid   │     │   ipc    │
└────┬─────┘ └────┬─────┘     └──────────┘
     │            │
     ▼            ▼
┌──────────┐ ┌───────────────┐
│   http   │ │  gadm_reader  │
└──────────┘ └────┬──────────┘
                  ▼
             ┌──────────┐
             │   geo    │  ← no deps (math only)
             └──────────┘

             ┌──────────┐
             │   json   │  ← RapidJSON
             └──────────┘
```

All packages depend on `logger` and `json` implicitly.

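The graph places `geo` at the bottom with no dependencies. To illustrate what "math only" can look like in practice, here is a minimal sketch of a Haversine great-circle distance of the kind `geo::distance_km` is declared to provide. The name `haversine_km`, the local `Coord` mirror, and the Earth-radius constant are assumptions made for this example and are not taken from the package sources.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical local mirror of geo::Coord, for illustration only.
struct Coord { double lon, lat; };

// Great-circle distance in kilometres using the Haversine formula.
// ASSUMPTION: a spherical Earth with mean radius 6371.0088 km; the real
// package may use a different constant.
double haversine_km(Coord a, Coord b) {
    constexpr double kEarthRadiusKm = 6371.0088;
    constexpr double kDegToRad = 3.14159265358979323846 / 180.0;

    const double lat1 = a.lat * kDegToRad;
    const double lat2 = b.lat * kDegToRad;
    const double dLat = (b.lat - a.lat) * kDegToRad;
    const double dLon = (b.lon - a.lon) * kDegToRad;

    const double sinLat = std::sin(dLat / 2.0);
    const double sinLon = std::sin(dLon / 2.0);
    const double h = sinLat * sinLat
                   + std::cos(lat1) * std::cos(lat2) * sinLon * sinLon;

    // Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::min(1.0, h)));
}
```

Only `<cmath>` and `<algorithm>` are needed, which is consistent with keeping a geometry package free of GEOS or Boost.Geometry.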
---

## Testing

### Unit tests (Catch2) — 62 tests, 248 assertions ✅

| Test file | Tests | Assertions | Validates |
|-----------|-------|------------|-----------|
| `test_geo.cpp` | 23 | 77 | Haversine, area, centroid, PIP, hex/square grid |
| `test_gadm_reader.cpp` | 18 | 53 | JSON parsing, GHS props, fallback resolution |
| `test_grid.cpp` | 13 | 105 | All 4 modes × 5 sorts, GHS filtering, PIP clipping |
| `test_search.cpp` | 8 | 13 | Config loading, key validation, error handling |

### Integration test (Node.js)

- Existing `orchestrator/test-ipc.mjs` validates spawn/lifecycle/ping/job
- TODO: `test-gridsearch.mjs` for full pipeline via IPC

---

## Deferred (Phase 2)

| Item | Reason |
|------|--------|
| Enrichment (email scraping) | Complex + browser-dependent; keep in Node.js |
| SerpAPI response caching | State store managed by orchestrator for now |
| Protobuf framing | JSON IPC sufficient for current throughput |
| Multi-threaded search | Sequential is fine for SerpAPI rate limits |
| GEOS integration | Custom geo is sufficient for grid math |
| IPC gridsearch payload parser | Currently a stub; wire full pipeline from JSON |
| Supabase upsert in CLI | Use postgres package for batch insert |
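
The deferred IPC payload parser would sit on top of the length-prefixed JSON framing provided by the `ipc` package. As a minimal sketch of that framing idea, assuming a 4-byte big-endian length prefix (the actual prefix width and byte order used by `packages/ipc` are not stated in this document), encoding and incremental decoding might look like this. The function names `encode_frame` and `try_decode_frame` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// ASSUMPTION: each frame is a 4-byte big-endian payload length followed by
// the UTF-8 JSON payload. The real packages/ipc wire format may differ.
constexpr std::size_t kPrefixBytes = 4;

// Prepend the length prefix to a JSON payload.
std::string encode_frame(const std::string& json) {
    const auto len = static_cast<std::uint32_t>(json.size());
    std::string frame;
    frame.reserve(kPrefixBytes + json.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        frame.push_back(static_cast<char>((len >> shift) & 0xFF));
    }
    frame += json;
    return frame;
}

// Try to pull one complete frame off the front of `buffer`.
// Returns true and fills `payload` when a full frame is available, removing
// it from the buffer; returns false (buffer untouched) when more bytes are
// still needed.
bool try_decode_frame(std::string& buffer, std::string& payload) {
    if (buffer.size() < kPrefixBytes) return false;
    std::uint32_t len = 0;
    for (std::size_t i = 0; i < kPrefixBytes; ++i) {
        len = (len << 8) | static_cast<unsigned char>(buffer[i]);
    }
    if (buffer.size() - kPrefixBytes < len) return false;
    payload = buffer.substr(kPrefixBytes, len);
    buffer.erase(0, kPrefixBytes + static_cast<std::size_t>(len));
    return true;
}
```

A worker loop would append bytes read from stdin to the buffer and call `try_decode_frame` until it returns false, handing each payload to the JSON parser. Keeping log output on stderr in worker mode, as the `logger` package does, prevents log lines from corrupting frames on stdout.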