Polymech C++ Gridsearch Worker — Design
Goal
Port the gridsearch-worker.ts pipeline to native C++, running as a CLI subcommand (polymech-cli gridsearch) while keeping all logic in internal libraries under packages/. The worker communicates progress via the IPC framing protocol and writes results to Supabase via the existing postgres package.
Status
| Package | Status | Tests | Assertions |
|---|---|---|---|
| geo | ✅ Done | 23 | 77 |
| gadm_reader | ✅ Done | 18 | 53 |
| grid | ✅ Done | 13 | 105 |
| search | ✅ Done | 8 | 13 |
| CLI gridsearch | ✅ Done | n/a | dry-run verified (3ms) |
| IPC gridsearch | 🔧 Stub | n/a | routes msg, TODO: parse payload |
| Total | | 62 | 248 |
Existing C++ Inventory
| Package | Provides |
|---|---|
| ipc | Length-prefixed JSON over stdio |
| postgres | Supabase PostgREST: query, insert |
| http | libcurl GET/POST |
| json | RapidJSON validate/prettify |
| logger | spdlog (stdout or stderr in worker mode) |
| html | HTML parser |
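The ipc package is described above only as "length-prefixed JSON over stdio". As a rough illustration of what such framing can look like, the sketch below assumes a 4-byte big-endian length header followed by the JSON bytes. The header layout and the names encode_frame and decode_frame are assumptions for illustration, not the package's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Assumption: a frame is a 4-byte big-endian payload length followed by the
// JSON bytes. The real ipc package may use a different header layout.
std::string encode_frame(const std::string& json) {
    const auto n = static_cast<std::uint32_t>(json.size());
    std::string frame;
    frame.reserve(4 + json.size());
    frame.push_back(static_cast<char>((n >> 24) & 0xFF));
    frame.push_back(static_cast<char>((n >> 16) & 0xFF));
    frame.push_back(static_cast<char>((n >> 8) & 0xFF));
    frame.push_back(static_cast<char>(n & 0xFF));
    frame += json;
    return frame;
}

// Tries to pop one complete frame from the front of `buffer`.
// Returns std::nullopt and leaves the buffer untouched if more bytes are needed.
std::optional<std::string> decode_frame(std::string& buffer) {
    if (buffer.size() < 4) return std::nullopt;
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    }
    if (buffer.size() - 4 < n) return std::nullopt;
    std::string payload = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return payload;
}
```

A reader loop would append whatever bytes arrive on stdin to the buffer and call decode_frame repeatedly until it returns std::nullopt, which keeps partial reads from being mistaken for complete messages.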
TypeScript Pipeline (Reference)
GADM Resolve → Grid Generate → SerpAPI Search → Enrich → Supabase Upsert
| Phase | Input | Output | Heavy work |
|---|---|---|---|
| 1. GADM Resolve | GID list + target level | GridFeature[] (GeoJSON polygons with GHS props) | Read pre-cached JSON files from cache/gadm/boundary_{GID}_{LEVEL}.json |
| 2. Grid Generate | GridFeature[] + settings | GridSearchHop[] (waypoints: lat/lng/radius) | Centroid, bbox, distance, area, point-in-polygon, cell sorting |
| 3. Search | Waypoints + query + SerpAPI key | Place results (JSON) | HTTP calls to serpapi.com, per-waypoint caching |
| 4. Enrich | Place results | Enriched data (emails, pages) | HTTP scraping (deferred to Phase 2) |
| 5. Persist | Enriched places | Supabase places + grid_search_runs | PostgREST upsert |
Implemented Packages
1. packages/geo — Geometry primitives ✅
Header + .cpp, no external deps. Implements the turf.js subset used by the grid generator.
```cpp
#include <vector>

namespace geo {

struct Coord { double lon, lat; };
struct BBox  { double minLon, minLat, maxLon, maxLat; };

// Measurements on a single polygon ring
BBox   bbox(const std::vector<Coord>& ring);
Coord  centroid(const std::vector<Coord>& ring);
double area_sq_m(const std::vector<Coord>& ring);
double distance_km(Coord a, Coord b);
bool   point_in_polygon(Coord pt, const std::vector<Coord>& ring);

// Tessellation and buffering
std::vector<BBox>  square_grid(BBox extent, double cellSizeKm);
std::vector<BBox>  hex_grid(BBox extent, double cellSizeKm);
std::vector<Coord> buffer_circle(Coord center, double radiusKm, int steps = 6);

} // namespace geo
```
Rationale: roughly 200 lines of code avoid pulling in GEOS or Boost.Geometry. The package adopts the pip.h ray-casting pattern from packages/gadm/cpp/ without the GDAL/GEOS/PROJ dependency (~700MB).
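To make the "math only" claim concrete, here is a minimal sketch of two of the primitives: a haversine distance and a ray-casting point-in-polygon test. It is written from the textbook formulations, not copied from packages/geo or pip.h. The Earth radius constant is an assumption, and the package may treat edge cases such as points exactly on a boundary differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Coord { double lon, lat; };  // same field order as geo::Coord

// Great-circle distance using the haversine formula.
// Assumption: mean Earth radius of 6371.0088 km.
double distance_km(Coord a, Coord b) {
    constexpr double kPi = 3.14159265358979323846;
    constexpr double kEarthRadiusKm = 6371.0088;
    constexpr double kToRad = kPi / 180.0;
    const double dLat = (b.lat - a.lat) * kToRad;
    const double dLon = (b.lon - a.lon) * kToRad;
    const double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(a.lat * kToRad) * std::cos(b.lat * kToRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(h));
}

// Ray casting: count how many ring edges a ray cast from `pt` toward
// increasing longitude crosses. An odd count means the point is inside.
// Works for rings with or without a repeated closing vertex.
bool point_in_polygon(Coord pt, const std::vector<Coord>& ring) {
    const std::size_t n = ring.size();
    if (n < 3) return false;
    bool inside = false;
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Coord& a = ring[i];
        const Coord& b = ring[j];
        // Only edges that straddle the point's latitude can be crossed.
        if ((a.lat > pt.lat) != (b.lat > pt.lat)) {
            const double lonAtCross =
                (b.lon - a.lon) * (pt.lat - a.lat) / (b.lat - a.lat) + a.lon;
            if (pt.lon < lonAtCross) inside = !inside;
        }
    }
    return inside;
}
```

Both functions are O(n) in the ring size or better and need nothing beyond the standard math header, which is the trade-off the rationale describes.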
2. packages/gadm_reader — Boundary resolver ✅
Reads pre-cached GADM boundary JSON from disk. No network calls.
```cpp
#include <array>
#include <string>
#include <vector>

namespace gadm {

struct Feature {
    std::string gid, name;
    int level;
    std::vector<std::vector<geo::Coord>> rings;
    double ghsPopulation, ghsBuiltWeight;
    geo::Coord ghsPopCenter, ghsBuiltCenter;
    std::vector<std::array<double, 3>> ghsPopCenters; // [lon, lat, weight]
    std::vector<std::array<double, 3>> ghsBuiltCenters;
    double areaSqKm;
};

// BoundaryResult is declared in the package header (not shown in this excerpt).
BoundaryResult load_boundary(const std::string& gid, int targetLevel,
                             const std::string& cacheDir = "cache/gadm");

} // namespace gadm
```
Handles Polygon and MultiPolygon geometries, GHS enrichment fields, and fallback resolution by country code prefix.
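The cache lookup can be sketched from the file naming scheme given in the pipeline table (cache/gadm/boundary_{GID}_{LEVEL}.json). The helper names below are hypothetical. The fallback shown, retrying with the part of the GID before the first dot, is only one plausible reading of "fallback resolution by country code prefix" and may not match what gadm_reader does.

```cpp
#include <string>
#include <vector>

// Builds the cache file path: {cacheDir}/boundary_{GID}_{LEVEL}.json
std::string boundary_cache_path(const std::string& cacheDir,
                                const std::string& gid, int level) {
    return cacheDir + "/boundary_" + gid + "_" + std::to_string(level) + ".json";
}

// Assumption: the country code prefix is everything before the first '.',
// e.g. "ESP" for "ESP.1.1_1". A GID without a dot is returned unchanged.
std::string country_prefix(const std::string& gid) {
    return gid.substr(0, gid.find('.'));
}

// Hypothetical lookup order: the exact GID first, then the country-level file.
std::vector<std::string> candidate_paths(const std::string& cacheDir,
                                         const std::string& gid, int level) {
    std::vector<std::string> paths{boundary_cache_path(cacheDir, gid, level)};
    const std::string country = country_prefix(gid);
    if (country != gid) {
        paths.push_back(boundary_cache_path(cacheDir, country, level));
    }
    return paths;
}
```

A loader would try each candidate in order and parse the first file that exists, so no network call is needed.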
3. packages/grid — Grid generator ✅
Direct port of grid-generator.ts.
```cpp
#include <string>
#include <vector>

namespace grid {

struct Waypoint { int step; double lng, lat, radius_km; };

struct GridOptions {
    std::string gridMode;       // "hex", "square", "admin", "centers"
    double cellSize;            // km
    double cellOverlap, centroidOverlap;
    int maxCellsLimit;
    double maxElevation, minDensity, minGhsPop, minGhsBuilt;
    std::string ghsFilterMode;  // "AND" | "OR"
    bool allowMissingGhs, bypassFilters;
    std::string pathOrder;      // "zigzag", "snake", "spiral-out", "spiral-in", "shortest"
    bool groupByRegion;
};

struct GridResult {
    std::vector<Waypoint> waypoints;
    int validCells, skippedCells;
    std::string error;
};

GridResult generate(const std::vector<gadm::Feature>& features, const GridOptions& opts);

} // namespace grid
```
- 4 modes: admin (centroid + radius), centers (GHS deduplicated), hex, square (tessellation + PIP)
- 5 sort algorithms: zigzag, snake, spiral-out, spiral-in, shortest (greedy NN)
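As an illustration of the "shortest (greedy NN)" ordering, the sketch below repeatedly moves to the closest unvisited waypoint, starting from the first one in the input. It is not the ported grid-generator.ts code. The function name order_shortest, the equirectangular distance approximation, and the 1-based step renumbering are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Waypoint { int step; double lng, lat, radius_km; };

// Squared distance in degrees with longitude scaled by cos(mean latitude).
// Good enough for ranking nearby cells; not a true geodesic distance.
double approx_dist2(const Waypoint& a, const Waypoint& b) {
    constexpr double kToRad = 3.14159265358979323846 / 180.0;
    const double dLat = b.lat - a.lat;
    const double dLng = (b.lng - a.lng) * std::cos(0.5 * (a.lat + b.lat) * kToRad);
    return dLat * dLat + dLng * dLng;
}

// Greedy nearest-neighbour ordering, O(n^2): position i+1 receives the
// remaining waypoint closest to position i.
std::vector<Waypoint> order_shortest(std::vector<Waypoint> pts) {
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        std::size_t best = i + 1;
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double d = approx_dist2(pts[i], pts[j]);
            if (d < bestDist) { bestDist = d; best = j; }
        }
        std::swap(pts[i + 1], pts[best]);
    }
    // Renumber steps to reflect the new visiting order (1-based).
    for (std::size_t i = 0; i < pts.size(); ++i) {
        pts[i].step = static_cast<int>(i) + 1;
    }
    return pts;
}
```

Greedy ordering does not guarantee the globally shortest tour, but it is cheap and avoids the long jumps that a plain row-by-row order can produce on irregular regions.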
4. packages/search — SerpAPI client + config ✅
```cpp
#include <string>
#include <vector>

namespace search {

struct Config {
    std::string serpapi_key, geocoder_key, bigdata_key;
    std::string postgres_url, supabase_url, supabase_service_key;
};

Config load_config(const std::string& path = "config/postgres.toml");

struct SearchOptions {
    std::string query;
    double lat, lng;
    int zoom = 13, limit = 20;
    std::string engine = "google_maps", hl = "en", google_domain = "google.com";
};

struct MapResult {
    std::string title, place_id, data_id, address, phone, website, type;
    std::vector<std::string> types;
    double rating;
    int reviews;
    GpsCoordinates gps;
};

// GpsCoordinates and SearchResult are declared in the package header (not shown here).
SearchResult search_google_maps(const Config& cfg, const SearchOptions& opts);

} // namespace search
```
Reads [services].SERPAPI_KEY, GEO_CODER_KEY, and BIG_DATA_KEY from config/postgres.toml. Pagination goes through http::get(), and responses are parsed with RapidJSON.
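The request that search_google_maps sends is not spelled out here, so the following is only a sketch of how one page's URL might be assembled from SearchOptions. The endpoint path and the parameter names (engine, type, q, ll, hl, google_domain, start, api_key) are assumptions that should be checked against SerpAPI's documentation. build_search_url and url_encode are hypothetical helpers, not functions of the search package.

```cpp
#include <cstdio>
#include <string>

struct SearchOptions {
    std::string query;
    double lat = 0.0, lng = 0.0;
    int zoom = 13, limit = 20;
    std::string engine = "google_maps", hl = "en", google_domain = "google.com";
};

// Percent-encodes everything except RFC 3986 unreserved characters.
std::string url_encode(const std::string& s) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                                c == '.' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(kHex[c >> 4]);
            out.push_back(kHex[c & 0x0F]);
        }
    }
    return out;
}

// Assumption: the map position is passed as ll=@<lat>,<lng>,<zoom>z and
// pagination uses a numeric `start` offset.
std::string build_search_url(const std::string& apiKey, const SearchOptions& o,
                             int start = 0) {
    char ll[96];
    std::snprintf(ll, sizeof ll, "@%.6f,%.6f,%dz", o.lat, o.lng, o.zoom);
    return "https://serpapi.com/search.json?engine=" + url_encode(o.engine) +
           "&type=search&q=" + url_encode(o.query) +
           "&ll=" + url_encode(ll) +
           "&hl=" + url_encode(o.hl) +
           "&google_domain=" + url_encode(o.google_domain) +
           "&start=" + std::to_string(start) +
           "&api_key=" + url_encode(apiKey);
}
```

Under these assumptions, a pagination loop would call build_search_url with increasing start values and pass each URL to http::get() until the limit is reached or a page comes back empty.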
CLI Subcommand: gridsearch ✅
```
polymech-cli gridsearch <GID> <QUERY> [OPTIONS]

Positionals:
  GID                     GADM GID (e.g. ESP.1.1_1)
  QUERY                   Search query (e.g. 'mecanizado cnc')

Options:
  -l, --level INT         Target GADM level (default: 0)
  -m, --mode TEXT         Grid mode: hex|square|admin|centers (default: hex)
  -s, --cell-size FLOAT   Cell size in km (default: 5.0)
      --limit INT         Max results per area (default: 20)
  -z, --zoom INT          Google Maps zoom (default: 13)
      --sort TEXT         Path order: snake|zigzag|spiral-out|spiral-in|shortest
  -c, --config TEXT       TOML config path (default: config/postgres.toml)
      --cache-dir TEXT    GADM cache directory (default: cache/gadm)
      --dry-run           Generate grid only, skip SerpAPI search
```
Execution flow
1. load_config(configPath) → Config (TOML)
2. gadm::load_boundary(gid, level) → features[]
3. grid::generate(features, opts) → waypoints[]
4. --dry-run → output JSON array and exit
5. For each waypoint → search::search_google_maps(cfg, sopts)
6. Stream JSON summary to stdout
Example
```
polymech-cli gridsearch ABW "recycling" --dry-run
# → [{"step":1,"lat":12.588582,"lng":-70.040465,"radius_km":3.540}, ...]
# [info] Dry-run complete in 3ms
```
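The dry-run output shape in the example can be reproduced with a small serializer. The sketch below infers the field order and precision (six decimals for coordinates, three for the radius) from that example line. The function name waypoints_to_json is hypothetical, and the real CLI may well build its output through the json package instead of snprintf.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Waypoint { int step; double lng, lat, radius_km; };

// Serializes waypoints as a compact JSON array. All fields are numeric,
// so no string escaping is needed.
std::string waypoints_to_json(const std::vector<Waypoint>& wps) {
    std::string out = "[";
    char buf[160];
    for (std::size_t i = 0; i < wps.size(); ++i) {
        std::snprintf(buf, sizeof buf,
                      "{\"step\":%d,\"lat\":%.6f,\"lng\":%.6f,\"radius_km\":%.3f}",
                      wps[i].step, wps[i].lat, wps[i].lng, wps[i].radius_km);
        if (i > 0) out += ",";
        out += buf;
    }
    out += "]";
    return out;
}
```

Fixed precision keeps the output stable across runs, which makes dry-run results easy to diff when grid options change.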
IPC worker mode
The worker subcommand routes the gridsearch message type. The handler currently echoes the payload. TODO: wire the full pipeline from the parsed JSON.
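The routing step can be pictured as a lookup from message type to handler. The sketch below is a hypothetical shape for it, not the worker's actual code: the Router class, its method names, and the error reply are assumptions. The gridsearch handler echoes its payload, mirroring the current stub behaviour.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// A handler receives the raw JSON payload and returns the JSON reply.
using Handler = std::function<std::string(const std::string&)>;

class Router {
public:
    void on(std::string type, Handler handler) {
        handlers_[std::move(type)] = std::move(handler);
    }

    // Looks up the handler for `type`; unknown types get an error reply.
    std::string dispatch(const std::string& type, const std::string& payload) const {
        const auto it = handlers_.find(type);
        if (it == handlers_.end()) {
            return "{\"error\":\"unknown message type\"}";
        }
        return it->second(payload);
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};

// Stub wiring: gridsearch echoes the payload until the parser is implemented.
Router make_worker_router() {
    Router router;
    router.on("gridsearch", [](const std::string& payload) { return payload; });
    return router;
}
```

Replacing the stub would then be a local change: the gridsearch lambda parses the payload into the same options the CLI builds from its flags and runs the pipeline, while the dispatch code stays as it is.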
Build Integration
Dependency graph
```
             ┌──────────┐
             │ polymech │  (the lib)
             │   -cli   │  (the binary)
             └────┬─────┘
     ┌────────────┼────────────────┐
     ▼            ▼                ▼
┌──────────┐ ┌──────────┐     ┌──────────┐
│  search  │ │   grid   │     │   ipc    │
└────┬─────┘ └────┬─────┘     └──────────┘
     │            │
     ▼            ▼
┌──────────┐ ┌───────────────┐
│   http   │ │  gadm_reader  │
└──────────┘ └────┬──────────┘
                  ▼
             ┌──────────┐
             │   geo    │  ← no deps (math only)
             └──────────┘
┌──────────┐
│   json   │  ← RapidJSON
└──────────┘
```
All packages depend on logger and json implicitly.
Testing
Unit tests (Catch2) — 62 tests, 248 assertions ✅
| Test file | Tests | Assertions | Validates |
|---|---|---|---|
| test_geo.cpp | 23 | 77 | Haversine, area, centroid, PIP, hex/square grid |
| test_gadm_reader.cpp | 18 | 53 | JSON parsing, GHS props, fallback resolution |
| test_grid.cpp | 13 | 105 | All 4 modes × 5 sorts, GHS filtering, PIP clipping |
| test_search.cpp | 8 | 13 | Config loading, key validation, error handling |
Integration test (Node.js)
- Existing orchestrator/test-ipc.mjs validates spawn/lifecycle/ping/job
- TODO: test-gridsearch.mjs for full pipeline via IPC
Deferred (Phase 2)
| Item | Reason |
|---|---|
| Enrichment (email scraping) | Complex + browser-dependent; keep in Node.js |
| SerpAPI response caching | State store managed by orchestrator for now |
| Protobuf framing | JSON IPC sufficient for current throughput |
| Multi-threaded search | Sequential is fine for SerpAPI rate limits |
| GEOS integration | Custom geo is sufficient for grid math |
| IPC gridsearch payload parser | Currently a stub; wire full pipeline from JSON |
| Supabase upsert in CLI | Use postgres package for batch insert |