# C++ Port — Boundaries Batch Generator Port the `boundaries.ts` pipeline to native C++ for **10–100× speedup** on geometry union + GHS raster enrichment. > [!NOTE] > Scaffolding is complete — CLI entry point, module stubs, vcpkg manifest, and CMake build are in place under `cpp/`. > The `geo_merge` and `ghs_enrich` modules are stubs pending implementation. --- ## Why C++ | Bottleneck (TS) | Root cause | C++ remedy | |---|---|---| | `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine | | `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc | | `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads | | `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle | --- ## Dependencies | Library | Role | Notes | |---|---|---| | **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns | | **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL | | **PROJ** | Coordinate transforms (Mollweide → WGS84) | `proj_trans` with cached `PJ*` handle | | **nlohmann/json** | JSON output (cache files) | Header-only | | **CLI11** | Argument parsing | FetchContent (not vcpkg) | | **SQLite3** | Already bundled with GDAL; direct access if needed | | Install via vcpkg manifest (`vcpkg.json` is in `cpp/`): ```bash cd cpp vcpkg install ``` --- ## Architecture ``` cpp/ ├── CMakeLists.txt # Project root — GDAL/GEOS/PROJ via vcpkg ├── CMakePresets.json # vcpkg, vcpkg-win, dev presets ├── vcpkg.json # Dependency manifest ├── packages/ │ ├── logger/ # Logging utility (kept from boilerplate) │ └── json/ # JSON utility (kept from boilerplate) ├── src/ │ ├── main.cpp # CLI entry, country loop, cache skip logic │ ├── gpkg_reader.h/cpp # GeoPackage → features (OGR) │ ├── geo_merge.h/cpp # Geometry union (GEOS) — stub │ ├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP — stub │ ├── pip.h # Inline ray-casting point-in-polygon │ └── types.h # BoundaryFeature, BoundaryResult structs └── tests/ └── unit/test_pip.cpp # Catch2 PIP unit test ``` --- ## Module Mapping (TS → C++) ### 1. `gpkg_reader` — GeoPackage Reading **TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts) - Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature` - OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing - Group features by `GID_{level}` column, collect geometries per group ```cpp GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr); OGRLayerH layer = GDALDatasetGetLayer(ds, 0); OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'"); OGRFeatureH feat; while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) { OGRGeometryH geom = OGR_F_GetGeometryRef(feat); // group by GID_{level}, collect geom clones } ``` ### 2. `geo_merge` — Geometry Union **TS source:** `mergeGeometries()` in gpkg-reader.ts - Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`** - GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs - For invalid inputs: `GEOSMakeValid` → then union ```cpp GEOSGeometry* collection = GEOSGeom_createCollection( GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size()); GEOSGeometry* merged = GEOSUnaryUnion(collection); if (!merged) { // MakeValid fallback for (auto& g : geoms) g = GEOSMakeValid(g); merged = GEOSUnaryUnion(collection); } ``` ### 3. `ghs_enrich` — GeoTIFF Raster Sampling **TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts) - Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster - Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle - Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++) - Keep stride-based sampling for large windows ```cpp // One-time setup PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr); // Read raster window GDALRasterBandH band = GDALGetRasterBand(ds, 1); std::vector buf(width * height); GDALRasterIO(band, GF_Read, minPx, minPy, width, height, buf.data(), width, height, GDT_Float32, 0, 0); // Fast pixel loop — no allocations for (int y = 0; y < height; y += stride) { for (int x = 0; x < width; x += stride) { float val = buf[y * width + x]; if (val <= 0 || val == 65535.f) continue; PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0); PJ_COORD out = proj_trans(transform, PJ_FWD, in); if (point_in_geometry(out.xy.x, out.xy.y, geom)) { totalVal += val * strideFactor; // ... weighted sums, candidate tracking } } } ``` ### 4. `main.cpp` — CLI + Cache Logic **TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts) - Read `gadm_continent.json` with nlohmann/json - Same skip-if-cached logic (check file existence) - Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json` - Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent) --- ## Expected Performance | Operation | TS (current) | C++ (expected) | Speedup | |---|---|---|---| | Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~1–3s | ∞ / ~100× | | GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× | | Full batch (all countries, all levels) | **Hours, may hang** | ~20–40 min | ~10× | --- ## Build & Integration > [!TIP] > `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly in `vcpkg.json` ensures the CMake targets are always available. ### Build & Run ```bash # 1. Install dependencies (one-time, from cpp/ directory) cd cpp vcpkg install # 2. Configure + build (Windows MSVC) cmake --preset vcpkg-win cmake --build build --config Release # 2b. Or Ninja (Linux/macOS) cmake --preset vcpkg cmake --build build --config Release # 3. Run ./build/Release/boundaries --country=all ./build/Release/boundaries --country=DEU --level=0 ./build/Release/boundaries --country=NGA --force ``` ### CLI Flags | Flag | Default | Description | |---|---|---| | `--country` | `all` | ISO3 country code, or `all` for batch | | `--level` | `-1` (all) | Admin level 0–5, or -1 for all levels | | `--cache-dir` | `cache/gadm` | Output directory for boundary JSON files | | `--gpkg` | `data/gadm_410-levels.gpkg` | Path to GADM GeoPackage | | `--force` | off | Regenerate even if cache file exists | Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation. --- ## Implementation Status | Module | Status | Notes | |---|---|---| | `main.cpp` | ✅ Done | CLI harness, cache logic, JSON output, OpenMP | | `gpkg_reader` | ✅ Scaffolded | OGR read + feature grouping — needs GEOS geometry collection | | `geo_merge` | ✅ Done | GEOSUnaryUnion + MakeValid fallback, WKB lossless roundtrip | | `ghs_enrich` | ✅ Done | GDAL windowed RasterIO + PROJ + OGR PIP + peak detection | | `pip.h` | ✅ Done | Inline ray-casting, unit-tested | | `types.h` | ✅ Done | `BoundaryFeature`, `BoundaryResult` structs | ### Next Steps 1. **`geo_merge`** — Wire up GEOS: geometry collection → `GEOSUnaryUnion` → GeoJSON export 2. **`gpkg_reader`** — Clone OGR geometries into GEOS handles (instead of direct GeoJSON export) 3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop 4. **Validation** — Compare output against existing TS-generated cache files 5. **Parallelism** — OpenMP on country loop (already wired, just needs `#pragma` testing) Each module can be tested independently against the existing JSON cache files for correctness.