# C++ Port — Boundaries Batch Generator Port the `boundaries.ts` pipeline to native C++ for **10–100× speedup** on geometry union + GHS raster enrichment. > [!NOTE] > Assumes existing C++ boilerplate for argument parsing, logging, and JSON serialization is available. --- ## Why C++ | Bottleneck (TS) | Root cause | C++ remedy | |---|---|---| | `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine | | `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc | | `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads | | `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle | --- ## Dependencies | Library | Role | Notes | |---|---|---| | **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns | | **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL | | **nlohmann/json** | JSON output (cache files) | Header-only | | **SQLite3** | Already bundled with GDAL; direct access if needed | | On Windows: `vcpkg install gdal geos nlohmann-json` or use the OSGeo4W SDK. --- ## Architecture ``` boundaries-cli ├── main.cpp # CLI entry, country loop, cache skip logic ├── gpkg_reader.h/cpp # GeoPackage → GeoJSON features ├── geo_merge.h/cpp # Geometry union (GEOS) ├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP ├── pip.h # Inline ray-casting point-in-polygon └── types.h # BoundaryFeature, BoundaryResult structs ``` --- ## Module Mapping (TS → C++) ### 1. `gpkg_reader` — GeoPackage Reading **TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts) - Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature` - OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing - Group features by `GID_{level}` column, collect geometries per group ```cpp GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr); OGRLayerH layer = GDALDatasetGetLayer(ds, 0); OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'"); OGRFeatureH feat; while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) { OGRGeometryH geom = OGR_F_GetGeometryRef(feat); // group by GID_{level}, collect geom clones } ``` ### 2. `geo_merge` — Geometry Union **TS source:** `mergeGeometries()` in gpkg-reader.ts - Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`** - GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs - For invalid inputs: `GEOSMakeValid` → then union ```cpp GEOSGeometry* collection = GEOSGeom_createCollection( GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size()); GEOSGeometry* merged = GEOSUnaryUnion(collection); if (!merged) { // MakeValid fallback for (auto& g : geoms) g = GEOSMakeValid(g); merged = GEOSUnaryUnion(collection); } ``` ### 3. `ghs_enrich` — GeoTIFF Raster Sampling **TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts) - Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster - Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle - Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++) - Keep stride-based sampling for large windows ```cpp // One-time setup PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr); // Read raster window GDALRasterBandH band = GDALGetRasterBand(ds, 1); std::vector buf(width * height); GDALRasterIO(band, GF_Read, minPx, minPy, width, height, buf.data(), width, height, GDT_Float32, 0, 0); // Fast pixel loop — no allocations for (int y = 0; y < height; y += stride) { for (int x = 0; x < width; x += stride) { float val = buf[y * width + x]; if (val <= 0 || val == 65535.f) continue; PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0); PJ_COORD out = proj_trans(transform, PJ_FWD, in); if (point_in_geometry(out.xy.x, out.xy.y, geom)) { totalVal += val * strideFactor; // ... weighted sums, candidate tracking } } } ``` ### 4. `main.cpp` — CLI + Cache Logic **TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts) - Read `gadm_continent.json` with nlohmann/json - Same skip-if-cached logic (check file existence) - Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json` - Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent) --- ## Expected Performance | Operation | TS (current) | C++ (expected) | Speedup | |---|---|---|---| | Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~1–3s | ∞ / ~100× | | GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× | | Full batch (all countries, all levels) | **Hours, may hang** | ~20–40 min | ~10× | --- ## Build & Integration ### vcpkg Manifest — `vcpkg.json` ```json { "name": "gadm-boundaries", "version-string": "1.0.0", "dependencies": [ "gdal", "geos", "proj", "nlohmann-json" ] } ``` > [!TIP] > `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly ensures the CMake targets are always available. ### CMake — `CMakeLists.txt` ```cmake cmake_minimum_required(VERSION 3.20) project(gadm-boundaries LANGUAGES CXX) set(CMAKE_CXX_STANDARD 20) set(CMAKE_CXX_STANDARD_REQUIRED ON) # ── Dependencies via vcpkg find_package ── find_package(GDAL CONFIG REQUIRED) # target: GDAL::GDAL find_package(GEOS CONFIG REQUIRED) # target: GEOS::geos_c find_package(PROJ CONFIG REQUIRED) # target: PROJ::proj find_package(nlohmann_json CONFIG REQUIRED) # target: nlohmann_json::nlohmann_json # ── OpenMP (optional, for country-level parallelism) ── find_package(OpenMP) add_executable(boundaries src/main.cpp src/gpkg_reader.cpp src/geo_merge.cpp src/ghs_enrich.cpp ) target_include_directories(boundaries PRIVATE src) target_link_libraries(boundaries PRIVATE GDAL::GDAL GEOS::geos_c PROJ::proj nlohmann_json::nlohmann_json ) if(OpenMP_CXX_FOUND) target_link_libraries(boundaries PRIVATE OpenMP::OpenMP_CXX) target_compile_definitions(boundaries PRIVATE HAS_OPENMP=1) endif() # Copy data files needed at runtime add_custom_command(TARGET boundaries POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different ${CMAKE_SOURCE_DIR}/../data/gadm_continent.json $/gadm_continent.json ) ``` ### CMake Presets — `CMakePresets.json` ```json { "version": 6, "configurePresets": [ { "name": "vcpkg", "displayName": "vcpkg (default)", "generator": "Ninja", "binaryDir": "${sourceDir}/build", "cacheVariables": { "CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake", "CMAKE_BUILD_TYPE": "Release" } }, { "name": "vcpkg-win", "inherits": "vcpkg", "displayName": "vcpkg (Windows MSVC)", "generator": "Visual Studio 17 2022", "cacheVariables": { "VCPKG_TARGET_TRIPLET": "x64-windows" } } ], "buildPresets": [ { "name": "release", "configurePreset": "vcpkg", "configuration": "Release" } ] } ``` ### Build & Run ```bash # 1. Install dependencies (one-time, from project root with vcpkg.json) vcpkg install # 2. Configure + build cmake --preset vcpkg # or vcpkg-win on Windows cmake --build build --config Release # 3. Run ./build/boundaries --country=all ./build/boundaries --country=DEU --level=0 ``` Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation. --- ## Implementation Order 1. **`gpkg_reader`** — OGR-based GeoPackage read + feature grouping 2. **`geo_merge`** — GEOS union with MakeValid fallback 3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop 4. **`main`** — CLI harness, cache logic, JSON output 5. **Parallelism** — OpenMP on country loop (optional, easy win) Each module can be tested independently against the existing JSON cache files for correctness.