8.2 KiB
C++ Port — Boundaries Batch Generator
Port the boundaries.ts pipeline to native C++ for 10–100× speedup on geometry union + GHS raster enrichment.
Note
Scaffolding is complete — CLI entry point, module stubs, vcpkg manifest, and CMake build are in place under
cpp/. Thegeo_mergeandghs_enrichmodules are stubs pending implementation.
Why C++
| Bottleneck (TS) | Root cause | C++ remedy |
|---|---|---|
polygon-clipping.union |
Pure-JS, throws on complex polys (DEU) | GEOS GEOSUnaryUnion — battle-tested C engine |
enrichFeatureWithGHS PIP loop |
GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
readRasters (GeoTIFF) |
JS decoder, no tiling | GDAL RasterIO — native COG/tiled reads |
proj4 per pixel |
JS re-init overhead | PROJ C API, reuse PJ* transform handle |
Dependencies
| Library | Role | Notes |
|---|---|---|
| GDAL/OGR 3.x | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
| GEOS 3.12+ | GEOSUnaryUnion, simplify, valid geometry repair |
Used internally by GDAL |
| PROJ | Coordinate transforms (Mollweide → WGS84) | proj_trans with cached PJ* handle |
| nlohmann/json | JSON output (cache files) | Header-only |
| CLI11 | Argument parsing | FetchContent (not vcpkg) |
| SQLite3 | Already bundled with GDAL; direct access if needed |
Install via vcpkg manifest (vcpkg.json is in cpp/):
cd cpp
vcpkg install
Architecture
cpp/
├── CMakeLists.txt # Project root — GDAL/GEOS/PROJ via vcpkg
├── CMakePresets.json # vcpkg, vcpkg-win, dev presets
├── vcpkg.json # Dependency manifest
├── packages/
│ ├── logger/ # Logging utility (kept from boilerplate)
│ └── json/ # JSON utility (kept from boilerplate)
├── src/
│ ├── main.cpp # CLI entry, country loop, cache skip logic
│ ├── gpkg_reader.h/cpp # GeoPackage → features (OGR)
│ ├── geo_merge.h/cpp # Geometry union (GEOS) — stub
│ ├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP — stub
│ ├── pip.h # Inline ray-casting point-in-polygon
│ └── types.h # BoundaryFeature, BoundaryResult structs
└── tests/
└── unit/test_pip.cpp # Catch2 PIP unit test
Module Mapping (TS → C++)
1. gpkg_reader — GeoPackage Reading
TS source: gpkg-reader.ts
- Replace custom WKB parser +
better-sqlite3with OGRGDALOpenEx+OGR_L_GetNextFeature - OGR natively reads GeoPackage and returns
OGRGeometry*objects — no manual WKB parsing - Group features by
GID_{level}column, collect geometries per group
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
OGRFeatureH feat;
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
// group by GID_{level}, collect geom clones
}
2. geo_merge — Geometry Union
TS source: mergeGeometries() in gpkg-reader.ts
- Replace
polygon-clipping+ pairwise fallback withGEOSUnaryUnion - GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
- For invalid inputs:
GEOSMakeValid→ then union
GEOSGeometry* collection = GEOSGeom_createCollection(
GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
GEOSGeometry* merged = GEOSUnaryUnion(collection);
if (!merged) {
// MakeValid fallback
for (auto& g : geoms) g = GEOSMakeValid(g);
merged = GEOSUnaryUnion(collection);
}
3. ghs_enrich — GeoTIFF Raster Sampling
TS source: enrich-ghs.ts
- Replace
geotiff.jswith GDALGDALRasterIO— supports tiled/COG reads, much faster - Replace JS
proj4()per-pixel with PROJproj_transusing a cachedPJ*handle - Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
- Keep stride-based sampling for large windows
// One-time setup
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);
// Read raster window
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
std::vector<float> buf(width * height);
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
buf.data(), width, height, GDT_Float32, 0, 0);
// Fast pixel loop — no allocations
for (int y = 0; y < height; y += stride) {
for (int x = 0; x < width; x += stride) {
float val = buf[y * width + x];
if (val <= 0 || val == 65535.f) continue;
PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
PJ_COORD out = proj_trans(transform, PJ_FWD, in);
if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
totalVal += val * strideFactor;
// ... weighted sums, candidate tracking
}
}
}
4. main.cpp — CLI + Cache Logic
TS source: boundaries.ts
- Read
gadm_continent.jsonwith nlohmann/json - Same skip-if-cached logic (check file existence)
- Output identical JSON format to
cache/gadm/boundary_{code}_{level}.json - Optional: OpenMP
#pragma omp parallel foron the country loop (each country is independent)
Expected Performance
| Operation | TS (current) | C++ (expected) | Speedup |
|---|---|---|---|
| Geometry union (DEU L0, 151 polys) | Crashes / >10min | ~1–3s | ∞ / ~100× |
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× |
| Full batch (all countries, all levels) | Hours, may hang | ~20–40 min | ~10× |
Build & Integration
Tip
gdalpulls inprojandgeostransitively on most triplets, but listing them explicitly invcpkg.jsonensures the CMake targets are always available.
Build & Run
# 1. Install dependencies (one-time, from cpp/ directory)
cd cpp
vcpkg install
# 2. Configure + build (Windows MSVC)
cmake --preset vcpkg-win
cmake --build build --config Release
# 2b. Or Ninja (Linux/macOS)
cmake --preset vcpkg
cmake --build build --config Release
# 3. Run
./build/Release/boundaries --country=all
./build/Release/boundaries --country=DEU --level=0
./build/Release/boundaries --country=NGA --force
CLI Flags
| Flag | Default | Description |
|---|---|---|
--country |
all |
ISO3 country code, or all for batch |
--level |
-1 (all) |
Admin level 0–5, or -1 for all levels |
--cache-dir |
cache/gadm |
Output directory for boundary JSON files |
--gpkg |
data/gadm_410-levels.gpkg |
Path to GADM GeoPackage |
--force |
off | Regenerate even if cache file exists |
Output goes to the same cache/gadm/ directory — the Node.js server reads these cache files at runtime, so the C++ tool is a drop-in replacement for batch generation.
Implementation Status
| Module | Status | Notes |
|---|---|---|
main.cpp |
✅ Done | CLI harness, cache logic, JSON output, OpenMP |
gpkg_reader |
✅ Scaffolded | OGR read + feature grouping — needs GEOS geometry collection |
geo_merge |
✅ Done | GEOSUnaryUnion + MakeValid fallback, WKB lossless roundtrip |
ghs_enrich |
✅ Done | GDAL windowed RasterIO + PROJ + OGR PIP + peak detection |
pip.h |
✅ Done | Inline ray-casting, unit-tested |
types.h |
✅ Done | BoundaryFeature, BoundaryResult structs |
Next Steps
geo_merge— Wire up GEOS: geometry collection →GEOSUnaryUnion→ GeoJSON exportgpkg_reader— Clone OGR geometries into GEOS handles (instead of direct GeoJSON export)ghs_enrich— GDAL raster read + PROJ transform + PIP loop- Validation — Compare output against existing TS-generated cache files
- Parallelism — OpenMP on country loop (already wired, just needs
#pragmatesting)
Each module can be tested independently against the existing JSON cache files for correctness.