C++ Port — Boundaries Batch Generator
Port the boundaries.ts pipeline to native C++ for an expected 10–100× speedup on geometry union and GHS raster enrichment.
Note
Assumes existing C++ boilerplate for argument parsing, logging, and JSON serialization is available.
Why C++
| Bottleneck (TS) | Root cause | C++ remedy |
|---|---|---|
| polygon-clipping.union | Pure-JS, throws on complex polys (DEU) | GEOS GEOSUnaryUnion (battle-tested C++ engine with a C API) |
| enrichFeatureWithGHS PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
| readRasters (GeoTIFF) | JS decoder, no tiling | GDAL RasterIO: native COG/tiled reads |
| proj4 per pixel | JS re-init overhead | PROJ C API, reuse PJ* transform handle |
Dependencies
| Library | Role | Notes |
|---|---|---|
| GDAL/OGR 3.x | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
| GEOS 3.12+ | GEOSUnaryUnion, simplify, valid geometry repair | Used internally by GDAL |
| nlohmann/json | JSON output (cache files) | Header-only |
| SQLite3 | Direct access if needed | Already bundled with GDAL |
On Windows: vcpkg install gdal geos nlohmann-json or use the OSGeo4W SDK.
Architecture
boundaries-cli
├── main.cpp # CLI entry, country loop, cache skip logic
├── gpkg_reader.h/cpp # GeoPackage → GeoJSON features
├── geo_merge.h/cpp # Geometry union (GEOS)
├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP
├── pip.h # Inline ray-casting point-in-polygon
└── types.h # BoundaryFeature, BoundaryResult structs
Module Mapping (TS → C++)
1. gpkg_reader — GeoPackage Reading
TS source: gpkg-reader.ts
- Replace custom WKB parser + better-sqlite3 with OGR GDALOpenEx + OGR_L_GetNextFeature
- OGR natively reads GeoPackage and returns OGRGeometry* objects, so no manual WKB parsing is needed
- Group features by GID_{level} column, collect geometries per group
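#include <gdal.h>
#include <ogr_api.h>

GDALAllRegister();  // register drivers once, before any open call
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
if (!ds) { /* open failed: missing file or GPKG driver */ }
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
OGRFeatureH feat;
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
    // Geometry is owned by the feature; clone it (OGR_G_Clone) to keep it
    OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
    // group by GID_{level}, collect geom clones
    OGR_F_Destroy(feat);  // features from GetNextFeature are caller-owned
}
GDALClose(ds);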
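The grouping step mentioned in the loop comment could be sketched as below. This is only an illustration: `RawFeature` and `group_by_gid` are hypothetical names, and the geometry is a template parameter so the sketch compiles without GDAL. In the real reader it would be an owned clone of the OGR geometry.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record: one layer row reduced to its grouping key
// (the value of the GID_{level} column) and its geometry.
template <typename Geom>
struct RawFeature {
    std::string gid;
    Geom geom;
};

// Collect geometries per GID value, preserving read order within a group.
template <typename Geom>
std::unordered_map<std::string, std::vector<Geom>>
group_by_gid(std::vector<RawFeature<Geom>> features) {
    std::unordered_map<std::string, std::vector<Geom>> groups;
    for (auto& f : features) {
        groups[f.gid].push_back(std::move(f.geom));
    }
    return groups;
}
```

Each resulting vector is what geo_merge would receive as the input to one union call.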
2. geo_merge — Geometry Union
TS source: mergeGeometries() in gpkg-reader.ts
- Replace polygon-clipping + pairwise fallback with GEOSUnaryUnion
- GEOS handles the edge cases (self-intersections, degeneracies) that crash the JS libs
- For invalid inputs: GEOSMakeValid → then union
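GEOSGeometry* collection = GEOSGeom_createCollection(
    GEOS_GEOMETRYCOLLECTION, geoms.data(), static_cast<unsigned>(geoms.size()));
// The collection now owns the input geometries; do not free them separately.
GEOSGeometry* merged = GEOSUnaryUnion(collection);
if (!merged) {
    // MakeValid fallback: repair each member, rebuild the collection, retry
    std::vector<GEOSGeometry*> fixed;
    for (int i = 0; i < GEOSGetNumGeometries(collection); ++i)
        if (GEOSGeometry* v = GEOSMakeValid(GEOSGetGeometryN(collection, i))) fixed.push_back(v);
    GEOSGeometry* repaired = GEOSGeom_createCollection(
        GEOS_GEOMETRYCOLLECTION, fixed.data(), static_cast<unsigned>(fixed.size()));
    merged = GEOSUnaryUnion(repaired);
    GEOSGeom_destroy(repaired);
}
GEOSGeom_destroy(collection);  // also frees the member geometries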
3. ghs_enrich — GeoTIFF Raster Sampling
TS source: enrich-ghs.ts
- Replace geotiff.js with GDAL GDALRasterIO, which supports tiled/COG reads and is much faster
- Replace JS proj4() per-pixel with PROJ proj_trans using a cached PJ* handle
- Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
- Keep stride-based sampling for large windows
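As a rough illustration of the stride-based sampling in the last bullet, the sketch below sums a raster window while visiting only every `stride`-th pixel. It assumes each sampled pixel stands in for a stride × stride block, so values are scaled by `stride * stride`. That weighting and the `strided_sum` name are assumptions made for this example, not taken from the TS source.

```cpp
#include <cstddef>
#include <vector>

// Sum a row-major raster window, visiting every `stride`-th pixel on each axis.
// Assumption: one sample represents a stride x stride block, so its value is
// scaled by stride * stride to keep totals comparable to a full scan.
double strided_sum(const std::vector<float>& buf, int width, int height, int stride) {
    const double strideFactor = static_cast<double>(stride) * stride;
    double total = 0.0;
    for (int y = 0; y < height; y += stride) {
        for (int x = 0; x < width; x += stride) {
            const float val = buf[static_cast<std::size_t>(y) * width + x];
            if (val <= 0.0f) continue;  // skip empty or negative pixels
            total += val * strideFactor;
        }
    }
    return total;
}
```

On a uniform window the strided total matches the full scan. On real data it is an estimate whose error grows with the stride.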
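// One-time setup. World Mollweide is registered as ESRI:54009; it is not an EPSG code.
PJ* raw = proj_create_crs_to_crs(ctx, "ESRI:54009", "EPSG:4326", nullptr);  // check for nullptr
// EPSG:4326 is lat/lon in PROJ 6+; normalize so the output is x = lon, y = lat
PJ* transform = proj_normalize_for_visualization(ctx, raw);
proj_destroy(raw);
// Read raster window
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
std::vector<float> buf(static_cast<size_t>(width) * height);
CPLErr err = GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
                          buf.data(), width, height, GDT_Float32, 0, 0);
if (err != CE_None) { /* read failed: skip this feature or abort */ }
// Fast pixel loop, no allocations
for (int y = 0; y < height; y += stride) {
    for (int x = 0; x < width; x += stride) {
        float val = buf[static_cast<size_t>(y) * width + x];
        if (val <= 0 || val == 65535.f) continue;  // skip empty pixels and the 65535 sentinel
        PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
        PJ_COORD out = proj_trans(transform, PJ_FWD, in);
        if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
            totalVal += val * strideFactor;
            // ... weighted sums, candidate tracking
        }
    }
}
proj_destroy(transform);  // once, after all features are processed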
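The `point_in_geometry` call above is not defined in this document. A minimal sketch of the inline ray-casting test that pip.h might contain is shown below, using the even-odd rule with hole handling. The `Point`, `Ring`, `point_in_ring` and `point_in_polygon` names are hypothetical, and a multipolygon test would simply try each polygon in turn.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };
using Ring = std::vector<Point>;  // open or closed ring; the last edge wraps around

// Even-odd ray casting: count crossings of a horizontal ray from (px, py)
// toward +x with the ring edges. An odd count means the point is inside.
inline bool point_in_ring(double px, double py, const Ring& ring) {
    bool inside = false;
    const std::size_t n = ring.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Point& a = ring[i];
        const Point& b = ring[j];
        const bool straddles = (a.y > py) != (b.y > py);
        if (straddles && px < (b.x - a.x) * (py - a.y) / (b.y - a.y) + a.x) {
            inside = !inside;
        }
    }
    return inside;
}

// Polygon = exterior ring followed by zero or more holes.
inline bool point_in_polygon(double px, double py, const std::vector<Ring>& rings) {
    if (rings.empty() || !point_in_ring(px, py, rings[0])) return false;
    for (std::size_t h = 1; h < rings.size(); ++h) {
        if (point_in_ring(px, py, rings[h])) return false;  // inside a hole
    }
    return true;
}
```

The loop works on plain doubles and references, so it allocates nothing per pixel. Points lying exactly on an edge may be classified either way, which is usually acceptable for raster sampling.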
4. main.cpp — CLI + Cache Logic
TS source: boundaries.ts
- Read gadm_continent.json with nlohmann/json
- Same skip-if-cached logic (check file existence)
- Output identical JSON format to cache/gadm/boundary_{code}_{level}.json
- Optional: OpenMP #pragma omp parallel for on the country loop (each country is independent; open a separate GDAL dataset handle per thread, since handles must not be shared concurrently)
Expected Performance
| Operation | TS (current) | C++ (expected) | Speedup |
|---|---|---|---|
| Geometry union (DEU L0, 151 polys) | Crashes / >10min | ~1–3s | ∞ / ~100× |
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× |
| Full batch (all countries, all levels) | Hours, may hang | ~20–40 min | ~10× |
Build & Integration
vcpkg Manifest — vcpkg.json
{
"name": "gadm-boundaries",
"version-string": "1.0.0",
"dependencies": [
"gdal",
"geos",
"proj",
"nlohmann-json"
]
}
Tip
gdal pulls in proj and geos transitively on most triplets, but listing them explicitly ensures the CMake targets are always available.
CMake — CMakeLists.txt
cmake_minimum_required(VERSION 3.25)  # CMakePresets.json schema version 6 requires CMake 3.25+
project(gadm-boundaries LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# ── Dependencies via vcpkg find_package ──
find_package(GDAL CONFIG REQUIRED) # target: GDAL::GDAL
find_package(GEOS CONFIG REQUIRED) # target: GEOS::geos_c
find_package(PROJ CONFIG REQUIRED) # target: PROJ::proj
find_package(nlohmann_json CONFIG REQUIRED) # target: nlohmann_json::nlohmann_json
# ── OpenMP (optional, for country-level parallelism) ──
find_package(OpenMP)
add_executable(boundaries
src/main.cpp
src/gpkg_reader.cpp
src/geo_merge.cpp
src/ghs_enrich.cpp
)
target_include_directories(boundaries PRIVATE src)
target_link_libraries(boundaries PRIVATE
GDAL::GDAL
GEOS::geos_c
PROJ::proj
nlohmann_json::nlohmann_json
)
if(OpenMP_CXX_FOUND)
target_link_libraries(boundaries PRIVATE OpenMP::OpenMP_CXX)
target_compile_definitions(boundaries PRIVATE HAS_OPENMP=1)
endif()
# Copy data files needed at runtime
add_custom_command(TARGET boundaries POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different
${CMAKE_SOURCE_DIR}/../data/gadm_continent.json
$<TARGET_FILE_DIR:boundaries>/gadm_continent.json
)
CMake Presets — CMakePresets.json
{
"version": 6,
"configurePresets": [
{
"name": "vcpkg",
"displayName": "vcpkg (default)",
"generator": "Ninja",
"binaryDir": "${sourceDir}/build",
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "vcpkg-win",
"inherits": "vcpkg",
"displayName": "vcpkg (Windows MSVC)",
"generator": "Visual Studio 17 2022",
"cacheVariables": {
"VCPKG_TARGET_TRIPLET": "x64-windows"
}
}
],
"buildPresets": [
{
"name": "release",
"configurePreset": "vcpkg",
"configuration": "Release"
}
]
}
Build & Run
# 1. Install dependencies (one-time, from project root with vcpkg.json)
vcpkg install
# 2. Configure + build
cmake --preset vcpkg # or vcpkg-win on Windows
cmake --build build --config Release
# 3. Run (with the Visual Studio generator the binary lands in build/Release/ instead)
./build/boundaries --country=all
./build/boundaries --country=DEU --level=0
Output goes to the same cache/gadm/ directory — the Node.js server reads these cache files at runtime, so the C++ tool is a drop-in replacement for batch generation.
Implementation Order
1. gpkg_reader: OGR-based GeoPackage read + feature grouping
2. geo_merge: GEOS union with MakeValid fallback
3. ghs_enrich: GDAL raster read + PROJ transform + PIP loop
4. main: CLI harness, cache logic, JSON output
5. Parallelism: OpenMP on country loop (optional, easy win)
Each module can be tested independently against the existing JSON cache files for correctness.