gadm-ts/docs/cpp-port.md
2026-03-23 15:32:00 +01:00

8.4 KiB
Raw Blame History

C++ Port — Boundaries Batch Generator

Port the boundaries.ts pipeline to native C++ for 10100× speedup on geometry union + GHS raster enrichment.

Note

Assumes existing C++ boilerplate for argument parsing, logging, and JSON serialization is available.


Why C++

Bottleneck (TS) Root cause C++ remedy
polygon-clipping.union Pure-JS, throws on complex polys (DEU) GEOS GEOSUnaryUnion — battle-tested C engine
enrichFeatureWithGHS PIP loop GC pressure, obj alloc per pixel Stack-allocated ray-cast, zero alloc
readRasters (GeoTIFF) JS decoder, no tiling GDAL RasterIO — native COG/tiled reads
proj4 per pixel JS re-init overhead PROJ C API, reuse PJ* transform handle

Dependencies

Library Role Notes
GDAL/OGR 3.x GeoPackage read, GeoTIFF read, coordinate transforms Single dep for 3 concerns
GEOS 3.12+ GEOSUnaryUnion, simplify, valid geometry repair Used internally by GDAL
nlohmann/json JSON output (cache files) Header-only
SQLite3 Already bundled with GDAL; direct access if needed

On Windows: vcpkg install gdal geos nlohmann-json or use the OSGeo4W SDK.


Architecture

boundaries-cli
├── main.cpp            # CLI entry, country loop, cache skip logic
├── gpkg_reader.h/cpp   # GeoPackage → GeoJSON features
├── geo_merge.h/cpp     # Geometry union (GEOS)
├── ghs_enrich.h/cpp    # GeoTIFF raster sampling + PIP
├── pip.h               # Inline ray-casting point-in-polygon
└── types.h             # BoundaryFeature, BoundaryResult structs

Module Mapping (TS → C++)

1. gpkg_reader — GeoPackage Reading

TS source: gpkg-reader.ts

  • Replace custom WKB parser + better-sqlite3 with OGR GDALOpenEx + OGR_L_GetNextFeature
  • OGR natively reads GeoPackage and returns OGRGeometry* objects — no manual WKB parsing
  • Group features by GID_{level} column, collect geometries per group
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
OGRFeatureH feat;
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
    OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
    // group by GID_{level}, collect geom clones
}

2. geo_merge — Geometry Union

TS source: mergeGeometries() in gpkg-reader.ts

  • Replace polygon-clipping + pairwise fallback with GEOSUnaryUnion
  • GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
  • For invalid inputs: GEOSMakeValid → then union
GEOSGeometry* collection = GEOSGeom_createCollection(
    GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
GEOSGeometry* merged = GEOSUnaryUnion(collection);
if (!merged) {
    // MakeValid fallback
    for (auto& g : geoms) g = GEOSMakeValid(g);
    merged = GEOSUnaryUnion(collection);
}

3. ghs_enrich — GeoTIFF Raster Sampling

TS source: enrich-ghs.ts

  • Replace geotiff.js with GDAL GDALRasterIO — supports tiled/COG reads, much faster
  • Replace JS proj4() per-pixel with PROJ proj_trans using a cached PJ* handle
  • Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
  • Keep stride-based sampling for large windows
// One-time setup
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);

// Read raster window
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
std::vector<float> buf(width * height);
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
             buf.data(), width, height, GDT_Float32, 0, 0);

// Fast pixel loop — no allocations
for (int y = 0; y < height; y += stride) {
    for (int x = 0; x < width; x += stride) {
        float val = buf[y * width + x];
        if (val <= 0 || val == 65535.f) continue;
        PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
        PJ_COORD out = proj_trans(transform, PJ_FWD, in);
        if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
            totalVal += val * strideFactor;
            // ... weighted sums, candidate tracking
        }
    }
}

4. main.cpp — CLI + Cache Logic

TS source: boundaries.ts

  • Read gadm_continent.json with nlohmann/json
  • Same skip-if-cached logic (check file existence)
  • Output identical JSON format to cache/gadm/boundary_{code}_{level}.json
  • Optional: OpenMP #pragma omp parallel for on the country loop (each country is independent)

Expected Performance

Operation TS (current) C++ (expected) Speedup
Geometry union (DEU L0, 151 polys) Crashes / >10min ~13s ∞ / ~100×
GHS enrich (NGA L0, ~6M px window) ~210s ~515s ~15×
Full batch (all countries, all levels) Hours, may hang ~2040 min ~10×

Build & Integration

vcpkg Manifest — vcpkg.json

{
  "name": "gadm-boundaries",
  "version-string": "1.0.0",
  "dependencies": [
    "gdal",
    "geos",
    "proj",
    "nlohmann-json"
  ]
}

Tip

gdal pulls in proj and geos transitively on most triplets, but listing them explicitly ensures the CMake targets are always available.

CMake — CMakeLists.txt

cmake_minimum_required(VERSION 3.20)
project(gadm-boundaries LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# ── Dependencies via vcpkg find_package ──
find_package(GDAL CONFIG REQUIRED)        # target: GDAL::GDAL
find_package(GEOS CONFIG REQUIRED)        # target: GEOS::geos_c
find_package(PROJ CONFIG REQUIRED)        # target: PROJ::proj
find_package(nlohmann_json CONFIG REQUIRED) # target: nlohmann_json::nlohmann_json

# ── OpenMP (optional, for country-level parallelism) ──
find_package(OpenMP)

add_executable(boundaries
    src/main.cpp
    src/gpkg_reader.cpp
    src/geo_merge.cpp
    src/ghs_enrich.cpp
)

target_include_directories(boundaries PRIVATE src)

target_link_libraries(boundaries PRIVATE
    GDAL::GDAL
    GEOS::geos_c
    PROJ::proj
    nlohmann_json::nlohmann_json
)

if(OpenMP_CXX_FOUND)
    target_link_libraries(boundaries PRIVATE OpenMP::OpenMP_CXX)
    target_compile_definitions(boundaries PRIVATE HAS_OPENMP=1)
endif()

# Copy data files needed at runtime
add_custom_command(TARGET boundaries POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
        ${CMAKE_SOURCE_DIR}/../data/gadm_continent.json
        $<TARGET_FILE_DIR:boundaries>/gadm_continent.json
)

CMake Presets — CMakePresets.json

{
  "version": 6,
  "configurePresets": [
    {
      "name": "vcpkg",
      "displayName": "vcpkg (default)",
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/build",
      "cacheVariables": {
        "CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
        "CMAKE_BUILD_TYPE": "Release"
      }
    },
    {
      "name": "vcpkg-win",
      "inherits": "vcpkg",
      "displayName": "vcpkg (Windows MSVC)",
      "generator": "Visual Studio 17 2022",
      "cacheVariables": {
        "VCPKG_TARGET_TRIPLET": "x64-windows"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "release",
      "configurePreset": "vcpkg",
      "configuration": "Release"
    }
  ]
}

Build & Run

# 1. Install dependencies (one-time, from project root with vcpkg.json)
vcpkg install

# 2. Configure + build
cmake --preset vcpkg          # or vcpkg-win on Windows
cmake --build build --config Release

# 3. Run
./build/boundaries --country=all
./build/boundaries --country=DEU --level=0

Output goes to the same cache/gadm/ directory — the Node.js server reads these cache files at runtime, so the C++ tool is a drop-in replacement for batch generation.


Implementation Order

  1. gpkg_reader — OGR-based GeoPackage read + feature grouping
  2. geo_merge — GEOS union with MakeValid fallback
  3. ghs_enrich — GDAL raster read + PROJ transform + PIP loop
  4. main — CLI harness, cache logic, JSON output
  5. Parallelism — OpenMP on country loop (optional, easy win)

Each module can be tested independently against the existing JSON cache files for correctness.