gadm-ts/docs/cpp-port.md
2026-03-23 15:32:00 +01:00

274 lines
8.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# C++ Port — Boundaries Batch Generator
Port the `boundaries.ts` pipeline to native C++ for **10100× speedup** on geometry union + GHS raster enrichment.
> [!NOTE]
> Assumes existing C++ boilerplate for argument parsing, logging, and JSON serialization is available.
---
## Why C++
| Bottleneck (TS) | Root cause | C++ remedy |
|---|---|---|
| `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine |
| `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
| `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads |
| `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle |
---
## Dependencies
| Library | Role | Notes |
|---|---|---|
| **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
| **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL |
| **nlohmann/json** | JSON output (cache files) | Header-only |
| **SQLite3** | Already bundled with GDAL; direct access if needed | |
On Windows: `vcpkg install gdal geos nlohmann-json` or use the OSGeo4W SDK.
---
## Architecture
```
boundaries-cli
├── main.cpp # CLI entry, country loop, cache skip logic
├── gpkg_reader.h/cpp # GeoPackage → GeoJSON features
├── geo_merge.h/cpp # Geometry union (GEOS)
├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP
├── pip.h # Inline ray-casting point-in-polygon
└── types.h # BoundaryFeature, BoundaryResult structs
```
---
## Module Mapping (TS → C++)
### 1. `gpkg_reader` — GeoPackage Reading
**TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts)
- Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature`
- OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing
- Group features by `GID_{level}` column, collect geometries per group
```cpp
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
OGRFeatureH feat;
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
// group by GID_{level}, collect geom clones
}
```
### 2. `geo_merge` — Geometry Union
**TS source:** `mergeGeometries()` in gpkg-reader.ts
- Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`**
- GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
- For invalid inputs: `GEOSMakeValid` → then union
```cpp
GEOSGeometry* collection = GEOSGeom_createCollection(
GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
GEOSGeometry* merged = GEOSUnaryUnion(collection);
if (!merged) {
// MakeValid fallback
for (auto& g : geoms) g = GEOSMakeValid(g);
merged = GEOSUnaryUnion(collection);
}
```
### 3. `ghs_enrich` — GeoTIFF Raster Sampling
**TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts)
- Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster
- Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle
- Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
- Keep stride-based sampling for large windows
```cpp
// One-time setup
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);
// Read raster window
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
std::vector<float> buf(width * height);
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
buf.data(), width, height, GDT_Float32, 0, 0);
// Fast pixel loop — no allocations
for (int y = 0; y < height; y += stride) {
for (int x = 0; x < width; x += stride) {
float val = buf[y * width + x];
if (val <= 0 || val == 65535.f) continue;
PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
PJ_COORD out = proj_trans(transform, PJ_FWD, in);
if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
totalVal += val * strideFactor;
// ... weighted sums, candidate tracking
}
}
}
```
### 4. `main.cpp` — CLI + Cache Logic
**TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts)
- Read `gadm_continent.json` with nlohmann/json
- Same skip-if-cached logic (check file existence)
- Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json`
- Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent)
---
## Expected Performance
| Operation | TS (current) | C++ (expected) | Speedup |
|---|---|---|---|
| Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~13s | ∞ / ~100× |
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~515s | ~15× |
| Full batch (all countries, all levels) | **Hours, may hang** | ~2040 min | ~10× |
---
## Build & Integration
### vcpkg Manifest — `vcpkg.json`
```json
{
"name": "gadm-boundaries",
"version-string": "1.0.0",
"dependencies": [
"gdal",
"geos",
"proj",
"nlohmann-json"
]
}
```
> [!TIP]
> `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly ensures the CMake targets are always available.
### CMake — `CMakeLists.txt`
```cmake
cmake_minimum_required(VERSION 3.20)
project(gadm-boundaries LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# ── Dependencies via vcpkg find_package ──
find_package(GDAL CONFIG REQUIRED) # target: GDAL::GDAL
find_package(GEOS CONFIG REQUIRED) # target: GEOS::geos_c
find_package(PROJ CONFIG REQUIRED) # target: PROJ::proj
find_package(nlohmann_json CONFIG REQUIRED) # target: nlohmann_json::nlohmann_json
# ── OpenMP (optional, for country-level parallelism) ──
find_package(OpenMP)
add_executable(boundaries
src/main.cpp
src/gpkg_reader.cpp
src/geo_merge.cpp
src/ghs_enrich.cpp
)
target_include_directories(boundaries PRIVATE src)
target_link_libraries(boundaries PRIVATE
GDAL::GDAL
GEOS::geos_c
PROJ::proj
nlohmann_json::nlohmann_json
)
if(OpenMP_CXX_FOUND)
target_link_libraries(boundaries PRIVATE OpenMP::OpenMP_CXX)
target_compile_definitions(boundaries PRIVATE HAS_OPENMP=1)
endif()
# Copy data files needed at runtime
add_custom_command(TARGET boundaries POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different
${CMAKE_SOURCE_DIR}/../data/gadm_continent.json
$<TARGET_FILE_DIR:boundaries>/gadm_continent.json
)
```
### CMake Presets — `CMakePresets.json`
```json
{
"version": 6,
"configurePresets": [
{
"name": "vcpkg",
"displayName": "vcpkg (default)",
"generator": "Ninja",
"binaryDir": "${sourceDir}/build",
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "vcpkg-win",
"inherits": "vcpkg",
"displayName": "vcpkg (Windows MSVC)",
"generator": "Visual Studio 17 2022",
"cacheVariables": {
"VCPKG_TARGET_TRIPLET": "x64-windows"
}
}
],
"buildPresets": [
{
"name": "release",
"configurePreset": "vcpkg",
"configuration": "Release"
}
]
}
```
### Build & Run
```bash
# 1. Install dependencies (one-time, from project root with vcpkg.json)
vcpkg install
# 2. Configure + build
cmake --preset vcpkg # or vcpkg-win on Windows
cmake --build build --config Release
# 3. Run
./build/boundaries --country=all
./build/boundaries --country=DEU --level=0
```
Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation.
---
## Implementation Order
1. **`gpkg_reader`** — OGR-based GeoPackage read + feature grouping
2. **`geo_merge`** — GEOS union with MakeValid fallback
3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop
4. **`main`** — CLI harness, cache logic, JSON output
5. **Parallelism** — OpenMP on country loop (optional, easy win)
Each module can be tested independently against the existing JSON cache files for correctness.