274 lines
8.4 KiB
Markdown
274 lines
8.4 KiB
Markdown
# C++ Port — Boundaries Batch Generator
|
||
|
||
Port the `boundaries.ts` pipeline to native C++ for **10–100× speedup** on geometry union + GHS raster enrichment.
|
||
|
||
> [!NOTE]
|
||
> Assumes existing C++ boilerplate for argument parsing, logging, and JSON serialization is available.
|
||
|
||
---
|
||
|
||
## Why C++
|
||
|
||
| Bottleneck (TS) | Root cause | C++ remedy |
|
||
|---|---|---|
|
||
| `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine |
|
||
| `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
|
||
| `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads |
|
||
| `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle |
|
||
|
||
---
|
||
|
||
## Dependencies
|
||
|
||
| Library | Role | Notes |
|
||
|---|---|---|
|
||
| **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
|
||
| **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL |
|
||
| **nlohmann/json** | JSON output (cache files) | Header-only |
|
||
| **SQLite3** | Already bundled with GDAL; direct access if needed | |
|
||
|
||
On Windows: `vcpkg install gdal geos nlohmann-json` or use the OSGeo4W SDK.
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
```
|
||
boundaries-cli
|
||
├── main.cpp # CLI entry, country loop, cache skip logic
|
||
├── gpkg_reader.h/cpp # GeoPackage → GeoJSON features
|
||
├── geo_merge.h/cpp # Geometry union (GEOS)
|
||
├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP
|
||
├── pip.h # Inline ray-casting point-in-polygon
|
||
└── types.h # BoundaryFeature, BoundaryResult structs
|
||
```
|
||
|
||
---
|
||
|
||
## Module Mapping (TS → C++)
|
||
|
||
### 1. `gpkg_reader` — GeoPackage Reading
|
||
|
||
**TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts)
|
||
|
||
- Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature`
|
||
- OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing
|
||
- Group features by `GID_{level}` column, collect geometries per group
|
||
|
||
```cpp
|
||
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
|
||
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
|
||
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
|
||
OGRFeatureH feat;
|
||
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
|
||
OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
|
||
// group by GID_{level}, collect geom clones
|
||
}
|
||
```
|
||
|
||
### 2. `geo_merge` — Geometry Union
|
||
|
||
**TS source:** `mergeGeometries()` in gpkg-reader.ts
|
||
|
||
- Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`**
|
||
- GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
|
||
- For invalid inputs: `GEOSMakeValid` → then union
|
||
|
||
```cpp
|
||
GEOSGeometry* collection = GEOSGeom_createCollection(
|
||
GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
|
||
GEOSGeometry* merged = GEOSUnaryUnion(collection);
|
||
if (!merged) {
|
||
// MakeValid fallback
|
||
for (auto& g : geoms) g = GEOSMakeValid(g);
|
||
merged = GEOSUnaryUnion(collection);
|
||
}
|
||
```
|
||
|
||
### 3. `ghs_enrich` — GeoTIFF Raster Sampling
|
||
|
||
**TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts)
|
||
|
||
- Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster
|
||
- Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle
|
||
- Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
|
||
- Keep stride-based sampling for large windows
|
||
|
||
```cpp
|
||
// One-time setup
|
||
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);
|
||
|
||
// Read raster window
|
||
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
|
||
std::vector<float> buf(width * height);
|
||
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
|
||
buf.data(), width, height, GDT_Float32, 0, 0);
|
||
|
||
// Fast pixel loop — no allocations
|
||
for (int y = 0; y < height; y += stride) {
|
||
for (int x = 0; x < width; x += stride) {
|
||
float val = buf[y * width + x];
|
||
if (val <= 0 || val == 65535.f) continue;
|
||
PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
|
||
PJ_COORD out = proj_trans(transform, PJ_FWD, in);
|
||
if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
|
||
totalVal += val * strideFactor;
|
||
// ... weighted sums, candidate tracking
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### 4. `main.cpp` — CLI + Cache Logic
|
||
|
||
**TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts)
|
||
|
||
- Read `gadm_continent.json` with nlohmann/json
|
||
- Same skip-if-cached logic (check file existence)
|
||
- Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json`
|
||
- Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent)
|
||
|
||
---
|
||
|
||
## Expected Performance
|
||
|
||
| Operation | TS (current) | C++ (expected) | Speedup |
|
||
|---|---|---|---|
|
||
| Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~1–3s | ∞ / ~100× |
|
||
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× |
|
||
| Full batch (all countries, all levels) | **Hours, may hang** | ~20–40 min | ~10× |
|
||
|
||
---
|
||
|
||
## Build & Integration
|
||
|
||
### vcpkg Manifest — `vcpkg.json`
|
||
|
||
```json
|
||
{
|
||
"name": "gadm-boundaries",
|
||
"version-string": "1.0.0",
|
||
"dependencies": [
|
||
"gdal",
|
||
"geos",
|
||
"proj",
|
||
"nlohmann-json"
|
||
]
|
||
}
|
||
```
|
||
|
||
> [!TIP]
|
||
> `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly ensures the CMake targets are always available.
|
||
|
||
### CMake — `CMakeLists.txt`
|
||
|
||
```cmake
|
||
cmake_minimum_required(VERSION 3.20)
|
||
project(gadm-boundaries LANGUAGES CXX)
|
||
|
||
set(CMAKE_CXX_STANDARD 20)
|
||
set(CMAKE_CXX_STANDARD_REQUIRED ON)
|
||
|
||
# ── Dependencies via vcpkg find_package ──
|
||
find_package(GDAL CONFIG REQUIRED) # target: GDAL::GDAL
|
||
find_package(GEOS CONFIG REQUIRED) # target: GEOS::geos_c
|
||
find_package(PROJ CONFIG REQUIRED) # target: PROJ::proj
|
||
find_package(nlohmann_json CONFIG REQUIRED) # target: nlohmann_json::nlohmann_json
|
||
|
||
# ── OpenMP (optional, for country-level parallelism) ──
|
||
find_package(OpenMP)
|
||
|
||
add_executable(boundaries
|
||
src/main.cpp
|
||
src/gpkg_reader.cpp
|
||
src/geo_merge.cpp
|
||
src/ghs_enrich.cpp
|
||
)
|
||
|
||
target_include_directories(boundaries PRIVATE src)
|
||
|
||
target_link_libraries(boundaries PRIVATE
|
||
GDAL::GDAL
|
||
GEOS::geos_c
|
||
PROJ::proj
|
||
nlohmann_json::nlohmann_json
|
||
)
|
||
|
||
if(OpenMP_CXX_FOUND)
|
||
target_link_libraries(boundaries PRIVATE OpenMP::OpenMP_CXX)
|
||
target_compile_definitions(boundaries PRIVATE HAS_OPENMP=1)
|
||
endif()
|
||
|
||
# Copy data files needed at runtime
|
||
add_custom_command(TARGET boundaries POST_BUILD
|
||
COMMAND ${CMAKE_COMMAND} -E copy_if_different
|
||
${CMAKE_SOURCE_DIR}/../data/gadm_continent.json
|
||
$<TARGET_FILE_DIR:boundaries>/gadm_continent.json
|
||
)
|
||
```
|
||
|
||
### CMake Presets — `CMakePresets.json`
|
||
|
||
```json
|
||
{
|
||
"version": 6,
|
||
"configurePresets": [
|
||
{
|
||
"name": "vcpkg",
|
||
"displayName": "vcpkg (default)",
|
||
"generator": "Ninja",
|
||
"binaryDir": "${sourceDir}/build",
|
||
"cacheVariables": {
|
||
"CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
|
||
"CMAKE_BUILD_TYPE": "Release"
|
||
}
|
||
},
|
||
{
|
||
"name": "vcpkg-win",
|
||
"inherits": "vcpkg",
|
||
"displayName": "vcpkg (Windows MSVC)",
|
||
"generator": "Visual Studio 17 2022",
|
||
"cacheVariables": {
|
||
"VCPKG_TARGET_TRIPLET": "x64-windows"
|
||
}
|
||
}
|
||
],
|
||
"buildPresets": [
|
||
{
|
||
"name": "release",
|
||
"configurePreset": "vcpkg",
|
||
"configuration": "Release"
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
### Build & Run
|
||
|
||
```bash
|
||
# 1. Install dependencies (one-time, from project root with vcpkg.json)
|
||
vcpkg install
|
||
|
||
# 2. Configure + build
|
||
cmake --preset vcpkg # or vcpkg-win on Windows
|
||
cmake --build build --config Release
|
||
|
||
# 3. Run
|
||
./build/boundaries --country=all
|
||
./build/boundaries --country=DEU --level=0
|
||
```
|
||
|
||
Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation.
|
||
|
||
---
|
||
|
||
## Implementation Order
|
||
|
||
1. **`gpkg_reader`** — OGR-based GeoPackage read + feature grouping
|
||
2. **`geo_merge`** — GEOS union with MakeValid fallback
|
||
3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop
|
||
4. **`main`** — CLI harness, cache logic, JSON output
|
||
5. **Parallelism** — OpenMP on country loop (optional, easy win)
|
||
|
||
Each module can be tested independently against the existing JSON cache files for correctness.
|