gadm-ts/docs/cpp-port.md
2026-03-23 16:57:01 +01:00

220 lines
8.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# C++ Port — Boundaries Batch Generator
Port the `boundaries.ts` pipeline to native C++ for **10100× speedup** on geometry union + GHS raster enrichment.
> [!NOTE]
> Scaffolding is complete — CLI entry point, module stubs, vcpkg manifest, and CMake build are in place under `cpp/`.
> The `geo_merge` and `ghs_enrich` modules are stubs pending implementation.
---
## Why C++
| Bottleneck (TS) | Root cause | C++ remedy |
|---|---|---|
| `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine |
| `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
| `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads |
| `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle |
---
## Dependencies
| Library | Role | Notes |
|---|---|---|
| **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
| **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL |
| **PROJ** | Coordinate transforms (Mollweide → WGS84) | `proj_trans` with cached `PJ*` handle |
| **nlohmann/json** | JSON output (cache files) | Header-only |
| **CLI11** | Argument parsing | FetchContent (not vcpkg) |
| **SQLite3** | Already bundled with GDAL; direct access if needed | |
Install via vcpkg manifest (`vcpkg.json` is in `cpp/`):
```bash
cd cpp
vcpkg install
```
---
## Architecture
```
cpp/
├── CMakeLists.txt # Project root — GDAL/GEOS/PROJ via vcpkg
├── CMakePresets.json # vcpkg, vcpkg-win, dev presets
├── vcpkg.json # Dependency manifest
├── packages/
│ ├── logger/ # Logging utility (kept from boilerplate)
│ └── json/ # JSON utility (kept from boilerplate)
├── src/
│ ├── main.cpp # CLI entry, country loop, cache skip logic
│ ├── gpkg_reader.h/cpp # GeoPackage → features (OGR)
│ ├── geo_merge.h/cpp # Geometry union (GEOS) — stub
│ ├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP — stub
│ ├── pip.h # Inline ray-casting point-in-polygon
│ └── types.h # BoundaryFeature, BoundaryResult structs
└── tests/
└── unit/test_pip.cpp # Catch2 PIP unit test
```
---
## Module Mapping (TS → C++)
### 1. `gpkg_reader` — GeoPackage Reading
**TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts)
- Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature`
- OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing
- Group features by `GID_{level}` column, collect geometries per group
```cpp
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
OGRFeatureH feat;
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
// group by GID_{level}, collect geom clones
}
```
### 2. `geo_merge` — Geometry Union
**TS source:** `mergeGeometries()` in gpkg-reader.ts
- Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`**
- GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
- For invalid inputs: `GEOSMakeValid` → then union
```cpp
GEOSGeometry* collection = GEOSGeom_createCollection(
GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
GEOSGeometry* merged = GEOSUnaryUnion(collection);
if (!merged) {
// MakeValid fallback
for (auto& g : geoms) g = GEOSMakeValid(g);
merged = GEOSUnaryUnion(collection);
}
```
### 3. `ghs_enrich` — GeoTIFF Raster Sampling
**TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts)
- Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster
- Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle
- Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
- Keep stride-based sampling for large windows
```cpp
// One-time setup
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);
// Read raster window
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
std::vector<float> buf(width * height);
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
buf.data(), width, height, GDT_Float32, 0, 0);
// Fast pixel loop — no allocations
for (int y = 0; y < height; y += stride) {
for (int x = 0; x < width; x += stride) {
float val = buf[y * width + x];
if (val <= 0 || val == 65535.f) continue;
PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
PJ_COORD out = proj_trans(transform, PJ_FWD, in);
if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
totalVal += val * strideFactor;
// ... weighted sums, candidate tracking
}
}
}
```
### 4. `main.cpp` — CLI + Cache Logic
**TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts)
- Read `gadm_continent.json` with nlohmann/json
- Same skip-if-cached logic (check file existence)
- Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json`
- Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent)
---
## Expected Performance
| Operation | TS (current) | C++ (expected) | Speedup |
|---|---|---|---|
| Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~13s | ∞ / ~100× |
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~515s | ~15× |
| Full batch (all countries, all levels) | **Hours, may hang** | ~2040 min | ~10× |
---
## Build & Integration
> [!TIP]
> `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly in `vcpkg.json` ensures the CMake targets are always available.
### Build & Run
```bash
# 1. Install dependencies (one-time, from cpp/ directory)
cd cpp
vcpkg install
# 2. Configure + build (Windows MSVC)
cmake --preset vcpkg-win
cmake --build build --config Release
# 2b. Or Ninja (Linux/macOS)
cmake --preset vcpkg
cmake --build build --config Release
# 3. Run
./build/Release/boundaries --country=all
./build/Release/boundaries --country=DEU --level=0
./build/Release/boundaries --country=NGA --force
```
### CLI Flags
| Flag | Default | Description |
|---|---|---|
| `--country` | `all` | ISO3 country code, or `all` for batch |
| `--level` | `-1` (all) | Admin level 05, or -1 for all levels |
| `--cache-dir` | `cache/gadm` | Output directory for boundary JSON files |
| `--gpkg` | `data/gadm_410-levels.gpkg` | Path to GADM GeoPackage |
| `--force` | off | Regenerate even if cache file exists |
Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation.
---
## Implementation Status
| Module | Status | Notes |
|---|---|---|
| `main.cpp` | ✅ Done | CLI harness, cache logic, JSON output, OpenMP |
| `gpkg_reader` | ✅ Scaffolded | OGR read + feature grouping — needs GEOS geometry collection |
| `geo_merge` | ✅ Done | GEOSUnaryUnion + MakeValid fallback, WKB lossless roundtrip |
| `ghs_enrich` | ✅ Done | GDAL windowed RasterIO + PROJ + OGR PIP + peak detection |
| `pip.h` | ✅ Done | Inline ray-casting, unit-tested |
| `types.h` | ✅ Done | `BoundaryFeature`, `BoundaryResult` structs |
### Next Steps
1. **`geo_merge`** — Wire up GEOS: geometry collection → `GEOSUnaryUnion` → GeoJSON export
2. **`gpkg_reader`** — Clone OGR geometries into GEOS handles (instead of direct GeoJSON export)
3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop
4. **Validation** — Compare output against existing TS-generated cache files
5. **Parallelism** — OpenMP on country loop (already wired, just needs `#pragma` testing)
Each module can be tested independently against the existing JSON cache files for correctness.