220 lines
8.2 KiB
Markdown
220 lines
8.2 KiB
Markdown
# C++ Port — Boundaries Batch Generator
|
||
|
||
Port the `boundaries.ts` pipeline to native C++ for **10–100× speedup** on geometry union + GHS raster enrichment.
|
||
|
||
> [!NOTE]
|
||
> Scaffolding is complete — CLI entry point, module stubs, vcpkg manifest, and CMake build are in place under `cpp/`.
|
||
> The `geo_merge` and `ghs_enrich` modules are stubs pending implementation.
|
||
|
||
---
|
||
|
||
## Why C++
|
||
|
||
| Bottleneck (TS) | Root cause | C++ remedy |
|
||
|---|---|---|
|
||
| `polygon-clipping.union` | Pure-JS, throws on complex polys (DEU) | GEOS `GEOSUnaryUnion` — battle-tested C engine |
|
||
| `enrichFeatureWithGHS` PIP loop | GC pressure, obj alloc per pixel | Stack-allocated ray-cast, zero alloc |
|
||
| `readRasters` (GeoTIFF) | JS decoder, no tiling | GDAL `RasterIO` — native COG/tiled reads |
|
||
| `proj4` per pixel | JS re-init overhead | PROJ C API, reuse `PJ*` transform handle |
|
||
|
||
---
|
||
|
||
## Dependencies
|
||
|
||
| Library | Role | Notes |
|
||
|---|---|---|
|
||
| **GDAL/OGR 3.x** | GeoPackage read, GeoTIFF read, coordinate transforms | Single dep for 3 concerns |
|
||
| **GEOS 3.12+** | `GEOSUnaryUnion`, simplify, valid geometry repair | Used internally by GDAL |
|
||
| **PROJ** | Coordinate transforms (Mollweide → WGS84) | `proj_trans` with cached `PJ*` handle |
|
||
| **nlohmann/json** | JSON output (cache files) | Header-only |
|
||
| **CLI11** | Argument parsing | FetchContent (not vcpkg) |
|
||
| **SQLite3** | Already bundled with GDAL; direct access if needed | |
|
||
|
||
Install via vcpkg manifest (`vcpkg.json` is in `cpp/`):
|
||
|
||
```bash
|
||
cd cpp
|
||
vcpkg install
|
||
```
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
```
|
||
cpp/
|
||
├── CMakeLists.txt # Project root — GDAL/GEOS/PROJ via vcpkg
|
||
├── CMakePresets.json # vcpkg, vcpkg-win, dev presets
|
||
├── vcpkg.json # Dependency manifest
|
||
├── packages/
|
||
│ ├── logger/ # Logging utility (kept from boilerplate)
|
||
│ └── json/ # JSON utility (kept from boilerplate)
|
||
├── src/
|
||
│ ├── main.cpp # CLI entry, country loop, cache skip logic
|
||
│ ├── gpkg_reader.h/cpp # GeoPackage → features (OGR)
|
||
│ ├── geo_merge.h/cpp # Geometry union (GEOS) — stub
|
||
│ ├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP — stub
|
||
│ ├── pip.h # Inline ray-casting point-in-polygon
|
||
│ └── types.h # BoundaryFeature, BoundaryResult structs
|
||
└── tests/
|
||
└── unit/test_pip.cpp # Catch2 PIP unit test
|
||
```
|
||
|
||
---
|
||
|
||
## Module Mapping (TS → C++)
|
||
|
||
### 1. `gpkg_reader` — GeoPackage Reading
|
||
|
||
**TS source:** [gpkg-reader.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/gpkg-reader.ts)
|
||
|
||
- Replace custom WKB parser + `better-sqlite3` with **OGR `GDALOpenEx`** + `OGR_L_GetNextFeature`
|
||
- OGR natively reads GeoPackage and returns `OGRGeometry*` objects — no manual WKB parsing
|
||
- Group features by `GID_{level}` column, collect geometries per group
|
||
|
||
```cpp
|
||
GDALDatasetH ds = GDALOpenEx("gadm41.gpkg", GDAL_OF_VECTOR, nullptr, nullptr, nullptr);
|
||
OGRLayerH layer = GDALDatasetGetLayer(ds, 0);
|
||
OGR_L_SetAttributeFilter(layer, "GID_0 = 'DEU'");
|
||
OGRFeatureH feat;
|
||
while ((feat = OGR_L_GetNextFeature(layer)) != nullptr) {
|
||
OGRGeometryH geom = OGR_F_GetGeometryRef(feat);
|
||
// group by GID_{level}, collect geom clones
|
||
}
|
||
```
|
||
|
||
### 2. `geo_merge` — Geometry Union
|
||
|
||
**TS source:** `mergeGeometries()` in gpkg-reader.ts
|
||
|
||
- Replace `polygon-clipping` + pairwise fallback with **`GEOSUnaryUnion`**
|
||
- GEOS handles all edge cases (self-intersections, degeneracies) that crash JS libs
|
||
- For invalid inputs: `GEOSMakeValid` → then union
|
||
|
||
```cpp
|
||
GEOSGeometry* collection = GEOSGeom_createCollection(
|
||
GEOS_GEOMETRYCOLLECTION, geoms.data(), geoms.size());
|
||
GEOSGeometry* merged = GEOSUnaryUnion(collection);
|
||
if (!merged) {
|
||
// MakeValid fallback
|
||
for (auto& g : geoms) g = GEOSMakeValid(g);
|
||
merged = GEOSUnaryUnion(collection);
|
||
}
|
||
```
|
||
|
||
### 3. `ghs_enrich` — GeoTIFF Raster Sampling
|
||
|
||
**TS source:** [enrich-ghs.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/src/enrich-ghs.ts)
|
||
|
||
- Replace `geotiff.js` with **GDAL `GDALRasterIO`** — supports tiled/COG reads, much faster
|
||
- Replace JS `proj4()` per-pixel with **PROJ `proj_trans`** using a cached `PJ*` handle
|
||
- Replace Turf PIP with inline ray-casting (already done in TS, trivial in C++)
|
||
- Keep stride-based sampling for large windows
|
||
|
||
```cpp
|
||
// One-time setup
|
||
PJ* transform = proj_create_crs_to_crs(ctx, "EPSG:54009", "EPSG:4326", nullptr);
|
||
|
||
// Read raster window
|
||
GDALRasterBandH band = GDALGetRasterBand(ds, 1);
|
||
std::vector<float> buf(width * height);
|
||
GDALRasterIO(band, GF_Read, minPx, minPy, width, height,
|
||
buf.data(), width, height, GDT_Float32, 0, 0);
|
||
|
||
// Fast pixel loop — no allocations
|
||
for (int y = 0; y < height; y += stride) {
|
||
for (int x = 0; x < width; x += stride) {
|
||
float val = buf[y * width + x];
|
||
if (val <= 0 || val == 65535.f) continue;
|
||
PJ_COORD in = proj_coord(originX + (minPx+x)*resX, originY + (minPy+y)*resY, 0, 0);
|
||
PJ_COORD out = proj_trans(transform, PJ_FWD, in);
|
||
if (point_in_geometry(out.xy.x, out.xy.y, geom)) {
|
||
totalVal += val * strideFactor;
|
||
// ... weighted sums, candidate tracking
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### 4. `main.cpp` — CLI + Cache Logic
|
||
|
||
**TS source:** [boundaries.ts](file:///c:/Users/zx/Desktop/polymech/pm-pics/packages/gadm/scripts/boundaries.ts)
|
||
|
||
- Read `gadm_continent.json` with nlohmann/json
|
||
- Same skip-if-cached logic (check file existence)
|
||
- Output identical JSON format to `cache/gadm/boundary_{code}_{level}.json`
|
||
- Optional: OpenMP `#pragma omp parallel for` on the country loop (each country is independent)
|
||
|
||
---
|
||
|
||
## Expected Performance
|
||
|
||
| Operation | TS (current) | C++ (expected) | Speedup |
|
||
|---|---|---|---|
|
||
| Geometry union (DEU L0, 151 polys) | **Crashes / >10min** | ~1–3s | ∞ / ~100× |
|
||
| GHS enrich (NGA L0, ~6M px window) | ~210s | ~5–15s | ~15× |
|
||
| Full batch (all countries, all levels) | **Hours, may hang** | ~20–40 min | ~10× |
|
||
|
||
---
|
||
|
||
## Build & Integration
|
||
|
||
> [!TIP]
|
||
> `gdal` pulls in `proj` and `geos` transitively on most triplets, but listing them explicitly in `vcpkg.json` ensures the CMake targets are always available.
|
||
|
||
### Build & Run
|
||
|
||
```bash
|
||
# 1. Install dependencies (one-time, from cpp/ directory)
|
||
cd cpp
|
||
vcpkg install
|
||
|
||
# 2. Configure + build (Windows MSVC)
|
||
cmake --preset vcpkg-win
|
||
cmake --build build --config Release
|
||
|
||
# 2b. Or Ninja (Linux/macOS)
|
||
cmake --preset vcpkg
|
||
cmake --build build --config Release
|
||
|
||
# 3. Run
|
||
./build/Release/boundaries --country=all
|
||
./build/Release/boundaries --country=DEU --level=0
|
||
./build/Release/boundaries --country=NGA --force
|
||
```
|
||
|
||
### CLI Flags
|
||
|
||
| Flag | Default | Description |
|
||
|---|---|---|
|
||
| `--country` | `all` | ISO3 country code, or `all` for batch |
|
||
| `--level` | `-1` (all) | Admin level 0–5, or -1 for all levels |
|
||
| `--cache-dir` | `cache/gadm` | Output directory for boundary JSON files |
|
||
| `--gpkg` | `data/gadm_410-levels.gpkg` | Path to GADM GeoPackage |
|
||
| `--force` | off | Regenerate even if cache file exists |
|
||
|
||
Output goes to the same `cache/gadm/` directory — the Node.js server reads these cache files at runtime, so the C++ tool is a **drop-in replacement** for batch generation.
|
||
|
||
---
|
||
|
||
## Implementation Status
|
||
|
||
| Module | Status | Notes |
|
||
|---|---|---|
|
||
| `main.cpp` | ✅ Done | CLI harness, cache logic, JSON output, OpenMP |
|
||
| `gpkg_reader` | ✅ Scaffolded | OGR read + feature grouping — needs GEOS geometry collection |
|
||
| `geo_merge` | ✅ Done | GEOSUnaryUnion + MakeValid fallback, WKB lossless roundtrip |
|
||
| `ghs_enrich` | ✅ Done | GDAL windowed RasterIO + PROJ + OGR PIP + peak detection |
|
||
| `pip.h` | ✅ Done | Inline ray-casting, unit-tested |
|
||
| `types.h` | ✅ Done | `BoundaryFeature`, `BoundaryResult` structs |
|
||
|
||
### Next Steps
|
||
|
||
1. **`geo_merge`** — Wire up GEOS: geometry collection → `GEOSUnaryUnion` → GeoJSON export
|
||
2. **`gpkg_reader`** — Clone OGR geometries into GEOS handles (instead of direct GeoJSON export)
|
||
3. **`ghs_enrich`** — GDAL raster read + PROJ transform + PIP loop
|
||
4. **Validation** — Compare output against existing TS-generated cache files
|
||
5. **Parallelism** — OpenMP on country loop (already wired, just needs `#pragma` testing)
|
||
|
||
Each module can be tested independently against the existing JSON cache files for correctness.
|