gadm-ts/cpp/README.md
2026-03-23 17:35:02 +01:00

# gadm-boundaries (C++)
Native C++ boundaries batch generator for the GADM pipeline.
Uses GDAL/GEOS/PROJ for geometry union + GHS raster enrichment (population & built-up surface).
## Prerequisites
| Tool | Version |
|------|---------|
| CMake | >= 3.20 |
| C++ compiler | C++20 (MSVC, GCC, or Clang) |
| vcpkg | Latest (set `VCPKG_ROOT` env var) |

vcpkg dependencies (auto-installed via `vcpkg.json`): GDAL, GEOS, PROJ, nlohmann-json, CLI11, spdlog, Catch2.
## Build
```bash
# Configure + build (Windows MSVC)
cmake --preset vcpkg-win
cmake --build build --config Release
```
The post-build step copies `boundaries.exe`, runtime DLLs, and `proj.db` to `dist/win-x64/`.
## Usage
```powershell
# From the gadm package root (not cpp/)
# All countries, all levels (263 countries x 6 levels)
.\dist\win-x64\boundaries.exe --country=all
# Single country, all levels
.\dist\win-x64\boundaries.exe --country=DEU
# Single country + level
.\dist\win-x64\boundaries.exe --country=DEU --level=0
# Force regeneration (ignore cached files)
.\dist\win-x64\boundaries.exe --country=NGA --force
```
### CLI Options
| Option | Default | Description |
|--------|---------|-------------|
| `--country` | `all` | ISO3 code or `all` for batch |
| `--level` | `-1` | Admin level 0-5, or -1 for all |
| `--cache-dir` | `cache/gadm` | Output directory |
| `--gpkg` | `data/gadm_410-levels.gpkg` | GADM GeoPackage |
| `--continent-json` | `data/gadm_continent.json` | Continent mapping |
| `--pop-tiff` | `data/ghs/GHS_POP_...tif` | GHS population raster |
| `--built-tiff` | `data/ghs/GHS_BUILT_S_...tif` | GHS built-up raster |
| `--force` | `false` | Regenerate even if cached |
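
The options above imply a simple per-file cache: one output per country and level, skipped unless it is missing or `--force` is set. A minimal sketch of that logic follows. It is an illustration under assumptions, not the code in `main.cpp`: the helper names `boundaryPath`, `levelsToProcess` and `needsRegeneration` are hypothetical, and the skip rule is inferred from the option descriptions.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: builds {cache-dir}/boundary_{CODE}_{LEVEL}.json.
fs::path boundaryPath(const fs::path& cacheDir, const std::string& code, int level) {
    return cacheDir / ("boundary_" + code + "_" + std::to_string(level) + ".json");
}

// Hypothetical helper: --level=-1 expands to every admin level 0-5.
std::vector<int> levelsToProcess(int levelArg) {
    if (levelArg >= 0) return {levelArg};
    return {0, 1, 2, 3, 4, 5};
}

// Assumed skip rule: regenerate when --force is set or the output file is missing.
bool needsRegeneration(const fs::path& out, bool force) {
    return force || !fs::exists(out);
}
```

With the defaults, `boundaryPath("cache/gadm", "DEU", 0)` yields `cache/gadm/boundary_DEU_0.json`, matching the output naming described under Output.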
### npm Scripts
```bash
npm run boundaries:cpp # --country=all (full batch)
npm run boundaries:cpp -- --country=DEU # single country
```
## Output
Files are written to `cache/gadm/boundary_{CODE}_{LEVEL}.json` (or the directory given by `--cache-dir`) as a drop-in replacement for the TypeScript pipeline's output.
Each feature includes:
- `code` — admin region code
- `name` — admin region name
- `geometry` — GeoJSON (MultiPolygon via WKB-precision GEOS union)
- `ghsPopulation` — total population (GHS-POP 2030)
- `ghsPopMaxDensity` — peak population density
- `ghsPopCenter` — weighted population center `[lon, lat]`
- `ghsPopCenters` — top-N population peaks `[[lon, lat, density], ...]`
- `ghsBuiltWeight` — total built-up surface weight
- `ghsBuiltMax` — peak built-up value
- `ghsBuiltCenter` — weighted built-up center `[lon, lat]`
- `ghsBuiltCenters` — top-N built-up peaks `[[lon, lat, value], ...]`
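
The `*Center` and `*Centers` fields are a value-weighted mean position and the N highest-valued cells. Assuming the enrichment step has already produced the in-polygon raster cells with their lon/lat centers and values, the aggregation could look like the sketch below. The `Sample` and `Aggregate` types are hypothetical (not the structs in `types.h`), `maxValue` is simply the largest cell value, and averaging raw lon/lat ignores the antimeridian; the actual pipeline may differ on all three points.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical per-cell sample: pixel center in lon/lat plus raster value.
struct Sample {
    double lon;
    double lat;
    double value;  // population count or built-up surface for the cell
};

struct Aggregate {
    double total = 0.0;                        // e.g. ghsPopulation
    double maxValue = 0.0;                     // largest single cell value
    std::array<double, 2> center{0.0, 0.0};    // e.g. ghsPopCenter [lon, lat]
    std::vector<std::array<double, 3>> peaks;  // e.g. ghsPopCenters [[lon, lat, value], ...]
};

Aggregate aggregate(std::vector<Sample> samples, std::size_t topN) {
    Aggregate agg;
    double wLon = 0.0, wLat = 0.0;
    for (const Sample& s : samples) {
        if (s.value <= 0.0) continue;  // ignore empty cells
        agg.total += s.value;
        agg.maxValue = std::max(agg.maxValue, s.value);
        wLon += s.lon * s.value;
        wLat += s.lat * s.value;
    }
    if (agg.total > 0.0) {
        agg.center = {wLon / agg.total, wLat / agg.total};
    }
    // Move the topN highest-valued cells to the front, sorted descending.
    const std::size_t n = std::min(topN, samples.size());
    std::partial_sort(samples.begin(), samples.begin() + static_cast<std::ptrdiff_t>(n),
                      samples.end(),
                      [](const Sample& a, const Sample& b) { return a.value > b.value; });
    for (std::size_t i = 0; i < n; ++i) {
        if (samples[i].value <= 0.0) break;  // no more non-empty cells
        agg.peaks.push_back({samples[i].lon, samples[i].lat, samples[i].value});
    }
    return agg;
}
```

`std::partial_sort` keeps the top-N step at roughly O(M log N) for M cells, which matters for large regions where a full sort of every pixel would be wasteful.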
## Architecture
See [docs/cpp-port.md](../docs/cpp-port.md) for the full spec.
```
src/
├── main.cpp # CLI entry, country loop, cache logic, PROJ_DATA setup
├── gpkg_reader.h/cpp # GeoPackage -> features (OGR C API)
├── geo_merge.h/cpp # Geometry union via WKB roundtrip (GEOS C API)
├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP (GDAL + PROJ)
├── pip.h # Inline ray-casting point-in-polygon
└── types.h # BoundaryFeature, BoundaryResult structs
```
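
`pip.h` is described as an inline ray-casting point-in-polygon test. A minimal version of the classic even-odd algorithm for a single ring is sketched below; the `Point`/`Ring` types and the `pointInRing` name are illustrative, not taken from the header, and both point and ring are assumed to share one coordinate system. For a polygon with holes, a point counts as inside when it is in the exterior ring and in none of the interior rings; a MultiPolygon repeats that per part.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x; double y; };  // illustrative: x = lon, y = lat
using Ring = std::vector<Point>;       // closing vertex optional

// Even-odd ray casting: cast a ray toward +x and count edge crossings.
inline bool pointInRing(const Point& p, const Ring& ring) {
    bool inside = false;
    const std::size_t n = ring.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Point& a = ring[i];
        const Point& b = ring[j];
        // Half-open test: the edge straddles the ray's y, so a.y != b.y here.
        if ((a.y > p.y) != (b.y > p.y)) {
            const double xCross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (p.x < xCross) inside = !inside;
        }
    }
    return inside;
}
```

The half-open comparison `(a.y > p.y) != (b.y > p.y)` also guarantees the division never hits a horizontal edge, and a repeated closing vertex only adds a zero-length edge that is skipped.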
### Key Design Decisions
- **No OpenMP** — GDAL, PROJ and GEOS objects are not safe to share across threads for concurrent raster reads and transform creation. Countries are processed sequentially, and the workload is I/O-bound anyway.
- **WKB precision** — geometry union uses WKB serialization to avoid floating-point drift from WKT roundtrips.
- **Mollweide projection** — uses the `+proj=moll` string directly rather than `EPSG:54009`, which is not in the PROJ database (54009 is an ESRI code, `ESRI:54009`). Transforms are normalized via `proj_normalize_for_visualization` so coordinates use lon/lat axis order.
- **Windowed raster I/O** — GDAL `GDALRasterIO` reads only the bbox-clipped window from multi-GB GeoTIFFs, keeping memory bounded.
- **PROJ_DATA auto-discovery** — `main.cpp` sets `PROJ_DATA` at startup to the executable's directory, where `proj.db` is co-located.
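
A windowed read starts by mapping a feature's bbox to a pixel window through the raster's affine geotransform. The sketch below shows only that arithmetic for a north-up raster (zero rotation terms), assuming the bbox has already been transformed into the raster's CRS. `PixelWindow`, `bboxToWindow` and the outward-rounding/clamping policy are assumptions for illustration; the resulting offsets and sizes are what would feed the window arguments of `GDALRasterIO`.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct PixelWindow {
    int xOff = 0, yOff = 0, xSize = 0, ySize = 0;  // xSize/ySize == 0 means no overlap
};

// Clamp in floating point before converting, so far-off bboxes cannot overflow int.
static int clampToInt(double v, int hi) {
    return static_cast<int>(std::clamp(v, 0.0, static_cast<double>(hi)));
}

// gt uses GDAL's geotransform layout: {originX, pixelW, 0, originY, 0, pixelH (negative)}.
// bbox is {minX, minY, maxX, maxY} in the raster's CRS.
PixelWindow bboxToWindow(const std::array<double, 6>& gt,
                         const std::array<double, 4>& bbox,
                         int rasterXSize, int rasterYSize) {
    // Columns come from X; rows from Y. pixelH < 0, so maxY maps to the top row.
    const double colMin = (bbox[0] - gt[0]) / gt[1];
    const double colMax = (bbox[2] - gt[0]) / gt[1];
    const double rowMin = (bbox[3] - gt[3]) / gt[5];
    const double rowMax = (bbox[1] - gt[3]) / gt[5];

    // Round outward to whole pixels, then clamp to the raster extent.
    const int x0 = clampToInt(std::floor(colMin), rasterXSize);
    const int x1 = clampToInt(std::ceil(colMax), rasterXSize);
    const int y0 = clampToInt(std::floor(rowMin), rasterYSize);
    const int y1 = clampToInt(std::ceil(rowMax), rasterYSize);

    PixelWindow w;
    if (x1 > x0 && y1 > y0) {
        w = {x0, y0, x1 - x0, y1 - y0};
    }
    return w;
}
```

Rounding outward keeps partially covered edge pixels in the window, leaving the per-pixel point-in-polygon test to decide which ones belong to the feature.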