gadm-ts/cpp
2026-03-24 01:10:17 +01:00
..
packages cpp port 1/2 2026-03-23 15:47:35 +01:00
src cache purge | cpp json fix 2026-03-24 01:10:17 +01:00
tests cpp port 1/2 2026-03-23 15:47:35 +01:00
.clangd cpp port 1/2 2026-03-23 16:57:01 +01:00
.gitignore cpp port 1/2 2026-03-23 15:47:35 +01:00
CMakeLists.txt cache | lnx 1/2 2026-03-23 18:08:44 +01:00
CMakePresets.json cache | lnx 1/2 2026-03-23 18:08:44 +01:00
README.md cache | optmization 2/2 2026-03-23 21:48:26 +01:00
vcpkg.json cpp port 1/2 2026-03-23 15:47:35 +01:00

gadm-boundaries (C++)

Native C++ boundaries batch generator for the GADM pipeline.
Uses GDAL/GEOS/PROJ for geometry union + GHS raster enrichment (population & built-up surface).

Prerequisites

Tool Version
CMake >= 3.20
C++ compiler C++20 (MSVC, GCC, or Clang)
vcpkg Latest (set VCPKG_ROOT env var)

vcpkg dependencies (auto-installed via vcpkg.json): GDAL, GEOS, PROJ, nlohmann-json, CLI11, spdlog, Catch2.

Build

Windows (MSVC)

cd packages/gadm/cpp
cmake --preset vcpkg-win
cmake --build build --config Release

Output: dist/win-x64/boundaries.exe + runtime DLLs + proj.db

Linux (Ubuntu)

# System dependencies (build tools)
sudo apt install cmake ninja-build pkg-config g++ git curl zip unzip tar

# Install vcpkg (if not already)
git clone https://github.com/microsoft/vcpkg.git ~/vcpkg
~/vcpkg/bootstrap-vcpkg.sh
export VCPKG_ROOT=~/vcpkg

# Build
cd packages/gadm/cpp
cmake --preset vcpkg-linux
cmake --build build --config Release

Output: dist/linux-x64/boundaries + proj.db

Note: First build takes 10-20 min as vcpkg compiles GDAL/GEOS/PROJ from source. Subsequent builds are fast.

Usage

# From the gadm package root (not cpp/)

# All countries, all levels (263 countries x 6 levels)
# Windows:
.\dist\win-x64\boundaries.exe --country=all
# Linux:
./dist/linux-x64/boundaries --country=all

# Single country, all levels
./dist/linux-x64/boundaries --country=DEU

# Single country + level
./dist/linux-x64/boundaries --country=DEU --level=0

# Force regeneration (ignore cached files)
./dist/linux-x64/boundaries --country=NGA --force

CLI Options

Option Default Description
--country all ISO3 code or all for batch
--level -1 Admin level 0-5, or -1 for all
--cache-dir cache/gadm Output directory
--gpkg data/gadm_410-levels.gpkg GADM GeoPackage
--continent-json data/gadm_continent.json Continent mapping
--pop-tiff data/ghs/GHS_POP_...tif GHS population raster
--built-tiff data/ghs/GHS_BUILT_S_...tif GHS built-up raster
--force false Regenerate even if cached

npm Scripts

npm run boundaries:cpp           # --country=all (full batch, Windows)
npm run boundaries:cpp -- --country=DEU  # single country

On Linux, run the binary directly since the npm script points to the Windows path.

Output

Files written to cache/gadm/boundary_{CODE}_{LEVEL}.json — drop-in replacement for the TS pipeline.

The TS wrapper (gpkg-reader.ts) automatically discovers these cache files and serves them through the API. Sub-region queries (e.g. getBoundary('DEU.2_1', 3)) are resolved by prefix-filtering the full country file.

Each feature includes:

  • code — admin region code
  • name — admin region name
  • geometry — GeoJSON (MultiPolygon via WKB-precision GEOS union)
  • ghsPopulation — total population (GHS-POP 2030)
  • ghsPopMaxDensity — peak population density
  • ghsPopCenter — weighted population center [lon, lat]
  • ghsPopCenters — top-N population peaks [[lon, lat, density], ...]
  • ghsBuiltWeight — total built-up surface weight
  • ghsBuiltMax — peak built-up value
  • ghsBuiltCenter — weighted built-up center [lon, lat]
  • ghsBuiltCenters — top-N built-up peaks [[lon, lat, value], ...]

Architecture

See docs/cpp-port.md for the full spec.

src/
├── main.cpp          # CLI entry, country loop, cache logic, PROJ_DATA setup
├── gpkg_reader.h/cpp # GeoPackage -> features (OGR C API)
├── geo_merge.h/cpp   # Geometry union via WKB roundtrip (GEOS C API)
├── ghs_enrich.h/cpp  # GeoTIFF raster sampling + PIP (GDAL + PROJ)
├── pip.h             # Inline ray-casting point-in-polygon
└── types.h           # BoundaryFeature, BoundaryResult structs

Key Design Decisions

  • No OpenMP — GDAL/PROJ/GEOS are not thread-safe for concurrent raster reads + transform creation. Sequential processing is I/O-bound anyway.
  • WKB precision — geometry union uses WKB serialization to avoid floating-point drift from WKT roundtrips.
  • Mollweide projection — uses +proj=moll string directly (not EPSG:54009 which isn't in the PROJ database). Transforms are normalized via proj_normalize_for_visualization for correct lon/lat axis order.
  • Windowed raster I/O — GDAL GDALRasterIO reads only the bbox-clipped window from multi-GB GeoTIFFs, keeping memory bounded.
  • PROJ_DATA auto-discoverymain.cpp sets PROJ_DATA at startup pointing to the exe directory where proj.db is co-located.
  • Cross-platform — builds on Windows (MSVC) and Linux (GCC/Clang) via vcpkg. Platform-specific code is guarded by #ifdef _WIN32.

Recent Optimizations (40-50x Speedup)

  • Prepared GEOS Geometries: OGR_G_Contains parsing was replaced with GEOSPreparedContains_r. Prepared geometries index the structure once, reducing millions of unprepared spatial lookups and point-instantiations inside the raster loop to microseconds.
  • Bounding Box Pre-filtering: BoundaryFeature precalculates and stores minX, minY, maxX, maxY during GeoPackage ingestion, entirely skipping OGR bounding box and geometry checks per chunk.
  • GeoPackage Handle Caching: Replaced sequential re-opening of the GDALOpenEx context across recursive boundary levels with a globally cached datastore handle, bypassing 1.8-second repeated disk-sweeping overhead.
  • Deferred GeoJSON Extraction: GeoJSON serialization only happens for final merged features or un-merged singletons, circumventing string/parsing allocations overhead during intermediate GeoPackage reading.

Suggestions for Further Optimization (I/O Bottleneck)

Right now, reading from the GeoPackage takes ~2.1 seconds per level per country because GDAL/OGR is performing full-layer scans (the 700MB SQLite files lack proper B-Tree spatial indexing on the GID_0 column). Since the CPU/Geometry overhead was already optimized by ~50x, the remaining time is purely standard disk I/O. To eliminate this I/O overhead entirely (< 10ms read times), consider:

  1. Offline Indexing: Run a one-time indexing command before use (ogrinfo data/gadm_410-levels.gpkg -sql "CREATE INDEX idx_gid0 ON ADM_ADM_1 (GID_0)") to bake B-Tree indices directly into your source .gpkg.
  2. Pre-partitioning / Sharding: Split the massive monolithic GADM GeoPackage into individual .gpkg or .geojson chunks per country (e.g., data/gadm/DEU.gpkg), enabling true O(1) instantaneous access.