7.9 KiB
gadm-boundaries (C++)
Native C++ boundaries batch generator for the GADM pipeline.
Uses GDAL/GEOS/PROJ for geometry union + GHS raster enrichment (population & built-up surface).
Prerequisites
| Tool | Version |
|---|---|
| CMake | >= 3.20 |
| C++ compiler | C++20 (MSVC, GCC, or Clang) |
| vcpkg | Latest (set VCPKG_ROOT env var) |
vcpkg dependencies (auto-installed via vcpkg.json): GDAL, GEOS, PROJ, nlohmann-json, CLI11, spdlog, Catch2.
Build
Windows (MSVC)
cd packages/gadm/cpp
cmake --preset vcpkg-win
cmake --build build --config Release
Output: dist/win-x64/boundaries.exe + runtime DLLs + proj.db
Linux (Ubuntu)
# System dependencies (build tools)
sudo apt install cmake ninja-build pkg-config g++ git curl zip unzip tar
# Install vcpkg (if not already)
git clone https://github.com/microsoft/vcpkg.git ~/vcpkg
~/vcpkg/bootstrap-vcpkg.sh
export VCPKG_ROOT=~/vcpkg
# Build
cd packages/gadm/cpp
cmake --preset vcpkg-linux
cmake --build build --config Release
Output: dist/linux-x64/boundaries + proj.db
Note: First build takes 10-20 min as vcpkg compiles GDAL/GEOS/PROJ from source. Subsequent builds are fast.
Usage
# From the gadm package root (not cpp/)
# All countries, all levels (263 countries x 6 levels)
.\dist\win-x64\boundaries.exe --country=all
# Single country, all levels
.\dist\win-x64\boundaries.exe --country=DEU
# Single country + level
.\dist\win-x64\boundaries.exe --country=DEU --level=0
# Sub-region specific (dotted GADM ID)
.\dist\win-x64\boundaries.exe --country=ESP.6_1 --level=4
# Batch sub-region generation for ALL countries
# Generates boundary_ESP.6_1_4.json, boundary_DEU.2_1_3.json, etc.
.\dist\win-x64\boundaries.exe --country=all --level=4 --split-levels=1
# Custom resolution (1=full detail, 10=max simplification, default=4)
.\dist\win-x64\boundaries.exe --country=DEU --resolution=6
# Force regeneration (ignore cached files)
.\dist\win-x64\boundaries.exe --country=NGA --force
CLI Options
| Option | Default | Description |
|---|---|---|
--country |
all |
ISO3 code, dotted GADM ID (e.g. ESP.6_1), or all for batch |
--level |
-1 |
Admin level 0-5, or -1 for all |
--split-levels |
0 |
Comma-separated list of levels to split output files by (e.g. 0,1) |
--resolution |
4 |
Simplification resolution (1=full detail, 10=max simplification) |
--cache-dir |
cache/gadm |
Output directory |
--gpkg |
data/gadm_410-levels.gpkg |
GADM GeoPackage |
--continent-json |
data/gadm_continent.json |
Continent mapping |
--pop-tiff |
data/ghs/GHS_POP_...tif |
GHS population raster |
--built-tiff |
data/ghs/GHS_BUILT_S_...tif |
GHS built-up raster |
--force |
false |
Regenerate even if cached |
--split-levels explained
By default (--split-levels=0), the tool outputs one file per country per level: boundary_ESP_4.json.
With --split-levels=1, it dynamically queries all distinct level-1 region codes (e.g. ESP.6_1, ESP.1_1, …) and outputs individual files like boundary_ESP.6_1_4.json. This is useful for pre-splitting large countries into smaller, faster-loading cache files.
The TS wrapper automatically discovers these sub-region files before falling back to the full country file.
npm Scripts
npm run build:cpp # rebuild the C++ binary
npm run boundaries:cpp # --country=all (full batch, Windows)
npm run boundaries:cpp -- --country=DEU # single country
On Linux, run the binary directly since the npm script points to the Windows path.
Output
Files written to cache/gadm/ as:
boundary_{CODE}_{LEVEL}.json— full country file (e.g.boundary_ESP_4.json)boundary_{GADM_ID}_{LEVEL}.json— sub-region file when--split-levelsis used (e.g.boundary_ESP.6_1_4.json)
The TS wrapper (gpkg-reader.ts) resolves cache files in this order:
- Exact sub-region file:
boundary_{gadmId}_{level}.json - Full country file:
boundary_{countryCode}_{level}.json(prefix-filtered for sub-region queries) - Live GeoPackage query (fallback)
Each feature includes:
code— admin region codename— admin region namegeometry— GeoJSON (MultiPolygon via WKB-precision GEOS union)ghsPopulation— total population (GHS-POP 2030)ghsPopMaxDensity— peak population densityghsPopCenter— weighted population center[lon, lat]ghsPopCenters— top-N population peaks[[lon, lat, density], ...]ghsBuiltWeight— total built-up surface weightghsBuiltMax— peak built-up valueghsBuiltCenter— weighted built-up center[lon, lat]ghsBuiltCenters— top-N built-up peaks[[lon, lat, value], ...]
Architecture
See docs/cpp-port.md for the full spec.
src/
├── main.cpp # CLI entry, country loop, cache logic, PROJ_DATA setup
├── gpkg_reader.h/cpp # GeoPackage -> features (OGR C API)
├── geo_merge.h/cpp # Geometry union via WKB roundtrip (GEOS C API)
├── ghs_enrich.h/cpp # GeoTIFF raster sampling + PIP (GDAL + PROJ)
├── pip.h # Inline ray-casting point-in-polygon
└── types.h # BoundaryFeature, BoundaryResult structs
Key Design Decisions
- No OpenMP — GDAL/PROJ/GEOS are not thread-safe for concurrent raster reads + transform creation. Sequential processing is I/O-bound anyway.
- WKB precision — geometry union uses WKB serialization to avoid floating-point drift from WKT roundtrips.
- Mollweide projection — uses
+proj=mollstring directly (notEPSG:54009which isn't in the PROJ database). Transforms are normalized viaproj_normalize_for_visualizationfor correct lon/lat axis order. - Windowed raster I/O — GDAL
GDALRasterIOreads only the bbox-clipped window from multi-GB GeoTIFFs, keeping memory bounded. - PROJ_DATA auto-discovery —
main.cppsetsPROJ_DATAat startup pointing to the exe directory whereproj.dbis co-located. - Cross-platform — builds on Windows (MSVC) and Linux (GCC/Clang) via vcpkg. Platform-specific code is guarded by
#ifdef _WIN32.
Recent Optimizations (40-50x Speedup)
- Prepared GEOS Geometries:
OGR_G_Containsparsing was replaced withGEOSPreparedContains_r. Prepared geometries index the structure once, reducing millions of unprepared spatial lookups and point-instantiations inside the raster loop to microseconds. - Bounding Box Pre-filtering:
BoundaryFeatureprecalculates and storesminX, minY, maxX, maxYduring GeoPackage ingestion, entirely skipping OGR bounding box and geometry checks per chunk. - GeoPackage Handle Caching: Replaced sequential re-opening of the
GDALOpenExcontext across recursive boundary levels with a globally cached datastore handle, bypassing 1.8-second repeated disk-sweeping overhead. - Deferred GeoJSON Extraction: GeoJSON serialization only happens for final merged features or un-merged singletons, circumventing string/parsing allocations overhead during intermediate GeoPackage reading.
Suggestions for Further Optimization (I/O Bottleneck)
Right now, reading from the GeoPackage takes ~2.1 seconds per level per country because GDAL/OGR is performing full-layer scans (the 700MB SQLite files lack proper B-Tree spatial indexing on the GID_0 column). Since the CPU/Geometry overhead was already optimized by ~50x, the remaining time is purely standard disk I/O.
To eliminate this I/O overhead entirely (< 10ms read times), consider:
- Offline Indexing: Run a one-time indexing command before use (
ogrinfo data/gadm_410-levels.gpkg -sql "CREATE INDEX idx_gid0 ON ADM_ADM_1 (GID_0)") to bake B-Tree indices directly into your source.gpkg. - Pre-partitioning / Sharding: Split the massive monolithic GADM GeoPackage into individual
.gpkgor.geojsonchunks per country (e.g.,data/gadm/DEU.gpkg), enabling true O(1) instantaneous access.