# @polymech/gadm

![npm version](https://img.shields.io/npm/v/@polymech/gadm) ![TypeScript](https://img.shields.io/badge/TypeScript-Strict-%233178C6?logo=typescript)

**[Homepage](https://service.polymech.info/)**  ·  **[Source Code](https://git.polymech.info/polymech/gadm-ts)**

Pure TypeScript interface to the [GADM](https://gadm.org) v4.1 administrative boundaries database. Zero Python dependencies: parquet data, tree construction, iterators, and caching all run in Node.js.

## Overview

| Feature | Description |
|---------|-------------|
| **Database** | 356K rows from GADM 4.1, stored as a 6 MB Parquet file |
| **Admin Levels** | L0 (country) → L5 (municipality/commune) |
| **Tree API** | Build hierarchical trees, walk with DFS/BFS/level iterators |
| **Name Search** | Fuzzy search across all levels with Levenshtein suggestions |
| **GeoJSON** | Fetch boundaries from GADM CDN with corrected names |
| **Caching** | File-based JSON cache for trees and API results |
| **VARNAME** | Alternate names / English translations via `VARNAME_1..5` columns |

![PoolyPress GADM based picker](./docs/gadm-inspector.png)

---

## Installation

```bash
npm install @polymech/gadm
```

Internal monorepo: referenced via the workspace protocol in `package.json`.

---

## Acknowledgments & PyGADM Port

This package is a direct Node.js/TypeScript port of the Python library [pygadm](https://github.com/gee-community/pygadm), which provides the core parquet-based data structure and fetching methodology. While bringing these capabilities to the JavaScript ecosystem, we added several enhancements aimed at web applications and browser performance:

- **Aggressive Geometry Simplification:** Integrates `@turf/simplify` and `@turf/truncate` with a configurable `resolution` parameter (1 = full detail, 10 = maximum simplification, default = 4). Compresses raw ~25 MB boundary polygons down to ~1 MB browser-friendly payloads and rounds all coordinates (geometry + GHS metadata) to 5 decimal places.
- **Unified Cascading Caches:** Cache lookups cascade across the global `process.env.GADM_CACHE`, the current `process.cwd()`, and local workspace `../cache` directories.
- **Target-Level Subdivision Extraction:** A unified `targetLevel` API that distinguishes between extracting a single merged outer perimeter and an array of inner subdivisions, derived from recursive `.merge()` operations.
- **Smart Pre-cacher Script:** Includes `boundaries.ts`, an auto-resuming build script that iterates downwards to pre-calculate, dissolve, and compress hierarchy layers 0–5, so the API can serve them from cache instead of running heavy geometry operations at runtime.

---

## Quick Start

```ts
import { buildTree, walkDFS, findNode, searchRegions, getNames } from '@polymech/gadm';

// Build a tree for Spain
const tree = await buildTree({ admin: 'ESP', cacheDir: './cache/gadm' });
console.log(tree.root.children.length); // 18 (comunidades)

// Find a specific region
const bcn = findNode(tree.root, 'Barcelona');
console.log(bcn?.gid); // ESP.6.1_1

// Walk all nodes, indented by admin level
for (const node of walkDFS(tree.root)) {
  console.log('  '.repeat(node.level) + node.name);
}

// Search via wrapper API
const result = await searchRegions({ query: 'France', contentLevel: 2 });
console.log(result.data?.length); // ~101 departments
```

---

## API Reference

### Tree Module

#### `buildTree(opts: BuildTreeOptions): Promise<GADMTree>`

Builds a hierarchical tree from the flat parquet data. Results are cached to disk when `cacheDir` is set.
```ts
interface BuildTreeOptions {
  name?: string;      // Region name: "Spain", "Cataluña", "Bayern"
  admin?: string;     // GADM code: "ESP", "DEU.2_1", "FRA.11_1"
  cacheDir?: string;  // Path for JSON cache files (optional)
}
```

Either `name` or `admin` must be set (not both). Throws if the region is not found in the database.

#### `GADMTree` and `GADMNode`

```ts
interface GADMTree {
  root: GADMNode;        // Root node of the tree
  maxLevel: number;      // Deepest admin level reached (0–5)
  nodeCount: number;     // Total nodes across all levels
}

interface GADMNode {
  name: string;          // Display name: "Barcelona"
  gid: string;           // GADM ID: "ESP.6.1_1"
  level: number;         // Admin level 0–5
  children: GADMNode[];  // Sub-regions (sorted alphabetically)
}
```

#### Iterators

The `walk*` functions and `leaves` are generators: use `for...of` or spread them into arrays. `findNode` is the exception and returns a single matching node (nullish on a miss, as the `bcn?.gid` access in the Quick Start suggests).

| Function | Description |
|----------|-------------|
| `walkDFS(node)` | Depth-first traversal, top-down |
| `walkBFS(node)` | Breadth-first, level by level |
| `walkLevel(node, level)` | Only nodes at a specific admin level |
| `leaves(node)` | Only leaf nodes (deepest, no children) |
| `findNode(root, query)` | First DFS match by name or GID (case-insensitive) |

```ts
// Get all provinces (level 2) under Cataluña
const provinces = [...walkLevel(tree.root, 2)];
// → [{ name: 'Barcelona', ... }, { name: 'Girona', ... }, ...]

// Count municipalities
const municipios = [...leaves(tree.root)];
console.log(municipios.length); // 955

// Find by GID
const girona = findNode(tree.root, 'ESP.6.2_1');
```

---

### Names Module

#### `getNames(opts: NamesOptions): Promise<NamesResult>`

Searches the parquet database for admin areas. Returns deduplicated rows; on a miss, it throws with fuzzy-match suggestions.
```ts
interface NamesOptions {
  name?: string;          // Search by name
  admin?: string;         // Search by GADM code
  contentLevel?: number;  // Target level (0–5), -1 = auto
  complete?: boolean;     // Return all columns up to contentLevel
}

interface NamesResult {
  rows: GadmRow[];        // Matched records
  level: number;          // Resolved content level
  columns: string[];      // Column names in result
}
```

On miss, throws with Levenshtein-based suggestions:

```
The requested "Franec" is not part of GADM.
The closest matches are: France, Franca, Franco, ...
```

---

### Items Module

#### `getItems(opts: ItemsOptions): Promise`

Fetches GeoJSON boundaries from the GADM CDN, with name correction from the local parquet database (workaround for a camelCase bug in GADM GeoJSON responses).

```ts
interface ItemsOptions {
  name?: string | string[];   // Region name(s)
  admin?: string | string[];  // GADM code(s)
  contentLevel?: number;      // Target level, -1 = auto
  includeOuter?: boolean;     // Also include the containing region's external perimeter
  geojson?: boolean;          // Return geometries instead of just properties (metadata)
}
```

Supports continent expansion: `getItems({ name: ['europe'] })` fetches all European countries.

---

### Wrapper Module (Server API)

Higher-level API designed for HTTP handlers. Includes file-based caching via the `GADM_CACHE` env var (default: `./cache/gadm`).
| Function | Description |
|----------|-------------|
| `searchRegions(opts)` | Search by name, returns metadata or GeoJSON |
| `getBoundary(gadmId, contentLevel?, cache?, enrichOpts?, resolution?)` | Get GeoJSON boundary for a GADM ID |
| `getRegionNames(opts)` | List sub-region names with depth control |

#### Integration Example (Server API)

The example below wraps the GADM engine in an HTTP handler (Hono-style here; an Express handler is analogous) that fetches boundaries at a requested target level and enriches their GeoJSON metadata on the fly:

```ts
import type { Context } from 'hono';
import { getBoundary } from '@polymech/gadm';
import * as turf from '@turf/turf';

async function handleGetRegionBoundary(c: Context) {
  const id = c.req.param('id');                    // e.g. "DEU" or "ESP.6_1"
  const targetLevel = c.req.query('targetLevel');  // e.g. "1" for inner states
  const enrich = c.req.query('enrich') === 'true';

  try {
    // Always pass a radix so the query string is parsed as base 10
    const parsedTargetLevel = targetLevel !== undefined ? parseInt(targetLevel, 10) : undefined;

    // Fetch the boundary FeatureCollection (served from cache when pre-computed)
    const result = await getBoundary(id, parsedTargetLevel);
    if ('error' in result) {
      return c.json({ error: result.error }, 404);
    }

    // On-the-fly geometry enrichment
    if (enrich && result.features) {
      for (const feature of result.features) {
        // turf.area returns square metres; convert to square kilometres
        const areaSqkm = Math.round(turf.area(feature as any) / 1_000_000);
        feature.properties.areaSqkm = areaSqkm;

        // Bounding box for client camera tracking
        const bbox = turf.bbox(feature as any);
        feature.properties.bbox = bbox;
      }
    }

    return c.json(result, 200);
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    return c.json({ error: message }, 500);
  }
}
```

---

## Data Enrichment (Optional GeoTIFFs)

The GADM engine includes optional built-in enrichers that query **European Commission GHSL (Global Human Settlement Layer)** GeoTIFFs directly in Node.js and return the **modelled population** and **built-up surface weight** inside any requested
boundary. Because `getBoundary()` projects bounding boxes to Mollweide `EPSG:54009` and extracts spatial windows from the raw raster TIFF data, you get density analytics at 100 m grid resolution on the fly, without setting up heavy PostGIS/QGIS servers.

### Prerequisites (GHSL Data)

You must download the raw GeoTIFF datasets from the EU JRC Open Data portal and store them locally (e.g. in `data/ghs/`).

*Warning: These files are >1GB.*

| Dataset | Metric | URL |
|---------|--------|-----|
| `GHS_POP` | Population (2030 Projections) | [GHS_POP_E2030_GLOBE_R2023A_54009_100_V1_0.tif](https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_POP_GLOBE_R2023A/GHS_POP_E2030_GLOBE_R2023A_54009_100/V1-0/GHS_POP_E2030_GLOBE_R2023A_54009_100_V1_0.zip) |
| `GHS_BUILT_S` | Built-up Area / Concrete Surface | [GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.tif](https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_BUILT_S_GLOBE_R2023A/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100/V1-0/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.zip) |

### Option 1: Native Wrapper Option (Recommended)

Pass `{ pop: true, built: true }` into `getBoundary()`. It discovers the `.tif` datasets automatically (looking in `data/ghs`, `cache/ghs`, and environment variables), scans the density per feature, calculates the population and built-up centers of mass, and appends them to the GeoJSON `feature.properties`. The result is deep-cloned and cached.

```typescript
const result = await getBoundary(gadmId, targetLevel, undefined, { pop: true, built: true });

// result.features[0].properties will now contain:
// {
//   ...
//   "population": 5666,                   <- the standard property, overwritten with the GHS-derived count
//   "ghsPopMaxDensity": 125,              <- highest density 100x100m block
//   "ghsPopCenter": [2.1019, 41.8130],    <- center of mass (where residents actually live vs geographical center)
//   "ghsPopCenters": [                    <- up to 5 distinct population clusters [lon, lat, max_density]
//     [2.1019, 41.8130, 125]
//   ],
//   "ghsBuiltWeight": 744080,             <- concrete physical size index
//   "ghsBuiltCenter": [2.1039, 41.8130],  <- center of concrete (industrial + urban spread)
//   "ghsBuiltCenters": [                  <- up to 5 distinct concrete clusters [lon, lat, max_density]
//     [2.1039, 41.8130, 300]
//   ]
// }
```

### Option 2: Standalone Feature Module

If you already have arbitrary GeoJSON polygons, you can extract the same density metrics directly:

```typescript
import { enrichFeatureWithGHS } from '@polymech/gadm';

const myCustomPolygon = { type: 'Feature', geometry: { ... } };
const stats = await enrichFeatureWithGHS(myCustomPolygon, { pop: true });

console.log(stats.ghsPopulation, stats.ghsPopCenter);
```

---

## Boundary Geometries & Caching

Fetching complex geospatial polygons (like country borders or district subdivisions) requires merging hundreds of geometries. Computing this at runtime for each user request is too slow, so `@polymech/gadm` relies on pre-compiled caches and aggressive size compression.

### Resolving Boundary Target Levels

When building interactive user interfaces or fetching boundaries through the top-level API (`handleGetRegionBoundary`), the granularity of the returned `FeatureCollection` is controlled through `targetLevel` (or the programmatic `contentLevel`).

- **Outer Boundary**: Set `targetLevel` equal to the region's intrinsic level (e.g., targeting `Level 0` for Spain). The engine uses `turf` to dissolve internal geometries and returns a single merged polygon for the region's outer perimeter.
- **Inner Subdivisions**: Provide a `targetLevel` deeper than the intrinsic level (e.g., targeting `Level 1` for Spain). The engine filters for the constituent parts and returns a `FeatureCollection` in which each subdivision (e.g. each of Spain's autonomous communities) is preserved as a separate geometry feature.

### Geometry Simplification & Resolution

Both the TypeScript and C++ pipelines apply geometry simplification controlled by a `resolution` parameter (default: **4**):

| Resolution | Tolerance | Coordinate Precision | Use Case |
|------------|-----------|---------------------|----------|
| 1 | 0.0001 | 5 decimals | Maximum detail |
| 4 | 0.005 | 5 decimals | Default, good balance |
| 10 | 0.5 | 5 decimals | Maximum compression |

The formula: `tolerance = 0.0001 * 10^((resolution-1) * 4/9)`.

GHS metadata coordinates (`ghsPopCenter`, `ghsBuiltCenters`, etc.) are also rounded to 5 decimal places to match geometry precision.

### Smart Caching & Cache Resolution Order

To deliver these polygons to your HTTP APIs quickly (sub-10ms):

1. **Pre-Caching Scripts**: Run `npm run boundaries -- --country=all` (TypeScript) or `npm run boundaries:cpp` (C++). Both iterate downwards to compute and compress hierarchical layers 0 through 5 for each country. Existing files are skipped, so an interrupted run can be resumed.
2. **Cascading Cache Lookups**: The package resolves caches in order:
   - Exact sub-region cache file: `boundary_{gadmId}_{level}.json`
   - Full country cache file: `boundary_{countryCode}_{level}.json` (prefix-filtered for sub-region queries)
   - Environment paths: `process.env.GADM_CACHE`, then `process.cwd()/cache/gadm`, then `../cache/gadm`
   - Live GeoPackage query (fallback)
3. **Payload Compression (~25MB -> ~1MB)**: Boundary geometries are compressed using `@turf/simplify` (TS) or GEOS `GEOSSimplify_r` (C++) with matching tolerance, so both pipelines produce consistent output.
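As a quick check of the resolution formula quoted in the Geometry Simplification section, the sketch below evaluates it for each resolution step. The helper name `toleranceForResolution` is hypothetical and not a documented export of the package; only the formula itself comes from the text. Evaluated literally, it gives roughly 0.0022 at resolution 4 and 1.0 at resolution 10, so the tolerances in the table are probably best read as rough guides rather than exact outputs.

```typescript
// Hypothetical helper (not part of @polymech/gadm): evaluates the quoted formula
//   tolerance = 0.0001 * 10^((resolution - 1) * 4 / 9)
function toleranceForResolution(resolution: number): number {
  if (!Number.isInteger(resolution) || resolution < 1 || resolution > 10) {
    throw new RangeError(`resolution must be an integer from 1 to 10, got ${resolution}`);
  }
  return 0.0001 * Math.pow(10, ((resolution - 1) * 4) / 9);
}

// Print the tolerance for every resolution step, rounded to 3 significant digits.
for (let r = 1; r <= 10; r++) {
  console.log(r, toleranceForResolution(r).toPrecision(3));
}
```

Because the exponent grows linearly with `resolution`, each step multiplies the tolerance by the same factor (about 2.78), which is why higher settings simplify so much more aggressively.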
---

### Database Module (Low-Level)

| Function | Description |
|----------|-------------|
| `loadDatabase()` | Load parquet into memory (lazy, singleton) |
| `getColumns()` | Return column names |
| `resetCache()` | Clear the in-memory row cache |

`GadmRow` is `Record<string, string>`: all values are normalized to strings.

---

## Types

All types are exported from the package entry point:

```ts
import type {
  GADMNode, GADMTree, BuildTreeOptions,                           // tree
  NamesOptions, NamesResult, GadmRow,                             // names + database
  ItemsOptions, GeoJSONFeature, GeoJSONCollection,                // items
  SearchRegionsOptions, SearchRegionsResult, RegionNamesOptions,  // wrapper
} from '@polymech/gadm';
```

---

## Data Layout

### Parquet File

`data/gadm_database.parquet`: **356,508 rows**, **6.29 MB**

| Column Group | Columns | Description |
|--------------|---------|-------------|
| GID | `GID_0` … `GID_5` | GADM identifiers per level |
| NAME | `NAME_0` … `NAME_5` | Display names per level |
| VARNAME | `VARNAME_1` … `VARNAME_5` | Alternate names / translations |

129,448 rows have `VARNAME_1` values (e.g. `Badakhshān`, `Bavière`).

### GADM Levels

| Level | Typical Meaning | Example (Spain) |
|-------|----------------|-----------------|
| 0 | Country | Spain |
| 1 | State / Region | Cataluña |
| 2 | Province / Department | Barcelona |
| 3 | District / Comarca | Baix Llobregat |
| 4 | Municipality | Castelldefels |
| 5 | Sub-municipality | *(rare, not all countries)* |

> **Note:** GADM does not include neighborhood/Stadtteil-level data.
> For sub-city resolution (e.g. Johannstadt in Dresden), OSM/Nominatim would be needed.

---

## Caching

### Tree Cache (`cacheDir`)

When `cacheDir` is passed to `buildTree()`, the full tree is saved as `tree_{md5}.json`. Subsequent calls with the same `name`/`admin` return the cached tree almost instantly (~1ms).

### Wrapper Cache (`GADM_CACHE`)

The wrapper module caches search results, boundaries, and region names in `$GADM_CACHE/` (default `./cache/gadm`).
Files are keyed by an MD5 hash of the query parameters.

### In-Memory Cache

`loadDatabase()` is a singleton: the 356K-row array is loaded once per process. Call `resetCache()` to force a reload (useful in tests).

### Precalculating Boundaries

To improve runtime performance (especially for large geographies, which take time to dissolve), you can precalculate and cache standard admin boundaries using the included CLI script:

```bash
cd packages/gadm

# Precalculate the outer boundary for a specific country
npm run boundaries -- --country=DEU

# Precalculate inner boundaries for a specific level
npm run boundaries -- --country=DEU --level=1

# Precalculate the outer boundary for ALL countries worldwide
npm run boundaries -- --country=all
```

Precalculated boundaries are saved as `.json` files inside the configured cache directory (`./cache/gadm/boundary_{CODE}_{LEVEL}.json`).

### C++ Native Pipeline (Recommended for Batch)

For full batch generation across all 263 countries × 6 levels, the native C++ port provides significantly faster processing by using GDAL/GEOS/PROJ directly. It reads the same GeoPackage, performs geometry unions via WKB-precision GEOS, and enriches with GHS raster data, producing output identical to the TypeScript pipeline.

```bash
# Build (requires vcpkg + CMake)
npm run build:cpp
# or: cmake --build cpp/build --config Release

# Run via npm scripts
npm run boundaries:cpp                       # all countries
npm run boundaries:cpp -- --country=DEU      # single country

# Sub-region splitting (generates boundary_ESP.6_1_4.json etc.)
npm run boundaries:cpp -- --country=all --level=4 --split-levels=1

# Custom resolution (1-10, default=4)
npm run boundaries:cpp -- --country=DEU --resolution=6
```

Output includes GHS enrichment by default when TIFF files are present in `data/ghs/`:

- `ghsPopulation`, `ghsPopMaxDensity`, `ghsPopCenter`, `ghsPopCenters`
- `ghsBuiltWeight`, `ghsBuiltMax`, `ghsBuiltCenter`, `ghsBuiltCenters`

See [`cpp/README.md`](./cpp/README.md) for build prerequisites, full CLI reference, and architecture details.

---

## Data Refresh

Regenerate `data/gadm_database.parquet` from a GADM GeoPackage source file.

### Prerequisites

Download one of the core GeoPackage database files. You can point the package to your `gpkg` location using the `GADM_GPKG_PATH` environment variable, or store it in your working directory at `cache/gadm/gadm_410.gpkg`:

```
https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-gpkg.zip   → unzip → gadm_410.gpkg
https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-raw.gpkg
```

### Run

```bash
cd packages/gadm
npm run refresh
```

The script (`scripts/refresh-database.ts`):

1. Opens the GeoPackage (SQLite) via `better-sqlite3`
2. Auto-detects table format (per-level `ADM_x` tables or single flat table)
3. Extracts GID, NAME, and VARNAME columns for levels 0–5
4. Writes to `data/gadm_database.parquet` via `hyparquet-writer`

### Dev Dependencies (refresh only)

| Package | Purpose |
|---------|---------|
| `better-sqlite3` | Read GeoPackage (SQLite) files |
| `hyparquet-writer` | Write Parquet output |

These are `devDependencies` and are not needed at runtime.
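Step 2 of the refresh script (table-format auto-detection) can be pictured as a small classifier over the table names found in the GeoPackage. The sketch below is an assumption about how such detection might work, not the actual code in `scripts/refresh-database.ts`: `detectLayout`, the `Layout` type, and the exact `ADM_0` to `ADM_5` name pattern are illustrative.

```typescript
// Hypothetical sketch: classify a GeoPackage by its table names.
type Layout =
  | { kind: 'per-level'; tables: string[] } // one table per admin level (ADM_0 ... ADM_5)
  | { kind: 'flat'; table: string }         // a single table holding all levels
  | { kind: 'unknown' };

function detectLayout(tableNames: string[]): Layout {
  // Assumed naming: per-level tables look like ADM_0 ... ADM_5 (case-insensitive).
  const levelTables = tableNames
    .filter((t) => /^ADM_[0-5]$/i.test(t))
    .sort((a, b) => a.localeCompare(b));
  if (levelTables.length > 0) {
    return { kind: 'per-level', tables: levelTables };
  }

  // Otherwise look for exactly one data table, skipping GeoPackage/SQLite bookkeeping tables.
  const dataTables = tableNames.filter((t) => !/^(gpkg_|rtree_|sqlite_)/i.test(t));
  const [first, ...rest] = dataTables;
  if (first !== undefined && rest.length === 0) {
    return { kind: 'flat', table: first };
  }

  return { kind: 'unknown' };
}

console.log(detectLayout(['gpkg_contents', 'ADM_1', 'ADM_0']));
console.log(detectLayout(['gpkg_contents', 'gpkg_spatial_ref_sys', 'gadm_410']));
```

Since a GeoPackage is an SQLite file, the table names themselves could be listed by querying `sqlite_master` for rows of type `table`. Keeping the classifier a pure function over names makes it easy to unit-test without a multi-gigabyte input file.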
---

## Tests

```bash
cd packages/gadm
npx vitest run                             # all tests
npx vitest run src/__tests__/tree.test.ts  # tree tests only
```

### Tree Tests

JSON outputs saved to `tests/tree/` for inspection:

| File | Content |
|------|---------|
| `test-cataluna.json` | Full Cataluña tree (1,000 nodes, 955 leaves) |
| `test-germany-summary.json` | Germany L1 summary (16 Bundesländer, 16,402 nodes) |
| `test-dresden.json` | Sachsen → Dresden subtree with all children |
| `test-iterators.json` | DFS/BFS/walkLevel/findNode verification data |

### Name Tests

`src/__tests__/province-names.test.ts`: tests `getNames()` for France departments, exact matches, and fuzzy suggestions.

---

## Architecture

```
packages/gadm/
├── cpp/                          # C++ native pipeline (GDAL/GEOS/PROJ)
│   ├── src/                      # main.cpp, gpkg_reader, geo_merge, ghs_enrich
│   ├── CMakeLists.txt
│   └── vcpkg.json
├── data/
│   ├── gadm_database.parquet     # 356K rows, 6.29 MB
│   ├── gadm_continent.json       # Continent → ISO3 mapping
│   └── ghs/                      # GHS GeoTIFF rasters (optional)
├── dist/
│   └── win-x64/                  # Compiled C++ binary + DLLs
├── scripts/
│   └── refresh-database.ts       # GeoPackage → Parquet converter
├── src/
│   ├── database.ts               # Parquet reader (hyparquet)
│   ├── names.ts                  # Name/code lookup + fuzzy match
│   ├── items.ts                  # GeoJSON boundaries from CDN
│   ├── gpkg-reader.ts            # GeoPackage boundary reader + C++ cache fallback
│   ├── enrich-ghs.ts             # GHS GeoTIFF enrichment (TS)
│   ├── wrapper.ts                # Server-facing API with cache
│   ├── tree.ts                   # Tree builder + iterators
│   ├── index.ts                  # Barrel exports
│   └── __tests__/
│       ├── tree.test.ts          # Tree building + iterator tests
│       └── province-names.test.ts
├── tests/
│   ├── tree/                     # Test output JSONs
│   └── cache/gadm/               # Tree cache files
└── package.json
```

## Dependencies

| Package | Type | Purpose |
|---------|------|---------|
| `hyparquet` | runtime | Read Parquet files (zero native deps) |
| `zod` | runtime | Schema validation |
| `better-sqlite3` | dev | GeoPackage reader (refresh only) |
| `hyparquet-writer` | dev | Parquet writer (refresh only) |
| `vitest` | dev | Test runner |
| `typescript` | dev | Build |