optimization 3/3

This commit is contained in:
lovebird 2026-03-24 10:50:17 +01:00
parent 3c0c42a9b2
commit 12d38ef56b
10 changed files with 316 additions and 114 deletions


@ -41,7 +41,7 @@ This package is a direct Node.js/TypeScript port of the excellent Python library
While bringing these capabilities natively to the JavaScript ecosystem, we added several enhancements designed specifically for web applications and browser performance:
- **Aggressive Geometry Simplification:** Natively integrates `@turf/simplify` (0.001 tolerance, high-quality) to instantly compress raw unoptimized 25MB boundary polygons down to ~1MB browser-friendly payloads.
- **Aggressive Geometry Simplification:** Natively integrates `@turf/simplify` and `@turf/truncate` with a configurable `resolution` parameter (1=full detail, 10=max simplification, default=4). Compresses raw unoptimized 25MB boundary polygons down to ~1MB browser-friendly payloads while rounding all coordinates (geometry + GHS metadata) to 5 decimal places.
- **Unified Cascading Caches:** Intelligent caching ladders that auto-resolve across global `process.env.GADM_CACHE`, active `process.cwd()`, and local workspace `../cache` mounts.
- **Target-Level Subdivision Extraction:** A unified `targetLevel` API that distinguishes between extracting a region's merged outer perimeter and an array of its granular inner subdivisions, derived from recursive `.merge()` operations.
- **Smart Pre-cacher Script:** Includes `boundaries.ts`, an auto-resuming build script that iterates downwards to pre-calculate, dissolve, and aggressively compress hierarchy layers 0–5 for instant sub-ms API delivery, bypassing heavy geometry intersections at runtime.
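The cascading cache ladder from the bullet above can be sketched as a small resolver. This is a hedged illustration, not the package's actual internals; the function name and the `packageDir` parameter are hypothetical:

```typescript
import * as path from "node:path";

// Hypothetical sketch of the cache-directory ladder described above:
// 1. global env override, 2. active workspace, 3. package-local mount.
function resolveCacheDirs(
  env: Record<string, string | undefined> = process.env,
  cwd: string = process.cwd(),
  packageDir: string = ".",
): string[] {
  const dirs: string[] = [];
  if (env.GADM_CACHE) dirs.push(env.GADM_CACHE);           // 1. explicit override
  dirs.push(path.join(cwd, "cache", "gadm"));              // 2. working directory
  dirs.push(path.join(packageDir, "..", "cache", "gadm")); // 3. local workspace mount
  return [...new Set(dirs)];                               // drop duplicates
}
```

The first directory that contains the requested boundary file would win, so an explicit `GADM_CACHE` always shadows workspace caches.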
@ -192,7 +192,7 @@ Higher-level API designed for HTTP handlers. Includes file-based caching via `GA
| Function | Description |
|----------|-------------|
| `searchRegions(opts)` | Search by name, returns metadata or GeoJSON |
| `getBoundary(gadmId, contentLevel?, cache?)` | Get GeoJSON boundary for a GADM ID |
| `getBoundary(gadmId, contentLevel?, cache?, enrichOpts?, resolution?)` | Get GeoJSON boundary for a GADM ID |
| `getRegionNames(opts)` | List sub-region names with depth control |
#### Integration Example (Server API)
@ -311,13 +311,29 @@ When building interactive user interfaces or fetching boundaries through the top
- **Outer Boundary**: Set `targetLevel` exactly equal to the region's intrinsic level (e.g., targeting `Level 0` for Spain). The engine uses `turf` to automatically dissolve internal geometries, returning a single merged polygon representing the region's outer envelope.
- **Inner Subdivisions**: Provide a `targetLevel` deeper than the intrinsic level (e.g., targeting `Level 1` for Spain). The engine filters for the exact constituent parts and returns a `FeatureCollection` where each sub-group (the 17 Spanish autonomous communities) is a distinctly preserved geometry feature.
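The outer-vs-inner decision above reduces to comparing `targetLevel` with the ID's intrinsic level (its dot count). A minimal sketch, assuming these hypothetical helper names (they are not the package's exported API):

```typescript
// Intrinsic level of a GADM ID is its number of dots:
// 'ESP' -> 0, 'ESP.6_1' -> 1, 'ESP.6.1_1' -> 2
function intrinsicLevel(gadmId: string): number {
  return (gadmId.match(/\./g) ?? []).length;
}

// 'outer' = dissolve into one merged perimeter, 'inner' = keep subdivisions
function extractionMode(gadmId: string, targetLevel: number): "outer" | "inner" {
  const level = intrinsicLevel(gadmId);
  if (targetLevel < level) throw new Error("targetLevel is above the region itself");
  return targetLevel === level ? "outer" : "inner";
}
```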
### Smart Caching & Geometry Compression
### Geometry Simplification & Resolution
Both the TypeScript and C++ pipelines apply geometry simplification controlled by a `resolution` parameter (default: **4**):
| Resolution | Tolerance | Coordinate Precision | Use Case |
|------------|-----------|---------------------|----------|
| 1 | 0.0001 | 5 decimals | Maximum detail |
| 4 | ~0.002 | 5 decimals | Default (good balance) |
| 10 | 1.0 | 5 decimals | Maximum compression |
The tolerance follows `tolerance = 0.0001 * 10^((resolution-1) * 4/9)`. GHS metadata coordinates (`ghsPopCenter`, `ghsBuiltCenters`, etc.) are also rounded to 5 decimal places to match geometry precision.
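The formula is a direct exponential interpolation between the two endpoints (0.0001 at resolution 1, 1.0 at resolution 10). A sketch transcribing it; the clamping of out-of-range values is an assumption, the pipelines may reject them instead:

```typescript
// tolerance = 0.0001 * 10^((resolution - 1) * 4/9), resolution in [1, 10]
function resolutionToTolerance(resolution: number): number {
  const r = Math.min(10, Math.max(1, resolution)); // assumed clamping
  return 0.0001 * Math.pow(10, ((r - 1) * 4) / 9);
}
```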
### Smart Caching & Cache Resolution Order
To ensure instantaneous delivery (sub-10ms) of these polygons to your HTTP APIs:
1. **Pre-Caching Script**: You can run `npm run boundaries -- --country=all` manually. This script iterates downwards to compute and compress hierarchical layers 0 through 5 for each invoked country automatically. If a file already exists, it intelligently skips it allowing easy resume.
2. **Cascading Cache Lookups**: The package resolves caches intelligently by searching matching environments: First checking `process.env.GADM_CACHE` (used by explicit production APIs), then `process.cwd()/cache/gadm`, and finally its own local `../cache/gadm` workspace path. You can also actively inject an external Redis or memory cache interface providing `{ get(key), set(key, val) }` as the 3rd argument to `getBoundary`.
3. **Payload Compression (~25MB -> ~1MB)**: Boundary geometries are natively compressed using `@turf/simplify` (with `0.001` tolerance mapping to ~100m geospatial fidelity) immediately prior to caching, ensuring React mapping payloads stay tightly optimized and smooth without crashing browsers.
1. **Pre-Caching Scripts**: Run `npm run boundaries -- --country=all` (TypeScript) or `npm run boundaries:cpp` (C++). Both iterate downwards to compute and compress hierarchical layers 0 through 5 for each country. Existing files are skipped for easy resume.
2. **Cascading Cache Lookups**: The package resolves caches in order:
- Exact sub-region cache file: `boundary_{gadmId}_{level}.json`
- Full country cache file: `boundary_{countryCode}_{level}.json` (prefix-filtered for sub-region queries)
- Environment paths: `process.env.GADM_CACHE`, then `process.cwd()/cache/gadm`, then `../cache/gadm`
- Live GeoPackage query (fallback)
3. **Payload Compression (~25MB -> ~1MB)**: Boundary geometries are compressed using `@turf/simplify` (TS) or GEOS `GEOSSimplify_r` (C++) with matching tolerance, ensuring consistent output from both pipelines.
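The first two lookup steps can be sketched as a pure function over file names. This follows the naming patterns stated above; the helper itself is illustrative, not the wrapper's real code:

```typescript
// Candidate cache file names, most specific first:
// exact sub-region file, then the full country file (prefix-filtered later).
function cacheFileCandidates(gadmId: string, level: number): string[] {
  const countryCode = gadmId.split(".")[0];
  const names = [
    `boundary_${gadmId}_${level}.json`,      // exact sub-region cache file
    `boundary_${countryCode}_${level}.json`, // full country cache file
  ];
  return [...new Set(names)]; // collapses to one entry for bare ISO3 codes
}
```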
---
@ -420,15 +436,17 @@ For full batch generation across all 263 countries × 6 levels, the native C++ p
```bash
# Build (requires vcpkg + CMake)
cmake --preset vcpkg-win
cmake --build cpp/build --config Release
npm run build:cpp # or: cmake --build cpp/build --config Release
# Run via npm scripts
npm run boundaries:cpp # all countries
npm run boundaries:cpp -- --country=DEU # single country
npm run boundaries:cpp # all countries
npm run boundaries:cpp -- --country=DEU # single country
# Or directly
.\dist\win-x64\boundaries.exe --country=DEU --level=0 --force
# Sub-region splitting (generates boundary_ESP.6_1_4.json etc.)
npm run boundaries:cpp -- --country=all --level=4 --split-levels=1
# Custom resolution (1-10, default=4)
npm run boundaries:cpp -- --country=DEU --resolution=6
```
Output includes GHS enrichment by default when TIFF files are present in `data/ghs/`:
@ -505,15 +523,24 @@ JSON outputs saved to `tests/tree/` for inspection:
```
packages/gadm/
├── cpp/ # C++ native pipeline (GDAL/GEOS/PROJ)
│ ├── src/ # main.cpp, gpkg_reader, geo_merge, ghs_enrich
│ ├── CMakeLists.txt
│ └── vcpkg.json
├── data/
│ ├── gadm_database.parquet # 356K rows, 6.29 MB
│ └── gadm_continent.json # Continent → ISO3 mapping
│ ├── gadm_continent.json # Continent → ISO3 mapping
│ └── ghs/ # GHS GeoTIFF rasters (optional)
├── dist/
│ └── win-x64/ # Compiled C++ binary + DLLs
├── scripts/
│ └── refresh-database.ts # GeoPackage → Parquet converter
├── src/
│ ├── database.ts # Parquet reader (hyparquet)
│ ├── names.ts # Name/code lookup + fuzzy match
│ ├── items.ts # GeoJSON boundaries from CDN
│ ├── gpkg-reader.ts # GeoPackage boundary reader + C++ cache fallback
│ ├── enrich-ghs.ts # GHS GeoTIFF enrichment (TS)
│ ├── wrapper.ts # Server-facing API with cache
│ ├── tree.ts # Tree builder + iterators
│ ├── index.ts # Barrel exports


@ -52,27 +52,36 @@ Output: `dist/linux-x64/boundaries` + `proj.db`
# From the gadm package root (not cpp/)
# All countries, all levels (263 countries x 6 levels)
# Windows:
.\dist\win-x64\boundaries.exe --country=all
# Linux:
./dist/linux-x64/boundaries --country=all
# Single country, all levels
./dist/linux-x64/boundaries --country=DEU
.\dist\win-x64\boundaries.exe --country=DEU
# Single country + level
./dist/linux-x64/boundaries --country=DEU --level=0
.\dist\win-x64\boundaries.exe --country=DEU --level=0
# Sub-region specific (dotted GADM ID)
.\dist\win-x64\boundaries.exe --country=ESP.6_1 --level=4
# Batch sub-region generation for ALL countries
# Generates boundary_ESP.6_1_4.json, boundary_DEU.2_1_3.json, etc.
.\dist\win-x64\boundaries.exe --country=all --level=4 --split-levels=1
# Custom resolution (1=full detail, 10=max simplification, default=4)
.\dist\win-x64\boundaries.exe --country=DEU --resolution=6
# Force regeneration (ignore cached files)
./dist/linux-x64/boundaries --country=NGA --force
.\dist\win-x64\boundaries.exe --country=NGA --force
```
### CLI Options
| Option | Default | Description |
|--------|---------|-------------|
| `--country` | `all` | ISO3 code or `all` for batch |
| `--country` | `all` | ISO3 code, dotted GADM ID (e.g. `ESP.6_1`), or `all` for batch |
| `--level` | `-1` | Admin level 0-5, or -1 for all |
| `--split-levels` | `0` | Comma-separated list of levels to split output files by (e.g. `0,1`) |
| `--resolution` | `4` | Simplification resolution (1=full detail, 10=max simplification) |
| `--cache-dir` | `cache/gadm` | Output directory |
| `--gpkg` | `data/gadm_410-levels.gpkg` | GADM GeoPackage |
| `--continent-json` | `data/gadm_continent.json` | Continent mapping |
@ -80,20 +89,34 @@ Output: `dist/linux-x64/boundaries` + `proj.db`
| `--built-tiff` | `data/ghs/GHS_BUILT_S_...tif` | GHS built-up raster |
| `--force` | `false` | Regenerate even if cached |
#### `--split-levels` explained
By default (`--split-levels=0`), the tool outputs one file per country per level: `boundary_ESP_4.json`.
With `--split-levels=1`, it dynamically queries all distinct level-1 region codes (e.g. `ESP.6_1`, `ESP.1_1`, …) and outputs individual files like `boundary_ESP.6_1_4.json`. This is useful for pre-splitting large countries into smaller, faster-loading cache files.
The TS wrapper automatically discovers these sub-region files before falling back to the full country file.
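The prefix filtering used when a sub-region query falls back to the full country file follows the pattern visible in the `gpkg-reader.ts` hunk of this commit. A standalone sketch (function names are illustrative):

```typescript
// 'DEU.2_1' -> strip the '_1' version suffix and append a dot: 'DEU.2.'
// so it matches descendants like 'DEU.2.91_1' but not 'DEU.20...'.
function gidPrefix(gadmId: string): string | null {
  return gadmId.includes(".") ? gadmId.replace(/_\d+$/, "") + "." : null;
}

// Keep only features belonging to the requested sub-region;
// bare ISO3 queries pass everything through unfiltered.
function filterByRegion(codes: string[], gadmId: string): string[] {
  const prefix = gidPrefix(gadmId);
  return prefix ? codes.filter((c) => c.startsWith(prefix)) : codes;
}
```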
### npm Scripts
```bash
npm run boundaries:cpp # --country=all (full batch, Windows)
npm run boundaries:cpp -- --country=DEU # single country
npm run build:cpp # rebuild the C++ binary
npm run boundaries:cpp # --country=all (full batch, Windows)
npm run boundaries:cpp -- --country=DEU # single country
```
On Linux, run the binary directly since the npm script points to the Windows path.
## Output
Files written to `cache/gadm/boundary_{CODE}_{LEVEL}.json` — drop-in replacement for the TS pipeline.
Files written to `cache/gadm/` as:
- `boundary_{CODE}_{LEVEL}.json` — full country file (e.g. `boundary_ESP_4.json`)
- `boundary_{GADM_ID}_{LEVEL}.json` — sub-region file when `--split-levels` is used (e.g. `boundary_ESP.6_1_4.json`)
The TS wrapper (`gpkg-reader.ts`) automatically discovers these cache files and serves them through the API. Sub-region queries (e.g. `getBoundary('DEU.2_1', 3)`) are resolved by prefix-filtering the full country file.
The TS wrapper (`gpkg-reader.ts`) resolves cache files in this order:
1. Exact sub-region file: `boundary_{gadmId}_{level}.json`
2. Full country file: `boundary_{countryCode}_{level}.json` (prefix-filtered for sub-region queries)
3. Live GeoPackage query (fallback)
Each feature includes:
- `code` — admin region code


@ -15,12 +15,7 @@ namespace gpkg_reader {
static GDALDatasetH g_ds = nullptr;
static std::string g_current_gpkg = "";
std::vector<boundary::BoundaryFeature> read_features(
const std::string& gpkg_path,
const std::string& country_code,
int level,
double tolerance
) {
static void ensure_open(const std::string& gpkg_path) {
if (!g_ds || g_current_gpkg != gpkg_path) {
if (g_ds) GDALClose(g_ds);
GDALAllRegister();
@ -52,9 +47,60 @@ std::vector<boundary::BoundaryFeature> read_features(
}
g_current_gpkg = gpkg_path;
}
}
std::vector<std::string> get_subregions(
const std::string& gpkg_path,
const std::string& country_code,
int split_level
) {
ensure_open(gpkg_path);
GDALDatasetH ds = g_ds;
std::string layer_name = "ADM_ADM_" + std::to_string(split_level);
OGRLayerH layer = GDALDatasetGetLayerByName(ds, layer_name.c_str());
if (!layer) {
layer = GDALDatasetGetLayer(ds, split_level);
if (!layer) layer = GDALDatasetGetLayer(ds, 0);
}
if (!layer) return {};
std::string actual_layer_name = OGR_L_GetName(layer);
std::string sql = "SELECT DISTINCT GID_" + std::to_string(split_level) + " FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + country_code + "'";
OGRLayerH query_layer = GDALDatasetExecuteSQL(ds, sql.c_str(), nullptr, "SQLite");
if (!query_layer) return {};
std::vector<std::string> subregions;
OGRFeatureH feat;
OGR_L_ResetReading(query_layer);
while ((feat = OGR_L_GetNextFeature(query_layer)) != nullptr) {
const char* val = OGR_F_GetFieldAsString(feat, 0);
if (val) {
subregions.push_back(val);
}
OGR_F_Destroy(feat);
}
GDALDatasetReleaseResultSet(ds, query_layer);
return subregions;
}
std::vector<boundary::BoundaryFeature> read_features(
const std::string& gpkg_path,
const std::string& country_code,
int level,
double tolerance
) {
ensure_open(gpkg_path);
GDALDatasetH ds = g_ds;
// Parse dot_count for country_code filtering (e.g. ESP.6_1)
std::string gadm_id = country_code;
int dot_count = std::count(gadm_id.begin(), gadm_id.end(), '.');
std::string iso_code = dot_count > 0 ? gadm_id.substr(0, gadm_id.find('.')) : gadm_id;
// Find the correct layer and its actual name
std::string layer_name = "ADM_ADM_" + std::to_string(level);
OGRLayerH layer = GDALDatasetGetLayerByName(ds, layer_name.c_str());
@ -67,12 +113,19 @@ std::vector<boundary::BoundaryFeature> read_features(
std::string actual_layer_name = OGR_L_GetName(layer);
// Execute SQL query directly via SQLite dialect to force index usage
std::string sql = "SELECT * FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + country_code + "'";
std::string sql = "SELECT * FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + iso_code + "'";
if (dot_count > 0 && dot_count <= level) {
sql += " AND GID_" + std::to_string(dot_count) + " = '" + gadm_id + "'";
}
OGRLayerH query_layer = GDALDatasetExecuteSQL(ds, sql.c_str(), nullptr, "SQLite");
if (!query_layer) {
// Fallback to OGR filtering if ExecuteSQL fails
OGR_L_SetAttributeFilter(layer, ("GID_0 = '" + country_code + "'").c_str());
std::string filter = "GID_0 = '" + iso_code + "'";
if (dot_count > 0 && dot_count <= level) {
filter += " AND GID_" + std::to_string(dot_count) + " = '" + gadm_id + "'";
}
OGR_L_SetAttributeFilter(layer, filter.c_str());
query_layer = layer;
}
@ -114,7 +167,7 @@ std::vector<boundary::BoundaryFeature> read_features(
bf.minY = env.MinY;
bf.maxY = env.MaxY;
int wkbSize = OGR_G_WkbSize(out_geom);
size_t wkbSize = OGR_G_WkbSize(out_geom);
bf.wkb.resize(wkbSize);
OGR_G_ExportToWkb(out_geom, wkbNDR, bf.wkb.data());


@ -9,6 +9,15 @@ namespace gpkg_reader {
/// Read features from a GeoPackage file, filtered by country (GID_0) and admin level.
/// Groups features by GID_{level} and returns one BoundaryFeature per group
/// with its geometry as a GEOS handle.
/// Retrieve a list of distinct GID_{split_level} values for a given country code
std::vector<std::string> get_subregions(
const std::string& gpkg_path,
const std::string& country_code,
int split_level
);
/// Read features from a GeoPackage file, filtered by country (GID_0) and admin level.
/// Can also filter directly by sub-region if country_code is a dotted GADM ID (e.g. ESP.6_1).
std::vector<boundary::BoundaryFeature> read_features(
const std::string& gpkg_path,
const std::string& country_code,


@ -44,7 +44,8 @@ int main(int argc, char *argv[]) {
std::string pop_tiff = "data/ghs/GHS_POP_E2030_GLOBE_R2023A_54009_100_V1_0.tif";
std::string built_tiff = "data/ghs/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.tif";
bool force = false;
int resolution = 1;
int resolution = 4; // Use 4 as default matching the TS wrapper
std::vector<int> split_levels = {0};
app.add_option("--country", country, "ISO3 country code, or 'all' for batch")
->default_val("all");
@ -64,7 +65,10 @@ int main(int argc, char *argv[]) {
->default_val("data/ghs/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.tif");
app.add_flag("--force", force, "Regenerate even if cache file exists");
app.add_option("--resolution", resolution, "Simplification resolution (1=full to 10=max simplification)")
->default_val(1);
->default_val(4);
app.add_option("--split-levels", split_levels, "Comma-separated levels to split output files (e.g. 1,2,3)")
->default_val(std::vector<int>{0})
->delimiter(',');
CLI11_PARSE(app, argc, argv);
@ -161,26 +165,37 @@ int main(int argc, char *argv[]) {
const auto &code = countries[i];
for (int lvl : levels) {
std::string filename =
"boundary_" + code + "_" + std::to_string(lvl) + ".json";
fs::path out_path = fs::path(cache_dir) / filename;
for (int split_lvl : split_levels) {
if (split_lvl > lvl) continue;
// Skip if cached
if (!force && file_exists(out_path)) {
++skipped;
continue;
}
try {
// 1. Read features from GeoPackage
auto t0 = std::chrono::high_resolution_clock::now();
auto features = gpkg_reader::read_features(gpkg_path, code, lvl, tolerance);
auto t1 = std::chrono::high_resolution_clock::now();
if (features.empty()) {
++skipped;
continue;
std::vector<std::string> regions;
if (split_lvl == 0) {
regions.push_back(code);
} else {
regions = gpkg_reader::get_subregions(gpkg_path, code, split_lvl);
}
for (const auto& region_id : regions) {
std::string filename =
"boundary_" + region_id + "_" + std::to_string(lvl) + ".json";
fs::path out_path = fs::path(cache_dir) / filename;
// Skip if cached
if (!force && file_exists(out_path)) {
++skipped;
continue;
}
try {
// 1. Read features from GeoPackage
auto t0 = std::chrono::high_resolution_clock::now();
auto features = gpkg_reader::read_features(gpkg_path, region_id, lvl, tolerance);
auto t1 = std::chrono::high_resolution_clock::now();
if (features.empty()) {
++skipped;
continue;
}
// 2. Merge geometries
auto merged = geo_merge::merge(features);
auto t2 = std::chrono::high_resolution_clock::now();
@ -201,7 +216,7 @@ int main(int argc, char *argv[]) {
props["NAME_" + std::to_string(lvl)] = feat.name;
props["GID_" + std::to_string(lvl)] = feat.code;
int dotCount = std::count(feat.code.begin(), feat.code.end(), '.');
int dotCount = static_cast<int>(std::count(feat.code.begin(), feat.code.end(), '.'));
if (lvl == dotCount) {
props["isOuter"] = true;
}
@ -261,9 +276,11 @@ int main(int argc, char *argv[]) {
"write:" + std::to_string(ms_write) + "ms " +
"Total:" + std::to_string(ms_total) + "ms");
} catch (const std::exception &e) {
++errors;
logger::error(code + " L" + std::to_string(lvl) + ": " + e.what());
} catch (const std::exception &e) {
++errors;
logger::error(region_id + " L" + std::to_string(lvl) + ": " + e.what());
}
}
}
}
}

dist/gpkg-reader.js vendored

File diff suppressed because one or more lines are too long

Binary file not shown.

docs/geo.md Normal file

@ -0,0 +1,40 @@
# Geospatial Architecture & Rendering Strategy
This document outlines the architectural decisions and trade-offs regarding how we process, store, and render GADM administrative boundaries in our web application using MapLibre GL JS.
## The Problem: High-Resolution Boundaries vs. Web Performance
GADM boundaries are highly accurate and meticulously detailed. A single boundary for a Region/Level 2 area (like a large province in Poland) can easily exceed 5MB in raw GeoJSON form.
Sending 5MB+ payloads to a frontend map client to render a basic region outline is an architectural anti-pattern for several reasons:
- **Bandwidth:** High download times break the fluidity of the app, especially on slower connections or mobile devices.
- **Parsing Overhead:** Browsers struggle and block the main thread when parsing JSON payloads of this size.
- **Rendering Lag:** Rendering hundreds of thousands of microscopic, tightly-packed vertices degrades panning and zooming performance because the GPU is forced to process unneeded geometry.
## Evaluation of Architectural Approaches
### 1. Geometry Simplification (Chosen Approach)
Instead of serving 100% full-resolution geometries, we apply a geometry simplification algorithm (such as Douglas-Peucker via `OGR_G_SimplifyPreserveTopology` in GDAL) during the backend C++ pipeline step before the files are saved to the cache.
By using a small geographical tolerance (e.g., `0.001` to `0.005` degrees), we eliminate up to 90-95% of vertices that lie essentially on a straight line.
- **Pros:** A 5MB file drops to ~100-200KB with zero visually perceptible difference at standard region-view zoom levels. It is extremely easy to cache and serve over a standard API without changing the frontend geometry-loading concepts.
- **Cons:** Shared borders between different polygons can sometimes develop tiny gaps or slivers if simplified individually without a shared topological graph/mesh.
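To make the vertex-elimination idea concrete, here is a minimal Douglas-Peucker sketch. It is purely illustrative: the real pipelines use `@turf/simplify` (TS) and GDAL/GEOS (C++), which also offer topology-preserving variants this toy version lacks:

```typescript
type Pt = number[]; // [lon, lat]

// Perpendicular distance from p to the infinite line through a-b
function perpDist(p: Pt, a: Pt, b: Pt): number {
  const dx = b[0] - a[0], dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
}

// Classic recursive Douglas-Peucker: drop interior points closer than
// `tolerance` to the chord, recurse around the farthest outlier.
function douglasPeucker(points: Pt[], tolerance: number): Pt[] {
  if (points.length < 3) return points;
  const first = points[0], last = points[points.length - 1];
  let maxDist = 0, index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpDist(points[i], first, last);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  if (maxDist <= tolerance) return [first, last]; // mid-points are near-collinear
  const left = douglasPeucker(points.slice(0, index + 1), tolerance);
  const right = douglasPeucker(points.slice(index), tolerance);
  return [...left.slice(0, -1), ...right]; // drop the duplicated pivot
}
```

With a tolerance around 0.001–0.005 degrees, near-collinear coastline runs collapse to their endpoints, which is where the 90-95% vertex reduction comes from.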
### 2. Mapbox Vector Tiles (MVT / Protocol Buffers)
Instead of returning monolithic GeoJSON boundary files, the map client requests data as small binary vector tiles (`.mvt` or `.pbf`). The geometries within these tiles are aggressively simplified and clipped to the user's current zoom level.
- **Pros:** Phenomenal performance for rendering global, tremendously heavy datasets. Only data intersecting the user's viewport is loaded at any given time.
- **Cons:** High infrastructure overhead. It requires running an active live tile server (like `pg_tileserv` or `martin`) or pre-generating massive pyramidal tile caches (e.g., using `tippecanoe`).
### 3. TopoJSON
An extension of GeoJSON that encodes *topology* rather than discrete geometries. Shared borders between adjacent regions are only recorded once.
- **Pros:** Can shrink file sizes by up to 80% compared to GeoJSON. Completely eliminates topological gaps when scaling/simplifying.
- **Cons:** Requires the frontend to bundle and run `topojson-client` to decode the data back into GeoJSON on-the-fly before MapLibre can consume it natively.
## Conclusion & Current Direction
Given our specific use case: **We are loading boundaries into MapLibre primarily to display outlines and population centers for *certain/selected* GADM regions, rather than rendering the entire administrative globe all at once.**
Because we are operating on an "on-demand, selected region" basis, setting up a full-blown Vector Tile infrastructure (Option 2) introduces unnecessary complexity.
**Our strategy is to leverage Option 1: Pre-cached, heavily simplified GeoJSON.**
By applying a GDAL geometry simplification threshold and limiting coordinate precision to 5 decimal places during the C++ build pipeline, we yield lightweight, highly performant payloads that MapLibre can ingest natively. This solves the file-size bottleneck while preserving our simple, highly efficient file-based caching architecture.
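The coordinate-precision step can be sketched as a recursive rounder (this mirrors what `@turf/truncate` does on the TS side; the helper names here are illustrative). At 5 decimals, one unit of precision is about 1e-5 degrees, roughly a metre at the equator:

```typescript
// Round a single coordinate component to 5 decimal places
function round5(n: number): number {
  return Math.round(n * 1e5) / 1e5;
}

// Recursively round every number in a GeoJSON coordinates array,
// whatever its nesting depth (Point, Polygon, MultiPolygon, ...)
type Coords = number | Coords[];
function roundCoords(c: Coords): Coords {
  return Array.isArray(c) ? c.map(roundCoords) : round5(c);
}
```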


@ -27,6 +27,7 @@
},
"scripts": {
"build": "tsc",
"build:cpp": "cmake --build cpp/build --config Release",
"dev": "tsc -w",
"tests": "vitest run",
"test:boundaries": "vitest run src/__tests__/boundary.test.ts",
@ -37,6 +38,7 @@
"boundaries": "npx tsx scripts/boundaries.ts",
"boundaries:cpp": ".\\dist\\win-x64\\boundaries.exe",
"boundaries:cpp:all": ".\\dist\\win-x64\\boundaries.exe --country=all",
"boundaries:cpp:all:split": ".\\dist\\win-x64\\boundaries.exe --country=all --split-levels=1,2",
"boundaries:cpp:linux": "./dist/linux-x64/boundaries",
"boundaries:cpp:linux:all": "./dist/linux-x64/boundaries --country=all",
"refresh": "npx tsx scripts/refresh-database.ts"
@ -60,4 +62,4 @@
"simplify-js": "1.2.4",
"zod": "^4.3.6"
}
}
}


@ -415,46 +415,63 @@ export async function getBoundaryFromGpkg(
const gidLevel = dotCount;
const resolvedLevel = contentLevel != null ? contentLevel : gidLevel;
const countryCode = gadmId.split('.')[0];
const cppFileName = `boundary_${countryCode}_${resolvedLevel}.json`;
// C++ outputs sub-region precise files too if batched with `--split-levels`.
// We look for exact gadmId match first, then fall back to full country.
const fallbackNames = [
`boundary_${gadmId}_${resolvedLevel}.json`,
`boundary_${countryCode}_${resolvedLevel}.json`
];
// Remove duplicates if gadmId == countryCode
const uniqueFallbackNames = [...new Set(fallbackNames)];
// Build a prefix for sub-region filtering:
// 'DEU' (country-level) → no filter needed, return all features
// 'DEU.2_1' → prefix 'DEU.2.' matches DEU.2.91_1, DEU.2.91.1_1, etc.
// Build a prefix for sub-region filtering (used mostly for the full-country fallback)
const isSubRegion = gadmId.includes('.');
const gidPrefix = isSubRegion
? gadmId.replace(/_\d+$/, '') + '.' // strip version suffix, add dot
: null;
for (const dir of uniqueCacheDirs) {
const cppFile = join(dir, cppFileName);
if (existsSync(cppFile)) {
try {
const raw = JSON.parse(readFileSync(cppFile, 'utf-8'));
if (raw.features && Array.isArray(raw.features)) {
console.log(`Loading from CPP Cache File ${cppFile} : ${gidPrefix}`)
for (const cppFileName of uniqueFallbackNames) {
for (const dir of uniqueCacheDirs) {
const cppFile = join(dir, cppFileName);
if (existsSync(cppFile)) {
try {
const raw = JSON.parse(readFileSync(cppFile, 'utf-8'));
if (raw.features && Array.isArray(raw.features)) {
console.log(`[gpkg-reader] Loading from CPP Cache File ${cppFile} : ${gidPrefix || 'country-wide'}`)
let rawFeatures = raw.features;
let rawFeatures = raw.features;
// Filter by GID prefix if we had to read the fallback country-wide file
// (If we loaded the exact sub-region file, it's already perfectly chunked)
if (gidPrefix && cppFileName.includes(countryCode) && !cppFileName.includes(gadmId)) {
rawFeatures = rawFeatures.filter((f: any) =>
f.code && f.code.startsWith(gidPrefix)
);
}
// Filter by GID prefix for sub-region queries
if (gidPrefix) {
rawFeatures = rawFeatures.filter((f: any) =>
f.code && f.code.startsWith(gidPrefix)
);
if (rawFeatures.length === 0) continue; // no matches, try next dir
}
const features: BoundaryFeature[] = rawFeatures.map((f: any) => {
const { geometry, code, name, ...enrichment } = f;
return {
type: 'Feature' as const,
properties: { name, code, ...enrichment },
geometry,
};
});
return { type: 'FeatureCollection', features };
const features: BoundaryFeature[] = rawFeatures.map((f: any) => {
const { geometry, code, name, ...enrichment } = f;
return {
type: 'Feature' as const,
properties: { name, code, ...enrichment },
geometry,
};
});
const result: BoundaryResult = { type: 'FeatureCollection', features };
const outCacheFile = join(uniqueCacheDirs[1] || uniqueCacheDirs[0], `${cacheKey}.json`);
try {
writeFileSync(outCacheFile, JSON.stringify(result));
} catch (e) { /* ignore */ }
return result;
}
} catch (e) {
console.warn(`[gpkg-reader] Failed to read C++ cache from ${cppFile}:`, e);
}
} catch (e) {
console.warn(`[gpkg-reader] Failed to read C++ cache from ${cppFile}:`, e);
}
}
}