optimization 3/3

parent 3c0c42a9b2
commit 12d38ef56b

README.md (53 lines changed)
@@ -41,7 +41,7 @@ This package is a direct Node.js/TypeScript port of the excellent Python library

 While bringing these capabilities natively to the JavaScript ecosystem, we built several critical enhancements designed specifically for web applications and browser performance:

-- **Aggressive Geometry Simplification:** Natively integrates `@turf/simplify` (0.001 tolerance, high-quality) to instantly compress raw unoptimized 25MB boundary polygons down to ~1MB browser-friendly payloads.
+- **Aggressive Geometry Simplification:** Natively integrates `@turf/simplify` and `@turf/truncate` with a configurable `resolution` parameter (1 = full detail, 10 = max simplification, default = 4). Compresses raw unoptimized 25MB boundary polygons down to ~1MB browser-friendly payloads while rounding all coordinates (geometry + GHS metadata) to 5 decimal places.
 - **Unified Cascading Caches:** Intelligent caching ladders that auto-resolve across the global `process.env.GADM_CACHE`, the active `process.cwd()`, and the local workspace `../cache` mounts.
 - **Target-Level Subdivision Extraction:** A unified `targetLevel` API that distinguishes between extracting an outer merged geographic perimeter and an array of granular inner subdivisions derived from recursive `.merge()` operations.
 - **Smart Pre-cacher Script:** Includes `boundaries.ts`, an auto-resuming build script that iterates downwards to pre-calculate, dissolve, and aggressively compress hierarchy layers 0–5 for instant sub-ms API delivery, bypassing heavy geometry intersections at runtime.
@@ -192,7 +192,7 @@ Higher-level API designed for HTTP handlers. Includes file-based caching via `GA

 | Function | Description |
 |----------|-------------|
 | `searchRegions(opts)` | Search by name, returns metadata or GeoJSON |
-| `getBoundary(gadmId, contentLevel?, cache?)` | Get GeoJSON boundary for a GADM ID |
+| `getBoundary(gadmId, contentLevel?, cache?, enrichOpts?, resolution?)` | Get GeoJSON boundary for a GADM ID |
 | `getRegionNames(opts)` | List sub-region names with depth control |

 #### Integration Example (Server API)
@@ -311,13 +311,29 @@ When building interactive user interfaces or fetching boundaries through the top

 - **Outer Boundary**: Set `targetLevel` exactly equal to the region's intrinsic level (e.g., targeting `Level 0` for Spain). The engine uses `turf` to automatically dissolve internal geometries, returning a single merged polygon covering the total region envelope.
 - **Inner Subdivisions**: Provide a `targetLevel` deeper than the intrinsic level (e.g., targeting `Level 1` for Spain). The engine filters for the exact constituent parts and returns a `FeatureCollection` in which each sub-group (the 17 Spanish autonomous communities) is a distinctly preserved geometry feature.
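The outer-vs-inner decision these bullets describe can be sketched as a tiny helper. This is illustrative only; `resolveExtractionMode` is not part of the package's exported API:

```typescript
// Sketch of the targetLevel semantics described above (hypothetical helper).
// 'outer' -> dissolve everything into one merged perimeter polygon
// 'inner' -> return a FeatureCollection of the constituent subdivisions
type ExtractionMode = 'outer' | 'inner';

function resolveExtractionMode(intrinsicLevel: number, targetLevel: number): ExtractionMode {
  if (targetLevel < intrinsicLevel) {
    throw new Error("targetLevel cannot be shallower than the region's own level");
  }
  return targetLevel === intrinsicLevel ? 'outer' : 'inner';
}

// Spain (intrinsic level 0): level 0 merges the outline, level 1 keeps the subdivisions.
console.log(resolveExtractionMode(0, 0)); // 'outer'
console.log(resolveExtractionMode(0, 1)); // 'inner'
```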
-### Smart Caching & Geometry Compression
+### Geometry Simplification & Resolution
+
+Both the TypeScript and C++ pipelines apply geometry simplification controlled by a `resolution` parameter (default: **4**):
+
+| Resolution | Tolerance | Coordinate Precision | Use Case |
+|------------|-----------|----------------------|----------|
+| 1 | 0.0001 | 5 decimals | Maximum detail |
+| 4 | 0.005 | 5 decimals | Default — good balance |
+| 10 | 0.5 | 5 decimals | Maximum compression |
+
+The formula: `tolerance = 0.0001 * 10^((resolution-1) * 4/9)`. GHS metadata coordinates (`ghsPopCenter`, `ghsBuiltCenters`, etc.) are also rounded to 5 decimal places to match geometry precision.
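The stated formula translates directly into code. This is a sketch of the documented mapping, not the package's internal function names:

```typescript
// tolerance = 0.0001 * 10^((resolution - 1) * 4/9), per the formula above.
function toleranceForResolution(resolution: number): number {
  return 0.0001 * Math.pow(10, ((resolution - 1) * 4) / 9);
}

// Coordinate rounding to 5 decimal places, applied to geometry and GHS metadata alike.
function roundTo5(value: number): number {
  return Math.round(value * 1e5) / 1e5;
}
```

At `resolution = 1` this yields exactly `0.0001`, and the tolerance grows monotonically toward `resolution = 10`.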
+### Smart Caching & Cache Resolution Order
+
 To ensure instantaneous delivery (sub-10ms) of these polygons to your HTTP APIs:

-1. **Pre-Caching Script**: You can run `npm run boundaries -- --country=all` manually. This script iterates downwards to compute and compress hierarchical layers 0 through 5 for each invoked country automatically. If a file already exists, it is skipped, allowing easy resume.
-2. **Cascading Cache Lookups**: The package resolves caches by searching matching environments: first checking `process.env.GADM_CACHE` (used by explicit production APIs), then `process.cwd()/cache/gadm`, and finally its own local `../cache/gadm` workspace path. You can also inject an external Redis or memory cache interface providing `{ get(key), set(key, val) }` as the 3rd argument to `getBoundary`.
-3. **Payload Compression (~25MB -> ~1MB)**: Boundary geometries are natively compressed using `@turf/simplify` (with `0.001` tolerance, mapping to ~100m geospatial fidelity) immediately prior to caching, ensuring React mapping payloads stay tightly optimized without crashing browsers.
+1. **Pre-Caching Scripts**: Run `npm run boundaries -- --country=all` (TypeScript) or `npm run boundaries:cpp` (C++). Both iterate downwards to compute and compress hierarchical layers 0 through 5 for each country. Existing files are skipped for easy resume.
+2. **Cascading Cache Lookups**: The package resolves caches in order:
+   - Exact sub-region cache file: `boundary_{gadmId}_{level}.json`
+   - Full country cache file: `boundary_{countryCode}_{level}.json` (prefix-filtered for sub-region queries)
+   - Environment paths: `process.env.GADM_CACHE`, then `process.cwd()/cache/gadm`, then `../cache/gadm`
+   - Live GeoPackage query (fallback)
+3. **Payload Compression (~25MB -> ~1MB)**: Boundary geometries are compressed using `@turf/simplify` (TS) or GEOS `GEOSSimplify_r` (C++) with matching tolerances, ensuring consistent output from both pipelines.
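A minimal sketch of this cascade, using the directory order and file-name patterns from the list above (`candidateCacheFiles` itself is illustrative, not the package's exported API):

```typescript
import { join } from 'node:path';

// Build the ordered list of cache files to probe for a boundary request.
function candidateCacheFiles(gadmId: string, level: number): string[] {
  const countryCode = gadmId.split('.')[0];

  // 1-2. Exact sub-region file first, then the full-country fallback.
  const names = [...new Set([
    `boundary_${gadmId}_${level}.json`,
    `boundary_${countryCode}_${level}.json`,
  ])];

  // 3. Directory ladder: env override, working directory, package-local workspace.
  const dirs = [
    process.env.GADM_CACHE,
    join(process.cwd(), 'cache', 'gadm'),
    join('..', 'cache', 'gadm'),
  ].filter((d): d is string => Boolean(d));

  // Probe every name in every directory, in priority order; a real
  // implementation falls back to a live GeoPackage query after this.
  return names.flatMap(name => dirs.map(dir => join(dir, name)));
}
```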

 ---

@@ -420,15 +436,17 @@ For full batch generation across all 263 countries × 6 levels, the native C++ p

 ```bash
 # Build (requires vcpkg + CMake)
 cmake --preset vcpkg-win
-cmake --build cpp/build --config Release
+npm run build:cpp   # or: cmake --build cpp/build --config Release

 # Run via npm scripts
 npm run boundaries:cpp                    # all countries
 npm run boundaries:cpp -- --country=DEU   # single country

 # Or directly
 .\dist\win-x64\boundaries.exe --country=DEU --level=0 --force

+# Sub-region splitting (generates boundary_ESP.6_1_4.json etc.)
+npm run boundaries:cpp -- --country=all --level=4 --split-levels=1
+
+# Custom resolution (1-10, default=4)
+npm run boundaries:cpp -- --country=DEU --resolution=6
 ```

 Output includes GHS enrichment by default when tiff files are present in `data/ghs/`:

@@ -505,15 +523,24 @@ JSON outputs saved to `tests/tree/` for inspection:

 ```
 packages/gadm/
 ├── cpp/                      # C++ native pipeline (GDAL/GEOS/PROJ)
 │   ├── src/                  # main.cpp, gpkg_reader, geo_merge, ghs_enrich
 │   ├── CMakeLists.txt
 │   └── vcpkg.json
 ├── data/
 │   ├── gadm_database.parquet # 356K rows, 6.29 MB
-│   └── gadm_continent.json   # Continent → ISO3 mapping
+│   ├── gadm_continent.json   # Continent → ISO3 mapping
+│   └── ghs/                  # GHS GeoTIFF rasters (optional)
 ├── dist/
 │   └── win-x64/              # Compiled C++ binary + DLLs
 ├── scripts/
 │   └── refresh-database.ts   # GeoPackage → Parquet converter
 ├── src/
 │   ├── database.ts           # Parquet reader (hyparquet)
 │   ├── names.ts              # Name/code lookup + fuzzy match
 │   ├── items.ts              # GeoJSON boundaries from CDN
 │   ├── gpkg-reader.ts        # GeoPackage boundary reader + C++ cache fallback
 │   ├── enrich-ghs.ts         # GHS GeoTIFF enrichment (TS)
 │   ├── wrapper.ts            # Server-facing API with cache
 │   ├── tree.ts               # Tree builder + iterators
 │   ├── index.ts              # Barrel exports
@@ -52,27 +52,36 @@ Output: `dist/linux-x64/boundaries` + `proj.db`

 # From the gadm package root (not cpp/)

 # All countries, all levels (263 countries x 6 levels)
 # Windows:
 .\dist\win-x64\boundaries.exe --country=all
 # Linux:
 ./dist/linux-x64/boundaries --country=all

 # Single country, all levels
 ./dist/linux-x64/boundaries --country=DEU
 .\dist\win-x64\boundaries.exe --country=DEU

 # Single country + level
 ./dist/linux-x64/boundaries --country=DEU --level=0
 .\dist\win-x64\boundaries.exe --country=DEU --level=0

+# Sub-region specific (dotted GADM ID)
+.\dist\win-x64\boundaries.exe --country=ESP.6_1 --level=4
+
+# Batch sub-region generation for ALL countries
+# Generates boundary_ESP.6_1_4.json, boundary_DEU.2_1_3.json, etc.
+.\dist\win-x64\boundaries.exe --country=all --level=4 --split-levels=1
+
+# Custom resolution (1=full detail, 10=max simplification, default=4)
+.\dist\win-x64\boundaries.exe --country=DEU --resolution=6

 # Force regeneration (ignore cached files)
 ./dist/linux-x64/boundaries --country=NGA --force
 .\dist\win-x64\boundaries.exe --country=NGA --force
 ```

 ### CLI Options

 | Option | Default | Description |
 |--------|---------|-------------|
-| `--country` | `all` | ISO3 code or `all` for batch |
+| `--country` | `all` | ISO3 code, dotted GADM ID (e.g. `ESP.6_1`), or `all` for batch |
 | `--level` | `-1` | Admin level 0-5, or -1 for all |
+| `--split-levels` | `0` | Comma-separated list of levels to split output files by (e.g. `0,1`) |
+| `--resolution` | `4` | Simplification resolution (1=full detail, 10=max simplification) |
 | `--cache-dir` | `cache/gadm` | Output directory |
 | `--gpkg` | `data/gadm_410-levels.gpkg` | GADM GeoPackage |
 | `--continent-json` | `data/gadm_continent.json` | Continent mapping |
@@ -80,20 +89,34 @@ Output: `dist/linux-x64/boundaries` + `proj.db`

 | `--built-tiff` | `data/ghs/GHS_BUILT_S_...tif` | GHS built-up raster |
 | `--force` | `false` | Regenerate even if cached |

+#### `--split-levels` explained
+
+By default (`--split-levels=0`), the tool outputs one file per country per level: `boundary_ESP_4.json`.
+
+With `--split-levels=1`, it dynamically queries all distinct level-1 region codes (e.g. `ESP.6_1`, `ESP.1_1`, …) and outputs individual files like `boundary_ESP.6_1_4.json`. This is useful for pre-splitting large countries into smaller, faster-loading cache files.
+
+The TS wrapper automatically discovers these sub-region files before falling back to the full country file.
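The naming scheme this section describes fits in a one-line helper (names taken from the examples above; the helper itself is hypothetical):

```typescript
// boundary_{regionId}_{level}.json, where regionId is either an ISO3 code
// or a dotted GADM sub-region ID produced by --split-levels.
function boundaryFileName(regionId: string, level: number): string {
  return `boundary_${regionId}_${level}.json`;
}

boundaryFileName('ESP', 4);     // full-country file
boundaryFileName('ESP.6_1', 4); // sub-region file from --split-levels=1
```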

 ### npm Scripts

 ```bash
+npm run build:cpp                        # rebuild the C++ binary
 npm run boundaries:cpp                   # --country=all (full batch, Windows)
 npm run boundaries:cpp -- --country=DEU  # single country
 ```

 On Linux, run the binary directly since the npm script points to the Windows path.

 ## Output

-Files written to `cache/gadm/boundary_{CODE}_{LEVEL}.json` — drop-in replacement for the TS pipeline.
+Files written to `cache/gadm/` as:
+- `boundary_{CODE}_{LEVEL}.json` — full country file (e.g. `boundary_ESP_4.json`)
+- `boundary_{GADM_ID}_{LEVEL}.json` — sub-region file when `--split-levels` is used (e.g. `boundary_ESP.6_1_4.json`)

-The TS wrapper (`gpkg-reader.ts`) automatically discovers these cache files and serves them through the API. Sub-region queries (e.g. `getBoundary('DEU.2_1', 3)`) are resolved by prefix-filtering the full country file.
+The TS wrapper (`gpkg-reader.ts`) resolves cache files in this order:
+1. Exact sub-region file: `boundary_{gadmId}_{level}.json`
+2. Full country file: `boundary_{countryCode}_{level}.json` (prefix-filtered for sub-region queries)
+3. Live GeoPackage query (fallback)

 Each feature includes:
 - `code` — admin region code
@@ -15,12 +15,7 @@ namespace gpkg_reader {

 static GDALDatasetH g_ds = nullptr;
 static std::string g_current_gpkg = "";

-std::vector<boundary::BoundaryFeature> read_features(
-    const std::string& gpkg_path,
-    const std::string& country_code,
-    int level,
-    double tolerance
-) {
+static void ensure_open(const std::string& gpkg_path) {
     if (!g_ds || g_current_gpkg != gpkg_path) {
         if (g_ds) GDALClose(g_ds);
         GDALAllRegister();

@@ -52,9 +47,60 @@ std::vector<boundary::BoundaryFeature> read_features(
         }
         g_current_gpkg = gpkg_path;
     }
 }

+std::vector<std::string> get_subregions(
+    const std::string& gpkg_path,
+    const std::string& country_code,
+    int split_level
+) {
+    ensure_open(gpkg_path);
+    GDALDatasetH ds = g_ds;
+
+    std::string layer_name = "ADM_ADM_" + std::to_string(split_level);
+    OGRLayerH layer = GDALDatasetGetLayerByName(ds, layer_name.c_str());
+    if (!layer) {
+        layer = GDALDatasetGetLayer(ds, split_level);
+        if (!layer) layer = GDALDatasetGetLayer(ds, 0);
+    }
+    if (!layer) return {};
+
+    std::string actual_layer_name = OGR_L_GetName(layer);
+    std::string sql = "SELECT DISTINCT GID_" + std::to_string(split_level) + " FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + country_code + "'";
+
+    OGRLayerH query_layer = GDALDatasetExecuteSQL(ds, sql.c_str(), nullptr, "SQLite");
+    if (!query_layer) return {};
+
+    std::vector<std::string> subregions;
+    OGRFeatureH feat;
+    OGR_L_ResetReading(query_layer);
+    while ((feat = OGR_L_GetNextFeature(query_layer)) != nullptr) {
+        const char* val = OGR_F_GetFieldAsString(feat, 0);
+        if (val) {
+            subregions.push_back(val);
+        }
+        OGR_F_Destroy(feat);
+    }
+    GDALDatasetReleaseResultSet(ds, query_layer);
+
+    return subregions;
+}
+
+std::vector<boundary::BoundaryFeature> read_features(
+    const std::string& gpkg_path,
+    const std::string& country_code,
+    int level,
+    double tolerance
+) {
+    ensure_open(gpkg_path);
+
+    GDALDatasetH ds = g_ds;
+
+    // Parse dot_count for country_code filtering (e.g. ESP.6_1)
+    std::string gadm_id = country_code;
+    int dot_count = std::count(gadm_id.begin(), gadm_id.end(), '.');
+    std::string iso_code = dot_count > 0 ? gadm_id.substr(0, gadm_id.find('.')) : gadm_id;
+
+    // Find the correct layer and its actual name
+    std::string layer_name = "ADM_ADM_" + std::to_string(level);
+    OGRLayerH layer = GDALDatasetGetLayerByName(ds, layer_name.c_str());
@@ -67,12 +113,19 @@ std::vector<boundary::BoundaryFeature> read_features(

     std::string actual_layer_name = OGR_L_GetName(layer);

     // Execute SQL query directly via SQLite dialect to force index usage
-    std::string sql = "SELECT * FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + country_code + "'";
+    std::string sql = "SELECT * FROM \"" + actual_layer_name + "\" WHERE GID_0 = '" + iso_code + "'";
+    if (dot_count > 0 && dot_count <= level) {
+        sql += " AND GID_" + std::to_string(dot_count) + " = '" + gadm_id + "'";
+    }
     OGRLayerH query_layer = GDALDatasetExecuteSQL(ds, sql.c_str(), nullptr, "SQLite");

     if (!query_layer) {
         // Fallback to OGR filtering if ExecuteSQL fails
-        OGR_L_SetAttributeFilter(layer, ("GID_0 = '" + country_code + "'").c_str());
+        std::string filter = "GID_0 = '" + iso_code + "'";
+        if (dot_count > 0 && dot_count <= level) {
+            filter += " AND GID_" + std::to_string(dot_count) + " = '" + gadm_id + "'";
+        }
+        OGR_L_SetAttributeFilter(layer, filter.c_str());
         query_layer = layer;
     }
@@ -114,7 +167,7 @@ std::vector<boundary::BoundaryFeature> read_features(

         bf.minY = env.MinY;
         bf.maxY = env.MaxY;

-        int wkbSize = OGR_G_WkbSize(out_geom);
+        size_t wkbSize = OGR_G_WkbSize(out_geom);
         bf.wkb.resize(wkbSize);
         OGR_G_ExportToWkb(out_geom, wkbNDR, bf.wkb.data());
@@ -9,6 +9,15 @@ namespace gpkg_reader {

 /// Read features from a GeoPackage file, filtered by country (GID_0) and admin level.
 /// Groups features by GID_{level} and returns one BoundaryFeature per group
 /// with its geometry as a GEOS handle.
+/// Retrieve a list of distinct GID_{split_level} values for a given country code.
+std::vector<std::string> get_subregions(
+    const std::string& gpkg_path,
+    const std::string& country_code,
+    int split_level
+);
+
+/// Read features from a GeoPackage file, filtered by country (GID_0) and admin level.
+/// Can also filter directly by sub-region if country_code is a dotted GADM ID (e.g. ESP.6_1).
 std::vector<boundary::BoundaryFeature> read_features(
     const std::string& gpkg_path,
     const std::string& country_code,
@@ -44,7 +44,8 @@ int main(int argc, char *argv[]) {

     std::string pop_tiff = "data/ghs/GHS_POP_E2030_GLOBE_R2023A_54009_100_V1_0.tif";
     std::string built_tiff = "data/ghs/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.tif";
     bool force = false;
-    int resolution = 1;
+    int resolution = 4; // Use 4 as default matching the TS wrapper
+    std::vector<int> split_levels = {0};

     app.add_option("--country", country, "ISO3 country code, or 'all' for batch")
         ->default_val("all");

@@ -64,7 +65,10 @@ int main(int argc, char *argv[]) {

         ->default_val("data/ghs/GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0.tif");
     app.add_flag("--force", force, "Regenerate even if cache file exists");
     app.add_option("--resolution", resolution, "Simplification resolution (1=full to 10=max simplification)")
-        ->default_val(1);
+        ->default_val(4);
+    app.add_option("--split-levels", split_levels, "Comma-separated levels to split output files (e.g. 1,2,3)")
+        ->default_val(std::vector<int>{0})
+        ->delimiter(',');

     CLI11_PARSE(app, argc, argv);
@@ -161,26 +165,37 @@ int main(int argc, char *argv[]) {

     const auto &code = countries[i];

     for (int lvl : levels) {
-      std::string filename =
-          "boundary_" + code + "_" + std::to_string(lvl) + ".json";
-      fs::path out_path = fs::path(cache_dir) / filename;
-
-      // Skip if cached
-      if (!force && file_exists(out_path)) {
-        ++skipped;
-        continue;
-      }
-
-      try {
-        // 1. Read features from GeoPackage
-        auto t0 = std::chrono::high_resolution_clock::now();
-        auto features = gpkg_reader::read_features(gpkg_path, code, lvl, tolerance);
-        auto t1 = std::chrono::high_resolution_clock::now();
-        if (features.empty()) {
-          ++skipped;
-          continue;
-        }
+      for (int split_lvl : split_levels) {
+        if (split_lvl > lvl) continue;
+
+        std::vector<std::string> regions;
+        if (split_lvl == 0) {
+          regions.push_back(code);
+        } else {
+          regions = gpkg_reader::get_subregions(gpkg_path, code, split_lvl);
+        }
+
+        for (const auto& region_id : regions) {
+          std::string filename =
+              "boundary_" + region_id + "_" + std::to_string(lvl) + ".json";
+          fs::path out_path = fs::path(cache_dir) / filename;
+
+          // Skip if cached
+          if (!force && file_exists(out_path)) {
+            ++skipped;
+            continue;
+          }
+
+          try {
+            // 1. Read features from GeoPackage
+            auto t0 = std::chrono::high_resolution_clock::now();
+            auto features = gpkg_reader::read_features(gpkg_path, region_id, lvl, tolerance);
+            auto t1 = std::chrono::high_resolution_clock::now();
+            if (features.empty()) {
+              ++skipped;
+              continue;
+            }

         // 2. Merge geometries
         auto merged = geo_merge::merge(features);
         auto t2 = std::chrono::high_resolution_clock::now();

@@ -201,7 +216,7 @@ int main(int argc, char *argv[]) {

             props["NAME_" + std::to_string(lvl)] = feat.name;
             props["GID_" + std::to_string(lvl)] = feat.code;

-            int dotCount = std::count(feat.code.begin(), feat.code.end(), '.');
+            int dotCount = static_cast<int>(std::count(feat.code.begin(), feat.code.end(), '.'));
             if (lvl == dotCount) {
               props["isOuter"] = true;
             }

@@ -261,9 +276,11 @@ int main(int argc, char *argv[]) {

                 "write:" + std::to_string(ms_write) + "ms " +
                 "Total:" + std::to_string(ms_total) + "ms");

-      } catch (const std::exception &e) {
-        ++errors;
-        logger::error(code + " L" + std::to_string(lvl) + ": " + e.what());
+          } catch (const std::exception &e) {
+            ++errors;
+            logger::error(region_id + " L" + std::to_string(lvl) + ": " + e.what());
+          }
+        }
       }
     }
   }
dist/gpkg-reader.js (70 lines changed, vendored)
File diff suppressed because one or more lines are too long

dist/win-x64/boundaries.exe (BIN, vendored)
Binary file not shown.

docs/geo.md (40 lines, new file)
@@ -0,0 +1,40 @@
# Geospatial Architecture & Rendering Strategy

This document outlines the architectural decisions and trade-offs in how we process, store, and render GADM administrative boundaries in our web application using MapLibre GL JS.

## The Problem: High-Resolution Boundaries vs. Web Performance

GADM boundaries are highly accurate and meticulously detailed. A single raw boundary for a Region/Level 2 area (like a large province in Poland) can easily exceed 5MB in raw GeoJSON form.

Sending 5MB+ payloads to a frontend map client to render a basic region outline is an architectural anti-pattern for several reasons:
- **Bandwidth:** High download times break the fluidity of the app, especially on slower connections or mobile devices.
- **Parsing Overhead:** Browsers block the main thread while parsing JSON payloads of this size.
- **Rendering Lag:** Rendering hundreds of thousands of microscopic, tightly packed vertices degrades panning and zooming performance because the GPU is forced to process unneeded geometry.

## Evaluation of Architectural Approaches

### 1. Geometry Simplification (Chosen Approach)
Instead of serving full-resolution geometries, we apply a geometry simplification algorithm (such as Douglas-Peucker via `OGR_G_SimplifyPreserveTopology` in GDAL) during the backend C++ pipeline step, before the files are saved to the cache.

By using a small geographical tolerance (e.g., `0.001` to `0.005` degrees), we eliminate up to 90-95% of vertices that lie essentially on a straight line.
- **Pros:** A 5MB file drops to ~100-200KB with no visually perceptible difference at standard region-view zoom levels. It is extremely easy to cache and serve over a standard API without changing how the frontend loads geometry.
- **Cons:** Shared borders between polygons can develop tiny gaps or slivers if simplified individually, without a shared topological graph/mesh.

### 2. Mapbox Vector Tiles (MVT / Protocol Buffers)
Instead of returning monolithic GeoJSON boundary files, the map client requests data in small 256x256 pixel binary tiles (`.mvt` or `.pbf`). The geometries within these tiles are aggressively simplified and clipped to the user's current zoom level.
- **Pros:** Phenomenal performance for rendering global, extremely heavy datasets. Only data intersecting the user's viewport is loaded at any given time.
- **Cons:** High infrastructure overhead. It requires running a live tile server (like `pg_tileserv` or `martin`) or pre-generating massive pyramidal tile caches (e.g., with `tippecanoe`).

### 3. TopoJSON
An extension of GeoJSON that encodes *topology* rather than discrete geometries. Shared borders between adjacent regions are recorded only once.
- **Pros:** Can shrink file sizes by up to 80% compared to GeoJSON. Completely eliminates topological gaps when scaling/simplifying.
- **Cons:** Requires the frontend to bundle and run `topojson-client` to decode the data back into GeoJSON on the fly before MapLibre can consume it.

## Conclusion & Current Direction

Given our specific use case: **we load boundaries into MapLibre primarily to display outlines and population centers for *selected* GADM regions, rather than rendering the entire administrative globe at once.**

Because we operate on an "on-demand, selected region" basis, setting up a full-blown Vector Tile infrastructure (Option 2) introduces unnecessary complexity.

**Our strategy is to leverage Option 1: pre-cached, heavily simplified GeoJSON.**
By applying a GDAL geometry simplification threshold and limiting coordinate precision to 5 decimal places during the C++ build pipeline, we produce lightweight, highly performant payloads that MapLibre can ingest natively. This solves the file-size bottleneck while preserving our simple, highly efficient file-based caching architecture.
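The precision-limiting step mentioned here is easy to sketch on the TypeScript side (a simplified stand-in for the C++ pipeline's coordinate truncation):

```typescript
type Position = number[];

// Round every coordinate of a ring to 5 decimal places (~1 m at the equator).
function limitPrecision(ring: Position[], decimals = 5): Position[] {
  const factor = 10 ** decimals;
  return ring.map(point => point.map(v => Math.round(v * factor) / factor));
}
```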

@@ -27,6 +27,7 @@

   },
   "scripts": {
     "build": "tsc",
+    "build:cpp": "cmake --build cpp/build --config Release",
     "dev": "tsc -w",
     "tests": "vitest run",
     "test:boundaries": "vitest run src/__tests__/boundary.test.ts",

@@ -37,6 +38,7 @@

     "boundaries": "npx tsx scripts/boundaries.ts",
     "boundaries:cpp": ".\\dist\\win-x64\\boundaries.exe",
+    "boundaries:cpp:all": ".\\dist\\win-x64\\boundaries.exe --country=all",
+    "boundaries:cpp:all:split": ".\\dist\\win-x64\\boundaries.exe --country=all --split-levels=1,2",
     "boundaries:cpp:linux": "./dist/linux-x64/boundaries",
     "boundaries:cpp:linux:all": "./dist/linux-x64/boundaries --country=all",
     "refresh": "npx tsx scripts/refresh-database.ts"

@@ -60,4 +62,4 @@

     "simplify-js": "1.2.4",
     "zod": "^4.3.6"
   }
 }
@@ -415,46 +415,63 @@ export async function getBoundaryFromGpkg(

   const gidLevel = dotCount;
   const resolvedLevel = contentLevel != null ? contentLevel : gidLevel;
   const countryCode = gadmId.split('.')[0];
-  const cppFileName = `boundary_${countryCode}_${resolvedLevel}.json`;
-
-  // Build a prefix for sub-region filtering:
-  // 'DEU' (country-level) → no filter needed, return all features
-  // 'DEU.2_1' → prefix 'DEU.2.' matches DEU.2.91_1, DEU.2.91.1_1, etc.
+
+  // C++ outputs sub-region precise files too if batched with `--split-levels`.
+  // We look for an exact gadmId match first, then fall back to the full country file.
+  const fallbackNames = [
+    `boundary_${gadmId}_${resolvedLevel}.json`,
+    `boundary_${countryCode}_${resolvedLevel}.json`
+  ];
+
+  // Remove duplicates if gadmId == countryCode
+  const uniqueFallbackNames = [...new Set(fallbackNames)];
+
+  // Build a prefix for sub-region filtering (used mostly for the full-country fallback)
   const isSubRegion = gadmId.includes('.');
   const gidPrefix = isSubRegion
     ? gadmId.replace(/_\d+$/, '') + '.' // strip version suffix, add dot
     : null;

-  for (const dir of uniqueCacheDirs) {
-    const cppFile = join(dir, cppFileName);
-    if (existsSync(cppFile)) {
-      try {
-        const raw = JSON.parse(readFileSync(cppFile, 'utf-8'));
-        if (raw.features && Array.isArray(raw.features)) {
-          console.log(`Loading from CPP Cache File ${cppFile} : ${gidPrefix}`)
-
-          let rawFeatures = raw.features;
-
-          // Filter by GID prefix for sub-region queries
-          if (gidPrefix) {
-            rawFeatures = rawFeatures.filter((f: any) =>
-              f.code && f.code.startsWith(gidPrefix)
-            );
-            if (rawFeatures.length === 0) continue; // no matches, try next dir
-          }
-
-          const features: BoundaryFeature[] = rawFeatures.map((f: any) => {
-            const { geometry, code, name, ...enrichment } = f;
-            return {
-              type: 'Feature' as const,
-              properties: { name, code, ...enrichment },
-              geometry,
-            };
-          });
-          return { type: 'FeatureCollection', features };
-        }
-      } catch (e) {
-        console.warn(`[gpkg-reader] Failed to read C++ cache from ${cppFile}:`, e);
-      }
-    }
-  }
+  for (const cppFileName of uniqueFallbackNames) {
+    for (const dir of uniqueCacheDirs) {
+      const cppFile = join(dir, cppFileName);
+      if (existsSync(cppFile)) {
+        try {
+          const raw = JSON.parse(readFileSync(cppFile, 'utf-8'));
+          if (raw.features && Array.isArray(raw.features)) {
+            console.log(`[gpkg-reader] Loading from CPP Cache File ${cppFile} : ${gidPrefix || 'country-wide'}`)
+
+            let rawFeatures = raw.features;
+
+            // Filter by GID prefix if we had to read the fallback country-wide file
+            // (If we loaded the exact sub-region file, it's already perfectly chunked)
+            if (gidPrefix && cppFileName.includes(countryCode) && !cppFileName.includes(gadmId)) {
+              rawFeatures = rawFeatures.filter((f: any) =>
+                f.code && f.code.startsWith(gidPrefix)
+              );
+              if (rawFeatures.length === 0) continue; // no matches, try next dir
+            }
+
+            const features: BoundaryFeature[] = rawFeatures.map((f: any) => {
+              const { geometry, code, name, ...enrichment } = f;
+              return {
+                type: 'Feature' as const,
+                properties: { name, code, ...enrichment },
+                geometry,
+              };
+            });
+
+            const result: BoundaryResult = { type: 'FeatureCollection', features };
+            const outCacheFile = join(uniqueCacheDirs[1] || uniqueCacheDirs[0], `${cacheKey}.json`);
+            try {
+              writeFileSync(outCacheFile, JSON.stringify(result));
+            } catch (e) { /* ignore */ }
+            return result;
+          }
+        } catch (e) {
+          console.warn(`[gpkg-reader] Failed to read C++ cache from ${cppFile}:`, e);
+        }
+      }
+    }
+  }