mono/packages/ui/docs/locations/gridsearch.md
2026-03-21 20:18:25 +01:00

Grid Search / Regional Scanning Documentation

Overview

The Grid Search (or Regional Scanning) feature automates the discovery of leads across large, irregular geographic areas (e.g., entire cities, provinces, or countries). Instead of running manual point searches, users select a defined administrative region, and the system decomposes it into a grid of search points that covers the region.

This functionality relies on a microservice that serves GADM (Global Administrative Areas) boundaries as high-fidelity GeoJSON. These polygons drive the inclusion/exclusion logic for grid points.


Conceptual Architecture

1. Region Selection (Client)

The user selects a target region (e.g., "Île-de-France, France"). The client fetches the corresponding boundary polygon from the GADM microservice (Admin Level 1/2).

2. Grid Decomposition (Server/Client)

The system calculates a "Search Grid" overlaying the target polygon.

  • Viewport Normalization: A single API search at Zoom Level 15 covers a radius of roughly 2-5 km, which determines the grid spacing.
  • Bounding Box: A rectangular grid is generated covering the polygon's extents.
  • Point-in-Polygon Filtering: Grid centers falling outside the actual administrative boundary (e.g., ocean, neighboring states) are discarded using spatial analysis libraries (e.g., Turf.js).
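
The three steps above can be sketched without dependencies as follows. This is a minimal illustration, not the planned implementation: it swaps Turf.js for a plain ray-casting test, assumes the boundary is a single ring with no holes, and uses an approximate kilometre-to-degree conversion. The names `pointInRing`, `generateGridPoints`, and `stepKm` are hypothetical.

```typescript
// Hypothetical sketch: bounding box -> regular grid -> point-in-polygon filter.
// Positions are [longitude, latitude], as in GeoJSON.
type Position = [number, number];

const KM_PER_DEG_LAT = 111.32; // approximate length of one degree of latitude

// Ray-casting test against a single ring (no holes, no multipolygons).
function pointInRing(point: Position, ring: Position[]): boolean {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray from the point cross the edge (j -> i)?
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

function generateGridPoints(ring: Position[], stepKm: number): Position[] {
  // 1. Bounding box of the ring.
  const lons = ring.map((p) => p[0]);
  const lats = ring.map((p) => p[1]);
  const minLon = Math.min(...lons);
  const maxLon = Math.max(...lons);
  const minLat = Math.min(...lats);
  const maxLat = Math.max(...lats);

  // 2. Grid spacing in degrees. Longitude degrees shrink with cos(latitude),
  //    so the conversion uses the mid latitude of the box.
  const stepLat = stepKm / KM_PER_DEG_LAT;
  const midLat = (minLat + maxLat) / 2;
  const stepLon = stepKm / (KM_PER_DEG_LAT * Math.cos((midLat * Math.PI) / 180));

  // 3. Keep only the cell centers that fall inside the boundary.
  const points: Position[] = [];
  for (let lat = minLat + stepLat / 2; lat < maxLat; lat += stepLat) {
    for (let lon = minLon + stepLon / 2; lon < maxLon; lon += stepLon) {
      const candidate: Position = [lon, lat];
      if (pointInRing(candidate, ring)) points.push(candidate);
    }
  }
  return points;
}
```

For real GADM boundaries, which are often MultiPolygons with holes, a library such as Turf.js (`bbox`, `pointGrid`, `pointsWithinPolygon`) is likely the safer choice than this hand-rolled test.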

3. Campaign Orchestration (Server)

The resulting set of valid coordinates (e.g., 450 points) is submitted as a "Scan Campaign".

  • Batching: The server does NOT run all 450 searches at once. It uses PgBoss to queue them as individual jobs.
  • Concurrency: Jobs are processed with strict rate-limiting to respect SerpAPI quotas.
  • Deduplication: Results from overlapping grid circles are merged by place_id.
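
As an illustration of the deduplication step, the sketch below merges result batches from overlapping grid cells and keeps the first occurrence of each place_id. Only the place_id key comes from the text above; the `PlaceResult` shape and the `mergeByPlaceId` name are assumptions.

```typescript
// Hypothetical result shape; only place_id is taken from the design above.
interface PlaceResult {
  place_id: string;
  title: string;
}

// Merges batches from overlapping grid cells, keeping the first
// occurrence of each place_id and preserving first-seen order.
function mergeByPlaceId(batches: PlaceResult[][]): PlaceResult[] {
  const seen = new Map<string, PlaceResult>();
  for (const batch of batches) {
    for (const place of batch) {
      if (!seen.has(place.place_id)) seen.set(place.place_id, place);
    }
  }
  return Array.from(seen.values());
}
```

Keeping the first occurrence is an arbitrary choice here; a real merge might prefer the most complete record instead.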

Workflow Implementation

Step 1: User Selects Region

The user interacts with the new "Region Search" UI:

  1. Search: "California"
  2. Dropdown: Selects "California, USA (State/Province)"
  3. Preview: Map validates the polygon overlay.

Step 2: Grid Generation Status

Pre-flight check displayed to user:

  • Total Area: 423,970 km²
  • Grid Density: High (Zoom 15)
  • Estimated Points: ~8,500 scans (Warn: Expensive!)
  • Cost: 8,500 Credits
  • Action: "Confirm & Start Campaign"
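
The pre-flight estimate can be approximated by dividing the region's area by the area of one grid cell. The sketch below assumes square cells and one credit per scan (as the figures above imply). The 7 km spacing used in the example is an assumption chosen only because it lands near the ~8,500 figure; the real spacing would follow from the zoom level and overlap.

```typescript
// Rough pre-flight estimate: region area divided by the area of one square cell.
// Ignores boundary shape, so the filtered grid will differ somewhat.
function estimateScans(areaKm2: number, spacingKm: number): number {
  return Math.ceil(areaKm2 / (spacingKm * spacingKm));
}

// Assumed spacing of 7 km (hypothetical) for the California example.
const californiaEstimate = estimateScans(423970, 7);
```

The actual point count only becomes known after the point-in-polygon filter has run, so a figure like this is best shown as an approximation.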

Step 3: Campaign Execution

Server receives payload:

{
  "regionId": "USA.5_1", 
  "query": "Plumbers",
  "gridConfig": { "zoom": 15, "overlap": 0.2 }
}

Server decomposes to jobs [Job_1, Job_2, ... Job_8500].
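
A sketch of that decomposition, assuming the grid points have already been computed. The payload type mirrors the example above; the child job shape, its field names, and `decomposeCampaign` are hypothetical.

```typescript
// Payload shape taken from the example above.
interface RegionScanPayload {
  regionId: string;
  query: string;
  gridConfig: { zoom: number; overlap: number };
}

// Hypothetical child job shape; field names are illustrative only.
interface RegionScanChildJob {
  campaignId: string;
  index: number;
  query: string;
  zoom: number;
  lon: number;
  lat: number;
}

// Turns one campaign payload plus its grid points ([lon, lat]) into one job per point.
function decomposeCampaign(
  campaignId: string,
  payload: RegionScanPayload,
  points: [number, number][],
): RegionScanChildJob[] {
  return points.map(([lon, lat], index) => ({
    campaignId,
    index,
    query: payload.query,
    zoom: payload.gridConfig.zoom,
    lon,
    lat,
  }));
}
```

Carrying the campaignId on every child job is what would later allow progress counting and cancellation per campaign.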

Step 4: Live Updates

The existing SSE stream (stream-sse) adapts to listen for Campaign Events, updating a global progress bar:

  • "Scanned 120/8500 sectors..."
  • "Found 45 new leads..."
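
One possible way to derive these messages is to fold campaign events into a small progress state and render it. The event names and state fields below are assumptions, since the source only speaks of "Campaign Events"; the `data: ...` framing followed by a blank line is the standard Server-Sent Events wire format.

```typescript
// Hypothetical event shapes.
type CampaignEvent =
  | { type: "campaign-started"; totalSectors: number }
  | { type: "sector-scanned"; newLeads: number };

interface CampaignProgress {
  scanned: number;
  total: number;
  leads: number;
}

// Pure reducer: returns the next progress state for one event.
function applyEvent(state: CampaignProgress, event: CampaignEvent): CampaignProgress {
  if (event.type === "campaign-started") {
    return { scanned: 0, total: event.totalSectors, leads: 0 };
  }
  return {
    scanned: state.scanned + 1,
    total: state.total,
    leads: state.leads + event.newLeads,
  };
}

function progressLabel(state: CampaignProgress): string {
  return `Scanned ${state.scanned}/${state.total} sectors...`;
}

// Serializes a state snapshot as a single SSE message.
function toSseFrame(state: CampaignProgress): string {
  return `data: ${JSON.stringify(state)}\n\n`;
}
```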

Implementation TODO List

Server-Side (test/server)

  • GADM Integration Endpoint:
    • Create route GET /api/regions/search?q={name} to proxy requests to the GADM microservice or query local PostGIS.
    • Create route GET /api/regions/boundary/{gadm_id} to retrieve full GeoJSON.
    • Create route GET /api/regions/names?admin={code} to fetch sub-region names.
  • Grid Logic:
    • Install @turf/turf for geospatial operations.
    • Implement generateGrid(boundaryFeature, zoomLevel) function:
      • Calculate bbox.
      • Generate point grid.
      • Filter pointsWithinPolygon.
  • Campaign Manager:
    • Create CampaignsProduct or extend LocationsProduct.
    • New Job Type: REGION_SCAN_PARENT (decomposes into child jobs).
    • New Job Type: REGION_SCAN_CHILD (actual search).
  • Job Queue Optimization:
    • Ensure PgBoss allows huge batch insertions (thousands of jobs).
    • Implement "Campaign Cancellation" (kill switch for all child jobs).
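
Whatever bulk-insert limit the queue turns out to have, splitting the child jobs into fixed-size chunks keeps each insertion bounded. The helper below is a generic sketch; the name `chunk` and the chunk size of 1,000 in the test scenario are assumptions, and no PgBoss call is shown because its exact insertion API should be checked against the installed version.

```typescript
// Splits a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Each chunk could then be handed to the queue in its own insert call, which also gives a natural point to check for a cancelled campaign between chunks.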

Client-Side (test/client)

  • Region Picker UI:
    • New Autocomplete component fetching from /api/regions/search.
  • Map Visualization:
    • Render the GeoJSON Polygon on MapLibre.
    • Render the calculated Point grid pre-flight (allow user to manually deselect points?).
  • Campaign Dashboard:
    • New View: "Active Scans".
    • Progress bars per campaign.
    • "Pause/Resume" controls.
  • Result Merging:
    • Ensure the client DataGrid can handle streaming results effectively from potentially thousands of searches (Virtualization required).

Existing Endpoint Reference

(Ref. src/products/locations/index.ts)

The current LocationsProduct is well placed to be the parent of this logic.

  • handleStreamGet: Can be adapted to accept a campaignId instead of a single location.
  • handleStreamEmail: Shows the pattern for batch processing (accepting arrays of IDs). We can replicate this "Scatter-Gather" pattern for the Region Scan.

Proposed GeoJSON Microservice Interface

We assume that an internal service exists (or that a dedicated module will be created) exposing:

  • GET /gadm/v1/search?text=... -> Returns lightweight metadata (ID, Name, Level).
  • GET /gadm/v1/feature/{id} -> Returns heavy GeoJSON Geometry.
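
The two endpoints could be typed roughly as follows. The paths come from the list above; the response field names, the `baseUrl` parameter, and the helper names are assumptions, and the geometry type is a minimal subset of GeoJSON.

```typescript
// Hypothetical lightweight metadata returned by the search endpoint.
interface RegionSummary {
  id: string; // e.g. "USA.5_1"
  name: string;
  level: number;
}

// Minimal GeoJSON subset: a ring is a closed list of [lon, lat] positions.
type Ring = [number, number][];

// Hypothetical heavy payload returned by the feature endpoint.
interface RegionFeature {
  type: "Feature";
  properties: RegionSummary;
  geometry:
    | { type: "Polygon"; coordinates: Ring[] }
    | { type: "MultiPolygon"; coordinates: Ring[][] };
}

function searchUrl(baseUrl: string, text: string): string {
  return `${baseUrl}/gadm/v1/search?text=${encodeURIComponent(text)}`;
}

function featureUrl(baseUrl: string, id: string): string {
  return `${baseUrl}/gadm/v1/feature/${encodeURIComponent(id)}`;
}
```

Splitting metadata from geometry this way lets the autocomplete stay fast while the full boundary is fetched only once a region has been chosen.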

Potential Data Enrichments

To increase the value of harvested locations, the following layers can be overlaid or merged with the search results:

Demographics & Population

  • WorldPop: High-resolution raster data for estimating the catchment population of a specific business location.

  • Census Data: (US Census / Eurostat) Admin-level statistics on income, age, and household size to score "Market Viability".

Firmographics & Business Intel

  • OpenCorporates: Verify legal entity status and official registration dates.

  • LinkedIn Organization API: Enrich with employee count, industry tags, and recent growth signals.

  • Clearbit / Apollo.io: Deep profile matching to find technographics (what software they use) and key decision-maker contacts.

Environmental & Infrastructure

  • OpenStreetMap (OSM): Calculate "Footfall Potential" by analyzing proximity to transit hubs, parking, and density of other retail POIs.

  • WalkScore / TransitScore: Rate the accessibility of consumer-facing businesses.

Industry Specifics

  • TripAdvisor / Yelp: Cross-reference hospitality ratings to find discrepancies or opportunities (e.g., highly rated on Google, poorly rated on Yelp).

  • Plastics Industry Databases: (Specific to Polymech) Cross-referencing registered recyclers lists provided by regional environmental agencies.

https://pygadm.readthedocs.io/en/latest/usage.html