Grid Search / Regional Scanning Documentation
Overview
The Grid Search (or Regional Scanning) feature automates the discovery of leads across large, irregular geographic areas (e.g., entire cities, provinces, or countries). Instead of manual point searches, users select a defined administrative region, and the system intelligently decomposes it into a grid of optimal search points.
This functionality relies on a microservice architecture where GADM (Global Administrative Areas) data provides high-fidelity GeoJSON boundaries for exclusion/inclusion logic.
Conceptual Architecture
1. Region Selection (Client)
The user selects a target region (e.g., "Île-de-France, France"). The client fetches the corresponding boundary polygon from the GADM microservice (Admin Level 1/2).
2. Grid Decomposition (Server/Client)
The system calculates a "Search Grid" overlaying the target polygon.
- Viewport Normalization: A single API search at Zoom Level 15 covers a radius of roughly 2-5 km.
- Bounding Box: A rectangular grid is generated covering the polygon's extents.
- Point-in-Polygon Filtering: Grid centers falling outside the actual administrative boundary (e.g., ocean, neighboring states) are discarded using spatial analysis libraries (e.g., Turf.js).
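The decomposition steps above can be sketched without any dependencies. This is a minimal illustration, not the project's implementation: the `generateGrid` signature, the `spacingKm` parameter, and the single-ring polygon type are assumptions, and a hand-written ray-casting test stands in for the Turf.js point-in-polygon call the design names.

```typescript
// Dependency-free sketch; a production version would likely delegate to Turf.js.
type LngLat = [number, number]; // [longitude, latitude], as in GeoJSON
type Ring = LngLat[];           // closed outer ring; holes are ignored in this sketch

const KM_PER_DEG_LAT = 111.32;  // approximate length of one degree of latitude

// Ray casting: toggle "inside" each time a horizontal ray from the point crosses an edge.
function pointInRing([x, y]: LngLat, ring: Ring): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Hypothetical generateGrid: bounding box -> regular grid of cell centers -> keep centers inside the ring.
function generateGrid(ring: Ring, spacingKm: number): LngLat[] {
  let minLng = Infinity, maxLng = -Infinity, minLat = Infinity, maxLat = -Infinity;
  for (const [lng, lat] of ring) {
    minLng = Math.min(minLng, lng);
    maxLng = Math.max(maxLng, lng);
    minLat = Math.min(minLat, lat);
    maxLat = Math.max(maxLat, lat);
  }

  const dLat = spacingKm / KM_PER_DEG_LAT;
  const points: LngLat[] = [];
  for (let lat = minLat + dLat / 2; lat < maxLat; lat += dLat) {
    // A degree of longitude shrinks with cos(latitude), so widen the step to keep spacing in km.
    const dLng = spacingKm / (KM_PER_DEG_LAT * Math.cos((lat * Math.PI) / 180));
    for (let lng = minLng + dLng / 2; lng < maxLng; lng += dLng) {
      if (pointInRing([lng, lat], ring)) points.push([lng, lat]);
    }
  }
  return points;
}
```

Real GADM boundaries are often MultiPolygons with holes and islands, so this single-ring version only shows the shape of the algorithm.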
3. Campaign Orchestration (Server)
The resulting set of valid coordinates (e.g., 450 points) is submitted as a "Scan Campaign".
- Batching: The server does NOT run 450 searches instantly. It uses PgBoss to queue them as individual jobs.
- Concurrency: Jobs are processed with strict rate-limiting to respect SerpAPI quotas.
- Deduplication: Results from overlapping grid circles are merged by place_id.
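As an illustration of the deduplication step, the sketch below merges per-job result batches and keeps the first record seen for each `place_id`. The `Place` shape (beyond `place_id`) and the function name `mergeByPlaceId` are assumptions made for the example.

```typescript
// Hypothetical result shape; only place_id is taken from the design above.
interface Place {
  place_id: string;
  name: string;
}

// Merge batches from overlapping grid cells, keeping the first occurrence of each place_id.
function mergeByPlaceId(batches: Place[][]): Place[] {
  const seen = new Map<string, Place>();
  for (const batch of batches) {
    for (const place of batch) {
      if (!seen.has(place.place_id)) seen.set(place.place_id, place);
    }
  }
  return Array.from(seen.values());
}
```

Keeping the first occurrence is an arbitrary choice here. A real merge might prefer the most complete record instead.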
Workflow Implementation
Step 1: User Selects Region
The user interacts with the new "Region Search" UI:
- Search: "California"
- Dropdown: Selects "California, USA (State/Province)"
- Preview: Map validates the polygon overlay.
Step 2: Grid Generation Status
Pre-flight check displayed to user:
- Total Area: 423,970 km²
- Grid Density: High (Zoom 15)
- Estimated Points: ~8,500 scans (Warn: Expensive!)
- Cost: 8,500 Credits
- Action: "Confirm & Start Campaign"
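The pre-flight estimate can be approximated with simple arithmetic. The document does not state the formula used, so the sketch below assumes each scan covers a square cell whose side is the search diameter reduced by the overlap fraction. The function name and parameters are hypothetical.

```typescript
// Assumption: one scan per square cell of side 2 * radius * (1 - overlap).
function estimateScanCount(areaKm2: number, radiusKm: number, overlap: number): number {
  if (areaKm2 < 0 || radiusKm <= 0) throw new RangeError("area must be >= 0 and radius > 0");
  if (overlap < 0 || overlap >= 1) throw new RangeError("overlap must be in [0, 1)");
  const spacingKm = 2 * radiusKm * (1 - overlap);
  return Math.ceil(areaKm2 / (spacingKm * spacingKm));
}

// Example inputs: the area shown above, a 4.4 km radius, and an overlap of 0.2.
const estimate = estimateScanCount(423_970, 4.4, 0.2);
```

With these inputs the estimate lands in the mid-8,000s, consistent with the ~8,500 figure above. The 4.4 km radius was picked from the quoted 2-5 km range for illustration, not taken from a real configuration.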
Step 3: Campaign Execution
Server receives payload:
{
"regionId": "USA.5_1",
"query": "Plumbers",
"gridConfig": { "zoom": 15, "overlap": 0.2 }
}
Server decomposes to jobs [Job_1, Job_2, ... Job_8500].
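A minimal sketch of that decomposition follows, assuming the grid points have already been computed. The `ChildJob` shape, `buildChildJobs`, and `chunk` are illustrative names. The actual hand-off to the queue is left out because it depends on the PgBoss version in use.

```typescript
type LngLat = [number, number];

// Mirrors the payload shown above.
interface CampaignPayload {
  regionId: string;
  query: string;
  gridConfig: { zoom: number; overlap: number };
}

// Hypothetical child-job shape: one search per grid center.
interface ChildJob {
  name: "REGION_SCAN_CHILD";
  data: { campaignId: string; index: number; query: string; zoom: number; center: LngLat };
}

function buildChildJobs(campaignId: string, payload: CampaignPayload, points: LngLat[]): ChildJob[] {
  return points.map((center, index) => ({
    name: "REGION_SCAN_CHILD" as const,
    data: { campaignId, index, query: payload.query, zoom: payload.gridConfig.zoom, center },
  }));
}

// Split a large job list into fixed-size batches so inserts stay bounded.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

Carrying `campaignId` and `index` on every child job is one way to let progress reporting and cancellation find the jobs that belong to a campaign.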
Step 4: Live Updates
The existing SSE stream (stream-sse) adapts to listen for Campaign Events, updating a global progress bar:
- "Scanned 120/8500 sectors..."
- "Found 45 new leads..."
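For illustration, campaign events could be serialized in the standard Server-Sent Events wire format as sketched below. The event name `campaign.progress` and the `CampaignProgress` fields are assumptions, not the names used by the existing stream.

```typescript
// Hypothetical progress payload for a campaign.
interface CampaignProgress {
  campaignId: string;
  scanned: number;
  total: number;
  newLeads: number;
}

// SSE frame: an "event:" line, a "data:" line, then a blank line as terminator.
// JSON.stringify without indentation emits no raw newlines, so one data line suffices.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Text for the global progress bar.
function progressMessage(p: CampaignProgress): string {
  return `Scanned ${p.scanned}/${p.total} sectors...`;
}
```

On the client, an `EventSource` listener registered for the same event name would parse `event.data` and update the progress bar.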
Implementation TODO List
Server-Side (test/server)
- GADM Integration Endpoint:
  - Create route GET /api/regions/search?q={name} to proxy requests to the GADM microservice or query local PostGIS.
  - Create route GET /api/regions/boundary/{gadm_id} to retrieve full GeoJSON.
  - Create route GET /api/regions/names?admin={code} to fetch sub-region names.
- Grid Logic:
  - Install @turf/turf for geospatial operations.
  - Implement generateGrid(boundaryFeature, zoomLevel) function:
    - Calculate bbox.
    - Generate point grid.
    - Filter pointsWithinPolygon.
- Campaign Manager:
  - Create CampaignsProduct or extend LocationsProduct.
  - New Job Type: REGION_SCAN_PARENT (decomposes into child jobs).
  - New Job Type: REGION_SCAN_CHILD (actual search).
- Job Queue Optimization:
  - Ensure PgBoss allows huge batch insertions (thousands of jobs).
  - Implement "Campaign Cancellation" (kill switch for all child jobs).
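One possible shape for the campaign kill switch is a status check that every child job performs before running its search. The sketch below keeps status in memory purely for illustration. `CampaignRegistry` and `runChild` are hypothetical names, and with multiple workers the status would need to live in shared storage such as the database.

```typescript
type CampaignStatus = "running" | "cancelled" | "completed";

// In-memory stand-in for wherever campaign status is actually persisted.
class CampaignRegistry {
  private status = new Map<string, CampaignStatus>();

  start(id: string): void {
    this.status.set(id, "running");
  }

  // Only a running campaign can be cancelled; unknown or finished ones are left alone.
  cancel(id: string): void {
    if (this.status.get(id) === "running") this.status.set(id, "cancelled");
  }

  isActive(id: string): boolean {
    return this.status.get(id) === "running";
  }
}

// Hypothetical child-job wrapper: skip the search when the campaign is no longer running.
async function runChild<T>(
  registry: CampaignRegistry,
  campaignId: string,
  search: () => Promise<T>,
): Promise<T | null> {
  if (!registry.isActive(campaignId)) return null;
  return search();
}
```

Checking the flag inside each job means already-queued children drain quickly as no-ops, which avoids having to delete thousands of queue rows at cancellation time.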
Client-Side (test/client)
- Region Picker UI:
  - New Autocomplete component fetching from /api/regions/search.
- Map Visualization:
  - Render the GeoJSON Polygon on MapLibre.
  - Render the calculated Point grid pre-flight (allow user to manually deselect points?).
- Campaign Dashboard:
  - New View: "Active Scans".
  - Progress bars per campaign.
  - "Pause/Resume" controls.
- Result Merging:
  - Ensure the client DataGrid can handle streaming results effectively from potentially thousands of searches (Virtualization required).
Existing Endpoint Reference
(Ref. src/products/locations/index.ts)
The current LocationsProduct is well positioned to be the parent of this logic.
- handleStreamGet: Can be adapted to accept a campaignId instead of a single location.
- handleStreamEmail: Shows the pattern for batch processing (accepting arrays of IDs). We can replicate this "Scatter-Gather" pattern for the Region Scan.
Proposed GeoJSON Microservice Interface
We assume the existence of an internal service (or the creation of a dedicated module) exposing:
- GET /gadm/v1/search?text=... -> Returns lightweight metadata (ID, Name, Level).
- GET /gadm/v1/feature/{id} -> Returns heavy GeoJSON Geometry.
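A thin typed wrapper around these two endpoints might look like the sketch below. The field names in `RegionSummary`, the helper names, and the base URL used in the example are assumptions. Only the two paths come from the proposal above.

```typescript
// Assumed shape of one search hit: the lightweight metadata (ID, Name, Level).
interface RegionSummary {
  id: string;
  name: string;
  level: number;
}

// Build the search URL, percent-encoding the free-text query.
function searchUrl(baseUrl: string, text: string): string {
  return `${baseUrl}/gadm/v1/search?text=${encodeURIComponent(text)}`;
}

// Build the feature URL for a single GADM id.
function featureUrl(baseUrl: string, id: string): string {
  return `${baseUrl}/gadm/v1/feature/${encodeURIComponent(id)}`;
}

// Runtime guard for untrusted JSON coming back from the service.
function isRegionSummary(value: unknown): value is RegionSummary {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === "string" && typeof v.name === "string" && typeof v.level === "number";
}
```

Keeping search and feature retrieval separate lets the autocomplete stay fast, since the heavy geometry is only fetched once a region is actually selected.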
4. Potential Data Enrichments
To increase the value of harvested locations, the following layers can be overlaid or merged with the search results:
Demographics & Population
- WorldPop: High-resolution raster data for estimating the catchment population of a specific business location.
- Census Data: (US Census / Eurostat) Admin-level statistics on income, age, and household size to score "Market Viability".
Firmographics & Business Intel
- OpenCorporates: Verify legal entity status and official registration dates.
- LinkedIn Organization API: Enrich with employee count, industry tags, and recent growth signals.
- Clearbit / Apollo.io: Deep profile matching to find technographics (what software they use) and key decision-maker contacts.
Environmental & Infrastructure
- OpenStreetMap (OSM): Calculate "Footfall Potential" by analyzing proximity to transit hubs, parking, and density of other retail POIs.
- WalkScore / TransitScore: Rate the accessibility of consumer-facing businesses.
Industry Specifics
- TripAdvisor / Yelp: Cross-reference hospitality ratings to find discrepancies or opportunities (e.g., highly rated on Google, poorly rated on Yelp).
- Plastics Industry Databases: (Specific to Polymech) Cross-referencing registered recyclers lists provided by regional environmental agencies.