docs:workers.

This commit is contained in:
lovebird 2026-03-24 22:23:13 +01:00
parent eb62d53173
commit 966cca2bf1
2 changed files with 459 additions and 0 deletions

docs/workers.md
# Running Products via Native Node.js Worker Threads
Moving heavy queues (like `ImagesProduct` crunching images via `sharp`, or `LocationsProduct` running grid searches) out of the main Event Loop is essential to preserve API performance and maintain a high Event Loop FPS.
We orchestrate this entirely within Node.js using the native `worker_threads` module, driven by a centralized JSON configuration. No PM2 dependency is required.
---
## Architecture: Config-Driven Worker Spawning
The application topology is defined in `server/config/products.json`. The main thread reads this file on boot. If a product has `"workers" > 0`, the main thread spawns dedicated native `Worker` threads to handle its `pg-boss` background jobs — while still registering the product's HTTP routes on the main thread.
### 1. The Configuration Format (`config/products.json`)
Each product entry specifies:
- **`name`** — maps to a key in `PRODUCT_IMPORTS` in `registry.ts`
- **`enabled`** — whether to load the product at all
- **`workers`** — how many native Worker threads to spawn (0 = run everything on the main thread)
- **`deps`** — informational dependency list
```json
{
"products": [
{ "name": "images", "enabled": true, "workers": 1, "deps": ["serving", "storage"] },
{ "name": "videos", "enabled": true, "workers": 0, "deps": ["serving", "storage"] },
{ "name": "locations", "enabled": true, "workers": 0, "deps": ["serving", "storage"] },
{ "name": "serving", "enabled": true, "workers": 0, "deps": ["images"] },
{ "name": "email", "enabled": true, "workers": 0, "deps": [] },
{ "name": "openai", "enabled": true, "workers": 0, "deps": [] },
{ "name": "analytics", "enabled": true, "workers": 0, "deps": [] },
{ "name": "storage", "enabled": true, "workers": 0, "deps": [] },
{ "name": "ecommerce", "enabled": true, "workers": 0, "deps": ["images"] },
{ "name": "contacts", "enabled": true, "workers": 0, "deps": [] },
{ "name": "campaigns", "enabled": true, "workers": 0, "deps": ["contacts"] },
{ "name": "mcp", "enabled": true, "workers": 0, "deps": ["serving"] }
]
}
```
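A minimal loader for this format might look like the following sketch. The `ProductConfig` interface and both helper names are illustrative assumptions, not the actual `registry.ts` code:

```typescript
// Illustrative sketch of the products.json format — ProductConfig and the
// helper names are assumptions, not the actual registry.ts implementation.
interface ProductConfig {
  name: string;
  enabled: boolean;
  workers: number;
  deps: string[];
}

// Parse the raw JSON text and keep only enabled products, preserving order
export function loadEnabledProducts(json: string): ProductConfig[] {
  const parsed = JSON.parse(json) as { products: ProductConfig[] };
  return parsed.products.filter((p) => p.enabled);
}

// Products that get dedicated Worker threads (workers > 0)
export function threadedProducts(products: ProductConfig[]): ProductConfig[] {
  return products.filter((p) => p.workers > 0);
}
```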
### 2. Main Thread: The Orchestrator (`src/products/registry.ts`)
Boot-up is split into two phases:
**Phase 1 — `registerProductRoutes(app)`:** Reads `products.json`, lazy-imports only the enabled product modules via a `PRODUCT_IMPORTS` map (avoids importing everything on boot), instantiates them, and registers their HTTP routes on the Hono app.
**Phase 2 — `startProducts(boss)`:** For each product:
- If `workers > 0`, spawns native Worker threads (see §3).
- Always calls `product.start(boss)` on the main thread so the product can register pg-boss queue names and perform local init.
```typescript
// Lazy imports — only loaded when the product is enabled
const PRODUCT_IMPORTS: Record<string, () => Promise<any>> = {
  'images': () => import('./images/index.js'),
  'videos': () => import('./videos/index.js'),
  'locations': () => import('./locations/index.js'),
  // ... all 12 products
};

export const startProducts = async (boss?: any) => {
  for (const product of instances) {
    const pConfig = product.__config;
    if (pConfig && pConfig.workers > 0) {
      const isDev = process.env.NODE_ENV !== 'production';
      // Dev: the vite-node wrapper is the entry; it loads the TS worker script directly
      // Prod: the pre-bundled worker.cjs is the entry itself
      const workerEntry = isDev
        ? path.resolve(process.cwd(), 'src', 'worker_wrapper.mjs')
        : path.resolve(process.cwd(), 'worker.cjs');
      const workerScript = path.resolve(process.cwd(), 'src', 'worker.ts');
      for (let i = 0; i < pConfig.workers; i++) {
        const worker = new Worker(workerEntry, {
          workerData: { productName: pConfig.name, workerScript }
        });
        nativeWorkers.push({ id: product.id, worker });
        // Forward EventBus events from worker → main thread
        worker.on('message', (msg) => {
          if (msg?.type === 'event' && msg.name) {
            EventBus.emit(msg.name, msg.data);
          }
        });
      }
    }
    // Main-thread init (HTTP deps, caching, boss queue creation)
    await product.start(boss);
  }
};
```
### 3. Worker Entrypoint (`src/worker.ts`)
When a Worker thread boots, `worker.ts` is loaded. It reads `workerData.productName`, instantiates the matching product class, and starts only its pg-boss consumers. It does **not** start an HTTP server.
Key responsibilities:
- **PG-Boss queue consumers** — the product's `onStart(boss)` registers workers for its queues.
- **IPC health checks** — responds to `{ type: 'ping' }` messages with `{ type: 'pong', activeJobs, ... }`.
- **IPC job dispatch** — handles `{ type: 'job' }` messages for synchronous request-response via `dispatchToWorker()`.
- **EventBus bridging** — forwards `job:progress`, `job:complete`, and `job:error` events to the parent thread via `parentPort.postMessage()`.
```typescript
// worker.ts (runs inside the Worker thread)
import { workerData, isMainThread, parentPort } from 'worker_threads';

if (isMainThread) throw new Error('Must run inside a Worker thread.');

const ProductClass = PRODUCT_CLASSES[workerData.productName];
const instance = new ProductClass();

// IPC: ping/pong + job dispatch
parentPort.on('message', async (msg) => {
  if (msg.type === 'ping') return parentPort.postMessage({ type: 'pong', ... });
  if (msg.type === 'job') { /* handleJob → postMessage result */ }
});

// Bridge internal events to parent thread
EventBus.on('job:progress', (data) => parentPort.postMessage({ type: 'event', name: 'job:progress', data }));
EventBus.on('job:complete', (data) => parentPort.postMessage({ type: 'event', name: 'job:complete', data }));

// Start isolated PG-Boss and bind the product
const workerBoss = await startBoss();
await instance.start(workerBoss);
```
### 4. Dev Mode: `worker_wrapper.mjs`
In dev, Worker threads can't inherit `tsx` hooks from the parent process. To support TypeScript directly, a plain `.mjs` bootstrap uses `vite-node`'s programmatic API to load and execute `worker.ts` with full TS resolution:
```javascript
// worker_wrapper.mjs
import { workerData } from 'node:worker_threads';
import { createServer } from 'vite';
import { ViteNodeServer } from 'vite-node/server';
import { ViteNodeRunner } from 'vite-node/client';

const server = await createServer({ /* hmr: false, @-alias setup */ });
const node = new ViteNodeServer(server);
const runner = new ViteNodeRunner({
  root: server.config.root,
  base: server.config.base,
  fetchModule: (id) => node.fetchModule(id),
  resolveId: (id, importer) => node.resolveId(id, importer),
});
await runner.executeFile(workerData.workerScript);
```
### 5. Smart Consumer Skipping
Products that support pg-boss workers (like `LocationsProduct`) use this pattern in `onStart()` to avoid double-consuming:
```typescript
async onStart(boss?: PgBoss) {
  if (!boss) return;
  const { isMainThread } = await import('node:worker_threads');
  const workersConfig = this.__config?.workers ?? 0;
  const shouldConsume = !isMainThread || workersConfig === 0;

  for (const WorkerClass of this.workers) {
    const worker = new WorkerClass();
    await boss.createQueue(worker.queueName);
    if (shouldConsume) {
      await boss.work(worker.queueName, options, worker.handler.bind(worker));
    }
  }
}
```
If the product is running with dedicated Worker threads (`workers > 0`), the main thread skips consuming from pg-boss queues — only the Worker threads will consume them.
### 6. IPC Job Dispatch (`src/commons/worker-ipc.ts`)
For synchronous request-response between the main thread and worker threads (e.g., image processing called from an HTTP handler), there is a utility:
```typescript
import { dispatchToWorker, hasWorker } from '@/commons/worker-ipc.js';

// Check if a live worker exists
if (await hasWorker('images')) {
  const result = await dispatchToWorker('images', 'process_image', { buffer, ... }, [buffer]);
}
```
- Uses round-robin across multiple worker threads for the same product.
- Supports zero-copy `ArrayBuffer` transfers via the `transferList` parameter.
- Has a configurable timeout (default 30s).
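The round-robin selection can be sketched as follows. This is a simplified model of the bookkeeping `worker-ipc.ts` likely performs internally; the `JobPort` interface and function names here are assumptions:

```typescript
// Sketch of round-robin worker selection — JobPort and the function
// names are illustrative assumptions, not the real worker-ipc.ts API.
interface JobPort {
  postMessage(msg: unknown, transferList?: ArrayBuffer[]): void;
}

const pools = new Map<string, JobPort[]>();  // product name → live worker ports
const rrIndex = new Map<string, number>();   // product name → next slot

export function registerWorkerPort(product: string, port: JobPort): void {
  const pool = pools.get(product) ?? [];
  pool.push(port);
  pools.set(product, pool);
}

// Cycle through the pool so concurrent jobs spread across threads
export function nextWorkerPort(product: string): JobPort | undefined {
  const pool = pools.get(product);
  if (!pool || pool.length === 0) return undefined;
  const i = rrIndex.get(product) ?? 0;
  rrIndex.set(product, (i + 1) % pool.length);
  return pool[i];
}
```

A real dispatcher would pair this with a pending-request map keyed by job id, rejecting the promise when the 30s timeout fires before the worker posts a result.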
---
## Base Classes
### `AbstractProduct` (`src/products/AbstractProduct.ts`)
All products extend this. Provides:
- `start(boss)` / `stop()` lifecycle hooks
- `handleJob(action, msg)` — for IPC job dispatch from worker threads
- `handleStream()` — SSE streaming helper with cache-checking
- `generateHash()` — deterministic deep-sorted SHA-256 hashing
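The deep-sorted hashing idea can be sketched like this (the actual `AbstractProduct` implementation may differ in detail — treat this as an illustrative model):

```typescript
import { createHash } from 'node:crypto';

// Sketch of deterministic deep-sorted hashing: recursively sort object keys
// so logically-equal payloads serialize to identical JSON before hashing.
function deepSort(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(deepSort);
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(Object.keys(obj).sort().map((k) => [k, deepSort(obj[k])]));
  }
  return value;
}

// Same logical payload → same hash, regardless of key insertion order
export function generateHash(payload: unknown): string {
  return createHash('sha256').update(JSON.stringify(deepSort(payload))).digest('hex');
}
```

This is what makes cache filenames stable: `{width: 800, format: 'webp'}` and `{format: 'webp', width: 800}` resolve to the same cache entry.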
### `AbstractWorker` (`src/jobs/boss/AbstractWorker.ts`)
PG-Boss queue consumers extend this. Provides:
- `queueName` — the pg-boss queue to consume
- `process(job)` — override with business logic
- `calculateCost(job, result)` — usage metering
- `handler()` — wraps `process()` with error handling and emits `job:complete` / `job:failed`
Worker classes use the `@Worker(queueName)` decorator for registration.
---
## Case Study: `ImagesProduct` — The Canonical Worker-Offloaded Product
`ImagesProduct` (`src/products/images/index.ts`) is currently the **only product running with `workers: 1`** in production. It demonstrates the full IPC lifecycle — from HTTP request through worker dispatch to cached response. It does **not** use `AbstractWorker` or pg-boss queues; instead, it uses the synchronous IPC dispatch pattern via `worker-ipc.ts`.
### The Hybrid Pattern: `hasWorker` + Inline Fallback
Every image processing path checks whether a live worker thread exists. If yes, the heavy `sharp` work is offloaded. If no (e.g., during tests, or if `workers: 0` in config), it falls back to inline processing on the main thread:
```typescript
// src/products/images/index.ts — _ensureCachedImage()
if (await hasWorker('images')) {
  // Zero-copy transfer: copy Buffer into a transferable ArrayBuffer
  const arrayBuffer = new ArrayBuffer(inputBuffer.length);
  new Uint8Array(arrayBuffer).set(inputBuffer);
  await dispatchToWorker('images', 'process_image', {
    buffer: arrayBuffer, width, height, format, fit
  }, [arrayBuffer]); // ← transfer list: moves memory, doesn't clone
} else {
  // Inline fallback (same thread)
  const pipeline = sharp(inputBuffer).resize({ width, height, fit }).toFormat(format);
  await fs.writeFile(filepath, await pipeline.toBuffer());
}
```
This pattern is used in three HTTP handlers:
- **`handlePostImage`** — file upload → resize → cache (or forward to Supabase Storage)
- **`handleRenderImage`** — URL → fetch → resize → serve as binary (used by lazy srcset URLs)
- **`handlePostResponsive`** / **`handleGetResponsive`** — generate multi-format, multi-size srcset variants
### Worker-Side: `handleJob()` Actions
Inside the worker thread, the `ImagesProduct` instance receives IPC job messages and routes them by `action`:
```typescript
// src/products/images/index.ts — handleJob()
async handleJob(action: string, msg: any): Promise<any> {
  if (action === 'process_image') {
    // Reconstruct Buffer from transferred ArrayBuffer
    const inputBuffer = Buffer.from(msg.buffer);
    await this.performProcessImage(inputBuffer, filepath, { width, height, format, fit });
    return { filename };
  }
  if (action === 'render_image') {
    // Supports square crop, contain fit, etc.
    await this.performRenderImage(inputBuffer, filepath, { width, height, format, square, contain });
    return { filename };
  }
  return super.handleJob(action, msg); // Throws for unknown actions
}
```
```
Both actions write the processed image to the shared `cache/` directory on disk. The main thread then reads the file to serve or forward the response.
### The Responsive Image Pipeline
The responsive endpoint generates multiple width × format variants (e.g., `[180, 640, 1024, 2048] × [avif, webp]`). It splits work between **eager** and **lazy** generation:
| Variant Width | Strategy | What Happens |
|--------------|----------|--------------|
| ≤ 600px | **Eager** | Processed immediately (via worker or inline) and cached to disk. Returns direct cache URL. |
| > 600px | **Lazy** | Returns a dynamic `/api/images/render?url=...&width=...&format=...` URL. Processed on-demand when the browser requests it. |
This avoids eagerly generating large, rarely-used variants for every upload while ensuring small thumbnails are always instant.
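The eager/lazy split is a pure decision over the variant matrix, which can be sketched as below. The `planVariants` helper, the `Variant` shape, and the `cacheUrlFor` callback are illustrative assumptions; only the 600px threshold and the `/api/images/render` URL shape come from the table above:

```typescript
// Illustrative sketch of the eager/lazy variant split — names are assumptions.
interface Variant { width: number; format: string; url: string; eager: boolean }

const EAGER_MAX_WIDTH = 600; // widths at or below this are processed immediately

export function planVariants(
  sourceUrl: string,
  widths: number[],
  formats: string[],
  cacheUrlFor: (width: number, format: string) => string,
): Variant[] {
  const variants: Variant[] = [];
  for (const width of widths) {
    for (const format of formats) {
      const eager = width <= EAGER_MAX_WIDTH;
      const url = eager
        ? cacheUrlFor(width, format) // already on disk → direct cache URL
        : `/api/images/render?url=${encodeURIComponent(sourceUrl)}&width=${width}&format=${format}`;
      variants.push({ width, format, url, eager });
    }
  }
  return variants;
}
```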
### Request Coalescing
When multiple concurrent requests reference the same source URL, `fetchImageCoalesced()` deduplicates them using an in-flight `Map<string, Promise<Buffer>>`. Only one HTTP fetch goes out; all callers share the same Promise.
### Data Flow Summary
```
HTTP Request (main thread)
→ hasWorker('images')? ──yes──→ dispatchToWorker()
│ │
│ ├─ postMessage({ type:'job', action:'render_image', buffer }, [buffer])
│ │ ↓ (zero-copy ArrayBuffer transfer)
│ │ Worker Thread: handleJob('render_image', msg)
│ │ ↓
│ │ sharp(buffer).resize().toFormat().toFile(filepath)
│ │ ↓ (streams directly to disk)
│ └─ postMessage({ type:'job_result', result: { filename } })
│ ↓
│ main thread: fs.readFile(cache/hash.format)
│ ↓
│ return c.redirect() or c.body()
└──no──→ Inline: sharp().resize().toFile(filepath) → serve
```
---
## Why this Pattern is Powerful
1. **Zero PM2 Dependency:** Entirely native to Node.js. Containerization, Nexe builds — nothing changes.
2. **True Multi-Core Utilization:** `worker_threads` run on distinct OS threads. Setting `workers: 2` for `images` dedicates two CPU cores to Sharp.
3. **API Immunity:** Workers have their own V8 heap and Event Loop. A massive image resize will have zero impact on the main API's Event Loop FPS.
4. **EventBus Bridging:** Worker events (progress, completion) are forwarded to the main thread via IPC `postMessage`, enabling real-time SSE streams to API clients.
5. **Dev/Prod Parity:** The `worker_wrapper.mjs` + vite-node setup means TypeScript runs natively in dev worker threads, while production uses pre-bundled JS — same behavior in both environments.
6. **Round-Robin Dispatch:** The `worker-ipc.ts` utility distributes synchronous job requests across multiple threads, enabling true horizontal scaling within a single process.
---
## Constraints & Gotchas (Lessons from Inngest + Our Benchmarks)
Node.js worker threads have real constraints that Go/Rust/Python developers would never expect. The [Inngest post on worker threads](https://www.inngest.com/blog/node-worker-threads) formalizes these well. Here's how each constraint applies to **our** architecture:
### 1. Workers Are NOT Lightweight
Each worker thread is a **full V8 isolate** — its own heap, its own event loop. ~10 MB memory overhead per worker, with tens-of-milliseconds startup cost. This is why our `products.json` caps workers at 1-2 per product, and workers are spawned **once at boot** and persist for the process lifetime. We never create/destroy workers per-job.
### 2. You Can't Pass Logic — Only Messages
Unlike Go goroutines or Rust threads, you can't pass a function to `new Worker()`. The structured clone algorithm can't serialize functions. This is why:
- Our `EventBus` listeners live on the **main thread** — worker threads post `{ type: 'event' }` messages that get bridged to the main-thread EventBus
- Pino `logger` instances can't cross the boundary — worker threads use their own logger
- `pg-boss` connections are per-thread — each worker establishes its own
### 3. Bundler Discovery Is Fragile
Bundlers (webpack) can't statically analyze `new Worker(path)`. Our approach:
- **Dev:** `worker_wrapper.mjs` uses vite-node's `ViteNodeRunner` to resolve TypeScript at runtime
- **Prod:** `build.sh` compiles `worker.ts` → `worker.cjs` as a separate webpack entry point, and the registry uses `__dirname + '/worker.cjs'` — a plain string the bundler can't trace
Both paths are hardcoded and tested — no dynamic path construction that could break silently.
### 4. Dev-Mode vite-node Overhead (CRITICAL)
Benchmarked 2024-03-24, same 386KB JPEG source at 800px webp:
| Path | Encode Time | Notes |
|------|-------------|-------|
| Worker thread (vite-node) | **3.265 s** | IPC + vite-node module transform overhead |
| Main thread (inline) | **0.140 s** (140 ms) | Direct sharp call, no IPC |
**~23× slower in dev mode via worker thread.** The vite-node `ViteNodeRunner` inside the worker's V8 isolate adds massive overhead for module resolution and transformation. Sharp itself (native C++ addon) runs at the same speed — the cost is entirely in the JS wrapper.
In **production** with pre-bundled `worker.cjs`, the worker thread runs at near-native speed. The overhead is a **dev-only artifact**.
> **Practical implication:** Consider setting `"workers": 0` for `images` during local development to avoid the vite-node penalty. The main thread handles 140ms encodes without impacting dev-server responsiveness.
### 5. No Respawning (Current Gap)
Inngest implements exponential backoff respawning — if a worker thread crashes (unhandled exception, OOM), the main thread detects the `exit` event and spins up a replacement with increasing delay.
**We don't do this yet.** If a worker thread dies, it's gone until a full server restart. The `registry.ts` spawner doesn't watch for `exit` events. This is acceptable for now because:
- Workers are simple (sharp pipeline, no external connections beyond pg-boss)
- Crashes are rare in production
- The inline fallback (`hasWorker() === false`) means the main thread picks up the work
But for robustness, adding respawn-with-backoff to the worker spawner in `registry.ts` would be a good future improvement.
### 6. Elastic Autospawning & Tier-Based Limits (Grid Searches)
Monolithic jobs that process tens of thousands of items (e.g., massive Grid Searches) expose a flaw in static worker pools: **head-of-line blocking**. If all workers are occupied by a massive Enterprise search, Free/Pro users starve.
To solve this we use an **Elastic Autospawn / Fan-Out Architecture**:
1. **Fan-Out (Map-Reduce):** Instead of processing 10,000 grid cells in a single Node.js worker loop, an *Orchestrator* job enumerates the area and splits it into 10,000 individual `gridsearch-cell` jobs pushed to PG-Boss.
2. **Tier-Based Queue Routing/Throttling:** We use PG-Boss `singletonKey` (tied to `userId`) and tier-based concurrency limits (e.g., `teamConcurrency: 5` for Pro vs `20` for Enterprise) to ensure fairness at the database queue level.
3. **Distributed SSE (Pub/Sub):** Because micro-jobs fan out across multiple elastic workers, tying SSE to a local `EventBus` via `parentPort` fails. Instead, workers emit progress via **Postgres `NOTIFY`** or Supabase Realtime channels. The main API process (handling the SSE route) uses `LISTEN` to receive events from any worker on any machine, bridging them back to the user's HTTP stream.
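The tier-based limits can be expressed as a small lookup, sketched below. The pro/enterprise numbers mirror the examples above; the free-tier value and all names are assumptions, as is the commented pg-boss call:

```typescript
// Sketch of tier-based pg-boss work options — pro/enterprise numbers mirror
// the text above; the free-tier value is an assumption.
type Tier = 'free' | 'pro' | 'enterprise';

export function tierWorkOptions(tier: Tier): { teamSize: number; teamConcurrency: number } {
  switch (tier) {
    case 'enterprise': return { teamSize: 20, teamConcurrency: 20 };
    case 'pro':        return { teamSize: 5, teamConcurrency: 5 };
    default:           return { teamSize: 2, teamConcurrency: 2 };
  }
}

// Enqueueing a fan-out cell job keyed per user (illustrative):
//   await boss.send('gridsearch-cell', cell, { singletonKey: userId });
```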
---
## Exploring Native (Rust/C++) Replacements
Given the constraints of V8 Isolates (10MB overhead, slow startup, lack of shared memory serialization), a viable future replacement for CPU-bound or massively concurrent products (like `images` or `locations` grid searches) is replacing Node.js `worker_threads` with **Per-Product Rust or C++ implementations (Binaries or N-API)**.
If a Native (Rust/C++) worker is implemented:
- **Fast Autospawn:** Native binaries spawn in under 1ms. If compiled as an N-API native module (via `napi-rs` for Rust or `node-addon-api` for C++), worker execution becomes an ordinary function call, avoiding V8 Isolate boot entirely.
- **IPC Performance:**
  - Subprocesses communicating over a raw UNIX socket or `stdout` stream achieve near-native memory transfer without the structured-clone serialization bottleneck.
  - N-API bindings allow direct zero-copy memory access (`SharedArrayBuffer`) between main-thread JavaScript and native code.
- **Memory Efficiency:** A single native concurrency pipeline scaling to 10,000 asynchronous grid cells uses a fraction of the RAM of dozens of full V8 isolates.
### Side-By-Side Comparison
| Feature | Node.js `worker_threads` | Rust (N-API / Subprocess) | C++ (N-API / Subprocess) |
| :--- | :--- | :--- | :--- |
| **Startup Time** | ~30-50ms (V8 Isolate boot) | **<1ms** (Native / Binary spawn) | **<1ms** (Native / Binary spawn) |
| **Memory per Instance** | High (~10-30MB baseline) | **Minimal** (<2MB) | **Minimal** (<2MB) |
| **IPC Performance** | Slow (`postMessage` Structured Clone) | **High** (Zero-Copy SharedArrayBuffer or MsgPack UDS) | **High** (Zero-Copy SharedArrayBuffer or MsgPack UDS) |
| **Autospawning** | Poor (spawn spikes cause OOM) | **Excellent** | **Excellent** |
| **Development Speed** | Fastest | Slower (Strict compiler, borrow checker) | Slower (Manual compilation, header management) |
| **Memory Safety** | High (V8 Engine) | **High** (Compiler-enforced lifetimes) | Lower (Prone to segfaults / memory leaks) |
| **Ecosystem (Parallelism)** | Limited (libuv threadpool) | **Best-in-class** (Tokio, Rayon) | Strong (std::thread, Boost) |
---
## 7. Messaging: Internal & External Workers (Protobuf)
When moving to an Elastic Autospawn architecture with Native workers, the serialization format and communication transport become the most crucial factors for performance and system integrity.
### Why Protobuf?
While MessagePack over Unix Domain Sockets works, **Protocol Buffers (Protobuf)** offers several distinct advantages, especially when scaling from "Internal Subprocesses" to "External Distributed Workers":
1. **Strict Type Contracts:** Both Node.js (TypeScript) and Native (Rust/C++) share the exact same `.proto` schema. If a payload field is required, the compiler ensures it exists. If the Node.js API changes a field structure, the Native worker fails to compile, preventing silent production parsing errors.
2. **Backwards Compatibility:** Protobuf is inherently designed for evolving APIs without breaking older workers.
3. **RPC Native (gRPC):** As we expand from *Internal Workers* on the same machine to *External Workers* on entirely different physical servers, Protobuf naturally upgrades into gRPC with zero serialization changes.
### The "Dual Model" Architecture
The beauty of standardizing on Protobuf is that the *exact same serialization code* is used regardless of where the worker lives.
#### 1. Internal Workers (Local IPC via Subprocesses)
- **The Scenario:** The main Node.js API process spawns a native Rust/C++ executable as a child process on the **same machine**.
- **The Transport:** Unix Domain Sockets (UDS) / Named Pipes or Standard I/O (stdio). UDS is preferred because it's full-duplex and avoids Node's `stdout` buffering constraints.
- **How it works:**
1. Node.js encodes the `JobPayload` message using the compiled `protobufjs` TypeScript library.
2. Node.js writes the binary payload to the local UNIX Domain Socket (e.g., `/tmp/worker_grid_123.sock`). Because UDS is a TCP-like stream, payloads must be **length-prefixed** (e.g., 4 bytes for length, followed by the Protobuf bytes) so the receiver knows when the message ends.
3. The Rust/C++ subprocess reads the length prefix, reads the exact byte count, and uses `prost` (Rust) or the Google Protobuf C++ library to deserialize instantly.
4. The worker executes the CPU-heavy logic, serializes the `JobResult`, prefixes the length, and streams it back.
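The length-prefix framing from step 2 can be sketched on the Node.js side as follows (4-byte big-endian prefix; the function names are illustrative):

```typescript
// 4-byte big-endian length-prefix framing for Protobuf messages over a UDS stream.
export function frame(payload: Buffer): Buffer {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Extract complete messages from accumulated stream bytes; returns any leftover
// partial bytes so the caller can prepend them to the next socket chunk.
export function deframe(stream: Buffer): { messages: Buffer[]; rest: Buffer } {
  const messages: Buffer[] = [];
  let offset = 0;
  while (offset + 4 <= stream.length) {
    const len = stream.readUInt32BE(offset);
    if (offset + 4 + len > stream.length) break; // incomplete message — wait for more bytes
    messages.push(stream.subarray(offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  return { messages, rest: stream.subarray(offset) };
}
```

The Rust/C++ side mirrors this: read exactly 4 bytes, then read exactly `len` bytes, then hand the slice to the Protobuf decoder.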
#### 2. External Workers (Distributed Execution)
- **The Scenario:** Fanning out 10,000 Grid Search cells across dozens of physical worker nodes to prevent local CPU exhaustion.
- **The Transport:** Pg-Boss / Postgres (or gRPC).
- **How it works:**
1. **The Queue:** The main Node.js process encodes the job payload via Protobuf and saves the raw bytes (or Base64-encoded bytes) into the `pgboss.job` table.
2. **The Fleet:** Hundreds of external Rust/C++ worker nodes connect directly to the database layer (or via a gRPC interface) pulling jobs.
3. **The Decoding:** The remote execution node pulls the binary payload and deserializes the Protobuf bytes. Since the schema is strict, all external workers instantly understand the payload, ensuring perfect schema synchronization across the heterogeneous distributed fleet.
---
## 8. Storage & Database Integrations for Native Workers
Transitioning to Native Autospawning workers heavily impacts how the database and storage layers scale, specifically around connection pooling, payload limits, and blob storage.
### Connection Limits (Supavisor)
If 5,000 autospawned native processes all open distinct `libpq` connections to Postgres, the database will instantly lock up with `FATAL: too many clients`.
**The Rule:** All native workers (whether internal executables or external nodes) *must* connect to Postgres via a connection pooler like **Supavisor** or **PgBouncer**, which transparently multiplexes thousands of transient client connections onto a handful of persistent database connections.
### Event Bus Limits (Postgres NOTIFY)
As established, we use `LISTEN / NOTIFY` to bridge Server-Sent Events (SSE) from the Native workers back to the Node.js API stream.
**The Constraint:** Postgres `NOTIFY` string payloads are hard-limited to **8000 bytes**. You cannot emit massive JSON/Protobuf result arrays over `NOTIFY`. It must only contain progression percentages or tiny metadata.
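A guard on the emitting side keeps this constraint from being violated silently. A minimal sketch (the `progressPayload` helper is an assumption; only the 8000-byte limit comes from Postgres):

```typescript
// Postgres NOTIFY payloads are hard-limited to 8000 bytes — emit only tiny
// progress events, never result arrays. progressPayload is an illustrative name.
const NOTIFY_MAX_BYTES = 8000;

export function progressPayload(jobId: string, percent: number): string {
  const payload = JSON.stringify({ jobId, percent });
  if (Buffer.byteLength(payload, 'utf8') >= NOTIFY_MAX_BYTES) {
    throw new Error('NOTIFY payload too large — store the data elsewhere and notify a pointer');
  }
  return payload;
}

// Usage (illustrative):
//   await client.query(`SELECT pg_notify('job_progress', $1)`, [progressPayload(id, 42)]);
```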
### Returning Artifacts & Large Results
When a Native worker finishes crunching data, it needs to save the result.
1. **Small Results (JSON/Protobuf < 1MB):**
- The native worker calls the `pg-boss.complete(jobId, protobuf_bytes)` equivalent, storing the payload back in the `pgboss.job` table.
2. **Tabular Results (Big Data):**
- e.g., 50,000 scraped locations from a massive grid cell. The native worker uses the incredibly fast SQL `COPY` command (bulk insert) to slam the data directly into a dedicated Postgres table (e.g., `places`), and completes the `pg-boss` job with an empty payload.
3. **Huge Blobs (Images / Videos / AI Models):**
- The native worker *does not touch Postgres for blobs*. The Node API orchestrator pre-signs a **Supabase Storage Upload URL** and embeds it in the job payload. The Native worker generates the 50MB file and streams it via `libcurl` directly to S3/Supabase Storage, completely bypassing the database stack.
---
## 9. Next-level Abstracting: Embedded Scripting (Lua/WASM)
While writing the *infrastructure layer* (UDS reading, Protobuf decoding, Postgres connection pooling) in strictly-typed Native code (Rust/C++) is essential for performance, writing volatile *business logic* (like search heuristics) in C++ hurts developer velocity and requires constant recompilations.
To solve this we use the **Native Host + Embedded Scripting** pattern:
1. **The Architecture:** We compile a standalone Native Executable (the "Host") in Rust or C++. This host statically embeds a lightweight scripting engine (like **LuaJIT** or a **WASM** runtime like Wasmtime).
2. **Execution:** The Native Host safely handles all the heavy lifting—reading Unix Domain Sockets, managing DB connections, and parsing Protobuf. Once the payload is ready, it passes it into the embedded Lua state or WASM function instance.
3. **The Benefit:** Developers write the actual product logic in high-level Lua (or AssemblyScript for WASM). It executes far faster than equivalent Node.js code (LuaJIT approaches raw C speed) while keeping the tiny `<2MB` memory footprint, and it allows instant hot-reloading of the scripts without ever running a C++ compiler.