Image Generation Architecture — Platform v5

This document captures the shape of the refreshed multiplatform image generation plan that we will break down into actionable tasks next. It keeps the current CLI + desktop flow, layers in mobile (Android/iOS) expectations, and sketches a browser/web-app path with configurable endpoints.

1. CLI Desktop (Current Flow)

Ownership: src/commands/images.ts remains the orchestration point; it spawns the packaged Tauri desktop binary and handles filesystem writes.
IPC Contract: JSON payloads over stdin/stdout between the CLI and Tauri. The CLI continues to push resolved prompts, destination paths, API key, and included files.
Image Ops: Google Generative AI integration stays in Node-land (createImage, editImage) with @polymech/fs helpers for persistence.

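// CLI-side launch (simplified excerpt)
import { spawn } from 'node:child_process';

// All three stdio streams are piped so the CLI can write config to stdin and read replies from stdout.
const tauriProcess = spawn(getGuiAppPath(), args, { stdio: ['pipe', 'pipe', 'pipe'] });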
tauriProcess.stdin?.write(JSON.stringify({
  cmd: 'forward_config_to_frontend',
  prompt: argv.prompt,
  dst: argv.dst,
  apiKey: apiKey,
  files: absoluteIncludes,
}) + '\n');

Libraries: existing stack (@polymech packages, tslog, Node core modules). No new work required beyond polish/bugfix.
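
The IPC contract above relies on one JSON payload per line. A minimal sketch of that framing follows, assuming newline-delimited JSON in both directions; the ForwardConfigMessage interface and the LineDecoder class are hypothetical names that mirror the fields in the launch excerpt and are not part of the existing code.

```typescript
// Hypothetical message shape, mirroring the fields sent in the launch excerpt.
interface ForwardConfigMessage {
  cmd: 'forward_config_to_frontend';
  prompt: string;
  dst: string;
  apiKey: string;
  files: string[];
}

// Serialize one message as a single newline-terminated JSON line.
// JSON.stringify escapes newlines inside strings, so the framing stays intact.
function encodeMessage(msg: ForwardConfigMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Buffers partial chunks and returns every complete JSON line received so far.
class LineDecoder {
  private buffer = '';

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is an incomplete line (or '' right after a newline).
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line));
  }
}
```

On the receiving side, each stdout or stdin chunk would be passed to push(), which matters because a pipe may deliver a payload split across several chunks.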

2. Android / iOS — Standalone Tauri

Desktop spawning is not available on mobile; the GUI ships as the full application. We lean on the TypeScript layer plus Tauri’s HTTP plugin to hit Google’s endpoints without wiring Rust-side HTTP clients.

Requirements

Bundle @tauri-apps/plugin-http, @tauri-apps/plugin-os, @tauri-apps/plugin-fs.
Route all requests through the existing tauriApi.fetch abstraction so call sites never import the HTTP plugin directly.
Persist lightweight state (prompt history, cached API key) in the app data directory, as on desktop.

Example TypeScript Mobile Client

// gui/tauri-app/src/lib/mobileClient.ts
import { tauriApi } from './tauriApi';

const GOOGLE_BASE = 'https://generativelanguage.googleapis.com/v1beta';

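// Returns the raw bytes of the first inline image in the Gemini response.
export async function mobileCreateImage(prompt: string, apiKey: string, model = 'gemini-2.5-flash-image-preview'): Promise<Uint8Array> {
  const response = await tauriApi.fetch(`${GOOGLE_BASE}/models/${model}:generateContent`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // API keys are sent in x-goog-api-key; a Bearer header is for OAuth access tokens.
      'x-goog-api-key': apiKey,
    },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  // Assumes tauriApi.fetch resolves to a standard Response object.
  if (!response.ok) throw new Error(`Gemini request failed: HTTP ${response.status}`);

  const data = await response.json();
  const inline = data.candidates?.[0]?.content?.parts?.find((part: any) => part.inlineData)?.inlineData;
  if (!inline?.data) throw new Error('No image data in Gemini response');
  // Buffer is a Node global and is not available in the Tauri webview; decode with atob instead.
  return Uint8Array.from(atob(inline.data), (ch) => ch.charCodeAt(0));
}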

Configuration Notes

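The HTTP plugin scope must allow https://generativelanguage.googleapis.com/** (in Tauri 2 the scope is declared through a capability, either under src-tauri/capabilities/ or inline in tauri.conf.json), and the CSP connect-src in tauri.conf.json must include the same origin.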
Add platform detection inside the React/Svelte front-end to toggle mobile-first UX and storage paths.

Libraries: @tauri-apps/plugin-http, @tauri-apps/api, @google/generative-ai (optional; the REST fetch example above avoids it if desired), existing UI stack.
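
The platform toggle mentioned in the configuration notes could be kept as a small pure function, which makes it easy to unit test. This is a sketch under assumptions: the platform name is expected to come from the OS plugin (for example platform() in @tauri-apps/plugin-os), and the UiProfile shape and its field names are hypothetical.

```typescript
// Hypothetical profile the front-end could branch on.
interface UiProfile {
  mode: 'mobile' | 'desktop';
  // Assumption: history entries are namespaced per mode inside the shared store.
  historyNamespace: string;
}

// Maps a platform name (e.g. 'android', 'ios', 'windows', 'macos', 'linux')
// to a UI profile. Anything that is not Android or iOS is treated as desktop.
function profileFor(platformName: string): UiProfile {
  const isMobile = platformName === 'android' || platformName === 'ios';
  return {
    mode: isMobile ? 'mobile' : 'desktop',
    historyNamespace: isMobile ? 'history.mobile' : 'history.desktop',
  };
}
```

Keeping the plugin call at the edge and passing the resulting string in means the React/Svelte components never depend on the plugin directly.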

3. Web App — Browser, Configurable Endpoints

Constraints (CORS, secret handling) require a server-side companion and a client that can be pointed at custom endpoints per user/tenant. The browser front-end holds no secrets; all API keys live server-side.

Backend Sketch (Hono)

// web/api/imageServer.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';

const app = new Hono();

app.use('/*', cors({
  origin: ['http://localhost:3000', 'https://your-frontend.example'],
  allowHeaders: ['Content-Type', 'Authorization'],
  allowMethods: ['POST', 'OPTIONS'],
}));

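app.post('/api/images/create', async (c) => {
  // The browser sends a key alias in the apiKey field, never the secret itself.
  const { prompt, apiKey: keyAlias, model = 'gemini-2.5-flash-image-preview' } = await c.req.json();
  // Assumption: on a Node/Bun runtime, aliases map to env vars such as GEMINI_KEY_DEFAULT;
  // replace this lookup with a vault call if keys are stored elsewhere.
  const apiKey = process.env[`GEMINI_KEY_${String(keyAlias).toUpperCase()}`];
  if (!apiKey) return c.json({ success: false, error: 'Unknown API key alias' }, 400);
  const genAI = new GoogleGenerativeAI(apiKey);
  const modelClient = genAI.getGenerativeModel({ model });
  const result = await modelClient.generateContent(prompt);
  const inline = result.response.candidates?.[0]?.content?.parts?.find((part) => 'inlineData' in part)?.inlineData;
  if (!inline?.data) return c.json({ success: false, error: 'No image data' }, 500);
  return c.json({ success: true, image: inline });
});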

export default app;

Browser Client Stub

// web/client/webImageClient.ts
export class WebImageClient {
  constructor(private endpoint: string) {}

  async createImage(prompt: string, apiKeyAlias: string) {
    const res = await fetch(`${this.endpoint}/api/images/create`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, apiKey: apiKeyAlias }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    if (!data.success) throw new Error(data.error || 'Unknown backend error');
    return data.image; // { mimeType, data } with base64 data; the caller decides how to render it
  }
}
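
Since the client stub leaves rendering to the caller, one simple option is to turn the returned image into a data URL. The sketch below assumes the backend returns the Gemini inlineData object unchanged (a mimeType plus base64 data, as in the backend sketch); InlineImage and toDataUrl are hypothetical names.

```typescript
// Assumed shape of the `image` field returned by the backend sketch.
interface InlineImage {
  mimeType: string;
  data: string; // base64-encoded image bytes
}

// Builds a data URL that can be assigned to an <img> element's src attribute.
function toDataUrl(image: InlineImage): string {
  return `data:${image.mimeType};base64,${image.data}`;
}
```

A caller might then write something like imgElement.src = toDataUrl(await client.createImage(prompt, alias)). For large images, converting to a Blob and an object URL may be preferable to avoid very long attribute strings.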

Configuration Extension

Expand shared config schema with a web.apiEndpoint block and optional per-user overrides.
Allow CLI users to pass --web-endpoint for headless flows that still want the backend.
Document environment variable support (REACT_APP_API_ENDPOINT, VITE_IMAGE_API_URL, etc.).

Libraries: hono, hono/cors, @google/generative-ai, hosting runtime (bun, node, or serverless). Front-end remains React/Vite/SvelteKit as today.
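
The configuration sources listed above need a precedence order once more than one is set. The following sketch resolves them in one place. The order used here (CLI flag, then per-user override, then environment variable, then the shared web.apiEndpoint value) is an assumption rather than a decision recorded in this plan, and EndpointSources, resolveEndpoint, and the fallback URL are hypothetical.

```typescript
// Hypothetical bundle of every place an endpoint may come from.
interface EndpointSources {
  cliFlag?: string;       // value of --web-endpoint
  userOverride?: string;  // optional per-user override
  envValue?: string;      // e.g. VITE_IMAGE_API_URL
  configDefault?: string; // web.apiEndpoint from the shared config
}

// Returns the first non-empty source in the assumed precedence order,
// falling back to a placeholder local URL when nothing is configured.
function resolveEndpoint(src: EndpointSources, fallback = 'http://localhost:8787'): string {
  const candidates = [src.cliFlag, src.userOverride, src.envValue, src.configDefault];
  const chosen = candidates.find((v) => v !== undefined && v.trim() !== '') ?? fallback;
  // Strip trailing slashes so `${endpoint}/api/images/create` has no double slash.
  return chosen.trim().replace(/\/+$/, '');
}
```

Centralizing the lookup keeps the browser client, the CLI, and any tests agreeing on which endpoint wins.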

Cross-Platform Checklist (Preview)

Align TypeScript interfaces (UnifiedImageGenerator) so desktop/mobile/web can plug into the same UI surface.
Ensure persistent storage format (.kbot-gui.json) works across platforms—consider namespacing mobile vs desktop history entries.
Plan rate limiting and API key management per platform (mobile secure storage, web backend vault).
Identify testing layers (unit mocks for fetch, integration harness for Tauri mobile, e2e web flows).
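
To make the first and last checklist items concrete, here is one possible shape for the shared interface together with a web implementation that accepts an injected fetch function, so unit tests can mock the network. Only the UnifiedImageGenerator name comes from this plan; its single method, GeneratedImage, FetchLike, and WebGenerator are assumptions for illustration.

```typescript
interface GeneratedImage {
  mimeType: string;
  data: string; // base64
}

// Assumed minimal surface; the real interface may also cover edits.
interface UnifiedImageGenerator {
  createImage(prompt: string): Promise<GeneratedImage>;
}

// Narrow fetch-like signature so tests can pass a mock instead of real fetch.
interface FetchResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<FetchResponseLike>;

class WebGenerator implements UnifiedImageGenerator {
  private readonly endpoint: string;
  private readonly keyAlias: string;
  private readonly fetchImpl: FetchLike;

  constructor(endpoint: string, keyAlias: string, fetchImpl: FetchLike) {
    this.endpoint = endpoint;
    this.keyAlias = keyAlias;
    this.fetchImpl = fetchImpl;
  }

  async createImage(prompt: string): Promise<GeneratedImage> {
    const res = await this.fetchImpl(`${this.endpoint}/api/images/create`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, apiKey: this.keyAlias }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = (await res.json()) as { success: boolean; image?: GeneratedImage; error?: string };
    if (!data.success || !data.image) throw new Error(data.error ?? 'Unknown backend error');
    return data.image;
  }
}
```

A desktop or mobile generator would implement the same interface, and the UI would depend only on UnifiedImageGenerator.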

This structure will be decomposed into a detailed TODO roadmap in the following slice.
