mono/packages/kbot/docs/images-tauri-5.md
2025-10-14 10:54:12 +02:00

Image Generation Architecture — Platform v5

This document captures the shape of the refreshed multiplatform image generation plan that we will break down into actionable tasks next. It keeps the current CLI + desktop flow, layers in mobile (Android/iOS) expectations, and sketches a browser/web-app path with configurable endpoints.

1. CLI Desktop (Current Flow)

  • Ownership: src/commands/images.ts remains the orchestration point; it spawns the packaged Tauri desktop binary and handles filesystem writes.
  • IPC Contract: JSON payloads over stdin/stdout between the CLI and Tauri. The CLI continues to push resolved prompts, destination paths, API key, and included files.
  • Image Ops: Google Generative AI integration stays in Node-land (createImage, editImage) with @polymech/fs helpers for persistence.
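// CLI-side launch (simplified excerpt)
import { spawn } from 'node:child_process';

// Pipe all three streams so the CLI can talk to the GUI over stdin/stdout.
const tauriProcess = spawn(getGuiAppPath(), args, { stdio: ['pipe', 'pipe', 'pipe'] });

// One JSON message per line; the trailing newline terminates the message.
tauriProcess.stdin?.write(JSON.stringify({
  cmd: 'forward_config_to_frontend',
  prompt: argv.prompt,
  dst: argv.dst,
  apiKey,
  files: absoluteIncludes,
}) + '\n');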

Libraries: existing stack (@polymech packages, tslog, Node core modules). No new work required beyond polish/bugfix.
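The IPC contract above can be pinned down with a shared message type and a small framing helper. The following is a minimal sketch, assuming newline-delimited JSON as in the excerpt; `ForwardConfigMessage`, `encodeMessage`, and `createLineDecoder` are hypothetical names, not existing kbot code.

```typescript
// Hypothetical shape of the CLI -> Tauri message; fields mirror the excerpt above.
interface ForwardConfigMessage {
  cmd: 'forward_config_to_frontend';
  prompt: string;
  dst: string;
  apiKey: string;
  files: string[];
}

// One JSON document per line. JSON.stringify escapes newlines inside strings,
// so a raw '\n' can only ever be a message boundary.
function encodeMessage(msg: ForwardConfigMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Buffers arbitrary stream chunks and returns every complete message so far.
// A trailing partial line is kept until the rest of it arrives.
function createLineDecoder(): (chunk: string) => unknown[] {
  let pending = '';
  return (chunk: string): unknown[] => {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line));
  };
}
```

The decoder matters because stdin/stdout deliver data in chunks that do not line up with message boundaries; a large `files` list can easily span several reads.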

2. Android / iOS — Standalone Tauri

Desktop spawning is not available on mobile; the GUI ships as the full application. We lean on the TypeScript layer plus Tauri's HTTP plugin to hit Google's endpoints without wiring Rust-side HTTP clients.

Requirements

  • Bundle @tauri-apps/plugin-http, @tauri-apps/plugin-os, @tauri-apps/plugin-fs.
  • Rely on the existing tauriApi.fetch abstraction so we do not unwrap the plugin everywhere.
  • Persist lightweight state (prompt history, cached API key) in the app data directory, as on desktop.
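The persisted state mentioned above could be handled by two pure helpers, leaving the actual file read/write to the FS plugin. This is a sketch under assumptions: the `GuiState` shape, the history cap, and both function names are hypothetical, and the real schema is whatever the desktop GUI already writes.

```typescript
// Hypothetical persisted state; the real schema is defined by the desktop GUI.
interface GuiState {
  promptHistory: string[];
  apiKey?: string;
}

const MAX_HISTORY = 50; // assumed cap to keep the file small on mobile

// Returns a new state with `prompt` moved to the front, deduplicated and capped.
function rememberPrompt(state: GuiState, prompt: string, max = MAX_HISTORY): GuiState {
  const trimmed = prompt.trim();
  if (trimmed === '') return state;
  const rest = state.promptHistory.filter((p) => p !== trimmed);
  return { ...state, promptHistory: [trimmed, ...rest].slice(0, max) };
}

// Tolerant loader: a missing or malformed file falls back to an empty state.
function parseState(raw: string | null): GuiState {
  const empty: GuiState = { promptHistory: [] };
  if (!raw) return empty;
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed !== 'object' || parsed === null) return empty;
    const history: string[] = Array.isArray(parsed.promptHistory)
      ? parsed.promptHistory.filter((p: unknown): p is string => typeof p === 'string')
      : [];
    return typeof parsed.apiKey === 'string'
      ? { promptHistory: history, apiKey: parsed.apiKey }
      : { promptHistory: history };
  } catch {
    return empty;
  }
}
```

Keeping these helpers free of Tauri imports means the same logic can run on desktop and mobile and be unit-tested without a webview.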

Example TypeScript Mobile Client

// gui/tauri-app/src/lib/mobileClient.ts
import { tauriApi } from './tauriApi';

const GOOGLE_BASE = 'https://generativelanguage.googleapis.com/v1beta';

export async function mobileCreateImage(
  prompt: string,
  apiKey: string,
  model = 'gemini-2.5-flash-image-preview',
): Promise<Uint8Array> {
  const response = await tauriApi.fetch(`${GOOGLE_BASE}/models/${model}:generateContent`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // The Gemini API takes API keys in x-goog-api-key; Bearer is for OAuth tokens.
      'x-goog-api-key': apiKey,
    },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  if (!response.ok) throw new Error(`Gemini request failed: HTTP ${response.status}`);

  const data = await response.json();
  const inline = data.candidates?.[0]?.content?.parts?.find((part: any) => part.inlineData)?.inlineData;
  if (!inline?.data) throw new Error('No image data in Gemini response');
  // Buffer is a Node global and is not defined in the Tauri webview; decode with atob instead.
  return Uint8Array.from(atob(inline.data), (ch) => ch.charCodeAt(0));
}
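The response handling in the client above only looks at the first candidate and discards any explanation the model returns instead of an image. A slightly more defensive extraction could look like the sketch below; the interfaces cover only the fields this client reads, and `extractInlineImage` is a hypothetical helper name.

```typescript
// Minimal slice of the generateContent response that the client reads.
interface InlineData { mimeType: string; data: string }
interface GeminiPart { text?: string; inlineData?: InlineData }
interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
  error?: { message?: string };
}

// Returns the first inline image across all candidates, or throws with the most
// useful message available: API error first, then model text, then a default.
function extractInlineImage(response: GeminiResponse): InlineData {
  if (response.error) {
    throw new Error(response.error.message ?? 'Gemini API returned an error');
  }
  const texts: string[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.inlineData?.data) return part.inlineData;
      if (part.text) texts.push(part.text);
    }
  }
  const said = texts.join(' ').trim();
  throw new Error(said ? `No image returned; model said: ${said}` : 'No image data in Gemini response');
}
```

Surfacing the text parts helps on mobile, where a refused or rephrased prompt would otherwise show up only as a generic failure.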

Configuration Notes

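  • The Tauri configuration must allow https://generativelanguage.googleapis.com/** in the HTTP plugin scope (in Tauri v2 this is typically declared in a capability file rather than in tauri.conf.json itself) and in the CSP connect-src.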
  • Add platform detection inside the React/Svelte front-end to toggle mobile-first UX and storage paths.
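The platform toggle can be kept as a pure mapping from a platform name to UX flags. The sketch below assumes the name comes from `platform()` in @tauri-apps/plugin-os (values such as 'android', 'ios', 'macos'); `UxProfile` and its fields are hypothetical and only illustrate the kind of switches the front-end might need.

```typescript
// Hypothetical UX switches derived from the platform name.
type UxProfile = {
  layout: 'mobile' | 'desktop';
  receivesCliConfig: boolean; // desktop GUI is launched and configured by the CLI
  storage: 'appData' | 'userChosenPath';
};

// The platform is taken as a plain string so the mapping stays testable
// outside a Tauri runtime.
function uxProfileFor(platform: string): UxProfile {
  const mobile = platform === 'android' || platform === 'ios';
  return {
    layout: mobile ? 'mobile' : 'desktop',
    receivesCliConfig: !mobile,
    // Assumption: mobile always writes into the app data directory.
    storage: mobile ? 'appData' : 'userChosenPath',
  };
}
```

Resolving the profile once at startup and passing it down keeps platform checks out of individual components.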

Libraries: @tauri-apps/plugin-http, @tauri-apps/api, @google/generative-ai (optional; the REST fetch example above avoids it if desired), existing UI stack.

3. Web App — Browser, Configurable Endpoints

Constraints (CORS, secret handling) require a server-side companion and a client that can be pointed at custom endpoints per user/tenant. The browser front-end holds no secrets; all API keys live server-side.

Backend Sketch (Hono)

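// web/api/imageServer.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';

const app = new Hono();

app.use('/*', cors({
  origin: ['http://localhost:3000', 'https://your-frontend.example'],
  allowHeaders: ['Content-Type', 'Authorization'],
  allowMethods: ['POST', 'OPTIONS'],
}));

// Assumption: real keys live in server env vars named GOOGLE_API_KEY_<ALIAS>.
// The browser only ever sends the alias (apiKeyAlias in the client stub).
const resolveApiKey = (alias: string) => process.env[`GOOGLE_API_KEY_${alias.toUpperCase()}`];

app.post('/api/images/create', async (c) => {
  const { prompt, apiKey: alias, model = 'gemini-2.5-flash-image-preview' } = await c.req.json();
  if (typeof prompt !== 'string' || typeof alias !== 'string') {
    return c.json({ success: false, error: 'prompt and apiKey alias are required' }, 400);
  }
  const apiKey = resolveApiKey(alias);
  if (!apiKey) return c.json({ success: false, error: 'Unknown API key alias' }, 403);

  const genAI = new GoogleGenerativeAI(apiKey);
  const modelClient = genAI.getGenerativeModel({ model });
  const result = await modelClient.generateContent(prompt);
  const inline = result.response.candidates?.[0]?.content?.parts?.find((part) => 'inlineData' in part)?.inlineData;
  if (!inline?.data) return c.json({ success: false, error: 'No image data' }, 500);
  return c.json({ success: true, image: inline });
});

export default app;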
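Because the endpoint is reachable from any allowed origin, the handler benefits from validating the request body before it calls the model. The following sketch is framework-independent; `parseCreateRequest`, the result type, and the prompt length limit are assumptions for illustration.

```typescript
type CreateRequest = { prompt: string; model: string };
type ParseResult =
  | { ok: true; value: CreateRequest }
  | { ok: false; error: string };

const DEFAULT_MODEL = 'gemini-2.5-flash-image-preview';
const MAX_PROMPT_CHARS = 4000; // assumed limit, tune per deployment

// Validates an already-parsed JSON body and fills in the default model.
function parseCreateRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' };
  }
  const { prompt, model } = body as { prompt?: unknown; model?: unknown };
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return { ok: false, error: 'prompt is required' };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }
  if (model !== undefined && typeof model !== 'string') {
    return { ok: false, error: 'model must be a string' };
  }
  const chosenModel = typeof model === 'string' ? model : DEFAULT_MODEL;
  return { ok: true, value: { prompt: prompt.trim(), model: chosenModel } };
}
```

In the handler, the result of `await c.req.json()` would be passed through this function first, answering with a 400 and the `error` string whenever `ok` is false.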

Browser Client Stub

// web/client/webImageClient.ts
export class WebImageClient {
  constructor(private endpoint: string) {}

  async createImage(prompt: string, apiKeyAlias: string) {
    const res = await fetch(`${this.endpoint}/api/images/create`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, apiKey: apiKeyAlias }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    if (!data.success) throw new Error(data.error || 'Unknown backend error');
    return data.image; // caller decides how to render Blob/Base64
  }
}
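The stub leaves rendering to the caller. Since the backend sketch returns Gemini's inline data object (`mimeType` plus base64 `data`), the simplest rendering path is a data URL. The helpers below are a sketch; `toDataUrl` and `decodedByteLength` are hypothetical names, and the PNG fallback is an assumption.

```typescript
// Shape of `data.image` as returned by the backend sketch.
interface InlineImage { mimeType: string; data: string }

// Builds a data: URL suitable for <img src>.
// Assumption: fall back to PNG when the backend omits the MIME type.
function toDataUrl(image: InlineImage): string {
  const mime = image.mimeType || 'image/png';
  return `data:${mime};base64,${image.data}`;
}

// Size of the decoded payload in bytes for padded base64, useful for deciding
// whether a large image should be converted to a Blob instead.
function decodedByteLength(base64: string): number {
  const padding = base64.slice(-2) === '==' ? 2 : base64.slice(-1) === '=' ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}
```

A caller would typically assign `toDataUrl(await client.createImage(prompt, alias))` to an image element's `src`.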

Configuration Extension

  • Expand shared config schema with a web.apiEndpoint block and optional per-user overrides.
  • Allow CLI users to pass --web-endpoint for headless flows that still want the backend.
  • Document environment variable support (REACT_APP_API_ENDPOINT, VITE_IMAGE_API_URL, etc.).
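With several sources for the endpoint, the precedence should be decided once and in one place. The sketch below assumes the order CLI flag, then per-user override, then environment variable, then the shared config default; that order, the `EndpointSources` shape, and `resolveEndpoint` are assumptions to be confirmed when the schema is written.

```typescript
interface EndpointSources {
  cliFlag?: string;       // value of --web-endpoint
  userOverride?: string;  // per-user override from the shared config
  env?: Record<string, string | undefined>;
  configDefault?: string; // web.apiEndpoint
}

// Environment variable names taken from the list above, checked in this order.
const ENV_KEYS = ['VITE_IMAGE_API_URL', 'REACT_APP_API_ENDPOINT'];

// Returns the first non-empty endpoint, normalized without a trailing slash
// because the client appends /api/images/create itself.
function resolveEndpoint(src: EndpointSources): string {
  const fromEnv = ENV_KEYS.map((key) => src.env?.[key]).filter(Boolean)[0];
  const candidates = [src.cliFlag, src.userOverride, fromEnv, src.configDefault];
  for (const candidate of candidates) {
    if (candidate && candidate.trim() !== '') {
      const url = candidate.trim();
      if (!/^https?:\/\//.test(url)) throw new Error(`Endpoint must be http(s): ${url}`);
      return url.replace(/\/+$/, '');
    }
  }
  throw new Error('No web API endpoint configured');
}
```

Passing the environment in as a plain record, rather than reading `process.env` or `import.meta.env` directly, lets the same function serve the CLI and both bundlers.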

Libraries: hono, hono/cors, @google/generative-ai, hosting runtime (bun, node, or serverless). Front-end remains React/Vite/SvelteKit as today.

Cross-Platform Checklist (Preview)

  • Align TypeScript interfaces (UnifiedImageGenerator) so desktop/mobile/web can plug into the same UI surface.
  • Ensure the persistent storage format (.kbot-gui.json) works across platforms; consider namespacing mobile vs desktop history entries.
  • Plan rate limiting and API key management per platform (mobile secure storage, web backend vault).
  • Identify testing layers (unit mocks for fetch, integration harness for Tauri mobile, e2e web flows).
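To make the first and last checklist items concrete, a shared interface plus a test double might look like the sketch below. The doc names `UnifiedImageGenerator` but does not define it, so every member shown here, along with `FakeImageGenerator` and `historyKey`, is a hypothetical starting point.

```typescript
type Platform = 'desktop' | 'mobile' | 'web';

interface GeneratedImage { mimeType: string; base64: string }

// Hypothetical shared contract; the real UnifiedImageGenerator may differ.
interface UnifiedImageGenerator {
  readonly platform: Platform;
  createImage(prompt: string): Promise<GeneratedImage>;
}

// Test double for unit tests: records prompts and returns a canned image,
// so UI code can be exercised without fetch, Tauri, or a backend.
class FakeImageGenerator implements UnifiedImageGenerator {
  readonly platform: Platform;
  readonly calls: string[] = [];

  constructor(platform: Platform) {
    this.platform = platform;
  }

  async createImage(prompt: string): Promise<GeneratedImage> {
    this.calls.push(prompt);
    return { mimeType: 'image/png', base64: 'AAAA' };
  }
}

// One possible way to namespace history entries per platform.
function historyKey(generator: UnifiedImageGenerator, prompt: string): string {
  return `${generator.platform}:${prompt}`;
}
```

The desktop, mobile, and web clients would each implement the same interface, which lets the UI depend on the contract alone.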

This structure will be decomposed into a detailed TODO roadmap in the following slice.