Image Generation Architecture — Platform v5
This document captures the shape of the refreshed multiplatform image generation plan that we will break down into actionable tasks next. It keeps the current CLI + desktop flow, layers in mobile (Android/iOS) expectations, and sketches a browser/web-app path with configurable endpoints.
1. CLI Desktop (Current Flow)
- Ownership: src/commands/images.ts remains the orchestration point; it spawns the packaged Tauri desktop binary and handles filesystem writes.
- IPC Contract: JSON payloads over stdin/stdout between the CLI and Tauri. The CLI continues to push resolved prompts, destination paths, API key, and included files.
- Image Ops: Google Generative AI integration stays in Node-land (createImage, editImage) with @polymech/fs helpers for persistence.
// CLI-side launch (simplified excerpt)
const tauriProcess = spawn(getGuiAppPath(), args, { stdio: ['pipe', 'pipe', 'pipe'] });
tauriProcess.stdin?.write(JSON.stringify({
cmd: 'forward_config_to_frontend',
prompt: argv.prompt,
dst: argv.dst,
apiKey: apiKey,
files: absoluteIncludes,
}) + '\n');
Libraries: existing stack (@polymech packages, tslog, Node core modules). No new work required beyond polish/bugfix.
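The IPC contract in this section amounts to newline-delimited JSON: each message is one JSON document terminated by a newline. A minimal sketch of a typed payload and a chunk-tolerant decoder might look like the following. The field types are inferred from the launch excerpt, and encodeMessage and LineDecoder are hypothetical helper names, not existing project code.

```typescript
// Shape of the config message from the launch excerpt; field types are assumed.
interface ForwardConfigMessage {
  cmd: 'forward_config_to_frontend';
  prompt: string;
  dst: string;
  apiKey: string;
  files: string[];
}

// Serialize one message as a single newline-terminated JSON line.
// JSON.stringify escapes embedded newlines, so one message is always one line.
function encodeMessage(msg: ForwardConfigMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Reassembles complete lines from arbitrarily split stream chunks.
class LineDecoder {
  private buffer = '';

  // Returns every message completed by this chunk; partial data stays buffered.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // the last element is the incomplete tail
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line));
  }
}
```

Buffering matters because a pipe may deliver half a message, or several messages, in one read; the receiving side should not assume one chunk equals one payload.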
2. Android / iOS — Standalone Tauri
Desktop spawning is not available on mobile; the GUI ships as the full application. We lean on the TypeScript layer plus Tauri’s HTTP plugin to hit Google’s endpoints without wiring Rust-side HTTP clients.
Requirements
- Bundle @tauri-apps/plugin-http, @tauri-apps/plugin-os, @tauri-apps/plugin-fs.
- Rely on the existing tauriApi.fetch abstraction so we do not unwrap the plugin everywhere.
- Persist lightweight state (prompt history, cached API key) in app data dir just like desktop.
Example TypeScript Mobile Client
// gui/tauri-app/src/lib/mobileClient.ts
import { tauriApi } from './tauriApi';

const GOOGLE_BASE = 'https://generativelanguage.googleapis.com/v1beta';

// Assumes tauriApi.fetch resolves to a fetch-style Response (ok, status, json()).
export async function mobileCreateImage(prompt: string, apiKey: string, model = 'gemini-2.5-flash-image-preview'): Promise<Uint8Array> {
  const response = await tauriApi.fetch(`${GOOGLE_BASE}/models/${model}:generateContent`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // API keys go in x-goog-api-key; a Bearer header is only for OAuth access tokens.
      'x-goog-api-key': apiKey,
    },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  if (!response.ok) throw new Error(`Gemini request failed: HTTP ${response.status}`);
  const data = await response.json();
  const inline = data.candidates?.[0]?.content?.parts?.find((part: any) => part.inlineData)?.inlineData;
  if (!inline?.data) throw new Error('No image data in Gemini response');
  // Buffer is Node-only; the mobile webview provides atob, so decode to a Uint8Array.
  return Uint8Array.from(atob(inline.data), (ch) => ch.charCodeAt(0));
}
Configuration Notes
- tauri.conf.json must whitelist https://generativelanguage.googleapis.com/** inside the HTTP plugin scope and CSP connect-src.
- Add platform detection inside the React/Svelte front-end to toggle mobile-first UX and storage paths.
Libraries: @tauri-apps/plugin-http, @tauri-apps/api, @google/generative-ai (optional; the REST fetch example above avoids it if desired), existing UI stack.
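The platform-detection note above could be sketched as a pure helper that maps a platform name to UX and storage settings. In the app the name would presumably come from @tauri-apps/plugin-os; here it is passed in as a plain string so the sketch stays self-contained. PlatformProfile, profileFor, and the namespace values are hypothetical names chosen for illustration.

```typescript
// Settings the front-end would toggle per platform (illustrative fields only).
interface PlatformProfile {
  mobile: boolean;
  layout: 'compact' | 'full';
  historyNamespace: string; // key under which prompt history is stored
}

// Maps a platform name such as 'android' or 'linux' to a profile.
// Anything that is not android/ios is treated as desktop.
function profileFor(platform: string): PlatformProfile {
  const mobile = platform === 'android' || platform === 'ios';
  return {
    mobile,
    layout: mobile ? 'compact' : 'full',
    historyNamespace: mobile ? 'mobile' : 'desktop',
  };
}
```

Keeping the decision in one function means components read a profile instead of scattering platform checks across the UI.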
3. Web App — Browser, Configurable Endpoints
Constraints (CORS, secret handling) require a server-side companion and a client that can be pointed at custom endpoints per user/tenant. The browser front-end holds no secrets; all API keys live server-side.
Backend Sketch (Hono)
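// web/api/imageServer.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';

const app = new Hono();

app.use('/*', cors({
  origin: ['http://localhost:3000', 'https://your-frontend.example'],
  allowHeaders: ['Content-Type', 'Authorization'],
  allowMethods: ['POST', 'OPTIONS'],
}));

// Hypothetical lookup: maps a client-supplied alias to a key held server-side.
// An env var naming scheme is assumed here; a vault would replace it in production.
const resolveApiKey = (alias: string) => process.env[`GEMINI_KEY_${alias.toUpperCase()}`];

app.post('/api/images/create', async (c) => {
  // The client sends only an alias in `apiKey`; the real key never leaves the server.
  const { prompt, apiKey: alias, model = 'gemini-2.5-flash-image-preview' } = await c.req.json();
  const apiKey = typeof alias === 'string' ? resolveApiKey(alias) : undefined;
  if (!apiKey) return c.json({ success: false, error: 'Unknown API key alias' }, 400);
  const genAI = new GoogleGenerativeAI(apiKey);
  const modelClient = genAI.getGenerativeModel({ model });
  const result = await modelClient.generateContent(prompt);
  const inline = result.response.candidates?.[0]?.content?.parts?.find((part) => 'inlineData' in part)?.inlineData;
  if (!inline?.data) return c.json({ success: false, error: 'No image data' }, 500);
  return c.json({ success: true, image: inline });
});

export default app;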
Browser Client Stub
// web/client/webImageClient.ts
export class WebImageClient {
constructor(private endpoint: string) {}
async createImage(prompt: string, apiKeyAlias: string) {
const res = await fetch(`${this.endpoint}/api/images/create`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt, apiKey: apiKeyAlias }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
if (!data.success) throw new Error(data.error || 'Unknown backend error');
return data.image; // caller decides how to render Blob/Base64
}
}
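Since the client stub leaves rendering to the caller, a small sketch of the two common conversions may help. It assumes the backend returns the Gemini inline part unchanged, that is an object with mimeType and base64 data fields. InlineImage, toDataUrl, and toBytes are hypothetical names.

```typescript
// Assumed shape of `data.image` as returned by the backend sketch.
interface InlineImage {
  mimeType: string;
  data: string; // base64-encoded image bytes
}

// Builds a data URL that can be assigned directly to an <img> src.
function toDataUrl(image: InlineImage): string {
  return `data:${image.mimeType};base64,${image.data}`;
}

// Decodes the base64 payload into raw bytes, e.g. for a Blob or a file write.
function toBytes(image: InlineImage): Uint8Array {
  const binary = atob(image.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

A data URL is the simplest route for previews; the byte form suits downloads or large images, where wrapping the bytes in a Blob and using an object URL avoids holding a long base64 string in the DOM.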
Configuration Extension
- Expand shared config schema with a web.apiEndpoint block and optional per-user overrides.
- Allow cli users to pass --web-endpoint for headless flows that still want the backend.
- Document environment variable support (REACT_APP_API_ENDPOINT, VITE_IMAGE_API_URL, etc.).
Libraries: hono, hono/cors, @google/generative-ai, hosting runtime (bun, node, or serverless). Front-end remains React/Vite/SvelteKit as today.
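The configuration sources listed in this section need a precedence order. One possible resolution is sketched below, assuming per-user override first, then the --web-endpoint flag, then environment variables, then the shared config default. That ordering is an assumption rather than a decided rule, and EndpointSources and resolveApiEndpoint are hypothetical names.

```typescript
// All inputs are optional; the caller passes whatever its platform has available.
interface EndpointSources {
  userOverride?: string;                     // per-user/tenant override
  cliFlag?: string;                          // value of --web-endpoint
  env?: Record<string, string | undefined>;  // e.g. process.env or import.meta.env
  configDefault?: string;                    // web.apiEndpoint from the shared config
}

// Environment variable names taken from the list above; the order is assumed.
const ENV_KEYS = ['VITE_IMAGE_API_URL', 'REACT_APP_API_ENDPOINT'];

// Picks the first non-empty source and normalizes it (no trailing slash).
function resolveApiEndpoint(src: EndpointSources): string {
  const fromEnv = ENV_KEYS.map((key) => src.env?.[key]).find((value) => value);
  const raw = src.userOverride || src.cliFlag || fromEnv || src.configDefault;
  if (!raw) throw new Error('No image API endpoint configured');
  const url = new URL(raw); // throws on malformed input
  return url.origin + url.pathname.replace(/\/+$/, '');
}
```

Stripping the trailing slash keeps the result safe to concatenate with paths such as /api/images/create in the browser client.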
Cross-Platform Checklist (Preview)
- Align TypeScript interfaces (UnifiedImageGenerator) so desktop/mobile/web can plug into the same UI surface.
- Ensure persistent storage format (.kbot-gui.json) works across platforms; consider namespacing mobile vs desktop history entries.
- Plan rate limiting and API key management per platform (mobile secure storage, web backend vault).
- Identify testing layers (unit mocks for fetch, integration harness for Tauri mobile, e2e web flows).
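To make the first and last checklist items concrete, here is a rough sketch of what a shared interface and a unit-test double could look like. Only the name UnifiedImageGenerator comes from this document; its members, GeneratedImage, MockImageGenerator, and generateAndDescribe are assumptions for illustration.

```typescript
// Assumed result shape, matching the Gemini inline image part.
interface GeneratedImage {
  mimeType: string;
  data: string; // base64-encoded bytes
}

// One possible shape for the shared interface; the real members are still open.
interface UnifiedImageGenerator {
  readonly platform: 'desktop' | 'mobile' | 'web';
  createImage(prompt: string): Promise<GeneratedImage>;
}

// Test double for the unit layer: records prompts and returns a fixed payload.
class MockImageGenerator implements UnifiedImageGenerator {
  readonly platform = 'web' as const;
  readonly prompts: string[] = [];

  async createImage(prompt: string): Promise<GeneratedImage> {
    this.prompts.push(prompt);
    return { mimeType: 'image/png', data: 'AAEC' };
  }
}

// UI-facing code depends only on the interface, never on a concrete platform.
async function generateAndDescribe(gen: UnifiedImageGenerator, prompt: string): Promise<string> {
  if (!prompt.trim()) throw new Error('Prompt must not be empty');
  const image = await gen.createImage(prompt);
  return `${gen.platform}:${image.mimeType}:${image.data.length}`;
}
```

The desktop, mobile, and web clients would each implement the same interface, so the UI surface and its tests stay identical across platforms.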
This structure will be decomposed into a detailed TODO roadmap in the following slice.