Multi-Platform Image Generation Architecture
Overview
This document outlines the architecture for supporting image generation across multiple platforms:
- CLI Desktop (current implementation) - Node.js CLI spawning Tauri GUI
- Mobile (Android/iOS) - Standalone Tauri app with HTTP API calls
- Web App - Browser-based application with configurable endpoints
Current Architecture (CLI Desktop)
Flow
CLI (images.ts) → Spawn Tauri Process → IPC Communication → Google AI API → Image Generation
Key Components
- CLI Entry: src/commands/images.ts - Main command handler
- Image Generation: src/lib/images-google.ts - Google Generative AI integration
- Tauri GUI: gui/tauri-app/ - Desktop GUI application
- IPC Bridge: Stdin/stdout communication between CLI and Tauri
Current Implementation Details
// CLI spawns Tauri process
const tauriProcess = spawn(guiAppPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });
// Communication via JSON messages
const configResponse = {
cmd: 'forward_config_to_frontend',
prompt: argv.prompt || null,
dst: argv.dst || null,
apiKey: apiKey || null,
files: absoluteIncludes
};
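The response path (Tauri → CLI) is not shown above. A minimal sketch of how the CLI side could consume newline-delimited JSON messages from the spawned process over stdout, assuming the GUI writes one JSON object per line (the 'image_saved' message shape is hypothetical):
// Hypothetical CLI-side listener; the actual message protocol is defined by the GUI.
import readline from 'node:readline';
const rl = readline.createInterface({ input: tauriProcess.stdout! });
rl.on('line', (line) => {
try {
const message = JSON.parse(line);
if (message.cmd === 'image_saved') {
console.log(`Image written to ${message.path}`);
}
} catch {
// Ignore non-JSON output such as Tauri log lines.
}
});
// Send the config message to the GUI as a single JSON line over stdin.
tauriProcess.stdin!.write(JSON.stringify(configResponse) + '\n');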
Platform-Specific Architectures
1. CLI Desktop (Current - Keep As-Is)
Pros:
- Direct file system access
- Native performance
- Existing implementation works well
Architecture:
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ CLI App │───▶│ Tauri GUI │───▶│ Google AI API │
│ (images.ts) │ │ (Rust) │ │ (Direct) │
└─────────────┘ └──────────────┘ └─────────────────┘
2. Mobile (Android/iOS) - Standalone Tauri
Challenge: No CLI spawning capability on mobile
Solution: Standalone Tauri app with HTTP client for API calls
Architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Tauri App │───▶│ HTTP Client │───▶│ Google AI API │
│ (Standalone) │ │ (tauri-plugin- │ │ (via HTTP) │
│ │ │ http) │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Implementation Strategy:
Option A: TypeScript Frontend HTTP (Recommended)
// src/lib/images-mobile.ts
import { tauriApi } from '../gui/tauri-app/src/lib/tauriApi';
export class MobileImageGenerator {
private apiKey: string;
private baseUrl = 'https://generativelanguage.googleapis.com/v1beta';
constructor(apiKey: string) {
this.apiKey = apiKey;
}
async createImage(prompt: string): Promise<Buffer> {
const response = await tauriApi.fetch(`${this.baseUrl}/models/gemini-2.5-flash-image-preview:generateContent`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
// The Generative Language REST API takes the key via the x-goog-api-key header (or a ?key= query param), not a Bearer token.
'x-goog-api-key': this.apiKey
},
body: JSON.stringify({
contents: [{
parts: [{ text: prompt }]
}]
})
});
const data = await response.json();
const imageData = data.candidates[0].content.parts[0].inlineData.data;
return Buffer.from(imageData, 'base64');
}
async editImage(prompt: string, imageFiles: File[]): Promise<Buffer> {
const parts = [];
// Add image parts
for (const file of imageFiles) {
const arrayBuffer = await file.arrayBuffer();
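// Note: spreading a large Uint8Array into String.fromCharCode can overflow the call stack; chunked conversion may be needed for big images.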
const base64 = btoa(String.fromCharCode(...new Uint8Array(arrayBuffer)));
parts.push({
inlineData: {
mimeType: file.type,
data: base64
}
});
}
// Add text prompt
parts.push({ text: prompt });
const response = await tauriApi.fetch(`${this.baseUrl}/models/gemini-2.5-flash-image-preview:generateContent`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': this.apiKey
},
body: JSON.stringify({
contents: [{ parts }]
})
});
const data = await response.json();
const imageData = data.candidates[0].content.parts[0].inlineData.data;
return Buffer.from(imageData, 'base64');
}
}
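Illustrative usage inside the Tauri mobile frontend (a sketch; Buffer in a webview assumes a bundler polyfill, otherwise Uint8Array would be the natural return type):
// Hypothetical caller: generate an image and hand an object URL to an <img> element.
import { MobileImageGenerator } from '../lib/images-mobile';
export async function generateAndShow(prompt: string, apiKey: string): Promise<string> {
const generator = new MobileImageGenerator(apiKey);
const imageBytes = await generator.createImage(prompt);
const blob = new Blob([new Uint8Array(imageBytes)], { type: 'image/png' });
return URL.createObjectURL(blob);
}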
Mobile-Specific Tauri Configuration
// gui/tauri-app/src-tauri/tauri.conf.json (mobile additions)
{
"plugins": {
"http": {
"all": true,
"request": true,
"scope": [
"https://generativelanguage.googleapis.com/**"
]
}
},
"security": {
"csp": {
"default-src": "'self'",
"connect-src": "'self' https://generativelanguage.googleapis.com"
}
}
}
3. Web App - Browser-Based with Configurable Endpoints
Challenge: CORS restrictions, no direct Google AI API access
Solution: Backend API server (Hono) + configurable endpoints
Architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Web App │───▶│ Backend API │───▶│ Google AI API │
│ (React/TS) │ │ (Hono.js) │ │ (Server) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Backend API Server (Hono.js)
// src/web/image-api-server.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';
const app = new Hono();
app.use('/*', cors({
origin: ['http://localhost:3000', 'https://your-domain.com'],
allowHeaders: ['Content-Type', 'Authorization'],
allowMethods: ['POST', 'GET', 'OPTIONS'],
}));
interface ImageRequest {
prompt: string;
images?: Array<{
data: string; // base64
mimeType: string;
}>;
apiKey: string;
model?: string;
}
app.post('/api/images/create', async (c) => {
try {
const { prompt, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
const genAI = new GoogleGenerativeAI(apiKey);
const genModel = genAI.getGenerativeModel({ model });
const result = await genModel.generateContent(prompt);
const response = result.response;
if (!response.candidates?.[0]?.content?.parts) {
throw new Error('No image generated');
}
const imageData = response.candidates[0].content.parts.find(part =>
'inlineData' in part
)?.inlineData;
if (!imageData) {
throw new Error('No image data in response');
}
return c.json({
success: true,
image: {
data: imageData.data,
mimeType: imageData.mimeType
}
});
} catch (error) {
return c.json({
success: false,
error: error.message
}, 500);
}
});
app.post('/api/images/edit', async (c) => {
try {
const { prompt, images, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
const genAI = new GoogleGenerativeAI(apiKey);
const genModel = genAI.getGenerativeModel({ model });
const parts = [];
// Add image parts
if (images) {
for (const img of images) {
parts.push({
inlineData: {
mimeType: img.mimeType,
data: img.data
}
});
}
}
// Add text prompt
parts.push({ text: prompt });
const result = await genModel.generateContent(parts);
const response = result.response;
if (!response.candidates?.[0]?.content?.parts) {
throw new Error('No image generated');
}
const imageData = response.candidates[0].content.parts.find(part =>
'inlineData' in part
)?.inlineData;
if (!imageData) {
throw new Error('No image data in response');
}
return c.json({
success: true,
image: {
data: imageData.data,
mimeType: imageData.mimeType
}
});
} catch (error) {
return c.json({
success: false,
error: error.message
}, 500);
}
});
export default app;
// Server startup
if (import.meta.main) {
const port = parseInt(process.env.PORT || '3001');
console.log(`🚀 Image API server starting on port ${port}`);
Bun.serve({
fetch: app.fetch,
port,
});
}
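Rate limiting (called out under Security Considerations and in the Phase 2 checklist) is not part of the handlers above. A minimal in-memory per-IP limiter sketch, which would need to be registered before the route handlers and replaced by a shared store when running more than one instance:
// Sketch: naive fixed-window limiter keyed by client IP (limits and window are arbitrary examples).
const requestLog = new Map<string, number[]>();
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 20;
app.use('/api/*', async (c, next) => {
const ip = c.req.header('x-forwarded-for') ?? 'unknown';
const now = Date.now();
const recent = (requestLog.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
if (recent.length >= MAX_REQUESTS) {
return c.json({ success: false, error: 'Rate limit exceeded' }, 429);
}
recent.push(now);
requestLog.set(ip, recent);
await next();
});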
Web Frontend Client
// src/web/image-client.ts
export interface WebImageConfig {
apiEndpoint: string; // e.g., 'http://localhost:3001' or 'https://api.yourservice.com'
apiKey: string;
}
export class WebImageGenerator {
private config: WebImageConfig;
constructor(config: WebImageConfig) {
this.config = config;
}
async createImage(prompt: string): Promise<Blob> {
const response = await fetch(`${this.config.apiEndpoint}/api/images/create`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
prompt,
apiKey: this.config.apiKey
})
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const data = await response.json();
if (!data.success) {
throw new Error(data.error || 'Unknown error');
}
// Convert base64 to blob
const binaryString = atob(data.image.data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
return new Blob([bytes], { type: data.image.mimeType });
}
async editImage(prompt: string, imageFiles: File[]): Promise<Blob> {
const images = [];
for (const file of imageFiles) {
const arrayBuffer = await file.arrayBuffer();
const base64 = btoa(String.fromCharCode(...new Uint8Array(arrayBuffer)));
images.push({
data: base64,
mimeType: file.type
});
}
const response = await fetch(`${this.config.apiEndpoint}/api/images/edit`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
prompt,
images,
apiKey: this.config.apiKey
})
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const data = await response.json();
if (!data.success) {
throw new Error(data.error || 'Unknown error');
}
// Convert base64 to blob
const binaryString = atob(data.image.data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
return new Blob([bytes], { type: data.image.mimeType });
}
}
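Illustrative usage from browser code (endpoint and key values are placeholders; per Security Considerations, the key should ultimately live on the backend):
// Hypothetical caller in the web frontend.
import { WebImageGenerator } from './image-client';
const generator = new WebImageGenerator({
apiEndpoint: 'http://localhost:3001',
apiKey: 'user-provided-key',
});
export async function renderImage(prompt: string, img: HTMLImageElement): Promise<void> {
const blob = await generator.createImage(prompt);
img.src = URL.createObjectURL(blob);
}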
Web App Configuration
// src/web/config.ts
export interface PlatformConfig {
platform: 'cli' | 'mobile' | 'web';
// Web-specific config
web?: {
apiEndpoint: string;
corsEnabled: boolean;
allowedOrigins: string[];
};
// Mobile-specific config
mobile?: {
directApiAccess: boolean;
cacheImages: boolean;
maxImageSize: number;
};
// CLI-specific config (existing)
cli?: {
guiEnabled: boolean;
tempDir: string;
};
}
export const getDefaultConfig = (): PlatformConfig => {
// Detect platform
const isTauri = !!(window as any).__TAURI__;
const isMobile = isTauri && /Android|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent);
const isWeb = !isTauri;
if (isMobile) {
return {
platform: 'mobile',
mobile: {
directApiAccess: true,
cacheImages: true,
maxImageSize: 5 * 1024 * 1024 // 5MB
}
};
} else if (isWeb) {
return {
platform: 'web',
web: {
apiEndpoint: process.env.REACT_APP_API_ENDPOINT || 'http://localhost:3001',
corsEnabled: true,
allowedOrigins: ['http://localhost:3000']
}
};
} else {
return {
platform: 'cli',
cli: {
guiEnabled: true,
tempDir: process.env.TEMP || '/tmp'
}
};
}
};
Platform Detection & Unified Interface
// src/lib/image-generator-factory.ts
import { WebImageGenerator } from '../web/image-client';
import { MobileImageGenerator } from './images-mobile';
import { createImage, editImage } from './images-google'; // CLI version
import { getDefaultConfig, PlatformConfig } from '../web/config';
export interface UnifiedImageGenerator {
createImage(prompt: string): Promise<Buffer | Blob>;
editImage(prompt: string, images: File[] | string[]): Promise<Buffer | Blob>;
}
export class ImageGeneratorFactory {
static create(config?: PlatformConfig): UnifiedImageGenerator {
const platformConfig = config || getDefaultConfig();
switch (platformConfig.platform) {
case 'web':
return new WebImageGeneratorAdapter(
new WebImageGenerator({
apiEndpoint: platformConfig.web!.apiEndpoint,
apiKey: '' // Will be set later
})
);
case 'mobile':
return new MobileImageGeneratorAdapter(
new MobileImageGenerator('') // API key set later
);
case 'cli':
default:
return new CLIImageGeneratorAdapter();
}
}
}
// Adapters to normalize the interface
class WebImageGeneratorAdapter implements UnifiedImageGenerator {
constructor(private generator: WebImageGenerator) {}
async createImage(prompt: string): Promise<Blob> {
return this.generator.createImage(prompt);
}
async editImage(prompt: string, images: File[]): Promise<Blob> {
return this.generator.editImage(prompt, images);
}
}
class MobileImageGeneratorAdapter implements UnifiedImageGenerator {
constructor(private generator: MobileImageGenerator) {}
async createImage(prompt: string): Promise<Buffer> {
return this.generator.createImage(prompt);
}
async editImage(prompt: string, images: File[]): Promise<Buffer> {
return this.generator.editImage(prompt, images);
}
}
class CLIImageGeneratorAdapter implements UnifiedImageGenerator {
async createImage(prompt: string): Promise<Buffer> {
// Use existing CLI implementation
return createImage(prompt, {} as any) as Promise<Buffer>;
}
async editImage(prompt: string, images: string[]): Promise<Buffer> {
// Use existing CLI implementation
return editImage(prompt, images, {} as any) as Promise<Buffer>;
}
}
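Illustrative usage of the factory: callers stay platform-agnostic and only handle the Buffer/Blob difference at the edge (a sketch):
// Hypothetical caller; the concrete generator is chosen by platform detection.
import { ImageGeneratorFactory } from './image-generator-factory';
export async function generate(prompt: string): Promise<void> {
const generator = ImageGeneratorFactory.create();
const result = await generator.createImage(prompt);
const size = result instanceof Blob ? result.size : result.length;
console.log(`Generated image (${size} bytes)`);
}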
Required Dependencies
CLI (Existing)
{
"dependencies": {
"@google/generative-ai": "^0.21.0",
"tauri": "^2.0.0"
}
}
Mobile (Tauri)
{
"dependencies": {
"@tauri-apps/plugin-http": "^2.0.0",
"@tauri-apps/api": "^2.0.0"
}
}
Web Backend (Hono)
{
"dependencies": {
"hono": "^4.0.0",
"@google/generative-ai": "^0.21.0",
"bun": "^1.0.0"
}
}
Web Frontend
{
"dependencies": {
"react": "^18.0.0",
"@types/react": "^18.0.0"
}
}
Deployment Strategies
CLI Desktop
- Current: Nexe bundling with Tauri executable
- Distribution: GitHub releases with platform-specific binaries
Mobile
- Android: APK via Tauri build system
- iOS: App Store via Tauri + Xcode
- Distribution: App stores or direct APK/IPA
Web App
- Frontend: Static hosting (Vercel, Netlify, Cloudflare Pages)
- Backend:
- Option 1: Bun/Node.js server (Railway, Render, DigitalOcean)
- Option 2: Serverless functions (Vercel Functions, Cloudflare Workers)
- Option 3: Docker containers (any cloud provider)
Migration Path
Phase 1: Maintain CLI (Current)
- Keep existing CLI implementation
- No changes to current workflow
Phase 2: Add Mobile Support
- Implement MobileImageGenerator class
- Add HTTP client configuration
- Test on Android/iOS simulators
Phase 3: Add Web Support
- Create Hono backend API
- Implement web frontend client
- Add configuration management
Phase 4: Unified Interface
- Implement factory pattern
- Add platform detection
- Create unified API surface
Security Considerations
API Key Management
- CLI: Local config files, environment variables
- Mobile: Secure storage via Tauri
- Web: Backend-only, never expose to frontend (see the sketch below)
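To honor the web rule above without shipping keys to the browser, the Hono handlers could resolve the key from the server environment instead of the request body. A hedged sketch (GOOGLE_API_KEY is an assumed variable name):
// Sketch: prefer a server-side key; accept a client-supplied key only if the deployment allows it.
export function resolveApiKey(clientKey?: string): string | undefined {
return process.env.GOOGLE_API_KEY ?? clientKey;
}
The /api/images handlers would call this instead of trusting apiKey from the request, returning 401 when no key is configured, and the WebImageGenerator would then stop sending one.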
CORS & CSP
- Web: Strict CORS policies, CSP headers (see the sketch below)
- Mobile: Tauri security policies
- CLI: Not applicable (local execution)
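For the web backend, a CSP header can be attached in the same Hono app that already configures CORS; a sketch (directive values mirror the Tauri config above and are adjustable):
// Sketch: set a Content-Security-Policy header on every response from the API server.
app.use('*', async (c, next) => {
await next();
c.header(
'Content-Security-Policy',
"default-src 'self'; connect-src 'self' https://generativelanguage.googleapis.com"
);
});
The CSP that matters most is the one served with the frontend itself; the header here is defense in depth.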
Rate Limiting
- All Platforms: Implement client-side rate limiting
- Web: Server-side rate limiting per IP/user
Testing Strategy
Unit Tests
// tests/image-generator.test.ts
// Note: the adapter classes must be exported from the factory module for these instanceof assertions to compile.
import { ImageGeneratorFactory, CLIImageGeneratorAdapter, WebImageGeneratorAdapter } from '../src/lib/image-generator-factory';
describe('ImageGenerator', () => {
test('CLI platform creates correct generator', () => {
const generator = ImageGeneratorFactory.create({ platform: 'cli' });
expect(generator).toBeInstanceOf(CLIImageGeneratorAdapter);
});
test('Web platform creates correct generator', () => {
const generator = ImageGeneratorFactory.create({
platform: 'web',
web: { apiEndpoint: 'http://test.com', corsEnabled: true, allowedOrigins: [] }
});
expect(generator).toBeInstanceOf(WebImageGeneratorAdapter);
});
});
Integration Tests
- CLI: Test Tauri process spawning
- Mobile: Test HTTP API calls with mock server
- Web: Test full frontend-backend flow (see the sketch below)
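For the web flow, Hono's built-in app.request() lets the backend be exercised without binding a port. A sketch (assumes the test runner provides describe/test/expect and that an invalid key makes the upstream call fail):
// tests/image-api.integration.test.ts (sketch)
import app from '../src/web/image-api-server';
describe('image API server', () => {
test('surfaces upstream errors as a 500 with success: false', async () => {
const res = await app.request('/api/images/create', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt: 'a test prompt', apiKey: 'invalid-key' }),
});
const body = await res.json();
expect(res.status).toBe(500);
expect(body.success).toBe(false);
});
});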
Performance Considerations
Image Handling
- CLI: Direct file system access (fastest)
- Mobile: In-memory processing, consider caching
- Web: Base64 encoding overhead, consider streaming
Network Optimization
- Mobile: Implement request queuing, retry logic
- Web: Connection pooling, request batching
Memory Management
- All Platforms: Stream large images, avoid loading entire files into memory
- Mobile: Implement image compression before API calls (see the sketch below)
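A sketch of client-side compression using standard webview/browser APIs (dimension and quality targets are arbitrary examples):
// Downscale and re-encode an image before uploading it to the API.
export async function compressImage(file: File, maxDimension = 1024, quality = 0.8): Promise<Blob> {
const bitmap = await createImageBitmap(file);
const scale = Math.min(1, maxDimension / Math.max(bitmap.width, bitmap.height));
const canvas = document.createElement('canvas');
canvas.width = Math.round(bitmap.width * scale);
canvas.height = Math.round(bitmap.height * scale);
canvas.getContext('2d')!.drawImage(bitmap, 0, 0, canvas.width, canvas.height);
return new Promise((resolve, reject) =>
canvas.toBlob((blob) => (blob ? resolve(blob) : reject(new Error('Image encoding failed'))), 'image/jpeg', quality)
);
}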
Implementation Todo List
Phase 1: Mobile Platform Support (Priority: High)
1.1 Mobile HTTP Client Implementation
- Create mobile image generator class (src/lib/images-mobile.ts)
- Implement MobileImageGenerator class with HTTP client
- Add TypeScript fetch wrapper using tauriApi.fetch
- Handle Google AI API authentication and requests
- Add error handling for network failures and API errors
- Implement image creation endpoint integration
- Implement image editing endpoint integration
1.2 Mobile Tauri Configuration
- Update Tauri config for mobile HTTP access
- Add tauri-plugin-http to dependencies
- Configure HTTP scope for Google AI API endpoints
- Update CSP policies for external API access
- Test HTTP plugin functionality on mobile simulators
1.3 Mobile Platform Detection
- Add mobile platform detection logic
- Detect Android/iOS in Tauri environment
- Create mobile-specific configuration defaults
- Add mobile UI adaptations (touch-friendly controls)
- Implement mobile-specific file handling
Phase 2: Web Platform Support (Priority: Medium)
2.1 Backend API Server (Hono)
- Create Hono.js backend server (src/web/image-api-server.ts)
- Set up Hono app with CORS middleware
- Implement /api/images/create endpoint
- Implement /api/images/edit endpoint
- Add request validation and error handling
- Add rate limiting middleware
- Add API key validation
- Add logging and monitoring
2.2 Web Frontend Client
- Create web image client (src/web/image-client.ts)
- Implement WebImageGenerator class
- Add fetch-based API communication
- Handle file uploads and base64 conversion
- Add progress tracking for large requests
- Implement retry logic for failed requests
2.3 Web Configuration Management
- Add web-specific configuration (src/web/config.ts)
- Create configurable API endpoints
- Add environment variable support
- Implement CORS configuration
- Add deployment-specific settings
Phase 3: Unified Interface (Priority: Medium)
3.1 Factory Pattern Implementation
- Create image generator factory (src/lib/image-generator-factory.ts)
- Implement platform detection logic
- Create unified interface for all platforms
- Add adapter classes for each platform
- Implement configuration-based generator selection
3.2 Platform Adapters
- Create platform adapters
- CLIImageGeneratorAdapter - wrap existing CLI implementation
- MobileImageGeneratorAdapter - wrap mobile HTTP client
- WebImageGeneratorAdapter - wrap web API client
- Normalize return types (Buffer vs Blob handling)
Phase 4: Testing & Quality Assurance (Priority: High)
4.1 Unit Tests
- Write comprehensive unit tests
- Test factory pattern and platform detection
- Test each adapter class individually
- Mock HTTP requests for mobile/web testing
- Test error handling scenarios
- Test configuration loading and validation
4.2 Integration Tests
- Create integration test suite
- Test CLI-to-Tauri communication (existing)
- Test mobile HTTP API calls with mock server
- Test web frontend-backend communication
- Test cross-platform image format compatibility
- Test API key management across platforms
4.3 Platform-Specific Testing
- Mobile testing
- Test on Android emulator/device
- Test on iOS simulator/device
- Test network connectivity edge cases
- Test file system permissions
- Performance testing with large images
- Web testing
- Test CORS configuration
- Test different browsers (Chrome, Firefox, Safari)
- Test file upload limits
- Test API server deployment
- Load testing for concurrent requests
Phase 5: Deployment & Distribution (Priority: Low)
5.1 Mobile Deployment
- Set up mobile build pipeline
- Configure Android build (APK/AAB)
- Configure iOS build (IPA)
- Set up code signing for both platforms
- Create app store metadata and screenshots
- Test installation and updates
5.2 Web Deployment
- Deploy web application
- Set up frontend hosting (Vercel/Netlify)
- Deploy backend API server
- Configure domain and SSL certificates
- Set up monitoring and logging
- Configure CDN for static assets
5.3 Documentation & Guides
- Create user documentation
- Platform-specific installation guides
- API configuration instructions
- Troubleshooting guides
- Performance optimization tips
- Security best practices
Phase 6: Advanced Features (Priority: Low)
6.1 Performance Optimizations
- Implement performance improvements
- Image compression before API calls
- Request batching for multiple images
- Caching layer for repeated requests
- Progressive image loading
- Background processing for large operations
6.2 Enhanced Security
- Add security enhancements
- API key encryption at rest
- Request signing for web API
- Rate limiting per user/session
- Input sanitization and validation
- Audit logging for API calls
6.3 User Experience Improvements
- Enhance user interface
- Drag-and-drop file uploads
- Real-time preview of edits
- Batch processing interface
- History and favorites management
- Keyboard shortcuts and accessibility
Estimated Timeline
- Phase 1 (Mobile): 2-3 weeks
- Phase 2 (Web): 2-3 weeks
- Phase 3 (Unified): 1 week
- Phase 4 (Testing): 2 weeks
- Phase 5 (Deployment): 1 week
- Phase 6 (Advanced): 3-4 weeks
Total Estimated Time: 11-16 weeks
Dependencies & Prerequisites
Required Skills
- TypeScript/JavaScript development
- Tauri framework knowledge
- React/frontend development
- Hono.js/backend API development
- Mobile app development (Android/iOS)
- Google AI API integration
Required Tools
- Node.js 18+
- Rust toolchain
- Android Studio (for Android builds)
- Xcode (for iOS builds)
- Bun runtime (for Hono server)
External Services
- Google AI API access and billing
- Cloud hosting for web backend
- App store developer accounts (mobile)
- Domain registration (web)