mono/packages/kbot/docs/images-tauri.md
2025-09-23 20:32:47 +02:00

Multi-Platform Image Generation Architecture

Overview

This document outlines the architecture for supporting image generation across multiple platforms:

  1. CLI Desktop (current implementation) - Node.js CLI spawning Tauri GUI
  2. Mobile (Android/iOS) - Standalone Tauri app with HTTP API calls
  3. Web App - Browser-based application with configurable endpoints

Current Architecture (CLI Desktop)

Flow

CLI (images.ts) → Spawn Tauri Process → IPC Communication → Google AI API → Image Generation

Key Components

  • CLI Entry: src/commands/images.ts - Main command handler
  • Image Generation: src/lib/images-google.ts - Google Generative AI integration
  • Tauri GUI: gui/tauri-app/ - Desktop GUI application
  • IPC Bridge: Stdin/stdout communication between CLI and Tauri

Current Implementation Details

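// spawn comes from Node's child_process module
import { spawn } from 'node:child_process';

// CLI spawns Tauri process with piped stdin/stdout/stderr
const tauriProcess = spawn(guiAppPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });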

// Communication via JSON messages
const configResponse = {
    cmd: 'forward_config_to_frontend',
    prompt: argv.prompt || null,
    dst: argv.dst || null,
    apiKey: apiKey || null,
    files: absoluteIncludes
};
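
The snippet above does not show how the JSON messages are framed on the pipe. The following is a minimal sketch, assuming newline-delimited JSON (one message per line); that framing is an assumption, not something this document confirms, and encodeMessage and LineDecoder are hypothetical names.

```typescript
// Hypothetical helpers; the real CLI may frame its messages differently.
type IpcMessage = { cmd: string; [key: string]: unknown };

// Serialize one message as a single line terminated by "\n".
// JSON.stringify escapes embedded newlines, so the output never spans lines.
function encodeMessage(message: IpcMessage): string {
    return JSON.stringify(message) + '\n';
}

// Accumulates stdout chunks and returns every complete message received so far.
class LineDecoder {
    private buffer = '';

    push(chunk: string): IpcMessage[] {
        this.buffer += chunk;
        const lines = this.buffer.split('\n');
        // The last element is an incomplete line (or '' if the chunk ended on "\n").
        this.buffer = lines.pop() ?? '';
        const messages: IpcMessage[] = [];
        for (const line of lines) {
            const trimmed = line.trim();
            if (trimmed.length === 0) continue;
            try {
                messages.push(JSON.parse(trimmed) as IpcMessage);
            } catch {
                // Skip non-JSON output such as log lines from the GUI process.
            }
        }
        return messages;
    }
}
```

With the process above, sending would look like tauriProcess.stdin.write(encodeMessage(configResponse)), and receiving like tauriProcess.stdout.on('data', chunk => decoder.push(chunk.toString())).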

Platform-Specific Architectures

1. CLI Desktop (Current - Keep As-Is)

Pros:

  • Direct file system access
  • Native performance
  • Existing implementation works well

Architecture:

┌─────────────┐    ┌──────────────┐    ┌─────────────────┐
│   CLI App   │───▶│  Tauri GUI   │───▶│  Google AI API  │
│ (images.ts) │    │   (Rust)     │    │   (Direct)      │
└─────────────┘    └──────────────┘    └─────────────────┘

2. Mobile (Android/iOS) - Standalone Tauri

Challenge: No CLI spawning capability on mobile.

Solution: Standalone Tauri app with an HTTP client for API calls.

Architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Tauri App     │───▶│   HTTP Client   │───▶│  Google AI API  │
│ (Standalone)    │    │ (tauri-plugin-  │    │   (via HTTP)    │
│                 │    │     http)       │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Implementation Strategy:

// src/lib/images-mobile.ts
import { tauriApi } from '../gui/tauri-app/src/lib/tauriApi';

export class MobileImageGenerator {
    private apiKey: string;
    private baseUrl = 'https://generativelanguage.googleapis.com/v1beta';

    constructor(apiKey: string) {
        this.apiKey = apiKey;
    }

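    async createImage(prompt: string): Promise<Buffer> {
        const response = await tauriApi.fetch(`${this.baseUrl}/models/gemini-2.5-flash-image-preview:generateContent`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                // API keys go in the x-goog-api-key header; 'Authorization: Bearer' is for OAuth tokens
                'x-goog-api-key': this.apiKey
            },
            body: JSON.stringify({
                contents: [{
                    parts: [{ text: prompt }]
                }]
            })
        });

        if (!response.ok) {
            throw new Error(`Google AI API error: HTTP ${response.status}`);
        }

        const data = await response.json();
        // The image is not guaranteed to be the first part; text parts may precede it
        const imagePart = data.candidates?.[0]?.content?.parts?.find((part: any) => part.inlineData);
        if (!imagePart) {
            throw new Error('No image data in response');
        }
        // Buffer is a Node.js global; inside the Tauri webview it needs a polyfill
        return Buffer.from(imagePart.inlineData.data, 'base64');
    }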

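    async editImage(prompt: string, imageFiles: File[]): Promise<Buffer> {
        const parts: Array<Record<string, unknown>> = [];

        // Add image parts
        for (const file of imageFiles) {
            const bytes = new Uint8Array(await file.arrayBuffer());
            // Build the binary string in chunks; spreading a whole image into
            // String.fromCharCode can exceed the engine's argument limit
            let binary = '';
            for (let i = 0; i < bytes.length; i += 0x8000) {
                binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
            }
            parts.push({
                inlineData: {
                    mimeType: file.type,
                    data: btoa(binary)
                }
            });
        }

        // Add text prompt
        parts.push({ text: prompt });

        const response = await tauriApi.fetch(`${this.baseUrl}/models/gemini-2.5-flash-image-preview:generateContent`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                // API keys go in the x-goog-api-key header; 'Authorization: Bearer' is for OAuth tokens
                'x-goog-api-key': this.apiKey
            },
            body: JSON.stringify({
                contents: [{ parts }]
            })
        });

        if (!response.ok) {
            throw new Error(`Google AI API error: HTTP ${response.status}`);
        }

        const data = await response.json();
        // The image is not guaranteed to be the first part; text parts may precede it
        const imagePart = data.candidates?.[0]?.content?.parts?.find((part: any) => part.inlineData);
        if (!imagePart) {
            throw new Error('No image data in response');
        }
        // Buffer is a Node.js global; inside the Tauri webview it needs a polyfill
        return Buffer.from(imagePart.inlineData.data, 'base64');
    }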
}
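
Both methods above read the image out of candidates[0].content.parts. Because a response can carry text parts alongside the image, a shared helper that searches for the inline image is safer than indexing a fixed position. This is a sketch under the assumption that the response has the shape the code above already relies on; extractInlineImage and the interface names are hypothetical.

```typescript
// Minimal response types, assumed from the fields the generator code reads.
interface InlineData { mimeType: string; data: string }
interface ResponsePart { text?: string; inlineData?: InlineData }
interface GenerateContentResponseLike {
    candidates?: Array<{ content?: { parts?: ResponsePart[] } }>;
}

// Returns the first inline image across all candidates.
// Throws if none is found, including any text the model returned instead.
function extractInlineImage(response: GenerateContentResponseLike): InlineData {
    const textParts: string[] = [];
    for (const candidate of response.candidates ?? []) {
        for (const part of candidate.content?.parts ?? []) {
            if (part.inlineData?.data) return part.inlineData;
            if (part.text) textParts.push(part.text);
        }
    }
    const detail = textParts.length > 0 ? `: ${textParts.join(' ')}` : '';
    throw new Error(`No image data in response${detail}`);
}
```

Surfacing the returned text in the error message helps when the model answers with an explanation or refusal instead of an image.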

Mobile-Specific Tauri Configuration

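// gui/tauri-app/src-tauri/tauri.conf.json (mobile additions)
// In Tauri v2 the CSP lives under app.security.
{
  "app": {
    "security": {
      "csp": {
        "default-src": "'self'",
        "connect-src": "'self' https://generativelanguage.googleapis.com"
      }
    }
  }
}

// gui/tauri-app/src-tauri/capabilities/default.json (file name, identifier and window label are assumptions)
// In Tauri v2 the HTTP plugin's URL scope is granted through a capability,
// not through a "plugins.http" block in tauri.conf.json.
{
  "identifier": "default",
  "windows": ["main"],
  "permissions": [
    {
      "identifier": "http:default",
      "allow": [{ "url": "https://generativelanguage.googleapis.com/**" }]
    }
  ]
}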

3. Web App - Browser-Based with Configurable Endpoints

Challenge: CORS restrictions, no direct Google AI API access.

Solution: Backend API server (Hono) + configurable endpoints.

Architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web App       │───▶│   Backend API   │───▶│  Google AI API  │
│ (React/TS)      │    │   (Hono.js)     │    │   (Server)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Backend API Server (Hono.js)

// src/web/image-api-server.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';

const app = new Hono();

app.use('/*', cors({
    origin: ['http://localhost:3000', 'https://your-domain.com'],
    allowHeaders: ['Content-Type', 'Authorization'],
    allowMethods: ['POST', 'GET', 'OPTIONS'],
}));

interface ImageRequest {
    prompt: string;
    images?: Array<{
        data: string; // base64
        mimeType: string;
    }>;
    apiKey: string;
    model?: string;
}

app.post('/api/images/create', async (c) => {
    try {
        const { prompt, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
        
        const genAI = new GoogleGenerativeAI(apiKey);
        const genModel = genAI.getGenerativeModel({ model });
        
        const result = await genModel.generateContent(prompt);
        const response = result.response;
        
        if (!response.candidates?.[0]?.content?.parts) {
            throw new Error('No image generated');
        }
        
        const imageData = response.candidates[0].content.parts.find(part => 
            'inlineData' in part
        )?.inlineData;
        
        if (!imageData) {
            throw new Error('No image data in response');
        }
        
        return c.json({
            success: true,
            image: {
                data: imageData.data,
                mimeType: imageData.mimeType
            }
        });
        
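    } catch (error) {
        // A caught value is typed unknown under strict TypeScript, so narrow it first
        const message = error instanceof Error ? error.message : String(error);
        return c.json({
            success: false,
            error: message
        }, 500);
    }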
});

app.post('/api/images/edit', async (c) => {
    try {
        const { prompt, images, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
        
        const genAI = new GoogleGenerativeAI(apiKey);
        const genModel = genAI.getGenerativeModel({ model });
        
        const parts = [];
        
        // Add image parts
        if (images) {
            for (const img of images) {
                parts.push({
                    inlineData: {
                        mimeType: img.mimeType,
                        data: img.data
                    }
                });
            }
        }
        
        // Add text prompt
        parts.push({ text: prompt });
        
        const result = await genModel.generateContent(parts);
        const response = result.response;
        
        if (!response.candidates?.[0]?.content?.parts) {
            throw new Error('No image generated');
        }
        
        const imageData = response.candidates[0].content.parts.find(part => 
            'inlineData' in part
        )?.inlineData;
        
        if (!imageData) {
            throw new Error('No image data in response');
        }
        
        return c.json({
            success: true,
            image: {
                data: imageData.data,
                mimeType: imageData.mimeType
            }
        });
        
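    } catch (error) {
        // A caught value is typed unknown under strict TypeScript, so narrow it first
        const message = error instanceof Error ? error.message : String(error);
        return c.json({
            success: false,
            error: message
        }, 500);
    }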
});

export default app;

// Server startup
if (import.meta.main) {
    const port = parseInt(process.env.PORT || '3001');
    console.log(`🚀 Image API server starting on port ${port}`);
    
    Bun.serve({
        fetch: app.fetch,
        port,
    });
}
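
The handlers above destructure the request body without checking it, so a malformed request surfaces as a 500 from deep inside the Google client. Below is a minimal sketch of a validation step that could run first. The field names mirror the ImageRequest interface; validateImageRequest, its result shape, and the image/* MIME check are hypothetical additions.

```typescript
// Hypothetical validator for the ImageRequest body used by both endpoints.
interface ValidImageRequest {
    prompt: string;
    apiKey: string;
    model?: string;
    images: Array<{ data: string; mimeType: string }>;
}

type ValidationResult =
    | { ok: true; value: ValidImageRequest }
    | { ok: false; error: string };

function isNonEmptyString(value: unknown): value is string {
    return typeof value === 'string' && value.trim().length > 0;
}

function validateImageRequest(body: unknown): ValidationResult {
    if (typeof body !== 'object' || body === null) {
        return { ok: false, error: 'Request body must be a JSON object' };
    }
    const { prompt, apiKey, model, images } = body as Record<string, unknown>;

    if (!isNonEmptyString(prompt)) return { ok: false, error: 'prompt is required' };
    if (!isNonEmptyString(apiKey)) return { ok: false, error: 'apiKey is required' };

    let validModel: string | undefined;
    if (model !== undefined) {
        if (!isNonEmptyString(model)) return { ok: false, error: 'model must be a non-empty string' };
        validModel = model;
    }

    const validImages: ValidImageRequest['images'] = [];
    if (images !== undefined) {
        if (!Array.isArray(images)) return { ok: false, error: 'images must be an array' };
        for (const img of images) {
            const { data, mimeType } = (img ?? {}) as Record<string, unknown>;
            if (!isNonEmptyString(data) || !isNonEmptyString(mimeType)) {
                return { ok: false, error: 'each image needs data and mimeType' };
            }
            if (!mimeType.startsWith('image/')) {
                return { ok: false, error: `unsupported mimeType: ${mimeType}` };
            }
            validImages.push({ data, mimeType });
        }
    }
    return { ok: true, value: { prompt, apiKey, model: validModel, images: validImages } };
}
```

In a handler this would be called on the parsed body, returning c.json({ success: false, error: parsed.error }, 400) when parsed.ok is false, so client mistakes are reported as 400 rather than 500.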

Web Frontend Client

// src/web/image-client.ts
export interface WebImageConfig {
    apiEndpoint: string; // e.g., 'http://localhost:3001' or 'https://api.yourservice.com'
    apiKey: string;
}

export class WebImageGenerator {
    private config: WebImageConfig;

    constructor(config: WebImageConfig) {
        this.config = config;
    }

    async createImage(prompt: string): Promise<Blob> {
        const response = await fetch(`${this.config.apiEndpoint}/api/images/create`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                prompt,
                apiKey: this.config.apiKey
            })
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const data = await response.json();
        
        if (!data.success) {
            throw new Error(data.error || 'Unknown error');
        }

        // Convert base64 to blob
        const binaryString = atob(data.image.data);
        const bytes = new Uint8Array(binaryString.length);
        for (let i = 0; i < binaryString.length; i++) {
            bytes[i] = binaryString.charCodeAt(i);
        }
        
        return new Blob([bytes], { type: data.image.mimeType });
    }

    async editImage(prompt: string, imageFiles: File[]): Promise<Blob> {
        const images = [];
        
        for (const file of imageFiles) {
            const arrayBuffer = await file.arrayBuffer();
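            const bytes = new Uint8Array(arrayBuffer);
            // Encode in chunks; spreading a whole image into String.fromCharCode
            // can exceed the engine's argument limit
            let binary = '';
            for (let i = 0; i < bytes.length; i += 0x8000) {
                binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
            }
            const base64 = btoa(binary);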
            images.push({
                data: base64,
                mimeType: file.type
            });
        }

        const response = await fetch(`${this.config.apiEndpoint}/api/images/edit`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                prompt,
                images,
                apiKey: this.config.apiKey
            })
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const data = await response.json();
        
        if (!data.success) {
            throw new Error(data.error || 'Unknown error');
        }

        // Convert base64 to blob
        const binaryString = atob(data.image.data);
        const bytes = new Uint8Array(binaryString.length);
        for (let i = 0; i < binaryString.length; i++) {
            bytes[i] = binaryString.charCodeAt(i);
        }
        
        return new Blob([bytes], { type: data.image.mimeType });
    }
}
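
createImage and editImage repeat the same base64 conversions in both directions. One way to remove the duplication is a pair of small helpers, sketched below; bytesToBase64, base64ToBytes and the chunk size are hypothetical choices, not part of the original client.

```typescript
// Hypothetical shared helpers for the web client.
// Chunking keeps each String.fromCharCode call well under engine argument limits.
const BASE64_CHUNK_SIZE = 0x8000;

function bytesToBase64(bytes: Uint8Array): string {
    let binary = '';
    for (let i = 0; i < bytes.length; i += BASE64_CHUNK_SIZE) {
        binary += String.fromCharCode(...bytes.subarray(i, i + BASE64_CHUNK_SIZE));
    }
    return btoa(binary);
}

function base64ToBytes(base64: string): Uint8Array {
    const binary = atob(base64);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
}
```

The client methods could then build a Blob with new Blob([base64ToBytes(data.image.data)], { type: data.image.mimeType }) and encode uploads with bytesToBase64(new Uint8Array(await file.arrayBuffer())).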

Web App Configuration

// src/web/config.ts
export interface PlatformConfig {
    platform: 'cli' | 'mobile' | 'web';
    
    // Web-specific config
    web?: {
        apiEndpoint: string;
        corsEnabled: boolean;
        allowedOrigins: string[];
    };
    
    // Mobile-specific config  
    mobile?: {
        directApiAccess: boolean;
        cacheImages: boolean;
        maxImageSize: number;
    };
    
    // CLI-specific config (existing)
    cli?: {
        guiEnabled: boolean;
        tempDir: string;
    };
}

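export const getDefaultConfig = (): PlatformConfig => {
    // Detect platform. Guard the browser globals so this also runs under Node (the CLI),
    // where window and navigator are undefined.
    // Note: in Tauri v2, window.__TAURI__ is only injected when app.withGlobalTauri is enabled.
    const hasWindow = typeof window !== 'undefined';
    const isTauri = hasWindow && !!(window as any).__TAURI__;
    const isMobile = isTauri && /Android|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent);
    const isWeb = hasWindow && !isTauri;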
    
    if (isMobile) {
        return {
            platform: 'mobile',
            mobile: {
                directApiAccess: true,
                cacheImages: true,
                maxImageSize: 5 * 1024 * 1024 // 5MB
            }
        };
    } else if (isWeb) {
        return {
            platform: 'web',
            web: {
                apiEndpoint: process.env.REACT_APP_API_ENDPOINT || 'http://localhost:3001',
                corsEnabled: true,
                allowedOrigins: ['http://localhost:3000']
            }
        };
    } else {
        return {
            platform: 'cli',
            cli: {
                guiEnabled: true,
                // process is undefined inside a webview, so guard the lookup
                tempDir: (typeof process !== 'undefined' && process.env.TEMP) || '/tmp'
            }
        };
    }
};
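
Detection logic that reads globals directly is awkward to unit test. Below is a sketch of the same decision written as a pure function over an explicit environment snapshot. detectPlatform, RuntimeEnv and currentEnv are hypothetical names; the user-agent pattern is copied from the code above.

```typescript
type PlatformName = 'cli' | 'mobile' | 'web';

// Snapshot of the globals the detection depends on (hypothetical shape).
interface RuntimeEnv {
    hasWindow: boolean; // false under Node.js, true in a browser or webview
    hasTauri: boolean;  // true when the Tauri global is present on window
    userAgent: string;
}

const MOBILE_UA = /Android|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i;

function detectPlatform(env: RuntimeEnv): PlatformName {
    if (!env.hasWindow) return 'cli'; // plain Node.js process
    if (!env.hasTauri) return 'web';  // ordinary browser tab
    // Inside Tauri: either the mobile app or the desktop GUI spawned by the CLI
    return MOBILE_UA.test(env.userAgent) ? 'mobile' : 'cli';
}

// Reads the real globals; the typeof guards make it safe to call under Node.
function currentEnv(): RuntimeEnv {
    const g = globalThis as any;
    const hasWindow = typeof g.window !== 'undefined';
    return {
        hasWindow,
        hasTauri: hasWindow && !!g.window.__TAURI__,
        userAgent: typeof g.navigator !== 'undefined' ? String(g.navigator.userAgent ?? '') : '',
    };
}
```

Tests can then pass hand-built RuntimeEnv objects, while production code calls detectPlatform(currentEnv()).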

Platform Detection & Unified Interface

// src/lib/image-generator-factory.ts
import { WebImageGenerator } from '../web/image-client';
import { MobileImageGenerator } from './images-mobile';
import { createImage, editImage } from './images-google'; // CLI version
import { getDefaultConfig, PlatformConfig } from '../web/config';

export interface UnifiedImageGenerator {
    createImage(prompt: string): Promise<Buffer | Blob>;
    editImage(prompt: string, images: File[] | string[]): Promise<Buffer | Blob>;
}

export class ImageGeneratorFactory {
    static create(config?: PlatformConfig, apiKey = ''): UnifiedImageGenerator {
        const platformConfig = config || getDefaultConfig();
        
        switch (platformConfig.platform) {
            case 'web':
                return new WebImageGeneratorAdapter(
                    new WebImageGenerator({
                        apiEndpoint: platformConfig.web!.apiEndpoint,
                        // The generators expose no setter, so the key has to be supplied here
                        apiKey
                    })
                );
                
            case 'mobile':
                return new MobileImageGeneratorAdapter(
                    new MobileImageGenerator(apiKey)
                );
                
            case 'cli':
            default:
                return new CLIImageGeneratorAdapter();
        }
    }
}

// Adapters to normalize the interface
class WebImageGeneratorAdapter implements UnifiedImageGenerator {
    constructor(private generator: WebImageGenerator) {}
    
    async createImage(prompt: string): Promise<Blob> {
        return this.generator.createImage(prompt);
    }
    
    async editImage(prompt: string, images: File[]): Promise<Blob> {
        return this.generator.editImage(prompt, images);
    }
}

class MobileImageGeneratorAdapter implements UnifiedImageGenerator {
    constructor(private generator: MobileImageGenerator) {}
    
    async createImage(prompt: string): Promise<Buffer> {
        return this.generator.createImage(prompt);
    }
    
    async editImage(prompt: string, images: File[]): Promise<Buffer> {
        return this.generator.editImage(prompt, images);
    }
}

class CLIImageGeneratorAdapter implements UnifiedImageGenerator {
    async createImage(prompt: string): Promise<Buffer> {
        // Use existing CLI implementation
        return createImage(prompt, {} as any) as Promise<Buffer>;
    }
    
    async editImage(prompt: string, images: string[]): Promise<Buffer> {
        // Use existing CLI implementation  
        return editImage(prompt, images, {} as any) as Promise<Buffer>;
    }
}

Required Dependencies

CLI (Existing)

{
  "dependencies": {
    "@google/generative-ai": "^0.21.0",
    "@tauri-apps/cli": "^2.0.0"
  }
}

Mobile (Tauri)

{
  "dependencies": {
    "@tauri-apps/plugin-http": "^2.0.0",
    "@tauri-apps/api": "^2.0.0"
  }
}

Web Backend (Hono)

{
  "dependencies": {
    "hono": "^4.0.0",
    "@google/generative-ai": "^0.21.0",
    "bun": "^1.0.0"
  }
}

Web Frontend

{
  "dependencies": {
    "react": "^18.0.0",
    "@types/react": "^18.0.0"
  }
}

Deployment Strategies

CLI Desktop

  • Current: Nexe bundling with Tauri executable
  • Distribution: GitHub releases with platform-specific binaries

Mobile

  • Android: APK via Tauri build system
  • iOS: App Store via Tauri + Xcode
  • Distribution: App stores or direct APK/IPA

Web App

  • Frontend: Static hosting (Vercel, Netlify, Cloudflare Pages)
  • Backend:
    • Option 1: Bun/Node.js server (Railway, Render, DigitalOcean)
    • Option 2: Serverless functions (Vercel Functions, Cloudflare Workers)
    • Option 3: Docker containers (any cloud provider)

Migration Path

Phase 1: Maintain CLI (Current)

  • Keep existing CLI implementation
  • No changes to current workflow

Phase 2: Add Mobile Support

  • Implement MobileImageGenerator class
  • Add HTTP client configuration
  • Test on Android/iOS simulators

Phase 3: Add Web Support

  • Create Hono backend API
  • Implement web frontend client
  • Add configuration management

Phase 4: Unified Interface

  • Implement factory pattern
  • Add platform detection
  • Create unified API surface

Security Considerations

API Key Management

  • CLI: Local config files, environment variables
  • Mobile: Secure storage via Tauri
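  • Web: Backend-only, never expose to frontend. Note that the sample web client and server in this document pass apiKey in the request body, which conflicts with this rule; a production backend should read the key from server-side configuration.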

CORS & CSP

  • Web: Strict CORS policies, CSP headers
  • Mobile: Tauri security policies
  • CLI: Not applicable (local execution)

Rate Limiting

  • All Platforms: Implement client-side rate limiting
  • Web: Server-side rate limiting per IP/user
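
Client-side rate limiting can be as small as a token bucket placed in front of the generator calls. This is a minimal sketch, not a prescribed design: TokenBucket is a hypothetical class, and the capacity and refill values are placeholders to be tuned against the actual API quota.

```typescript
// Hypothetical client-side limiter; capacity and refill rate are illustrative.
class TokenBucket {
    private tokens: number;
    private lastRefill: number;
    private readonly capacity: number;
    private readonly refillPerSecond: number;
    private readonly now: () => number;

    // `now` is injectable so tests can control time; it defaults to Date.now.
    constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.now = now;
        this.tokens = capacity;
        this.lastRefill = now();
    }

    // Returns true and consumes one token if a request may be sent now.
    tryAcquire(): boolean {
        const current = this.now();
        const elapsedSeconds = (current - this.lastRefill) / 1000;
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
        this.lastRefill = current;
        if (this.tokens < 1) return false;
        this.tokens -= 1;
        return true;
    }
}
```

A caller would check tryAcquire() before createImage or editImage and either queue or reject the request when it returns false. The server-side limit per IP or user is still needed, since client-side checks can be bypassed.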

Testing Strategy

Unit Tests

// tests/image-generator.test.ts
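// The adapter classes must be exported from the factory module for these assertions to compile.
import {
    ImageGeneratorFactory,
    CLIImageGeneratorAdapter,
    WebImageGeneratorAdapter
} from '../src/lib/image-generator-factory';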

describe('ImageGenerator', () => {
    test('CLI platform creates correct generator', () => {
        const generator = ImageGeneratorFactory.create({ platform: 'cli' });
        expect(generator).toBeInstanceOf(CLIImageGeneratorAdapter);
    });
    
    test('Web platform creates correct generator', () => {
        const generator = ImageGeneratorFactory.create({ 
            platform: 'web',
            web: { apiEndpoint: 'http://test.com', corsEnabled: true, allowedOrigins: [] }
        });
        expect(generator).toBeInstanceOf(WebImageGeneratorAdapter);
    });
});

Integration Tests

  • CLI: Test Tauri process spawning
  • Mobile: Test HTTP API calls with mock server
  • Web: Test full frontend-backend flow

Performance Considerations

Image Handling

  • CLI: Direct file system access (fastest)
  • Mobile: In-memory processing, consider caching
  • Web: Base64 encoding overhead, consider streaming

Network Optimization

  • Mobile: Implement request queuing, retry logic
  • Web: Connection pooling, request batching
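
The retry logic mentioned above is commonly implemented with exponential backoff. This is a sketch under assumed defaults (500 ms base delay, 8 s cap, 3 retries); backoffDelayMs and withRetry are hypothetical helpers, not existing project code.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `operation`, retrying up to `maxRetries` times when it throws.
// `sleep` is injectable so tests do not have to wait for real timers.
async function withRetry<T>(
    operation: () => Promise<T>,
    maxRetries = 3,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await operation();
        } catch (error) {
            lastError = error;
            // Wait before the next attempt, but not after the final failure.
            if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
        }
    }
    throw lastError;
}
```

Usage would be along the lines of withRetry(() => generator.createImage(prompt)). This sketch retries on every error; real code would likely retry only transient failures such as network errors, HTTP 429, or 5xx responses.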

Memory Management

  • All Platforms: Stream large images, avoid loading entire files into memory
  • Mobile: Implement image compression before API calls

Implementation Todo List

Phase 1: Mobile Platform Support (Priority: High)

1.1 Mobile HTTP Client Implementation

  • Create mobile image generator class (src/lib/images-mobile.ts)
    • Implement MobileImageGenerator class with HTTP client
    • Add TypeScript fetch wrapper using tauriApi.fetch
    • Handle Google AI API authentication and requests
    • Add error handling for network failures and API errors
    • Implement image creation endpoint integration
    • Implement image editing endpoint integration

1.2 Mobile Tauri Configuration

  • Update Tauri config for mobile HTTP access
    • Add tauri-plugin-http to dependencies
    • Configure HTTP scope for Google AI API endpoints
    • Update CSP policies for external API access
    • Test HTTP plugin functionality on mobile simulators

1.3 Mobile Platform Detection

  • Add mobile platform detection logic
    • Detect Android/iOS in Tauri environment
    • Create mobile-specific configuration defaults
    • Add mobile UI adaptations (touch-friendly controls)
    • Implement mobile-specific file handling

Phase 2: Web Platform Support (Priority: Medium)

2.1 Backend API Server (Hono)

  • Create Hono.js backend server (src/web/image-api-server.ts)
    • Set up Hono app with CORS middleware
    • Implement /api/images/create endpoint
    • Implement /api/images/edit endpoint
    • Add request validation and error handling
    • Add rate limiting middleware
    • Add API key validation
    • Add logging and monitoring

2.2 Web Frontend Client

  • Create web image client (src/web/image-client.ts)
    • Implement WebImageGenerator class
    • Add fetch-based API communication
    • Handle file uploads and base64 conversion
    • Add progress tracking for large requests
    • Implement retry logic for failed requests

2.3 Web Configuration Management

  • Add web-specific configuration (src/web/config.ts)
    • Create configurable API endpoints
    • Add environment variable support
    • Implement CORS configuration
    • Add deployment-specific settings

Phase 3: Unified Interface (Priority: Medium)

3.1 Factory Pattern Implementation

  • Create image generator factory (src/lib/image-generator-factory.ts)
    • Implement platform detection logic
    • Create unified interface for all platforms
    • Add adapter classes for each platform
    • Implement configuration-based generator selection

3.2 Platform Adapters

  • Create platform adapters
    • CLIImageGeneratorAdapter - wrap existing CLI implementation
    • MobileImageGeneratorAdapter - wrap mobile HTTP client
    • WebImageGeneratorAdapter - wrap web API client
    • Normalize return types (Buffer vs Blob handling)

Phase 4: Testing & Quality Assurance (Priority: High)

4.1 Unit Tests

  • Write comprehensive unit tests
    • Test factory pattern and platform detection
    • Test each adapter class individually
    • Mock HTTP requests for mobile/web testing
    • Test error handling scenarios
    • Test configuration loading and validation

4.2 Integration Tests

  • Create integration test suite
    • Test CLI-to-Tauri communication (existing)
    • Test mobile HTTP API calls with mock server
    • Test web frontend-backend communication
    • Test cross-platform image format compatibility
    • Test API key management across platforms

4.3 Platform-Specific Testing

  • Mobile testing
    • Test on Android emulator/device
    • Test on iOS simulator/device
    • Test network connectivity edge cases
    • Test file system permissions
    • Performance testing with large images
  • Web testing
    • Test CORS configuration
    • Test different browsers (Chrome, Firefox, Safari)
    • Test file upload limits
    • Test API server deployment
    • Load testing for concurrent requests

Phase 5: Deployment & Distribution (Priority: Low)

5.1 Mobile Deployment

  • Set up mobile build pipeline
    • Configure Android build (APK/AAB)
    • Configure iOS build (IPA)
    • Set up code signing for both platforms
    • Create app store metadata and screenshots
    • Test installation and updates

5.2 Web Deployment

  • Deploy web application
    • Set up frontend hosting (Vercel/Netlify)
    • Deploy backend API server
    • Configure domain and SSL certificates
    • Set up monitoring and logging
    • Configure CDN for static assets

5.3 Documentation & Guides

  • Create user documentation
    • Platform-specific installation guides
    • API configuration instructions
    • Troubleshooting guides
    • Performance optimization tips
    • Security best practices

Phase 6: Advanced Features (Priority: Low)

6.1 Performance Optimizations

  • Implement performance improvements
    • Image compression before API calls
    • Request batching for multiple images
    • Caching layer for repeated requests
    • Progressive image loading
    • Background processing for large operations

6.2 Enhanced Security

  • Add security enhancements
    • API key encryption at rest
    • Request signing for web API
    • Rate limiting per user/session
    • Input sanitization and validation
    • Audit logging for API calls

6.3 User Experience Improvements

  • Enhance user interface
    • Drag-and-drop file uploads
    • Real-time preview of edits
    • Batch processing interface
    • History and favorites management
    • Keyboard shortcuts and accessibility

Estimated Timeline

  • Phase 1 (Mobile): 2-3 weeks
  • Phase 2 (Web): 2-3 weeks
  • Phase 3 (Unified): 1 week
  • Phase 4 (Testing): 2 weeks
  • Phase 5 (Deployment): 1 week
  • Phase 6 (Advanced): 3-4 weeks

Total Estimated Time: 11-14 weeks

Dependencies & Prerequisites

Required Skills

  • TypeScript/JavaScript development
  • Tauri framework knowledge
  • React/frontend development
  • Hono.js/backend API development
  • Mobile app development (Android/iOS)
  • Google AI API integration

Required Tools

  • Node.js 18+
  • Rust toolchain
  • Android Studio (for Android builds)
  • Xcode (for iOS builds)
  • Bun runtime (for Hono server)

External Services

  • Google AI API access and billing
  • Cloud hosting for web backend
  • App store developer accounts (mobile)
  • Domain registration (web)