polymech/mono

babayaga 9284894589 images interface

2025-09-23 20:32:47 +02:00

Multi-Platform Image Generation Architecture

Overview

This document outlines the architecture for supporting image generation across multiple platforms:

CLI Desktop (current implementation) - Node.js CLI spawning Tauri GUI
Mobile (Android/iOS) - Standalone Tauri app with HTTP API calls
Web App - Browser-based application with configurable endpoints

Current Architecture (CLI Desktop)

Flow

CLI (images.ts) → Spawn Tauri Process → IPC Communication → Google AI API → Image Generation

Key Components

CLI Entry: src/commands/images.ts - Main command handler
Image Generation: src/lib/images-google.ts - Google Generative AI integration
Tauri GUI: gui/tauri-app/ - Desktop GUI application
IPC Bridge: Stdin/stdout communication between CLI and Tauri

Current Implementation Details

// CLI spawns Tauri process
const tauriProcess = spawn(guiAppPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });

// Communication via JSON messages
const configResponse = {
    cmd: 'forward_config_to_frontend',
    prompt: argv.prompt || null,
    dst: argv.dst || null,
    apiKey: apiKey || null,
    files: absoluteIncludes
};
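
The config object above still has to be framed when it is written to the child's stdin. The following is a minimal sketch, assuming newline-delimited JSON; the actual framing is not specified in this document, and `ConfigMessage`, `encodeMessage`, and `createMessageDecoder` are illustrative names rather than existing code.

```typescript
// Hypothetical message shape mirroring the configResponse object above.
interface ConfigMessage {
    cmd: string;
    prompt: string | null;
    dst: string | null;
    apiKey: string | null;
    files: string[];
}

// Assumption: one JSON document per line. JSON.stringify escapes any
// newline inside string values, so '\n' is a safe delimiter.
function encodeMessage(message: ConfigMessage): string {
    return JSON.stringify(message) + '\n';
}

// Returns a stateful decoder that buffers partial chunks, since a pipe
// may deliver half a line or several lines in one read.
function createMessageDecoder(): (chunk: string) => ConfigMessage[] {
    let pending = '';
    return (chunk: string): ConfigMessage[] => {
        pending += chunk;
        const lines = pending.split('\n');
        pending = lines.pop() ?? ''; // keep the trailing partial line
        return lines
            .filter((line) => line.trim().length > 0)
            .map((line) => JSON.parse(line) as ConfigMessage);
    };
}
```

On the CLI side this would be used as `tauriProcess.stdin.write(encodeMessage(configResponse))`, with the decoder fed from the `data` events of the receiving stream.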

Platform-Specific Architectures

1. CLI Desktop (Current - Keep As-Is)

Pros:

Direct file system access
Native performance
Existing implementation works well

Architecture:

┌─────────────┐    ┌──────────────┐    ┌─────────────────┐
│   CLI App   │───▶│  Tauri GUI   │───▶│  Google AI API  │
│ (images.ts) │    │   (Rust)     │    │   (Direct)      │
└─────────────┘    └──────────────┘    └─────────────────┘

2. Mobile (Android/iOS) - Standalone Tauri

Challenge: No CLI spawning capability on mobile

Solution: Standalone Tauri app with HTTP client for API calls

Architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Tauri App     │───▶│   HTTP Client   │───▶│  Google AI API  │
│ (Standalone)    │    │ (tauri-plugin-  │    │   (via HTTP)    │
│                 │    │     http)       │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Implementation Strategy:

Option A: TypeScript Frontend HTTP (Recommended)

// src/lib/images-mobile.ts
import { tauriApi } from '../gui/tauri-app/src/lib/tauriApi';

const MODEL = 'gemini-2.5-flash-image-preview';

type Part = { text: string } | { inlineData: { mimeType: string; data: string } };

// Base64-encode in chunks; spreading a whole file into String.fromCharCode
// exceeds the engine's argument limit for large images.
function toBase64(bytes: Uint8Array): string {
    let binary = '';
    for (let i = 0; i < bytes.length; i += 0x8000) {
        binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
    }
    return btoa(binary);
}

export class MobileImageGenerator {
    private apiKey: string;
    private baseUrl = 'https://generativelanguage.googleapis.com/v1beta';

    constructor(apiKey: string) {
        this.apiKey = apiKey;
    }

    async createImage(prompt: string): Promise<Buffer> {
        return this.generate([{ text: prompt }]);
    }

    async editImage(prompt: string, imageFiles: File[]): Promise<Buffer> {
        const parts: Part[] = [];

        // Add image parts
        for (const file of imageFiles) {
            const bytes = new Uint8Array(await file.arrayBuffer());
            parts.push({ inlineData: { mimeType: file.type, data: toBase64(bytes) } });
        }

        // Add text prompt
        parts.push({ text: prompt });

        return this.generate(parts);
    }

    // Shared request path for createImage and editImage
    private async generate(parts: Part[]): Promise<Buffer> {
        const response = await tauriApi.fetch(`${this.baseUrl}/models/${MODEL}:generateContent`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                // API keys are sent as x-goog-api-key; "Authorization: Bearer" is for OAuth tokens
                'x-goog-api-key': this.apiKey
            },
            body: JSON.stringify({ contents: [{ parts }] })
        });

        const data = await response.json();
        // The image is not guaranteed to be the first part; a text part may precede it
        const imagePart = data.candidates?.[0]?.content?.parts?.find((p: any) => p.inlineData);
        if (!imagePart) {
            throw new Error(data.error?.message ?? 'No image data in response');
        }
        // Note: Buffer is a Node.js global and needs a polyfill inside the Tauri webview
        return Buffer.from(imagePart.inlineData.data, 'base64');
    }
}

Mobile-Specific Tauri Configuration

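// gui/tauri-app/src-tauri/capabilities/default.json (mobile additions)
// In Tauri v2 the HTTP plugin scope is granted through a capability, not a "plugins.http" allowlist.
// The "main" window label is the template default and may differ in this app.
{
  "identifier": "default",
  "windows": ["main"],
  "permissions": [
    {
      "identifier": "http:default",
      "allow": [{ "url": "https://generativelanguage.googleapis.com/**" }]
    }
  ]
}

// gui/tauri-app/src-tauri/tauri.conf.json (mobile additions; in Tauri v2 the CSP sits under app.security)
{
  "app": {
    "security": {
      "csp": {
        "default-src": "'self'",
        "connect-src": "'self' https://generativelanguage.googleapis.com"
      }
    }
  }
}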

3. Web App - Browser-Based with Configurable Endpoints

Challenge: CORS restrictions, no direct Google AI API access

Solution: Backend API server (Hono) + configurable endpoints

Architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web App       │───▶│   Backend API   │───▶│  Google AI API  │
│ (React/TS)      │    │   (Hono.js)     │    │   (Server)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Backend API Server (Hono.js)

// src/web/image-api-server.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { GoogleGenerativeAI } from '@google/generative-ai';

const app = new Hono();

app.use('/*', cors({
    origin: ['http://localhost:3000', 'https://your-domain.com'],
    allowHeaders: ['Content-Type', 'Authorization'],
    allowMethods: ['POST', 'GET', 'OPTIONS'],
}));

interface ImageRequest {
    prompt: string;
    images?: Array<{
        data: string; // base64
        mimeType: string;
    }>;
    apiKey: string;
    model?: string;
}

app.post('/api/images/create', async (c) => {
    try {
        const { prompt, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
        
        const genAI = new GoogleGenerativeAI(apiKey);
        const genModel = genAI.getGenerativeModel({ model });
        
        const result = await genModel.generateContent(prompt);
        const response = result.response;
        
        if (!response.candidates?.[0]?.content?.parts) {
            throw new Error('No image generated');
        }
        
        const imageData = response.candidates[0].content.parts.find(part => 
            'inlineData' in part
        )?.inlineData;
        
        if (!imageData) {
            throw new Error('No image data in response');
        }
        
        return c.json({
            success: true,
            image: {
                data: imageData.data,
                mimeType: imageData.mimeType
            }
        });
        
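    } catch (error) {
        // `error` is `unknown` under strict TypeScript; narrow before reading .message
        const message = error instanceof Error ? error.message : String(error);
        return c.json({
            success: false,
            error: message
        }, 500);
    }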
});

app.post('/api/images/edit', async (c) => {
    try {
        const { prompt, images, apiKey, model = 'gemini-2.5-flash-image-preview' }: ImageRequest = await c.req.json();
        
        const genAI = new GoogleGenerativeAI(apiKey);
        const genModel = genAI.getGenerativeModel({ model });
        
        const parts = [];
        
        // Add image parts
        if (images) {
            for (const img of images) {
                parts.push({
                    inlineData: {
                        mimeType: img.mimeType,
                        data: img.data
                    }
                });
            }
        }
        
        // Add text prompt
        parts.push({ text: prompt });
        
        const result = await genModel.generateContent(parts);
        const response = result.response;
        
        if (!response.candidates?.[0]?.content?.parts) {
            throw new Error('No image generated');
        }
        
        const imageData = response.candidates[0].content.parts.find(part => 
            'inlineData' in part
        )?.inlineData;
        
        if (!imageData) {
            throw new Error('No image data in response');
        }
        
        return c.json({
            success: true,
            image: {
                data: imageData.data,
                mimeType: imageData.mimeType
            }
        });
        
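    } catch (error) {
        // `error` is `unknown` under strict TypeScript; narrow before reading .message
        const message = error instanceof Error ? error.message : String(error);
        return c.json({
            success: false,
            error: message
        }, 500);
    }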
});

export default app;

// Server startup
if (import.meta.main) {
    const port = parseInt(process.env.PORT || '3001');
    console.log(`🚀 Image API server starting on port ${port}`);
    
    Bun.serve({
        fetch: app.fetch,
        port,
    });
}
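
The endpoints above trust the JSON body as-is, while the todo list later calls for request validation. One way to approach it is sketched below as a pure function that could run before the Google AI call. The limits, the MIME allow-list, and the names `validateImageRequest` and `ValidatedImageRequest` are assumptions for illustration, not part of the existing code.

```typescript
// Mirrors the ImageRequest interface above, minus apiKey, which this sketch does not check.
interface ValidatedImageRequest {
    prompt: string;
    images: Array<{ data: string; mimeType: string }>;
    model?: string;
}

type ValidationResult =
    | { ok: true; value: ValidatedImageRequest }
    | { ok: false; error: string };

// Hypothetical limits; tune them to the deployment.
const MAX_PROMPT_LENGTH = 4000;
const ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp'];

function validateImageRequest(body: unknown): ValidationResult {
    if (typeof body !== 'object' || body === null) {
        return { ok: false, error: 'Request body must be a JSON object' };
    }
    const { prompt, images, model } = body as Record<string, unknown>;
    if (typeof prompt !== 'string' || prompt.trim() === '') {
        return { ok: false, error: 'prompt must be a non-empty string' };
    }
    if (prompt.length > MAX_PROMPT_LENGTH) {
        return { ok: false, error: 'prompt is too long' };
    }
    if (model !== undefined && typeof model !== 'string') {
        return { ok: false, error: 'model must be a string' };
    }
    const list = images === undefined ? [] : images;
    if (!Array.isArray(list)) {
        return { ok: false, error: 'images must be an array' };
    }
    const checked: Array<{ data: string; mimeType: string }> = [];
    for (const item of list) {
        const { data, mimeType } = (item ?? {}) as Record<string, unknown>;
        if (typeof data !== 'string' || typeof mimeType !== 'string') {
            return { ok: false, error: 'each image needs string data and mimeType' };
        }
        if (!ALLOWED_MIME_TYPES.includes(mimeType)) {
            return { ok: false, error: `unsupported mimeType: ${mimeType}` };
        }
        checked.push({ data, mimeType });
    }
    const value: ValidatedImageRequest = {
        prompt,
        images: checked,
        ...(typeof model === 'string' ? { model } : {}),
    };
    return { ok: true, value };
}
```

A handler could call this on `await c.req.json()` and answer with status 400 when `ok` is false, which keeps client mistakes apart from the 500 responses used for upstream failures.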

Web Frontend Client

// src/web/image-client.ts
export interface WebImageConfig {
    apiEndpoint: string; // e.g., 'http://localhost:3001' or 'https://api.yourservice.com'
    apiKey: string;
}

export class WebImageGenerator {
    private config: WebImageConfig;

    constructor(config: WebImageConfig) {
        this.config = config;
    }

    async createImage(prompt: string): Promise<Blob> {
        const response = await fetch(`${this.config.apiEndpoint}/api/images/create`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                prompt,
                apiKey: this.config.apiKey
            })
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const data = await response.json();
        
        if (!data.success) {
            throw new Error(data.error || 'Unknown error');
        }

        // Convert base64 to blob
        const binaryString = atob(data.image.data);
        const bytes = new Uint8Array(binaryString.length);
        for (let i = 0; i < binaryString.length; i++) {
            bytes[i] = binaryString.charCodeAt(i);
        }
        
        return new Blob([bytes], { type: data.image.mimeType });
    }

    async editImage(prompt: string, imageFiles: File[]): Promise<Blob> {
        const images = [];
        
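        for (const file of imageFiles) {
            const bytes = new Uint8Array(await file.arrayBuffer());
            // Build the binary string in chunks; spreading a whole file into
            // String.fromCharCode exceeds the argument limit for large images.
            let binary = '';
            for (let i = 0; i < bytes.length; i += 0x8000) {
                binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
            }
            images.push({
                data: btoa(binary),
                mimeType: file.type
            });
        }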

        const response = await fetch(`${this.config.apiEndpoint}/api/images/edit`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                prompt,
                images,
                apiKey: this.config.apiKey
            })
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const data = await response.json();
        
        if (!data.success) {
            throw new Error(data.error || 'Unknown error');
        }

        // Convert base64 to blob
        const binaryString = atob(data.image.data);
        const bytes = new Uint8Array(binaryString.length);
        for (let i = 0; i < binaryString.length; i++) {
            bytes[i] = binaryString.charCodeAt(i);
        }
        
        return new Blob([bytes], { type: data.image.mimeType });
    }
}
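
Both methods above repeat the same base64-to-Blob conversion. As a small sketch, the loop could live in one helper; `base64ToBlob` is an illustrative name, not an existing function in the project.

```typescript
// Decode a base64 payload into a Blob with the given MIME type.
function base64ToBlob(base64: string, mimeType: string): Blob {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    return new Blob([bytes], { type: mimeType });
}
```

With this in place, each method could end with `return base64ToBlob(data.image.data, data.image.mimeType);`.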

Web App Configuration

// src/web/config.ts
export interface PlatformConfig {
    platform: 'cli' | 'mobile' | 'web';
    
    // Web-specific config
    web?: {
        apiEndpoint: string;
        corsEnabled: boolean;
        allowedOrigins: string[];
    };
    
    // Mobile-specific config  
    mobile?: {
        directApiAccess: boolean;
        cacheImages: boolean;
        maxImageSize: number;
    };
    
    // CLI-specific config (existing)
    cli?: {
        guiEnabled: boolean;
        tempDir: string;
    };
}

export const getDefaultConfig = (): PlatformConfig => {
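    // Detect platform. `window` is undefined under Node (the CLI), so guard every browser global.
    const hasWindow = typeof window !== 'undefined';
    // Tauri v2 exposes __TAURI__ only when withGlobalTauri is enabled; __TAURI_INTERNALS__ is always injected.
    const isTauri = hasWindow && ('__TAURI__' in window || '__TAURI_INTERNALS__' in window);
    const isMobile = isTauri && /Android|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent);
    const isWeb = hasWindow && !isTauri;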
    
    if (isMobile) {
        return {
            platform: 'mobile',
            mobile: {
                directApiAccess: true,
                cacheImages: true,
                maxImageSize: 5 * 1024 * 1024 // 5MB
            }
        };
    } else if (isWeb) {
        return {
            platform: 'web',
            web: {
                apiEndpoint: process.env.REACT_APP_API_ENDPOINT || 'http://localhost:3001',
                corsEnabled: true,
                allowedOrigins: ['http://localhost:3000']
            }
        };
    } else {
        return {
            platform: 'cli',
            cli: {
                guiEnabled: true,
                tempDir: process.env.TEMP || '/tmp'
            }
        };
    }
};
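
The detection above reads `window` and `navigator` directly, which makes it awkward to unit-test. A possible refactoring, sketched here with hypothetical names (`RuntimeInfo`, `detectPlatform`), passes the runtime facts in as plain data and keeps the same decision order: no window means CLI, a non-Tauri window means web, and a Tauri window is mobile or desktop depending on the user agent.

```typescript
type Platform = 'cli' | 'mobile' | 'web';

// Inputs are passed in explicitly so the logic can be tested
// without a real window or navigator.
interface RuntimeInfo {
    hasWindow: boolean; // false under Node (the CLI)
    isTauri: boolean;   // true inside a Tauri webview
    userAgent: string;  // navigator.userAgent, or '' when unavailable
}

const MOBILE_UA = /Android|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i;

function detectPlatform(info: RuntimeInfo): Platform {
    if (!info.hasWindow) return 'cli';
    if (!info.isTauri) return 'web';
    // Desktop Tauri maps to 'cli' because the GUI is spawned by the CLI there.
    return MOBILE_UA.test(info.userAgent) ? 'mobile' : 'cli';
}
```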

Platform Detection & Unified Interface

// src/lib/image-generator-factory.ts
import { WebImageGenerator } from '../web/image-client';
import { MobileImageGenerator } from './images-mobile';
import { createImage, editImage } from './images-google'; // CLI version
import { getDefaultConfig, PlatformConfig } from '../web/config';

export interface UnifiedImageGenerator {
    createImage(prompt: string): Promise<Buffer | Blob>;
    editImage(prompt: string, images: File[] | string[]): Promise<Buffer | Blob>;
}

export class ImageGeneratorFactory {
    static create(config?: PlatformConfig): UnifiedImageGenerator {
        const platformConfig = config || getDefaultConfig();
        
        switch (platformConfig.platform) {
            case 'web':
                return new WebImageGeneratorAdapter(
                    new WebImageGenerator({
                        apiEndpoint: platformConfig.web!.apiEndpoint,
                        apiKey: '' // Will be set later
                    })
                );
                
            case 'mobile':
                return new MobileImageGeneratorAdapter(
                    new MobileImageGenerator('') // API key set later
                );
                
            case 'cli':
            default:
                return new CLIImageGeneratorAdapter();
        }
    }
}

// Adapters to normalize the interface
class WebImageGeneratorAdapter implements UnifiedImageGenerator {
    constructor(private generator: WebImageGenerator) {}
    
    async createImage(prompt: string): Promise<Blob> {
        return this.generator.createImage(prompt);
    }
    
    async editImage(prompt: string, images: File[]): Promise<Blob> {
        return this.generator.editImage(prompt, images);
    }
}

class MobileImageGeneratorAdapter implements UnifiedImageGenerator {
    constructor(private generator: MobileImageGenerator) {}
    
    async createImage(prompt: string): Promise<Buffer> {
        return this.generator.createImage(prompt);
    }
    
    async editImage(prompt: string, images: File[]): Promise<Buffer> {
        return this.generator.editImage(prompt, images);
    }
}

class CLIImageGeneratorAdapter implements UnifiedImageGenerator {
    async createImage(prompt: string): Promise<Buffer> {
        // Use existing CLI implementation
        return createImage(prompt, {} as any) as Promise<Buffer>;
    }
    
    async editImage(prompt: string, images: string[]): Promise<Buffer> {
        // Use existing CLI implementation  
        return editImage(prompt, images, {} as any) as Promise<Buffer>;
    }
}
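
Because `UnifiedImageGenerator` returns `Buffer | Blob`, every caller has to branch on the result type. A minimal sketch of one way to normalize it follows; `toBytes` is a hypothetical helper, and it relies on the fact that Node's `Buffer` is a subclass of `Uint8Array`.

```typescript
// Collapse the Buffer | Blob union into plain bytes.
// Buffer extends Uint8Array, so the first branch covers the CLI and mobile results.
async function toBytes(result: Uint8Array | Blob): Promise<Uint8Array> {
    if (result instanceof Uint8Array) {
        return result;
    }
    // Blob (web): read the underlying data asynchronously.
    return new Uint8Array(await result.arrayBuffer());
}
```

Callers could then write `const bytes = await toBytes(await generator.createImage(prompt));` regardless of platform.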

Required Dependencies

CLI (Existing)

{
  "dependencies": {
    "@google/generative-ai": "^0.21.0"
  },
  "devDependencies": {
    "@tauri-apps/cli": "^2.0.0"
  }
}

Mobile (Tauri)

{
  "dependencies": {
    "@tauri-apps/plugin-http": "^2.0.0",
    "@tauri-apps/api": "^2.0.0"
  }
}

Web Backend (Hono)

{
  "dependencies": {
    "hono": "^4.0.0",
    "@google/generative-ai": "^0.21.0",
    "bun": "^1.0.0"
  }
}

Web Frontend

{
  "dependencies": {
    "react": "^18.0.0",
    "@types/react": "^18.0.0"
  }
}

Deployment Strategies

CLI Desktop

Current: Nexe bundling with Tauri executable
Distribution: GitHub releases with platform-specific binaries

Mobile

Android: APK via Tauri build system
iOS: App Store via Tauri + Xcode
Distribution: App stores or direct APK/IPA

Web App

Frontend: Static hosting (Vercel, Netlify, Cloudflare Pages)
Backend:
- Option 1: Bun/Node.js server (Railway, Render, DigitalOcean)
- Option 2: Serverless functions (Vercel Functions, Cloudflare Workers)
- Option 3: Docker containers (any cloud provider)

Migration Path

Phase 1: Maintain CLI (Current)

Keep existing CLI implementation
No changes to current workflow

Phase 2: Add Mobile Support

Implement MobileImageGenerator class
Add HTTP client configuration
Test on Android/iOS simulators

Phase 3: Add Web Support

Create Hono backend API
Implement web frontend client
Add configuration management

Phase 4: Unified Interface

Implement factory pattern
Add platform detection
Create unified API surface

Security Considerations

API Key Management

CLI: Local config files, environment variables
Mobile: Secure storage via Tauri
Web: Backend-only, never expose to frontend
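
Note that the backend sketch earlier in this document reads `apiKey` from the request body, which means the browser holds the key. To follow the backend-only rule, the server would resolve the key itself. The following is a minimal sketch under the assumption that the key lives in an environment variable; the variable name `GOOGLE_API_KEY` and both function names are hypothetical.

```typescript
// Hypothetical variable name; this document does not specify one.
const API_KEY_ENV_VAR = 'GOOGLE_API_KEY';

type Env = Record<string, string | undefined>;

// Read the key on the server only; fail fast when it is missing.
function resolveServerApiKey(env: Env): string {
    const key = env[API_KEY_ENV_VAR]?.trim();
    if (!key) {
        throw new Error(`${API_KEY_ENV_VAR} is not set on the server`);
    }
    return key;
}

// Redact a key for logs: keep only the last four characters.
function redactKey(key: string): string {
    return key.length <= 4 ? '****' : `****${key.slice(-4)}`;
}
```

A handler would then call `resolveServerApiKey(process.env)` instead of destructuring `apiKey` from the request, and the frontend config would no longer need an `apiKey` field.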

CORS & CSP

Web: Strict CORS policies, CSP headers
Mobile: Tauri security policies
CLI: Not applicable (local execution)

Rate Limiting

All Platforms: Implement client-side rate limiting
Web: Server-side rate limiting per IP/user
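
As an illustration of the server-side option, here is a small fixed-window limiter keyed by IP or user id. It is a sketch, not a recommendation of a specific algorithm: the class name is hypothetical, state is kept in memory (so it does not span multiple server instances), and the clock is injectable for testing.

```typescript
class FixedWindowRateLimiter {
    private readonly limit: number;
    private readonly windowMs: number;
    private readonly now: () => number;
    private readonly windows = new Map<string, { start: number; count: number }>();

    constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.now = now;
    }

    // Returns true if the request identified by `key` (IP or user id) may proceed.
    allow(key: string): boolean {
        const t = this.now();
        const current = this.windows.get(key);
        // Start a fresh window on first sight or once the old one has expired.
        if (!current || t - current.start >= this.windowMs) {
            this.windows.set(key, { start: t, count: 1 });
            return true;
        }
        if (current.count < this.limit) {
            current.count += 1;
            return true;
        }
        return false;
    }
}
```

In a Hono middleware this could translate to returning status 429 whenever `allow(clientKey)` is false. A fixed window permits short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more bookkeeping.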

Testing Strategy

Unit Tests

// tests/image-generator.test.ts
import { ImageGeneratorFactory } from '../src/lib/image-generator-factory';

// The adapter classes are not exported from the factory module,
// so the tests assert on the constructor name instead of using toBeInstanceOf.
describe('ImageGenerator', () => {
    test('CLI platform creates correct generator', () => {
        const generator = ImageGeneratorFactory.create({ platform: 'cli' });
        expect(generator.constructor.name).toBe('CLIImageGeneratorAdapter');
    });
    
    test('Web platform creates correct generator', () => {
        const generator = ImageGeneratorFactory.create({ 
            platform: 'web',
            web: { apiEndpoint: 'http://test.com', corsEnabled: true, allowedOrigins: [] }
        });
        expect(generator.constructor.name).toBe('WebImageGeneratorAdapter');
    });
});

Integration Tests

CLI: Test Tauri process spawning
Mobile: Test HTTP API calls with mock server
Web: Test full frontend-backend flow

Performance Considerations

Image Handling

CLI: Direct file system access (fastest)
Mobile: In-memory processing, consider caching
Web: Base64 encoding overhead, consider streaming
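
The base64 overhead mentioned for the web path can be quantified: every 3 input bytes become 4 output characters, padded up to a multiple of 4. A short sketch (function names are illustrative) that makes the cost explicit when sizing upload limits:

```typescript
// Base64 emits 4 characters per 3 input bytes, rounded up to a whole group.
function base64EncodedLength(byteLength: number): number {
    return 4 * Math.ceil(byteLength / 3);
}

// Largest raw payload whose base64 form still fits within `limit` characters.
function maxRawBytesForBase64Limit(limit: number): number {
    return Math.floor(limit / 4) * 3;
}
```

For the 5 MiB `maxImageSize` used in the mobile defaults, `base64EncodedLength(5 * 1024 * 1024)` is 6,990,508 characters, roughly a third more than the raw image, before any JSON envelope is added.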

Network Optimization

Mobile: Implement request queuing, retry logic
Web: Connection pooling, request batching
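
The retry logic mentioned for mobile could look roughly like the sketch below: exponential backoff with a cap, wrapped around any async operation. The helper names and the default values (500 ms base, 8 s cap, 3 retries) are assumptions, and this version retries on every error; a real client would likely skip non-retryable failures such as an invalid API key.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying on any thrown error up to `maxRetries` times.
async function withRetry<T>(
    operation: () => Promise<T>,
    maxRetries = 3,
    delayFor: (attempt: number) => number = backoffDelay,
): Promise<T> {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation();
        } catch (error) {
            // Out of retries: surface the last error to the caller.
            if (attempt >= maxRetries) throw error;
            await sleep(delayFor(attempt));
        }
    }
}
```

Usage would be along the lines of `await withRetry(() => generator.createImage(prompt))`.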

Memory Management

All Platforms: Stream large images, avoid loading entire files into memory
Mobile: Implement image compression before API calls

Implementation Todo List

Phase 1: Mobile Platform Support (Priority: High)

1.1 Mobile HTTP Client Implementation

Create mobile image generator class (src/lib/images-mobile.ts)
- Implement MobileImageGenerator class with HTTP client
- Add TypeScript fetch wrapper using tauriApi.fetch
- Handle Google AI API authentication and requests
- Add error handling for network failures and API errors
- Implement image creation endpoint integration
- Implement image editing endpoint integration

1.2 Mobile Tauri Configuration

Update Tauri config for mobile HTTP access
- Add tauri-plugin-http to dependencies
- Configure HTTP scope for Google AI API endpoints
- Update CSP policies for external API access
- Test HTTP plugin functionality on mobile simulators

1.3 Mobile Platform Detection

Add mobile platform detection logic
- Detect Android/iOS in Tauri environment
- Create mobile-specific configuration defaults
- Add mobile UI adaptations (touch-friendly controls)
- Implement mobile-specific file handling

Phase 2: Web Platform Support (Priority: Medium)

2.1 Backend API Server (Hono)

Create Hono.js backend server (src/web/image-api-server.ts)
- Set up Hono app with CORS middleware
- Implement /api/images/create endpoint
- Implement /api/images/edit endpoint
- Add request validation and error handling
- Add rate limiting middleware
- Add API key validation
- Add logging and monitoring

2.2 Web Frontend Client

Create web image client (src/web/image-client.ts)
- Implement WebImageGenerator class
- Add fetch-based API communication
- Handle file uploads and base64 conversion
- Add progress tracking for large requests
- Implement retry logic for failed requests

2.3 Web Configuration Management

Add web-specific configuration (src/web/config.ts)
- Create configurable API endpoints
- Add environment variable support
- Implement CORS configuration
- Add deployment-specific settings

Phase 3: Unified Interface (Priority: Medium)

3.1 Factory Pattern Implementation

Create image generator factory (src/lib/image-generator-factory.ts)
- Implement platform detection logic
- Create unified interface for all platforms
- Add adapter classes for each platform
- Implement configuration-based generator selection

3.2 Platform Adapters

Create platform adapters
- CLIImageGeneratorAdapter - wrap existing CLI implementation
- MobileImageGeneratorAdapter - wrap mobile HTTP client
- WebImageGeneratorAdapter - wrap web API client
- Normalize return types (Buffer vs Blob handling)

Phase 4: Testing & Quality Assurance (Priority: High)

4.1 Unit Tests

Write comprehensive unit tests
- Test factory pattern and platform detection
- Test each adapter class individually
- Mock HTTP requests for mobile/web testing
- Test error handling scenarios
- Test configuration loading and validation

4.2 Integration Tests

Create integration test suite
- Test CLI-to-Tauri communication (existing)
- Test mobile HTTP API calls with mock server
- Test web frontend-backend communication
- Test cross-platform image format compatibility
- Test API key management across platforms

4.3 Platform-Specific Testing

Mobile testing
- Test on Android emulator/device
- Test on iOS simulator/device
- Test network connectivity edge cases
- Test file system permissions
- Performance testing with large images

Web testing
- Test CORS configuration
- Test different browsers (Chrome, Firefox, Safari)
- Test file upload limits
- Test API server deployment
- Load testing for concurrent requests

Phase 5: Deployment & Distribution (Priority: Low)

5.1 Mobile Deployment

Set up mobile build pipeline
- Configure Android build (APK/AAB)
- Configure iOS build (IPA)
- Set up code signing for both platforms
- Create app store metadata and screenshots
- Test installation and updates

5.2 Web Deployment

Deploy web application
- Set up frontend hosting (Vercel/Netlify)
- Deploy backend API server
- Configure domain and SSL certificates
- Set up monitoring and logging
- Configure CDN for static assets

5.3 Documentation & Guides

Create user documentation
- Platform-specific installation guides
- API configuration instructions
- Troubleshooting guides
- Performance optimization tips
- Security best practices

Phase 6: Advanced Features (Priority: Low)

6.1 Performance Optimizations

Implement performance improvements
- Image compression before API calls
- Request batching for multiple images
- Caching layer for repeated requests
- Progressive image loading
- Background processing for large operations

6.2 Enhanced Security

Add security enhancements
- API key encryption at rest
- Request signing for web API
- Rate limiting per user/session
- Input sanitization and validation
- Audit logging for API calls

6.3 User Experience Improvements

Enhance user interface
- Drag-and-drop file uploads
- Real-time preview of edits
- Batch processing interface
- History and favorites management
- Keyboard shortcuts and accessibility

Estimated Timeline

Phase 1 (Mobile): 2-3 weeks
Phase 2 (Web): 2-3 weeks
Phase 3 (Unified): 1 week
Phase 4 (Testing): 2 weeks
Phase 5 (Deployment): 1 week
Phase 6 (Advanced): 3-4 weeks

Total Estimated Time: 11-14 weeks (sum of the phase estimates above)

Dependencies & Prerequisites

Required Skills

TypeScript/JavaScript development
Tauri framework knowledge
React/frontend development
Hono.js/backend API development
Mobile app development (Android/iOS)
Google AI API integration

Required Tools

Node.js 18+
Rust toolchain
Android Studio (for Android builds)
Xcode (for iOS builds)
Bun runtime (for Hono server)

External Services

Google AI API access and billing
Cloud hosting for web backend
App store developer accounts (mobile)
Domain registration (web)
