mono/packages/kbot/docs/ipc.md
2025-09-17 19:41:09 +02:00

IPC Communication Documentation

Overview

This document describes the Inter-Process Communication (IPC) system between the images.ts command and the Tauri GUI application.

Current Architecture

Components

  1. images.ts - Node.js CLI command process
  2. tauri-app.exe - Tauri desktop application (Rust + Web frontend)
  3. IPC Client - Node.js library for managing communication
  4. Tauri Commands - Rust functions exposed to frontend
  5. React Frontend - TypeScript/React UI

Communication Flows

1. Initial Configuration Passing

sequenceDiagram
    participant CLI as images.ts CLI
    participant IPC as IPC Client
    participant Tauri as tauri-app.exe
    participant Frontend as React Frontend
    participant Rust as Tauri Rust Backend

    CLI->>IPC: createIPCClient()
    CLI->>IPC: launch([])
    IPC->>Tauri: spawn tauri-app.exe
    
    Note over CLI,Rust: Initial data sending
    CLI->>IPC: sendInitData(prompt, dst, apiKey, files)
    IPC->>Tauri: stdout: {"type":"init_data","data":{...}}
    Tauri->>Frontend: IPC message handling
    Frontend->>Frontend: setPrompt(), setDst(), setApiKey()
    
    Note over CLI,Rust: Image data sending
    CLI->>IPC: sendImageMessage(base64, mimeType, filename)
    IPC->>Tauri: stdout: {"type":"image","data":{...}}
    Tauri->>Frontend: IPC message handling
    Frontend->>Frontend: addFiles([{path, src}])
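
The two message kinds in this flow could be serialized roughly as follows. This is a minimal sketch, not the actual IPC Client code: the source only shows the envelope {"type": ..., "data": {...}}, so the field names inside `data`, the `encodeMessage` helper, and the one-JSON-object-per-line framing are all assumptions made for illustration.

```typescript
// Hypothetical payload shapes: only the {"type", "data"} envelope comes from the
// document; every field name inside `data` is an assumption.
interface InitData { prompt: string; dst: string; apiKey: string; files: string[]; }
interface ImageData { base64: string; mimeType: string; filename: string; }

type OutgoingMessage =
  | { type: 'init_data'; data: InitData }
  | { type: 'image'; data: ImageData };

// Serialize one message as a single line of JSON (newline-delimited framing is assumed).
// JSON.stringify escapes newlines inside strings, so each message stays on one line.
function encodeMessage(message: OutgoingMessage): string {
  return JSON.stringify(message) + '\n';
}

const initLine = encodeMessage({
  type: 'init_data',
  data: { prompt: 'a red fox', dst: 'output.png', apiKey: 'key', files: ['file1.png'] },
});

const imageLine = encodeMessage({
  type: 'image',
  data: { base64: 'aGVsbG8=', mimeType: 'image/png', filename: 'file1.png' },
});
```

Keeping both kinds in one discriminated union means the receiver can switch on `type` without probing for individual fields.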

2. GUI to CLI Messaging (Current Implementation)

sequenceDiagram
    participant Frontend as React Frontend
    participant Rust as Tauri Rust Backend
    participant Tauri as tauri-app.exe
    participant IPC as IPC Client
    participant CLI as images.ts CLI

    Note over Frontend,CLI: User sends message from GUI
    Frontend->>Frontend: sendMessageToImages()
    Frontend->>Rust: safeInvoke('send_message_to_stdout', message)
    Rust->>Rust: send_message_to_stdout command
    Rust->>Tauri: println!(message) to stdout
    Tauri->>IPC: stdout data received
    IPC->>IPC: parse JSON from stdout
    IPC->>CLI: handleMessage() callback
    CLI->>CLI: gui_message handler
    
    Note over Frontend,CLI: Echo response
    CLI->>IPC: sendDebugMessage('Echo: ...')
    IPC->>Tauri: stdout: {"type":"debug","data":{...}}
    Tauri->>Frontend: IPC message handling
    Frontend->>Frontend: addDebugMessage()
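
The "parse JSON from stdout" step has to cope with chunks that do not line up with message boundaries. Since the Rust side uses println!, each message ends with a newline, so one possible approach is to buffer until a full line is available. The sketch below assumes that framing; `LineBuffer` and `parseLines` are hypothetical names, not part of the existing IPC Client.

```typescript
// Accumulates stdout chunks and returns complete lines; a partial trailing line
// is kept until the next chunk arrives.
class LineBuffer {
  private pending = '';

  push(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    // The last element is either '' (chunk ended on a newline) or an incomplete line.
    this.pending = lines.pop() ?? '';
    // Strip a trailing '\r' (Windows line endings) and drop empty lines.
    return lines.map((line) => line.replace(/\r$/, '')).filter((line) => line.length > 0);
  }
}

// Parse each complete line as JSON; lines that are not valid JSON are skipped.
function parseLines(lines: string[]): unknown[] {
  const parsed: unknown[] = [];
  for (const line of lines) {
    try {
      parsed.push(JSON.parse(line));
    } catch {
      // Not JSON (for example plain log output): ignored in this sketch.
    }
  }
  return parsed;
}

// A message split across two chunks, followed by a non-JSON line.
const buffer = new LineBuffer();
const first = parseLines(buffer.push('{"message":"hel'));
const second = parseLines(buffer.push('lo","source":"gui"}\nnot json\n'));
```

Here `first` is empty because no full line has arrived yet, and `second` contains the single reassembled GUI message.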

3. Console Message Forwarding

sequenceDiagram
    participant Frontend as React Frontend
    participant Console as Console Hijack
    participant Rust as Tauri Rust Backend
    participant CLI as images.ts CLI

    Note over Frontend,CLI: Console messages forwarding
    Frontend->>Console: console.log/error/warn()
    Console->>Console: hijacked in main.tsx
    Console->>Rust: safeInvoke('log_error_to_console')
    Rust->>Rust: log_error_to_console command
    Rust->>CLI: eprintln! to stderr
    CLI->>CLI: stderr logging
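
For reference, the hijacking step in this flow can be pictured as wrapping the console methods and forwarding each call. The following is only a sketch of the general technique, not the contents of main.tsx: `hijackConsole`, `ConsoleLike`, and the `forward` callback are hypothetical, and in the real app the callback would presumably call safeInvoke('log_error_to_console', ...) with whatever arguments that command expects.

```typescript
type Level = 'log' | 'warn' | 'error';
type ConsoleLike = Record<Level, (...args: unknown[]) => void>;

// Wraps log/warn/error on `target` so each call is also passed to `forward`.
// Returns a function that restores the original methods.
function hijackConsole(
  target: ConsoleLike,
  forward: (level: Level, message: string) => void,
): () => void {
  const levels: Level[] = ['log', 'warn', 'error'];
  const originals = { log: target.log, warn: target.warn, error: target.error };
  for (const level of levels) {
    target[level] = (...args: unknown[]) => {
      originals[level].apply(target, args);        // keep the normal console output
      forward(level, args.map(String).join(' '));  // forward a flattened string
    };
  }
  return () => {
    for (const level of levels) target[level] = originals[level];
  };
}

// Usage with a stand-in console object and an in-memory forwarder.
const seen: string[] = [];
const forwarded: Array<[Level, string]> = [];
const fakeConsole: ConsoleLike = {
  log: (...a) => { seen.push(a.join(' ')); },
  warn: (...a) => { seen.push(a.join(' ')); },
  error: (...a) => { seen.push(a.join(' ')); },
};
const restore = hijackConsole(fakeConsole, (level, message) => forwarded.push([level, message]));
fakeConsole.error('failed', 42);
restore();
fakeConsole.log('not forwarded');
```

Returning a restore function keeps the hijack reversible, which matters if console forwarding is later replaced by a dedicated debug channel.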

Current Issues & Complexity

Problem 1: Multiple Communication Channels

We have 3 different communication paths:

  1. IPC Messages (structured): {"type": "init_data", "data": {...}}
  2. Raw GUI Messages (via Tauri command): {"message": "hello", "source": "gui"}
  3. Console Forwarding (via hijacking): All console.* calls

Problem 2: Inconsistent Message Formats

  • From CLI to GUI: Structured IPC messages
  • From GUI to CLI: Raw JSON via stdout
  • Console logs: String messages via stderr

Problem 3: Complex Parsing Logic

The IPC client has to handle multiple message formats:

// Structured IPC message
if (parsed.type && parsed.data !== undefined) {
    this.handleMessage(parsed as IPCMessage);
}
// Raw GUI message  
else if (parsed.message && parsed.source === 'gui') {
    const ipcMessage: IPCMessage = {
        type: 'gui_message',
        data: parsed,
        // ...
    };
    this.handleMessage(ipcMessage);
}
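
One way to contain this branching is to isolate it in a single normalization function, so that handlers only ever see the structured shape. The sketch below mirrors the two branches above under stated assumptions: `normalize` and `isRecord` are hypothetical helpers, and the `IPCMessage` type is reduced to the two fields the snippet actually shows (the elided fields are left out).

```typescript
// Reduced to the fields visible in the snippet above; other fields are elided there.
interface IPCMessage { type: string; data: unknown; }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a structured message, or null when the input matches neither known format.
function normalize(parsed: unknown): IPCMessage | null {
  if (!isRecord(parsed)) return null;

  // Structured IPC message: pass through unchanged.
  const type = parsed['type'];
  const data = parsed['data'];
  if (typeof type === 'string' && data !== undefined) {
    return { type, data };
  }

  // Raw GUI message: wrap it so downstream handlers see a single shape.
  const message = parsed['message'];
  if (typeof message === 'string' && parsed['source'] === 'gui') {
    return { type: 'gui_message', data: parsed };
  }

  return null;
}

const structured = normalize({ type: 'debug', data: { text: 'Echo: hello' } });
const wrapped = normalize({ message: 'hello', source: 'gui' });
const rejected = normalize({ message: 'hello', source: 'cli' });
```

Returning null for unknown input makes unrecognized lines an explicit case that the caller must handle, rather than something silently dropped.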

Option 1: Unified IPC Messages

All communication should use the same format:

interface IPCMessage {
    type: 'init_data' | 'gui_message' | 'debug' | 'image' | 'prompt_submit';
    data: any;
    timestamp: number;
    id: string;
}
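
To show how this interface might be used on both ends, here is a small sketch with a factory and a runtime type guard. The names `createMessage` and `isIPCMessage` and the counter-based id scheme are assumptions for illustration only; `data` is typed as `unknown` here instead of `any` so callers must narrow it before use.

```typescript
type MessageType = 'init_data' | 'gui_message' | 'debug' | 'image' | 'prompt_submit';

interface IPCMessage {
  type: MessageType;
  data: unknown;
  timestamp: number;
  id: string;
}

const MESSAGE_TYPES: readonly string[] = ['init_data', 'gui_message', 'debug', 'image', 'prompt_submit'];

// Hypothetical id scheme: a per-process counter. A UUID would work equally well.
let nextId = 0;

function createMessage(type: MessageType, data: unknown): IPCMessage {
  nextId += 1;
  return { type, data, timestamp: Date.now(), id: `msg-${nextId}` };
}

// Runtime check for values that arrive as parsed JSON from the other process.
function isIPCMessage(value: unknown): value is IPCMessage {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const type = candidate['type'];
  return (
    typeof type === 'string' &&
    MESSAGE_TYPES.indexOf(type) !== -1 &&
    'data' in candidate &&
    typeof candidate['timestamp'] === 'number' &&
    typeof candidate['id'] === 'string'
  );
}

const debugMessage = createMessage('debug', { text: 'Echo: hello' });
const roundTripped: unknown = JSON.parse(JSON.stringify(debugMessage));
```

With a guard like this, the raw {"message": ..., "source": "gui"} shape is rejected at the boundary instead of needing its own parsing branch.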

Sequence:

sequenceDiagram
    participant Frontend as React Frontend
    participant Rust as Tauri Rust Backend
    participant CLI as images.ts CLI

    Note over Frontend,CLI: Unified messaging
    Frontend->>Rust: safeInvoke('send_ipc_message', {type, data})
    Rust->>CLI: stdout: {"type":"gui_message","data":{...},"timestamp":...}
    CLI->>Rust: stdout: {"type":"debug","data":{...},"timestamp":...}
    Rust->>Frontend: handleMessage(message)

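Option 2: Tauri Event System

Use Tauri's built-in event system between the frontend and the Rust backend, with a separate transport (HTTP, WebSocket, or named pipe) between the Rust backend and the CLI: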

sequenceDiagram
    participant Frontend as React Frontend
    participant Rust as Tauri Rust Backend
    participant CLI as images.ts CLI

    Note over Frontend,CLI: Tauri events
    Frontend->>Rust: emit('gui-message', data)
    Rust->>CLI: HTTP/WebSocket/Named Pipe
    CLI->>Rust: HTTP/WebSocket/Named Pipe response
    Rust->>Frontend: emit('cli-response', data)

Current File Structure

src/
├── lib/ipc.ts              # IPC Client (Node.js side)
├── commands/images.ts       # CLI command with IPC integration
gui/tauri-app/
├── src/App.tsx             # React frontend with IPC handling
├── src/main.tsx            # Console hijacking setup
└── src-tauri/src/lib.rs    # Tauri commands and state management

Configuration Passing Methods

Method 1: CLI Arguments (Original)

tauri-app.exe --api-key "key" --dst "output.png" --prompt "text" file1.png file2.png
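
When the CLI launches the app this way from Node.js, the arguments are safer to build as an array than as a shell string, since values such as the prompt may contain spaces or quotes. A minimal sketch, assuming the flags shown above; `LaunchConfig` and `buildArgs` are hypothetical names:

```typescript
interface LaunchConfig {
  apiKey: string;
  dst: string;
  prompt: string;
  files: string[];
}

// Builds the argument vector for Method 1. Passing an array to a process spawner
// avoids shell quoting issues for values that contain spaces or quotes.
function buildArgs(config: LaunchConfig): string[] {
  return [
    '--api-key', config.apiKey,
    '--dst', config.dst,
    '--prompt', config.prompt,
    ...config.files,
  ];
}

const args = buildArgs({
  apiKey: 'key',
  dst: 'output.png',
  prompt: 'a "quoted" prompt with spaces',
  files: ['file1.png', 'file2.png'],
});
```

The resulting array can be passed as the second argument to a spawn call, with the executable name as the first.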

Method 2: IPC Messages (Current)

ipcClient.sendInitData(prompt, dst, apiKey, files);

Method 3: Environment Variables

export API_KEY="key"
export DST="output.png"
tauri-app.exe

Method 4: Temporary Config File

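import { spawn } from 'node:child_process';
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
// Write config.json to the OS temp directory, then launch the app with its path
const configPath = path.join(os.tmpdir(), 'config.json');
fs.writeFileSync(configPath, JSON.stringify({prompt, dst, apiKey}));
spawn('tauri-app.exe', ['--config', configPath]);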

Recommendations

  1. Simplify to a single communication method - Either all CLI args OR all IPC messages
  2. Remove console hijacking - Use proper logging/debug channels
  3. Use consistent message format - Same structure for all message types
  4. Consider Tauri's built-in IPC - Events, commands, or invoke system
  5. Separate concerns - Config passing vs. runtime messaging

Questions for Review

  1. Do we need bidirectional messaging during runtime, or just initial config passing?
  2. Should console messages be forwarded, or use proper debug channels?
  3. Is the complexity worth it, or should we use simpler CLI args + file output?
  4. Could we use Tauri's built-in event system instead of stdout parsing?

Current Status

  • Config passing works (init_data messages)
  • Image passing works (base64 via IPC)
  • GUI → CLI messaging works (via Tauri command)
  • CLI → GUI messaging works (debug messages)
  • Open issue: the system is overly complex, with three separate communication paths
  • Open issue: message formats are inconsistent across those paths
  • Open issue: console hijacking adds unnecessary complexity