Iterator Documentation

The Iterator module transforms data structures using asynchronous operations and is particularly suited to applying LLM-based transformations to JSON data. This document covers its core functionality, usage patterns, and examples.

Overview

The Iterator module allows you to:

  1. Define mappings between JSON paths and transformations
  2. Apply transformations in place or to new fields
  3. Customize filtering, error handling, and concurrency
  4. Chain multiple transformations together

Example Implementations

Complete working examples are included with the package; see Running the Examples below for the commands. They demonstrate transforming a sample product dataset with various JSONPath expressions and LLM-powered transformations.

Core Components

AsyncTransformer

The fundamental unit that applies transformations:

type AsyncTransformer = (input: string, path: string) => Promise<string>

Every transformer takes a string input and its path in the JSON structure, then returns a promise that resolves to the transformed string.
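As a minimal sketch of a function that satisfies this type, the transformer below only trims and upper-cases its input. The type is copied locally so the snippet is self-contained; the name shoutTransformer and the path string passed to it are illustrative assumptions, not part of the module.

```typescript
// Local copy of the type shown above, so the sketch runs on its own.
type AsyncTransformer = (input: string, path: string) => Promise<string>

// Hypothetical transformer: trims and upper-cases the value.
// A real transformer would typically call an LLM here instead.
const shoutTransformer: AsyncTransformer = async (input, path) => {
    console.log(`Transforming value at ${path}`)
    return input.trim().toUpperCase()
}

// The caller supplies both the value and the path it was found at.
// The path format used here is an assumption for illustration.
const shoutResult: Promise<string> = shoutTransformer('  apple ', '$.products.fruits[0].name')
shoutResult.then(value => console.log(value))
```

Because the path is passed alongside the value, a transformer can log it or branch on it without needing access to the surrounding data.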

Field Mappings

Field mappings define which parts of the data to transform and how:

interface FieldMapping {
    jsonPath: string       // JSONPath expression to find values
    targetPath?: string | null    // Optional target field for transformed values (null = transform in place)
    options?: IKBotTask    // Options for the transformation
}

Basic Usage

Creating an Iterator

import { createIterator } from '@polymech/kbot'

// Create an iterator instance
const iterator = createIterator(
    data,                  // The data to transform
    globalOptionsMixin,    // Global options for all transformations
    {
        throttleDelay: 1000,
        concurrentTasks: 1,
        errorCallback: (path, value, error) => console.error(`Error at ${path}: ${error.message}`),
        filterCallback: async () => true,
        transformerFactory: createCustomTransformer
    }
)

// Define field mappings
const mappings = [
    {
        jsonPath: '$.products.*.name',
        targetPath: null,  // Transform in place
        options: {
            prompt: 'Make this product name more appealing'
        }
    }
]

// Apply transformations
await iterator.transform(mappings)

JSONPath Patterns

The Iterator uses JSONPath to identify fields for transformation:

  • $..name - All name fields at any level
  • $.products..name - All name fields under the products key
  • $.products.*.*.name - Names of items in product categories
  • $[*].description - All description fields at the first level
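To make the patterns above concrete, here is a small hand-rolled matcher for a subset of JSONPath (`$`, `.key`, `.*`, `[*]`, `..key`). It is only a sketch for exploring which paths a pattern selects; it is not the JSONPath engine the Iterator uses, and the names selectPaths and catalog, as well as the output path format, are assumptions made for this example.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
interface PathNode { key: string; path: string; value: Json }
interface Step { deep: boolean; key: string } // key '*' matches any child

// Direct children of a node, for both arrays and plain objects.
function childrenOf(node: PathNode): PathNode[] {
    const value = node.value
    if (Array.isArray(value)) {
        return value.map((item, i) => ({ key: String(i), path: `${node.path}[${i}]`, value: item }))
    }
    if (value !== null && typeof value === 'object') {
        return Object.keys(value).map(key => ({ key, path: `${node.path}.${key}`, value: value[key] }))
    }
    return []
}

// Every node below `node`, at any depth (depth-first order).
function descendantsOf(node: PathNode): PathNode[] {
    const found: PathNode[] = []
    for (const child of childrenOf(node)) {
        found.push(child, ...descendantsOf(child))
    }
    return found
}

// Splits a pattern into steps; `..` marks a recursive-descent step.
function parseSteps(pattern: string): Step[] {
    const steps: Step[] = []
    const body = pattern.replace(/^\$/, '')
    const tokenRe = /(\.\.?)?(\[\*\]|\*|\w+)/g
    let match = tokenRe.exec(body)
    while (match !== null) {
        steps.push({ deep: match[1] === '..', key: match[2] === '[*]' ? '*' : match[2] })
        match = tokenRe.exec(body)
    }
    return steps
}

// Returns the concrete paths selected by `pattern`, without duplicates.
function selectPaths(root: Json, pattern: string): string[] {
    let frontier: PathNode[] = [{ key: '$', path: '$', value: root }]
    for (const step of parseSteps(pattern)) {
        const next: PathNode[] = []
        for (const node of frontier) {
            const parents = step.deep ? [node, ...descendantsOf(node)] : [node]
            for (const parent of parents) {
                for (const child of childrenOf(parent)) {
                    if (step.key === '*' || child.key === step.key) next.push(child)
                }
            }
        }
        frontier = next
    }
    const paths = frontier.map(node => node.path)
    return paths.filter((p, i) => paths.indexOf(p) === i)
}

const catalog: Json = {
    name: 'Shop',
    products: {
        fruits: [
            { name: 'apple', description: 'A sweet fruit' },
            { name: 'banana', description: 'A yellow fruit' }
        ]
    }
}

// Selects $.name plus both fruit names.
console.log(selectPaths(catalog, '$..name'))
// Selects only the two fruit names; the top-level name is not under products.
console.log(selectPaths(catalog, '$.products..name'))
```

Running a pattern against a small sample like this before handing it to the Iterator is a cheap way to confirm that a broad expression such as `$..name` does not also pick up fields you did not intend to transform.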

Advanced Usage

Custom Transformers

Create custom transformers for specific transformation logic:

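const createCustomTransformer = (options: IKBotTask): AsyncTransformer => {
    return async (input: string, jsonPath: string): Promise<string> => {
        // Transform the input string based on options.
        // Placeholder logic: replace this with a call to your LLM or other service.
        const transformedValue = `${options.prompt ?? ''} ${input}`.trim()
        return transformedValue
    }
}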

In-Place vs. Target Field Transformations

In-Place Transformation

To transform values in place, set targetPath to null:

{
    jsonPath: '$.products.*.*.description',
    targetPath: null,
    options: {
        prompt: 'Make this description more engaging'
    }
}

This will replace the original description with the transformed value.

Adding New Fields

To keep the original value and add a transformed version, specify a targetPath:

{
    jsonPath: '$.products.*.*.name',
    targetPath: 'marketingName',
    options: {
        prompt: 'Generate a marketing name based on this product'
    }
}

This keeps the original name and adds a new marketingName field.
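The difference between the two modes can be sketched on a single object. The helper applyToField below is hypothetical and is not the module's implementation; it only mirrors the behaviour described above, where a null targetPath overwrites the source field and a named targetPath writes to a sibling field.

```typescript
type AsyncTransformer = (input: string, path: string) => Promise<string>

interface ProductSketch { [key: string]: string }

// Hypothetical helper: applies one transformer to `field` of a single object.
// targetPath === null  -> overwrite the source field
// targetPath is a name -> write the result to that sibling field
async function applyToField(
    item: ProductSketch,
    field: string,
    targetPath: string | null,
    transformer: AsyncTransformer
): Promise<void> {
    const result = await transformer(item[field], field)
    if (targetPath === null) {
        item[field] = result
    } else {
        item[targetPath] = result
    }
}

// Simulated LLM call, in the same spirit as the complete example in this document.
const fakeLLM: AsyncTransformer = async input => `Enhanced: ${input}`

async function demoInPlaceVsTarget(): Promise<ProductSketch[]> {
    const inPlace: ProductSketch = { name: 'apple' }
    const withTarget: ProductSketch = { name: 'apple' }
    await applyToField(inPlace, 'name', null, fakeLLM)
    await applyToField(withTarget, 'name', 'marketingName', fakeLLM)
    return [inPlace, withTarget]
}

demoInPlaceVsTarget().then(results => console.log(JSON.stringify(results, null, 2)))
```

In this sketch the first object ends up with only a rewritten name, while the second keeps name untouched and gains marketingName.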

Filtering

Filters determine which values should be transformed:

// Default filters that skip numbers, booleans, and empty strings
const defaultFilters = [isNumber, isBoolean, isValidString]

// Custom filter example
const skipFirstItem: FilterCallback = async (input, path) => {
    return !path.includes('[0]')
}
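Several filters are usually combined so that a value is transformed only if every filter accepts it. The sketch below shows one way to do that. The FilterCallback shape is assumed from the example above, and notNumber, notBoolean, nonEmptyString, and allOf are hypothetical names, not the module's built-in filters or its testFilters helper.

```typescript
// Assumed shape: resolves to true when the value should be transformed.
type FilterCallback = (input: unknown, path: string) => Promise<boolean>

// Hypothetical filters written for this sketch.
const notNumber: FilterCallback = async input => typeof input !== 'number'
const notBoolean: FilterCallback = async input => typeof input !== 'boolean'
const nonEmptyString: FilterCallback = async input =>
    typeof input === 'string' && input.trim().length > 0
const skipFirstItemSketch: FilterCallback = async (_input, path) => !path.includes('[0]')

// Combines filters: the value passes only if every filter accepts it.
// Filters run in order and evaluation stops at the first rejection.
function allOf(filters: FilterCallback[]): FilterCallback {
    return async (input, path) => {
        for (const filter of filters) {
            if (!(await filter(input, path))) return false
        }
        return true
    }
}

const combinedFilter = allOf([notNumber, notBoolean, nonEmptyString, skipFirstItemSketch])

combinedFilter('banana', '$.products.fruits[1].name')
    .then(accepted => console.log(`transform? ${accepted}`))
```

Placing cheap type checks before path-based or remote checks keeps the combined filter fast, because later filters are never called once one has rejected the value.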

Throttling and Concurrency

Control API rate limits and parallel processing:

{
    throttleDelay: 1000,    // Milliseconds between requests
    concurrentTasks: 2      // Number of parallel transformations
}
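How these two settings interact can be illustrated with a small standalone runner. This is a sketch under stated assumptions, not the module's scheduler: runThrottled and sleepMs are hypothetical names, and the delay is applied per worker lane, so with two lanes the overall request rate is roughly twice what a single lane would produce.

```typescript
const sleepMs = (ms: number): Promise<void> =>
    new Promise(resolve => setTimeout(resolve, ms))

// Runs `worker` over `items` with at most `concurrentTasks` calls in flight.
// Each lane pauses `throttleDelay` ms after a task before taking the next one.
async function runThrottled<T, R>(
    items: T[],
    worker: (item: T) => Promise<R>,
    settings: { throttleDelay: number; concurrentTasks: number }
): Promise<R[]> {
    const results: R[] = new Array(items.length)
    let nextIndex = 0

    const lane = async (): Promise<void> => {
        while (nextIndex < items.length) {
            const index = nextIndex++ // claim the next item before awaiting
            results[index] = await worker(items[index])
            await sleepMs(settings.throttleDelay)
        }
    }

    const laneCount = Math.max(1, Math.min(settings.concurrentTasks, items.length))
    const lanes: Promise<void>[] = []
    for (let i = 0; i < laneCount; i++) lanes.push(lane())
    await Promise.all(lanes)
    return results // same order as `items`
}

// Usage: two simulated requests at a time, 50 ms pause per lane.
runThrottled(['apple', 'banana', 'cherry'], async fruit => `Enhanced: ${fruit}`, {
    throttleDelay: 50,
    concurrentTasks: 2
}).then(output => console.log(output))
```

With concurrentTasks set to 1 the runner degenerates to strictly sequential processing, which is the safest starting point when the rate limit of the external API is unknown.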

Complete Example

Here's a complete example of transforming product data using an LLM:

import { createIterator, FieldMapping, AsyncTransformer } from '@polymech/kbot'

async function transformProducts() {
    // Product data
    const data = {
        products: {
            fruits: [
                {
                    id: 'f1',
                    name: 'apple',
                    description: 'A sweet fruit',
                },
                {
                    id: 'f2',
                    name: 'banana',
                    description: 'A yellow fruit',
                }
            ]
        }
    }

    // Create a transformer factory
    const createLLMTransformer = (options): AsyncTransformer => {
        return async (input, path) => {
            // Call LLM API with input and options.prompt
            // Return the LLM response
            console.log(`Transforming ${path}: ${input}`)
            return `Enhanced: ${input}` // Simulated response
        }
    }

    // Global options
    const globalOptions = {
        model: 'openai/gpt-4',
        mode: 'completion'
    }

    // Create iterator
    const iterator = createIterator(
        data,
        globalOptions,
        {
            throttleDelay: 1000,
            concurrentTasks: 1,
            errorCallback: (path, value, error) => console.error(`Error: ${error.message}`),
            filterCallback: async () => true,
            transformerFactory: createLLMTransformer
        }
    )

    // Define transformations
    const mappings: FieldMapping[] = [
        {
            jsonPath: '$.products.fruits.*.description',
            targetPath: null,
            options: {
                prompt: 'Make this description more detailed'
            }
        },
        {
            jsonPath: '$.products.fruits.*.name',
            targetPath: 'marketingName',
            options: {
                prompt: 'Generate a marketing name for this product'
            }
        }
    ]

    // Apply transformations
    await iterator.transform(mappings)

    // Output the transformed data
    console.log(JSON.stringify(data, null, 2))
}

Best Practices

  1. Be specific with JSONPaths: Use precise JSONPath expressions to target only the fields you want to transform.

  2. Handle errors gracefully: Provide an error callback to handle failed transformations without breaking the entire process.

  3. Respect rate limits: Set appropriate throttle delays when working with external APIs.

  4. Test with small datasets first: Validate your transformations on a smaller subset before processing large datasets.

  5. Prefer targeted transformations: Transform only what you need to minimize costs and processing time.

API Reference

Main Functions

  • createIterator(data, optionsMixin, globalOptions): Creates an iterator instance
  • transformObjectWithOptions(obj, transform, options): Low-level function to transform objects
  • transformObject(obj, transform, path, ...): Transforms matching paths in an object

Helper Functions

  • testFilters(filters): Creates a filter callback from filter functions
  • defaultFilters(): Returns commonly used filters
  • defaultError: Default error handler that logs to console

Types and Interfaces

  • AsyncTransformer: Function that transforms strings asynchronously
  • FilterCallback: Function that determines if a value should be transformed
  • ErrorCallback: Function that handles transformation errors
  • FieldMapping: Configuration for a transformation
  • TransformOptions: Options for the transformation process

Limitations

  1. The Iterator works with string values; objects and arrays are traversed but not directly transformed.
  2. Large datasets might require pagination or chunking for efficient processing.
  3. External API rate limits might require careful throttling configuration.
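For the second limitation, one possible approach is to split a large array into fixed-size chunks and process them one after another. The helpers chunkItems and processInChunks below are hypothetical and not part of the module. The processChunk callback is where, presumably, an iterator would be created for the chunk and its transform method called.

```typescript
// Splits `items` into consecutive chunks of at most `size` elements.
function chunkItems<T>(items: T[], size: number): T[][] {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError('size must be a positive integer')
    }
    const chunks: T[][] = []
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size))
    }
    return chunks
}

// Processes chunks sequentially and returns how many chunks were handled.
async function processInChunks<T>(
    items: T[],
    size: number,
    processChunk: (chunk: T[], index: number) => Promise<void>
): Promise<number> {
    const chunks = chunkItems(items, size)
    for (let i = 0; i < chunks.length; i++) {
        await processChunk(chunks[i], i) // one chunk at a time
    }
    return chunks.length
}

// Usage with a stand-in callback that only logs the chunk size.
processInChunks(['a', 'b', 'c', 'd', 'e'], 2, async (chunk, index) => {
    console.log(`chunk ${index}: ${chunk.length} item(s)`)
}).then(count => console.log(`${count} chunks processed`))
```

Processing chunk by chunk also gives a natural checkpoint for saving intermediate results, so a failure late in a long run does not discard earlier work.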

Troubleshooting

Common issues and solutions:

  • No transformations occurring: Check your JSONPath expressions and filter conditions
  • Unexpected field structure: Examine the exact structure of your data
  • Rate limiting errors: Increase the throttleDelay between requests
  • Transformation errors: Implement a custom error callback for detailed logging
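For the last point, an error callback can record failures for later inspection instead of only printing them. The sketch below assumes the (path, value, error) argument order used in the errorCallback examples in this document; ErrorCallbackSketch, FailureRecord, and createCollectingErrorCallback are hypothetical names.

```typescript
// Assumed shape, based on the errorCallback examples in this document.
type ErrorCallbackSketch = (path: string, value: unknown, error: unknown) => void

interface FailureRecord { path: string; value: unknown; message: string }

// Returns a callback plus the list it appends to, so failures can be
// reviewed or retried after the run finishes.
function createCollectingErrorCallback(): {
    callback: ErrorCallbackSketch
    failures: FailureRecord[]
} {
    const failures: FailureRecord[] = []
    const callback: ErrorCallbackSketch = (path, value, error) => {
        // Thrown values are not always Error instances.
        const message = error instanceof Error ? error.message : String(error)
        failures.push({ path, value, message })
        console.error(`Transformation failed at ${path}: ${message}`)
    }
    return { callback, failures }
}

// Usage: pass `collector.callback` as the errorCallback option,
// then inspect `collector.failures` once the transformations are done.
const collector = createCollectingErrorCallback()
collector.callback('$.products.fruits[0].name', 'apple', new Error('rate limited'))
console.log(`${collector.failures.length} failure(s) recorded`)
```

Keeping the failed path and original value together makes it straightforward to re-run only the fields that failed.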

Running the Examples

To run the included examples:

# Run the basic async iterator example
npm run examples:async-iterator

# Run the iterator factory example
npm run examples:iterator-factory

# Run with debug logging
npm run examples:async-iterator -- --debug

The examples will transform sample JSON data and save the results to the tests/test-data/core/ directory.
