Iterator Documentation
The Iterator module transforms data structures using asynchronous operations and is particularly suited to applying LLM-based transformations to JSON data. This document covers its core functionality, usage patterns, and examples.
Overview
The Iterator module allows you to:
- Define mappings between JSON paths and transformations
- Apply transformations in place or to new fields
- Customize filtering, error handling, and concurrency
- Chain multiple transformations together
Example Implementations
For complete working examples, see:
- async-iterator-example.ts: Shows basic data transformation with an LLM and targetPath usage
- iterator-factory-example.ts: Demonstrates the factory pattern with multiple field transformations
These examples demonstrate transforming a sample product dataset with various JSONPath expressions and LLM-powered transformations.
Core Components
AsyncTransformer
The fundamental unit that applies transformations:
```typescript
type AsyncTransformer = (input: string, path: string) => Promise<string>
```
Every transformer receives a string value and that value's path within the JSON structure, and returns a Promise that resolves to the transformed string.
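As a minimal illustration of the signature, here is a hypothetical transformer that needs no LLM: it trims the input and title-cases each word. The name titleCaseTransformer is made up for this sketch, and the path string passed in the usage line is only an illustrative value; the exact path format the Iterator supplies may differ.

```typescript
// Mirrors the AsyncTransformer signature documented above.
type AsyncTransformer = (input: string, path: string) => Promise<string>

// Hypothetical example transformer: trims whitespace and title-cases each word.
// It ignores the path; a real transformer could use it for logging or to vary
// its behaviour per field.
const titleCaseTransformer: AsyncTransformer = async (input, _path) => {
  return input
    .trim()
    .split(/\s+/)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join(' ')
}

// Usage: the second argument is the location currently being visited.
titleCaseTransformer('  green   apple ', '$.products.fruits[0].name')
  .then((result) => console.log(result)) // prints "Green Apple"
```

Because the transformer is just an async function, it can be unit-tested in isolation before being handed to an iterator.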
Field Mappings
Field mappings define which parts of the data to transform and how:
```typescript
interface FieldMapping {
  jsonPath: string      // JSONPath expression to find values
  targetPath?: string   // Optional target field for transformed values
  options?: IKBotTask   // Options for the transformation
}
```
Basic Usage
Creating an Iterator
```typescript
import { createIterator } from '@polymech/kbot'

// `data`, `globalOptionsMixin` and `createCustomTransformer` are assumed to be
// defined elsewhere (a transformer factory is shown under Custom Transformers).

// Create an iterator instance
const iterator = createIterator(
  data,                // The data to transform
  globalOptionsMixin,  // Global options for all transformations
  {
    throttleDelay: 1000,
    concurrentTasks: 1,
    errorCallback: (path, value, error) => console.error(`Error at ${path}: ${error.message}`),
    filterCallback: async () => true,  // Transform every matched value
    transformerFactory: createCustomTransformer
  }
)

// Define field mappings
const mappings = [
  {
    jsonPath: '$.products.*.name',
    targetPath: null, // Transform in place
    options: {
      prompt: 'Make this product name more appealing'
    }
  }
]

// Apply transformations (top-level await requires an ES module context)
await iterator.transform(mappings)
```
JSONPath Patterns
The Iterator uses JSONPath to identify fields for transformation:
- $..name - All name fields at any level
- $.products..name - All name fields anywhere under the products key
- $.products.*.*.name - Names of items in product categories
- $[*].description - The description field of each top-level element
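To make these patterns concrete, here is a minimal, hypothetical matcher for a small JSONPath subset: $, dot-separated keys, the * wildcard, [*] (treated as .*), and .. (recursive descent). It is not the JSONPath engine the Iterator uses, and the concrete path strings it produces (for example $.products.fruits[0].name) are an assumption that may not match the paths the Iterator passes to its callbacks.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

interface Match { path: string; value: Json }

// Children of a node: array indices or object keys, plus the path segment to append.
function children(node: Json): Array<{ key: string; seg: string; value: Json }> {
  if (Array.isArray(node)) {
    return node.map((value, i) => ({ key: String(i), seg: `[${i}]`, value }))
  }
  if (node !== null && typeof node === 'object') {
    return Object.entries(node).map(([key, value]) => ({ key, seg: `.${key}`, value }))
  }
  return []
}

function match(node: Json, tokens: string[], path: string, out: Match[]): void {
  if (tokens.length === 0) {
    out.push({ path, value: node })
    return
  }
  const [token, ...rest] = tokens
  if (token === '') {
    // Recursive descent (..): apply the rest here, then repeat at every descendant.
    match(node, rest, path, out)
    for (const c of children(node)) match(c.value, tokens, path + c.seg, out)
    return
  }
  for (const c of children(node)) {
    if (token === '*' || token === c.key) match(c.value, rest, path + c.seg, out)
  }
}

// Hypothetical entry point: returns every value selected by a simple JSONPath.
function query(data: Json, jsonPath: string): Match[] {
  if (!jsonPath.startsWith('$')) throw new Error('JSONPath must start with $')
  // "$.a..b" -> ["a", "", "b"]; an empty token marks recursive descent.
  const tokens = jsonPath.replace(/\[\*\]/g, '.*').slice(1).split('.').slice(1)
  const out: Match[] = []
  match(data, tokens, '$', out)
  return out
}

const sample: Json = {
  products: {
    fruits: [
      { id: 'f1', name: 'apple' },
      { id: 'f2', name: 'banana' },
    ],
  },
}

// Paths of the two fruit names, in array order.
console.log(query(sample, '$.products.*.*.name').map((m) => m.path))
```

Running a matcher like this against a small copy of your data is a quick way to see which fields a mapping will touch before any LLM calls are made.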
Advanced Usage
Custom Transformers
Create custom transformers for specific transformation logic:
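```typescript
// Assumes AsyncTransformer and IKBotTask are in scope (see Core Components).
const createCustomTransformer = (options: IKBotTask): AsyncTransformer => {
  // The factory receives the transformation options and returns a transformer bound to them.
  return async (input: string, jsonPath: string): Promise<string> => {
    // Transform the input string based on options (e.g. send options.prompt and
    // the input to an LLM). Placeholder: returns the trimmed input.
    const transformedValue = input.trim()
    return transformedValue
  }
}
```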
In-Place vs. Target Field Transformations
In-Place Transformation
To transform values in place, set targetPath to null:
```typescript
{
  jsonPath: '$.products.*.*.description',
  targetPath: null,
  options: {
    prompt: 'Make this description more engaging'
  }
}
```
This will replace the original description with the transformed value.
Adding New Fields
To keep the original value and add a transformed version, specify a targetPath:
```typescript
{
  jsonPath: '$.products.*.*.name',
  targetPath: 'marketingName',
  options: {
    prompt: 'Generate a marketing name based on this product'
  }
}
```
This keeps the original name and adds a new marketingName field.
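The difference between the two modes comes down to which key receives the result. The sketch below assumes, as the description above reads, that the new field is created on the same object that holds the matched field; writeResult is a hypothetical helper, not part of @polymech/kbot.

```typescript
// Hypothetical helper (not part of @polymech/kbot): writes a transformed value
// either over the matched key (in place) or into a sibling key named by targetPath.
function writeResult(
  parent: Record<string, unknown>,
  key: string,
  value: string,
  targetPath?: string | null,
): void {
  if (targetPath) {
    parent[targetPath] = value // keep the original, add a new field next to it
  } else {
    parent[key] = value // replace the original value
  }
}

const item: Record<string, unknown> = { id: 'f1', name: 'apple' }

writeResult(item, 'name', 'Crisp Orchard Apple', 'marketingName')
// item is now { id: 'f1', name: 'apple', marketingName: 'Crisp Orchard Apple' }

writeResult(item, 'name', 'Apple', null)
// item is now { id: 'f1', name: 'Apple', marketingName: 'Crisp Orchard Apple' }
```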
Filtering
Filters determine which values should be transformed:
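```typescript
// Default filters that skip numbers, booleans, and empty strings
const defaultFilters = [isNumber, isBoolean, isValidString]

// Custom filter example: resolve to true to transform the value, false to skip it.
// This one skips any value whose path contains [0], i.e. anything inside the
// first element of an array.
const skipFirstItem: FilterCallback = async (input, path) => {
  return !path.includes('[0]')
}
```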
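Several checks can be combined into a single filter callback. The sketch below is similar in spirit to testFilters from the API reference but is not its implementation: the predicate names (notNumber, notBoolean, notBlank, notFirstItem) and the allOf combinator are hypothetical, and each predicate here returns true when the value should be transformed.

```typescript
// Shape of a filter as used in this document: resolve to true to transform the value.
type FilterCallback = (input: unknown, path: string) => Promise<boolean>

// Hypothetical keep-predicates: each returns true if the value may be transformed.
type KeepPredicate = (input: unknown, path: string) => boolean

const notNumber: KeepPredicate = (v) => typeof v !== 'number'
const notBoolean: KeepPredicate = (v) => typeof v !== 'boolean'
const notBlank: KeepPredicate = (v) => !(typeof v === 'string' && v.trim() === '')
const notFirstItem: KeepPredicate = (_v, path) => !path.includes('[0]')

// Combine predicates into one FilterCallback: transform only if every predicate agrees.
function allOf(...preds: KeepPredicate[]): FilterCallback {
  return async (input, path) => preds.every((p) => p(input, path))
}

const filter = allOf(notNumber, notBoolean, notBlank, notFirstItem)

filter('apple', '$.products.fruits[0].name').then((keep) => console.log(keep)) // false (first item)
```

Keeping each predicate synchronous and single-purpose makes them easy to test; only the combined callback needs to be async to satisfy the callback shape.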
Throttling and Concurrency
Control API rate limits and parallel processing:
```typescript
{
  throttleDelay: 1000, // Milliseconds between requests
  concurrentTasks: 2   // Number of parallel transformations
}
```
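To show what the two settings mean together, here is a hypothetical throttledMap helper that keeps at most concurrentTasks calls in flight and spaces the start of consecutive calls at least throttleDelay milliseconds apart. It is only an illustration: according to See Also, the Iterator itself relies on p-throttle and p-map, and its exact scheduling may differ.

```typescript
// Hypothetical helper: runs `task` over `items` with at most `concurrentTasks`
// in flight and at least `throttleDelay` ms between task starts. Results keep input order.
async function throttledMap<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  { throttleDelay, concurrentTasks }: { throttleDelay: number; concurrentTasks: number },
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))
  let next = 0      // index of the next item to claim
  let lastStart = 0 // reserved start time (ms since epoch) of the latest task

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++
      // Reserve the next start slot before sleeping so workers never share a slot.
      const wait = lastStart + throttleDelay - Date.now()
      lastStart = Math.max(Date.now(), lastStart + throttleDelay)
      if (wait > 0) await sleep(wait)
      results[index] = await task(items[index])
    }
  }

  // One worker per allowed concurrent task (at least one).
  const workerCount = Math.max(1, Math.min(concurrentTasks, items.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

With the settings above, throttledMap(values, transformer, { throttleDelay: 1000, concurrentTasks: 2 }) would start at most one call per second while allowing two to run at once. Raising concurrentTasks helps only when individual calls take longer than the throttle delay.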
Complete Example
Here's a complete example of transforming product data using LLM:
```typescript
// AsyncTransformer is assumed to be exported alongside FieldMapping
import { createIterator, FieldMapping, AsyncTransformer } from '@polymech/kbot'

async function transformProducts() {
  // Product data
  const data = {
    products: {
      fruits: [
        {
          id: 'f1',
          name: 'apple',
          description: 'A sweet fruit',
        },
        {
          id: 'f2',
          name: 'banana',
          description: 'A yellow fruit',
        }
      ]
    }
  }

  // Create a transformer factory (options typed minimally here)
  const createLLMTransformer = (options: { prompt?: string }): AsyncTransformer => {
    return async (input, path) => {
      // A real transformer would call the LLM API with the input and options.prompt
      // and return its response; this one simulates the response.
      console.log(`Transforming ${path}: ${input} (prompt: ${options.prompt})`)
      return `Enhanced: ${input}`
    }
  }

  // Global options
  const globalOptions = {
    model: 'openai/gpt-4',
    mode: 'completion'
  }

  // Create iterator
  const iterator = createIterator(
    data,
    globalOptions,
    {
      throttleDelay: 1000,
      concurrentTasks: 1,
      errorCallback: (path, value, error) => console.error(`Error at ${path}: ${error.message}`),
      filterCallback: async () => true,
      transformerFactory: createLLMTransformer
    }
  )

  // Define transformations
  const mappings: FieldMapping[] = [
    {
      jsonPath: '$.products.fruits.*.description',
      targetPath: null, // Replace each description in place
      options: {
        prompt: 'Make this description more detailed'
      }
    },
    {
      jsonPath: '$.products.fruits.*.name',
      targetPath: 'marketingName', // Keep name, add marketingName
      options: {
        prompt: 'Generate a marketing name for this product'
      }
    }
  ]

  // Apply transformations
  await iterator.transform(mappings)

  // Output the transformed data
  console.log(JSON.stringify(data, null, 2))
}

transformProducts().catch(console.error)
```
Best Practices
- Be specific with JSONPaths: Use precise JSONPath expressions to target only the fields you want to transform.
- Handle errors gracefully: Provide an error callback to handle failed transformations without breaking the entire process.
- Respect rate limits: Set appropriate throttle delays when working with external APIs.
- Test with small datasets first: Validate your transformations on a smaller subset before processing large datasets.
- Prefer targeted transformations: Transform only what you need to minimize costs and processing time.
API Reference
Main Functions
- createIterator(data, optionsMixin, globalOptions): Creates an iterator instance
- transformObjectWithOptions(obj, transform, options): Low-level function to transform objects
- transformObject(obj, transform, path, ...): Transforms matching paths in an object
Helper Functions
- testFilters(filters): Creates a filter callback from filter functions
- defaultFilters(): Returns commonly used filters
- defaultError: Default error handler that logs to the console
Types and Interfaces
- AsyncTransformer: Function that transforms strings asynchronously
- FilterCallback: Function that determines whether a value should be transformed
- ErrorCallback: Function that handles transformation errors
- FieldMapping: Configuration for a transformation
- TransformOptions: Options for the transformation process
Limitations
- The Iterator works with string values; objects and arrays are traversed but not directly transformed.
- Large datasets might require pagination or chunking for efficient processing.
- External API rate limits might require careful throttling configuration.
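One simple way to chunk a large array of records is to split it into fixed-size batches and process each batch separately, for example by wrapping each batch in its own data object for a separate iterator run. The chunk helper below is a hypothetical sketch, not part of @polymech/kbot.

```typescript
// Hypothetical helper: split an array into fixed-size chunks so each chunk
// can be transformed (and checked or saved) before moving on to the next.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Example: five records in batches of two -> [['a','b'], ['c','d'], ['e']]
console.log(chunk(['a', 'b', 'c', 'd', 'e'], 2))
```

Processing batch by batch also means a failure midway only loses the current batch rather than the whole run.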
Troubleshooting
Common issues and solutions:
- No transformations occurring: Check your JSONPath expressions and filter conditions
- Unexpected field structure: Examine the exact structure of your data
- Rate limiting errors: Increase the throttleDelay between requests
- Transformation errors: Implement a custom error callback for detailed logging
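When no transformations occur, it helps to see which paths actually reach the filter. The hypothetical wrapper below logs every filter decision; it assumes the iterator calls filterCallback once per matched value, so if nothing is logged at all, the JSONPath most likely matched nothing. The name withFilterLogging is made up for this sketch.

```typescript
// Shape of a filter as used in this document: resolve to true to transform the value.
type FilterCallback = (input: unknown, path: string) => Promise<boolean>

// Hypothetical wrapper: logs each decision, then returns the wrapped filter's result.
function withFilterLogging(
  filter: FilterCallback,
  log: (line: string) => void = console.log,
): FilterCallback {
  return async (input, path) => {
    const keep = await filter(input, path)
    log(`${keep ? 'transform' : 'skip'} ${path} = ${JSON.stringify(input)}`)
    return keep
  }
}

// Usage: pass the wrapped filter as filterCallback when creating the iterator, e.g.
// filterCallback: withFilterLogging(async () => true)
```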
Running the Examples
To run the included examples:
```bash
# Run the basic async iterator example
npm run examples:async-iterator

# Run the iterator factory example
npm run examples:iterator-factory

# Run with debug logging
npm run examples:async-iterator -- --debug
```
The examples will transform sample JSON data and save the results to the tests/test-data/core/ directory.
See Also
- JSON Path Syntax - Reference for JSONPath expressions
- p-throttle - The throttling library used internally
- p-map - For concurrent asynchronous mapping