PDF2JPG CLI Documentation

The pm-media pdf2jpg command converts PDF documents to JPG or PNG images, rendering each page with MuPDF. It supports custom DPI and scale settings, page range selection, and templated output paths, and it can be scripted for batch processing.

Installation

npm install @polymech/media

Basic Usage

pm-media pdf2jpg --input <pdf-file> [options]

Required Parameters

  • --input (or -i): Path to the input PDF file

Optional Parameters

  • --output (or -o): Output path template
  • --dpi: Resolution for output images (default: 300)
  • --format: Output format - 'jpg' or 'png' (default: 'jpg')
  • --scale: Scaling factor (default: 2)
  • --startPage: First page to convert (1-based)
  • --endPage: Last page to convert (1-based)

Output Formats

JPEG Format

  • Extension: .jpg
  • Use case: Photographs, complex images
  • Pros: Smaller file sizes, good compression
  • Cons: Lossy compression, no transparency

PNG Format

  • Extension: .png
  • Use case: Text, diagrams, images with transparency
  • Pros: Lossless compression, transparency support
  • Cons: Larger file sizes

Command Line Options

Core Options

Option    Alias  Type    Default   Description
--input   -i     string  required  Path to the input PDF file
--output  -o     string  auto      Output path template
--dpi     -      number  300       Resolution for output images
--format  -      choice  'jpg'     Output format: 'jpg' or 'png'
--scale   -      number  2         Scaling factor for rendering

Page Range Options

Option       Type    Description
--startPage  number  First page to convert (1-based index)
--endPage    number  Last page to convert (1-based index)

Quality & Performance

Option   Type    Default  Description
--dpi    number  300      Higher = better quality, larger files
--scale  number  2        Rendering scale factor
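
When the command is launched from a script, the options in these tables map directly onto an argument list. The following is a minimal sketch and not part of @polymech/media: toCliArgs is a hypothetical helper, and the CliOptions shape simply mirrors the documented flags.

```typescript
// Hypothetical helper: not part of @polymech/media.
// The fields mirror the flags documented in the option tables.
interface CliOptions {
  input: string;
  output?: string;
  dpi?: number;
  format?: 'jpg' | 'png';
  scale?: number;
  startPage?: number;
  endPage?: number;
}

// Build the argument list for `pm-media pdf2jpg`, skipping unset options
// so that the CLI falls back to its own defaults.
function toCliArgs(options: CliOptions): string[] {
  const args: string[] = ['pdf2jpg', '--input', options.input];
  const optional: Array<[string, string | number | undefined]> = [
    ['--output', options.output],
    ['--dpi', options.dpi],
    ['--format', options.format],
    ['--scale', options.scale],
    ['--startPage', options.startPage],
    ['--endPage', options.endPage],
  ];
  for (const [flag, value] of optional) {
    if (value !== undefined) {
      args.push(flag, String(value));
    }
  }
  return args;
}

const exampleArgs = toCliArgs({ input: 'book.pdf', startPage: 5, endPage: 15 });
console.log(['pm-media', ...exampleArgs].join(' '));
// prints "pm-media pdf2jpg --input book.pdf --startPage 5 --endPage 15"
```

Passing the result as an array to a process launcher (rather than joining it into one string) avoids shell quoting problems with paths that contain spaces.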

API Usage

Import the Library

import { 
  runConversion,
  convertPdfToImages,
  ConvertCommandConfig,
  ImageFormat 
} from '@polymech/media';

Single PDF Conversion

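import { runConversion, ConvertCommandConfig } from '@polymech/media';
import { Logger } from 'tslog';

// tslog exports the Logger class, not a ready-made instance
const logger = new Logger();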

const config: ConvertCommandConfig = {
  input: 'document.pdf',
  output: 'output/page_{PAGE}.jpg',
  dpi: 300,
  format: 'jpg',
  scale: 2
};

const outputFiles = await runConversion(config, logger);
console.log(`Generated ${outputFiles.length} images`);

Advanced PDF Conversion

import { convertPdfToImages, ImageFormat } from '@polymech/media';
import { readFile } from 'fs/promises';

const pdfBuffer = await readFile('document.pdf');

const options = {
  baseVariables: {
    SRC_NAME: 'document',
    SRC_DIR: '/output'
  },
  outputPathTemplate: '${SRC_DIR}/${SRC_NAME}_${PAGE}.${FORMAT}',
  dpi: 600,
  scale: 1.5,
  format: 'png' as ImageFormat,
  startPage: 1,
  endPage: 10
};

const outputFiles = await convertPdfToImages(pdfBuffer, options);

Custom Page Range

// Reuses runConversion, ConvertCommandConfig, and logger from the Single PDF Conversion example
const config: ConvertCommandConfig = {
  input: 'large-document.pdf',
  output: 'pages/chapter_{PAGE}.png',
  format: 'png',
  startPage: 10,
  endPage: 25,
  dpi: 150  // Lower DPI for faster processing
};

await runConversion(config, logger);

Examples

Example 1: Basic PDF to JPG

# Convert entire PDF to JPG images
pm-media pdf2jpg \
  --input "document.pdf" \
  --output "images/page_{PAGE}.jpg"

Example 2: High-Quality PNG Conversion

# High-quality PNG conversion
pm-media pdf2jpg \
  --input "technical-manual.pdf" \
  --output "manual/page_{PAGE}.png" \
  --format png \
  --dpi 600 \
  --scale 1

Example 3: Specific Page Range

# Convert only pages 5-15
pm-media pdf2jpg \
  --input "book.pdf" \
  --output "chapter/page_{PAGE}.jpg" \
  --startPage 5 \
  --endPage 15 \
  --dpi 300

Example 4: Fast Preview Generation

# Low-quality previews for quick processing
pm-media pdf2jpg \
  --input "presentation.pdf" \
  --output "previews/slide_{PAGE}.jpg" \
  --dpi 150 \
  --scale 1

Example 5: Print-Quality Images

# Print-quality conversion
pm-media pdf2jpg \
  --input "brochure.pdf" \
  --output "print-ready/page_{PAGE}.png" \
  --format png \
  --dpi 600 \
  --scale 2

Example 6: Organized Output Structure

# Organized folder structure
pm-media pdf2jpg \
  --input "reports/quarterly-report.pdf" \
  --output "output/quarterly/{PAGE:03d}_page.jpg" \
  --dpi 300

Example 7: Single Page Extraction

# Extract just the first page
pm-media pdf2jpg \
  --input "document.pdf" \
  --output "cover.jpg" \
  --startPage 1 \
  --endPage 1 \
  --dpi 300

Example 8: Batch Processing Script

#!/bin/bash
# Process multiple PDFs

for pdf in *.pdf; do
  [ -e "$pdf" ] || continue  # skip the literal "*.pdf" when no file matches
  name=$(basename "$pdf" .pdf)
  pm-media pdf2jpg \
    --input "$pdf" \
    --output "converted/${name}/{PAGE}.jpg" \
    --dpi 300 \
    --format jpg
done
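
The same batch loop can be prepared from the API side. The sketch below only builds one configuration per PDF; it uses a local copy of the fields from ConvertCommandConfig so that it stays self-contained, and buildBatchConfigs and stripPdfName are hypothetical helpers. Each resulting object could then be passed to runConversion.

```typescript
// Local copy of the documented config fields, so the sketch is self-contained.
interface BatchConfig {
  input: string;
  output: string;
  dpi: number;
  format: 'jpg' | 'png';
}

// Hypothetical helper: "reports/q1.pdf" -> "q1" (similar to `basename "$pdf" .pdf`,
// except that the extension match is case-insensitive).
function stripPdfName(pdfPath: string): string {
  const fileName = pdfPath.split(/[\\/]/).pop() ?? pdfPath;
  return fileName.replace(/\.pdf$/i, '');
}

// Hypothetical helper: one config per PDF, each writing into its own folder.
function buildBatchConfigs(pdfPaths: string[], outputRoot: string): BatchConfig[] {
  return pdfPaths.map((pdfPath) => ({
    input: pdfPath,
    output: `${outputRoot}/${stripPdfName(pdfPath)}/{PAGE}.jpg`,
    dpi: 300,
    format: 'jpg' as const,
  }));
}

for (const config of buildBatchConfigs(['a.pdf', 'reports/b.pdf'], 'converted')) {
  console.log(`${config.input} -> ${config.output}`);
}
```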

Template Variables

The output path supports template variables for dynamic naming:

Available Variables

Variable    Description                  Example
{PAGE}      Page number                  1, 2, 3
{PAGE:03d}  Zero-padded page             001, 002, 003
{SRC_NAME}  PDF filename (no extension)  document
{SRC_DIR}   Source directory             /path/to/source
{FORMAT}    Output format                jpg, png

Template Examples

# Basic template
--output "pages/page_{PAGE}.jpg"
# Result: pages/page_1.jpg, pages/page_2.jpg

# Zero-padded pages
--output "output/{PAGE:03d}_page.{FORMAT}"
# Result: output/001_page.jpg, output/002_page.jpg

# Include source name
--output "{SRC_NAME}/page_{PAGE}.{FORMAT}"
# Result: document/page_1.jpg, document/page_2.jpg

# Complex structure
--output "converted/{SRC_NAME}/{PAGE:04d}_{SRC_NAME}.{FORMAT}"
# Result: converted/manual/0001_manual.jpg
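
To make the placeholder rules concrete, here is a minimal sketch of how such a template could be expanded. It is an illustration only, not the package's implementation: expandTemplate is a hypothetical function, and the handling of :03d-style padding is inferred from the examples in this section.

```typescript
// Hypothetical re-implementation for illustration; the real resolver in
// @polymech/media may behave differently.
type TemplateVars = Record<string, string | number>;

function expandTemplate(template: string, vars: TemplateVars): string {
  // Matches {NAME} and {NAME:0Nd}, for example {PAGE} or {PAGE:03d}.
  return template.replace(
    /\{([A-Z_]+)(?::0(\d+)d)?\}/g,
    (match: string, varName: string, width?: string) => {
      const value = vars[varName];
      if (value === undefined) {
        return match; // leave unknown placeholders untouched
      }
      const text = String(value);
      return width ? text.padStart(Number(width), '0') : text;
    },
  );
}

const sampleVars: TemplateVars = { PAGE: 7, SRC_NAME: 'manual', FORMAT: 'jpg' };
console.log(expandTemplate('converted/{SRC_NAME}/{PAGE:04d}_{SRC_NAME}.{FORMAT}', sampleVars));
// prints "converted/manual/0007_manual.jpg"
```

Zero-padded names such as 001, 002, 010 sort correctly in plain directory listings, which is the usual reason to prefer {PAGE:03d} over {PAGE} for multi-page documents.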

DPI and Quality Guidelines

DPI Recommendations

Use Case            DPI    File Size   Quality
Web preview         72-96  Small       Basic
Screen viewing      150    Medium      Good
Standard print      300    Large       High
Professional print  600+   Very large  Excellent
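
The file-size column follows from simple arithmetic: PDF page sizes are expressed in points (1/72 inch), so pixel dimensions grow linearly with DPI and the pixel count grows with its square. The sketch below leaves --scale out of the calculation, since this document does not specify how scale and DPI combine; pixelSize is a hypothetical helper.

```typescript
// PDF user space uses points: 72 points = 1 inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

// Hypothetical helper: pixel dimensions of a page rendered at a given DPI.
// Assumption: the effect of --scale is ignored here.
function pixelSize(widthPt: number, heightPt: number, dpi: number): PixelSize {
  return {
    width: Math.round((widthPt / POINTS_PER_INCH) * dpi),
    height: Math.round((heightPt / POINTS_PER_INCH) * dpi),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
for (const dpi of [96, 150, 300, 600]) {
  const { width, height } = pixelSize(612, 792, dpi);
  const megapixels = ((width * height) / 1e6).toFixed(1);
  console.log(`${dpi} DPI: ${width} x ${height} px (${megapixels} MP)`);
}
```

At 300 DPI a US Letter page comes out at 2550 x 3300 pixels; doubling the DPI to 600 quadruples the pixel count, which is why the jump from standard to professional print is so costly in file size and processing time.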

Quality vs Performance

# Fast processing (previews)
pm-media pdf2jpg --input doc.pdf --dpi 150 --scale 1

# Balanced (general use)  
pm-media pdf2jpg --input doc.pdf --dpi 300 --scale 2

# High quality (print/archive)
pm-media pdf2jpg --input doc.pdf --dpi 600 --scale 1 --format png

Performance Optimization

1. Adjust DPI for Use Case

# Thumbnails/previews
pm-media pdf2jpg --input large.pdf --dpi 96 --output "thumbs/{PAGE}.jpg"

# Print quality (pass --format png so the format matches the .png extension)
pm-media pdf2jpg --input doc.pdf --dpi 300 --format png --output "print/{PAGE}.png"

2. Use Appropriate Format

# Photos/complex images → JPG
pm-media pdf2jpg --input photo-catalog.pdf --format jpg --dpi 300

# Text/diagrams → PNG  
pm-media pdf2jpg --input technical-docs.pdf --format png --dpi 300

3. Process Page Ranges

# Process in chunks for large PDFs
pm-media pdf2jpg --input huge.pdf --startPage 1 --endPage 50 --output "batch1/{PAGE}.jpg"
pm-media pdf2jpg --input huge.pdf --startPage 51 --endPage 100 --output "batch2/{PAGE}.jpg"
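
For documents with many pages, the chunk boundaries can be computed rather than typed by hand. A minimal sketch, assuming the total page count is already known from elsewhere (the options documented here do not report it); chunkPageRanges is a hypothetical helper that yields 1-based, inclusive ranges matching --startPage and --endPage.

```typescript
interface PageRange {
  startPage: number; // 1-based, inclusive
  endPage: number; // 1-based, inclusive
}

// Hypothetical helper: split pages 1..totalPages into consecutive chunks.
function chunkPageRanges(totalPages: number, chunkSize: number): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push({ startPage: start, endPage: Math.min(start + chunkSize - 1, totalPages) });
  }
  return ranges;
}

// Print one command per chunk for a 120-page document.
chunkPageRanges(120, 50).forEach((range, index) => {
  console.log(
    `pm-media pdf2jpg --input huge.pdf --startPage ${range.startPage} ` +
      `--endPage ${range.endPage} --output "batch${index + 1}/{PAGE}.jpg"`,
  );
});
```

For 120 pages and a chunk size of 50 this produces the ranges 1-50, 51-100, and 101-120; the last chunk is clamped to the final page.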

4. Scale Factor Optimization

# Scale = 1 for faster processing
pm-media pdf2jpg --input doc.pdf --scale 1 --dpi 300

# Scale = 2 for better quality (default)
pm-media pdf2jpg --input doc.pdf --scale 2 --dpi 300

File Organization

Automatic Directory Creation

The tool automatically creates output directories:

# Creates nested directories as needed
pm-media pdf2jpg \
  --input "doc.pdf" \
  --output "output/projects/client-a/pages/{PAGE}.jpg"

Organized Output Patterns

# By document name
--output "{SRC_NAME}/page_{PAGE:03d}.jpg"

# By date and document
--output "$(date +%Y-%m-%d)/{SRC_NAME}/{PAGE}.jpg"

# By quality level
--output "high-res/{SRC_NAME}/page_{PAGE}.png"

Troubleshooting

Common Issues

"Input file does not exist"

# Check file path and permissions
ls -la path/to/file.pdf
pm-media pdf2jpg --input "./document.pdf" --output "output/{PAGE}.jpg"

"Unable to open PDF"

  • Ensure PDF is not corrupted
  • Check if PDF is password-protected
  • Verify file permissions
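
A quick pre-flight check can rule out the most obvious cause before the file reaches the converter. PDF files normally begin with the marker %PDF-, so a file lacking it is unlikely to be a usable PDF. This is a heuristic sketch only: looksLikePdf is a hypothetical helper, it does not detect password protection or damage later in the file, and it is not how pm-media validates its input.

```typescript
// Hypothetical helper: true if the bytes start with the "%PDF-" marker.
function looksLikePdf(bytes: Uint8Array): boolean {
  const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < marker.length) {
    return false;
  }
  return marker.every((byte, index) => bytes[index] === byte);
}

// In Node.js the bytes would typically come from readFile() in fs/promises.
const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const htmlHeader = new Uint8Array([0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e]); // "<html>"

console.log(looksLikePdf(pdfHeader)); // prints "true"
console.log(looksLikePdf(htmlHeader)); // prints "false"
```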

Poor image quality

# Increase DPI and use PNG for text
pm-media pdf2jpg \
  --input doc.pdf \
  --format png \
  --dpi 600 \
  --scale 1

Large file sizes

# Use JPG and scale 1; lower --dpi as well if files are still too large
pm-media pdf2jpg \
  --input doc.pdf \
  --format jpg \
  --dpi 300 \
  --scale 1

Memory issues with large PDFs

# Process in smaller page ranges
pm-media pdf2jpg --input large.pdf --startPage 1 --endPage 10
pm-media pdf2jpg --input large.pdf --startPage 11 --endPage 20

Debug Mode

There is no dedicated debug mode. Capture the combined standard output and error streams in a log file and check it for errors:

pm-media pdf2jpg --input doc.pdf --output "{PAGE}.jpg" 2>&1 | tee conversion.log

Page Range Validation

# The documented options do not report the page count; convert only the first page to confirm the PDF opens
pm-media pdf2jpg --input doc.pdf --startPage 1 --endPage 1
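
When the page count is known from elsewhere, a range can be checked in calling code before any conversion starts. A minimal sketch, with validatePageRange as a hypothetical helper; how pm-media itself reacts to out-of-range values is not specified in this document.

```typescript
// Hypothetical pre-flight check for 1-based, inclusive page ranges.
// Returns a list of problems; an empty list means the range is usable.
function validatePageRange(pageCount: number, startPage = 1, endPage = pageCount): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(startPage) || startPage < 1) {
    problems.push(`startPage must be an integer >= 1 (got ${startPage})`);
  }
  if (!Number.isInteger(endPage) || endPage > pageCount) {
    problems.push(`endPage must be an integer <= ${pageCount} (got ${endPage})`);
  }
  if (startPage > endPage) {
    problems.push(`startPage (${startPage}) must not exceed endPage (${endPage})`);
  }
  return problems;
}

console.log(validatePageRange(20, 5, 15)); // no problems: empty array
console.log(validatePageRange(20, 18, 25)); // one problem: endPage is past the last page
```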

Advanced Usage

Integration with Other Tools

#!/bin/bash
# Convert PDF and create thumbnails

PDF_FILE="$1"
BASE_NAME=$(basename "$PDF_FILE" .pdf)

# Convert to high-res images
pm-media pdf2jpg \
  --input "$PDF_FILE" \
  --output "images/${BASE_NAME}/{PAGE:03d}.png" \
  --format png \
  --dpi 300

# Create thumbnails
pm-media resize \
  --src "images/${BASE_NAME}/*.png" \
  --dst "thumbnails/${BASE_NAME}/" \
  --width 200 \
  --height 200 \
  --fit cover

Automation Scripts

#!/bin/bash
# Automated PDF processing pipeline

INPUT_DIR="$1"
OUTPUT_DIR="$2"

find "$INPUT_DIR" -type f -name "*.pdf" -print0 | while IFS= read -r -d '' pdf; do
  name=$(basename "$pdf" .pdf)
  echo "Processing: $pdf"
  
  pm-media pdf2jpg \
    --input "$pdf" \
    --output "$OUTPUT_DIR/$name/{PAGE:03d}.jpg" \
    --dpi 300 \
    --format jpg
    
  echo "Completed: $name"
done

Quality Control

#!/bin/bash
# Generate multiple quality versions

PDF="$1"
NAME=$(basename "$PDF" .pdf)

# Preview quality
pm-media pdf2jpg --input "$PDF" --output "preview/$NAME/{PAGE}.jpg" --dpi 150

# Standard quality  
pm-media pdf2jpg --input "$PDF" --output "standard/$NAME/{PAGE}.jpg" --dpi 300

# Print quality
pm-media pdf2jpg --input "$PDF" --output "print/$NAME/{PAGE}.png" --dpi 600 --format png
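
The same three tiers can be expressed as data when the conversion is driven from TypeScript. The sketch below only builds the configurations; the preset table and configsForAllPresets are hypothetical, and the objects carry the input, output, dpi, and format fields of ConvertCommandConfig so they could be handed to runConversion.

```typescript
type ImageFormat = 'png' | 'jpg';

interface QualityPreset {
  dpi: number;
  format: ImageFormat;
}

// Hypothetical preset table mirroring the shell script in this section.
const QUALITY_PRESETS: Record<'preview' | 'standard' | 'print', QualityPreset> = {
  preview: { dpi: 150, format: 'jpg' },
  standard: { dpi: 300, format: 'jpg' },
  print: { dpi: 600, format: 'png' },
};

interface PresetConfig extends QualityPreset {
  input: string;
  output: string;
}

// Build one config per preset for a single PDF.
function configsForAllPresets(input: string, docName: string): PresetConfig[] {
  const tiers = Object.keys(QUALITY_PRESETS) as Array<keyof typeof QUALITY_PRESETS>;
  return tiers.map((tier) => {
    const preset = QUALITY_PRESETS[tier];
    return {
      input,
      output: `${tier}/${docName}/{PAGE}.${preset.format}`,
      ...preset,
    };
  });
}

for (const config of configsForAllPresets('brochure.pdf', 'brochure')) {
  console.log(`${config.output} @ ${config.dpi} DPI`);
}
```

Keeping the tiers in one table means the file extension in the output template is always derived from the preset's format, so the two cannot drift apart.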

TypeScript Definitions

interface ConvertCommandConfig {
  input: string;
  output?: string;
  dpi: number;
  scale?: number;
  format: 'png' | 'jpg';
  startPage?: number;
  endPage?: number;
}

interface PdfToImageOptions {
  baseVariables: Record<string, any>;
  outputPathTemplate: string;
  dpi: number;
  scale?: number;
  format: ImageFormat;
  startPage?: number;
  endPage?: number;
  logger?: Logger<any>; // Logger type from 'tslog'
}

type ImageFormat = 'png' | 'jpg';

Contributing

For bug reports, feature requests, or contributions, please visit the project repository.

License

This tool is part of the @polymech/media package. Please refer to the package license for usage terms.