PDF2JPG CLI Documentation
The pm-media pdf2jpg command converts PDF documents to high-quality images (JPG or PNG). This tool supports batch processing, custom DPI settings, page range selection, and advanced rendering options using MuPDF.
Table of Contents
- Installation
- Basic Usage
- Output Formats
- Command Line Options
- API Usage
- Examples
- Template Variables
- DPI and Quality Guidelines
- Performance Optimization
- File Organization
- Troubleshooting
- Advanced Usage
- TypeScript Definitions
Installation
```bash
npm install @polymech/media
```
Basic Usage
```bash
pm-media pdf2jpg --input <pdf-file> [options]
```
Required Parameters
- `--input` (or `-i`): Path to the input PDF file
Optional Parameters
- `--output` (or `-o`): Output path template
- `--dpi`: Resolution for output images (default: 300)
- `--format`: Output format, 'jpg' or 'png' (default: 'jpg')
- `--scale`: Scaling factor (default: 2)
- `--startPage`: First page to convert (1-based)
- `--endPage`: Last page to convert (1-based)
Output Formats
JPEG Format
- Extension: `.jpg`
- Use case: Photographs, complex images
- Pros: Smaller file sizes, good compression
- Cons: Lossy compression, no transparency
PNG Format
- Extension: `.png`
- Use case: Text, diagrams, images with transparency
- Pros: Lossless compression, transparency support
- Cons: Larger file sizes
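The format and the file extension are chosen separately (`--format` versus the extension written into `--output`), so a template ending in `.png` can be combined with the default `jpg` format. The sketch below is a hypothetical pre-flight check, not something `@polymech/media` is known to export: `extensionMatchesFormat` compares a template's extension with the chosen format and accepts templates that leave the extension to the `{FORMAT}` placeholder.

```typescript
// Hypothetical helper; not part of the @polymech/media API.
type ImageFormat = 'png' | 'jpg';

// File extension expected for each output format.
const EXTENSIONS: Record<ImageFormat, string> = {
  jpg: '.jpg',
  png: '.png',
};

// True when the template's literal extension matches the format,
// or when the extension is left to the {FORMAT} placeholder.
function extensionMatchesFormat(template: string, format: ImageFormat): boolean {
  if (template.endsWith('.{FORMAT}')) return true;
  return template.toLowerCase().endsWith(EXTENSIONS[format]);
}

console.log(extensionMatchesFormat('pages/page_{PAGE}.png', 'jpg')); // false
console.log(extensionMatchesFormat('pages/page_{PAGE}.{FORMAT}', 'png')); // true
```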
Command Line Options
Core Options
| Option | Alias | Type | Default | Description |
|---|---|---|---|---|
| `--input` | `-i` | string | required | Path to the input PDF file |
| `--output` | `-o` | string | auto | Output path template |
| `--dpi` | - | number | 300 | Resolution for output images |
| `--format` | - | choice | 'jpg' | Output format: 'jpg' or 'png' |
| `--scale` | - | number | 2 | Scaling factor for rendering |
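The defaults in this table can be expressed as a small helper that fills in whatever a caller leaves out. `withDefaults` is a hypothetical name, the interface is copied from the TypeScript Definitions section, and `output` is left unset because the rule behind the `auto` default is not documented here.

```typescript
// Shape copied from ConvertCommandConfig in the TypeScript Definitions.
type ImageFormat = 'png' | 'jpg';

interface ConvertCommandConfig {
  input: string;
  output?: string;
  dpi: number;
  scale?: number;
  format: ImageFormat;
  startPage?: number;
  endPage?: number;
}

// Documented CLI defaults: --dpi 300, --format jpg, --scale 2.
const DEFAULTS = { dpi: 300, format: 'jpg' as ImageFormat, scale: 2 };

// Hypothetical helper: use the documented default for any omitted option.
function withDefaults(
  partial: Partial<ConvertCommandConfig> & { input: string },
): ConvertCommandConfig {
  return {
    ...partial,
    dpi: partial.dpi ?? DEFAULTS.dpi,
    format: partial.format ?? DEFAULTS.format,
    scale: partial.scale ?? DEFAULTS.scale,
  };
}

const cfg = withDefaults({ input: 'document.pdf', dpi: 150 });
console.log(cfg); // { input: 'document.pdf', dpi: 150, format: 'jpg', scale: 2 }
```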
Page Range Options
| Option | Type | Description |
|---|---|---|
| `--startPage` | number | First page to convert (1-based, inclusive) |
| `--endPage` | number | Last page to convert (1-based, inclusive) |
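Both bounds are 1-based and inclusive: `--startPage 5 --endPage 15` converts eleven pages, and setting both to 1 extracts a single page. The sketch below shows how the two optional values might resolve against a document's page count, assuming an omitted `--startPage` means the first page and an omitted `--endPage` means the last. `resolvePageRange` is hypothetical, and because the tool's handling of out-of-range values is not documented, the sketch simply rejects them.

```typescript
// Hypothetical helper: resolve optional 1-based, inclusive bounds
// against a document with `pageCount` pages.
interface PageRange {
  start: number; // first page, 1-based
  end: number; // last page, 1-based, inclusive
}

function resolvePageRange(pageCount: number, startPage?: number, endPage?: number): PageRange {
  const start = startPage ?? 1;
  const end = endPage ?? pageCount;
  if (!Number.isInteger(start) || !Number.isInteger(end)) {
    throw new RangeError('Page numbers must be integers');
  }
  if (start < 1 || end > pageCount || start > end) {
    throw new RangeError(`Invalid page range ${start}-${end} for a ${pageCount}-page document`);
  }
  return { start, end };
}

const range = resolvePageRange(40, 10, 25);
console.log(range.end - range.start + 1); // 16
```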
Quality & Performance
| Option | Type | Default | Description |
|---|---|---|---|
| `--dpi` | number | 300 | Higher values give better quality and larger files |
| `--scale` | number | 2 | Rendering scale factor |
API Usage
Import the Library
```typescript
import {
  runConversion,
  convertPdfToImages,
  ConvertCommandConfig,
  ImageFormat
} from '@polymech/media';
```
Single PDF Conversion
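```typescript
import { runConversion, type ConvertCommandConfig } from '@polymech/media';
import { Logger } from 'tslog';

// tslog exports a Logger class; create an instance to pass to runConversion
const logger = new Logger();

const config: ConvertCommandConfig = {
  input: 'document.pdf',
  output: 'output/page_{PAGE}.jpg', // {PAGE} becomes each page number
  dpi: 300,
  format: 'jpg',
  scale: 2
};

// outputFiles lists the generated images
const outputFiles = await runConversion(config, logger);
console.log(`Generated ${outputFiles.length} images`);
```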
Advanced PDF Conversion
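```typescript
import { convertPdfToImages, type ImageFormat } from '@polymech/media';
import { readFile } from 'fs/promises';

// convertPdfToImages takes the PDF contents as a buffer rather than a path
const pdfBuffer = await readFile('document.pdf');

const options = {
  // Values made available to the output path template
  baseVariables: {
    SRC_NAME: 'document',
    SRC_DIR: '/output'
  },
  // Single quotes keep ${...} literal, so the library (not JavaScript) substitutes it
  outputPathTemplate: '${SRC_DIR}/${SRC_NAME}_${PAGE}.${FORMAT}',
  dpi: 600,
  scale: 1.5,
  format: 'png' as ImageFormat,
  startPage: 1,
  endPage: 10
};

const outputFiles = await convertPdfToImages(pdfBuffer, options);
```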
Custom Page Range
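```typescript
import { runConversion, type ConvertCommandConfig } from '@polymech/media';
import { Logger } from 'tslog';

const logger = new Logger();

const config: ConvertCommandConfig = {
  input: 'large-document.pdf',
  output: 'pages/chapter_{PAGE}.png',
  format: 'png',
  startPage: 10, // 1-based, inclusive
  endPage: 25,
  dpi: 150 // Lower DPI for faster processing
};

await runConversion(config, logger);
```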
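To convert several PDFs through the API, one `ConvertCommandConfig` per file can be built up front, mirroring the bash batch script in the Examples. The helper below is a hypothetical sketch that only builds the configs; `buildBatchConfigs` is not a library function, and the interface is repeated so the block compiles without the package installed.

```typescript
// Local copy of the documented interface so this sketch stands alone.
type ImageFormat = 'png' | 'jpg';

interface ConvertCommandConfig {
  input: string;
  output?: string;
  dpi: number;
  scale?: number;
  format: ImageFormat;
  startPage?: number;
  endPage?: number;
}

// Hypothetical helper: one config per PDF, each writing to a folder named after the file.
function buildBatchConfigs(
  pdfPaths: string[],
  outDir: string,
  format: ImageFormat = 'jpg',
  dpi = 300,
): ConvertCommandConfig[] {
  return pdfPaths.map((input) => {
    // File name without its directory or a trailing .pdf extension
    const name = input.split(/[\\/]/).pop()!.replace(/\.pdf$/i, '');
    return { input, output: `${outDir}/${name}/{PAGE}.${format}`, dpi, format };
  });
}

const configs = buildBatchConfigs(['docs/a.pdf', 'docs/report.pdf'], 'converted');
console.log(configs.map((c) => c.output).join(', '));
// converted/a/{PAGE}.jpg, converted/report/{PAGE}.jpg
```

Each entry can then be handed to `runConversion(config, logger)`, for instance in a `for...of` loop with `await` so files are processed one at a time.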
Examples
Example 1: Basic PDF to JPG
```bash
# Convert entire PDF to JPG images
pm-media pdf2jpg \
  --input "document.pdf" \
  --output "images/page_{PAGE}.jpg"
```
Example 2: High-Quality PNG Conversion
```bash
# High-quality PNG conversion
pm-media pdf2jpg \
  --input "technical-manual.pdf" \
  --output "manual/page_{PAGE}.png" \
  --format png \
  --dpi 600 \
  --scale 1
```
Example 3: Specific Page Range
```bash
# Convert only pages 5-15
pm-media pdf2jpg \
  --input "book.pdf" \
  --output "chapter/page_{PAGE}.jpg" \
  --startPage 5 \
  --endPage 15 \
  --dpi 300
```
Example 4: Fast Preview Generation
```bash
# Low-quality previews for quick processing
pm-media pdf2jpg \
  --input "presentation.pdf" \
  --output "previews/slide_{PAGE}.jpg" \
  --dpi 150 \
  --scale 1
```
Example 5: Print-Quality Images
```bash
# Print-quality conversion
pm-media pdf2jpg \
  --input "brochure.pdf" \
  --output "print-ready/page_{PAGE}.png" \
  --format png \
  --dpi 600 \
  --scale 2
```
Example 6: Organized Output Structure
```bash
# Organized folder structure
pm-media pdf2jpg \
  --input "reports/quarterly-report.pdf" \
  --output "output/quarterly/{PAGE:03d}_page.jpg" \
  --dpi 300
```
Example 7: Single Page Extraction
```bash
# Extract just the first page
pm-media pdf2jpg \
  --input "document.pdf" \
  --output "cover.jpg" \
  --startPage 1 \
  --endPage 1 \
  --dpi 300
```
Example 8: Batch Processing Script
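```bash
#!/bin/bash
# Process every PDF in the current directory
shopt -s nullglob  # skip the loop entirely when no *.pdf files exist
for pdf in *.pdf; do
  name=$(basename "$pdf" .pdf)
  pm-media pdf2jpg \
    --input "$pdf" \
    --output "converted/${name}/{PAGE}.jpg" \
    --dpi 300 \
    --format jpg
done
```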
Template Variables
The output path supports template variables for dynamic naming:
Available Variables
| Variable | Description | Example |
|---|---|---|
| `{PAGE}` | Page number | 1, 2, 3 |
| `{PAGE:03d}` | Zero-padded page number (`{PAGE:04d}` pads to four digits) | 001, 002, 003 |
| `{SRC_NAME}` | PDF file name (without extension) | document |
| `{SRC_DIR}` | Source directory | /path/to/source |
| `{FORMAT}` | Output format | jpg, png |
Template Examples
```bash
# Basic template
--output "pages/page_{PAGE}.jpg"
# Result: pages/page_1.jpg, pages/page_2.jpg

# Zero-padded pages
--output "output/{PAGE:03d}_page.{FORMAT}"
# Result: output/001_page.jpg, output/002_page.jpg

# Include source name
--output "{SRC_NAME}/page_{PAGE}.{FORMAT}"
# Result for document.pdf: document/page_1.jpg, document/page_2.jpg

# Complex structure
--output "converted/{SRC_NAME}/{PAGE:04d}_{SRC_NAME}.{FORMAT}"
# Result for manual.pdf: converted/manual/0001_manual.jpg
```
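To make the substitution rules concrete, here is a hypothetical re-implementation of the placeholders listed above. It is not pm-media's own code: `expandTemplate` is an illustrative name, and the sketch assumes `{PAGE:0Nd}` pads to N digits (as `{PAGE:03d}` and `{PAGE:04d}` do in the examples) and leaves unknown placeholders untouched, which may differ from the real tool.

```typescript
// Hypothetical sketch of the documented placeholders:
// {PAGE}, {PAGE:0Nd} (zero-padded), {SRC_NAME}, {SRC_DIR}, {FORMAT}.
type TemplateVars = Record<string, string | number>;

function expandTemplate(template: string, vars: TemplateVars): string {
  // Matches {NAME} or {NAME:0Nd}; unknown names are returned unchanged.
  return template.replace(
    /\{([A-Z_]+)(?::0(\d+)d)?\}/g,
    (match: string, name: string, width?: string) => {
      if (!(name in vars)) return match;
      const value = String(vars[name]);
      return width ? value.padStart(Number(width), '0') : value;
    },
  );
}

const outPath = expandTemplate('converted/{SRC_NAME}/{PAGE:04d}_{SRC_NAME}.{FORMAT}', {
  SRC_NAME: 'manual',
  PAGE: 1,
  FORMAT: 'jpg',
});
console.log(outPath); // converted/manual/0001_manual.jpg
```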
DPI and Quality Guidelines
DPI Recommendations
| Use Case | DPI | File Size | Quality |
|---|---|---|---|
| Web preview | 72-96 | Small | Basic |
| Screen viewing | 150 | Medium | Good |
| Standard print | 300 | Large | High |
| Professional print | 600+ | Very large | Excellent |
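PDF page dimensions are given in points, and one point is 1/72 inch, so a page's nominal pixel size at a given DPI is its size in points multiplied by DPI / 72. Pixel count therefore grows with the square of DPI: going from 300 to 600 DPI quadruples it, which is why file size climbs so quickly down the table. The sketch assumes pm-media follows this standard relation and deliberately ignores `--scale`, whose interaction with `--dpi` is not documented here; `pixelSize` is a hypothetical name.

```typescript
// PDF page sizes are expressed in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

// Nominal raster size of a page rendered at `dpi`, rounded to whole pixels.
// Any extra effect of --scale is not modeled.
function pixelSize(widthPt: number, heightPt: number, dpi: number): PixelSize {
  const factor = dpi / POINTS_PER_INCH;
  return {
    width: Math.round(widthPt * factor),
    height: Math.round(heightPt * factor),
  };
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
for (const dpi of [96, 150, 300, 600]) {
  const { width, height } = pixelSize(612, 792, dpi);
  console.log(`${dpi} DPI: ${width} x ${height} px`);
}
// 96 DPI: 816 x 1056 px
// 150 DPI: 1275 x 1650 px
// 300 DPI: 2550 x 3300 px
// 600 DPI: 5100 x 6600 px
```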
Quality vs Performance
```bash
# Fast processing (previews)
pm-media pdf2jpg --input doc.pdf --dpi 150 --scale 1

# Balanced (general use)
pm-media pdf2jpg --input doc.pdf --dpi 300 --scale 2

# High quality (print/archive)
pm-media pdf2jpg --input doc.pdf --dpi 600 --scale 1 --format png
```
Performance Optimization
1. Adjust DPI for Use Case
```bash
# Thumbnails/previews
pm-media pdf2jpg --input large.pdf --dpi 96 --output "thumbs/{PAGE}.jpg"

# Print quality (PNG output needs --format png; the default format is jpg)
pm-media pdf2jpg --input doc.pdf --dpi 300 --format png --output "print/{PAGE}.png"
```
2. Use Appropriate Format
```bash
# Photos/complex images → JPG
pm-media pdf2jpg --input photo-catalog.pdf --format jpg --dpi 300

# Text/diagrams → PNG
pm-media pdf2jpg --input technical-docs.pdf --format png --dpi 300
```
3. Process Page Ranges
```bash
# Process in chunks for large PDFs
pm-media pdf2jpg --input huge.pdf --startPage 1 --endPage 50 --output "batch1/{PAGE}.jpg"
pm-media pdf2jpg --input huge.pdf --startPage 51 --endPage 100 --output "batch2/{PAGE}.jpg"
```
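Splitting a long document into fixed-size ranges can be scripted rather than typed by hand. `pageChunks` is a hypothetical helper that produces inclusive `[startPage, endPage]` pairs and prints one command per chunk; the page count (120 here) must be known beforehand and is only an example value.

```typescript
// Hypothetical helper: split pages 1..pageCount into inclusive
// [startPage, endPage] chunks of at most `chunkSize` pages.
function pageChunks(pageCount: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  const chunks: Array<[number, number]> = [];
  for (let start = 1; start <= pageCount; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize - 1, pageCount)]);
  }
  return chunks;
}

// Print one pm-media command per chunk, each writing to its own batch folder.
pageChunks(120, 50).forEach(([start, end], i) => {
  console.log(
    `pm-media pdf2jpg --input huge.pdf --startPage ${start} --endPage ${end} ` +
      `--output "batch${i + 1}/{PAGE}.jpg"`,
  );
});
```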
4. Scale Factor Optimization
```bash
# Scale = 1 for faster processing
pm-media pdf2jpg --input doc.pdf --scale 1 --dpi 300

# Scale = 2 for better quality (default)
pm-media pdf2jpg --input doc.pdf --scale 2 --dpi 300
```
File Organization
Automatic Directory Creation
The tool automatically creates output directories:
```bash
# Creates nested directories as needed
pm-media pdf2jpg \
  --input "doc.pdf" \
  --output "output/projects/client-a/pages/{PAGE}.jpg"
```
Organized Output Patterns
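```bash
# By document name
--output "{SRC_NAME}/page_{PAGE:03d}.jpg"

# By date and document (the shell expands $(date ...) before pm-media runs)
--output "$(date +%Y-%m-%d)/{SRC_NAME}/{PAGE}.jpg"

# By quality level (pair a .png template with --format png)
--output "high-res/{SRC_NAME}/page_{PAGE}.png"
```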
Troubleshooting
Common Issues
"Input file does not exist"
```bash
# Check file path and permissions
ls -la path/to/file.pdf
pm-media pdf2jpg --input "./document.pdf" --output "output/{PAGE}.jpg"
```
"Unable to open PDF"
- Ensure PDF is not corrupted
- Check if PDF is password-protected
- Verify file permissions
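Gross corruption can be spotted with a quick byte-level check before calling the converter. A PDF file normally begins with the `%PDF-` header and ends with an `%%EOF` marker, so a file missing the header is usually not a PDF at all (for example an HTML error page saved under a `.pdf` name), and one missing the end marker is often a truncated download. `looksLikePdf` is a hypothetical heuristic, not part of pm-media; it cannot detect password protection or damaged page content.

```typescript
// Hypothetical sanity check on raw file bytes; not part of pm-media.
// Catches gross problems only (wrong file type, truncated download).
const decoder = new TextDecoder(); // UTF-8; both markers are plain ASCII

function looksLikePdf(bytes: Uint8Array): { header: boolean; eofMarker: boolean } {
  // The header "%PDF-" (followed by a version such as 1.7) opens the file.
  const head = decoder.decode(bytes.subarray(0, 5));
  // "%%EOF" sits at or very near the end, often followed by a line break.
  const tail = decoder.decode(bytes.subarray(Math.max(0, bytes.length - 1024)));
  return { header: head === '%PDF-', eofMarker: tail.includes('%%EOF') };
}

const encoder = new TextEncoder();
console.log(looksLikePdf(encoder.encode('%PDF-1.7\n%%EOF\n'))); // { header: true, eofMarker: true }
console.log(looksLikePdf(encoder.encode('<html>Not found</html>'))); // { header: false, eofMarker: false }
```

A file read with `readFile` from `fs/promises`, as in the API example, is a `Buffer`, which is a `Uint8Array` and can be passed in directly.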
Poor image quality
```bash
# Increase DPI and use PNG for text
pm-media pdf2jpg \
  --input doc.pdf \
  --format png \
  --dpi 600 \
  --scale 1
```
Large file sizes
```bash
# Reduce DPI (below the 300 default) and use JPG format
pm-media pdf2jpg \
  --input doc.pdf \
  --format jpg \
  --dpi 150 \
  --scale 1
```
Memory issues with large PDFs
```bash
# Process in smaller page ranges
pm-media pdf2jpg --input large.pdf --startPage 1 --endPage 10
pm-media pdf2jpg --input large.pdf --startPage 11 --endPage 20
```
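Per-page memory also grows with the square of DPI, so lowering `--dpi` is an alternative (or complement) to smaller page ranges. The sketch estimates the size of one uncompressed page bitmap for US Letter pages; it assumes 8-bit RGB at 3 bytes per pixel and ignores any additional working buffers, and whether pm-media holds several pages in memory at once is not documented here. `rasterBytes` is a hypothetical name.

```typescript
// Rough size of one uncompressed page bitmap held in memory while rendering.
// Assumes 8-bit RGB (3 bytes per pixel, no alpha); extra buffers are ignored.
const BYTES_PER_PIXEL = 3;

function rasterBytes(widthPt: number, heightPt: number, dpi: number): number {
  // 72 points per inch, as in the PDF coordinate system.
  const widthPx = Math.round((widthPt * dpi) / 72);
  const heightPx = Math.round((heightPt * dpi) / 72);
  return widthPx * heightPx * BYTES_PER_PIXEL;
}

const mib = (bytes: number): string => (bytes / 1024 / 1024).toFixed(1);

// US Letter (612 x 792 pt) at the documented DPI levels.
for (const dpi of [150, 300, 600]) {
  console.log(`${dpi} DPI: ~${mib(rasterBytes(612, 792, dpi))} MiB per page`);
}
// 150 DPI: ~6.0 MiB per page
// 300 DPI: ~24.1 MiB per page
// 600 DPI: ~96.3 MiB per page
```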
Debug Mode
There is no dedicated debug mode; capture both stdout and stderr in a log file and inspect it for errors:
```bash
pm-media pdf2jpg --input doc.pdf --output "{PAGE}.jpg" 2>&1 | tee conversion.log
```
Page Range Validation
```bash
# Convert only the first page to confirm the file opens and the settings look right
pm-media pdf2jpg --input doc.pdf --startPage 1 --endPage 1
```
Advanced Usage
Integration with Other Tools
```bash
#!/bin/bash
# Convert PDF and create thumbnails
PDF_FILE="$1"
BASE_NAME=$(basename "$PDF_FILE" .pdf)

# Convert to high-res images
pm-media pdf2jpg \
  --input "$PDF_FILE" \
  --output "images/${BASE_NAME}/{PAGE:03d}.png" \
  --format png \
  --dpi 300

# Create thumbnails
pm-media resize \
  --src "images/${BASE_NAME}/*.png" \
  --dst "thumbnails/${BASE_NAME}/" \
  --width 200 \
  --height 200 \
  --fit cover
```
Automation Scripts
```bash
#!/bin/bash
# Automated PDF processing pipeline
INPUT_DIR="$1"
OUTPUT_DIR="$2"

# IFS= and -r keep leading spaces and backslashes in file names intact
find "$INPUT_DIR" -name "*.pdf" | while IFS= read -r pdf; do
  name=$(basename "$pdf" .pdf)
  echo "Processing: $pdf"
  pm-media pdf2jpg \
    --input "$pdf" \
    --output "$OUTPUT_DIR/$name/{PAGE:03d}.jpg" \
    --dpi 300 \
    --format jpg
  echo "Completed: $name"
done
```
Quality Control
```bash
#!/bin/bash
# Generate multiple quality versions
PDF="$1"
NAME=$(basename "$PDF" .pdf)

# Preview quality
pm-media pdf2jpg --input "$PDF" --output "preview/$NAME/{PAGE}.jpg" --dpi 150

# Standard quality
pm-media pdf2jpg --input "$PDF" --output "standard/$NAME/{PAGE}.jpg" --dpi 300

# Print quality
pm-media pdf2jpg --input "$PDF" --output "print/$NAME/{PAGE}.png" --dpi 600 --format png
```
TypeScript Definitions
```typescript
// Logger is the tslog class used in the API examples
import type { Logger } from 'tslog';

interface ConvertCommandConfig {
  input: string;
  output?: string;
  dpi: number;
  scale?: number;
  format: 'png' | 'jpg';
  startPage?: number;
  endPage?: number;
}

interface PdfToImageOptions {
  baseVariables: Record<string, any>;
  outputPathTemplate: string;
  dpi: number;
  scale?: number;
  format: ImageFormat;
  startPage?: number;
  endPage?: number;
  logger?: Logger<any>;
}

type ImageFormat = 'png' | 'jpg';
```
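These interfaces exist only at compile time, so a config read from JSON or another untyped source is not checked. A hypothetical runtime type guard for `ConvertCommandConfig` could look like this; it repeats the declarations so it compiles on its own, and its rules (positive `dpi` and `scale`, integer pages starting at 1, `startPage` not after `endPage`) are assumptions drawn from the option tables rather than confirmed library behavior.

```typescript
type ImageFormat = 'png' | 'jpg';

interface ConvertCommandConfig {
  input: string;
  output?: string;
  dpi: number;
  scale?: number;
  format: ImageFormat;
  startPage?: number;
  endPage?: number;
}

const isPositiveNumber = (x: unknown): boolean => typeof x === 'number' && x > 0;
const isPageNumber = (x: unknown): boolean =>
  typeof x === 'number' && Number.isInteger(x) && x >= 1;

// Hypothetical runtime guard; the rules are assumptions based on the option tables.
function isConvertCommandConfig(value: unknown): value is ConvertCommandConfig {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const { startPage, endPage } = v;
  return (
    typeof v.input === 'string' &&
    (v.output === undefined || typeof v.output === 'string') &&
    isPositiveNumber(v.dpi) &&
    (v.scale === undefined || isPositiveNumber(v.scale)) &&
    (v.format === 'png' || v.format === 'jpg') &&
    (startPage === undefined || isPageNumber(startPage)) &&
    (endPage === undefined || isPageNumber(endPage)) &&
    // When both bounds are given, the range must not run backwards.
    (typeof startPage !== 'number' || typeof endPage !== 'number' || startPage <= endPage)
  );
}

const parsed: unknown = JSON.parse('{"input":"doc.pdf","dpi":300,"format":"png"}');
console.log(isConvertCommandConfig(parsed)); // true
console.log(isConvertCommandConfig({ input: 'doc.pdf', dpi: 300, format: 'gif' })); // false
```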
Contributing
For bug reports, feature requests, or contributions, please visit the project repository.
License
This tool is part of the @polymech/media package. Please refer to the package license for usage terms.