agent-smith/packages/GLiNER2/tutorial/7-api.md

# GLiNER2 API Extractor

Use GLiNER2 through a cloud API without loading models locally. Perfect for production deployments, low-memory environments, or when you need instant access without GPU setup.

## Table of Contents
- [Getting Started](#getting-started)
- [Basic Usage](#basic-usage)
- [Entity Extraction](#entity-extraction)
- [Text Classification](#text-classification)
- [Structured Extraction](#structured-extraction)
- [Relation Extraction](#relation-extraction)
- [Combined Schemas](#combined-schemas)
- [Batch Processing](#batch-processing)
- [Confidence Scores](#confidence-scores)
- [Error Handling](#error-handling)
- [API vs Local](#api-vs-local)

## Getting Started

### Get Your API Key

1. Visit [gliner.pioneer.ai](https://gliner.pioneer.ai)
2. Sign up or log in to your account
3. Navigate to API Keys section
4. Generate a new API key

### Installation

```bash
pip install gliner2
```

### Set Your API Key

**Option 1: Environment Variable (Recommended)**
```bash
export PIONEER_API_KEY="your-api-key-here"
```

**Option 2: Pass Directly**
```python
extractor = GLiNER2.from_api(api_key="your-api-key-here")
```

## Basic Usage

```python
from gliner2 import GLiNER2

# Load from API (uses PIONEER_API_KEY environment variable)
extractor = GLiNER2.from_api()

# Use exactly like the local model!
results = extractor.extract_entities(
    "Apple CEO Tim Cook announced the iPhone 15 in Cupertino.",
    ["company", "person", "product", "location"]
)
print(results)
# Output: {
#     'entities': {
#         'company': ['Apple'],
#         'person': ['Tim Cook'],
#         'product': ['iPhone 15'],
#         'location': ['Cupertino']
#     }
# }
```

## Entity Extraction

### Simple Extraction

```python
extractor = GLiNER2.from_api()

text = "Elon Musk founded SpaceX in 2002 and Tesla in 2003."
results = extractor.extract_entities(
    text,
    ["person", "company", "date"]
)
# Output: {
#     'entities': {
#         'person': ['Elon Musk'],
#         'company': ['SpaceX', 'Tesla'],
#         'date': ['2002', '2003']
#     }
# }
```

### With Confidence Scores and Character Positions

You can include confidence scores and character-level start/end positions using `include_confidence` and `include_spans`:

```python
# With confidence only
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn for $26.2 billion.",
    ["company", "price"],
    include_confidence=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'confidence': 0.98},
#             {'text': 'LinkedIn', 'confidence': 0.97}
#         ],
#         'price': [
#             {'text': '$26.2 billion', 'confidence': 0.95}
#         ]
#     }
# }

# With character positions (spans) only
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn.",
    ["company"],
    include_spans=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'start': 0, 'end': 9},
#             {'text': 'LinkedIn', 'start': 18, 'end': 26}
#         ]
#     }
# }

# With both confidence and spans
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn for $26.2 billion.",
    ["company", "price"],
    include_confidence=True,
    include_spans=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'confidence': 0.98, 'start': 0, 'end': 9},
#             {'text': 'LinkedIn', 'confidence': 0.97, 'start': 18, 'end': 26}
#         ],
#         'price': [
#             {'text': '$26.2 billion', 'confidence': 0.95, 'start': 32, 'end': 45}
#         ]
#     }
# }
```

### Custom Threshold

```python
# Only return high-confidence extractions
results = extractor.extract_entities(
    text,
    ["person", "company"],
    threshold=0.8  # Minimum 80% confidence
)
```

## Text Classification

### Single-Label Classification

```python
extractor = GLiNER2.from_api()

text = "I absolutely love this product! It exceeded all my expectations."
results = extractor.classify_text(
    text,
    {"sentiment": ["positive", "negative", "neutral"]}
)
# Output: {'sentiment': {'category': 'positive'}}
```

### Multi-Task Classification

```python
text = "Breaking: Major earthquake hits coastal city. Rescue teams deployed."
results = extractor.classify_text(
    text,
    {
        "category": ["politics", "sports", "technology", "disaster", "business"],
        "urgency": ["low", "medium", "high"]
    }
)
# Output: {'category': 'disaster', 'urgency': 'high'}
```

## Structured Extraction

### Contact Information

```python
extractor = GLiNER2.from_api()

text = """
Contact John Smith at john.smith@email.com or call +1-555-123-4567.
He works as a Senior Engineer at TechCorp Inc.
"""

results = extractor.extract_json(
    text,
    {
        "contact": [
            "name::str::Full name of the person",
            "email::str::Email address",
            "phone::str::Phone number",
            "job_title::str::Professional title",
            "company::str::Company name"
        ]
    }
)
# Output: {
#     'contact': [{
#         'name': 'John Smith',
#         'email': 'john.smith@email.com',
#         'phone': '+1-555-123-4567',
#         'job_title': 'Senior Engineer',
#         'company': 'TechCorp Inc.'
#     }]
# }
```

### Product Information

```python
text = "iPhone 15 Pro Max - $1199, 256GB storage, Natural Titanium color"

results = extractor.extract_json(
    text,
    {
        "product": [
            "name::str",
            "price::str",
            "storage::str",
            "color::str"
        ]
    }
)
# Output: {
#     'product': [{
#         'name': 'iPhone 15 Pro Max',
#         'price': '$1199',
#         'storage': '256GB',
#         'color': 'Natural Titanium'
#     }]
# }
```

## Relation Extraction

Extract relationships between entities as directional tuples (source, target).

### Basic Relation Extraction

```python
extractor = GLiNER2.from_api()

text = "John works for Apple Inc. and lives in San Francisco. Apple Inc. is located in Cupertino."
results = extractor.extract_relations(
    text,
    ["works_for", "lives_in", "located_in"]
)
# Output: {
#     'relation_extraction': {
#         'works_for': [('John', 'Apple Inc.')],
#         'lives_in': [('John', 'San Francisco')],
#         'located_in': [('Apple Inc.', 'Cupertino')]
#     }
# }
```

### With Descriptions

```python
text = "Elon Musk founded SpaceX in 2002. SpaceX is located in Hawthorne, California."

schema = extractor.create_schema().relations({
    "founded": "Founding relationship where person created organization",
    "located_in": "Geographic relationship where entity is in a location"
})

results = extractor.extract(text, schema)
# Output: {
#     'relation_extraction': {
#         'founded': [('Elon Musk', 'SpaceX')],
#         'located_in': [('SpaceX', 'Hawthorne, California')]
#     }
# }
```

### Batch Relation Extraction

```python
texts = [
    "John works for Microsoft and lives in Seattle.",
    "Sarah founded TechStartup in 2020.",
    "Bob reports to Alice at Google."
]

results = extractor.batch_extract_relations(
    texts,
    ["works_for", "founded", "reports_to", "lives_in"]
)
# Returns list of relation extraction results for each text
```

## Combined Schemas

Combine entities, classification, relations, and structured extraction in a single call.

```python
extractor = GLiNER2.from_api()

text = """
Tech Review: The new MacBook Pro M3 is absolutely fantastic! Apple has outdone themselves.
I tested it in San Francisco last week. Tim Cook works for Apple, which is located in Cupertino.
Highly recommended for developers. Rating: 5 out of 5 stars.
"""

schema = (extractor.create_schema()
    .entities(["company", "product", "location", "person"])
    .classification("sentiment", ["positive", "negative", "neutral"])
    .relations(["works_for", "located_in"])
    .structure("review")
        .field("product_name", dtype="str")
        .field("rating", dtype="str")
        .field("recommendation", dtype="str")
)

results = extractor.extract(text, schema)
# Output: {
#     'entities': {
#         'company': ['Apple'],
#         'product': ['MacBook Pro M3'],
#         'location': ['San Francisco', 'Cupertino'],
#         'person': ['Tim Cook']
#     },
#     'sentiment': 'positive',
#     'relation_extraction': {
#         'works_for': [('Tim Cook', 'Apple')],
#         'located_in': [('Apple', 'Cupertino')]
#     },
#     'review': [{
#         'product_name': 'MacBook Pro M3',
#         'rating': '5 out of 5 stars',
#         'recommendation': 'Highly recommended for developers'
#     }]
# }
```

## Batch Processing

Process multiple texts efficiently in a single API call.

```python
extractor = GLiNER2.from_api()

texts = [
    "Google's Sundar Pichai unveiled Gemini AI in Mountain View.",
    "Microsoft CEO Satya Nadella announced Copilot at Build 2023.",
    "Amazon's Andy Jassy revealed new AWS services in Seattle."
]

results = extractor.batch_extract_entities(
    texts,
    ["company", "person", "product", "location"]
)

for i, result in enumerate(results):
    print(f"Text {i+1}: {result}")
```

## Confidence Scores and Character Positions

### Entity Extraction with Confidence

```python
# Include confidence scores
results = extractor.extract_entities(
    "Apple released the iPhone 15 in September 2023.",
    ["company", "product", "date"],
    include_confidence=True
)
# Each entity includes: {'text': '...', 'confidence': 0.95}
```

### Entity Extraction with Character Positions

```python
# Include character-level start/end positions
results = extractor.extract_entities(
    "Apple released the iPhone 15.",
    ["company", "product"],
    include_spans=True
)
# Each entity includes: {'text': '...', 'start': 0, 'end': 5}
```

### Both Confidence and Positions

```python
# Include both confidence and character positions
results = extractor.extract_entities(
    "Apple released the iPhone 15 in September 2023.",
    ["company", "product", "date"],
    include_confidence=True,
    include_spans=True
)
# Each entity includes: {'text': '...', 'confidence': 0.95, 'start': 0, 'end': 5}
```

### Raw Results (Advanced)

For full control over the extraction data:

```python
results = extractor.extract_entities(
    "Apple CEO Tim Cook announced new products.",
    ["company", "person"],
    format_results=False,  # Get raw extraction data
    include_confidence=True,
    include_spans=True
)
# Returns tuples: (text, confidence, start_char, end_char)
```

## Error Handling

```python
from gliner2 import GLiNER2, GLiNER2APIError, AuthenticationError, ValidationError

try:
    extractor = GLiNER2.from_api()
    results = extractor.extract_entities(text, entity_types)

except AuthenticationError:
    print("Invalid API key. Check your PIONEER_API_KEY.")

except ValidationError as e:
    print(f"Invalid request: {e}")

except GLiNER2APIError as e:
    print(f"API error: {e}")
```

### Connection Settings

```python
extractor = GLiNER2.from_api(
    api_key="your-key",
    timeout=60.0,      # Request timeout (seconds)
    max_retries=5      # Retry failed requests
)
```

## API vs Local

| Feature | API (`from_api()`) | Local (`from_pretrained()`) |
|---------|-------------------|----------------------------|
| Setup | Just API key | GPU/CPU + model download |
| Memory | ~0 MB | 2-8 GB+ |
| Latency | Network dependent | Faster for single texts |
| Batch | Optimized | Optimized |
| Cost | Per request | Free after setup |
| Offline | ❌ | ✅ |
| RegexValidator | ❌ | ✅ |

### When to Use API

- Production deployments without GPU
- Serverless functions (AWS Lambda, etc.)
- Quick prototyping
- Low-memory environments
- Mobile/edge applications

### When to Use Local

- High-volume processing
- Offline requirements
- Sensitive data (no network transfer)
- Need for RegexValidator
- Cost optimization at scale

## Seamless Switching

The API mirrors the local interface exactly, making switching trivial:

```python
# Development: Use API for quick iteration
extractor = GLiNER2.from_api()

# Production: Switch to local if needed
# extractor = GLiNER2.from_pretrained("your-model")

# Same code works with both!
results = extractor.extract_entities(text, entity_types)
```

## Limitations

The API currently does not support:

1. **RegexValidator** - Use local model for regex-based filtering
2. **Multi-schema batch** - Different schemas per text in batch (works but slower)
3. **Custom models** - API uses the default GLiNER2 model

## Best Practices

1. **Store API key securely** - Use environment variables, not hardcoded strings
2. **Handle errors gracefully** - Network issues can occur
3. **Use batch processing** - More efficient than individual calls
4. **Set appropriate timeouts** - Increase for large texts
5. **Cache results** - Avoid redundant API calls for same content