2026-03-06 12:59:32 +01:00


GLiNER2 API Extractor

Use GLiNER2 through a cloud API without loading models locally. This suits production deployments, low-memory environments, and cases where you want to start without any GPU setup.

Getting Started
Basic Usage
Entity Extraction
Text Classification
Structured Extraction
Relation Extraction
Combined Schemas
Batch Processing
Confidence Scores
Error Handling
API vs Local

Getting Started

Get Your API Key

Visit gliner.pioneer.ai
Sign up or log in to your account
Navigate to the API Keys section
Generate a new API key

Installation

pip install gliner2

Set Your API Key

Option 1: Environment Variable (Recommended)

export PIONEER_API_KEY="your-api-key-here"

Option 2: Pass Directly

from gliner2 import GLiNER2
extractor = GLiNER2.from_api(api_key="your-api-key-here")
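
The two options above can be combined in a small helper that fails early when no key is configured. This is only a minimal sketch: `resolve_api_key` is a hypothetical helper and not part of gliner2; the only name taken from the text above is the `PIONEER_API_KEY` variable.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else PIONEER_API_KEY from env.

    Hypothetical helper for illustration; not part of gliner2.
    """
    key = (explicit or env.get("PIONEER_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("No API key: set PIONEER_API_KEY or pass api_key=...")
    return key


# Demonstration with a stand-in environment mapping (no real key involved).
demo_env = {"PIONEER_API_KEY": "  demo-key  "}
print(resolve_api_key(env=demo_env))          # demo-key
print(resolve_api_key("override", demo_env))  # override
```

The returned string can then be passed as the `api_key` argument of `GLiNER2.from_api()`.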

Basic Usage

from gliner2 import GLiNER2

# Load from API (uses PIONEER_API_KEY environment variable)
extractor = GLiNER2.from_api()

# Use exactly like the local model!
results = extractor.extract_entities(
    "Apple CEO Tim Cook announced the iPhone 15 in Cupertino.",
    ["company", "person", "product", "location"]
)
print(results)
# Output: {
#     'entities': {
#         'company': ['Apple'],
#         'person': ['Tim Cook'],
#         'product': ['iPhone 15'],
#         'location': ['Cupertino']
#     }
# }

Entity Extraction

Simple Extraction

extractor = GLiNER2.from_api()

text = "Elon Musk founded SpaceX in 2002 and Tesla in 2003."
results = extractor.extract_entities(
    text,
    ["person", "company", "date"]
)
# Output: {
#     'entities': {
#         'person': ['Elon Musk'],
#         'company': ['SpaceX', 'Tesla'],
#         'date': ['2002', '2003']
#     }
# }

With Confidence Scores and Character Positions

You can include confidence scores and character-level start/end positions using include_confidence and include_spans:

# With confidence only
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn for $26.2 billion.",
    ["company", "price"],
    include_confidence=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'confidence': 0.98},
#             {'text': 'LinkedIn', 'confidence': 0.97}
#         ],
#         'price': [
#             {'text': '$26.2 billion', 'confidence': 0.95}
#         ]
#     }
# }

# With character positions (spans) only
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn.",
    ["company"],
    include_spans=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'start': 0, 'end': 9},
#             {'text': 'LinkedIn', 'start': 19, 'end': 27}
#         ]
#     }
# }

# With both confidence and spans
results = extractor.extract_entities(
    "Microsoft acquired LinkedIn for $26.2 billion.",
    ["company", "price"],
    include_confidence=True,
    include_spans=True
)
# Output: {
#     'entities': {
#         'company': [
#             {'text': 'Microsoft', 'confidence': 0.98, 'start': 0, 'end': 9},
#             {'text': 'LinkedIn', 'confidence': 0.97, 'start': 19, 'end': 27}
#         ],
#         'price': [
#             {'text': '$26.2 billion', 'confidence': 0.95, 'start': 32, 'end': 45}
#         ]
#     }
# }
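
Character positions are easy to sanity-check on the client: slicing the input with `start` and `end` should give back the entity text. The sketch below assumes 0-based, end-exclusive offsets (as the `'start': 0, 'end': 9` span for 'Microsoft' suggests). `check_spans` and `make_entity` are hypothetical helpers, and the sample result is built locally rather than returned by the API.

```python
from typing import Dict, List


def check_spans(text: str, entities: Dict[str, List[dict]]) -> List[str]:
    """Return a message for every entity whose span does not match its text."""
    problems = []
    for label, items in entities.items():
        for item in items:
            found = text[item["start"]:item["end"]]
            if found != item["text"]:
                problems.append(f"{label}: expected {item['text']!r}, got {found!r}")
    return problems


def make_entity(text: str, surface: str) -> dict:
    """Build a span dict by locating `surface` in `text` (sample data only)."""
    start = text.index(surface)
    return {"text": surface, "start": start, "end": start + len(surface)}


text = "Apple released the iPhone 15."
sample = {
    "company": [make_entity(text, "Apple")],
    "product": [make_entity(text, "iPhone 15")],
}
print(check_spans(text, sample))  # []
```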

Custom Threshold

# Only return high-confidence extractions
results = extractor.extract_entities(
    text,
    ["person", "company"],
    threshold=0.8  # Minimum 80% confidence
)
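
If you want to try several cutoffs without calling the API again, one option is to request confidence scores once and filter on the client. The sketch below works on the `include_confidence=True` output shape shown earlier. `filter_by_confidence` is a hypothetical helper, the scores are made up, and the result is not guaranteed to match what the `threshold` parameter would return.

```python
from typing import Dict, List


def filter_by_confidence(entities: Dict[str, List[dict]],
                         minimum: float) -> Dict[str, List[dict]]:
    """Keep only entities whose 'confidence' is at least `minimum`."""
    return {
        label: [e for e in items if e["confidence"] >= minimum]
        for label, items in entities.items()
    }


# Sample data in the include_confidence=True shape (scores are made up).
sample = {
    "company": [
        {"text": "Microsoft", "confidence": 0.98},
        {"text": "LinkedIn", "confidence": 0.72},
    ],
    "price": [{"text": "$26.2 billion", "confidence": 0.95}],
}
kept = filter_by_confidence(sample, 0.8)
print(kept["company"])  # [{'text': 'Microsoft', 'confidence': 0.98}]
```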

Text Classification

Single-Label Classification

extractor = GLiNER2.from_api()

text = "I absolutely love this product! It exceeded all my expectations."
results = extractor.classify_text(
    text,
    {"sentiment": ["positive", "negative", "neutral"]}
)
# Output: {'sentiment': 'positive'}

Multi-Task Classification

text = "Breaking: Major earthquake hits coastal city. Rescue teams deployed."
results = extractor.classify_text(
    text,
    {
        "category": ["politics", "sports", "technology", "disaster", "business"],
        "urgency": ["low", "medium", "high"]
    }
)
# Output: {'category': 'disaster', 'urgency': 'high'}

Structured Extraction

Contact Information

extractor = GLiNER2.from_api()

text = """
Contact John Smith at john.smith@email.com or call +1-555-123-4567.
He works as a Senior Engineer at TechCorp Inc.
"""

results = extractor.extract_json(
    text,
    {
        "contact": [
            "name::str::Full name of the person",
            "email::str::Email address",
            "phone::str::Phone number",
            "job_title::str::Professional title",
            "company::str::Company name"
        ]
    }
)
# Output: {
#     'contact': [{
#         'name': 'John Smith',
#         'email': 'john.smith@email.com',
#         'phone': '+1-555-123-4567',
#         'job_title': 'Senior Engineer',
#         'company': 'TechCorp Inc.'
#     }]
# }

Product Information

text = "iPhone 15 Pro Max - $1199, 256GB storage, Natural Titanium color"

results = extractor.extract_json(
    text,
    {
        "product": [
            "name::str",
            "price::str",
            "storage::str",
            "color::str"
        ]
    }
)
# Output: {
#     'product': [{
#         'name': 'iPhone 15 Pro Max',
#         'price': '$1199',
#         'storage': '256GB',
#         'color': 'Natural Titanium'
#     }]
# }
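
The field strings in these examples follow a `name::dtype::description` pattern, with the description optional. To make that pattern concrete, here is a small parser for exactly the two shapes used above. `FieldSpec` and `parse_field_spec` are hypothetical names. This is not how gliner2 parses fields internally, and the library may accept other forms.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    name: str
    dtype: str
    description: str


def parse_field_spec(spec: str) -> FieldSpec:
    """Split 'name::dtype::description' into parts; description is optional."""
    parts = spec.split("::", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Expected 'name::dtype[::description]', got {spec!r}")
    description = parts[2] if len(parts) == 3 else ""
    return FieldSpec(parts[0], parts[1], description)


print(parse_field_spec("email::str::Email address"))
print(parse_field_spec("price::str"))
```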

Relation Extraction

Extract relationships between entities as directional tuples (source, target).

Basic Relation Extraction

extractor = GLiNER2.from_api()

text = "John works for Apple Inc. and lives in San Francisco. Apple Inc. is located in Cupertino."
results = extractor.extract_relations(
    text,
    ["works_for", "lives_in", "located_in"]
)
# Output: {
#     'relation_extraction': {
#         'works_for': [('John', 'Apple Inc.')],
#         'lives_in': [('John', 'San Francisco')],
#         'located_in': [('Apple Inc.', 'Cupertino')]
#     }
# }

With Descriptions

text = "Elon Musk founded SpaceX in 2002. SpaceX is located in Hawthorne, California."

schema = extractor.create_schema().relations({
    "founded": "Founding relationship where person created organization",
    "located_in": "Geographic relationship where entity is in a location"
})

results = extractor.extract(text, schema)
# Output: {
#     'relation_extraction': {
#         'founded': [('Elon Musk', 'SpaceX')],
#         'located_in': [('SpaceX', 'Hawthorne, California')]
#     }
# }
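
Downstream code such as a graph builder or a CSV export often wants flat (source, relation, target) triples instead of a dictionary keyed by relation name. A minimal sketch, assuming the output shape shown above; `to_triples` is a hypothetical helper and the sample is copied from the example output, not fetched from the API.

```python
from typing import Dict, List, Tuple


def to_triples(relations: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Flatten {relation: [(source, target), ...]} into (source, relation, target)."""
    return [
        (source, relation, target)
        for relation, pairs in relations.items()
        for source, target in pairs
    ]


sample = {
    "relation_extraction": {
        "founded": [("Elon Musk", "SpaceX")],
        "located_in": [("SpaceX", "Hawthorne, California")],
    }
}
for triple in to_triples(sample["relation_extraction"]):
    print(triple)
```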

Batch Relation Extraction

texts = [
    "John works for Microsoft and lives in Seattle.",
    "Sarah founded TechStartup in 2020.",
    "Bob reports to Alice at Google."
]

results = extractor.batch_extract_relations(
    texts,
    ["works_for", "founded", "reports_to", "lives_in"]
)
# Returns a list with one relation extraction result per input text

Combined Schemas

Combine entities, classification, relations, and structured extraction in a single call.

extractor = GLiNER2.from_api()

text = """
Tech Review: The new MacBook Pro M3 is absolutely fantastic! Apple has outdone themselves.
I tested it in San Francisco last week. Tim Cook works for Apple, which is located in Cupertino.
Highly recommended for developers. Rating: 5 out of 5 stars.
"""

schema = (extractor.create_schema()
    .entities(["company", "product", "location", "person"])
    .classification("sentiment", ["positive", "negative", "neutral"])
    .relations(["works_for", "located_in"])
    .structure("review")
        .field("product_name", dtype="str")
        .field("rating", dtype="str")
        .field("recommendation", dtype="str")
)

results = extractor.extract(text, schema)
# Output: {
#     'entities': {
#         'company': ['Apple'],
#         'product': ['MacBook Pro M3'],
#         'location': ['San Francisco', 'Cupertino'],
#         'person': ['Tim Cook']
#     },
#     'sentiment': 'positive',
#     'relation_extraction': {
#         'works_for': [('Tim Cook', 'Apple')],
#         'located_in': [('Apple', 'Cupertino')]
#     },
#     'review': [{
#         'product_name': 'MacBook Pro M3',
#         'rating': '5 out of 5 stars',
#         'recommendation': 'Highly recommended for developers'
#     }]
# }

Batch Processing

Process multiple texts efficiently in a single API call.

extractor = GLiNER2.from_api()

texts = [
    "Google's Sundar Pichai unveiled Gemini AI in Mountain View.",
    "Microsoft CEO Satya Nadella announced Copilot at Build 2023.",
    "Amazon's Andy Jassy revealed new AWS services in Seattle."
]

results = extractor.batch_extract_entities(
    texts,
    ["company", "person", "product", "location"]
)

for i, result in enumerate(results):
    print(f"Text {i+1}: {result}")
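
For large collections it can help to split the input into fixed-size batches and send one batch per call. The sketch below shows only the chunking step. `chunked` is a hypothetical helper, and the batch size of 3 is an arbitrary illustration, not a documented API limit.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


texts = [f"Document number {i}" for i in range(7)]
batches = list(chunked(texts, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Each batch could then be passed to `extractor.batch_extract_entities(batch, labels)` and the per-batch result lists concatenated in order.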

Confidence Scores and Character Positions

Entity Extraction with Confidence

# Include confidence scores
results = extractor.extract_entities(
    "Apple released the iPhone 15 in September 2023.",
    ["company", "product", "date"],
    include_confidence=True
)
# Each entity includes: {'text': '...', 'confidence': 0.95}

Entity Extraction with Character Positions

# Include character-level start/end positions
results = extractor.extract_entities(
    "Apple released the iPhone 15.",
    ["company", "product"],
    include_spans=True
)
# Each entity includes: {'text': '...', 'start': 0, 'end': 5}

Both Confidence and Positions

# Include both confidence and character positions
results = extractor.extract_entities(
    "Apple released the iPhone 15 in September 2023.",
    ["company", "product", "date"],
    include_confidence=True,
    include_spans=True
)
# Each entity includes: {'text': '...', 'confidence': 0.95, 'start': 0, 'end': 5}

Raw Results (Advanced)

For full control over the extraction data:

results = extractor.extract_entities(
    "Apple CEO Tim Cook announced new products.",
    ["company", "person"],
    format_results=False,  # Get raw extraction data
    include_confidence=True,
    include_spans=True
)
# Returns tuples: (text, confidence, start_char, end_char)
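
Positional tuples are easy to misread, so it can be worth giving the fields names. The sketch below wraps a single raw tuple in the order stated above (text, confidence, start_char, end_char). `RawEntity` is a hypothetical type, the tuple is constructed locally with a made-up score, and how the raw tuples are grouped in the returned object is not covered here.

```python
from typing import NamedTuple


class RawEntity(NamedTuple):
    text: str
    confidence: float
    start_char: int
    end_char: int


source = "Apple CEO Tim Cook announced new products."
start = source.index("Tim Cook")
raw = ("Tim Cook", 0.97, start, start + len("Tim Cook"))  # made-up score

entity = RawEntity(*raw)
print(entity.text, entity.start_char, entity.end_char)
print(entity._asdict())
```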

Error Handling

from gliner2 import GLiNER2, GLiNER2APIError, AuthenticationError, ValidationError

text = "Apple CEO Tim Cook announced new products."
entity_types = ["company", "person"]

# Handle the specific errors first and the general API error last
try:
    extractor = GLiNER2.from_api()
    results = extractor.extract_entities(text, entity_types)

except AuthenticationError:
    print("Invalid API key. Check your PIONEER_API_KEY.")

except ValidationError as e:
    print(f"Invalid request: {e}")

except GLiNER2APIError as e:
    print(f"API error: {e}")

Connection Settings

extractor = GLiNER2.from_api(
    api_key="your-key",
    timeout=60.0,      # Request timeout (seconds)
    max_retries=5      # Retry failed requests
)
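
To show what a retry setting usually means in practice, here is a generic retry loop with exponential backoff. It is an illustration only and not how gliner2 implements `max_retries`. `call_with_retries` is a hypothetical helper, and the choice of `ConnectionError` and `TimeoutError` as retryable errors is an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a retryable error wait base_delay * 2**attempt and retry."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demonstration: a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
print(call_with_retries(flaky, max_retries=5, sleep=delays.append))  # ok
print(delays)  # [0.5, 1.0]
```

Errors that will not resolve on their own, such as an invalid API key, are deliberately left out of `retry_on` so they surface immediately.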

API vs Local

Feature	API (`from_api()`)	Local (`from_pretrained()`)
Setup	Just API key	GPU/CPU + model download
Memory	~0 MB	2-8 GB+
Latency	Network dependent	Faster for single texts
Batch	Optimized	Optimized
Cost	Per request	Free after setup
Offline	❌	✅
RegexValidator	❌	✅

When to Use API

Production deployments without GPU
Serverless functions (AWS Lambda, etc.)
Quick prototyping
Low-memory environments
Mobile/edge applications

When to Use Local

High-volume processing
Offline requirements
Sensitive data (no network transfer)
Need for RegexValidator
Cost optimization at scale

Seamless Switching

The API mirrors the local interface for the methods shown above, so switching is a one-line change (see Limitations for the exceptions):

# Development: Use API for quick iteration
extractor = GLiNER2.from_api()

# Production: Switch to local if needed
# extractor = GLiNER2.from_pretrained("your-model")

# Same code works with both!
results = extractor.extract_entities(text, entity_types)

Limitations

The API currently has the following limitations:

RegexValidator - Not supported; use the local model for regex-based filtering
Multi-schema batch - Using a different schema per text in a batch works, but is slower
Custom models - Not supported; the API uses the default GLiNER2 model

Best Practices

Store API key securely - Use environment variables, not hardcoded strings
Handle errors gracefully - Network issues can occur
Use batch processing - More efficient than individual calls
Set appropriate timeouts - Increase for large texts
Cache results - Avoid redundant API calls for same content
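
The caching advice can be sketched as a thin in-memory wrapper keyed on the text and the label list. This is a minimal illustration: `cache_key` and `CachedExtractor` are hypothetical names, the extractor is replaced by a local stand-in, and sorting the labels assumes that label order does not change the result.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(text: str, labels: List[str]) -> str:
    """Stable key for a (text, labels) pair; assumes label order is irrelevant."""
    payload = json.dumps({"text": text, "labels": sorted(labels)}, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedExtractor:
    """Wrap any extract(text, labels) callable with an in-memory cache."""

    def __init__(self, extract: Callable[[str, List[str]], dict]) -> None:
        self._extract = extract
        self._cache: Dict[str, dict] = {}

    def __call__(self, text: str, labels: List[str]) -> dict:
        key = cache_key(text, labels)
        if key not in self._cache:
            self._cache[key] = self._extract(text, labels)
        return self._cache[key]


# Stand-in for a real API call so the example runs offline.
calls = []


def fake_extract(text: str, labels: List[str]) -> dict:
    calls.append(text)
    return {"entities": {label: [] for label in labels}}


cached = CachedExtractor(fake_extract)
cached("Apple released the iPhone 15.", ["company", "product"])
cached("Apple released the iPhone 15.", ["product", "company"])
print(len(calls))  # 1
```

In real use, `extractor.extract_entities` could be passed in place of `fake_extract`.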
