GLiNER2 Regex Validators

Regex validators filter extracted spans to ensure they match expected patterns, improving extraction quality and reducing false positives.

Quick Start

from gliner2 import GLiNER2, RegexValidator

extractor = GLiNER2.from_pretrained("your-model")

# Create validator and apply to field
email_validator = RegexValidator(r"^[\w\.-]+@[\w\.-]+\.\w+$")
schema = (extractor.create_schema()
    .structure("contact")
        .field("email", dtype="str", validators=[email_validator])
)

RegexValidator Parameters

pattern: Regex pattern (string or compiled Pattern)
mode: "full" (the entire span must match the pattern) or "partial" (the pattern may match anywhere in the span)
exclude: False (keep matches) or True (exclude matches)
flags: Regex flags like re.IGNORECASE (for string patterns only)
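
How mode and exclude interact can be sketched with the standard library alone. The helper below, passes, is a hypothetical stand-in and not GLiNER2's implementation; it assumes that "full" behaves like re.fullmatch and "partial" like re.search.

```python
import re

def passes(pattern, text, mode="full", exclude=False, flags=0):
    """Hypothetical stand-in for a regex validator (assumption, not GLiNER2 code)."""
    compiled = re.compile(pattern, flags)
    # Assumed semantics: "full" requires the whole span to match,
    # "partial" accepts a match anywhere in the span.
    matcher = compiled.fullmatch if mode == "full" else compiled.search
    matched = matcher(text) is not None
    # exclude=True inverts the decision: matching spans are dropped.
    return not matched if exclude else matched

print(passes(r"\d{4}", "2024"))                                     # True
print(passes(r"\d{4}", "year 2024"))                                # False
print(passes(r"\d{4}", "year 2024", mode="partial"))                # True
print(passes(r"\d{4}", "year 2024", mode="partial", exclude=True))  # False
```

Under these assumed semantics, an anchored prefix pattern such as ^https?:// only makes sense in "partial" mode, since in "full" mode it would have to cover the entire span.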

Examples

Email Validation

email_validator = RegexValidator(r"^[\w\.-]+@[\w\.-]+\.\w+$")

text = "Contact: john@company.com, not-an-email, jane@domain.org"
# Output: ['john@company.com', 'jane@domain.org']

Phone Numbers (US Format)

phone_validator = RegexValidator(r"\(\d{3}\)\s\d{3}-\d{4}", mode="partial")

text = "Call (555) 123-4567 or 5551234567"
# Output: ['(555) 123-4567']  # Second number filtered out

URLs Only

url_validator = RegexValidator(r"^https?://", mode="partial")

text = "Visit https://example.com or www.site.com"
# Output: ['https://example.com']  # www.site.com filtered out

Exclude Test Data

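import re

# mode="partial" lets the prefix match; in "full" mode the pattern would have to match the entire span
no_test_validator = RegexValidator(r"^(test|demo|sample)", mode="partial", exclude=True, flags=re.IGNORECASE)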

text = "Products: iPhone, Test Phone, Samsung Galaxy"
# Output: ['iPhone', 'Samsung Galaxy']  # Test Phone excluded

Length Constraints

length_validator = RegexValidator(r"^.{5,50}$")  # 5-50 characters

text = "Names: Jo, Alexander, A Very Long Name That Definitely Exceeds Fifty Characters"
# Output: ['Alexander']  # Others filtered by length

Multiple Validators

import re

# All validators must pass
username_validators = [
    RegexValidator(r"^[a-zA-Z0-9_]+$"),  # Alphanumeric + underscore
    RegexValidator(r"^.{3,20}$"),        # 3-20 characters
    RegexValidator(r"^admin", mode="partial", exclude=True, flags=re.IGNORECASE)  # Must not start with "admin"
]

schema = (extractor.create_schema()
    .structure("user")
        .field("username", dtype="str", validators=username_validators)
)

text = "Users: ab, john_doe, user@domain, admin, valid_user123"
# Output: ['john_doe', 'valid_user123']
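
The intent of the three username rules can be checked without a model. This is a minimal sketch using only the standard library; the checks list and is_valid helper are hypothetical names, and the candidate strings stand in for spans that the extractor is assumed to have already produced.

```python
import re

# Hypothetical predicates mirroring the three username rules.
checks = [
    lambda s: re.fullmatch(r"[a-zA-Z0-9_]+", s) is not None,  # allowed characters only
    lambda s: re.fullmatch(r".{3,20}", s) is not None,         # 3-20 characters
    lambda s: re.match(r"admin", s, re.IGNORECASE) is None,    # must not start with "admin"
]

def is_valid(span):
    # A span is kept only if every check accepts it.
    return all(check(span) for check in checks)

candidates = ["ab", "john_doe", "user@domain", "admin", "valid_user123"]
kept = [c for c in candidates if is_valid(c)]
print(kept)  # ['john_doe', 'valid_user123']
```

Here "ab" fails the length rule, "user@domain" fails the character rule, and "admin" is rejected by the exclusion rule.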

Common Patterns

Use Case	Pattern	Mode
Email	`r"^[\w\.-]+@[\w\.-]+\.\w+$"`	full
Phone (US)	`r"\(\d{3}\)\s\d{3}-\d{4}"`	partial
URL	`r"^https?://"`	partial
Numbers only	`r"^\d+$"`	full
No spaces	`r"^\S+$"`	full
Min length	`r"^.{5,}$"`	full
Alphanumeric	`r"^[a-zA-Z0-9]+$"`	full

Best Practices

Use specific patterns - More specific = fewer false positives
Test your regex - Validate patterns before deployment
Combine validators - Chain multiple simple validators
Consider case sensitivity - Use re.IGNORECASE when needed
Start simple - Begin with basic patterns, refine as needed
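
One way to test patterns before deployment is a small table of strings that should and should not match. The sketch below uses only the standard library; the CASES table and check_patterns helper are hypothetical, and the sample strings are illustrative.

```python
import re

# (pattern, strings that should match, strings that should not match)
CASES = [
    (r"^[\w\.-]+@[\w\.-]+\.\w+$", ["john@company.com"], ["not-an-email", "a@b"]),
    (r"^\d+$", ["12345"], ["12a45", ""]),
    (r"^\S+$", ["no_spaces"], ["has space"]),
    (r"^[a-zA-Z0-9]+$", ["abc123"], ["abc-123"]),
]

def check_patterns(cases):
    """Return (pattern, sample, expected_match) for every sample that misbehaves."""
    failures = []
    for pattern, good, bad in cases:
        compiled = re.compile(pattern)
        for sample in good:
            if compiled.fullmatch(sample) is None:
                failures.append((pattern, sample, True))
        for sample in bad:
            if compiled.fullmatch(sample) is not None:
                failures.append((pattern, sample, False))
    return failures

failures = check_patterns(CASES)
print(failures)  # [] when every pattern behaves as expected
```

Adding a row for each new pattern keeps regressions visible when a regex is refined later.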

Performance Notes

Validators run after span extraction but before formatting
Failed validation simply excludes the span (no errors)
Multiple validators use short-circuit evaluation (stops at first failure)
Compiled patterns are cached automatically
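
Short-circuit evaluation can be illustrated in plain Python. This sketch does not reflect GLiNER2 internals; make_check and the calls list are hypothetical, and each pattern is compiled once and reused.

```python
import re

calls = []  # records which checks actually ran

def make_check(name, pattern):
    compiled = re.compile(pattern)  # compile once, reuse for every span
    def check(span):
        calls.append(name)
        return compiled.fullmatch(span) is not None
    return check

checks = [
    make_check("charset", r"[a-z_]+"),
    make_check("length", r".{3,20}"),
]

# all() stops at the first failing check, so "length" never runs here.
result = all(check("user@domain") for check in checks)
print(result, calls)  # False ['charset']
```

Placing the cheapest or most selective check first means rejected spans cost the least work.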
