Extractions are the results of document processing. Each extraction contains the structured data extracted from a document, along with metadata about the extraction process. You can list, filter, retrieve, update, and download extractions programmatically.
Retrieve a paginated list of extractions with optional filtering by date, origin, review status, or custom metadata.
```python
from datetime import datetime

from retab import Retab

client = Retab()

# List recent extractions
extractions = client.extractions.list(
    limit=10,
    order="desc",
)

# Filter by metadata
extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=50,
)

# Filter by date range
extractions = client.extractions.list(
    from_date=datetime(2024, 1, 1),
    to_date=datetime(2024, 12, 31),
)
```
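Both date filters are inclusive ("on or after", "on or before"). This section does not say whether the API compares full timestamps or calendar dates. If it compares timestamps, then to_date=datetime(2024, 12, 31) means midnight at the start of December 31, and extractions created later that day would fall outside the range. The sketch below mirrors the inclusive comparison locally and shows one way to build an end-of-day bound. end_of_day and in_range are hypothetical helpers, not part of the retab SDK.

```python
from datetime import datetime, time
from typing import Optional


def end_of_day(day: datetime) -> datetime:
    """Hypothetical helper: last representable instant of the given calendar day."""
    return datetime.combine(day.date(), time.max, tzinfo=day.tzinfo)


def in_range(created_at: datetime,
             from_date: Optional[datetime] = None,
             to_date: Optional[datetime] = None) -> bool:
    """Inclusive on both ends, mirroring "on or after" / "on or before"."""
    if from_date is not None and created_at < from_date:
        return False
    if to_date is not None and created_at > to_date:
        return False
    return True


created = datetime(2024, 12, 31, 15, 30)
# Midnight upper bound excludes an afternoon timestamp on the same day.
print(in_range(created, datetime(2024, 1, 1), datetime(2024, 12, 31)))
# An end-of-day upper bound includes it.
print(in_range(created, datetime(2024, 1, 1), end_of_day(datetime(2024, 12, 31))))
```

If the API does compare timestamps, passing to_date=end_of_day(datetime(2024, 12, 31)) to client.extractions.list would cover the whole final day.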
Parameters
Maximum number of extractions to return per page.
Sort order by creation date. Either "asc" or "desc".
Pagination cursor: return only extractions that come before this ID.
Pagination cursor: return only extractions that come after this ID.
Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.
Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.
Filter by custom metadata key-value pairs.
Filter by review status: "success", "review_required", or "reviewed".
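The cursor parameters let you walk through results one page at a time. The sketch below shows the usual cursor-pagination loop: request a page, yield its items, and then pass the last item's ID as the cursor for the next request. The loop stops when a page comes back short.

The sketch is deliberately decoupled from the SDK. fetch_page is a hypothetical callable you would implement around client.extractions.list, because this section does not name the cursor parameters or describe the shape of the list response. The "id" key on each item is also an assumption. The in-memory fake_fetch exists only to make the example runnable.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical signature: fetch_page(cursor, limit) returns up to `limit` items
# that follow `cursor` in sort order (or start from the beginning when cursor is None).
FetchPage = Callable[[Optional[str], int], List[dict]]


def iterate_all(fetch_page: FetchPage, page_size: int = 50) -> Iterator[dict]:
    """Yield every item across pages using cursor-based pagination."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page means no more results
            return
        cursor = page[-1]["id"]  # assumes each item carries its ID under "id"


# In-memory stand-in for the API, only to make the sketch runnable.
_ITEMS = [{"id": f"extr_{i:03d}"} for i in range(7)]


def fake_fetch(cursor: Optional[str], limit: int) -> List[dict]:
    ids = [item["id"] for item in _ITEMS]
    start = 0 if cursor is None else ids.index(cursor) + 1
    return _ITEMS[start:start + limit]


print([item["id"] for item in iterate_all(fake_fetch, page_size=3)])
```

Stopping on a short page avoids one extra request in most cases. When the total is an exact multiple of page_size, the loop makes one final request that returns an empty page.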
Retrieve a single extraction by its ID.
```python
from retab import Retab

client = Retab()

extraction = client.extractions.get("extr_01G34H8J2K")
print(extraction)
```
Update an extraction’s predictions, review status, or other properties.
```python
from retab import Retab

client = Retab()

# Update predictions after human review
updated = client.extractions.update(
    extraction_id="extr_01G34H8J2K",
    predictions={
        "invoice_number": "INV-2024-0789-CORRECTED",
        "total_amount": 1576.75,
    },
    human_review_status="reviewed",
)
```
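Before calling update, it can help to merge a reviewer's corrections into the existing predictions and record which fields actually changed, for example for an audit trail. The helper below is a hypothetical sketch in plain Python. This section does not say whether update merges predictions with the stored values or replaces them outright, so the sketch takes the cautious route of producing the full merged dictionary. The original values are illustrative only.

```python
from typing import Any, Dict, List, Tuple


def apply_corrections(predictions: Dict[str, Any],
                      corrections: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (merged predictions, sorted keys whose value actually changed)."""
    merged = {**predictions, **corrections}  # corrections win on conflicting keys
    changed = sorted(
        key for key, value in corrections.items()
        if key not in predictions or predictions[key] != value
    )
    return merged, changed


original = {"invoice_number": "INV-2024-0789", "total_amount": 1576.75, "currency": "USD"}
merged, changed = apply_corrections(
    original,
    {"invoice_number": "INV-2024-0789-CORRECTED", "total_amount": 1576.75},
)
print(changed)  # ['invoice_number']
# `merged` could then be sent as predictions=merged in client.extractions.update(...)
```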
Download extractions in bulk as a JSONL, CSV, or XLSX file. The call returns a download URL and its expiry time rather than the file contents.
```python
from datetime import datetime

from retab import Retab

client = Retab()

# Get download URL for JSONL export
result = client.extractions.download(
    format="jsonl",
    from_date=datetime(2024, 1, 1),
    metadata={"organization_id": "org_acme_corp"},
)
print(f"Download URL: {result['download_url']}")
print(f"Expires at: {result['expires_at']}")
```
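Because download returns a link rather than the data, a second step has to fetch the file before expires_at. JSONL stores one JSON object per line, so the file can be parsed line by line. The sketch below uses only the standard library (urllib.request.urlopen and json). It assumes the URL can be fetched with a plain GET and that the file is UTF-8, and it makes no assumption about which fields each record contains. fetch_jsonl and parse_jsonl are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, List
from urllib.request import urlopen


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines: one JSON object per line, blank lines skipped."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def fetch_jsonl(download_url: str) -> List[dict]:
    """Download the export (assumed UTF-8, plain GET) and parse every record."""
    with urlopen(download_url) as response:
        text = response.read().decode("utf-8")
    return list(parse_jsonl(text.splitlines()))


# Offline usage with an inline sample. With the SDK you would call
# fetch_jsonl(result["download_url"]) before the link expires.
sample = '{"id": "a"}\n\n{"id": "b"}\n'
print([record["id"] for record in parse_jsonl(sample.splitlines())])
```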
Download Parameters
Export format: "jsonl", "csv", or "xlsx".
Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.
Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.
Filter by custom metadata key-value pairs.
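list and download accept the same date and metadata filters. To keep a listing and its export consistent, you can build the filter once and unpack it into both calls. ExtractionFilter is a hypothetical helper, not part of the SDK. The keyword names from_date, to_date, and metadata come from the examples in this section. That download accepts to_date under the same name as list is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ExtractionFilter:
    """Hypothetical helper: one set of filters shared by list() and download()."""
    from_date: Optional[datetime] = None
    to_date: Optional[datetime] = None
    metadata: Dict[str, str] = field(default_factory=dict)

    def as_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments for the SDK call, omitting filters that are unset."""
        kwargs: Dict[str, Any] = {}
        if self.from_date is not None:
            kwargs["from_date"] = self.from_date
        if self.to_date is not None:
            kwargs["to_date"] = self.to_date
        if self.metadata:
            kwargs["metadata"] = dict(self.metadata)
        return kwargs


acme_2024 = ExtractionFilter(
    from_date=datetime(2024, 1, 1),
    metadata={"organization_id": "org_acme_corp"},
)
print(acme_2024.as_kwargs())

# With the SDK (not executed here):
# client.extractions.list(limit=100, **acme_2024.as_kwargs())
# client.extractions.download(format="csv", **acme_2024.as_kwargs())
```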
Metadata filtering is useful for organizing extractions across multiple clients or workflows. Any metadata keys you attach when running an extraction can later be used as filters in client.extractions.list and client.extractions.download.
```python
from retab import Retab

client = Retab()

# List all extractions for a specific organization
org_extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=100,
)

# Download all extractions from a specific batch
batch_download = client.extractions.download(
    format="csv",
    metadata={"batch_id": "batch_2024_04"},
)
```
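Conceptually, a metadata filter selects extractions whose stored metadata contains the key-value pairs in the filter. The examples above only use single-key filters, so two behaviours are assumptions here: that the API combines multiple keys with AND, and that values must match exactly. The local sketch below implements that reading. It can be handy for double-checking results or for filtering records already loaded from an export. matches_metadata, filter_by_metadata, and the sample records are hypothetical.

```python
from typing import Iterable, List, Mapping


def matches_metadata(metadata: Mapping[str, str], wanted: Mapping[str, str]) -> bool:
    """True if every key in `wanted` is present in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in wanted.items())


def filter_by_metadata(records: Iterable[dict], wanted: Mapping[str, str]) -> List[dict]:
    # Records without a "metadata" field are treated as having empty metadata (assumption).
    return [r for r in records if matches_metadata(r.get("metadata") or {}, wanted)]


records = [
    {"id": "e1", "metadata": {"organization_id": "org_acme_corp", "batch_id": "batch_2024_04"}},
    {"id": "e2", "metadata": {"organization_id": "org_acme_corp", "batch_id": "batch_2024_05"}},
    {"id": "e3", "metadata": {"organization_id": "org_other"}},
]
print([r["id"] for r in filter_by_metadata(records, {"organization_id": "org_acme_corp"})])
print([r["id"] for r in filter_by_metadata(
    records, {"organization_id": "org_acme_corp", "batch_id": "batch_2024_04"})])
```

Under this reading, adding keys narrows the result set. An empty filter matches every record.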
See the API Reference for complete method documentation.