Files - Retab Docs

Introduction

The Files API lets you upload, manage, and retrieve documents stored in Retab. Files are the foundation of document processing: once uploaded, a file can be reused across classify, split, extract, parse, and workflow calls without sending the bytes again. The module exposes four methods:

Method	Purpose
`upload`	Upload a document and receive a durable `MIMEData` reference for future requests.
`list`	List uploaded files with pagination, filename prefix search, and MIME type filtering.
`get`	Retrieve metadata for a single file by ID.
`get_download_link`	Get a temporary signed URL (60 min) to download the original file.

Uploading files

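SDK uploads use a direct-to-storage flow: the SDK creates an upload session, uploads the bytes to the signed storage URL, and then completes the upload and returns a `MIMEData` reference. The example below shows the first step, creating an upload session for a local file with `client.files.create_upload`.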

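from pathlib import Path

from retab import Retab

# Pass api_key="..." if the key is not configured elsewhere.
client = Retab()

# Create an upload session for a local file.
invoice_path = Path("invoice.pdf")
session = client.files.create_upload(
    filename=invoice_path.name,
    size_bytes=invoice_path.stat().st_size,  # exact file size in bytes
    content_type="application/pdf",
)

# The session carries the MIMEData reference. It can be used in processing
# requests only after the bytes are uploaded and the upload is completed.
mime_data = session.mime_data
print(f"Filename: {mime_data.filename}")
print(f"URL: {mime_data.url}")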

The returned `url` has the form `https://storage.retab.com/file_...`. It is an opaque Retab URL, not a public signed URL, and can be passed to later processing requests without sending the file bytes again.
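To make the three steps and their ordering concrete, here is a hypothetical in-memory model of the flow. `FakeStorage`, `UploadSession`, `put_bytes`, and `complete` are illustrative names, not part of the Retab SDK, which performs these steps over HTTP for you. The model reflects the rule from the security model below that a reference to a file that is not fully uploaded is rejected; rejecting a byte count that differs from the declared `size_bytes` is this sketch's own assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class UploadState(Enum):
    CREATED = "created"      # session exists, no bytes yet
    UPLOADED = "uploaded"    # bytes written to the signed storage URL
    COMPLETED = "completed"  # upload finalized; the reference is usable


@dataclass
class UploadSession:
    """Hypothetical model of one direct-to-storage upload session."""

    file_id: str
    filename: str
    size_bytes: int
    state: UploadState = UploadState.CREATED
    data: bytes = b""

    @property
    def url(self) -> str:
        # Opaque reference in the documented https://storage.retab.com/file_... form.
        return f"https://storage.retab.com/{self.file_id}"

    def put_bytes(self, data: bytes) -> None:
        # Step 2: send the bytes to storage.
        if self.state is not UploadState.CREATED:
            raise RuntimeError("bytes already uploaded")
        if len(data) != self.size_bytes:  # sketch assumption, see lead-in
            raise ValueError("byte count differs from the declared size_bytes")
        self.data = data
        self.state = UploadState.UPLOADED

    def complete(self) -> None:
        # Step 3: finalize; only now may the reference be used.
        if self.state is not UploadState.UPLOADED:
            raise RuntimeError("cannot complete before the bytes are uploaded")
        self.state = UploadState.COMPLETED


@dataclass
class FakeStorage:
    """Hypothetical store that resolves only fully uploaded files."""

    sessions: dict = field(default_factory=dict)

    def create_upload(self, filename: str, size_bytes: int) -> UploadSession:
        # Step 1: create the upload session and reserve a file ID.
        file_id = f"file_{len(self.sessions) + 1:06d}"
        session = UploadSession(file_id, filename, size_bytes)
        self.sessions[file_id] = session
        return session

    def resolve(self, url: str) -> bytes:
        # Missing or incomplete uploads are rejected.
        file_id = url.rsplit("/", 1)[-1]
        session = self.sessions.get(file_id)
        if session is None or session.state is not UploadState.COMPLETED:
            raise LookupError(f"file not available: {file_id}")
        return session.data
```

In this model, calling `storage.resolve(session.url)` before `complete()` raises `LookupError`, which is why a freshly created session's URL should not be passed to processing calls yet.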

Large documents: avoid inline uploads

When you pass a local file path directly to an SDK processing call, the SDK may send the document as inline MIME/base64 data. This is convenient for small files, but large scanned PDFs can make the request body too large and trigger 413 Request Entity Too Large. For large documents, use one of these URL-backed flows instead:

Preferred: use your own object-storage URL. Retab fetches the file server-side, so the document bytes are not sent inline in the API request. Use a time-limited signed URL when the object is private.
Alternative: upload to Retab first. The SDK uploads the file once, then you pass the returned Retab storage URL to classify, split, extract, parse, or workflow calls.

URL-backed remote documents are streamed into Retab storage and capped at 2 GiB (2,147,483,648 bytes) per document.
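The cap works out to 2 × 1024³ = 2,147,483,648 bytes, and base64 encoding inflates an inline payload by roughly 4/3, so an inline request grows faster than the file itself. A client could check a local file before choosing a flow, as in the sketch below. `choose_upload_flow` is a hypothetical helper, and `INLINE_LIMIT_BYTES` is an illustrative threshold picked for the example: this page does not state the size at which inline uploads start returning 413.

```python
import os

# Documented cap for URL-backed remote documents: 2 GiB.
MAX_REMOTE_BYTES = 2 * 1024**3  # 2,147,483,648 bytes

# Hypothetical threshold above which this sketch avoids inline (base64) uploads.
INLINE_LIMIT_BYTES = 10 * 1024**2  # 10 MiB, an illustrative value only


def base64_size(n_bytes: int) -> int:
    # Padded base64 encodes every 3 input bytes as 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


def choose_upload_flow(size_bytes: int) -> str:
    """Return 'inline' or 'url', or raise if the document exceeds the 2 GiB cap."""
    if size_bytes > MAX_REMOTE_BYTES:
        raise ValueError(f"{size_bytes} bytes exceeds the 2 GiB per-document cap")
    if size_bytes <= INLINE_LIMIT_BYTES:
        return "inline"
    return "url"  # object-storage URL, or upload to Retab first


def choose_for_path(path: str) -> str:
    # Convenience wrapper for a local file.
    return choose_upload_flow(os.path.getsize(path))
```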

Option 1: object-storage URL

Pass an HTTPS URL from object storage directly as the document. Supported remote URL hosts include:

Provider	Supported URL shape
Azure Blob Storage	`https://<account>.blob.core.windows.net/...`
Google Cloud Storage	`https://storage.googleapis.com/...` or `https://<bucket>.storage.googleapis.com/...`
Amazon S3	`https://<bucket>.s3.<region>.amazonaws.com/...` or other `amazonaws.com` S3 URLs
Cloudflare R2	`https://<account>.r2.cloudflarestorage.com/...` and public `https://<public-id>.r2.dev/...` URLs

Custom domains are not fetched by default. Contact support if you need a custom storage hostname allowlisted. For private files, generate a signed URL with enough time for Retab to fetch the document.
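A client-side pre-check can catch unsupported hosts before a request is sent. The sketch below encodes the URL shapes from the table above as host patterns; it is hypothetical (`is_supported_storage_url` is not an SDK function), and Retab's server-side allowlist remains authoritative, for example for custom domains that support has allowlisted.

```python
import re
from urllib.parse import urlsplit

# Host patterns mirroring the supported URL shapes table (sketch, not authoritative).
_SUPPORTED_HOST_PATTERNS = [
    re.compile(r"^[a-z0-9]+\.blob\.core\.windows\.net$"),            # Azure Blob Storage
    re.compile(r"^storage\.googleapis\.com$"),                       # GCS path style
    re.compile(r"^[a-z0-9._-]+\.storage\.googleapis\.com$"),         # GCS bucket subdomain
    re.compile(r"^([a-z0-9.-]+\.)?s3[a-z0-9.-]*\.amazonaws\.com$"),  # Amazon S3
    re.compile(r"^[a-z0-9]+\.r2\.cloudflarestorage\.com$"),          # Cloudflare R2
    re.compile(r"^[a-z0-9-]+\.r2\.dev$"),                            # public R2 URLs
]


def is_supported_storage_url(url: str) -> bool:
    """Return True if url uses HTTPS and its host matches one of the table's shapes."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname.lower()  # urlsplit already lowercases; kept for clarity
    return any(pattern.match(host) for pattern in _SUPPORTED_HOST_PATTERNS)
```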

from retab import Retab

client = Retab(api_key="YOUR_RETAB_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
}

# Signed (SAS) URL for a private Azure blob; Retab fetches the file server-side.
azure_blob_url = "https://<account>.blob.core.windows.net/<container>/large_document.pdf?<sas_token>"

extraction = client.extractions.create(
    document=azure_blob_url,
    model="retab-small",
    json_schema=schema,
)

print(extraction.output)

Option 2: upload to Retab, then reuse the URL

If you do not have an object-storage URL available, upload the file to Retab first and use the returned `mime_ref.url`.

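from pathlib import Path

from retab import Retab

client = Retab(api_key="YOUR_RETAB_API_KEY")

# Create an upload session sized to the actual file.
doc_path = Path("large_document.pdf")
session = client.files.create_upload(
    filename=doc_path.name,
    size_bytes=doc_path.stat().st_size,
    content_type="application/pdf",
)
# The upload must be completed before the reference is used;
# requests for files that are not fully uploaded are rejected.
mime_ref = session.mime_data

extraction = client.extractions.create(
    document=mime_ref.url,
    model="retab-small",
    json_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
        },
    },
)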

You can also pass document=mime_ref directly. Passing mime_ref.url is equivalent for Retab storage URLs: the backend parses the file ID and resolves it against the authenticated caller’s organization before processing.
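Passing the URL works because it encodes the file ID. A hypothetical client-side helper that recovers the ID is sketched below; it assumes the ID is the final path segment and carries the `file_` prefix, as in the documented `https://storage.retab.com/file_...` form. It performs no organization check; only the backend does that.

```python
from typing import Optional
from urllib.parse import urlsplit

RETAB_STORAGE_HOST = "storage.retab.com"


def retab_file_id(url: str) -> Optional[str]:
    """Return the file_... ID from a Retab storage URL, or None for any other URL."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname != RETAB_STORAGE_HOST:
        return None
    segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: the ID is the last path segment and starts with file_.
    if segment.startswith("file_") and len(segment) > len("file_"):
        return segment
    return None
```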

Security model

Signed object-storage URLs are bearer URLs controlled by you: anyone holding one can fetch the object until it expires. Keep them time-limited and scoped to the single document being processed. Public object-storage URLs, such as public Cloudflare R2 `r2.dev` URLs, can also be fetched, but the URL itself does not restrict access.

Retab storage URLs such as `https://storage.retab.com/file_...` are different: they are opaque Retab file references, not public download links. Retab resolves the file ID against the authenticated caller’s organization and rejects the request if the file is missing, belongs to another organization, or is not fully uploaded.
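To keep signed URLs time-limited in practice, you can check their remaining lifetime before passing them to Retab. The sketch below reads the standard expiry parameters of Azure SAS tokens (`se`), AWS Signature Version 4 presigned URLs (`X-Amz-Date` plus `X-Amz-Expires`), and Google Cloud Storage V4 signed URLs (`X-Goog-Date` plus `X-Goog-Expires`). The helper names and the minimum/maximum lifetime policy are hypothetical; choose thresholds that suit your own setup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def _parse_compact(ts: str) -> datetime:
    # SigV4-style timestamps, e.g. 20240115T103000Z
    return datetime.strptime(ts, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Return the expiry time encoded in a signed storage URL, or None if absent."""
    query = parse_qs(urlsplit(url).query)

    def first(name: str) -> Optional[str]:
        values = query.get(name)
        return values[0] if values else None

    # Azure SAS: 'se' is the signed expiry in ISO 8601 (UTC).
    se = first("se")
    if se:
        expiry = datetime.fromisoformat(se.replace("Z", "+00:00"))
        return expiry if expiry.tzinfo else expiry.replace(tzinfo=timezone.utc)

    # AWS SigV4 and GCS V4: signing time plus a lifetime in seconds.
    for date_key, expires_key in (("X-Amz-Date", "X-Amz-Expires"),
                                  ("X-Goog-Date", "X-Goog-Expires")):
        signed_at, expires = first(date_key), first(expires_key)
        if signed_at and expires:
            return _parse_compact(signed_at) + timedelta(seconds=int(expires))
    return None


def check_lifetime(url: str, now: datetime,
                   min_left: timedelta, max_left: timedelta) -> bool:
    """True if the URL expires neither too soon nor too far in the future."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return False  # unsigned or unrecognized: treat as not time-limited
    return min_left <= expiry - now <= max_left
```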

The file data structure

File Object

A File object has the following properties:

Field	Type	Description
`id`	string	Unique file identifier, prefixed with `file_`.
`object`	string	Always `"file"`.
`filename`	string	The original filename of the uploaded document.
`page_count`	integer | null	Number of pages in the document, or null when not applicable.
`created_at`	string	ISO 8601 timestamp of when the file was uploaded.
`updated_at`	string	ISO 8601 timestamp of the last update.

Example File Object

{
  "id": "file_a1b2c3d4e5f6",
  "object": "file",
  "filename": "invoice.pdf",
  "page_count": 3,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:00Z"
}
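If you store file metadata in your own code, you could mirror the fields above in a small typed structure. `FileRecord` is a hypothetical local type, not the SDK's model class; it validates the `object` and `id` conventions described above and parses the ISO 8601 timestamps into timezone-aware datetimes.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _parse_iso(ts: str) -> datetime:
    # fromisoformat accepts a trailing 'Z' only from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


@dataclass(frozen=True)
class FileRecord:
    id: str
    filename: str
    page_count: Optional[int]
    created_at: datetime
    updated_at: datetime

    @classmethod
    def from_dict(cls, data: dict) -> "FileRecord":
        if data.get("object") != "file":
            raise ValueError("not a File object")
        if not data["id"].startswith("file_"):
            raise ValueError("file IDs are prefixed with file_")
        return cls(
            id=data["id"],
            filename=data["filename"],
            page_count=data.get("page_count"),
            created_at=_parse_iso(data["created_at"]),
            updated_at=_parse_iso(data["updated_at"]),
        )


# Parse the example File object shown above.
example = json.loads("""{
  "id": "file_a1b2c3d4e5f6",
  "object": "file",
  "filename": "invoice.pdf",
  "page_count": 3,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:00Z"
}""")
record = FileRecord.from_dict(example)
```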

Listing and filtering

Use `list` to browse uploaded files with ID-based pagination, optionally filtering by filename prefix and MIME type:

from retab import Retab

client = Retab()

# List recent files (up to 20)
files = client.files.list(limit=20)
for f in files:
    print(f"{f.id}: {f.filename}")

# Filter by filename prefix and MIME type
pdfs = client.files.list(filename="invoice", mime_type="application/pdf")
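ID-based (keyset) pagination returns the page that starts after the last ID already seen, so earlier pages stay stable as new files arrive. The sketch below illustrates that idea, combined with the filename-prefix and MIME type filters, over an in-memory list ordered by ID. The `after` cursor, the `mime_type` key on each record, and the `list_page`/`iterate_all` helpers are hypothetical; they do not describe the exact parameters `client.files.list` accepts.

```python
from typing import Dict, Iterator, List, Optional, Tuple

File = Dict[str, str]


def list_page(files: List[File], limit: int, after: Optional[str] = None,
              filename: Optional[str] = None,
              mime_type: Optional[str] = None) -> Tuple[List[File], Optional[str]]:
    """Return up to `limit` matching files with IDs greater than `after`, plus the next cursor."""
    ordered = sorted(files, key=lambda f: f["id"])
    matches = [
        f for f in ordered
        if (after is None or f["id"] > after)  # keyset condition
        and (filename is None or f["filename"].startswith(filename))
        and (mime_type is None or f["mime_type"] == mime_type)
    ]
    page = matches[:limit]
    next_cursor = page[-1]["id"] if len(matches) > limit else None
    return page, next_cursor


def iterate_all(files: List[File], limit: int, **filters) -> Iterator[File]:
    """Walk every page by feeding each page's last ID back in as the cursor."""
    cursor = None
    while True:
        page, cursor = list_page(files, limit, after=cursor, **filters)
        yield from page
        if cursor is None:
            return


# Hypothetical sample data.
catalogue = [
    {"id": "file_0001", "filename": "invoice_jan.pdf", "mime_type": "application/pdf"},
    {"id": "file_0002", "filename": "receipt.png", "mime_type": "image/png"},
    {"id": "file_0003", "filename": "invoice_feb.pdf", "mime_type": "application/pdf"},
    {"id": "file_0004", "filename": "invoice_mar.png", "mime_type": "image/png"},
    {"id": "file_0005", "filename": "invoice_apr.pdf", "mime_type": "application/pdf"},
]
```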

Downloading files

Retrieve a time-limited signed URL to download the original file:

link = client.files.get_download_link("file_a1b2c3d4e5f6")
print(f"Download URL: {link.download_url}")
print(f"Expires in: {link.expires_in}")
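Because the download URL expires (60 minutes, per the method table), long-running jobs should record when a link was issued and request a fresh one before it lapses. This sketch assumes `expires_in` is a number of seconds, which the example above does not state; `DownloadLink` and `needs_refresh` are hypothetical local helpers, not SDK types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DownloadLink:
    download_url: str
    expires_in: int       # assumption: lifetime in seconds
    issued_at: datetime   # when get_download_link returned

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + timedelta(seconds=self.expires_in)

    def needs_refresh(self, now: Optional[datetime] = None,
                      margin: timedelta = timedelta(minutes=5)) -> bool:
        """True if the link has expired or will expire within `margin`."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at - margin
```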
