Introduction
The Files API lets you upload, manage, and retrieve documents stored in Retab. Files are the foundation of document processing: once uploaded, a file can be reused across classify, split, extract, parse, and workflow calls without sending the bytes again. The module exposes four methods:
| Method | Purpose |
|---|---|
| upload | Upload a document and receive a durable MIMEData reference for future requests. |
| list | List uploaded files with pagination, filename prefix search, and MIME type filtering. |
| get | Retrieve metadata for a single file by ID. |
| get_download_link | Get a temporary signed URL (60 min) to download the original file. |
Uploading files
SDK uploads use a direct-to-storage flow: the SDK first creates an upload session, uploads the bytes to the signed storage URL, then completes the upload and returns MIMEData.
The url field of the returned MIMEData has the form https://storage.retab.com/file_.... It is an opaque Retab URL, not a public signed URL, and can be passed to later processing requests without sending the file bytes again.
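The three-step flow can be modeled as follows. This is an illustrative sketch, not the Retab SDK: UploadBackend, create_session, put_bytes, complete, upload_direct, and InMemoryBackend are hypothetical names, and the MIMEData class here is a minimal stand-in carrying only the two fields the example needs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class MIMEData:
    # Minimal stand-in for the SDK's MIMEData (hypothetical: only two fields).
    filename: str
    url: str


class UploadBackend(Protocol):
    # Hypothetical interface for the three steps of the direct-to-storage flow.
    def create_session(self, filename: str, size: int) -> tuple[str, str]: ...
    def put_bytes(self, signed_url: str, data: bytes) -> None: ...
    def complete(self, session_id: str) -> MIMEData: ...


def upload_direct(backend: UploadBackend, filename: str, data: bytes) -> MIMEData:
    # 1. Create an upload session and receive a signed storage URL.
    session_id, signed_url = backend.create_session(filename, len(data))
    # 2. Send the bytes straight to storage rather than through the API request.
    backend.put_bytes(signed_url, data)
    # 3. Complete the upload and get back a reusable MIMEData reference.
    return backend.complete(session_id)


class InMemoryBackend:
    """Fake backend so the sketch runs without network access."""

    def __init__(self) -> None:
        self.sessions: dict[str, dict] = {}
        self.blobs: dict[str, bytes] = {}

    def create_session(self, filename: str, size: int) -> tuple[str, str]:
        session_id = f"file_{len(self.sessions) + 1}"  # illustrative ID format
        signed_url = f"https://signed.example.invalid/{session_id}"
        self.sessions[session_id] = {"filename": filename, "size": size, "signed_url": signed_url}
        return session_id, signed_url

    def put_bytes(self, signed_url: str, data: bytes) -> None:
        self.blobs[signed_url] = data

    def complete(self, session_id: str) -> MIMEData:
        session = self.sessions[session_id]
        blob = self.blobs.get(session["signed_url"])
        # Refuse to complete a session whose bytes never fully arrived.
        if blob is None or len(blob) != session["size"]:
            raise RuntimeError(f"{session_id} is not fully uploaded")
        return MIMEData(filename=session["filename"], url=f"https://storage.retab.com/{session_id}")


backend = InMemoryBackend()
ref = upload_direct(backend, "invoice.pdf", b"%PDF-1.7 example bytes")
print(ref.url)  # https://storage.retab.com/file_1
```

In this model the bytes travel only in step 2, so the requests that open and complete the session stay small regardless of file size.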
Large documents: avoid inline uploads
When you pass a local file path directly to an SDK processing call, the SDK may send the document as inline MIME/base64 data. This is convenient for small files, but base64 encoding makes the payload about one third larger than the file itself, so large scanned PDFs can push the request body past the size limit and trigger a 413 Request Entity Too Large error.
For large documents, use one of these URL-backed flows instead:
- Preferred: use your own object-storage URL. Retab fetches the file server-side, so the document bytes are not sent inline in the API request. Use a time-limited signed URL when the object is private.
- Alternative: upload to Retab first. The SDK uploads the file once, then you pass the returned Retab storage URL to classify, split, extract, parse, or workflow calls.
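To judge whether a file is at risk, it helps to estimate the inline payload: base64 encodes every 3 input bytes as 4 output characters, padding the final group. The sketch below compares that estimate with a hypothetical request-size limit supplied by the caller; this page does not state Retab's actual limit, and inline_payload_size and should_use_url_flow are illustrative names.

```python
import base64


def inline_payload_size(num_bytes: int) -> int:
    # Base64 turns each 3-byte group into 4 characters, padding the last group.
    return 4 * ((num_bytes + 2) // 3)


def should_use_url_flow(num_bytes: int, max_request_bytes: int) -> bool:
    # max_request_bytes is a hypothetical limit chosen by the caller; the real
    # request also carries JSON and header overhead on top of the encoded data.
    return inline_payload_size(num_bytes) > max_request_bytes


# Sanity check against the standard library encoder.
assert inline_payload_size(10) == len(base64.b64encode(b"x" * 10))

# A 30 MB scan becomes a 40 MB inline payload.
print(inline_payload_size(30_000_000))  # 40000000
print(should_use_url_flow(30_000_000, max_request_bytes=32_000_000))  # True
```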
Option 1: object-storage URL
Pass an HTTPS URL from object storage directly as the document argument.
Supported remote URL hosts include:
| Provider | Supported URL shape |
|---|---|
| Azure Blob Storage | https://<account>.blob.core.windows.net/... |
| Google Cloud Storage | https://storage.googleapis.com/... or https://<bucket>.storage.googleapis.com/... |
| Amazon S3 | https://<bucket>.s3.<region>.amazonaws.com/... or other amazonaws.com S3 URLs |
| Cloudflare R2 | https://<account>.r2.cloudflarestorage.com/... and public https://<public-id>.r2.dev/... URLs |
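A client-side pre-check against the hosts in the table might look like this. It is a hypothetical helper, not Retab's own validation; the S3 rule in particular approximates "other amazonaws.com S3 URLs" by accepting any amazonaws.com host that has a label starting with s3.

```python
from urllib.parse import urlparse


def is_supported_storage_url(url: str) -> bool:
    """Hypothetical pre-check mirroring the supported-host table."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    if host.endswith(".blob.core.windows.net"):  # Azure Blob Storage
        return True
    if host == "storage.googleapis.com" or host.endswith(".storage.googleapis.com"):  # GCS
        return True
    if host.endswith(".amazonaws.com"):  # S3, approximated by an s3* label
        return any(label.startswith("s3") for label in host.split(".")[:-2])
    if host.endswith(".r2.cloudflarestorage.com") or host.endswith(".r2.dev"):  # R2
        return True
    return False


print(is_supported_storage_url("https://my-bucket.s3.eu-west-1.amazonaws.com/scans/doc.pdf"))  # True
print(is_supported_storage_url("https://example.com/doc.pdf"))  # False
```

A check like this only catches obvious mistakes early; the server still decides which URLs it will fetch.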
Option 2: upload to Retab, then reuse the URL
If you do not have an object-storage URL available, upload the file to Retab first and use the returned mime_ref.url.
You can also pass document=mime_ref directly. Passing mime_ref.url is equivalent for Retab storage URLs: the backend parses the file ID and resolves it against the authenticated caller’s organization before processing.
Security model
Signed object-storage URLs are bearer URLs controlled by you: anyone holding the URL can read the object until it expires. Keep them time-limited and scoped to the single document being processed. Public object-storage URLs, such as public Cloudflare R2 r2.dev URLs, can also be fetched, but the URL itself provides no access restriction.
Retab storage URLs such as https://storage.retab.com/file_... are different: they are opaque Retab file references, not public download links. Retab resolves the file ID against the authenticated caller’s organization. If the file is missing, belongs to another organization, or is not fully uploaded, the request is rejected.
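The resolution rules can be modeled in a few lines. This is a hypothetical sketch of the checks, not Retab's backend: it assumes the path of a Retab storage URL is exactly the file ID, and StoredFile, FileResolutionError, parse_file_id, and resolve_retab_url are illustrative names. Returning the same error for a missing file and for another organization's file is a choice of this sketch; the page only says both are rejected.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class StoredFile:
    file_id: str
    organization_id: str
    upload_complete: bool


class FileResolutionError(Exception):
    pass


def parse_file_id(url: str) -> str:
    # Assumes the path of a Retab storage URL is exactly the file ID.
    parsed = urlparse(url)
    file_id = parsed.path.lstrip("/")
    if parsed.scheme != "https" or parsed.hostname != "storage.retab.com" or not file_id.startswith("file_"):
        raise FileResolutionError(f"not a Retab storage URL: {url}")
    return file_id


def resolve_retab_url(url: str, caller_org: str, files: dict[str, StoredFile]) -> StoredFile:
    file_id = parse_file_id(url)
    stored = files.get(file_id)
    # Missing files and other organizations' files get the same error,
    # so callers cannot probe for IDs outside their organization.
    if stored is None or stored.organization_id != caller_org:
        raise FileResolutionError(f"{file_id} not found")
    # Files whose upload never completed are rejected too.
    if not stored.upload_complete:
        raise FileResolutionError(f"{file_id} is not fully uploaded")
    return stored


files = {
    "file_abc": StoredFile("file_abc", "org_1", upload_complete=True),
    "file_wip": StoredFile("file_wip", "org_1", upload_complete=False),
}
print(resolve_retab_url("https://storage.retab.com/file_abc", "org_1", files).file_id)  # file_abc
```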
The file data structure
File Object
Listing and filtering
Use list to browse uploaded files with ID-based pagination.
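Because the list parameters are not spelled out here, the following sketch assumes a hypothetical fetch_page(after) callable that returns one page of results plus the ID to resume after. It shows the client-side loop for ID-based pagination, with an in-memory fetcher (make_fake_fetcher, also hypothetical) that applies a filename prefix and a MIME type filter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class FileInfo:
    id: str
    filename: str
    mime_type: str


@dataclass
class Page:
    items: list[FileInfo]
    last_id: Optional[str]  # ID to resume after; None when no pages remain


def iter_all_files(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[FileInfo]:
    """Follow ID-based cursors until the backend reports no further pages."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page.items
        if page.last_id is None:
            return
        after = page.last_id


def make_fake_fetcher(files, page_size, filename_prefix=None, mime_type=None):
    """In-memory stand-in for the list endpoint (hypothetical parameters)."""
    matching = sorted(
        (f for f in files
         if (filename_prefix is None or f.filename.startswith(filename_prefix))
         and (mime_type is None or f.mime_type == mime_type)),
        key=lambda f: f.id,  # assumes IDs sort in listing order
    )

    def fetch_page(after: Optional[str]) -> Page:
        remaining = [f for f in matching if after is None or f.id > after]
        items = remaining[:page_size]
        has_more = len(remaining) > page_size
        return Page(items=items, last_id=items[-1].id if has_more else None)

    return fetch_page


files = [
    FileInfo("file_01", "invoice-jan.pdf", "application/pdf"),
    FileInfo("file_02", "invoice-feb.png", "image/png"),
    FileInfo("file_03", "invoice-mar.pdf", "application/pdf"),
    FileInfo("file_04", "receipt.pdf", "application/pdf"),
    FileInfo("file_05", "invoice-apr.pdf", "application/pdf"),
]
fetch = make_fake_fetcher(files, page_size=2, filename_prefix="invoice", mime_type="application/pdf")
print([f.id for f in iter_all_files(fetch)])  # ['file_01', 'file_03', 'file_05']
```

Resuming from the last ID seen, rather than from a numeric offset, keeps the loop stable if files are added while it runs.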