POST /v1/projects/extract/{project_id}
from retab import Retab

client = Retab()

completion = client.projects.extract(
    project_id="proj_F0FE8DFqyouQdZXDTWRg0",
    document="invoice.jpeg"
)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744316542,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"invoice_number\": \"INV-42\", \"total_amount\": 123.45}",
        "parsed": {
          "invoice_number": "INV-42", 
          "total_amount": 123.45
        }
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1250,
    "completion_tokens": 35,
    "total_tokens": 1285
  },
  "likelihoods": {
    "invoice_number": 0.95,
    "total_amount": 0.87
  },
  "extraction_id": "ext_1234567890"
}
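
A minimal sketch of reading the sample response above. In that sample, `content` is a JSON-encoded string and `parsed` holds the same data already decoded. The snippet works on a plain dict trimmed to the relevant fields; whether the SDK returns a dict or a typed object is not stated on this page, so the access pattern and the helper name `extracted_fields` are assumptions.

```python
import json

# Trimmed copy of the sample response shown above.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"invoice_number\": \"INV-42\", \"total_amount\": 123.45}",
                "parsed": {"invoice_number": "INV-42", "total_amount": 123.45},
            },
            "finish_reason": "stop",
        }
    ],
}


def extracted_fields(resp: dict) -> dict:
    """Return the extracted fields from the first choice.

    Prefers the pre-decoded `parsed` object and falls back to decoding
    the JSON string in `content` (hypothetical helper, not part of the SDK).
    """
    message = resp["choices"][0]["message"]
    parsed = message.get("parsed")
    if parsed is not None:
        return parsed
    return json.loads(message["content"])


fields = extracted_fields(response)
print(fields["invoice_number"], fields["total_amount"])
```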

Authorizations

Api-Key
string
header
required

Headers

Idempotency-Key
string | null

Optional unique key to ensure idempotent requests. If the same key is used again, the cached result will be returned.
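
One way to use the `Idempotency-Key` header is to generate a key once per logical request and reuse it on every retry, so a retried call can return the cached result instead of running a second extraction. The sketch below only builds the header dict. The helper names are hypothetical, and using a UUID4 as the key is an assumption: this page only says the key must be unique.

```python
import uuid
from typing import Dict, Optional


def new_idempotency_key() -> str:
    """Generate a unique key (assumption: any unique string is accepted)."""
    return str(uuid.uuid4())


def build_headers(api_key: str, idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers using the header names documented on this page."""
    headers = {"Api-Key": api_key}
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    return headers


# Create the key once, then reuse the same headers for the first attempt
# and for any retry of the same request.
key = new_idempotency_key()
first_attempt = build_headers("YOUR_API_KEY", key)
retry = build_headers("YOUR_API_KEY", key)
```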

Path Parameters

project_id
string
required

Query Parameters

access_token
string | null

Body

multipart/form-data
document
file | null

The document file to extract data from (PDF, image, etc.)

model
string | null

The model to use for extraction (e.g., 'gpt-4o'). If not specified, uses the project's default model.

temperature
number | null

Sampling temperature for the model (0.0 to 2.0). Lower values are more deterministic.

image_resolution_dpi
integer | null

DPI resolution for image processing. Higher values increase quality but also cost.

n_consensus
integer | null

Number of extraction runs to perform for consensus voting. Higher values can improve accuracy but require more extraction runs.

seed
integer | null

Random seed for reproducible extractions.

store
boolean
default:true

Whether to store the extraction result in the database.

metadata
string | null

Custom key-value metadata as a JSON string to attach to the extraction.

extraction_id
string | null

Optional custom ID for the extraction. If not provided, one will be generated.
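
The body fields above are sent as `multipart/form-data`, so the non-file fields travel as text, and `metadata` in particular must be a JSON string rather than a nested object. The sketch below prepares those text fields. The helper `build_form_fields` is hypothetical, and converting numbers with `str()` is an assumption about how the server parses form values; only the 0.0 to 2.0 temperature range and the JSON-string requirement come from this page.

```python
import json
from typing import Any, Dict, Optional


def build_form_fields(
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    n_consensus: Optional[int] = None,
    seed: Optional[int] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Return the text form fields, omitting any option left as None."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    fields: Dict[str, str] = {}
    if model is not None:
        fields["model"] = model
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if n_consensus is not None:
        fields["n_consensus"] = str(n_consensus)
    if seed is not None:
        fields["seed"] = str(seed)
    if metadata is not None:
        # The metadata field is documented as a JSON string.
        fields["metadata"] = json.dumps(metadata)
    return fields


form_fields = build_form_fields(
    model="gpt-4o",
    temperature=0.0,
    n_consensus=3,
    metadata={"customer": "acme"},
)
print(form_fields)
```

Omitted options are left out of the form entirely, which lets the project's defaults apply (for example, the project's default model when `model` is not sent).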

Response

Successful Response

id
string
required
choices
RetabParsedChoice · object[]
required
created
integer
required
model
string
required
object
string
required
Allowed value: "chat.completion"
service_tier
enum<string> | null
Available options:
auto,
default,
flex,
scale,
priority
system_fingerprint
string | null
usage
object | null
extraction_id
string | null
likelihoods
object | null

Per-field likelihood scores for the extracted values, produced when consensus is used. Follows the same structure as the extracted object.
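
Because `likelihoods` follows the same structure as the extracted data, it can be walked recursively to find fields worth a second look. The sketch below is an illustration only: the helper name, the dotted-path format, and the 0.9 threshold are all assumptions, and it is not the rule the service uses to set `requires_human_review`.

```python
from typing import Any, List


def low_likelihood_fields(likelihoods: Any, threshold: float = 0.9, prefix: str = "") -> List[str]:
    """Return paths of fields whose likelihood is below `threshold`.

    Handles nested objects and arrays; paths look like "a.b" or "items[1].x".
    """
    flagged: List[str] = []
    if isinstance(likelihoods, dict):
        for key, value in likelihoods.items():
            path = f"{prefix}.{key}" if prefix else key
            flagged.extend(low_likelihood_fields(value, threshold, path))
    elif isinstance(likelihoods, list):
        for i, value in enumerate(likelihoods):
            flagged.extend(low_likelihood_fields(value, threshold, f"{prefix}[{i}]"))
    elif isinstance(likelihoods, (int, float)) and not isinstance(likelihoods, bool):
        if likelihoods < threshold:
            flagged.append(prefix)
    return flagged


# Likelihoods taken from the sample response on this page.
sample = {"invoice_number": 0.95, "total_amount": 0.87}
print(low_likelihood_fields(sample))
```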

requires_human_review
boolean
default:false

Whether the extraction requires human review.

request_at
string<date-time> | null

Timestamp of the request.

first_token_at
string<date-time> | null

Timestamp of the first token of the document. For non-streaming requests, this is set to last_token_at.

last_token_at
string<date-time> | null

Timestamp of the last token of the document.
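
The three timestamps can be combined to estimate how long an extraction took. The sketch below assumes they arrive as ISO 8601 strings (the `date-time` format), possibly with a trailing `Z`. The helper names and the timestamp values are made up for illustration. All three fields are nullable, so the helper returns `None` when either end is missing.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    if value is None:
        return None
    if value.endswith("Z"):
        # Older Python versions do not accept 'Z' in fromisoformat.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def total_latency_seconds(response: dict) -> Optional[float]:
    """Seconds between request_at and last_token_at, or None if unavailable."""
    start = parse_timestamp(response.get("request_at"))
    end = parse_timestamp(response.get("last_token_at"))
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Illustrative values only; for a non-streaming request first_token_at
# is set to last_token_at.
timing = {
    "request_at": "2025-04-10T20:22:20Z",
    "first_token_at": "2025-04-10T20:22:23.500000Z",
    "last_token_at": "2025-04-10T20:22:23.500000Z",
}
print(total_latency_seconds(timing))
```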