POST /v1/documents/extract
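from retab import Retab

# Create the client (no arguments are passed in this example)
client = Retab()

# Extract structured data from the document using the given schema and model
response = client.documents.extract(
    json_schema="Invoice_schema.json",
    document="Invoice.pdf",
    model="retab-micro",
)

print(response.data)           # extracted structured data
print(response.likelihoods)    # per-field likelihoods
print(response.extraction_id)  # ID of this extraction
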
{
    "id": "chatcmpl-AoBs45TNWTB1VKGSXV7NAwCnxMaNN",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "{\"name\": \"Confirmation d'affr\\u00e9tement\", \"date\": \"2024-11-08\"}",
                "refusal": null,
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "parsed": {
                    "name": "Confirmation d'affr\u00e9tement",
                    "date": "2024-11-08"
                }
            }
        }
    ],
    "created": 1736525396,
    "model": "retab-micro",
    "object": "chat.completion",
    "extraction_id": "extr_01HZX0ABCDEF123456",
    "usage": {
        "completion_tokens": 20,
        "prompt_tokens": 2760,
        "total_tokens": 2780,
        "completion_tokens_details": {
            "accepted_prediction_tokens": 0,
            "audio_tokens": 0,
            "reasoning_tokens": 0,
            "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0
        }
    },
    "likelihoods": {
        "name": 0.7227993785831323,
        "date": 0.7306298416895017
    }
}


Authorizations

Api-Key
string
header
required

Headers

Idempotency-Key
string | null
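
The headers above can be combined in a raw HTTP request. The following is a minimal sketch using only the Python standard library; it builds the request without sending it. The base URL is a placeholder and `build_extract_request` is a hypothetical helper, neither of which comes from this page.

```python
import json
import urllib.request
import uuid

# Placeholder base URL: an assumption for illustration, not taken from this page.
BASE_URL = "https://api.example.com"


def build_extract_request(api_key, body, idempotency_key=None):
    """Build (but do not send) a POST /v1/documents/extract request."""
    headers = {
        "Api-Key": api_key,  # required authorization header
        "Content-Type": "application/json",
    }
    if idempotency_key is not None:
        # Optional header; omitted entirely when no key is supplied.
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/documents/extract",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_extract_request(
    api_key="YOUR_API_KEY",
    body={"model": "retab-micro"},  # truncated body, for illustration only
    idempotency_key=str(uuid.uuid4()),
)
```

Generating the key with `uuid.uuid4()` is one common choice; any unique string should serve the same purpose.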

Query Parameters

access_token
string | null

Body

application/json
document
MIMEData · object
required

Document to be analyzed
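
When calling the endpoint without the SDK, the document has to be serialized into the JSON body. The sketch below encodes a local file as a base64 data URL. The keys `filename` and `url` are assumptions made for illustration, and `to_mime_data` is a hypothetical helper; check the MIMEData schema for the actual field names.

```python
import base64
import mimetypes
from pathlib import Path


def to_mime_data(path):
    """Encode a local file as a dict holding its name and a base64 data URL.

    The keys "filename" and "url" are assumed, not confirmed by this page.
    """
    file_path = Path(path)
    mime_type, _ = mimetypes.guess_type(file_path.name)
    mime_type = mime_type or "application/octet-stream"  # fallback for unknown types
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {
        "filename": file_path.name,
        "url": f"data:{mime_type};base64,{encoded}",
    }
```

For example, `to_mime_data("Invoice.pdf")` would produce a dict whose `url` starts with `data:application/pdf;base64,`.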

model
string
required

Model used to perform the extraction, for example "retab-micro".

json_schema
Json Schema · object
required

JSON Schema that the extracted output data is validated against.

image_resolution_dpi
integer
default:192

Resolution, in DPI, of the image sent to the LLM.

Required range: 96 <= x <= 300
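
The documented range can be checked on the client before sending a request. This is a small sketch; `validate_dpi` is a hypothetical helper, and the constants simply restate the default and range given above.

```python
MIN_DPI = 96
MAX_DPI = 300
DEFAULT_DPI = 192


def validate_dpi(dpi=DEFAULT_DPI):
    """Return dpi unchanged if it is an integer in the documented range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(dpi, bool) or not isinstance(dpi, int):
        raise TypeError("image_resolution_dpi must be an integer")
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(
            f"image_resolution_dpi must satisfy {MIN_DPI} <= x <= {MAX_DPI}, got {dpi}"
        )
    return dpi
```
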
n_consensus
integer
default:1

Number of consensus models to use for extraction.

stream
boolean
default:false

If true, the extraction is streamed to the user over the active WebSocket connection.

chunking_keys
Chunking Keys · object

If set, the keys used to extract long lists of data with Parallel OCR.

Example:
{
  "products": "identity.id",
  "properties": "ID"
}
metadata
Metadata · object

User-defined metadata to associate with this extraction

extraction_id
string | null

Extraction ID to use for this extraction. If not provided, a new ID will be generated.

additional_messages
ChatCompletionRetabMessage · object[] | null

Additional chat messages to append after the document content messages. Useful for providing extra context or instructions.

bust_cache
boolean
default:false

If true, skip the LLM cache and force a fresh completion
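
The body fields above can be assembled into a single JSON-serializable dict. The sketch below uses a hypothetical helper, `build_extract_body`, which takes the three required fields and accepts only the optional fields named on this page. The shape of the `document` value and the parameter values are illustrative assumptions.

```python
# Optional body fields listed on this page.
OPTIONAL_FIELDS = {
    "image_resolution_dpi", "n_consensus", "stream", "chunking_keys",
    "metadata", "extraction_id", "additional_messages", "bust_cache",
}


def build_extract_body(document, model, json_schema, **optional):
    """Return a request body dict, rejecting fields not documented here."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown body fields: {sorted(unknown)}")
    body = {"document": document, "model": model, "json_schema": json_schema}
    # Leave out optional fields set to None so server-side defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_extract_body(
    # Hypothetical document shape; see the MIMEData schema for the real fields.
    document={"filename": "Invoice.pdf", "url": "data:application/pdf;base64,..."},
    model="retab-micro",
    json_schema={
        "type": "object",
        "properties": {"name": {"type": "string"}, "date": {"type": "string"}},
    },
    n_consensus=3,  # illustrative value
    chunking_keys={"products": "identity.id", "properties": "ID"},
    metadata={"source": "example"},
)
```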

Response

Successful Response

id
string
required
choices
RetabParsedChoice · object[]
required
created
integer
required
model
string
required
object
string
required
Allowed value: "chat.completion"
data
any
required

The extracted structured data. Shortcut for choices[0].message.parsed.

text
string | null
required

The raw JSON content string. Shortcut for choices[0].message.content.
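
The relationship between the two shortcuts can be illustrated on a raw response payload. This sketch works on a trimmed copy of the example response from this page, treated as a plain dict rather than an SDK object: decoding the content string yields the same structure as the parsed object.

```python
import json

# Trimmed copy of the example response on this page.
response = {
    "choices": [
        {
            "message": {
                "content": "{\"name\": \"Confirmation d'affr\\u00e9tement\", \"date\": \"2024-11-08\"}",
                "parsed": {
                    "name": "Confirmation d'affr\u00e9tement",
                    "date": "2024-11-08",
                },
            }
        }
    ]
}

message = response["choices"][0]["message"]
text = message["content"]  # what the "text" shortcut points to
data = message["parsed"]   # what the "data" shortcut points to

# Decoding the raw JSON string gives the parsed structure.
decoded = json.loads(text)
```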

service_tier
enum<string> | null
Available options:
auto,
default,
flex,
scale,
priority
system_fingerprint
string | null
usage
CompletionUsage · object

Usage statistics for the completion request.

extraction_id
string | null
likelihoods
Likelihoods · object

Object giving the likelihood of each extracted field when using consensus. It follows the same structure as the extracted data.
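
Because this object mirrors the structure of the extracted data, it can be walked recursively to flag fields that may need review. The sketch below assumes that values are numbers and that higher values indicate greater confidence; `low_confidence_fields` is a hypothetical helper and the threshold is arbitrary.

```python
def low_confidence_fields(likelihoods, threshold, prefix=""):
    """Return the paths of fields whose likelihood is below threshold."""
    flagged = []
    if isinstance(likelihoods, dict):
        for key, value in likelihoods.items():
            path = f"{prefix}.{key}" if prefix else key
            flagged += low_confidence_fields(value, threshold, path)
    elif isinstance(likelihoods, list):
        for index, value in enumerate(likelihoods):
            flagged += low_confidence_fields(value, threshold, f"{prefix}[{index}]")
    elif isinstance(likelihoods, (int, float)) and likelihoods < threshold:
        flagged.append(prefix)
    return flagged


# Values taken from the example response on this page.
likelihoods = {"name": 0.7227993785831323, "date": 0.7306298416895017}
flagged = low_confidence_fields(likelihoods, threshold=0.73)
# flagged == ["name"]
```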

request_at
string<date-time> | null

Timestamp of the request

first_token_at
string<date-time> | null

Timestamp of the first token of the document. For non-streaming requests, this is set to last_token_at.

last_token_at
string<date-time> | null

Timestamp of the last token of the document
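
The three timestamps can be used to derive simple latency figures. This is a sketch that assumes the values are ISO 8601 date-time strings; the helper names and the sample timestamps are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 date-time string; a trailing "Z" is read as UTC."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latency_seconds(response):
    """Return time-to-first-token and total duration in seconds, if available."""
    start = parse_timestamp(response.get("request_at"))
    first = parse_timestamp(response.get("first_token_at"))
    last = parse_timestamp(response.get("last_token_at"))
    if start is None or last is None:
        return None
    return {
        "time_to_first_token": (first - start).total_seconds() if first else None,
        "total": (last - start).total_seconds(),
    }


# Hypothetical non-streaming response: first_token_at equals last_token_at.
timings = latency_seconds({
    "request_at": "2025-01-10T16:09:56.000000Z",
    "first_token_at": "2025-01-10T16:09:58.500000Z",
    "last_token_at": "2025-01-10T16:09:58.500000Z",
})
```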