Extract

from retab import Retab

client = Retab()

response = client.documents.extract(
    json_schema="Invoice_schema.json",
    document="Invoice.pdf",
    model="retab-micro",
)

print(response.data)
print(response.likelihoods)
print(response.extraction_id)

{
  "id": "chatcmpl-AoBs45TNWTB1VKGSXV7NAwCnxMaNN",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\"name\": \"Confirmation d'affr\\u00e9tement\", \"date\": \"2024-11-08\"}",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "parsed": {
          "name": "Confirmation d'affr\u00e9tement",
          "date": "2024-11-08"
        }
      }
    }
  ],
  "created": 1736525396,
  "model": "retab-micro",
  "object": "chat.completion",
  "extraction_id": "extr_01HZX0ABCDEF123456",
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 2760,
    "total_tokens": 2780,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },
  "likelihoods": {
    "name": 0.7227993785831323,
    "date": 0.7306298416895017
  }
}

Authorizations

Api-Key

string

header

required

Headers

Idempotency-Key

string | null
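
The two headers listed so far can be assembled as in the minimal sketch below. The helper name `build_headers` is hypothetical, and the retry behaviour described in the comments (reusing one key so a repeated request is not processed twice) is the usual idempotency-key convention, assumed here rather than confirmed by this page.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: str, idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the extract call (hypothetical helper)."""
    headers = {
        "Api-Key": api_key,                  # required authorization header
        "Content-Type": "application/json",  # the body is application/json
    }
    # Idempotency-Key is optional (string | null), so send it only when set.
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    return headers


# Assumption: generate one key per logical request and reuse it on retries.
key = str(uuid.uuid4())
headers = build_headers("YOUR_API_KEY", key)
```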

Query Parameters

access_token

string | null

Body

application/json

document

MIMEData · object

required

Document to be analyzed
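
The child attributes of MIMEData are not listed on this page. The sketch below assumes two fields, `filename` and `url`, where `url` is a base64 data URI; both field names and the helper `to_mime_data` are assumptions to check against the MIMEData schema before use.

```python
import base64
import mimetypes


def to_mime_data(filename: str, content: bytes) -> dict:
    """Wrap raw file bytes in an assumed MIMEData-like dict."""
    # Guess the MIME type from the file extension, with a generic fallback.
    mime_type, _ = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    # Assumed field names: "filename" and "url" (data URI).
    return {
        "filename": filename,
        "url": f"data:{mime_type};base64,{encoded}",
    }


document = to_mime_data("Invoice.pdf", b"%PDF-1.4")
```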


model

string

required

Model used for chat completion

json_schema

Json Schema · object

required

JSON schema format used to validate the output data.

image_resolution_dpi

integer

default:192

Resolution of the image sent to the LLM

Required range: 96 <= x <= 300
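
A client can check the resolution before sending the request. This is a minimal sketch using only the default and range stated above; the helper name `validate_dpi` is hypothetical, and the server may report out-of-range values differently.

```python
from typing import Optional

DEFAULT_DPI = 192          # documented default
MIN_DPI, MAX_DPI = 96, 300  # documented inclusive range


def validate_dpi(dpi: Optional[int] = None) -> int:
    """Return a DPI value that satisfies 96 <= x <= 300."""
    if dpi is None:
        return DEFAULT_DPI
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(
            f"image_resolution_dpi must be between {MIN_DPI} and {MAX_DPI}, got {dpi}"
        )
    return dpi
```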

n_consensus

integer

default:1

Number of consensus models to use for extraction.

stream

boolean

default:false

If true, the extraction will be streamed to the user using the active WebSocket connection

chunking_keys

Chunking Keys · object

If set, keys to be used for the extraction of long lists of data using Parallel OCR


Example:

{
  "products": "identity.id",
  "properties": "ID"
}

metadata

Metadata · object

User-defined metadata to associate with this extraction


extraction_id

string | null

Extraction ID to use for this extraction. If not provided, a new ID will be generated.

additional_messages

ChatCompletionRetabMessage · object[] | null

Additional chat messages to append after the document content messages. Useful for providing extra context or instructions.


bust_cache

boolean

default:false

If true, skip the LLM cache and force a fresh completion
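
The body fields above can be combined into one JSON payload. The sketch below is illustrative: `build_extract_body` is a hypothetical helper, the `document` value is a placeholder whose shape is assumed, and the schema is a made-up example. Optional fields left unset are omitted so the server-side defaults apply.

```python
import json

# Optional body fields documented above.
OPTIONAL_FIELDS = {
    "image_resolution_dpi", "n_consensus", "stream", "chunking_keys",
    "metadata", "extraction_id", "additional_messages", "bust_cache",
}


def build_extract_body(document: dict, model: str, json_schema: dict, **options) -> dict:
    """Assemble the request body from required and optional fields."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown body fields: {sorted(unknown)}")
    body = {"document": document, "model": model, "json_schema": json_schema}
    # Skip options set to None so the documented defaults are used.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = build_extract_body(
    document={"filename": "Invoice.pdf", "url": "data:application/pdf;base64,..."},  # placeholder, assumed shape
    model="retab-micro",
    json_schema={  # illustrative schema, not from the source
        "type": "object",
        "properties": {"name": {"type": "string"}, "date": {"type": "string"}},
    },
    n_consensus=3,
    bust_cache=True,
    extraction_id=None,  # omitted: a new ID will be generated
)
payload = json.dumps(body)
```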

Response

Successful Response

id

string

required

choices

RetabParsedChoice · object[]

required


created

integer

required

model

string

required

object

string

required

Allowed value: "chat.completion"

data

any

required

The extracted structured data. Shortcut for choices[0].message.parsed.

text

string | null

required

The raw JSON content string. Shortcut for choices[0].message.content.
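
The two shortcuts can be reproduced from a raw response dict. This sketch uses a trimmed copy of the example response at the top of the page; the helper name `shortcuts` is hypothetical.

```python
import json

parsed = {"name": "Confirmation d'affr\u00e9tement", "date": "2024-11-08"}

# Trimmed copy of the example response: only the fields the shortcuts read.
response = {
    "choices": [
        {"message": {"content": json.dumps(parsed), "parsed": parsed}}
    ]
}


def shortcuts(resp: dict):
    """Return (data, text) as described for the response object."""
    message = resp["choices"][0]["message"]
    data = message["parsed"]   # same as response.data
    text = message["content"]  # same as response.text
    return data, text


data, text = shortcuts(response)
# The raw string decodes to the same structure as the parsed object.
assert json.loads(text) == data
```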

service_tier

enum<string> | null

Available options: auto, default, flex, scale, priority

system_fingerprint

string | null

usage

CompletionUsage · object

Usage statistics for the completion request.

One of: CompletionUsage, RetabUsage


extraction_id

string | null

likelihoods

Likelihoods · object

Object defining the uncertainties of the fields extracted when using consensus. Follows the same structure as the extraction object.
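
Because likelihoods mirror the structure of the extracted data, they can be walked recursively to flag fields for review. This is a sketch only: the helper `low_confidence_fields` and the 0.725 threshold are illustrative choices, and the handling of nested lists assumes they mirror the data as well.

```python
def low_confidence_fields(likelihoods, threshold=0.75, prefix=""):
    """Return (path, likelihood) pairs for every value below the threshold."""
    flagged = []
    if isinstance(likelihoods, dict):
        for key, value in likelihoods.items():
            path = f"{prefix}.{key}" if prefix else key
            flagged.extend(low_confidence_fields(value, threshold, path))
    elif isinstance(likelihoods, list):
        for i, value in enumerate(likelihoods):
            flagged.extend(low_confidence_fields(value, threshold, f"{prefix}[{i}]"))
    elif isinstance(likelihoods, (int, float)) and not isinstance(likelihoods, bool):
        if likelihoods < threshold:
            flagged.append((prefix, likelihoods))
    return flagged


# Values taken from the example response at the top of the page.
sample = {"name": 0.7227993785831323, "date": 0.7306298416895017}
to_review = low_confidence_fields(sample, threshold=0.725)
```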

request_at

string<date-time> | null

Timestamp of the request

first_token_at

string<date-time> | null

Timestamp of the first token of the document. If non-streaming, set to last_token_at

last_token_at

string<date-time> | null

Timestamp of the last token of the document
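
The three timestamps can be turned into latency figures. The sketch below assumes ISO 8601 date-time strings (possibly ending in `Z`); the sample values are made up for illustration, and `timings` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timings(resp: dict) -> Optional[dict]:
    """Compute latencies in seconds, or None if any timestamp is missing."""
    request_at = parse_ts(resp.get("request_at"))
    first = parse_ts(resp.get("first_token_at"))
    last = parse_ts(resp.get("last_token_at"))
    if request_at is None or first is None or last is None:
        return None
    return {
        "time_to_first_token_s": (first - request_at).total_seconds(),
        "total_s": (last - request_at).total_seconds(),
    }


# Made-up sample values, not taken from a real response.
sample_response = {
    "request_at": "2025-01-10T16:09:56Z",
    "first_token_at": "2025-01-10T16:09:58Z",
    "last_token_at": "2025-01-10T16:09:59Z",
}
result = timings(sample_response)
```

For a non-streaming call, `first_token_at` equals `last_token_at`, so both figures come out the same.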
