
Overview

Workflow nodes are the building blocks of document processing pipelines. Each node has specific inputs, outputs, and configuration options.

Node Categories

Nodes are organized into three categories:
Category | Purpose
Core | Workflow entry/exit points and utilities
Tools | Document processing operations
Logic | Conditional flows and data transformations

Core Nodes

Document (Start)

The entry point for your workflow. Upload documents here for processing.
Outputs: File
Configuration: None (just upload your document in Run mode)
Supported Formats:
  • PDF documents
  • Images (PNG, JPG, JPEG, GIF, WebP, TIFF, BMP)
  • Microsoft Word (.docx, .doc)
  • Microsoft Excel (.xlsx, .xls)
  • Microsoft PowerPoint (.pptx, .ppt)
Usage:
  • Drag multiple Document nodes for workflows that combine multiple inputs
  • Each Document node can receive one file per workflow run
  • Files are automatically converted to PDF when required by downstream nodes

JSON Input (Start)

Entry point for structured JSON data. Define a schema and pass JSON data when running the workflow.
Outputs: JSON
Configuration:
Setting | Description
JSON Schema | Define the structure of expected input data
Schema Example:
{
  "type": "object",
  "properties": {
    "customer_id": { "type": "string" },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "metadata": {
      "type": "object",
      "properties": {
        "source": { "type": "string" }
      }
    }
  },
  "required": ["customer_id"]
}
Usage:
  • Use JSON Input nodes to pass structured data into workflows without documents
  • Combine with Document nodes to enrich extractions with external data
  • Connect to Extract nodes as additional context for AI-powered extraction
  • Use with Formula or If/Else nodes for data-driven workflow logic
SDK Example:
# client is an initialized Retab SDK client
run = client.workflows.runs.create(
    workflow_id="wf_abc123",
    json_inputs={
        "json-node-id": {"customer_id": "cust_123", "priority": "high"}
    }
)
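Before triggering a run, a quick client-side check against the schema's required fields can catch malformed input early. A minimal sketch in plain Python (not part of the SDK; a full validator such as the jsonschema library would also cover types and enums):

```python
def check_required(payload: dict, schema: dict) -> list:
    """Return the names of required top-level fields missing from payload."""
    return [field for field in schema.get("required", []) if field not in payload]

schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["customer_id"],
}

check_required({"priority": "high"}, schema)         # → ["customer_id"]
check_required({"customer_id": "cust_123"}, schema)  # → []
```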

Webhook (End)

Send workflow outputs to an external HTTP endpoint.
Inputs: File, JSON
Configuration:
Setting | Description
Webhook URL | The HTTPS endpoint to receive the data
Headers | Custom HTTP headers (e.g., authentication tokens)
Example Configuration:
{
  "webhook_url": "https://api.yourapp.com/webhook",
  "webhook_headers": {
    "Authorization": "Bearer your-token",
    "X-Custom-Header": "value"
  }
}
Webhook Payload: The webhook receives a JSON payload containing:
  • completion: The extraction result with parsed data
  • file_payload: Document metadata including filename and URL
  • user: User email address (if authenticated)
  • metadata: Additional workflow metadata
See Webhooks for details on securing and handling webhook requests.
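On the receiving side, a handler only needs to read those four keys. A sketch of the parsing step using the standard library (the "parsed" key inside completion is an assumption; check your actual payloads):

```python
import json

def handle_webhook(body: bytes) -> dict:
    """Pull the documented fields out of a webhook payload."""
    payload = json.loads(body)
    return {
        "data": payload.get("completion", {}).get("parsed"),  # "parsed" key is assumed
        "filename": payload.get("file_payload", {}).get("filename"),
        "user": payload.get("user"),
        "metadata": payload.get("metadata", {}),
    }

# Illustrative payload only; real payloads come from the workflow run.
sample = json.dumps({
    "completion": {"parsed": {"invoice_number": "INV-1"}},
    "file_payload": {"filename": "invoice.pdf", "url": "https://example.com/invoice.pdf"},
    "user": "ops@example.com",
    "metadata": {"workflow_id": "wf_abc123"},
}).encode()

result = handle_webhook(sample)
```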

Note

Add comments and documentation to your workflow.
Inputs: None
Outputs: None
Notes don’t affect workflow execution—they’re purely for documentation. Use them to:
  • Explain complex logic
  • Document configuration choices
  • Leave instructions for teammates

Tools Nodes

Extract

Extract structured data from documents using a JSON schema.
Inputs: File, plus optional additional inputs (JSON, File)
Outputs: JSON (data), JSON (likelihoods when using consensus)
Configuration:
Setting | Description | Default
Schema | JSON Schema defining fields to extract | {}
Model | AI model for extraction | retab-small
Image Resolution | DPI for document rendering | 150
Consensus | Number of parallel extractions (1-10) | 1
Additional Inputs | Named inputs for context (JSON or files) | []
Schema Example:
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "total_amount": { "type": "number" },
    "vendor": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" }
      }
    }
  }
}
Consensus Mode:
When n_consensus > 1, the node runs multiple extractions in parallel and returns:
  • data: The consensus result
  • likelihoods: Confidence scores for each field (0-1)
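A common downstream pattern is to auto-accept fields above a confidence threshold and route the rest to review. A sketch (the field names and the 0.9 threshold are illustrative):

```python
def fields_needing_review(likelihoods: dict, threshold: float = 0.9) -> list:
    """Return field names whose consensus confidence is below threshold."""
    return [field for field, score in likelihoods.items() if score < threshold]

likelihoods = {"invoice_number": 0.98, "total_amount": 0.71, "vendor_name": 0.95}
fields_needing_review(likelihoods)  # → ["total_amount"]
```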
Additional Inputs: You can add named input handles to provide extra context during extraction:
  • JSON inputs: Structured data from other nodes
  • File inputs: Additional reference documents

Parse

Parse documents into machine-usable JSON with a Pages array.
Inputs: File
Outputs: JSON ({"Pages":[...]})
Configuration:
Setting | Description | Default
Model | AI model for parsing | retab-small
Image Resolution | DPI for document rendering | 150
Use Cases:
  • Generate page-by-page JSON content for indexing/RAG pipelines
  • Add parsed page context to Extract as additional JSON input
  • Convert scanned PDFs to machine-usable page content
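For indexing, the Pages array maps naturally to one chunk per page. A sketch of the flattening step (the exact shape of each page object is an assumption; here pages are treated as plain strings or dicts with a "content" key):

```python
def pages_to_chunks(parsed: dict) -> list:
    """Flatten a Parse output ({"Pages": [...]}) into per-page text chunks."""
    chunks = []
    for page in parsed.get("Pages", []):
        text = page if isinstance(page, str) else page.get("content", "")
        if text:
            chunks.append(text)
    return chunks

pages_to_chunks({"Pages": ["page one text", {"content": "page two text"}]})
# → ["page one text", "page two text"]
```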

Split

Split multi-page documents into separate PDFs by subdocument.
Inputs: File (PDF)
Outputs: Multiple File outputs (one per subdocument)
Configuration:
Setting | Description
Subdocuments | List of subdocuments with names and descriptions
Model | AI model for classification
Subdocument Example:
{
  "subdocuments": [
    {
      "name": "Invoice",
      "description": "Pages containing invoice details, line items, and totals"
    },
    {
      "name": "Contract",
      "description": "Legal agreement pages with terms and signatures"
    },
    {
      "name": "Supporting Documents",
      "description": "Receipts, certificates, and other attachments"
    }
  ]
}
Each category creates a separate output handle. Connect downstream nodes to process each category differently.
Non-PDF documents are automatically converted to PDF before splitting.

Classifier

Classify documents into one of the predefined categories.
Inputs: File
Outputs: Multiple File outputs (one per category, only the matched category receives the document)
Configuration:
Setting | Description
Categories | List of document categories with names and descriptions
Model | AI model for classification
Difference from Split:
Feature | Split | Classifier
Input | Multi-page document | Single document
Output | Multiple PDFs (pages grouped by category) | Same document routed to one category
Use Case | Separating bundled documents | Routing different document types
Example:
{
  "categories": [
    {
      "name": "Invoice",
      "description": "Documents containing billing information and payment requests"
    },
    {
      "name": "Receipt",
      "description": "Proof of payment or purchase confirmation"
    },
    {
      "name": "Contract",
      "description": "Legal agreements and binding documents"
    }
  ]
}
The AI analyzes the document and routes it to exactly one category. Downstream nodes connected to other categories are skipped.

Edit

Fill PDF forms using AI with JSON instructions or pre-defined templates.
Inputs:
  • File (document to edit) — only when “Use Template” is off
  • JSON (instructions/data)
Outputs: File (filled document)
Configuration:
Setting | Description
Model | AI model to use for filling
Use Template | Toggle to use a pre-defined template instead of an input document
Template | Template to use (shown when “Use Template” is on)
Two Operating Modes:
  1. Document Mode (default): Edit an input document using AI
    • Connect a document from another node (e.g., Start, Split)
    • Provide instructions via the JSON input
    • The AI fills the form fields based on your instructions
  2. Template Mode: Fill a pre-defined template
    • Enable “Use Template” toggle
    • Select a template from your template library
    • No document input required — the template provides the form structure
    • Provide data/instructions via the JSON input
Example Instructions (JSON):
{
  "name": "Acme Corp",
  "address": "12 Main St",
  "notes": "Leave signature fields blank"
}
The JSON input provides instructions/data for filling. Connect this to an Extract node’s JSON output for end-to-end automation.

Logic Nodes

Human in the Loop (HIL)

Pause workflow execution for human review and approval.
Inputs: JSON
Outputs: JSON (verified data)
Configuration: None (inherits schema from connected source)
How It Works:
  1. Connect to an Extract or Formula node (schema is automatically inherited)
  2. When the workflow runs, it pauses at the HIL node
  3. A reviewer sees the extracted data alongside the source document
  4. The reviewer can approve, modify, or reject the data
  5. After approval, the verified data continues through the workflow
Use Cases:
  • Validate critical extractions before sending to downstream systems
  • Quality control for high-value documents
  • Compliance requirements that mandate human oversight
The HIL node automatically inherits the JSON schema from the connected upstream node. It also preserves any computed fields from Formula nodes.

Formula

Add computed fields using Excel-like formulas.
Inputs: JSON
Outputs: JSON (with computed fields)
Configuration:
Setting | Description
Formulas | List of computed fields with target paths and expressions
Common Functions:
Category | Functions | Example
Aggregation | SUM, AVERAGE, COUNT, MIN, MAX | SUM(items.*.price)
Logic | IF, AND, OR, NOT | IF(total > 1000, "Large", "Small")
Strings | CONCAT, TRIM, UPPER, LOWER, REPLACE, LEN | CONCAT(first_name, " ", last_name)
String matching | STARTS_WITH, ENDS_WITH, CONTAINS | ENDS_WITH(email, "@workflows.retab.com")
String extraction | BEFORE, AFTER, SPLIT_PART, REGEX_EXTRACT | BEFORE(BEFORE(email, "@"), ".")
Math | ABS, SQRT, LOG, POW | ABS(total - expected_total)
Example Configuration:
Target: subtotal
Expression: SUM(line_items.*.amount)

Target: tax
Expression: subtotal * 0.1

Target: total
Expression: subtotal + tax
Formulas are evaluated in dependency order. You can reference other computed fields in expressions.
String extraction helpers return null when they cannot extract a value. For example, BEFORE(email, "@") returns null if email does not contain @.
See Formulas Documentation for the complete formula reference.
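The dependency-order behavior can be mimicked locally for testing. A simplified sketch that evaluates (target, expression) pairs in order, so later formulas can reference earlier targets (the expressions here are plain Python, not the Excel-like formula language):

```python
def evaluate_formulas(data: dict, formulas: list) -> dict:
    """Evaluate (target, expression) pairs in order; each computed target
    becomes available to subsequent expressions."""
    scope = dict(data)
    for target, expression in formulas:
        # eval is fine for a local sketch; a real evaluator would parse expressions safely
        scope[target] = eval(expression, {"__builtins__": {}, "sum": sum}, scope)
    return scope

result = evaluate_formulas(
    {"line_items": [{"amount": 40.0}, {"amount": 60.0}]},
    [
        ("subtotal", "sum(i['amount'] for i in line_items)"),
        ("tax", "subtotal * 0.1"),
        ("total", "subtotal + tax"),
    ],
)
# result["subtotal"] == 100.0, result["total"] == 110.0
```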

Reshape

Map fields from an input JSON object into a new output structure.
Use Cases:
  • Restructure flat data into nested objects for downstream systems
  • Select only the fields you need from a large extraction
  • Rename fields to match your API or database schema
  • Prepare data for webhooks that expect a specific format
Unmapped fields are excluded from the output. If an input path doesn’t exist, the mapping is skipped without error.

If / Else

Route data to different branches based on conditions.
Inputs: JSON
Outputs: Multiple JSON outputs (one per branch: If, Else If, Else)
Configuration:
Setting | Description
Conditions | List of conditions to evaluate in order
Has Else | Whether to include a default else branch (default: true)
Condition Structure: Each condition can have multiple sub-conditions combined with AND/OR:
{
  "conditions": [
    {
      "branch_name": "if",
      "sub_conditions": [
        { "path": "data.total_amount", "operator": "is_greater_than", "value": 1000 },
        { "path": "data.vendor.country", "operator": "is_equal_to", "value": "US" }
      ],
      "logical_operator": "and"
    }
  ]
}
Available Operators:
Type | Operators
Existence | exists, does_not_exist, is_empty, is_not_empty
Comparison | is_equal_to, is_not_equal_to
String | contains, starts_with, ends_with, matches_regex
Number | is_greater_than, is_less_than, is_greater_than_or_equal_to, is_less_than_or_equal_to
Boolean | is_true, is_false
Array | length_equal_to, length_greater_than, length_less_than
Date | is_after, is_before, is_after_or_equal_to, is_before_or_equal_to
How It Works:
  1. Conditions are evaluated in order (If, Else If 1, Else If 2, …)
  2. The first matching condition determines the output branch
  3. Data is routed to exactly one branch
  4. If no conditions match and has_else is true, data goes to the Else branch
  5. Downstream nodes on non-matched branches are skipped
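The evaluation rules above can be sketched as a small function (only a few operators shown; path is a dotted key into the payload):

```python
OPERATORS = {
    "is_equal_to": lambda value, target: value == target,
    "is_greater_than": lambda value, target: value is not None and value > target,
    "contains": lambda value, target: target in (value or ""),
}

def get_path(data, path: str):
    """Walk a dotted path like data.vendor.country; None if missing."""
    for key in path.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data

def condition_matches(data: dict, condition: dict) -> bool:
    results = [
        OPERATORS[sub["operator"]](get_path(data, sub["path"]), sub.get("value"))
        for sub in condition["sub_conditions"]
    ]
    combine = all if condition.get("logical_operator", "and") == "and" else any
    return combine(results)

payload = {"data": {"total_amount": 1500, "vendor": {"country": "US"}}}
condition = {
    "sub_conditions": [
        {"path": "data.total_amount", "operator": "is_greater_than", "value": 1000},
        {"path": "data.vendor.country", "operator": "is_equal_to", "value": "US"},
    ],
    "logical_operator": "and",
}
condition_matches(payload, condition)  # → True
```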
Example Use Cases:
  • Route high-value invoices for additional approval
  • Process documents differently based on vendor country
  • Flag incomplete extractions for review

Merge PDF

Combine multiple PDF documents into a single file.
Inputs: Multiple File inputs (configurable)
Outputs: File (merged PDF)
Configuration:
Setting | Description
Inputs | Named input slots for PDFs to merge
Example Configuration:
{
  "inputs": [
    { "name": "Cover Page" },
    { "name": "Invoice" },
    { "name": "Terms and Conditions" }
  ]
}
PDFs are merged in the order defined by the inputs list. Non-PDF documents are automatically converted before merging.

API Call

Make HTTP requests to external APIs and use the response in your workflow.
Inputs: JSON (optional request body or parameters)
Outputs: JSON (API response)
Configuration:
Setting | Description | Default
URL | The API endpoint URL | Required
Method | HTTP method (GET, POST, PUT, PATCH, DELETE) | POST
Headers | Custom HTTP headers (e.g., authentication) | {}
Body Template | JSON template with placeholders for input data | {}
Example Configuration:
{
  "url": "https://api.example.com/validate",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer ${API_KEY}",
    "Content-Type": "application/json"
  },
  "body_template": {
    "invoice_number": "{{data.invoice_number}}",
    "amount": "{{data.total_amount}}"
  }
}
Use Cases:
  • Validate extracted data against external systems
  • Enrich documents with data from your CRM or ERP
  • Trigger actions in third-party services based on extraction results
  • Look up additional information using extracted identifiers
Placeholders: Use {{path.to.field}} syntax to reference values from the input JSON:
  • {{data.invoice_number}} → Inserts the invoice_number field
  • {{data.vendor.name}} → Inserts nested fields
  • {{data.line_items[0].amount}} → Inserts array elements
API Call nodes execute synchronously. For long-running operations, consider using webhooks to trigger external workflows asynchronously.
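The placeholder substitution can be reproduced for local testing of body templates. A sketch (array-index handling kept deliberately simple):

```python
import re

def resolve(data, path: str):
    """Resolve a dotted path like data.line_items[0].amount."""
    for part in path.split("."):
        match = re.match(r"(\w+)(?:\[(\d+)\])?$", part)
        data = data[match.group(1)]
        if match.group(2) is not None:
            data = data[int(match.group(2))]
    return data

def fill_template(template: str, data: dict) -> str:
    """Replace {{path}} placeholders with values from data."""
    return re.sub(
        r"\{\{([^}]+)\}\}",
        lambda m: str(resolve(data, m.group(1).strip())),
        template,
    )

payload = {"data": {"invoice_number": "INV-7", "line_items": [{"amount": 42.5}]}}
fill_template("{{data.invoice_number}}: {{data.line_items[0].amount}}", payload)
# → "INV-7: 42.5"
```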

Merge JSON

Combine multiple JSON objects into a single structured object.
Inputs: Multiple JSON inputs (configurable)
Outputs: JSON (merged object)
Configuration:
Setting | Description
Inputs | Named input slots for JSON objects to merge
Example Configuration:
{
  "inputs": [
    { "name": "invoice_data" },
    { "name": "vendor_data" },
    { "name": "line_items" }
  ]
}
Output Structure: The merged output wraps each input under its named key:
{
  "invoice_data": { ... extracted invoice fields ... },
  "vendor_data": { ... extracted vendor fields ... },
  "line_items": { ... extracted line items ... }
}
This is useful for:
  • Combining extractions from multiple documents
  • Aggregating data from parallel processing branches
  • Creating comprehensive outputs from Split or Classifier workflows

While Loop

Repeat contained nodes until a termination condition is met.
Inputs: JSON or File
Outputs: JSON or File (final iteration result)
Configuration:
Setting | Description | Default
Max Iterations | Maximum number of loop iterations | 10
Termination Conditions | Conditions that stop the loop when matched | []
Loop Context Template | Text template with loop variables for contained nodes | ""
How It Works:
  1. Data enters the loop through the left input handle
  2. Contained nodes inside the loop process the data
  3. After each iteration, termination conditions are evaluated
  4. If a condition matches (or max iterations reached), the loop exits
  5. Final output is passed through the right output handle
Termination Conditions: Configure conditions using the same operators as If/Else:
Type | Operators
Comparison | is_equal_to, is_not_equal_to, is_greater_than, is_less_than
Existence | exists, does_not_exist, is_empty, is_not_empty
Boolean | is_true, is_false
Array | length_equal_to, length_greater_than, length_less_than
Example Configuration:
{
  "max_iterations": 5,
  "termination_conditions": [
    {
      "path": "data.status",
      "operator": "is_equal_to",
      "value": "complete"
    }
  ]
}
Loop Context Variables: The loop context template can include these variables:
  • {{iteration_number}} - Current iteration (e.g., “1 of 10”)
  • {{termination_condition_breakdown}} - Detailed evaluation of why the loop continued
  • {{data}} - JSON data from the previous iteration
Use Cases:
  • Iteratively refine extractions until quality thresholds are met
  • Process documents until a specific field value is detected
  • Implement retry logic with conditional exit
The loop always runs at least once. Termination conditions are evaluated after each iteration completes.
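The loop semantics can be sketched as follows (process and is_terminated stand in for the contained nodes and the configured termination conditions):

```python
def run_while_loop(data: dict, process, is_terminated, max_iterations: int = 10) -> dict:
    """Run process at least once; exit when is_terminated(data) or the cap is hit."""
    for _ in range(max_iterations):
        data = process(data)
        if is_terminated(data):
            break
    return data

# Illustrative step: flag the run complete after three attempts.
def step(d):
    d = dict(d, attempts=d.get("attempts", 0) + 1)
    if d["attempts"] >= 3:
        d["status"] = "complete"
    return d

result = run_while_loop(
    {"status": "pending"},
    step,
    lambda d: d.get("status") == "complete",
    max_iterations=5,
)
# result == {"status": "complete", "attempts": 3}
```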

For Each

Process items in parallel and combine results.
Inputs: JSON (array) or File (PDF)
Outputs: JSON (combined results) or File (merged PDF)
Configuration:
Setting | Description | Default
Map Method | How to split input into items | split_by_page
Max Iterations | Maximum items to process | 100
Reduce Strategy | How to combine iteration outputs | concat_array
Map Methods:
Method | Description | Input Type
iterate_array | Process each item in a JSON array | JSON
split_by_page | Split PDF into N-page chunks | File
split_by_key | Split PDF by semantic boundaries (AI-detected) | File
Reduce Strategies:
Strategy | Description | Output Type
concat_array | Combine all outputs into an array | JSON
first | Return only the first iteration’s output | JSON
last | Return only the last iteration’s output | JSON
merge_pdf | Merge all output PDFs into one | File
none | No output (side effects only) | None
Example: Iterate Array
Process each line item in an invoice separately:
{
  "map_method": "iterate_array",
  "map_source_path": "data.line_items",
  "reduce_strategy": "concat_array",
  "reduce_concat_data_key": "processed_items"
}
Example: Split by Page
Process a multi-page document one page at a time:
{
  "map_method": "split_by_page",
  "split_page_range": 1,
  "reduce_strategy": "concat_array"
}
Example: Split by Key
Split a document containing multiple invoices by invoice number:
{
  "map_method": "split_by_key",
  "partition_key_name": "invoice_number",
  "partition_key_description": "The invoice number found at the top of each invoice",
  "reduce_strategy": "concat_array",
  "reduce_concat_item_key": "invoice_id"
}
Iteration Context: Inside the loop, contained nodes can access:
  • {{key}} - The partition key for the current item (page range or semantic key)
  • {{input_json}} - The full input JSON (for iterate_array method)
  • {{key_description}} - The partition key description (for split_by_key)
Use Cases:
  • Extract data from each page of a multi-page document independently
  • Process bundled documents that contain multiple invoices/receipts
  • Apply different logic to each item in an array
  • Batch process documents with parallel execution
For Each runs one iteration per item; the reduce phase runs after all iterations complete. Use split_by_key when documents have multiple logical sections that need AI to identify boundaries.
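For the iterate_array and concat_array combination, the map and reduce phases amount to the following (a local sketch; the real node also handles files and iteration context):

```python
def for_each(input_json: dict, map_source_path: str, process, max_iterations: int = 100) -> list:
    """iterate_array + concat_array: apply process to each item of the array
    at map_source_path and collect the outputs into a list."""
    items = input_json
    for key in map_source_path.split("."):
        items = items[key]
    return [process(item) for item in items[:max_iterations]]

invoice = {"data": {"line_items": [{"amount": 40.0}, {"amount": 60.0}]}}
taxed = for_each(invoice, "data.line_items", lambda item: {"with_tax": item["amount"] * 1.1})
# taxed == [{"with_tax": 44.0}, {"with_tax": 66.0}]
```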

Node I/O Types

Understanding input/output types helps you connect nodes correctly:
Type | Color | Description
File | Blue (📎) | Documents, PDFs, images
JSON | Purple ({ }) | Structured data objects

Compatibility Matrix

Source → Target | File | JSON
File | ✓ | ✗
JSON | ✗ | ✓
Outputs connect only to inputs of the same type.

Tips for Building Workflows

  • Identify what data you need at the end, then work backwards to determine which nodes you need.
  • Instead of computing values after receiving webhook data, use Formula nodes to add totals, percentages, and derived fields directly in the workflow.
  • When handling different document types, use Classifier to route each document to the appropriate extraction schema before processing.
  • When handling mixed document bundles (e.g., invoice + contract in one PDF), use Split first, then apply specific Extract schemas to each subdocument.
  • When processing multiple documents or branches, use Merge JSON to combine all extracted data into a single structured output.
  • Route data based on extracted values—for example, send high-value invoices to a different webhook or flag certain conditions for review.
  • When your extraction schema doesn’t match your downstream API or database structure, use Reshape to transform fields without modifying your extraction schema.
  • When a single PDF contains multiple logical documents (like several invoices), use For Each with split_by_key to process each one independently with its own extraction.
  • When you need to retry or refine processing until a quality threshold is met, use While Loop with termination conditions that check for the desired outcome.
  • Run your workflow with documents that have missing fields, poor scan quality, or unusual formats to ensure robust handling.