Overview

Workflow nodes are the building blocks of document processing pipelines. Each node has specific inputs, outputs, and configuration options.

Node Categories

Nodes are organized into three categories:
| Category | Purpose |
| --- | --- |
| Core | Workflow entry/exit points and utilities |
| Tools | Document processing operations |
| Logic | Conditional flows and data transformations |

Core Nodes

Document (Start)

The entry point for your workflow. Upload documents here for processing.
Outputs: File
Configuration: None (just upload your document in Run mode)
Supported Formats:
  • PDF documents
  • Images (PNG, JPG, JPEG, GIF, WebP, TIFF, BMP)
  • Microsoft Word (.docx, .doc)
  • Microsoft Excel (.xlsx, .xls)
  • Microsoft PowerPoint (.pptx, .ppt)
Usage:
  • Drag multiple Document nodes for workflows that combine multiple inputs
  • Each Document node can receive one file per workflow run
  • Files are automatically converted to PDF when required by downstream nodes

JSON Input (Start)

Entry point for structured JSON data. Define a schema and pass JSON data when running the workflow.
Outputs: JSON
Configuration:
| Setting | Description |
| --- | --- |
| JSON Schema | Define the structure of expected input data |
Schema Example:
{
  "type": "object",
  "properties": {
    "customer_id": { "type": "string" },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "metadata": {
      "type": "object",
      "properties": {
        "source": { "type": "string" }
      }
    }
  },
  "required": ["customer_id"]
}
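A JSON object matching this schema (illustrative values):
{
  "customer_id": "cust_123",
  "priority": "high",
  "metadata": { "source": "crm" }
}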
Usage:
  • Use JSON Input nodes to pass structured data into workflows without documents
  • Combine with Document nodes to enrich extractions with external data
  • Connect to Extract nodes as additional context for AI-powered extraction
  • Use with Functions or If/Else nodes for data-driven workflow logic
SDK Example:
# "client" is an initialized Retab SDK client; json_inputs is keyed by the JSON Input node's ID
run = client.workflows.runs.create(
    workflow_id="wf_abc123",
    json_inputs={
        "json-node-id": {"customer_id": "cust_123", "priority": "high"}
    }
)

Text Input (Start)

Entry point for plain text data. Pass text strings when running the workflow.
Outputs: Text
Configuration: None
Usage:
  • Use Text Input nodes to pass instructions or context into workflows
  • Connect to Agent Edit nodes to provide fill instructions
  • Combine with other nodes that accept text inputs
  • Useful for dynamic prompts or user-provided instructions
SDK Example:
run = client.workflows.runs.create(
    workflow_id="wf_abc123",
    text_inputs={
        "text-node-id": "Process this document with high priority"
    }
)

Webhook (End)

Send workflow outputs to an external HTTP endpoint.
Inputs: File, JSON
Configuration:
| Setting | Description |
| --- | --- |
| Webhook URL | The HTTPS endpoint to receive the data |
| Headers | Custom HTTP headers (e.g., authentication tokens) |
Example Configuration:
{
  "webhook_url": "https://api.yourapp.com/webhook",
  "webhook_headers": {
    "Authorization": "Bearer your-token",
    "X-Custom-Header": "value"
  }
}
Webhook Payload: The webhook receives a JSON payload containing:
  • completion: The extraction result with parsed data
  • file_payload: Document metadata including filename and URL
  • user: User email address (if authenticated)
  • metadata: Additional workflow metadata
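Example payload (an illustrative sketch based on the fields above; exact shapes may vary, so see the Webhooks documentation for the authoritative format):
{
  "completion": {
    "data": { "invoice_number": "INV-2024-001" }
  },
  "file_payload": {
    "filename": "invoice.pdf",
    "url": "https://..."
  },
  "user": "jane@example.com",
  "metadata": {}
}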
See Webhooks for details on securing and handling webhook requests.

Note

Add comments and documentation to your workflow.
Inputs: None
Outputs: None
Notes don’t affect workflow execution—they’re purely for documentation. Use them to:
  • Explain complex logic
  • Document configuration choices
  • Leave instructions for teammates

Tools Nodes

Extract

Extract structured data from documents using a JSON schema.
Inputs: File, plus optional additional inputs (Text, JSON, File)
Outputs: JSON (data), JSON (likelihoods when using consensus)
Configuration:
| Setting | Description | Default |
| --- | --- | --- |
| Schema | JSON Schema defining fields to extract | {} |
| Model | AI model for extraction | retab-small |
| Temperature | Randomness in extraction (0-1) | 0 |
| Image Resolution | DPI for document rendering | 150 |
| Consensus | Number of parallel extractions (1-10) | 1 |
| Reasoning Effort | How much the model “thinks” | minimal |
| Additional Inputs | Named inputs for context (text, JSON, or files) | [] |
Schema Example:
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "total_amount": { "type": "number" },
    "vendor": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" }
      }
    }
  }
}
Consensus Mode:
When n_consensus > 1, the node runs multiple extractions in parallel and returns:
  • data: The consensus result
  • likelihoods: Confidence scores for each field (0-1)
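For example, with n_consensus set to 3 the output might look like this (an illustrative sketch using the schema above; the exact nesting of likelihoods may differ):
{
  "data": {
    "invoice_number": "INV-2024-001",
    "total_amount": 1249.50
  },
  "likelihoods": {
    "invoice_number": 0.98,
    "total_amount": 0.87
  }
}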
Additional Inputs: You can add named input handles to provide extra context during extraction:
  • Text inputs: Instructions or context as plain text
  • JSON inputs: Structured data from other nodes
  • File inputs: Additional reference documents

Parse

Convert documents to structured text/markdown using AI vision.
Inputs: File
Outputs: File (parsed document), Text (extracted content)
Configuration:
| Setting | Description | Default |
| --- | --- | --- |
| Model | AI model for parsing | retab-small |
| Image Resolution | DPI for document rendering | 150 |
Use Cases:
  • Pre-process documents before extraction
  • Convert scanned PDFs to searchable text
  • Extract text from images

Split

Split multi-page documents into separate PDFs by category.
Inputs: File (PDF)
Outputs: Multiple File outputs (one per category)
Configuration:
| Setting | Description |
| --- | --- |
| Categories | List of document categories with names and descriptions |
| Model | AI model for classification |
Category Example:
{
  "categories": [
    {
      "name": "Invoice",
      "description": "Pages containing invoice details, line items, and totals"
    },
    {
      "name": "Contract",
      "description": "Legal agreement pages with terms and signatures"
    },
    {
      "name": "Supporting Documents",
      "description": "Receipts, certificates, and other attachments"
    }
  ]
}
Each category creates a separate output handle. Connect downstream nodes to process each category differently.
Non-PDF documents are automatically converted to PDF before splitting.

Classifier

Classify documents into one of the predefined categories.
Inputs: File
Outputs: Multiple File outputs (one per category, only the matched category receives the document)
Configuration:
| Setting | Description |
| --- | --- |
| Categories | List of document categories with names and descriptions |
| Model | AI model for classification |
Difference from Split:
| Feature | Split | Classifier |
| --- | --- | --- |
| Input | Multi-page document | Single document |
| Output | Multiple PDFs (pages grouped by category) | Same document routed to one category |
| Use Case | Separating bundled documents | Routing different document types |
Example:
{
  "categories": [
    {
      "name": "Invoice",
      "description": "Documents containing billing information and payment requests"
    },
    {
      "name": "Receipt",
      "description": "Proof of payment or purchase confirmation"
    },
    {
      "name": "Contract",
      "description": "Legal agreements and binding documents"
    }
  ]
}
The AI analyzes the document and routes it to exactly one category. Downstream nodes connected to other categories are skipped.

Edit

Fill PDF forms using AI with natural language instructions or pre-defined templates.
Inputs:
  • File (document to edit) — only when “Use Template” is off
  • Text (instructions/data)
Outputs: File (filled document)
Configuration:
| Setting | Description |
| --- | --- |
| Model | AI model to use for filling |
| Use Template | Toggle to use a pre-defined template instead of an input document |
| Template | Template to use (shown when “Use Template” is on) |
Two Operating Modes:
  1. Document Mode (default): Edit an input document using AI
    • Connect a document from another node (e.g., Start, Split, Parse)
    • Provide natural language instructions via the Text input
    • The AI fills the form fields based on your instructions
  2. Template Mode: Fill a pre-defined template
    • Enable “Use Template” toggle
    • Select a template from your template library
    • No document input required — the template provides the form structure
    • Provide data/instructions via the Text input
Example Instructions:
Fill this form with the company information from the input data.
Use the vendor_name for "Name" and vendor_address for the address fields.
Leave signature fields blank.
The Text input provides instructions or data to use for filling. Connect this to an Extract node’s JSON output (which converts to text instructions).

Logic Nodes

Human in the Loop (HIL)

Pause workflow execution for human review and approval.
Inputs: JSON
Outputs: JSON (verified data)
Configuration: None (inherits schema from connected source)
How It Works:
  1. Connect to an Extract or Functions node (schema is automatically inherited)
  2. When the workflow runs, it pauses at the HIL node
  3. A reviewer sees the extracted data alongside the source document
  4. The reviewer can approve, modify, or reject the data
  5. After approval, the verified data continues through the workflow
Use Cases:
  • Validate critical extractions before sending to downstream systems
  • Quality control for high-value documents
  • Compliance requirements that mandate human oversight
The HIL node automatically inherits the JSON schema from the connected upstream node. It also preserves any computed fields from Functions nodes.

Functions

Add computed fields using Excel-like formulas.
Inputs: JSON
Outputs: JSON (with computed fields)
Configuration:
| Setting | Description |
| --- | --- |
| Functions | List of computed fields with target paths and expressions |
Supported Functions:
| Function | Description | Example |
| --- | --- | --- |
| SUM | Sum of values | SUM(items.*.price) |
| AVERAGE | Average of values | AVERAGE(scores.*) |
| COUNT | Count of items | COUNT(line_items.*) |
| MIN / MAX | Minimum/maximum value | MAX(items.*.quantity) |
| IF | Conditional | IF(total > 1000, "Large", "Small") |
| CONCAT | Join strings | CONCAT(first_name, " ", last_name) |
| ROUND | Round number | ROUND(amount, 2) |
Example Configuration:
Target: subtotal
Expression: SUM(line_items.*.amount)

Target: tax
Expression: subtotal * 0.1

Target: total
Expression: subtotal + tax
Functions are evaluated in dependency order. You can reference other computed fields in expressions.
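For example, the configuration above would transform the input below as shown (an illustrative pair, assuming computed fields are added alongside the original data):
Input:
{
  "line_items": [
    { "amount": 100 },
    { "amount": 250 }
  ]
}
Output:
{
  "line_items": [
    { "amount": 100 },
    { "amount": 250 }
  ],
  "subtotal": 350,
  "tax": 35,
  "total": 385
}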
See Functions Documentation for the complete formula reference.

Reshape

Transform JSON by selecting and renaming fields into a new structure.
Inputs: JSON
Outputs: JSON (reshaped data)
Configuration:
| Setting | Description |
| --- | --- |
| Mappings | List of field mappings from input to output paths |
How It Works: Reshape lets you transform JSON data by mapping fields from an input structure to a new output structure. Use dot notation for both input and output paths to access or create nested fields.
Mapping Example:
| Input Path | Output Path |
| --- | --- |
| client_name | client.name |
| client_address | client.address |
| bank_name | bank.name |
Input:
{
  "client_name": "Jane Doe",
  "client_address": "123 Main St",
  "bank_name": "Commerce Bank"
}
Output:
{
  "client": {
    "name": "Jane Doe",
    "address": "123 Main St"
  },
  "bank": {
    "name": "Commerce Bank"
  }
}
Use Cases:
  • Restructure flat data into nested objects for downstream systems
  • Select only the fields you need from a large extraction
  • Rename fields to match your API or database schema
  • Prepare data for webhooks that expect a specific format
Unmapped fields are excluded from the output. If an input path doesn’t exist, the mapping is skipped without error.

If / Else

Route data to different branches based on conditions.
Inputs: JSON
Outputs: Multiple JSON outputs (one per branch: If, Else If, Else)
Configuration:
| Setting | Description |
| --- | --- |
| Conditions | List of conditions to evaluate in order |
| Has Else | Whether to include a default else branch (default: true) |
Condition Structure: Each condition can have multiple sub-conditions combined with AND/OR:
{
  "conditions": [
    {
      "branch_name": "if",
      "sub_conditions": [
        { "path": "data.total_amount", "operator": "is_greater_than", "value": 1000 },
        { "path": "data.vendor.country", "operator": "is_equal_to", "value": "US" }
      ],
      "logical_operator": "and"
    }
  ]
}
Available Operators:
| Type | Operators |
| --- | --- |
| Existence | exists, does_not_exist, is_empty, is_not_empty |
| Comparison | is_equal_to, is_not_equal_to |
| String | contains, starts_with, ends_with, matches_regex |
| Number | is_greater_than, is_less_than, is_greater_than_or_equal_to, is_less_than_or_equal_to |
| Boolean | is_true, is_false |
| Array | length_equal_to, length_greater_than, length_less_than |
| Date | is_after, is_before, is_after_or_equal_to, is_before_or_equal_to |
How It Works:
  1. Conditions are evaluated in order (If, Else If 1, Else If 2, …)
  2. The first matching condition determines the output branch
  3. Data is routed to exactly one branch
  4. If no conditions match and has_else is true, data goes to the Else branch
  5. Downstream nodes on non-matched branches are skipped
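For example, with the condition shown above, the following input (illustrative) satisfies both sub-conditions (total_amount greater than 1000 AND vendor country equal to "US") and is routed to the If branch; an input with a total_amount of 800 would fall through to the Else branch instead:
{
  "data": {
    "total_amount": 1500,
    "vendor": { "country": "US" }
  }
}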
Example Use Cases:
  • Route high-value invoices for additional approval
  • Process documents differently based on vendor country
  • Flag incomplete extractions for review

Merge PDF

Combine multiple PDF documents into a single file.
Inputs: Multiple File inputs (configurable)
Outputs: File (merged PDF)
Configuration:
| Setting | Description |
| --- | --- |
| Inputs | Named input slots for PDFs to merge |
Example Configuration:
{
  "inputs": [
    { "name": "Cover Page" },
    { "name": "Invoice" },
    { "name": "Terms and Conditions" }
  ]
}
PDFs are merged in the order defined by the inputs list. Non-PDF documents are automatically converted before merging.

API Call

Make HTTP requests to external APIs and use the response in your workflow.
Inputs: JSON (optional request body or parameters)
Outputs: JSON (API response)
Configuration:
| Setting | Description | Default |
| --- | --- | --- |
| URL | The API endpoint URL | Required |
| Method | HTTP method (GET, POST, PUT, PATCH, DELETE) | POST |
| Headers | Custom HTTP headers (e.g., authentication) | {} |
| Body Template | JSON template with placeholders for input data | {} |
Example Configuration:
{
  "url": "https://api.example.com/validate",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer ${API_KEY}",
    "Content-Type": "application/json"
  },
  "body_template": {
    "invoice_number": "{{data.invoice_number}}",
    "amount": "{{data.total_amount}}"
  }
}
Use Cases:
  • Validate extracted data against external systems
  • Enrich documents with data from your CRM or ERP
  • Trigger actions in third-party services based on extraction results
  • Look up additional information using extracted identifiers
Placeholders: Use {{path.to.field}} syntax to reference values from the input JSON:
  • {{data.invoice_number}} → Inserts the invoice_number field
  • {{data.vendor.name}} → Inserts nested fields
  • {{data.line_items[0].amount}} → Inserts array elements
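For example, with the body template above, an input of { "data": { "invoice_number": "INV-2024-001", "total_amount": 1249.50 } } would produce a request body like the following (an illustrative sketch; whether numeric values arrive as numbers or strings may depend on how the template is quoted):
{
  "invoice_number": "INV-2024-001",
  "amount": "1249.50"
}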
API Call nodes execute synchronously. For long-running operations, consider using webhooks to trigger external workflows asynchronously.

Merge JSON

Combine multiple JSON objects into a single structured object.
Inputs: Multiple JSON inputs (configurable)
Outputs: JSON (merged object)
Configuration:
| Setting | Description |
| --- | --- |
| Inputs | Named input slots for JSON objects to merge |
Example Configuration:
{
  "inputs": [
    { "name": "invoice_data" },
    { "name": "vendor_data" },
    { "name": "line_items" }
  ]
}
Output Structure: The merged output wraps each input under its named key:
{
  "invoice_data": { ... extracted invoice fields ... },
  "vendor_data": { ... extracted vendor fields ... },
  "line_items": { ... extracted line items ... }
}
This is useful for:
  • Combining extractions from multiple documents
  • Aggregating data from parallel processing branches
  • Creating comprehensive outputs from Split or Classifier workflows

While Loop

Repeat contained nodes until a termination condition is met.
Inputs: JSON, File, or Text
Outputs: JSON, File, or Text (final iteration result)
Configuration:
| Setting | Description | Default |
| --- | --- | --- |
| Max Iterations | Maximum number of loop iterations | 10 |
| Termination Conditions | Conditions that stop the loop when matched | [] |
| Loop Context Template | Text template with loop variables for contained nodes | "" |
How It Works:
  1. Data enters the loop through the left input handle
  2. Contained nodes inside the loop process the data
  3. After each iteration, termination conditions are evaluated
  4. If a condition matches (or max iterations reached), the loop exits
  5. Final output is passed through the right output handle
Termination Conditions: Configure conditions using the same operators as If/Else:
| Type | Operators |
| --- | --- |
| Comparison | is_equal_to, is_not_equal_to, is_greater_than, is_less_than |
| Existence | exists, does_not_exist, is_empty, is_not_empty |
| Boolean | is_true, is_false |
| Array | length_equal_to, length_greater_than, length_less_than |
Example Configuration:
{
  "max_iterations": 5,
  "termination_conditions": [
    {
      "path": "data.status",
      "operator": "is_equal_to",
      "value": "complete"
    }
  ]
}
Loop Context Variables: The loop context template can include these variables:
  • {{iteration_number}} - Current iteration (e.g., “1 of 10”)
  • {{termination_condition_breakdown}} - Detailed evaluation of why the loop continued
  • {{data}} - JSON data from the previous iteration
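Example loop context template (an illustrative sketch; any combination of the variables above can be used):
Iteration {{iteration_number}}. Previous result: {{data}}.
The loop continued because: {{termination_condition_breakdown}}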
Use Cases:
  • Iteratively refine extractions until quality thresholds are met
  • Process documents until a specific field value is detected
  • Implement retry logic with conditional exit
The loop always runs at least once. Termination conditions are evaluated after each iteration completes.

For Each

Process items in parallel and combine results.
Inputs: JSON (array) or File (PDF)
Outputs: JSON (combined results) or File (merged PDF)
Configuration:
| Setting | Description | Default |
| --- | --- | --- |
| Map Method | How to split input into items | split_by_page |
| Max Iterations | Maximum items to process | 100 |
| Reduce Strategy | How to combine iteration outputs | concat_array |
Map Methods:
| Method | Description | Input Type |
| --- | --- | --- |
| iterate_array | Process each item in a JSON array | JSON |
| split_by_page | Split PDF into N-page chunks | File |
| split_by_key | Split PDF by semantic boundaries (AI-detected) | File |
Reduce Strategies:
| Strategy | Description | Output Type |
| --- | --- | --- |
| concat_array | Combine all outputs into an array | JSON |
| first | Return only the first iteration’s output | JSON |
| last | Return only the last iteration’s output | JSON |
| merge_pdf | Merge all output PDFs into one | File |
| concat_text | Combine all text outputs into an array | JSON |
| none | No output (side effects only) | None |
Example: Iterate Array
Process each line item in an invoice separately:
{
  "map_method": "iterate_array",
  "map_source_path": "data.line_items",
  "reduce_strategy": "concat_array",
  "reduce_concat_data_key": "processed_items"
}
Example: Split by Page
Process a multi-page document one page at a time:
{
  "map_method": "split_by_page",
  "split_page_range": 1,
  "reduce_strategy": "concat_array"
}
Example: Split by Key
Split a document containing multiple invoices by invoice number:
{
  "map_method": "split_by_key",
  "partition_key_name": "invoice_number",
  "partition_key_description": "The invoice number found at the top of each invoice",
  "reduce_strategy": "concat_array",
  "reduce_concat_item_key": "invoice_id"
}
Iteration Context: Inside the loop, contained nodes can access:
  • {{key}} - The partition key for the current item (page range or semantic key)
  • {{input_json}} - The full input JSON (for iterate_array method)
  • {{key_description}} - The partition key description (for split_by_key)
Use Cases:
  • Extract data from each page of a multi-page document independently
  • Process bundled documents that contain multiple invoices/receipts
  • Apply different logic to each item in an array
  • Batch process documents with parallel execution
Each item is processed independently, and the reduce phase runs only after all iterations complete. Use split_by_key when documents have multiple logical sections that need AI to identify boundaries.

Node I/O Types

Understanding input/output types helps you connect nodes correctly:
| Type | Color | Description |
| --- | --- | --- |
| File | Blue (📎) | Documents, PDFs, images |
| JSON | Purple ({ }) | Structured data objects |
| Text | Cyan (📄) | Plain text strings |

Compatibility Matrix

| Source → Target | File | JSON | Text |
| --- | --- | --- | --- |
| File | ✓ |  |  |
| JSON |  | ✓ | ✓ |
| Text |  |  | ✓ |
Same-type connections are always compatible. JSON outputs can also connect to Text inputs; the JSON is converted to text instructions (see the Edit node). To turn a File into Text or JSON, route it through a Parse or Extract node.

Tips for Building Workflows

  • Identify what data you need at the end, then work backwards to determine which nodes you need.
  • Instead of computing values after receiving webhook data, use Functions nodes to add totals, percentages, and derived fields directly in the workflow.
  • When handling different document types, use Classifier to route each document to the appropriate extraction schema before processing.
  • When handling mixed document bundles (e.g., invoice + contract in one PDF), use Split first, then apply specific Extract schemas to each category.
  • When processing multiple documents or branches, use Merge JSON to combine all extracted data into a single structured output.
  • Route data based on extracted values; for example, send high-value invoices to a different webhook or flag certain conditions for review.
  • When your extraction schema doesn’t match your downstream API or database structure, use Reshape to transform fields without modifying your extraction schema.
  • When a single PDF contains multiple logical documents (like several invoices), use For Each with split_by_key to process each one independently with its own extraction.
  • When you need to retry or refine processing until a quality threshold is met, use While Loop with termination conditions that check for the desired outcome.
  • Run your workflow with documents that have missing fields, poor scan quality, or unusual formats to ensure robust handling.