Overview
Workflow nodes are the building blocks of document processing pipelines. Each node has specific inputs, outputs, and configuration options.
Node Categories
Nodes are organized into three categories:
| Category | Purpose |
|---|---|
| Core | Workflow entry/exit points and utilities |
| Tools | Document processing operations |
| Logic | Conditional flows and data transformations |
Core Nodes
Document (Start)
The entry point for your workflow. Upload documents here for processing.
Supported file types:
- PDF documents
- Images (PNG, JPG, JPEG, GIF, WebP, TIFF, BMP)
- Microsoft Word (.docx, .doc)
- Microsoft Excel (.xlsx, .xls)
- Microsoft PowerPoint (.pptx, .ppt)
Tips:
- Drag multiple Document nodes for workflows that combine multiple inputs
- Each Document node can receive one file per workflow run
- Files are automatically converted to PDF when required by downstream nodes
JSON Input (Start)
Entry point for structured JSON data. Define a schema and pass JSON data when running the workflow.
| Setting | Description |
|---|---|
| JSON Schema | Define the structure of expected input data |
Tips:
- Use JSON Input nodes to pass structured data into workflows without documents
- Combine with Document nodes to enrich extractions with external data
- Connect to Extract nodes as additional context for AI-powered extraction
- Use with Functions or If/Else nodes for data-driven workflow logic
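As a sketch, a minimal JSON Schema for a vendor record might look like the following (all field names here are illustrative, not part of the product):

```json
{
  "type": "object",
  "properties": {
    "vendor_id": { "type": "string" },
    "country": { "type": "string" },
    "credit_limit": { "type": "number" }
  },
  "required": ["vendor_id"]
}
```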
Text Input (Start)
Entry point for plain text data. Pass text strings when running the workflow.
Tips:
- Use Text Input nodes to pass instructions or context into workflows
- Connect to Agent Edit nodes to provide fill instructions
- Combine with other nodes that accept text inputs
- Useful for dynamic prompts or user-provided instructions
Webhook (End)
Send workflow outputs to an external HTTP endpoint.
| Setting | Description |
|---|---|
| Webhook URL | The HTTPS endpoint to receive the data |
| Headers | Custom HTTP headers (e.g., authentication tokens) |
Payload fields:
- completion: The extraction result with parsed data
- file_payload: Document metadata including filename and URL
- user: User email address (if authenticated)
- metadata: Additional workflow metadata
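An illustrative payload might look like the sketch below; all values (and the exact nesting inside each field) are hypothetical and depend on your workflow:

```json
{
  "completion": { "invoice_number": "INV-0042", "total": 1250.5 },
  "file_payload": { "filename": "invoice.pdf", "url": "https://example.com/files/invoice.pdf" },
  "user": "reviewer@example.com",
  "metadata": { "workflow_id": "wf_123" }
}
```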
Note
Add comments and documentation to your workflow.
Outputs: None
Notes don't affect workflow execution; they're purely for documentation. Use them to:
- Explain complex logic
- Document configuration choices
- Leave instructions for teammates
Tools Nodes
Extract
Extract structured data from documents using a JSON schema.
| Setting | Description | Default |
|---|---|---|
| Schema | JSON Schema defining fields to extract | {} |
| Model | AI model for extraction | retab-small |
| Temperature | Randomness in extraction (0-1) | 0 |
| Image Resolution | DPI for document rendering | 150 |
| Consensus | Number of parallel extractions (1-10) | 1 |
| Reasoning Effort | How much the model “thinks” | minimal |
| Additional Inputs | Named inputs for context (text, JSON, or files) | [] |
When `n_consensus` > 1, the node runs multiple extractions in parallel and returns:
- data: The consensus result
- likelihoods: Confidence scores for each field (0-1)
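For example, with `n_consensus` set to 3 and a two-field schema, the output might take this shape (field names and values are hypothetical):

```json
{
  "data": { "invoice_number": "INV-0042", "total": 1250.5 },
  "likelihoods": { "invoice_number": 0.98, "total": 0.87 }
}
```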
Additional Inputs can be:
- Text inputs: Instructions or context as plain text
- JSON inputs: Structured data from other nodes
- File inputs: Additional reference documents
Parse
Convert documents to structured text/markdown using AI vision.
Outputs: File (parsed document), Text (extracted content)
Configuration:
| Setting | Description | Default |
|---|---|---|
| Model | AI model for parsing | retab-small |
| Image Resolution | DPI for document rendering | 150 |
Use cases:
- Pre-process documents before extraction
- Convert scanned PDFs to searchable text
- Extract text from images
Split
Split multi-page documents into separate PDFs by category.
Outputs: Multiple File outputs (one per category)
Configuration:
| Setting | Description |
|---|---|
| Categories | List of document categories with names and descriptions |
| Model | AI model for classification |
Non-PDF documents are automatically converted to PDF before splitting.
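A Categories list might be configured like this sketch (the names and descriptions are illustrative, and the exact configuration shape may differ in the UI):

```json
[
  { "name": "invoice", "description": "Vendor invoices with line items and totals" },
  { "name": "contract", "description": "Signed agreements, amendments, and terms" }
]
```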
Classifier
Classify documents into one of the predefined categories.
Outputs: Multiple File outputs (one per category; only the matched category receives the document)
Configuration:
| Setting | Description |
|---|---|
| Categories | List of document categories with names and descriptions |
| Model | AI model for classification |
Split vs. Classifier:
| Feature | Split | Classifier |
|---|---|---|
| Input | Multi-page document | Single document |
| Output | Multiple PDFs (pages grouped by category) | Same document routed to one category |
| Use Case | Separating bundled documents | Routing different document types |
Edit
Fill PDF forms using AI with natural language instructions or pre-defined templates.
Inputs:
- File (document to edit): used only when “Use Template” is off
- Text (instructions/data)
| Setting | Description |
|---|---|
| Model | AI model to use for filling |
| Use Template | Toggle to use a pre-defined template instead of an input document |
| Template | Template to use (shown when “Use Template” is on) |
Modes:
- Document Mode (default): Edit an input document using AI
  - Connect a document from another node (e.g., Start, Split, Parse)
  - Provide natural language instructions via the Text input
  - The AI fills the form fields based on your instructions
- Template Mode: Fill a pre-defined template
  - Enable the “Use Template” toggle
  - Select a template from your template library
  - No document input required; the template provides the form structure
  - Provide data/instructions via the Text input
Logic Nodes
Human in the Loop (HIL)
Pause workflow execution for human review and approval.
Outputs: JSON (verified data)
Configuration: None (inherits schema from connected source)
How It Works:
- Connect to an Extract or Functions node (schema is automatically inherited)
- When the workflow runs, it pauses at the HIL node
- A reviewer sees the extracted data alongside the source document
- The reviewer can approve, modify, or reject the data
- After approval, the verified data continues through the workflow
Use cases:
- Validate critical extractions before sending to downstream systems
- Quality control for high-value documents
- Compliance requirements that mandate human oversight
The HIL node automatically inherits the JSON schema from the connected upstream node. It also preserves any computed fields from Functions nodes.
Functions
Add computed fields using Excel-like formulas.
Outputs: JSON (with computed fields)
Configuration:
| Setting | Description |
|---|---|
| Functions | List of computed fields with target paths and expressions |
Supported functions:
| Function | Description | Example |
|---|---|---|
| SUM | Sum of values | `SUM(items.*.price)` |
| AVERAGE | Average of values | `AVERAGE(scores.*)` |
| COUNT | Count of items | `COUNT(line_items.*)` |
| MIN / MAX | Minimum/maximum value | `MAX(items.*.quantity)` |
| IF | Conditional | `IF(total > 1000, "Large", "Small")` |
| CONCAT | Join strings | `CONCAT(first_name, " ", last_name)` |
| ROUND | Round number | `ROUND(amount, 2)` |
Functions are evaluated in dependency order. You can reference other computed fields in expressions.
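As an illustration of dependency-order evaluation, the second formula below references the first. The target paths (`subtotal`, `avg_amount`, `line_items`) are hypothetical, and this assumes basic arithmetic operators are available alongside the functions above, as in Excel:

```text
subtotal   = SUM(line_items.*.amount)
avg_amount = ROUND(subtotal / COUNT(line_items.*), 2)
```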
Reshape
Transform JSON by selecting and renaming fields into a new structure.
Configuration:
| Setting | Description |
|---|---|
| Mappings | List of field mappings from input to output paths |
Example mappings:
| Input Path | Output Path |
|---|---|
| `client_name` | `client.name` |
| `client_address` | `client.address` |
| `bank_name` | `bank.name` |
Use cases:
- Restructure flat data into nested objects for downstream systems
- Select only the fields you need from a large extraction
- Rename fields to match your API or database schema
- Prepare data for webhooks that expect a specific format
Unmapped fields are excluded from the output. If an input path doesn’t exist, the mapping is skipped without error.
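Applied to the example mappings above, a flat extraction would be reshaped like this (values are illustrative):

```text
Input:  { "client_name": "Acme Corp", "client_address": "1 Main St", "bank_name": "First Bank" }
Output: { "client": { "name": "Acme Corp", "address": "1 Main St" }, "bank": { "name": "First Bank" } }
```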
If / Else
Route data to different branches based on conditions.
Outputs: Multiple JSON outputs (one per branch: If, Else If, Else)
Configuration:
| Setting | Description |
|---|---|
| Conditions | List of conditions to evaluate in order |
| Has Else | Whether to include a default else branch (default: true) |
Condition types:
| Type | Operators |
|---|---|
| Existence | exists, does_not_exist, is_empty, is_not_empty |
| Comparison | is_equal_to, is_not_equal_to |
| String | contains, starts_with, ends_with, matches_regex |
| Number | is_greater_than, is_less_than, is_greater_than_or_equal_to, is_less_than_or_equal_to |
| Boolean | is_true, is_false |
| Array | length_equal_to, length_greater_than, length_less_than |
| Date | is_after, is_before, is_after_or_equal_to, is_before_or_equal_to |
How It Works:
- Conditions are evaluated in order (If, Else If 1, Else If 2, …)
- The first matching condition determines the output branch
- Data is routed to exactly one branch
- If no conditions match and `has_else` is true, data goes to the Else branch
- Downstream nodes on non-matched branches are skipped
Use cases:
- Route high-value invoices for additional approval
- Process documents differently based on vendor country
- Flag incomplete extractions for review
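For instance, routing high-value invoices might use a condition shaped like the sketch below. The exact configuration format may differ; the `path` and `value` keys are assumptions, while the operator name comes from the table above:

```json
{ "path": "data.total", "operator": "is_greater_than", "value": 1000 }
```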
Merge PDF
Combine multiple PDF documents into a single file.
Outputs: File (merged PDF)
Configuration:
| Setting | Description |
|---|---|
| Inputs | Named input slots for PDFs to merge |
API Call
Make HTTP requests to external APIs and use the response in your workflow.
| Setting | Description | Default |
|---|---|---|
| URL | The API endpoint URL | Required |
| Method | HTTP method (GET, POST, PUT, PATCH, DELETE) | POST |
| Headers | Custom HTTP headers (e.g., authentication) | {} |
| Body Template | JSON template with placeholders for input data | {} |
Use cases:
- Validate extracted data against external systems
- Enrich documents with data from your CRM or ERP
- Trigger actions in third-party services based on extraction results
- Look up additional information using extracted identifiers
Use `{{path.to.field}}` syntax to reference values from the input JSON:
- `{{data.invoice_number}}` → Inserts the invoice_number field
- `{{data.vendor.name}}` → Inserts nested fields
- `{{data.line_items[0].amount}}` → Inserts array elements
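A Body Template combining these placeholders might look like this (the key names on the left are illustrative; the placeholders match the syntax above):

```json
{
  "invoice": "{{data.invoice_number}}",
  "vendor": "{{data.vendor.name}}",
  "first_line_amount": "{{data.line_items[0].amount}}"
}
```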
API Call nodes execute synchronously. For long-running operations, consider using webhooks to trigger external workflows asynchronously.
Merge JSON
Combine multiple JSON objects into a single structured object.
Outputs: JSON (merged object)
Configuration:
| Setting | Description |
|---|---|
| Inputs | Named input slots for JSON objects to merge |
Use cases:
- Combining extractions from multiple documents
- Aggregating data from parallel processing branches
- Creating comprehensive outputs from Split or Classifier workflows
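Assuming each named input slot becomes a top-level key in the merged object (an assumption; verify the exact merge shape against your workflow's output), two inputs might combine like this:

```text
Input slot "invoice_data":  { "total": 1250.5 }
Input slot "shipping_data": { "carrier": "DHL" }
Merged output:              { "invoice_data": { "total": 1250.5 }, "shipping_data": { "carrier": "DHL" } }
```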
While Loop
Repeat contained nodes until a termination condition is met.
| Setting | Description | Default |
|---|---|---|
| Max Iterations | Maximum number of loop iterations | 10 |
| Termination Conditions | Conditions that stop the loop when matched | [] |
| Loop Context Template | Text template with loop variables for contained nodes | "" |
How It Works:
- Data enters the loop through the left input handle
- Contained nodes inside the loop process the data
- After each iteration, termination conditions are evaluated
- If a condition matches (or max iterations reached), the loop exits
- Final output is passed through the right output handle
Termination condition types:
| Type | Operators |
|---|---|
| Comparison | is_equal_to, is_not_equal_to, is_greater_than, is_less_than |
| Existence | exists, does_not_exist, is_empty, is_not_empty |
| Boolean | is_true, is_false |
| Array | length_equal_to, length_greater_than, length_less_than |
Loop context variables:
- `{{iteration_number}}`: Current iteration (e.g., “1 of 10”)
- `{{termination_condition_breakdown}}`: Detailed evaluation of why the loop continued
- `{{data}}`: JSON data from the previous iteration
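A Loop Context Template might combine these variables like so (the surrounding wording is illustrative; the variables are the ones listed above):

```text
This is iteration {{iteration_number}}.
Previous result: {{data}}
Why the loop continued: {{termination_condition_breakdown}}
```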
Use cases:
- Iteratively refine extractions until quality thresholds are met
- Process documents until a specific field value is detected
- Implement retry logic with conditional exit
The loop always runs at least once. Termination conditions are evaluated after each iteration completes.
For Each
Process items in parallel and combine results.
| Setting | Description | Default |
|---|---|---|
| Map Method | How to split input into items | split_by_page |
| Max Iterations | Maximum items to process | 100 |
| Reduce Strategy | How to combine iteration outputs | concat_array |
Map methods:
| Method | Description | Input Type |
|---|---|---|
| `iterate_array` | Process each item in a JSON array | JSON |
| `split_by_page` | Split PDF into N-page chunks | File |
| `split_by_key` | Split PDF by semantic boundaries (AI-detected) | File |
Reduce strategies:
| Strategy | Description | Output Type |
|---|---|---|
| `concat_array` | Combine all outputs into an array | JSON |
| `first` | Return only the first iteration’s output | JSON |
| `last` | Return only the last iteration’s output | JSON |
| `merge_pdf` | Merge all output PDFs into one | File |
| `concat_text` | Combine all text outputs into an array | JSON |
| `none` | No output (side effects only) | None |
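For example, with the `iterate_array` map method and the `concat_array` reduce strategy, a three-item input yields a three-item output array (items and results are illustrative):

```text
Input (iterate_array):  [ { "sku": "A" }, { "sku": "B" }, { "sku": "C" } ]
Output (concat_array):  [ <result for item A>, <result for item B>, <result for item C> ]
```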
Item context variables:
- `{{key}}`: The partition key for the current item (page range or semantic key)
- `{{input_json}}`: The full input JSON (for the `iterate_array` method)
- `{{key_description}}`: The partition key description (for `split_by_key`)
Use cases:
- Extract data from each page of a multi-page document independently
- Process bundled documents that contain multiple invoices/receipts
- Apply different logic to each item in an array
- Batch process documents with parallel execution
For Each processes items sequentially. The reduce phase runs after all iterations complete. Use `split_by_key` when documents have multiple logical sections that need AI to identify boundaries.
Node I/O Types
Understanding input/output types helps you connect nodes correctly:
| Type | Color | Description |
|---|---|---|
| File | Blue (📎) | Documents, PDFs, images |
| JSON | Purple ({ }) | Structured data objects |
| Text | Cyan (📄) | Plain text strings |
Compatibility Matrix
| Source → Target | File | JSON | Text |
|---|---|---|---|
| File | ✅ | ❌ | ❌ |
| JSON | ❌ | ✅ | ✅ |
| Text | ❌ | ❌ | ✅ |
Tips for Building Workflows
Start with the output in mind
Identify what data you need at the end, then work backwards to determine which nodes you need.
Add Functions for calculations
Instead of computing values after receiving webhook data, use Functions nodes to add totals, percentages, and derived fields directly in the workflow.
Use Classifier for routing
When handling different document types, use Classifier to route each document to the appropriate extraction schema before processing.
Split before specialized processing
When handling mixed document bundles (e.g., invoice + contract in one PDF), use Split first, then apply specific Extract schemas to each category.
Combine results with Merge JSON
When processing multiple documents or branches, use Merge JSON to combine all extracted data into a single structured output.
Use If/Else for conditional logic
Route data based on extracted values—for example, send high-value invoices to a different webhook or flag certain conditions for review.
Use Reshape to prepare data for external systems
When your extraction schema doesn’t match your downstream API or database structure, use Reshape to transform fields without modifying your extraction schema.
Use For Each for multi-document PDFs
When a single PDF contains multiple logical documents (like several invoices), use For Each with `split_by_key` to process each one independently with its own extraction.
Use While Loop for iterative refinement
When you need to retry or refine processing until a quality threshold is met, use While Loop with termination conditions that check for the desired outcome.
Test edge cases
Run your workflow with documents that have missing fields, poor scan quality, or unusual formats to ensure robust handling.