What are Workflows?
Workflows are visual, node-based pipelines that let you chain together multiple document processing operations. Instead of writing code for each step, you can drag and drop nodes onto a canvas, connect them, and create powerful document automation flows.

A workflow typically consists of:
- Input nodes - Entry points for data:
- Document - Upload files (PDF, images, Word, Excel)
- JSON Input - Pass structured JSON data
- Processing nodes - Operations like Extract, Parse, Split, Classifier
- Logic nodes - Conditional flows like Human-in-the-Loop, Formula, If/Else routing, and API Call
- Output nodes - Optional destinations such as webhooks for your processed data
Creating a Workflow
- Navigate to the Workflows section in your dashboard
- Click Create Workflow to open a new canvas
- Drag nodes from the sidebar onto the canvas
- Connect nodes by dragging from output handles to input handles
- Configure each node by clicking on it
- Your workflow auto-saves as you build
Connecting Nodes
Nodes communicate through handles that define the type of data they accept or produce:

| Handle Type | Icon | Description |
|---|---|---|
| File | 📎 | Document files (PDF, images, Word, Excel) |
| JSON | { } | Structured data extracted from documents |
Connection Rules
- File → File: Pass documents between processing nodes
- JSON → JSON: Pass extracted data between logic nodes
- Each input handle accepts only one connection
- Connections validate automatically to prevent incompatible links
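The rules above can be sketched as a small validator. The handle-type names and function shape here are illustrative, not the platform's internal model:

```python
# Only like-typed handles may connect: File → File, JSON → JSON.
ALLOWED = {("file", "file"), ("json", "json")}

def can_connect(source_type, target_type, target_already_connected):
    """Apply the two connection rules: matching handle types,
    and at most one connection per input handle."""
    if target_already_connected:
        return False
    return (source_type, target_type) in ALLOWED
```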
Edit Mode vs Run Mode
Workflows have two operational modes:

Edit Mode
- Add, remove, and configure nodes
- Create and delete connections
- Rename the workflow
- View generated Python code
Run Mode
- Upload documents to input nodes
- Execute the workflow step-by-step
- View results at each stage
- Download processed files and extracted data
Running a Workflow
A workflow is fundamentally an asynchronous job. When you start it, Retab creates a workflow run, executes each step on the server, and stores the results on that run. A webhook is optional: you can configure one if you want Retab to notify your system automatically, but you can also simply run the workflow and poll the run until it finishes. For the SDK and HTTP endpoint details, see the workflow API reference:
- List Workflows
- Get Workflow
- Run Workflow
- Get Run
- Submit HIL Decision
- Get HIL Decision
- List Steps
- Get Step
From the Dashboard
- Switch to Run Mode
- Upload a document to each Document input node
- Click Run Workflow
- Watch as each node processes (status indicators show progress)
- Click on output handles to view results
Using the SDK
The Python SDK exposes workflow metadata, graph authoring, run execution, and typed step inspection:
- `client.workflows.*` for `list()`, `get()`, `create()`, `publish()`, `duplicate()`, and `get_entities()`
- `client.workflows.blocks.*` and `client.workflows.edges.*` for programmatic graph changes
- `client.workflows.runs.*` and `client.workflows.runs.steps.*` for running flows and reading results
Discover input node IDs
Workflow run inputs are keyed by the IDs of your `start` and `start_json` nodes. `get_entities()` is the easiest way to discover them.
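As a sketch, the entity list can be grouped into the two input maps a run expects. The dict shape returned by `get_entities()` is an assumption here, and the SDK call itself is shown commented out:

```python
def input_node_ids(entities):
    """Group start / start_json node IDs from a workflow's entity list."""
    ids = {"documents": [], "json_inputs": []}
    for entity in entities:
        if entity.get("type") == "start":
            ids["documents"].append(entity["id"])       # Document input nodes
        elif entity.get("type") == "start_json":
            ids["json_inputs"].append(entity["id"])     # JSON Input nodes
    return ids

# entities = client.workflows.get_entities(workflow_id)  # via the SDK
# print(input_node_ids(entities))
```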
Run and wait for completion
Workflows support two input maps:
- `documents` for Document (`start`) nodes
- `json_inputs` for JSON Input (`start_json`) nodes
`run.steps` contains per-node status summaries. For typed inputs and outputs on each node, use the step helpers.
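A minimal sketch of assembling those two maps before starting a run. The run-creation call and its argument names are assumptions, and the node IDs are hypothetical:

```python
def build_run_inputs(documents=None, json_inputs=None):
    """Assemble the two input maps a workflow run accepts, keyed by node ID."""
    payload = {}
    if documents:
        payload["documents"] = documents      # Document (start) nodes
    if json_inputs:
        payload["json_inputs"] = json_inputs  # JSON Input (start_json) nodes
    return payload

# run = client.workflows.runs.create(       # assumed method name
#     workflow_id,
#     **build_run_inputs(
#         documents={"start-0": "invoice.pdf"},
#         json_inputs={"start_json-0": {"currency": "EUR"}},
#     ),
# )
```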
Inspect typed step outputs
Step payloads are normalized into `HandlePayload` objects. For JSON-producing nodes, `extracted_data` is shorthand for the default `output-json-0` handle.
Use `client.workflows.runs.steps.list(run.id)` when you want the persisted step documents for every node, or `client.workflows.runs.steps.list(run.id, node_ids=[...])` when you only need a subset of them. Use `client.workflows.runs.steps.get_many(run.id, [...])` when you want normalized handle payloads for a subset of nodes.
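To illustrate the `output-json-0` convention, here is a toy helper mirroring what the `extracted_data` shorthand does. The payload dict shape is assumed for illustration, and the commented SDK call is likewise an assumption:

```python
DEFAULT_JSON_HANDLE = "output-json-0"

def default_json_output(handle_payloads):
    """Mimic extracted_data: return the default JSON output handle, if present."""
    return handle_payloads.get(DEFAULT_JSON_HANDLE)

# steps = client.workflows.runs.steps.get_many(run.id, ["extract-0"])  # assumed shape
# print(steps[0].extracted_data)
```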
Build workflows from code
The same SDK can create and publish workflow graphs. Use `client.workflows.list()` or `client.workflows.get(workflow_id)` to browse existing workflows before launching a run, and `client.workflows.duplicate(workflow_id)` to get a draft copy of an existing flow.
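One way to picture programmatic graph authoring is to define the graph as plain data first and then push it through the blocks/edges endpoints. The node and edge shapes, and the commented `create` calls, are assumptions for illustration, not the documented API:

```python
nodes = [
    {"id": "start-0", "type": "start"},
    {"id": "extract-0", "type": "extract"},
    {"id": "end-0", "type": "end"},
]
edges = [
    {"source": "start-0", "target": "extract-0"},  # File → File
    {"source": "extract-0", "target": "end-0"},    # JSON → JSON
]

def edges_are_valid(nodes, edges):
    """Every edge must reference nodes that exist in the graph."""
    ids = {n["id"] for n in nodes}
    return all(e["source"] in ids and e["target"] in ids for e in edges)

# for n in nodes: client.workflows.blocks.create(workflow_id, **n)  # assumed call
# for e in edges: client.workflows.edges.create(workflow_id, **e)   # assumed call
# client.workflows.publish(workflow_id)
```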
Polling vs Webhooks
There are two standard ways to use a workflow in production:

1. Run the workflow and poll the run

This is the core workflow model.
- Start the workflow from the SDK or API
- Receive a `run.id` and an initial status immediately
- Poll the workflow run until it reaches a terminal status such as `completed` or `error`
- Read the step results from the completed run
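The poll loop above can be sketched as a small helper. Here `get_run` stands in for the SDK's run-fetch call (e.g. `client.workflows.runs.get`, signature assumed), and the terminal status names follow the list above:

```python
import time

def wait_for_run(get_run, run_id, interval=2.0, timeout=600.0):
    """Poll a workflow run until it reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = get_run(run_id)
        if run["status"] in ("completed", "error"):
            return run
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish within {timeout}s")

# run = wait_for_run(client.workflows.runs.get, run.id)  # SDK call assumed
```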
2. Add a webhook for automatic notification
If you want Retab to call your backend automatically when a workflow finishes, add a webhook on an End node. This is useful when:
- your system is event-driven
- another service should start work automatically after the workflow completes
- you do not want an active polling loop
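A minimal stdlib receiver for such a webhook might look like the sketch below. The payload fields and delivery format are assumptions; check your End node's webhook configuration for the real shape:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WorkflowWebhook(BaseHTTPRequestHandler):
    """Accept a POST when a workflow run finishes (payload shape assumed)."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        print("run finished:", event.get("run_id"), event.get("status"))
        self.send_response(200)  # acknowledge quickly; do real work elsewhere
        self.end_headers()

# HTTPServer(("", 8000), WorkflowWebhook).serve_forever()
```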
Workflow Execution Order
Workflows execute in topological order based on the node connections:
- Start from Document input nodes
- Process each node once all its inputs are ready
- Continue until all nodes are processed or an error occurs
- Optionally send results to any configured Webhook output nodes
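The steps above amount to a topological traversal of the node graph: a node runs once every upstream node feeding it has finished. A minimal sketch of that ordering (the real scheduler is server-side; this is only illustrative):

```python
from collections import deque

def execution_order(nodes, edges):
    """Kahn's algorithm: a node is ready once all its inputs are ready."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        downstream[src].append(dst)
    ready = deque(n for n in nodes if indegree[n] == 0)  # input nodes first
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order
```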
Conditional Routing
When using Classifier or If/Else nodes, only the branches that receive data are executed. Nodes on skipped branches are marked as “skipped” rather than failed.

Viewing Generated Code
Every workflow can be exported as Python code. Click View Code in the sidebar to see the equivalent SDK calls for your workflow. This is useful for:
- Integrating workflows into your existing codebase
- Running workflows in production environments
- Understanding how the visual nodes translate to API calls
Best Practices
Start simple
Begin with a single Extract or Parse node, then gradually add complexity. Test each addition before moving on.
Use descriptive labels
Rename nodes to describe their purpose (e.g., “Invoice Data” instead of “Extract 1”). This makes complex workflows easier to understand.
Add notes for documentation
Use Note nodes to document sections of your workflow. They don’t affect execution but help explain the logic.
Validate with Human-in-the-Loop
For critical data, add a HIL node after extraction. This ensures a human reviews low-likelihood results before they proceed.
Use Classifier for document routing
When processing different document types, use a Classifier node to route each document to the appropriate extraction schema.
Test with sample documents
Before deploying, run your workflow with representative sample documents to catch edge cases.
Example: Invoice Processing Workflow
Here’s a common workflow pattern for processing invoices:
- Start node accepts the invoice PDF
- Extract node pulls out vendor, amount, date, line items
- HIL node flags low-likelihood extractions for human review
- End node sends verified data to your webhook
Example: Multi-Document Classification Workflow
For workflows that process mixed document bundles:
- Classifier routes documents by category (Invoice, Contract, Receipt)
- Each Extract node uses a document-specific schema
- Formula nodes compute derived fields for each document type
- Merge JSON combines results from all branches into a single output