Workflows - Retab Docs

What are Workflows?

Workflows are visual, block-based pipelines that let you chain together multiple document processing operations. Instead of writing code for each step, you can drag and drop blocks onto a canvas, connect them, and create powerful document automation flows. A workflow typically consists of:

Input blocks - Entry points for data:
- Document - Upload files (PDF, images, Word, Excel)
- JSON Input - Pass structured JSON data
Processing blocks - Operations like Extract, Parse, Split, Classifier
Logic blocks - Conditional flows like review gates, Function, If/Else routing, and API Call

Tests

Workflow tests validate individual block outputs with saved inputs and assertions. Use them to check that an Extract, Function, Split, or Classifier block still behaves as expected after you change schemas, prompts, code, or block configuration. Learn more in Tests.

Experiments

Experiments measure a block’s consistency by replaying the same Extract, Split, Classifier, or split-by-key For Each block over a fixed document set with multiple consensus passes. Use them to compare block configurations, find low-agreement documents or fields, and decide what needs stricter tests. Learn more in Experiments.

Creating a Workflow

Navigate to the Workflows section in your dashboard
Click Create Workflow to open a new canvas
Drag blocks from the sidebar onto the canvas
Connect blocks by dragging from output handles to input handles
Configure each block by clicking on it
Your workflow auto-saves as you build

Connecting Blocks

Blocks communicate through handles that define the type of data they accept or produce:

Handle Type	Icon	Description
File	📎	Document files (PDF, images, Word, Excel)
JSON	`{ }`	Structured data extracted from documents

Connection Rules

File → File: Pass documents between processing blocks
JSON → JSON: Pass extracted data between logic blocks
Each input handle accepts only one connection
Connections validate automatically to prevent incompatible links

Declarative Workflow Spec

You can also define and manage workflows from YAML. A declarative spec uses apiVersion: workflows.retab.com/v1alpha2 and keeps topology in spec.edges. Every edge endpoint is explicit: it names the block and the raw runtime handle.

apiVersion: workflows.retab.com/v1alpha2
kind: Workflow
metadata:
  id: wf_abc123
  name: Invoice Workflow

spec:
  blocks:
    start_document-node:
      type: start_document
      label: Input Document

    extract-node:
      type: extract
      label: Extract Fields
      config:
        inputs:
          - name: source_doc
            type: file
            is_primary: true
        json_schema:
          type: object
          properties: {}

  edges:
    - from:
        block: start_document-node
        handle: output-file-0
      to:
        block: extract-node
        handle: input-file-source_doc

The SDK exposes the spec lifecycle under client.workflows.spec:

validation = client.workflows.spec.validate(yaml_definition)
plan = client.workflows.spec.plan(yaml_definition)
created = client.workflows.spec.apply(yaml_definition)
updated = client.workflows.spec.apply_to_workflow("wf_abc123", yaml_definition)
exported = client.workflows.spec.get(created.workflow_id)

Use validate() for parse and handle checks, plan() to preview changes, apply() to create a new workflow from YAML, apply_to_workflow() to modify an existing workflow draft, and get() to get canonical YAML from an existing workflow. plan() and apply() return Terraform-style summary, resource_changes, and rendered_plan fields so clients can inspect exactly what changed. Call client.workflows.publish(workflow_id) separately when the draft should become the live published workflow. Publishing happens inside the current environment — the API key decides whether you are publishing into test or production. To move a workflow across environments, see Promotion. For endpoint details, see:

Edit Mode vs Run Mode

Workflows have two operational modes:

Edit Mode

Add, remove, and configure blocks
Create and delete connections
Rename the workflow
View generated Python code

Run Mode

Upload documents to input blocks
Execute the workflow step-by-step
View results at each stage
Download processed files and extracted data

Toggle between modes using the switch at the top of the canvas.

Running a Workflow

A workflow is fundamentally an asynchronous job. When you start it, Retab creates a workflow run, executes each step on the server, and stores the results on that run. You can then poll the run until it finishes and inspect the stored step outputs. For the SDK and HTTP endpoint details, see the workflow API reference:

From the Dashboard

Switch to Run Mode
Upload a document to each Document input block
Click Run Workflow
Watch as each block processes (status indicators show progress)
Click on output handles to view results

Using the SDK

The Python, Node, and Go SDKs expose workflow metadata, graph authoring, run execution, and typed step inspection:

client.workflows.* / client.Workflows.* for list(), get(), create(), update(), delete(), and publish()
client.workflows.blocks.* / client.Workflows.Blocks.* and client.workflows.edges.* / client.Workflows.Edges.* for programmatic graph changes
client.workflows.runs.* / client.Workflows.Runs.* and client.workflows.steps.* / client.Workflows.Steps.* for running flows and reading results

Discover input block IDs

Workflow run inputs are keyed by the IDs of your start_document and start_json blocks. List the workflow’s blocks to discover them.

from retab import Retab

client = Retab()

blocks = client.workflows.blocks.list("wf_abc123")

document_start_id = next(block.id for block in blocks.data if block.type == "start_document")
json_start_id = next(block.id for block in blocks.data if block.type == "start_json")

Run and wait for completion

Workflows support two input maps:

documents for Document (start_document) blocks
json_inputs for JSON Input (start_json) blocks

import time
from pathlib import Path

from retab import Retab

client = Retab()

workflow = client.workflows.get("wf_abc123")
blocks = client.workflows.blocks.list(workflow.id)
document_start_id = next(block.id for block in blocks.data if block.type == "start_document")
json_start_id = next(block.id for block in blocks.data if block.type == "start_json")

run = client.workflows.runs.create(
    workflow_id=workflow.id,
    documents={
        document_start_id: Path("path/to/invoice.pdf"),
    },
    json_inputs={
        json_start_id: {"customer_id": "cust_123", "priority": "high"},
    },
)

terminal_statuses = {"completed", "error", "cancelled"}
while run.lifecycle.status not in terminal_statuses and run.lifecycle.status != "awaiting_review":
    time.sleep(1)
    run = client.workflows.runs.get(run.id)

print(run.lifecycle.status)
if run.lifecycle.status == "awaiting_review":
    print(run.lifecycle.waiting_for_block_ids)
elif run.lifecycle.status == "error":
    raise RuntimeError(run.lifecycle.message)
elif run.lifecycle.status == "cancelled":
    raise RuntimeError(run.lifecycle.reason or "Workflow run was cancelled")
else:
    for step_summary in client.workflows.steps.list(run.id):
        step = client.workflows.steps.get(step_summary.step_id)
        if step.handle_outputs:
            print(step.block_id, step.handle_outputs)

steps.list(run.id) returns the step roster for a run. For the full execution record for one block, including typed inputs and outputs, use steps.get(run.id, block_id).

Inspect step outputs

Start with steps.list(run.id) when you need the blocks that ran. Then call steps.get(run.id, block_id) for the specific execution record you want to inspect. Step payloads are normalized into HandlePayload objects. For JSON-producing blocks, extracted_data is shorthand for the default output-json-0 handle.

# Step roster:
for step in client.workflows.steps.list(run.id):
    print(step.block_id, step.lifecycle.status)
    if step.artifact:
        print(step.artifact.operation, step.artifact.id)

# Full execution record for one step:

step = client.workflows.steps.get(run.id, "extract-block-id")
print(step.lifecycle.status)
if step.extracted_data:
print(step.extracted_data)

Use steps.list(run.id, block_ids=[...]) when you only need a subset of step summaries. Use steps.get(run.id, block_id) when you need the normalized execution record for a single block.

Fetch the artifact record

Some blocks persist a durable artifact record. step.artifact is only the stable pointer:

{ "operation": "conditional_evaluation", "id": "ceval_abc123" }

Use client.workflows.artifacts.get(step.artifact) to dereference that pointer. The response is the backing record flattened with operation at the top level, so consumers can dispatch on one object without juggling an extra record wrapper.

step = client.workflows.steps.get(run.id, "conditional-block-id")
if step.artifact:
    artifact = client.workflows.artifacts.get(step.artifact)
    print(artifact.operation)
    print(artifact.matched_condition_ids)
    print(artifact.evaluations)

workflows.artifacts.list(run.id) dereferences every artifact produced by a run. Pass operation= or block_id= when you only need a subset.

condition_records = client.workflows.artifacts.list(
    run.id,
    operation="conditional_evaluation",
)

operation	produced by	record includes
`extraction`	`extract`	extraction result, choices, likelihoods, schema details
`split`	`split`	split result and output document grouping
`classification`	`classifier`	selected class and consensus details
`parse`	`parse`	parsed document content
`edit`	`edit`	edited document result
`partition`	`for_each_sentinel_start`	partitioned items for the loop
`conditional_evaluation`	`conditional`	`evaluations`, `selected_handles`, `matched_condition_ids`
`while_loop_termination`	`while_loop`	termination reason and final condition evaluations
`api_call_invocation`	`api_call`	request/response attempts, retry trace, and final error
`function_invocation`	`function`	function inputs, output, duration, and final error

Build workflows from code

The same SDK can create and publish workflow graphs:

workflow = client.workflows.create(name="Invoice Pipeline")
blocks = client.workflows.blocks.list(workflow.id)
start_document_block = next(block for block in blocks.data if block.type == "start_document")

extract_block = client.workflows.blocks.create(
workflow.id,
id="extract-invoice",
type="extract",
label="Extract Invoice",
position_x=320,
position_y=0,
config={
"json_schema": {
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"total_amount": {"type": "number"},
},
},
},
)

client.workflows.edges.create(
workflow.id,
id="edge-start-to-extract",
source_block=start_document_block.id,
target_block=extract_block.id,
source_handle="output-file-0",
target_handle="input-file-0",
)

client.workflows.publish(workflow.id, description="Initial version")

Use client.workflows.list() or client.workflows.get(workflow_id) when you need to browse existing workflows before launching a run.

Reading Workflow Results

The standard production pattern is to run the workflow, keep the returned run.id, and poll the run until lifecycle.status reaches completed, error, cancelled, or awaiting_review.

Start the workflow from the SDK or API
Receive a run.id and an initial lifecycle immediately
Poll the workflow run until it finishes or waits for review
Read the step results from the completed run

The workflow run is the source of truth for execution state and outputs. This is enough for many scripts, internal tools, and backend services.

Workflow Execution Order

Workflows execute in topological order based on the block connections:

Start from Document input blocks
Process each block once all its inputs are ready
Continue until all blocks are processed or an error occurs
Read outputs from the completed run and its step results

If a block fails, execution stops and the error is displayed on that block.

Conditional Routing

When using Classifier or If/Else blocks, only the branches that receive data are executed. Blocks on skipped branches are marked as “skipped” rather than failed.

Viewing Generated Code

Every workflow can be exported as Python code. Click View Code in the sidebar to see the equivalent SDK calls for your workflow. This is useful for:

Integrating workflows into your existing codebase
Running workflows in production environments
Understanding how the visual blocks translate to API calls

Best Practices

Start simple

Begin with a single Extract or Parse block, then gradually add complexity. Test each addition before moving on.

Use descriptive labels

Rename blocks to describe their purpose (e.g., “Invoice Data” instead of “Extract 1”). This makes complex workflows easier to understand.

Add notes for documentation

Use Note blocks to document sections of your workflow. They don’t affect execution but help explain the logic.

Validate with review gates

For critical data, add a review gate to the extraction block. This ensures a reviewer checks low-likelihood results before they proceed.

Use Classifier for document routing

When processing different document types, use a Classifier block to route each document to the appropriate extraction schema.

Test with sample documents

Before deploying, run your workflow with representative sample documents to catch edge cases.

Example: Invoice Processing Workflow

Here’s a common workflow pattern for processing invoices:

Start block accepts the invoice PDF
Extract block pulls out vendor, amount, date, line items
The extract block’s review gate flags low-likelihood extractions for review
Read the verified data from the completed workflow run

Example: Multi-Document Classification Workflow

For workflows that process mixed document bundles:

Classifier routes documents by category (Invoice, Contract, Receipt)
Each Extract block uses a document-specific schema
Function blocks compute derived fields for each document type
Merge JSON combines results from all branches into a single output

​What are Workflows?

​Tests

​Experiments

​Creating a Workflow

​Connecting Blocks

​Connection Rules

​Declarative Workflow Spec

​Edit Mode vs Run Mode

​Edit Mode

​Run Mode

​Running a Workflow

​From the Dashboard

​Using the SDK

​Discover input block IDs

​Run and wait for completion

​Inspect step outputs

​Fetch the artifact record

​Build workflows from code

​Reading Workflow Results

​Workflow Execution Order

​Conditional Routing

​Viewing Generated Code

​Best Practices

​Example: Invoice Processing Workflow

​Example: Multi-Document Classification Workflow