Introduction
The parse method turns a document into normalized text content, returned both page-by-page and as one combined string. It is the right tool when you need readable document text for RAG pipelines, search indexing, prompting, debugging, or any workflow that works on free text rather than schema-constrained extraction.
Unlike extract, parse does not try to fit the document into a JSON schema. Instead, it returns:
- pages: one parsed string per page
- text: the full document content as a single string
- document: basic file metadata
- usage: page count and credits consumed
Tables can be returned as html, markdown, yaml, or json, depending on what your downstream system expects.
For chunking, chonkie is a good fit for RAG-style pipelines.
Parse API
A parsed document payload with text content and usage metadata.
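As a rough illustration of the payload's shape, the four top-level fields can be modeled with plain dataclasses. These classes are hypothetical stand-ins, not the SDK's own types. The document fields id and filename are the ones mentioned on this page, while page_count and credits are assumed names for the usage values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentInfo:
    """Basic file metadata (field names taken from this page)."""
    id: str
    filename: str


@dataclass
class Usage:
    """Usage metadata. Both field names are assumptions for illustration."""
    page_count: int
    credits: float


@dataclass
class ParseResult:
    """Hypothetical mirror of the parsed document payload."""
    pages: List[str]   # one parsed string per page
    text: str          # full document content as a single string
    document: DocumentInfo
    usage: Usage


def summarize(result: ParseResult) -> str:
    """Return a one-line, human-readable summary of a parse result."""
    return (
        f"{result.document.filename}: {len(result.pages)} pages, "
        f"{len(result.text)} characters, {result.usage.credits} credits"
    )


example = ParseResult(
    pages=["Page one.", "Page two."],
    text="Page one.\nPage two.",
    document=DocumentInfo(id="doc_123", filename="report.pdf"),
    usage=Usage(page_count=2, credits=1.0),
)
print(summarize(example))
```

A summary like this is handy when debugging, since it shows at a glance whether the page count and text length look plausible for the file you sent.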
Use Case: Preparing Documents For RAG
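A minimal sketch of the application side of this workflow, assuming the pages of a parse result are already available as a list of strings. The helper names split_text and chunk_pages are hypothetical, and the fixed-size character windows are a simple stand-in for a dedicated chunker such as chonkie.

```python
from typing import Dict, List


def split_text(text: str, max_chars: int = 800, overlap: int = 100) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    text = text.strip()
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once the current window reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


def chunk_pages(pages: List[str], max_chars: int = 800, overlap: int = 100) -> List[Dict]:
    """Chunk each page separately so every chunk keeps its page number."""
    records = []
    # Page numbers are 1-based here; this is a choice made for the sketch.
    for page_number, page_text in enumerate(pages, start=1):
        for chunk in split_text(page_text, max_chars, overlap):
            records.append({"page": page_number, "text": chunk})
    return records


# `pages` stands in for the pages list of a parse result.
pages = ["First page text. " * 100, "", "A short final page."]
for record in chunk_pages(pages):
    print(record["page"], len(record["text"]))
```

Chunking page by page, rather than over the combined text, keeps chunks from straddling page boundaries, which makes page-level citations straightforward.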
This pattern is useful when you want Retab to handle document parsing and your application to handle chunking and indexing.
Best Practices
When To Use Parse
- Use parse when your downstream system wants readable text.
- Use extract when you need typed fields that match a schema.
Picking A Table Format
- Use markdown for chunking, prompting, and most RAG pipelines.
- Use html when preserving table structure matters more than readability.
- Use json or yaml when another parser will consume the table output directly.
Choosing DPI
- Start with 192 DPI for general-purpose parsing.
- Drop to 96 DPI when throughput matters more than OCR quality.
- Increase toward 300 DPI for scans, fine print, or low-quality images.
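The DPI guidance above can be captured in a small helper so the choice is made in one place. This is only a sketch: choose_dpi and its two flags are hypothetical names, and letting quality win when both flags are set is a design choice of this example, not documented behavior.

```python
def choose_dpi(needs_high_quality: bool = False, prefer_throughput: bool = False) -> int:
    """Pick a parsing DPI following the guidance in this section.

    needs_high_quality: scans, fine print, or low-quality images.
    prefer_throughput: speed matters more than OCR quality.
    """
    if needs_high_quality:
        return 300  # upper end of the suggested range
    if prefer_throughput:
        return 96
    return 192  # general-purpose default


print(choose_dpi())
print(choose_dpi(prefer_throughput=True))
print(choose_dpi(needs_high_quality=True))
```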
Indexing Advice
- Store the page number with every chunk you create from result.pages.
- Keep the original document.id or document.filename alongside indexed text so retrieval results remain traceable.
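One way to follow both points is to attach the document identifiers and page number to every record before it reaches the index. The sketch below assumes chunks arrive as (page number, text) pairs. The function name, the record layout, and the id scheme are all hypothetical, so adapt them to whatever your vector store or search index expects.

```python
from typing import Dict, Iterable, List, Tuple


def build_index_records(
    document_id: str,
    filename: str,
    page_chunks: Iterable[Tuple[int, str]],
) -> List[Dict]:
    """Attach traceability metadata to each (page_number, chunk_text) pair."""
    records = []
    chunks_seen_on_page: Dict[int, int] = {}
    for page, text in page_chunks:
        # Number chunks within each page so record ids stay unique.
        index_on_page = chunks_seen_on_page.get(page, 0)
        chunks_seen_on_page[page] = index_on_page + 1
        records.append({
            "id": f"{document_id}:p{page}:c{index_on_page}",
            "document_id": document_id,
            "filename": filename,
            "page": page,
            "text": text,
        })
    return records


records = build_index_records(
    "doc_123",
    "report.pdf",
    [(1, "Intro text"), (1, "More intro"), (2, "Results")],
)
for record in records:
    print(record["id"], record["filename"], record["page"])
```

With this metadata in place, a retrieval hit can be shown to the user as a filename and page number rather than an anonymous snippet.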