
Welcome to Open Parse

Easily chunk complex documents the same way a human would.

Chunking documents is a challenging task that underpins any RAG system. High-quality results are critical to a successful AI application, yet most open-source libraries are limited in their ability to handle complex documents.

Open Parse is designed to fill this gap by providing a flexible, easy-to-use library capable of visually discerning document layouts and chunking them effectively.

Features

  • ๐Ÿ” Visually-Driven: Open-Parse visually analyzes documents for superior LLM input, going beyond naive text splitting.
  • โœ๏ธ Markdown Support: Basic markdown support for parsing headings, bold and italics.
  • ๐Ÿ“Š High-Precision Table Support: Extract tables into clean Markdown formats with accuracy that surpasses traditional tools.
  • ๐Ÿ› ๏ธ Extensible: Easily implement your own post-processing steps.
  • ๐Ÿ’กIntuitive: Great editor support. Completion everywhere. Less time debugging.
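To make the table bullet concrete, here is a minimal sketch of only the final rendering step: turning already-extracted table cells into a GitHub-flavored Markdown table. It assumes the cells arrive as a list of rows of strings with the first row as the header; `rows_to_markdown` is a hypothetical helper, not part of Open Parse's API, and it does none of the table detection that the library performs.

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows (first row = header) as a GitHub-flavored Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: list[str]) -> str:
        # Escape pipes so cell text cannot break the column layout,
        # and pad short rows with empty cells.
        cells = [c.replace("|", "\\|").strip() for c in row]
        cells += [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


# Illustrative cells only.
table = [
    ["Part", "Qty"],
    ["Bolt", "4"],
    ["Washer", "8"],
]
print(rows_to_markdown(table))
```

Padding ragged rows matters because merged or missing cells are common in extracted tables, and Markdown renderers expect every row to have the same number of columns.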

[Figure: Transformation]

Quick Start

Basic Example

import openparse

basic_doc_path = "./sample-docs/mobile-home-manual.pdf"
parser = openparse.DocumentParser()
parsed_basic_doc = parser.parse(basic_doc_path)

for node in parsed_basic_doc.nodes:
    print(node)

📓 Try the sample notebook here

Semantic Processing Example

Chunking documents is fundamentally about grouping semantically similar nodes. By embedding the text of each node, we can cluster nodes according to how similar their embeddings are.
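As a rough, dependency-free illustration of that idea, the sketch below replaces the embedding model with a toy bag-of-words count vector (an assumption standing in for a real model such as `text-embedding-3-large`) and greedily merges each node into the previous chunk when their cosine similarity reaches a threshold. `embed`, `cosine`, `group_similar`, and the 0.3 threshold are all hypothetical; this is one simple grouping strategy, not necessarily the one Open Parse uses, and it ignores token limits such as `min_tokens` and `max_tokens`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two sparse count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(nodes: list[str], threshold: float = 0.3) -> list[str]:
    """Merge each node into the previous chunk when they are similar enough."""
    chunks: list[str] = []
    for text in nodes:
        if chunks and cosine(embed(chunks[-1]), embed(text)) >= threshold:
            chunks[-1] = chunks[-1] + "\n" + text
        else:
            chunks.append(text)
    return chunks


nodes = [
    "Check the water heater pilot light monthly.",
    "Relight the water heater pilot light if it goes out.",
    "Tie-down straps anchor the home to the ground.",
]
for chunk in group_similar(nodes):
    print(chunk, end="\n---\n")
```

Merging only adjacent nodes keeps chunks in reading order. A real embedding model captures similarity beyond shared words, which this toy vector cannot; in Open Parse itself, the embedding model and token limits are configured through `SemanticIngestionPipeline`.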

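import os

from openparse import processing, DocumentParser

# Assumes your OpenAI API key is exported as OPENAI_API_KEY.
OPEN_AI_KEY = os.environ["OPENAI_API_KEY"]
basic_doc_path = "./sample-docs/mobile-home-manual.pdf"

# Embeds each node's text and groups semantically similar nodes.
semantic_pipeline = processing.SemanticIngestionPipeline(
    openai_api_key=OPEN_AI_KEY,
    model="text-embedding-3-large",
    min_tokens=64,
    max_tokens=1024,
)
parser = DocumentParser(
    processing_pipeline=semantic_pipeline,
)
parsed_content = parser.parse(basic_doc_path)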

📓 Sample notebook here
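The `processing_pipeline` argument is where post-processing plugs into the parser, and the features list notes that you can implement your own steps. The exact base class Open Parse expects for custom steps is not covered here, so the following is a library-agnostic sketch of the pattern only: each step is a function from a list of nodes to a list of nodes, and a pipeline applies the steps in order. `Node`, `Step`, `drop_short_nodes`, `strip_whitespace`, and `run_pipeline` are hypothetical names, not Open Parse's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    # Hypothetical minimal node holding only the extracted text.
    text: str


# Hypothetical step type: takes parsed nodes, returns a new list of nodes.
Step = Callable[[list[Node]], list[Node]]


def drop_short_nodes(min_chars: int) -> Step:
    """Hypothetical step: discard nodes whose text is shorter than min_chars."""
    def step(nodes: list[Node]) -> list[Node]:
        return [n for n in nodes if len(n.text.strip()) >= min_chars]
    return step


def strip_whitespace(nodes: list[Node]) -> list[Node]:
    """Hypothetical step: collapse runs of whitespace inside each node."""
    return [Node(" ".join(n.text.split())) for n in nodes]


def run_pipeline(nodes: list[Node], steps: list[Step]) -> list[Node]:
    # Apply each step in order; one step's output feeds the next.
    for step in steps:
        nodes = step(nodes)
    return nodes


parsed = [Node("  Page 3 "), Node("Close the   main water\nvalve before servicing.")]
result = run_pipeline(parsed, [strip_whitespace, drop_short_nodes(min_chars=10)])
print([n.text for n in result])
```

Keeping each step a plain list-in, list-out function makes steps easy to test in isolation and to reorder.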


Cookbooks

Other Cookbooks

Sponsors

Does your use case need something special? Reach out.