Preprocessing

read_pdf_as_markdown_via_pymupdf4llm(file_name, base_path)

Read a PDF file and convert it to Markdown using pymupdf4llm.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_name | str | The name of the PDF file to read. | required |
| base_path | Path | The base path where the PDF file is located. | required |

Returns: The Markdown text.

Source code in src/kibad_llm/preprocessing.py
from pathlib import Path

import pymupdf4llm


def read_pdf_as_markdown_via_pymupdf4llm(file_name: str, base_path: Path) -> str:
    """Read a PDF file and convert it to Markdown using pymupdf4llm.

    Args:
        file_name: The name of the PDF file to read.
        base_path: The base path where the PDF file is located.

    Returns:
        The Markdown text.
    """
    # Join the directory and file name, then pass the path as a string.
    return pymupdf4llm.to_markdown(str(base_path / file_name))
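
The function above joins base_path and file_name and hands the result straight to pymupdf4llm.to_markdown, so a wrong name or directory only surfaces inside the library call. As a minimal sketch, a caller might validate the path first. The helper names resolve_pdf_path and read_pdf_checked are hypothetical and not part of kibad_llm, and the lazy import is an assumption made so that the path checks run even where pymupdf4llm is not installed.

```python
from pathlib import Path


def resolve_pdf_path(file_name: str, base_path: Path) -> Path:
    """Hypothetical helper: join and validate the path before conversion."""
    pdf_path = base_path / file_name
    # Reject anything that does not look like a PDF by its extension.
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {pdf_path.name}")
    # Fail early with a clear message if the file is missing.
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    return pdf_path


def read_pdf_checked(file_name: str, base_path: Path) -> str:
    """Hypothetical wrapper: validate the path, then convert to Markdown."""
    pdf_path = resolve_pdf_path(file_name, base_path)
    # Imported lazily (an assumption of this sketch) so the checks above
    # can be used without pymupdf4llm installed.
    import pymupdf4llm

    # Same conversion call as in the documented function.
    return pymupdf4llm.to_markdown(str(pdf_path))
```

A call such as read_pdf_checked("report.pdf", Path("data/pdfs")) would then return the Markdown text, where both the file name and the directory are placeholder values.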