Using the PDF Data Extractor

The PDF data extractor helps users extract structured data from PDFs into Lido's spreadsheet.

The Extra Instructions Section

The Extra Instructions section lets you write plain-English directions for extracting data from your PDFs. These instructions guide the extraction process, especially for complex documents or when specific data points need clarification.

Common use cases and examples of Extra Instructions

Below are some example scenarios and corresponding instructions to guide the AI in extracting the required data accurately:

Formatting Numbers or Dates

If numbers or dates in your data need a specific format, state the desired format in your instructions.

Example instruction:

Ensure that all inventory numbers are treated as numerical values, and the retail prices are formatted as currency (e.g., $1,000.00). For dates, use the format MM/DD/YYYY.
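
To make the target formats in this instruction concrete, here is a minimal Python sketch of what the formatted values look like. It only illustrates the formats named above; the function names are hypothetical, and it does not represent how Lido processes data.

```python
from datetime import date

def format_currency(amount: float) -> str:
    """Format a number as US-style currency with thousands separators."""
    return f"${amount:,.2f}"

def format_date(d: date) -> str:
    """Format a date as MM/DD/YYYY."""
    return d.strftime("%m/%d/%Y")

print(format_currency(1000))          # $1,000.00
print(format_date(date(2024, 3, 5)))  # 03/05/2024
```

Giving the AI one sample value in the desired format, as the example instruction does, is usually clearer than describing the format in words.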

Locating Specific Items in the Document

If the documents in a batch share a similar structure and certain fields are not extracted correctly by default, add an instruction that tells the AI where those data points are located in the PDF.

Example instruction:

The invoice number is in the upper right-hand corner of the page. Ensure the invoice number is extracted correctly and not confused with the PO number, which appears at the center of the page.


PDFs Containing Multiple Documents

When a single PDF contains multiple invoices, purchase orders (POs), or similar documents, you can tell the AI how to locate the start of each new document.

Example instruction:

The PDF I am giving you is made up of multiple invoices. The appearance of a new 'Invoice Number' signals the start of a new invoice. For each line item in each invoice, extract the following fields: Invoice Number, Date, Vendor Name, Item Description, Quantity, and Price.

Note: Make sure that Extract Multiple Rows per Document is checked.
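
The splitting rule in the example instruction can be pictured as a simple grouping step. The sketch below is only an illustration under stated assumptions: the PDF text is already available as a list of lines, and each invoice begins with a line that starts with "Invoice Number". The function name and sample data are hypothetical and do not reflect how Lido works internally.

```python
def split_invoices(lines: list[str], marker: str = "Invoice Number") -> list[list[str]]:
    """Group lines into invoices; a line starting with `marker` opens a new one."""
    invoices: list[list[str]] = []
    for line in lines:
        # Start a new group at each marker line (or at the very first line).
        if line.startswith(marker) or not invoices:
            invoices.append([])
        invoices[-1].append(line)
    return invoices

# Hypothetical sample text from a PDF that contains two invoices.
sample = [
    "Invoice Number: 1001",
    "Widget, 2, 5.00",
    "Invoice Number: 1002",
    "Gadget, 1, 9.50",
    "Cable, 3, 1.25",
]

for invoice in split_invoices(sample):
    print(invoice)
```

The same idea applies to any reliable boundary marker. If your documents start each invoice with a different label, name that label in your instruction instead.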


Tips for Writing Effective Instructions

  1. Be Specific: Clearly describe where the data is located or how it appears (e.g., "The discount percentage is always located at the bottom of the table").
  2. Use Field Names: Refer to data fields explicitly (e.g., "Quantity" or "Unit Price").
  3. Provide Examples: If possible, include examples of what the data looks like in the document.
  4. Anticipate Common Errors: Address known extraction challenges (e.g., "Ensure that zeroes are included in quantity fields and not dropped").

Extract Multiple Rows per Document

The Extract Multiple Rows per Document checkbox determines how data is extracted from documents that contain multiple line items or entries. This setting is critical for ensuring that your extracted data is structured correctly based on the document's content.

What Happens When This Option Is Selected

When Extract Multiple Rows per Document is selected, the extractor assumes that each document contains multiple rows of data (e.g., line items in an invoice or purchase order). This means each line item in the document will be treated as a separate row in the resulting spreadsheet.

What Happens When This Option Is Not Selected

When Extract Multiple Rows per Document is not selected, the extractor assumes that each document contains a single row of data. This means all data from the document will be condensed into one row in the resulting spreadsheet.

Example instructions for Multiple Row extraction

Whether Extract Multiple Rows per Document is selected affects how you should word your Extra Instructions. Two basic examples follow.

Example with Multiple Rows

Instruction: "Extract all line items from each invoice. Include Invoice Number, Item Name, Quantity, and Unit Price for each row."
Result: Each item appears as a separate row with Invoice Number repeated.

Example with Single Row

Instruction: "Extract the total amount and vendor name for each invoice. Summarize all line items into one row."
Result: A single row is created per invoice with the total amount and vendor name.
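
The difference between the two modes can be sketched as two output shapes for the same invoice. This is a hypothetical illustration in Python: the field names and sample values are made up, and the functions do not represent Lido's internals.

```python
# Hypothetical invoice data, as it might be read from one document.
invoice = {
    "invoice_number": "1001",
    "vendor": "Acme Supply",
    "total": 19.50,
    "line_items": [
        {"item": "Widget", "quantity": 2, "unit_price": 5.00},
        {"item": "Gadget", "quantity": 1, "unit_price": 9.50},
    ],
}

def multiple_rows(inv: dict) -> list[dict]:
    """One row per line item, with the invoice number repeated on each row."""
    return [{"invoice_number": inv["invoice_number"], **item} for item in inv["line_items"]]

def single_row(inv: dict) -> list[dict]:
    """One row per invoice, holding only document-level fields."""
    return [{"invoice_number": inv["invoice_number"], "vendor": inv["vendor"], "total": inv["total"]}]

print(multiple_rows(invoice))  # two rows, one per line item
print(single_row(invoice))     # one row for the whole invoice
```

With multiple rows, document-level fields such as the invoice number repeat on every row. With a single row, line-item detail is not kept unless the instructions ask for it to be summarized into a field.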

Connecting a Google Drive folder

Follow these steps to connect a Google Drive folder to Lido's data extractor.

  1. Connect your Google Drive account to Lido
  2. Share the Google Drive folder(s) that you wish to access in Lido with [email protected]
    1. The folder's owner must be the same account as the connected Google Drive account
  3. Select your folder in Lido to access all documents in the folder for extraction