Enabling OCR
1. Default Text Processing with PdfMiner
Use PdfMiner if your documents are text-heavy, well-structured, and do not contain non-text elements that require OCR.
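As a rough illustration of this default path, the sketch below extracts text with the `extract_text` function from pdfminer.six and checks whether the result contains enough characters to count as text-based. The helper names `has_usable_text` and `extract_with_pdfminer`, and the 50-character threshold, are hypothetical choices for this example, not part of this project's configuration.

```python
def has_usable_text(text: str, min_chars: int = 50) -> bool:
    """Hypothetical heuristic: the document counts as text-based if
    extraction returned at least `min_chars` non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def extract_with_pdfminer(path: str) -> str:
    """Extract the embedded text layer of a PDF using pdfminer.six."""
    # Imported here so the heuristic above can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


if __name__ == "__main__":
    import sys

    text = extract_with_pdfminer(sys.argv[1])
    if has_usable_text(text):
        print(text)
    else:
        print("Little or no embedded text found; this file may need OCR.")
```

If the check fails for most of your files, the documents are probably scans and plain text extraction will not be enough.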
2. Optional OCR with PyMuPDF
If your documents are scanned images or contain non-text elements, you may need OCR to extract text. PyMuPDF can handle this; review its license before enabling it.
Use with caution
This method is not recommended as a default because OCR adds computational cost and is inherently error-prone.
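One way to limit that cost is to run OCR only on pages that lack a usable text layer. The sketch below shows the idea using PyMuPDF's `Page.get_text` and `Page.get_textpage_ocr`. It assumes Tesseract and its language data are installed where PyMuPDF can find them. The helper names `needs_ocr` and `extract_page_texts`, the 20-character threshold, and the 300 dpi setting are hypothetical choices for this example and do not describe how this project wires up OCR internally.

```python
from __future__ import annotations


def needs_ocr(native_text: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: OCR a page only if its embedded text
    layer is empty or nearly empty."""
    return len(native_text.strip()) < min_chars


def extract_page_texts(path: str, language: str = "eng", dpi: int = 300) -> list[str]:
    """Return one string per page, falling back to OCR where needed."""
    # Imported here so needs_ocr() can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    texts = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text()
            if needs_ocr(text):
                # Render the full page and run Tesseract on it (slow).
                ocr_textpage = page.get_textpage_ocr(
                    language=language, dpi=dpi, full=True
                )
                text = page.get_text(textpage=ocr_textpage)
            texts.append(text)
    return texts
```

Pages with a healthy text layer skip the OCR branch entirely, so a mostly digital document pays the OCR cost only for its scanned pages.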