This is a fork of the pdfminer tool, focused on extracting semantic XML from OCR-ed PDFs. It extracts PDF content page by page and marks up words and lines with distinct tags.
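As a rough illustration of the page/line/word structure described above, here is a minimal sketch that builds such XML with Python's standard `xml.etree.ElementTree`. It does not call the fork itself: the function name `build_semantic_xml`, the tag names `document`, `page`, `line`, and `word`, and the `number` attribute are all assumptions made for this example, and the input is plain text that is presumed to have been extracted already.

```python
import xml.etree.ElementTree as ET


def build_semantic_xml(pages):
    """Build an XML tree from already-extracted text.

    `pages` is a list of pages; each page is a list of line strings.
    Tag and attribute names are hypothetical, not the fork's real schema.
    """
    root = ET.Element("document")
    for page_number, lines in enumerate(pages, start=1):
        # One <page> element per page, numbered from 1.
        page_el = ET.SubElement(root, "page", number=str(page_number))
        for line_text in lines:
            # One <line> element per text line on the page.
            line_el = ET.SubElement(page_el, "line")
            for word in line_text.split():
                # One <word> element per whitespace-separated token.
                word_el = ET.SubElement(line_el, "word")
                word_el.text = word
    return root


if __name__ == "__main__":
    sample_pages = [["Hello OCR world", "second line"], ["page two"]]
    tree_root = build_semantic_xml(sample_pages)
    print(ET.tostring(tree_root, encoding="unicode"))
```

Splitting on whitespace is the simplest possible word boundary; a real OCR pipeline would more likely derive words and lines from character positions on the page.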
Python libraries for PDF scraping: https://medium.com/analytics-vidhya/python-packages-for-pdf-data-extraction-d14ec30f0ad0 How to "convert pdf forms to xml" and ...