source ~/pdf_env/bin/activate
pip install --upgrade pip
pip install pdftotree==0.2.13
pip install h5py==2.10.0
pip install tensorflow
pip install Keras
pip install spacy
python -m spacy download ...
This Clowder extractor converts PDF documents to text and JSON. It uses GROBID 0.8.0 to convert the PDF to XML, then uses s2orc-doc2json to convert that XML to JSON. The doc2txt/json2txt is used to convert ...
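
The following is a minimal sketch of the first and last stages of the pipeline described above. It is not the extractor's actual code. The first function posts a PDF to GROBID's `processFulltextDocument` REST endpoint. The server address `http://localhost:8070` and the third-party `requests` package are assumptions. The second function, `s2orc_json_to_text`, is a hypothetical helper that flattens the JSON into plain text. It assumes the S2ORC-style layout in which paragraphs sit under `pdf_parse.abstract` and `pdf_parse.body_text`, each with a `text` field. The s2orc-doc2json step in between is omitted.

```python
def grobid_pdf_to_tei(pdf_path, grobid_url="http://localhost:8070"):
    """Send a PDF to a running GROBID server and return the TEI XML string.

    Assumes GROBID is reachable at grobid_url (an assumed default).
    """
    # Third-party dependency, imported lazily so the text helper below
    # can be used without it.
    import requests

    with open(pdf_path, "rb") as pdf_file:
        response = requests.post(
            f"{grobid_url}/api/processFulltextDocument",
            files={"input": pdf_file},  # GROBID expects the PDF in the "input" field
            timeout=300,
        )
    response.raise_for_status()
    return response.text


def s2orc_json_to_text(doc):
    """Flatten an S2ORC-style dict into plain text (hypothetical helper).

    Emits the title, then abstract paragraphs, then body paragraphs,
    separated by blank lines. Missing or empty fields are skipped.
    """
    parts = []
    title = doc.get("title")
    if title:
        parts.append(title)
    pdf_parse = doc.get("pdf_parse", {})
    for field in ("abstract", "body_text"):
        for paragraph in pdf_parse.get(field, []):
            text = paragraph.get("text", "").strip()
            if text:
                parts.append(text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Tiny hand-written document for illustration only.
    sample = {
        "title": "An Example Paper",
        "pdf_parse": {
            "abstract": [{"text": "We study an example."}],
            "body_text": [{"text": "First paragraph."}, {"text": "Second paragraph."}],
        },
    }
    print(s2orc_json_to_text(sample))
```

Running the file directly prints the flattened text of the sample document. Calling `grobid_pdf_to_tei` requires a live GROBID server.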