Archives

Archive for July, 2007

Extracting text from PDFs using python and pdftotext

(Python)
 

The answer was reasonably simple, but it was very gruelling to obtain. First, the false leads: 1) Prescript proved to be an out-of-date, unsupported waste of time. 2) Ghostscript has never put much emphasis on user-friendliness or documentation. I was hoping to use its pdf2ascii functionality. I can’t remember precisely what happened, but I think it [...]
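The excerpt breaks off before the working solution is described. As a rough illustration of the approach named in the title, here is a minimal sketch (not the author's code) that shells out to the pdftotext command-line tool from Python. It assumes pdftotext from Xpdf or Poppler is installed and on the PATH, and Python 3.7+ for `capture_output`. The function names `build_pdftotext_command` and `pdf_to_text` are hypothetical.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    # Ask for UTF-8 output so the bytes can be decoded consistently below.
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # Try to keep the physical layout of the page (columns, spacing).
        cmd.append("-layout")
    # An output file name of "-" makes pdftotext write the text to stdout,
    # so no temporary file is needed.
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, layout=False):
    """Return the extracted text of pdf_path using the external pdftotext tool."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be something like `text = pdf_to_text("report.pdf")`, where `report.pdf` stands in for your own file. Letting an external tool do the PDF parsing keeps the Python side down to a subprocess call and a decode.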

 
