Open Source Software for Document Recognition/Analysis
https://github.com/tesseract-ocr/tesseract
https://github.com/Early-Modern-OCR
Hacker News discussion
unpaper