Locally hosted web application that allows you to perform various operations on PDF files - Stirling-Tools/Stirling-PDF
Look Scanned is a pure frontend site that makes your PDFs look scanned! No need for printers and scanners anymore; all it takes is a few clicks.
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR to existing PDFs.
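As a rough illustration of adding a text layer to an existing scan, the sketch below assembles a call to the `ocrmypdf` command-line tool from Python. The helper names `build_ocrmypdf_command` and `run_ocr` and the file names are hypothetical; the sketch assumes OCRmyPDF is installed and on the PATH, and uses only its `-l` (language) and `--deskew` options.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, languages=("eng",), deskew=False):
    """Assemble an argument list for the ocrmypdf command-line tool (hypothetical helper)."""
    # Multiple OCR languages are joined with "+", e.g. "eng+deu".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten crooked pages before OCR
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, **options):
    """Run OCRmyPDF if it is installed; return False when it is not on PATH."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, **options), check=True)
    return True
```

A call such as `run_ocr("scanned.pdf", "searchable.pdf", languages=("eng", "deu"), deskew=True)` would then write a searchable copy, provided the matching Tesseract language data is installed.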
Paperless is a simple Django application running in two parts: a Consumer (the thing that does the indexing) and the Web server (the part that lets you search & download already-indexed documents). If you want to learn more about its functions, keep on reading after the installation section.
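The split between an indexing consumer and a search front end can be pictured with a toy in-memory index. This is only a sketch of the idea, not Paperless's code: the class `DocumentIndex` and its methods are hypothetical, and document text is assumed to be already extracted.

```python
import re
from collections import defaultdict


class DocumentIndex:
    """Toy inverted index mapping each word to the documents that contain it."""

    def __init__(self):
        self._words = defaultdict(set)

    def consume(self, name, text):
        """Consumer side: record every word of one document under its name."""
        for word in re.findall(r"\w+", text.lower()):
            self._words[word].add(name)

    def search(self, query):
        """Web side: return sorted names of documents containing every query word."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        # Intersect the document sets of all query words.
        matches = set.intersection(*(self._words.get(w, set()) for w in words))
        return sorted(matches)
```

For example, after `index.consume("invoice.pdf", "Invoice for March rent")`, a call to `index.search("march rent")` would return that document's name. Keeping `consume` and `search` separate mirrors how indexing can run in the background while the search side only reads.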
Fast text recognition (OCR) to create searchable PDFs. More than 100 languages available!
Compose PDFs from many sources like existing PDFs, images, emails, webpages, scans or iPhone's built-in document scanner (Continuity Camera).
Drag or paste anything onto the window to append it to the current document and automatically apply text recognition if required.
Squeeze to reduce the size of your PDF files and save space on your drives.
Emails and webpages get converted to paged PDF files, which is great for archiving invoices you receive by email, such as those from the App Store.
Full support for all desktop scanners.
Save, print or share the PDF files you created on the fly.
Have the text read to you by the computer voice.
How Can Tabula Help Me?
If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.
Paru—Pandoc wrapped around in Ruby
If you need to convert files from one markup format into another, pandoc is your Swiss Army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup