To run tesseract on each image file using a single command, we need to use a for loop.
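A minimal sketch of such a loop, assuming the page images sit in the current directory and share a common prefix such as turing. The function name `ocr_pages` and the `OCR_CMD` override are hypothetical helpers introduced here for illustration; only the `tesseract IMAGE OUTPUTBASE` invocation is the tool's own interface.

```shell
# Run OCR on every page image that shares a prefix (turing-01.png, turing-02.png, ...).
# OCR_CMD is a hypothetical override for dry runs; it defaults to tesseract.
ocr_pages() {
    prefix=$1
    for img in "$prefix"-*.png; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$img" ] || continue
        # tesseract appends .txt to the output base, e.g. turing-01.txt
        "${OCR_CMD:-tesseract}" "$img" "${img%.png}"
    done
}
```

Calling `ocr_pages turing` would then process each matching image in turn. Setting `OCR_CMD=echo` first prints the commands instead of running them, which is a cheap way to check the file list.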
In Linux we can easily split PDF documents by pages using the command line utility called pdftk.

From this article you will learn how to extract individual pages or a range of pages from a PDF file and save them as another PDF document.

First of all, it is required to install the pdftk utility:

$ sudo apt-get install pdftk

Split PDF File

Extract the 5th page from the ORIG_FILE.pdf and save it to the NEW_FILE.pdf:

$ pdftk ORIG_FILE.pdf cat 5 output NEW_FILE.pdf

Extract several individual pages:

$ pdftk ORIG_FILE.pdf cat 1 4 6 output NEW_FILE.pdf

Extract a range of pages:

$ pdftk ORIG_FILE.pdf cat 1-5 output NEW_FILE.pdf

Extract the combination of individual pages and a range of pages:

$ pdftk ORIG_FILE.pdf cat 1 5 7 10-12 output NEW_FILE.

We'll call our image files turing-01.png, turing-02.png, and so on:

pdftoppm -png turing.pdf turing

Sometimes it is required to extract some pages from a PDF file and save them as another PDF document.
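For repeated use, the page-selection commands above can be wrapped in a small shell function. This is only a sketch: the name `extract_pages` and the `PDFTK` override are hypothetical, and the function simply passes its page arguments through to pdftk's `cat` operation.

```shell
# Extract a selection of pages into a new PDF.
# Usage: extract_pages ORIG_FILE.pdf NEW_FILE.pdf 1 5 7 10-12
# PDFTK is a hypothetical override for dry runs; it defaults to pdftk.
extract_pages() {
    if [ "$#" -lt 3 ]; then
        echo "usage: extract_pages INPUT OUTPUT PAGE_OR_RANGE..." >&2
        return 1
    fi
    input=$1
    output=$2
    shift 2
    # The remaining arguments are passed through as pdftk page selectors.
    "${PDFTK:-pdftk}" "$input" cat "$@" output "$output"
}
```

For example, `extract_pages ORIG_FILE.pdf NEW_FILE.pdf 1-5` would run the same command as the range example above.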