You can easily convert PDF files to editable text in Linux using the “pdftotext” command line tool. However, if there are any images in the original PDF file, they are not extracted. To extract images from a PDF file, you can use another command line tool called “pdfimages”.

NOTE: When we say to type something in this article and there are quotes around the text, DO NOT type the quotes, unless we specify otherwise.

The “pdfimages” tool is part of the poppler-utils package. You can check to see if it’s installed on your system and install it if necessary using the steps described in this article.
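
As a quick check, a sketch like the following reports whether the pdfimages command is available. The install command in the message is an assumption: it applies to Debian- or Ubuntu-based systems that use apt, and other distributions use a different package manager.

```shell
# Check whether the pdfimages command is available on the PATH.
if command -v pdfimages >/dev/null 2>&1; then
    pdfimages_status="installed"
    echo "pdfimages is installed."
else
    pdfimages_status="missing"
    # Assumption: a Debian/Ubuntu-style system that uses apt.
    echo "pdfimages not found. Try: sudo apt-get install poppler-utils"
fi
```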

To extract images from a PDF file using pdfimages, press “Ctrl + Alt + T” to open a Terminal window. Type the following command at the prompt.

pdfimages /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: For all the commands shown in this article, replace the first path and PDF filename with the path and filename of your original PDF file. The second path should point to the folder in which you want to save the extracted images. The word “image” at the end of the second path is the prefix for the image filenames; replace it with whatever text you want each filename to start with. The images are numbered automatically (000, 001, 002, 003, etc.), and a dash is added between the prefix and the number. In our example, the image files will be named image-000.ppm, image-001.ppm, and so on.
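
The three pieces described in the note (PDF path, output folder, and filename prefix) can be wrapped in a small shell function. This is only a sketch: extract_images is a hypothetical helper name, not part of poppler-utils, and it creates the output folder first on the assumption that pdfimages will not create it for you.

```shell
# Sketch: create the output folder, then extract images into it.
# extract_images is a hypothetical helper name, not part of poppler-utils.
extract_images() {
    pdf="$1"       # path to the original PDF file
    outdir="$2"    # folder to save the extracted images in
    prefix="$3"    # text each image filename should start with

    # Assumption: the output folder may not exist yet, so create it.
    mkdir -p "$outdir" || return 1

    if ! command -v pdfimages >/dev/null 2>&1; then
        echo "pdfimages is not installed" >&2
        return 1
    fi
    if [ ! -f "$pdf" ]; then
        echo "PDF file not found: $pdf" >&2
        return 1
    fi

    # Writes files such as $outdir/$prefix-000.ppm, $outdir/$prefix-001.ppm, ...
    pdfimages "$pdf" "$outdir/$prefix"
}

# Example call using the paths from this article:
# extract_images /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages image
```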

The default image format is PPM (portable pixmap) for non-monochrome images, or PBM (portable bitmap) for monochrome images. These formats are designed to be easily exchanged between platforms.

NOTE: You may get two image files for each image in your PDF file. The second file of each pair is blank, so you can tell which files contain the actual images by looking at the thumbnails in the File Manager.
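
If you would rather check before extracting anything, the “-list” option of pdfimages prints a table of the images in the PDF file instead of saving them. The table includes a type column, and entries such as smask or mask indicate transparency masks, which may account for the blank-looking files. The sketch below assumes your version of pdfimages supports “-list”; list_pdf_images is a hypothetical helper name.

```shell
# Sketch: list the images in a PDF file without extracting them.
# list_pdf_images is a hypothetical helper name.
list_pdf_images() {
    pdf="$1"   # path to the PDF file to inspect

    if ! command -v pdfimages >/dev/null 2>&1; then
        echo "pdfimages is not installed" >&2
        return 1
    fi
    if [ ! -f "$pdf" ]; then
        echo "PDF file not found: $pdf" >&2
        return 1
    fi

    # Prints one row per image, including its page number and type.
    pdfimages -list "$pdf"
}

# Example call using the path from this article:
# list_pdf_images /home/lori/Documents/SampleWithImages.pdf
```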

To create .jpg image files, add the “-j” option to the command, as shown below.

pdfimages -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: You can also change the default output to PNG using the “-png” option or TIFF using the “-tiff” option.
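
To make the choice of format easier to switch, the option can be picked from a format name. This is a rough sketch: format_flag is a hypothetical helper, and the “-png” and “-tiff” options are assumed to be available in your version of pdfimages.

```shell
# Sketch: map a format name to the matching pdfimages option.
# format_flag is a hypothetical helper name.
format_flag() {
    case "$1" in
        jpg|jpeg) printf '%s\n' "-j" ;;
        png)      printf '%s\n' "-png" ;;
        tiff)     printf '%s\n' "-tiff" ;;
        *)        echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Example: extract the images as PNG files.
# pdfimages "$(format_flag png)" /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image
```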

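The main image file for each image is saved as a .jpg file. The second, blank image is still saved as a .ppm or .pbm file, because the “-j” option only writes .jpg files for images that are stored in JPEG (DCT) format inside the PDF file.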

If you only want to extract the images on and after a certain page, use the “-f” option with a number to indicate the first page to process, as shown in the example command below.

pdfimages -f 2 -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: We combined the “-j” option with the “-f” option to get .jpg images, and we do the same with the “-l” option described below.

To extract all the images up to and including a certain page, use the “-l” option (a lowercase “L”, not the number “1”) with a number to indicate the last page to process, as shown below.

pdfimages -l 1 -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: You can use the “-f” and “-l” options together to extract the images from a specific page range in the middle of your document.
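
For example, the two options might be combined as in the sketch below, which also checks that the first page is not after the last page before running the command. The extract_page_range name and the page numbers in the example call are hypothetical.

```shell
# Sketch: extract .jpg images from a range of pages.
# extract_page_range is a hypothetical helper name.
extract_page_range() {
    first="$1"   # first page to process
    last="$2"    # last page to process
    pdf="$3"     # path to the original PDF file
    out="$4"     # output folder plus filename prefix

    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 2
    fi
    if ! command -v pdfimages >/dev/null 2>&1; then
        echo "pdfimages is not installed" >&2
        return 1
    fi
    if [ ! -f "$pdf" ]; then
        echo "PDF file not found: $pdf" >&2
        return 1
    fi

    pdfimages -f "$first" -l "$last" -j "$pdf" "$out"
}

# Example: only the images on pages 2 through 4.
# extract_page_range 2 4 /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image
```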

If there is an owner password on the PDF file, use the “-opw” option and the password in single quotes, as shown below. If the password on the PDF file is a user password, use the “-upw” option instead with the password.

NOTE: Make sure there are single quotes around your password in the command.

pdfimages -opw 'password' -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image
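
The single quotes matter when the password contains characters that the shell treats specially. The short sketch below uses a made-up password to show the difference between single and double quotes. It also shows, as a comment, what the equivalent command might look like for a user password.

```shell
# Hypothetical password containing characters the shell treats specially.
single='pa$$word'   # single quotes: kept exactly as typed
double="pa$$word"   # double quotes: $$ is replaced by the shell's process ID

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"

# For a user password, the option is -upw instead of -opw:
# pdfimages -upw 'pa$$word' -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image
```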

For more information about using the pdfimages command, type “pdfimages” at the prompt in a Terminal window and press “Enter”. The command usage displays with a list of options available for use in the command.