
PDF files were designed to promote sharing. Everyone can open them---in their web browser if they have nothing else. Linux lets you manipulate, merge, and split PDF files on the command line.

The Portable Document Format

The Portable Document Format (PDF) solved a problem. When you created a document on a computer and wanted to share it with someone else, sending them the document didn't always work.

Even if they had the same software package you'd used to create your document, they might not have the same fonts installed on their computer that you had on yours. They'd be able to open the document but it would look wrong.

If they didn't have a copy of the software you used to create the document, they wouldn't be able to open it at all. If you used software that was only available on Linux, it was pointless sending that document to someone who only used Windows.

Adobe created a new file format in 1992 and called it the Portable Document Format. Documents created to that standard, now published as ISO 32000, contain the images and fonts needed to correctly render the contents of the file. PDF files can be opened by PDF viewers on any platform. It was a cross-platform, simple, and elegant solution.

PDF files aren't intended to be malleable like word-processor documents. They don't readily lend themselves to editing. If you need to change the content of a PDF, it's always better to go back to the source material, edit that, and generate a new PDF. Structural manipulations, by contrast, can be performed on PDF files with relative ease.

Here are some ways to create PDF files on Linux, and how to perform some of the transforms that can be applied to them.

Creating PDF Files on Linux

Many of the applications available on Linux can generate PDF files directly. LibreOffice has a button right on the toolbar that generates a PDF of the current document. It couldn't be easier.

The LibreOffice Writer PDF button

For fine-grained control of PDF creation, the Scribus desktop publishing application is hard to beat.

If you need to create documents with scientific or mathematical content, perhaps for submission to academic journals, an application that uses LaTeX, such as Texmaker, will be perfect for you.

If you prefer a plain-text workflow, perhaps using Markdown, you can use pandoc to convert to, and from, a great many file formats, including PDF. We have a guide dedicated to pandoc, but a simple example will show you how easy it is to use.

Install Texmaker first. pandoc relies on some LaTeX libraries for PDF generation. Installing Texmaker is a convenient way to meet those dependencies.

The -o (output) option names the file that will be created. pandoc works out the output format from the ".pdf" file extension. The "raw-notes.md" file is a plain-text Markdown file.

pandoc -o new.pdf raw-notes.md

Using pandoc to create a PDF from a Markdown file

If we open the "new.pdf" file in a PDF viewer we see that it is a correctly-formed PDF.

Opening the PDF created by pandoc
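If you have several Markdown files to convert, the same pandoc command can be wrapped in a loop. This is a minimal sketch rather than a definitive script. The function names pdf_name_for and convert_all, and the PANDOC override variable, are our own inventions for illustration, and the sketch assumes pandoc and its LaTeX dependencies are installed.

```shell
# Derive the PDF name for a Markdown file: raw-notes.md -> raw-notes.pdf
pdf_name_for() {
    printf '%s\n' "${1%.md}.pdf"
}

# Convert every Markdown file given as an argument to a PDF beside it.
# PANDOC is a hypothetical override: set PANDOC=echo for a dry run that
# prints the arguments instead of running pandoc.
convert_all() {
    for md in "$@"; do
        [ -e "$md" ] || continue    # skip a missing file or unmatched pattern
        "${PANDOC:-pandoc}" -o "$(pdf_name_for "$md")" "$md" || return 1
    done
}
```

Calling convert_all *.md would then produce one PDF per Markdown file in the current directory.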

The qpdf Command

The qpdf command allows you to manipulate existing PDF files, whilst preserving their content. The changes you can make are structural. With qpdf you can perform tasks such as merging PDF files, extracting pages, rotating pages, and setting and removing encryption.

To install qpdf on Ubuntu use this command:

sudo apt install qpdf

Installing qpdf on Ubuntu

The command on Fedora is:

sudo dnf install qpdf

Installing qpdf on Fedora

On Manjaro you must type:

sudo pacman -S qpdf

Installing qpdf on Manjaro

Merging PDF Files

At first, some of the qpdf command line syntax may seem confusing. For example, qpdf normally expects an input PDF file as its first file argument.

If you're building a new PDF purely from the pages of other files, use the --empty option in place of the input file. This tells qpdf to start from an empty document. The --pages option lets you choose pages. If you just provide the PDF names, all pages are used.

To combine two PDF files to form a new PDF file, use this command format.

qpdf --empty --pages first.pdf second.pdf -- combined.pdf

Combining two PDF files to create a new PDF file

This command is made up of:

  • qpdf: Calls the qpdf command.
  • --empty: Tells qpdf there is no input PDF. You could argue that "first.pdf" and "second.pdf" are input files, but qpdf considers them to be command line parameters.
  • --pages: Starts the list of source files and, optionally, the page ranges to take from each.
  • first.pdf second.pdf: The two files we're going to extract the pages from. We've not used page ranges, so all pages will be used.
  • --: Marks the end of the --pages list.
  • combined.pdf: The name of the PDF that will be created.

If we look for PDF files with ls, we'll see our two original files---untouched---and the new PDF called "combined.pdf."

ls -hl first.pdf second.pdf combined.pdf

Using ls to list the existing and new PDF files

There are two pages in "first.pdf" and one page in "second.pdf." The new PDF file has three pages.

The new PDF file has all the pages from the two original PDF files
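You can confirm the page counts from the command line instead of opening each file in a viewer. The sketch below relies on qpdf's --show-npages option, which prints the number of pages in a file. The function names page_count and total_pages and the QPDF override variable are hypothetical helpers of our own, not part of qpdf.

```shell
# Print the number of pages in one PDF using qpdf's --show-npages option.
# QPDF is a hypothetical override so another command can stand in for qpdf.
page_count() {
    "${QPDF:-qpdf}" --show-npages "$1"
}

# Add up the page counts of every PDF given as an argument.
total_pages() {
    total=0
    for f in "$@"; do
        n=$(page_count "$f") || return 1
        total=$((total + n))
    done
    echo "$total"
}
```

If the merge worked, total_pages first.pdf second.pdf and page_count combined.pdf should print the same number.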

You can use wildcards instead of listing a great many source files. This command creates a new file called "all.pdf" that contains all the PDF files in the current directory.

qpdf --empty --pages *.pdf -- all.pdf

Using wildcards in the qpdf command line
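One thing to watch with wildcards: the shell expands *.pdf before qpdf runs, in filename order, so an "all.pdf" left over from an earlier run would be matched as a source file too. A possible way around that is sketched below. The merge_dir function and the QPDF override variable are hypothetical names used only for this illustration.

```shell
# Merge every PDF in the current directory except the output file itself.
# QPDF is a hypothetical override: set QPDF=echo to print the command line
# that would be run instead of running qpdf.
merge_dir() {
    out=$1
    set --                            # start with an empty argument list
    for f in *.pdf; do
        [ -e "$f" ] || continue       # no matches: the pattern stays literal
        [ "$f" = "$out" ] && continue # never feed the output back in
        set -- "$@" "$f"
    done
    [ "$#" -gt 0 ] || { echo "no input PDFs found" >&2; return 1; }
    "${QPDF:-qpdf}" --empty --pages "$@" -- "$out"
}
```

Running merge_dir all.pdf merges the other PDFs in filename order. Writing the output to a different directory would avoid the problem just as well.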

We can use page ranges by adding the page numbers or ranges after the file names the pages are to be extracted from.

This will extract pages one and two from "first.pdf" and page one from "second.pdf." Note that qpdf writes a new output file each time: if "combined.pdf" already exists, it is replaced, not appended to.

qpdf --empty --pages first.pdf 1-2 second.pdf 1 -- combined.pdf

Using page ranges to select the pages to add to the new file

Page ranges can be as detailed as you like. Here, we're asking for a very specific set of pages from a large PDF file, and we're creating a summary PDF file.

qpdf --empty --pages large.pdf 1-3,7,11,18-21,55 -- summary.pdf

Using a complicated set of page ranges

The output file, "summary.pdf," contains pages 1 to 3, 7, 11, 18 to 21, and 55 from the input PDF file. That is 3 + 1 + 1 + 4 + 1 pages, so there are 10 pages in "summary.pdf."

Page 10 of the new PDF is page 55 from the source file

We can see that page 10 is page 55 from the source PDF.
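If you want to check how many pages a range will select before running qpdf, the arithmetic is easy to script. This is a rough sketch under a stated limitation: the hypothetical count_range_pages function only understands plain page numbers and ascending "first-last" ranges, not the other forms qpdf accepts, such as z for the last page.

```shell
# Count the pages selected by a simple range list such as "1-3,7,11,18-21,55".
# Only single numbers and ascending first-last ranges are handled.
# The body runs in a subshell so the IFS change does not leak out.
count_range_pages() (
    IFS=,
    set -f                      # no filename expansion while splitting
    total=0
    for part in $1; do
        case "$part" in
            *-*)
                first=${part%-*}
                last=${part#*-}
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
)
```

For the range used above, count_range_pages 1-3,7,11,18-21,55 prints 10, matching the page count of "summary.pdf."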

Splitting PDF Files

The opposite of merging PDF files is splitting PDF files. To split a PDF into separate PDF files each holding a single page, the syntax is simple.

The file we're splitting is "summary.pdf", and the output file is given as "page.pdf." This is used as the base name. Each new file has a number added to the base name. The --split-pages option tells qpdf what type of action we're performing.

qpdf summary.pdf page.pdf --split-pages

Splitting a PDF file into many PDF files of one page each

The output is a series of sequentially numbered PDF files.

ls page*.pdf

Using ls to list the numbered PDF files
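The --split-pages option also accepts a number, which groups that many pages into each output file, and a %d in the output name marks where qpdf should insert the page numbers. The wrapper below is only a sketch. The split_every function name and the QPDF override variable are hypothetical, and it assumes the input name ends in ".pdf."

```shell
# Split a PDF into files of N pages each.
#   $1: the input PDF, $2: pages per output file
# QPDF is a hypothetical override: set QPDF=echo to print the command line
# instead of running qpdf.
split_every() {
    "${QPDF:-qpdf}" --split-pages="$2" "$1" "${1%.pdf}-%d.pdf"
}
```

For example, split_every large.pdf 5 would ask qpdf for a series of five-page files with names based on "large."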

If you don't want to split out every page, use page ranges to select the pages you want.

If we issue this next command, we'll split out a collection of single-page PDF files. The page ranges are used to specify the pages or ranges we want, but each page is still stored in its own single-page PDF.

qpdf large.pdf section.pdf --pages large.pdf 1-5,11-14,60,70-100 -- --split-pages

Splitting a PDF with page ranges

The extracted pages have names based on "section.pdf" with a sequential number added to them.

ls section*.pdf

Using ls to list the numbered PDF files

If you want to extract a page range and have it stored in a single PDF, use a command of this form. Note that we don't include the --split-pages option. Effectively, what we're doing here is a PDF merge, but we're only "merging" pages from one source file.

qpdf --empty --pages large.pdf 8-13 -- chapter2.pdf

Extracting a range of pages from a PDF file and storing them in one new PDF file

This creates a single, multi-page PDF called "chapter2.pdf."

Rotating Pages

To rotate a page, we create a new PDF that's the same as the input PDF with the specified page rotated.

We use the --rotate option to do this. The +90 means rotate the page 90 degrees clockwise. You can rotate a page 90, 180, or 270 degrees. You can also specify the rotation in degrees anticlockwise, by using a negative number, but there's little need to do so. A rotation of -90 is the same as a rotation of +270.

The number separated from the rotation by a colon ":" is the number of the page you want to rotate. This could be a list of page numbers and page ranges, but we're just rotating the first page. To rotate all pages use a page range of 1-z.

qpdf --rotate=+90:1 summary.pdf rotated1.pdf

Rotating the first page of a PDF

The first page has been rotated for us.

A PDF file with the first page rotated 90 degrees clockwise
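The equivalence between anticlockwise and clockwise rotations, and the 1-z range for all pages, can be put together in a short sketch. The function names clockwise_equivalent and rotate_all and the QPDF override variable are hypothetical, chosen just for this example.

```shell
# Convert any rotation in degrees to its clockwise equivalent between
# 0 and 359, so that -90 becomes 270.
clockwise_equivalent() {
    echo $(( ( $1 % 360 + 360 ) % 360 ))
}

# Rotate every page (page range 1-z) of a PDF.
#   $1: input PDF, $2: rotation in degrees, $3: output PDF
# QPDF is a hypothetical override: set QPDF=echo for a dry run.
rotate_all() {
    "${QPDF:-qpdf}" --rotate="+$(clockwise_equivalent "$2"):1-z" "$1" "$3"
}
```

With this in place, rotate_all summary.pdf -90 rotated-all.pdf passes --rotate=+270:1-z to qpdf.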

Encrypting and Decrypting

PDF documents can be encrypted so that they require a password to open them. That password is called the user password. There's another password that's required to change the security and other permission settings for a PDF. It's called the owner password.

To encrypt a PDF we need to use the --encrypt option and provide both passwords. The user password comes first on the command line.

We also specify the strength of encryption to use. You'd only need to drop from 256-bit encryption to 128-bit if you want to support very old PDF file viewers. We suggest you stick with 256-bit encryption.

We're going to create an encrypted version of the "summary.pdf" called "secret.pdf."

qpdf --encrypt hen.rat.squid goose.goat.gibbon 256 -- summary.pdf secret.pdf

Creating an encrypted PDF

When we try to open the PDF, the PDF viewer prompts us for a password. Entering the user password authorizes the viewer to open the file.

A PDF viewer prompting for the password to open an encrypted PDF file

Remember that qpdf doesn't change the existing PDF. It creates a new one with the changes we've asked it to make. So if you make an encrypted PDF you'll still have the original, unencrypted version. Depending on your circumstances you might want to delete the original PDF or safely store it away.

To decrypt a file, use the --decrypt option. You must supply a password for the file, which is passed with the --password option. Here we use the owner password.

qpdf --decrypt --password=goose.goat.gibbon secret.pdf unlocked.pdf

Creating a decrypted PDF from an encrypted PDF

The "unlocked.pdf" can be opened without a password.
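To check from a script whether a file is encrypted, reasonably recent versions of qpdf provide the --is-encrypted option, which prints nothing and reports the answer through its exit status: 0 if the file is encrypted and 2 if it is not. The sketch below assumes your installed qpdf supports that option. The encryption_status function and the QPDF override variable are hypothetical names.

```shell
# Report whether a PDF is encrypted, based on the exit status of
# "qpdf --is-encrypted" (0 = encrypted, 2 = not encrypted).
# QPDF is a hypothetical override so another command can stand in for qpdf.
encryption_status() {
    if "${QPDF:-qpdf}" --is-encrypted "$1"; then
        echo "encrypted"
    else
        case $? in
            2) echo "not encrypted" ;;
            *) echo "unknown" ;;      # any other status, such as an error
        esac
    fi
}
```

Running encryption_status on "secret.pdf" and then on "unlocked.pdf" is a quick way to confirm that both the encryption and the decryption steps did what you expected.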

qpdf Is an Excellent Tool

We're deeply impressed with qpdf. It provides a flexible and richly featured toolset for working with PDF files. And it is very fast, too.

Check out the project's well-written and detailed documentation to see just how much more it can do.