
When a document is mostly plain text, you would expect the .docx and .pdf versions to be similar in size, but that is not always the case. Today's SuperUser Q&A post answers a curious reader's question about the large difference in file sizes.

Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

Boxing gloves clip-art courtesy of Clker.com.

The Question

SuperUser reader Borek wants to know why PDF files generated by Microsoft Word are so large:

I created a simple Microsoft Word document containing just this sentence, nothing else:

  • This is a small document.

Then I saved the document as .docx and .pdf files. Here are the file sizes:

  • .docx: 12 kB
  • .pdf: 89 kB

The difference between the two files is huge (technically) and it really bothers me when documents that are mostly textual in nature are just tens of kB in .docx format, but are hundreds of kB in size when converted to PDF files. What is so inefficient about the PDF format? Is it just Microsoft Word using some terrible output algorithm?

By the way, the PDF output settings on my Microsoft Office installation are set to create the smallest files possible:

[Screenshot: Microsoft Word PDF output options, set to produce the smallest file size]

Why are PDF files generated by Microsoft Word so large?

The Answer

SuperUser contributor rene has the answer for us:

If you open the PDF file in Notepad++, you will find:

[Screenshot: the PDF file opened in Notepad++, showing an embedded object]

And that object is referenced here at the end in the /FontFile2 instruction:

[Screenshot: the /FontFile2 reference near the end of the PDF file]

Microsoft Word embeds the fonts used by a document into the PDF file so that the file is self-contained. That embedded font data, not the text itself, accounts for most of the extra size. I used this slide deck from Adobe to decipher the PDF instructions.
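
The same check can be done without a text editor. Below is a minimal Python sketch that scans the raw bytes of a PDF for font-file references in font descriptors (/FontFile, /FontFile2, /FontFile3). The function name and the sample bytes are illustrative assumptions, not part of the original answer, and the scan only sees uncompressed objects, so descriptors stored inside compressed object streams would be missed.

```python
import re

# Matches entries such as "/FontFile2 12 0 R" (key, object number, generation).
FONT_FILE_REF = re.compile(rb"/(FontFile[23]?)\s+(\d+)\s+(\d+)\s+R\b")


def find_font_file_refs(pdf_bytes):
    """Return (key, object_number, generation) for each font-file reference found."""
    refs = []
    for match in FONT_FILE_REF.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        refs.append((key, int(match.group(2)), int(match.group(3))))
    return refs


# Hypothetical fragment of a font descriptor, made up for illustration only.
SAMPLE = b"<< /Type /FontDescriptor /FontName /ABCDEE+Calibri /FontFile2 12 0 R >>"

if __name__ == "__main__":
    for key, number, generation in find_font_file_refs(SAMPLE):
        print(f"{key} -> object {number} {generation}")
```

To try it on a real file, pass the file's bytes, for example find_font_file_refs(open("small.pdf", "rb").read()), where small.pdf is a hypothetical file name. An empty result suggests that no font program is embedded, or that the descriptors sit in compressed streams this simple scan cannot read.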

If you want to prevent fonts from being embedded in a PDF file, then make sure your Microsoft Word documents make use of one of the 14 standard typefaces available in PDF viewers (Source: Wikipedia).

  • Times New Roman > Times (v3) (in regular, italic, bold, and bold italic)
  • Courier New > Courier (in regular, oblique, bold, and bold oblique)
  • Arial > Helvetica (v3) (in regular, oblique, bold, and bold oblique)
  • Symbol > Symbol
  • Wingdings > Zapf Dingbats
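
The list above can also be written as a small lookup table. The following Python sketch maps a Word font name to its standard PDF counterpart, using only the pairs listed above; the names STANDARD_EQUIVALENTS and standard_equivalent are hypothetical, and whether a given version of Word actually skips embedding for these fonts is the answer's claim rather than something this code verifies.

```python
# Word font name (lowercase) -> standard PDF font family, per the list above.
STANDARD_EQUIVALENTS = {
    "times new roman": "Times",
    "courier new": "Courier",
    "arial": "Helvetica",
    "symbol": "Symbol",
    "wingdings": "Zapf Dingbats",
}


def standard_equivalent(font_name):
    """Return the standard PDF family for a Word font, or None if it has none."""
    return STANDARD_EQUIVALENTS.get(font_name.strip().lower())


if __name__ == "__main__":
    for name in ("Arial", "Calibri", "Times New Roman"):
        print(name, "->", standard_equivalent(name))
```

A font that returns None here, such as Calibri, is one that would likely need to be embedded for the PDF to remain self-contained.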
