
pdf on Linux

Compressing a pdf File

Here’s a problem: You need to send a pdf, but the file is too large to attach or upload. Don’t despair, there’s a remedy! You need the program ps2pdf from the ghostscript package. Then, all you need to do is type this into your terminal:

 ps2pdf LARGE.pdf SMALL.pdf

This will compress LARGE.pdf into SMALL.pdf. The difference in size is quite impressive! I was able to reduce a file with scanned documents from 9.2 MB down to 1.6 MB (that’s almost six times smaller!) with hardly any noticeable decline in quality.
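
If you want to see exactly what you gained, you can compare the two files. Here is a minimal sketch that relies only on wc and awk. The function name size_ratio is my own invention and not part of ghostscript:

```shell
# size_ratio BEFORE AFTER
# Prints both sizes in bytes and how many times smaller AFTER is.
# (size_ratio is a made-up helper name, not a ghostscript tool.)
size_ratio() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "usage: size_ratio BEFORE AFTER" >&2
        return 1
    fi
    before=$(wc -c < "$1" | tr -d ' ')
    after=$(wc -c < "$2" | tr -d ' ')
    if [ "$after" -eq 0 ]; then
        echo "$2 is empty" >&2
        return 1
    fi
    # awk does the floating-point division that plain sh arithmetic lacks
    awk -v b="$before" -v a="$after" \
        'BEGIN { printf "%s -> %s bytes (%.1f times smaller)\n", b, a, b / a }'
}
```

With the file names from above, you would call it as: size_ratio LARGE.pdf SMALL.pdf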

Getting Ghostscript
ps2pdf was already on my computer when I tried it out for the first time. It was probably installed with texlive-full, a package that I always install on my computers. Texlive is a collection of packages for the LaTeX typesetting system and includes ghostscript. You can, however, install ghostscript on its own like so (on Debian/Ubuntu and derivatives):

sudo apt install ghostscript
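
To find out whether the tools are already there before you install anything, a small check like the following should do. check_tools is a hypothetical helper name of mine; command -v is the standard shell way of looking up a command.

```shell
# check_tools NAME...
# Reports for each name whether a command of that name is on the PATH.
# Returns 1 if at least one of them is missing.
# (check_tools is a made-up helper name.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

For this post you would run: check_tools gs ps2pdf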

If that doesn’t work
ps2pdf is a script that calls ghostscript. If ps2pdf didn’t work for you, you can use ghostscript directly.

Example: An archive sent me a dossier of over 90 pages as high-resolution scans of every single page. The pages were sent as pdf files, which means every single page was its own pdf file (and a large one!), which made browsing through the dossier quite cumbersome. First, I concatenated all the single pages into one pdf file (using the program pdfjam).

However, that file was then over 120 MB in size and still very slow to use. The usual trick of using ps2pdf didn’t work, so I did some browsing on the internet and found that the following command did the trick for me:

gs -o OUTFILE.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH INFILE.pdf

Let’s break down the elements:
-o OUTFILE.pdf sets the name of the output file.
-sDEVICE=pdfwrite tells ghostscript which output device to use. Without it, no pdf is generated at all.
-dPDFSETTINGS=/ebook: This is where the magic happens. This switch is shorthand for a set of space-saving settings that ghostscript applies when it rewrites the pdf file.
-dNOPAUSE -dBATCH tell ghostscript not to work in interactive mode. Without them, ghostscript asks you to hit [ENTER] after every page and then waits at its own prompt instead of exiting, which is, of course, tedious. (The -o switch already implies both, so in this command they are redundant but harmless.)
INFILE.pdf is the input file.

With this command, my pdf shrank from about 128 MB to 28 MB. The file now loads quickly and the quality is still fine to work with.
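
/ebook is only one of the presets that -dPDFSETTINGS accepts. Ghostscript also knows /screen (smaller files, lower quality), /printer and /prepress (larger files, higher quality) and /default. If you want to try them out, a little wrapper saves typing. This is just a sketch: the function name compress_pdf and the GS variable for swapping in another gs binary are my own assumptions.

```shell
# compress_pdf INFILE OUTFILE [PRESET]
# PRESET is one of ghostscript's -dPDFSETTINGS presets; defaults to ebook.
# (compress_pdf and the GS override variable are made-up names.)
compress_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: compress_pdf INFILE OUTFILE [PRESET]" >&2
        return 2
    fi
    preset=${3:-ebook}
    case "$preset" in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $preset" >&2; return 2 ;;
    esac
    # -o names the output file and already implies -dNOPAUSE -dBATCH
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite "-dPDFSETTINGS=/$preset" "$1"
}
```

For example: compress_pdf INFILE.pdf OUTFILE.pdf screen
How much each preset saves depends on the file, so it is worth comparing the results on your own documents.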

Shaving off even more: Converting to greyscale
You can save a little more space by converting a pdf in colour to greyscale. To do this, you can use ghostscript again with this script, which I found on superuser.com and changed somewhat:

#!/bin/dash

# Convert a PDF to Greyscale
# ==========================
#
# This script was found here
# https://superuser.com/questions/104656/convert-a-pdf-to-greyscale-on-the-command-line-in-floss
# on 2019-06-04.
#
# Usage: 
# ------
# 
# The input file (with colours) is the first argument ($1) of this 
# script. The output file (in greyscale) is the second argument ($2) of 
# this script.

gs \
 -sOutputFile="$2" \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
 -dNOPAUSE \
 -dBATCH \
 -dAutoRotatePages=/None \
 "$1"

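If you have a whole pile of colour pdfs, you can run the script above over all of them in one go. The following is only a sketch under a few assumptions of mine: that you saved the script as ./pdf2grey and made it executable, and that you want the results named with a "_grey" suffix. The names grey_name, grey_all and CONVERT are made up for this example.

```shell
# grey_name FILE.pdf
# Prints the output name for FILE.pdf, here FILE_grey.pdf.
# (The "_grey" suffix is just a naming scheme I picked.)
grey_name() {
    printf '%s_grey.pdf\n' "${1%.pdf}"
}

# grey_all FILE.pdf...
# Runs the greyscale script on every file given as an argument.
# CONVERT names that script; ./pdf2grey is an assumed file name.
grey_all() {
    for f in "$@"; do
        "${CONVERT:-./pdf2grey}" "$f" "$(grey_name "$f")" || return 1
    done
}
```

For example: grey_all scan1.pdf scan2.pdf
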
Problem: There are many ways to join pdf files, but typically the hyperlinks get lost.

Solution: One program that allows joining pdfs without losing hyperlinks is pdftk (PDF Toolkit). The command-line version (PDFtk Server) can be installed as a snap.

sudo snap install pdftk

PDF files can then be joined like so (see also the examples on PDFtk’s homepage):

pdftk inputfile1.pdf inputfile2.pdf cat output outputfile1.pdf

To get help on the usage of pdftk, you can also type:

pdftk --help

We can also make a little script that joins all pdfs in the current directory into a single file named after the directory, with "_comp" appended.


pdftk \
	./*.pdf \
	cat output \
	"${PWD##*/}"_comp.pdf

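One thing to watch out for: if you run the script above a second time in the same directory, the earlier result also matches ./*.pdf and gets passed to pdftk as an input. Here is a sketch of a variant that leaves the earlier result out. The function name join_dir and the PDFTK variable for swapping in another binary are my own assumptions.

```shell
# join_dir
# Joins all pdfs in the current directory into DIRNAME_comp.pdf,
# skipping a DIRNAME_comp.pdf left over from an earlier run.
# (join_dir and the PDFTK override variable are made-up names.)
join_dir() {
    out="${PWD##*/}_comp.pdf"
    set --                                # start with an empty argument list
    for f in ./*.pdf; do
        [ -e "$f" ] || continue           # the glob matched nothing
        [ "$f" = "./$out" ] && continue   # skip the result of an earlier run
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no pdf files to join" >&2
        return 1
    fi
    "${PDFTK:-pdftk}" "$@" cat output "$out"
}
```

The files are joined in the order in which the shell sorts them, so number them (01.pdf, 02.pdf and so on) if the order matters.
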

Pierre-Louis Blanchard, 29.08.2021