Gregory Hildstrom




Introduction
Performance
Windows
Mac OS X
Linux

Introduction

I got a help call from my wife (a paralegal) asking about adding Bates numbering (aka Bates labeling or Bates stamping) to a PDF scan of a legal document. Bates numbering generally consists of a sequential 6+ digit number (the Bates number, Bates label, or Bates stamp) that is stamped in the corner of documents. Many law firms prefix the Bates number with a letter or number code that corresponds to a particular case or client. So, the first document in my case may be 20 pages and Bates numbered HILD0000001 through HILD0000020. The second document in my case may be 5 pages and Bates labeled HILD0000021 through HILD0000025. This allows fine-grained tracking of every single page related to a case.
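The numbering scheme in that example can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not code from the package: the helper names `bates_label` and `bates_range` are made up here, and the 7-digit width is taken from the HILD0000001 example above.

```shell
# Hypothetical helper: print a prefix followed by a zero-padded 7-digit number.
bates_label() {
    printf '%s%07d\n' "$1" "$2"
}

# Hypothetical helper: print the first and last label of one document.
# $1 = prefix, $2 = first Bates number, $3 = number of pages
bates_range() {
    first=$2
    last=$(( $2 + $3 - 1 ))
    printf '%s through %s\n' "$(bates_label "$1" "$first")" "$(bates_label "$1" "$last")"
}

bates_range HILD 1 20    # prints: HILD0000001 through HILD0000020
bates_range HILD 21 5    # prints: HILD0000021 through HILD0000025
```

The second document starts at 21 because the first one used up numbers 1 through 20; the next document in the same case would start at 26.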

Wow, Bates numbering software is expensive. During my quick search I came across several shareware programs with 7-day trial periods and $200+ price tags and many more in the $30-$50 price range. They may be full-featured, flexible, and convenient, but that was way out of our price range for a simple tool like this. Fortunately, I had done a bunch of PDF automation in the past with PDFtk (included), which is an excellent tool, so I knew there was probably a cheaper and easier way.

I created a tiny C program (free GPL open-source code included) to generate the overlay pdf file with the Bates numbers (aka Bates labels or Bates stamps). It prompts the user for label location, prefix, number format, and font size before generating the overlay pdf file. I adjusted the margins and page layout so the Bates label is about 0.25" from the bottom, which will be in the white margins of most documents and not on top of existing text. The label text has a gray background in case that page area is not white.

After the overlay/labeling pdf has been generated, the driver script uses PDFtk's multistamp operation to do the actual page-by-page overlay, which puts each Bates label page on top of the corresponding input pdf page and stores the results in the output pdf file.

I packaged everything in an archive file that does not need to be installed; it can simply be extracted and run from anywhere on your computer. I changed the name from bates-number-a-pdf to bates-label-a-pdf to differentiate it from the unrefined first version and to call attention to the fact that it goes beyond simple numbering with a prefix. It handles legal paper size, letter paper size, and probably many more. Also, I included many other helper scripts in the archives, so this should be all you need to assemble and Bates number large pdf files.
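For reference, the overlay step boils down to a single PDFtk call. The sketch below is not the driver script from the archive; the function name `stamp_labels` and the file names in the usage line are placeholders, and it assumes `pdftk` is on your PATH and that the overlay pdf has one label page per input page.

```shell
# Hypothetical wrapper around PDFtk's multistamp operation.
# $1 = input pdf, $2 = overlay pdf (one label page per input page), $3 = output pdf
stamp_labels() {
    if [ "$#" -ne 3 ]; then
        echo "usage: stamp_labels input.pdf overlay.pdf output.pdf" >&2
        return 2
    fi
    # multistamp puts page N of the overlay on top of page N of the input.
    pdftk "$1" multistamp "$2" output "$3"
}
```

You would call it as `stamp_labels scan.pdf overlay.pdf scan-labeled.pdf`. PDFtk's plain `stamp` operation would instead repeat a single overlay page on every input page, which is why multistamp is the one that fits sequential numbers.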

Performance

These timings were performed on a Fedora 20 workstation with 16GB RAM, a quad-core Intel Core i7 2.8GHz CPU, and a 512GB SSD. The times reported are the sum of "user" and "system", not "real", so time spent waiting for typed input at the prompts is not included. I used readily available pdf files for comparison purposes.
# Quote "$FILE" so that file names containing spaces are passed as one argument.
for FILE in test_files/*.pdf; do
    time ./bates-label-a-pdf.sh "$FILE"
done

pdf                                    pages      MB    seconds    pages per second
legal1.pdf (included)                      1    0.06      1.739               0.575
lessons_in_electric_circuits_ac.pdf      528    3.88      2.277             231.884
lessons_in_electric_circuits_dc.pdf      530    4.36      2.229             237.775
linux_device_drivers.pdf                 632   12.88      2.372             266.442
light and matter.pdf                    1020   83.02      2.989             341.251
light and matter x10.pdf               10200  830.11     28.278             360.704
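
The pages-per-second figures are just the page count divided by the summed user and system seconds. A small sketch of that calculation, in case you want to compare your own timings; the function name `pages_per_second` is made up for this example, and it assumes `awk` is available:

```shell
# Hypothetical helper: pages per second from a page count and total CPU seconds.
# $1 = page count, $2 = user + system seconds as reported by time
pages_per_second() {
    LC_ALL=C awk -v pages="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", pages / secs }'
}

pages_per_second 632 2.372     # linux_device_drivers.pdf, prints: 266.442
pages_per_second 1020 2.989    # light and matter.pdf, prints: 341.251
```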

Bates Label a PDF for Windows

bates-label-a-pdf-windows.zip (2013-08)

I tested on Windows 8 64-bit, Windows 7 64-bit, and Windows XP 32-bit.
After unzipping the .zip file, here are the usage steps:





Bates Label a PDF for Mac OS X

bates-label-a-pdf-mac.tar.bz2 (2013-09)

I tested on Mac OS X 10.7.5 Lion and 10.8.4 Mountain Lion, but it should work on other versions too.
After downloading the .tar.bz2 file, here are the usage steps:





Bates Label a PDF for Linux

bates-label-a-pdf-linux.tar.bz2 (2013-08)

I tested on Fedora 19, but it should work on other versions and distributions too.
After downloading the .tar.bz2 file, here are the usage steps: