18 February 2019

How to check an index that contains figure and table numbers

Whenever we typeset an index, we carry out routine checks such as verifying that entries are in alphabetical order and that page numbers are in numerical order. These checks are particularly useful when a book has been indexed by its author rather than by a professional indexer.

We’ve developed an indexcheck script to do this automatically. It is much faster and far more accurate than any manual process, so adding value in this way doesn’t carry any great overhead for us.
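As a rough illustration of these two routine checks, here is a minimal Python sketch. It is not indexcheck’s actual code. It assumes the index has already been split into a list of headwords and, for each entry, a list of plain page numbers. The letter-by-letter comparison rule and the function names are our own hypothetical choices.

```python
def headword_key(headword: str) -> str:
    # Assumed rule: letter-by-letter order, ignoring case, spaces and
    # punctuation. Some indexes use word-by-word order instead.
    return "".join(ch for ch in headword.casefold() if ch.isalnum())


def find_unsorted_headwords(headwords: list[str]) -> list[tuple[str, str]]:
    """Return each adjacent (previous, current) pair that is out of order."""
    return [
        (prev, curr)
        for prev, curr in zip(headwords, headwords[1:])
        if headword_key(prev) > headword_key(curr)
    ]


def find_unsorted_pages(pages: list[int]) -> list[tuple[int, int]]:
    """Return each adjacent pair of page numbers that goes backwards."""
    return [(a, b) for a, b in zip(pages, pages[1:]) if a > b]


if __name__ == "__main__":
    # Made-up headwords, purely for illustration.
    entries = ["abstracts", "accession numbers", "acronyms", "abbreviations"]
    for prev, curr in find_unsorted_headwords(entries):
        print(f"'{curr}' should come before '{prev}'")
    print(find_unsorted_pages([3, 17, 12, 40]))  # [(17, 12)]
```

Comparing only adjacent pairs is enough to catch every break in ordering, and it reports each problem at the point where it occurs.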

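Although it’s quite usual for an index to set page numbers in bold and/or italics to indicate figures and/or tables, we recently had an index that cited the figure or table number itself, rather than the number of the page on which it appears. Bold numbers referred to figures and italic numbers to tables, so a single entry might carry the locators 26–30, 2.3, 2.4, 2.5 and 2.6, where 2.3 to 2.5 (in bold) are figure numbers and 2.6 (in italics) is a table number.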

The publisher was happy to index tables/figures in this way, arguing that it was easy to locate them given the book’s chapter numbering system.

Obviously, our indexcheck script treated those figure and table numbers as though they were page numbers, and so wanted to sort them like this:

  Source: 26–30, 2.3b, 2.4b, 2.5b, 2.6i
  Sorted: 2.3b, 2.4b, 2.5b, 2.6i, 26–30

(Adding “b” and “i” suffixes for bold and italic is an easy way to ensure figures and tables sort after any reference to main text on the same page.)
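To show why the figure and table numbers jump ahead of page 26, here is a minimal sketch of a sort key that handles the “b” and “i” suffixes. It is a hypothetical Python illustration and not indexcheck itself. The suffix ranking, with plain references first, then bold, then italic, is our assumption. The key reads each locator’s leading number as a plain number, which is exactly the naive behaviour that puts 2.3 before 26.

```python
import re

# Assumed ranking within one page: plain text reference, then bold
# (figure), then italic (table).
SUFFIX_RANK = {"": 0, "b": 1, "i": 2}

# A locator is a number, an optional range ("26–30"), then an optional suffix.
LOCATOR_RE = re.compile(r"^(\d+(?:\.\d+)?)(?:[–-]\d+)?([bi]?)$")


def locator_key(locator: str) -> tuple[float, int]:
    """Sort key: the starting number, then the suffix rank."""
    match = LOCATOR_RE.match(locator.strip())
    if match is None:
        raise ValueError(f"unrecognised locator: {locator!r}")
    number, suffix = match.groups()
    # Reading the number as a float is deliberately naive: a figure
    # number such as 2.3 is treated as though it were page 2.3.
    return float(number), SUFFIX_RANK[suffix]


def sort_locators(locators: list[str]) -> list[str]:
    return sorted(locators, key=locator_key)


if __name__ == "__main__":
    print(", ".join(sort_locators(["27i", "27", "27b", "12"])))
    # 12, 27, 27b, 27i
    print(", ".join(sort_locators(["26–30", "2.3b", "2.4b", "2.5b", "2.6i"])))
    # 2.3b, 2.4b, 2.5b, 2.6i, 26–30
```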

With nearly 200 figures and tables in the book, manual reconciliation of page numbers clearly wasn’t an option, but nor were we prepared to skip this part of the checking process. How could we accommodate this referencing style in order to verify the alphanumeric integrity of the index as a whole?

The answer lay in implementing a supplementary routine for indexcheck.

First, we used InDesign’s table of contents feature to generate a list of figures and tables based on the “Caption” paragraph style, putting the page number first, followed by a tab.

This produced a listing of every caption with its page number, from which it was simple to grep out the unwanted content to create a list in the form of page number followed by figure or table number, separated by a tab.
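The same filtering could also be done in a few lines of Python. This is a minimal sketch that assumes each exported line holds a page number, a tab, and then caption text beginning “Figure n.n” or “Table n.n”. That line format, the function name page_and_label and the placeholder caption text are all hypothetical.

```python
import re

# Assumed shape of each exported line: page number, a tab, then the
# caption text, which begins "Figure n.n" or "Table n.n".
CAPTION_RE = re.compile(r"^(\d+)\t((?:Figure|Table) \d+\.\d+)")


def page_and_label(toc_lines: list[str]) -> list[str]:
    """Keep 'page<TAB>label' for caption lines; drop everything else."""
    kept = []
    for line in toc_lines:
        match = CAPTION_RE.match(line)
        if match:
            kept.append(f"{match.group(1)}\t{match.group(2)}")
    return kept


if __name__ == "__main__":
    # Placeholder caption text; the last line stands in for a TOC heading.
    sample = [
        "2\tFigure 1.1 First caption",
        "15\tFigure 1.10 Another caption",
        "18\tTable 2.1 A table caption",
        "Figures and tables",
    ]
    for line in page_and_label(sample):
        print(line)
```

Because the number pattern is greedy, “Figure 1.10” is captured whole rather than being cut short at “1.1”.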

A few lines of code transformed that list into an associative array:

"1.1b" =>  "2←Figure 1.1"
"1.2b" =>  "5←Figure 1.2"
"1.3b" =>  "7←Figure 1.3"
"1.4b" =>  "12←Figure 1.4"
"1.5b" =>  "13←Figure 1.5"
"1.6b" =>  "13←Figure 1.6"
"1.7b" =>  "14←Figure 1.7"
"1.8b" =>  "14←Figure 1.8"
"1.9b" =>  "15←Figure 1.9"
"1.10b" => "15←Figure 1.10"
"2.1i" =>  "18←Table 2.1"
"2.2i" =>  "19←Table 2.2"

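which could then be used as a supplementary search and replace, converting each figure or table number in the index into its page number (followed by a label) before the numbers were sorted.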
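Here is a minimal Python sketch of that lookup and the replacement step. It is illustrative only, and the function names are hypothetical. It assumes the tab-separated “page, label” lines as input and uses the index’s own convention, in which “b” (bold) marks figures and “i” (italic) marks tables. The sample pages for Figures 2.3 to 2.5 and Table 2.6 are the ones shown in the indexcheck output in this post.

```python
# Assumed convention (as in the index): bold marks figures, italic tables.
SUFFIX_FOR = {"Figure": "b", "Table": "i"}


def build_lookup(lines: list[str]) -> dict[str, str]:
    """Map index-style locators such as '2.1i' to strings like '18←Table 2.1'."""
    lookup = {}
    for line in lines:
        page, label = line.split("\t")
        kind, number = label.split(" ")
        lookup[number + SUFFIX_FOR[kind]] = f"{page}←{label}"
    return lookup


def replace_locators(locators: list[str], lookup: dict[str, str]) -> list[str]:
    """Swap figure/table numbers for their page-prefixed form; keep page numbers."""
    return [lookup.get(loc, loc) for loc in locators]


if __name__ == "__main__":
    lookup = build_lookup(
        ["27\tFigure 2.3", "28\tFigure 2.4", "28\tFigure 2.5", "27\tTable 2.6"]
    )
    source = ["26–30", "2.3b", "2.4b", "2.5b", "2.6i"]
    print(", ".join(replace_locators(source, lookup)))
    # 26–30, 27←Figure 2.3, 28←Figure 2.4, 28←Figure 2.5, 27←Table 2.6
```

Ordinary page references such as 26–30 are not keys in the lookup, so `dict.get` passes them through unchanged.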

As a result, our initial indexcheck output of:

  Source: 26–30, 2.3b, 2.4b, 2.5b, 2.6i
  Sorted: 2.3b, 2.4b, 2.5b, 2.6i, 26–30

now became:

  Source: 26–30, 27←Figure 2.3, 28←Figure 2.4, 28←Figure 2.5, 27←Table 2.6
  Sorted: 26–30, 27←Figure 2.3, 27←Table 2.6, 28←Figure 2.4, 28←Figure 2.5
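Once every locator starts with a real page number, sorting and comparing becomes straightforward. The following is a minimal, hypothetical Python sketch of that comparison and not indexcheck’s code. It assumes that, on a single page, a text reference comes first, then figures, then tables. It also compares figure and table numbers numerically, so that 2.10 would sort after 2.9.

```python
import re

# 'page' or 'page–page', optionally followed by '←Figure n.n' / '←Table n.n'.
ANNOTATED_RE = re.compile(r"^(\d+)(?:[–-]\d+)?(?:←(Figure|Table) (\d+)\.(\d+))?$")

# Assumed order on one page: text reference, then figures, then tables.
KIND_RANK = {None: 0, "Figure": 1, "Table": 2}


def page_key(locator: str) -> tuple[int, int, int, int]:
    match = ANNOTATED_RE.match(locator)
    if match is None:
        raise ValueError(f"unrecognised locator: {locator!r}")
    page, kind, chapter, number = match.groups()
    # Compare chapter and item numerically so that 2.10 sorts after 2.9.
    return int(page), KIND_RANK[kind], int(chapter or 0), int(number or 0)


def check_entry(source: list[str]) -> bool:
    """Print source and sorted locators; return True if already in order."""
    ordered = sorted(source, key=page_key)
    print("Source:", ", ".join(source))
    print("Sorted:", ", ".join(ordered))
    return ordered == source


if __name__ == "__main__":
    entry = ["26–30", "27←Figure 2.3", "28←Figure 2.4", "28←Figure 2.5", "27←Table 2.6"]
    print(check_entry(entry))  # False: Table 2.6 is out of place
```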

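which made it easy to see that the reference to Table 2.6 (the italicised “2.6” in our source material) was in the wrong place. Table 2.6 appears on page 27, so it belongs before Figures 2.4 and 2.5 on page 28, and the correct listing should instead be 26–30, 2.3, 2.6, 2.4, 2.5.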
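To produce the corrected sequence, the annotated locators can be mapped back into the index’s own notation. Here is a minimal Python sketch that reverses the lookup. The function names are hypothetical, and it assumes the same “b” for figures and “i” for tables convention.

```python
import re

# Reverse of the lookup: '27←Table 2.6' becomes '2.6i', '27←Figure 2.3' becomes '2.3b'.
ANNOTATED_RE = re.compile(r"^\d+←(Figure|Table) (\d+\.\d+)$")
SUFFIX_FOR = {"Figure": "b", "Table": "i"}


def to_index_notation(locator: str) -> str:
    match = ANNOTATED_RE.match(locator)
    if match is None:
        return locator  # plain page reference, left unchanged
    kind, number = match.groups()
    return number + SUFFIX_FOR[kind]


def corrected_listing(sorted_locators: list[str]) -> str:
    return ", ".join(to_index_notation(loc) for loc in sorted_locators)


if __name__ == "__main__":
    print(corrected_listing(
        ["26–30", "27←Figure 2.3", "27←Table 2.6", "28←Figure 2.4", "28←Figure 2.5"]
    ))
    # 26–30, 2.3b, 2.6i, 2.4b, 2.5b
```

When the correction is keyed into the index, the “b” and “i” suffixes would become bold and italic formatting again.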

In all, we found 32 figure and table references that were out of order, so the checking was well worth the effort.

Not only did this fun coding project enable us to maintain our standards and add value for the publisher, but the workflow can also be redeployed on any future index that uses this referencing style.