Rhetorical Device

Computer Literacy: Part 4

Computer Literacy: Part 4 is an essay by Jack Rusher, published here Wednesday, September 27, 2006. It is part of Science For Humanists.

A humanistic guide to our digital laborers, part 4. See also parts one, two and three.

In the second part of this series we explored the binary search algorithm as a method for quickly finding a card in a drawer. If you’ve forgotten it, go have a look and then come back. Don’t worry, we’ll wait for you.

1. The reasons for this awful design have to do with the physical structure of computer memory. This is one of those cases where we must overcome an unfortunate but unavoidable limitation to make things work.

The aforementioned solution is a lovely thing, but it has one major drawback: it requires that the drawer be in order before the search begins. This doesn’t sound so bad, but it’s actually quite difficult to keep the drawers in order because the numbered slots in which the cards are stored — unlike in a typical filing cabinet — are inflexible and immovable1. That is to say that when we place a new card in the middle of a drawer, we cannot squeeze it between two others; instead, we must move all the cards higher than it backward by one slot, one at a time.

Pierre’s cousin, François.

Helmut’s strategy for dealing with this problem was to break up the cards into subsets kept in different drawers, one for A-L and one for M-Z. This reduced the difficulty of re-ordering them by reducing the number of cards that had to be handled while adding a new one. It worked well enough while there were about as many cards in each range, but it went badly when a large number of refugees from Alphastan, all of them with A- names, moved to Pierre’s district.

Around the Clock Filing

2. Modular arithmetic was rigorously codified by Carl Friedrich Gauss in his 1801 book Disquisitiones Arithmeticae, but it’s actually quite old.

In order to explain the technique Adalina used to overcome this difficulty, I must first explain modulus arithmetic. There’s no need for alarm. We all use modulus arithmetic2, whether or not we know that’s what it’s called. It’s the kind of “wrap-around” arithmetic we use to figure out what time to drop round for tea.

3. Tea time! Did you bring any cakes?

4. This is also how we figure out the “remainder” after we divide one number by another. Division can be performed by counting up to the dividend around a special clock that’s numbered from zero to the divisor (i.e. 500/9 can be computed by counting around a nine-digit clock face 500 times. The quotient is the number of times around the clock, the remainder is the last number reached on the clock’s face).

As an example of “clock modulus,” on an old twelve-hour clock it will be four o’clock in the afternoon3 nine hours after it’s seven o’clock in the morning. We know this because we understand that there are only twelve numbers on the clock’s face, and that we should start again at the beginning after we’ve “wrapped around” the end4.

5. The geekiest among you are muttering something about Georg Cantor right now. I can hear you.

A mathematician would call the numbers on the face of the clock a “finite set5,” which is what we’re going to do from here on out.

The reason we need to be clear about this before going on is that just about all the techniques used to optimize the storage and retrieval of information are really just clever metaphors for modulus arithmetic.

Making a Hash of the Lot

6. This, surprisingly, has nothing at all to do with a hash pipe.

One of the best algorithms for keeping data evenly distributed between a set of drawers is called a hash table6. The purpose of a hash table is to create a fairly small number of drawers into which we can evenly distribute a much larger number of cards without stopping to look in the drawers to figure out which ones are more or less full. This is kind of tricky, so we do what engineers have always done under such circumstances: we cheat.

A hash table uses an algorithm called a hash function to map each possible card to a particular drawer, and then relies on probability to ensure that the actual cards that get filed are well distributed by that function. Hash functions can be complicated, but we’re going to demonstrate the idea with a couple of very simple ones.

7. Remember: the alphabet is a set of 26 letters.

The first time Adalina used a hash table, she instructed the carpentry staff to build seven drawers for Pierre, then had him file each card based on its first letter7 using the above-mentioned clock trick to calculate the modulus of that letter’s position in the alphabet against the number of drawers. This allowed Pierre to quickly choose a drawer when a new card came in, and mostly guaranteed that the cards would end up spread evenly between the drawers. If this sounds confusing just have a look at the following figure:

I hear you saying, “Gosh, that’s pretty cool, but a sudden influx of Alphastanis would still cause havoc!” You’re right, of course, which is why actual hash functions are somewhat more complicated than what’s described above. The simplest trick used to improve this kind of algorithm is a variation on an ancient arithmetic toy of mystics and astrologers that has been repurposed by the Abstractionists.

Like Numerology, Only Useful

8. The Romans started with the Greek system, which is why the word “calculate” is based on the Latin word for small stones, “calculi.”

9. From now on “applying the clock trick” will be called “calculating the modulus” (which sounds a bit like a euphemism for masturbation) or abbreviated to “something modulo some number,” where “some number” stands for how many numerals are on the clock’s face.

10. (iso, "equal," plus psephos, "pebble.")

11. From the Greek αριθμομαντεια, literally “divination by numbers,” which would be a great name for a prog rock record.

12. The Hebrew term for this, Gematria, is a deformation of the Greek term geometry.

The Greeks learned and practiced arithmetic with small stones called “psephos8.” Among the things they did with these pebbles was add a bunch of them together and then use the clock trick9 (before there were clocks; they were clever, these Greeks) to figure the modulus of the sum against some other number. They called this process isopsephy10. It’s also the heart of an old occult practice that is now called numerology or arithmancy11,12.

13. This sort of thing has been going on since the earliest forms of writing were invented. Many ancient writing systems had a numerological advantage because the symbols of the alphabet were also used as numbers. In the old Phoenician system, for instance, the letter aleph (which is the ancestor of the letter “A”) was also the number one.

For those you who’ve never seen this kind of thing, it involves assigning a number from, say, 0-9 to each letter in the alphabet using modulus arithmetic, then adding together all the numbers for all the letters in the name, also modulo 9. The result is a single number that is thought to be symbolic of something or other13.

14. I encourage you to play around with this idea on a piece of paper to get a sense of how it works in practice.

Adalina’s way of dealing with the Alphastani exodus was to use a something very similar to numerology. Pierre decides which of the seven drawers should receive which card by assigning a number to each letter of the name on the card based on that letter’s position in the alphabet, adding all those numbers together, and then calculating the modulus of the resulting number against the number of drawers. This result is a surprisingly good distribution between the drawers, even if every name starts with an A14:

Al’Bob
A (1) + L (12) + B (2) + O (15) + B (2) = 32
32 modulo 7 = drawer 4

Al’Joe
A (1) + L (12) + J (10) + O (15) + E (5) = 43
43 modulo 7 = drawer 1

Al’Tom
A (1) + L (12) + T (20) + O (15) + M (13) = 61
61 modulo 7 = drawer 5

There are more — and way more complicated — hash functions, but they all come down to variations on this theme.

In the next installment we’ll take apart another common algorithm, the binary tree.