The Code Book: The Secret History of Codes and Code-breaking. Simon Singh
in the plaintext either of the two symbols could be chosen, and by the end of the encipherment each symbol would also constitute roughly 1 per cent of the enciphered text. This process of allotting varying numbers of symbols to act as substitutes for each letter continues throughout the alphabet, until we get to z, which is so rare that it has only one symbol to act as a substitute. In the example given in Table 5, the substitutes in the cipher alphabet happen to be two-digit numbers, and there are between one and twelve substitutes for each letter in the plain alphabet, depending on each letter’s relative abundance.
Table 5 An example of a homophonic substitution cipher. The top row represents the plain alphabet, while the numbers below represent the cipher alphabet, with several options for frequently occurring letters.
We can think of all the two-digit numbers that correspond to the plaintext letter a as effectively representing the same sound in the ciphertext, namely the sound of the letter a. Hence the origin of the term homophonic substitution, homos meaning ‘same’ and phonos meaning ‘sound’ in Greek. The point of offering several substitution options for popular letters is to balance out the frequencies of symbols in the ciphertext. If we enciphered a message using the cipher alphabet in Table 5, then every number would constitute roughly 1 per cent of the entire text. If no symbol appears more frequently than any other, then this would appear to defy any potential attack via frequency analysis. Perfect security? Not quite.
The ciphertext still contains many subtle clues for the clever cryptanalyst. As we saw in Chapter 1, each letter in the English language has its own personality, defined according to its relationship with all the other letters, and these traits can still be discerned even if the encryption is by homophonic substitution. In English, the most extreme example of a letter with a distinct personality is the letter q, which is only followed by one letter, namely u. If we were attempting to decipher a ciphertext, we might begin by noting that q is a rare letter, and is therefore likely to be represented by just one symbol, and we know that u, which accounts for roughly 3 per cent of all letters, is probably represented by three symbols. So, if we find a symbol in the ciphertext that is only ever followed by three particular symbols, then it would be sensible to assume that the first symbol represents q and the other three symbols represent u. Other letters are harder to spot, but are also betrayed by their relationships to one another. Although the homophonic cipher is breakable, it is much more secure than a straightforward monoalphabetic cipher.
A homophonic cipher might seem similar to a polyalphabetic cipher inasmuch as each plaintext letter can be enciphered in many ways, but there is a crucial difference, and the homophonic cipher is in fact a type of monoalphabetic cipher. In the table of homophones shown above, the letter a can be represented by eight numbers. Significantly, these eight numbers represent only the letter a. In other words, a plaintext letter can be represented by several symbols, but each symbol can only represent one letter. In a polyalphabetic cipher, a plaintext letter will also be represented by different symbols, but, even more confusingly, these symbols will represent different letters during the course of an encipherment.
Perhaps the fundamental reason why the homophonic cipher is considered monoalphabetic is that once the cipher alphabet has been established, it remains constant throughout the process of encryption. The fact that the cipher alphabet contains several options for encrypting each letter is irrelevant. However, a cryptographer who is using a polyalphabetic cipher must continually switch between distinctly different cipher alphabets during the process of encryption.
By tweaking the basic monoalphabetic cipher in various ways, such as adding homophones, it became possible to encrypt messages securely, without having to resort to the complexities of the polyalphabetic cipher. One of the strongest examples of an enhanced monoalphabetic cipher was the Great Cipher of Louis XIV. The Great Cipher was used to encrypt the king’s most secret messages, protecting details of his plans, plots and political schemings. One of these messages mentioned one of the most enigmatic characters in French history, the Man in the Iron Mask, but the strength of the Great Cipher meant that the message and its remarkable contents would remain undeciphered and unread for two centuries.
The Great Cipher was invented by the father-and-son team of Antoine and Bonaventure Rossignol. Antoine had first come to prominence in 1626 when he was given a coded letter captured from a messenger leaving the besieged city of Réalmont. Before the end of the day he had deciphered the letter, revealing that the Huguenot army which held the city was on the verge of collapse. The French, who had previously been unaware of the Huguenots’ desperate plight, returned the letter accompanied by a decipherment. The Huguenots, who now knew that their enemy would not back down, promptly surrendered. The decipherment had resulted in a painless French victory.
The power of codebreaking became obvious, and the Rossignols were appointed to senior positions in the court. After serving Louis XIII, they then acted as cryptanalysts for Louis XIV, who was so impressed that he moved their offices next to his own apartments so that Rossignol père et fils could play a central role in shaping French diplomatic policy. One of the greatest tributes to their abilities is that the word rossignol became French slang for a device that picks locks, a reflection of their ability to unlock ciphers.
The Rossignols’ prowess at cracking ciphers gave them an insight into how to create a stronger form of encryption, and they invented the so-called Great Cipher. The Great Cipher was so secure that it defied the efforts of all enemy cryptanalysts attempting to steal French secrets. Unfortunately, after the death of both father and son, the Great Cipher fell into disuse and its exact details were rapidly lost, which meant that enciphered papers in the French archives could no longer be read. The Great Cipher was so strong that it even defied the efforts of subsequent generations of codebreakers.
Historians knew that the papers encrypted by the Great Cipher would offer a unique insight into the intrigues of seventeenth-century France, but even by the end of the nineteenth century they were still unable to decipher them. Then, in 1890, Victor Gendron, a military historian researching the campaigns of Louis XIV, unearthed a new series of letters enciphered with the Great Cipher. Unable to make sense of them, he passed them on to Commandant Étienne Bazeries, a distinguished expert in the French Army’s Cryptographic Department. Bazeries viewed the letters as the ultimate challenge, and he spent the next three years of his life attempting to decipher them.
The encrypted pages contained thousands of numbers, but only 587 different ones. It was clear that the Great Cipher was more complicated than a straightforward substitution cipher, because this would require just 26 different numbers, one for each letter. Initially, Bazeries thought that the surplus of numbers represented homophones, and that several numbers represented the same letter. Exploring this avenue took months of painstaking effort, all to no avail. The Great Cipher was not a homophonic cipher.
Next, he hit upon the idea that each number might represent a pair of letters, or a digraph. There are only 26 individual letters, but there are 676 possible pairs of letters, and this is roughly equal to the variety of numbers in the ciphertexts. Bazeries attempted a decipherment by looking for the most frequent numbers in the ciphertexts (22, 42, 124, 125 and 341), assuming that these probably stood for the commonest French digraphs (es, en, ou, de, nt). In effect, he was applying frequency analysis at the level of pairs of letters. Unfortunately, again after months of work, this theory also failed to yield any meaningful decipherments.
Bazeries must have been on the point of abandoning his obsession, when a new line of attack occurred to him. Perhaps the digraph idea was not so far from the truth. He began to consider the possibility that each number represented not a pair of letters, but rather a whole syllable. He attempted to match each number to a syllable, the most frequently occurring numbers presumably representing the commonest French syllables. He tried various tentative permutations, but they all resulted in gibberish – until he succeeded in identifying one particular word. A cluster of numbers (124-22-125-46-345) appeared several times on each page, and Bazeries postulated that they represented les-en-ne-mi-s, that is, ‘les ennemis’. This proved to be a crucial breakthrough.
Bazeries was then able to continue by examining other parts of the ciphertexts where these numbers appeared within different words. He then inserted