Note: Unlike the previous steps, this and subsequent steps are speculative and incomplete. I provide motivation and observations that indicate some methods of attack that are consistent with more clues and features of Kryptos. I think that these concepts are of sufficient promise to warrant presentation, in the hopes that others can help me test them. (I have precious little time for Kryptos these days.)
Motivation:
Step 9 (which was a Quagmire III using keywords “LUCID” and “MEMORY”) resulted in the following set of 48 characters:
“EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN”.
As I described in earlier steps, this set of characters is distinctly structured, with the letters (A, E, P, and X) representing 25 out of 48 of the entries. (The top four peaks of such a histogram sum, on average, to about 16.1, with a standard deviation of 1.8, if the underlying distribution is random.) Along with nine nulls in the histogram, I have shown the odds against this occurring at random to be about 138,000:1. Furthermore, it’s not like only one or two of these peaks were truly dominant and that the others were included just because they lead to a convenient anagram. Indeed, these four peaks are equally dominant, with one exceeding the other three by only one count and each putting the rest of the alphabet to shame. The occurrence becomes even more curious when you consider that those four letters anagram to the word “APEX”. The potential significance is embodied in this quote:
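The counts quoted above are easy to re-derive. The following is a minimal sketch, not the procedure I originally used: it tallies the histogram of the 48 characters and, as an assumption about what "random" means here, offers a small Monte Carlo helper that draws 48 letters uniformly from the 26-letter alphabet. The function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
import random
import string

APEX = "EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN"

def top_four_and_nulls(text):
    """Return the four most frequent letters and the letters that never occur."""
    counts = Counter(text)
    top4 = counts.most_common(4)
    nulls = [c for c in string.ascii_uppercase if c not in counts]
    return top4, nulls

def simulate_top_four_sum(length=48, trials=20000, seed=0):
    """Estimate mean and standard deviation of the top-four sum.

    Assumption: letters are drawn independently and uniformly from A-Z.
    """
    rng = random.Random(seed)
    sums = []
    for _ in range(trials):
        sample = rng.choices(string.ascii_uppercase, k=length)
        sums.append(sum(n for _, n in Counter(sample).most_common(4)))
    mean = sum(sums) / trials
    variance = sum((s - mean) ** 2 for s in sums) / trials
    return mean, variance ** 0.5

top4, nulls = top_four_and_nulls(APEX)
print(sorted(top4))                      # A, E and X occur 6 times each, P occurs 7 times
print(sum(n for _, n in top4), "of", len(APEX))
print(len(nulls), "unused letters:", "".join(nulls))
```

Calling `simulate_top_four_sum()` gives an estimate that can be compared with the figures of 16.1 and 1.8 quoted above. A different null model (for example, English letter frequencies) would give different numbers.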
• “The final part is obviously the, you know, the apex of the pyramid there.”
-Jim Sanborn, excerpt from interview in Wired Magazine
But there is a problem. One thing that troubles me about this result is the implied shortness of K4. It seems hard to believe that anything of value could be conveyed in such a brief snippet of plain text, especially if it is supposed to lead to a broader mystery, as indicated by Mr. Sanborn. And the following quote is even more problematic:
• “The final section that hasn’t been decoded yet is approximately one hundred characters. If I say it’s any more specific than that, then I’m giving you guys a clue, but I’m not gonna do that. So it’s approximately a hundred characters.”
-Jim Sanborn, excerpt from Hirshhorn Transcript, September 23, 2005
In the previous steps, I have gone to great lengths to ensure that my processes are compatible with clues and quotes provided by Sanborn and Scheidt. So it would hardly seem appropriate for me, at this point, to suddenly decide that this troublesome quote can be ignored. It cannot. And yet, the statistics I have computed above are simply too hard to ignore. Quite the dilemma.
So the question now is “what method could Sanborn have used to simultaneously compress about one hundred characters of plain text into forty-eight characters of cipher text, and how could that method also have resulted in such a strong signature of frequencies?” This seems to be a contradiction, which most people hate. It just so happens that I love contradictions, because they provide extra clues and/or constraints on the possible solutions, provided, of course, that you are confident in the contradictory data to begin with. I am quite confident in the results of my steps 1-9 on this site. I understand that some researchers aren’t, so they would be inclined to resolve my contradiction by simply dismissing my findings to date and saying, “look... no contradiction”. That is fine. For my own self, I will press on.
In a previous occupation, I was a member of a team of problem solvers. The subject of our study was an incredibly complex government information system that included massively parallel processing across 768 processors and 2304 custom ASIC boards, along with millions of lines of C++, FORTRAN 90, SQL, and microcode, all integrated together. There was also a physical sensor involved in data collection, complete with some very complicated physics that supported the end-to-end system. At the time, the input to this information system was synthetic data generated by simulation, because the sensor that would eventually provide the data was not available yet. So potential problems could occur anywhere in the hardware or software of the information system, the input data simulator, or even in the quality assurance software that was used to test results. During the course of our bug hunting, it was not uncommon to find ourselves faced with facts that seemed to contradict one another. Whenever this happened, it drove my team nuts, because they feared that this meant that not all of the “facts” were indeed facts, and, paralyzed by uncertainty, they lost confidence in their ability to proceed. After a while, however, I came to love such scenarios, because it usually meant that we were about to solve the problem. It turned out that when the “facts” seemed to be in contradiction with one another, there were often some unconscious assumptions that contributed to the contradiction. For instance, we may have assumed simple causality in a massively parallel system, not accounting for the presence of race conditions and/or non-thread-safe code. (This is a contrived example only, because the actual details of the system are too complex to go into here. So please don’t be too harsh in nitpicking the example.)
The team soon added a new tool to their arsenal: when the facts at hand are deemed strong on their own but lead to a contradiction when taken together, try to make a list of all of the unwritten assumptions that may be at play. Then question those assumptions one by one, removing any that is not solid. The solution usually came swiftly after that.
I’m going to employ that approach now. Since I have confidence in the findings of steps 1-9, I want to explore why this additional information (that there will ultimately be approximately double the number of characters in the K4 plain text) indicates a contradiction. Well, I’m fairly sure that I don’t have to convince many of you to be concerned about this, but it wouldn’t hurt to consider the reasons why. Suppose for the moment that we have selected a message containing 100 characters, and we wish to encode it. Most of the known substitution methods (e.g. poly-alphabetic, Playfair, two-square, and a host of others) result in a significantly decreased self-I.C. and generally won’t yield a sequence that is nearly as structured as the APEX sequence. Amongst the classical substitution ciphers, only one manages to result in a high self-I.C., and that would be the mono-alphabetic substitution, which leaves the histogram unscathed except for relocation of the peaks and nulls. (Think about the cryptograms in the Sunday paper, except without spaces.) Unfortunately, mono-alphabetic substitution is one-in-one-out, so it would result in 100 characters of cipher text. Since the APEX sequence is much shorter, mono-alphabetic substitution cannot be the solution (at least not all by itself). We would still be in need of a compression technique (100 characters down to 48) that doesn’t generally result in a somewhat randomized histogram... and nothing comes to mind. Thus the contradiction and the reason for concern.
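To make the point about mono-alphabetic substitution concrete, here is a minimal sketch that computes the index of coincidence (the self-I.C.) of the APEX sequence and confirms that an arbitrary mono-alphabetic substitution leaves it unchanged. The randomly shuffled key is purely an illustrative assumption, as are the function names. It is not a proposed key for Kryptos.

```python
from collections import Counter
import random
import string

APEX = "EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN"

def index_of_coincidence(text):
    """Probability that two distinct positions in text hold the same letter."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def monoalphabetic(text, seed=1):
    """Apply an arbitrary (randomly shuffled) one-to-one letter substitution."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    table = str.maketrans(string.ascii_uppercase, "".join(shuffled))
    return text.translate(table)

ic_before = index_of_coincidence(APEX)
ic_after = index_of_coincidence(monoalphabetic(APEX))
print(round(ic_before, 4), round(ic_after, 4))  # both 0.0709
```

For comparison, uniformly random letters give an expected value of 1/26 (about 0.0385), while ordinary English prose is usually quoted at roughly 0.066 to 0.067. The substitution only relabels the histogram bins, which is why the two printed values agree.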
So what is the way out of this dilemma? What unspoken assumption are we making here that drives us to the appearance of conflict? Look back up at the previous paragraph, and the answer is there. I used the word “generally” twice: (1) when describing the type of self-I.C. that emerges from a poly-alphabetic solution, and (2) when describing the type of histogram that emerges from a compression technique. Certainly it is possible that either of those two observations could fail in this circumstance. In that case, we basically got lucky, and the unlikely occurred. But the devil’s advocate answer to that suggestion comes to mind: as long as we are entertaining the notion that random chance could make these hypothetical methods provide an unexpectedly high self-I.C., then so could my steps 1-9. In other words, it’s just as likely that my steps 1-9 obtained a lucky circumstance as it is that a poly-alphabetic data compression scheme achieved the result. How can I possibly have confidence in my findings even while I assume a contrary principle in order to resolve the dilemma? I can’t. At least not in good conscience.
So what other assumption are we making here? This time, the assumption appears in the devil’s advocate answer. Who says that only random chance provided unexpected results from the encryption technique? What if either the method of encryption or the message itself was contrived by its creator to simultaneously reduce the character count and indicate the word “APEX” in the resulting histogram?
What? Some of you don’t like the direction I’m heading? It tramples on one of your premises, i.e. that encryption techniques are chosen independently from the message and therefore Sanborn wouldn’t have done that? You are assuming that Sanborn had something to say, chose the best words possible first, and then only later considered how to encrypt them? (Typically, that is the way that encryption systems are used. One should not be so constrained that they must tailor a message to match the encryption.) Well all I can say is “bingo”. I have found an assumption that is contributing to the dilemma, and yet nobody can say with any certainty that this assumption is valid. Harken back to “Step 0: Premises” where I discuss the dangers of worshipping a premise. As a matter of fact, from the following quotes it is apparent that the plain text messages of K1-K3 were indeed contrived for this puzzle that is Kryptos:
• “I will say that I have left instructions in the earlier text that refer to later text.”
- Jim Sanborn, excerpt from interview in Wired Magazine
• “When I asked about the misspellings and asked if they were accidental or deliberate, Sanborn said that they were deliberate, but it was less important *what* they were. He said, and I quote: "it's more the orientation of those letters that's useful there." Later on in the evening he repeated that point, saying it was the "positioning" that was important.”
- Elonka Dunin, post to Kryptos Group
• “Mr. Scheidt basically gave me an outline of historic and contemporary ... encoding systems that have been formally used by the agency and were still used by the agency and other people [in 1990]. He gave me a whole variety of possible systems to use and ways to modify all of those systems. But as a visual artist, I like to rely on systems that include visual as well as digital material that can be deciphered by machines. It's also well-known that I did use some matrix codes Ed gave me, and I have also designed visual systems for encoding, which are much harder for cryptographers to crack because they're individualistic.”
-Jim Sanborn, excerpt from interview in Wired Magazine
• “How do you deliver a message in such a way that conveys not only the encrypted data, but also the key?”
-Ed Scheidt
The last quote in particular suggests that Kryptos has managed to deliver the key(s) along with the encryption. This doesn’t generally happen by chance in classical encryption techniques; it must be contrived to do so. Furthermore, in the next quote, Scheidt refers to Kryptos as a “puzzle” rather than a code:
• “There are many layers. A lot of thought went into this. There is more to the puzzle than what's been talked about.”
-Ed Scheidt (from Elonka Dunin’s 2003 road trip)
This is a puzzle, dear Reader, not a code. A lot of thought went into it, so we need to apply a lot of thought to get out of it. And there are many layers. Just based on this solitary quote from Ed Scheidt, I believe that anyone who is engaging in “trial and error” tactics, i.e. blindly trying various decryption techniques and keys without strong coupling to the clues, is completely wasting their time. (You may feel the same about my specific path, but I assert that my general mode of attack - performing thought experiments and following them through - will be necessary for solution.) Furthermore, if we indeed have some confidence in my steps 1-9, then we already have ample evidence of contrived methods and messages for the sake of this “puzzle”. So for the time being, I’m going to discard the assumption that the encryption is pure (i.e. independent from the message) and see what other avenues open up...
Note: I am about to present you with two ideas. My favorite of the two is the second idea, because the first idea has fewer clues that resonate with it. The first idea also offers less guidance on how to proceed. Nevertheless, it can’t be eliminated, so I present it here in case others find it promising and feel like checking it out.
Idea 1: Mono-alphabetic Substitution and Acronyms
As already observed, a mono-alphabetic substitution is a good candidate for a technique that might result in distinct frequencies such as those found in my APEX sequence. Unfortunately, it would also result in about 100 characters of cipher text. Also, one could point out that the APEX sequence may be a bit too regular for English, although with only 48 characters, variance in plain text is certainly sufficient to cause that. However, I had an idea that would explain both: the original message of about 100 characters might have been contrived to contain several phrases that were equivalent to common acronyms. The phrases could then be replaced by the associated acronyms and the encryption subsequently applied. A possibility that I entertained earlier was that these acronyms were Morse Code pro-signs or other common Morse acronyms. (Perhaps this is the reason for the previously unreconciled clues “RQ” and “SOS”.) This idea resonates well with the words “the information was gathered and transmitted underground (unsic)...”. The idea also resonates well with the ambiguity in the answer to the question regarding whether K-4 is plain English, posed numerous times to the creators. We have received contradictory responses from Sanborn and/or Scheidt on different occasions. Sanborn has also supposedly asserted that at some point K-4 is not quite plain English and that one more level of deciphering must be employed (i.e. such as Morse Acronyms-> English). The idea of “acronyms as compression tools” would have three effects:
(1) the 100 characters could be reduced to 48,
(2) the English frequencies could easily become even more “spiky” than usual, and
(3) the subsequent mono-alphabetic substitution step could be made extremely difficult.
So far, I have been unable to solve the APEX sequence using this assumption, although I’ve had limited time for the attempt. The main problem, even if the assumption is true, is that most techniques at my disposal, numerical or otherwise, rely on the plain text having relatively normal English frequencies for monographs, digraphs, trigraphs, etc. Those frequencies could be badly broken by the presence of acronyms (not to mention the fact that Sanborn may have contrived the ultimate message to contain many uncommon letters and avoid common ones). But you would still expect the resulting plaintext to have a high I.C.
Well, as I said in the note above, I have pulled away from this path in recent days. The bottom line is that my methods and limited time are insufficient to break through to another “Easter Egg” or the solution. I encourage other researchers, who may have more time and resources than I, to take up the hunt at this point, if they deem it promising. Meanwhile I have had another idea that seems to chime well with more clues, and it also suggests a method of attack that is testable, given enough time and/or people.
Idea 2: Book Code
Suppose that the remaining 48 characters are to be used with a specific book to indicate locations of words that spell out the message. For instance, the characters could be replaced by numeric equivalents (e.g. 1 through 26) and then taken pairwise to indicate a page (or paragraph) and a word location. This would yield 24 words, which with an average word length of 4 letters would result in a message of about the appropriate length. The trouble with this is threefold: (1) average word length in the English language is closer to 5 letters, (2) if the characters are taken pairwise, then the total set of words to choose from is only 26² = 676, which is a fairly limited vocabulary for selection, and (3) in order to embed the “APEX” easter egg, some bias would need to be applied towards words for which either letter in the character pair is in the set (A, P, E, X). These are significant constraints, but once again I must point out that it really might not have been that hard for Sanborn to come up with a satisfactory answer as long as he was willing to contrive a result that fits the criteria.
Another concept that could work is that the characters are to be used as triplets, for example to indicate a chapter, a paragraph, and a word. This would yield 16 words, with an average word length of about 6 letters. The resulting vocabulary of 26³ = 17,576 (and the associated increase in the number of words with A, P, E, or X as components) would make it significantly easier to contrive a suitable message. The downside here is that now the words would need to average 6 letters in length, rather than the typical 5.
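The arithmetic behind the pairwise and triplet schemes can be laid out in a few lines. This is only a sketch of the bookkeeping. The target of 100 plain text characters is taken from Sanborn's "approximately one hundred characters", and whether spaces count toward that total is not specified, so treat the averages as rough.

```python
SEQUENCE_LENGTH = 48    # characters in the APEX sequence
TARGET_PLAINTEXT = 100  # assumption: "approximately one hundred characters"
ALPHABET_SIZE = 26

def book_code_budget(group_size):
    """Words available, vocabulary size, and average word length needed."""
    words = SEQUENCE_LENGTH // group_size
    vocabulary = ALPHABET_SIZE ** group_size
    average_word_length = TARGET_PLAINTEXT / words
    return words, vocabulary, average_word_length

for group_size in (2, 3):
    words, vocabulary, average = book_code_budget(group_size)
    print(group_size, words, vocabulary, round(average, 2))
# 2 24 676 4.17
# 3 16 17576 6.25
```

The trade-off is visible directly: pairs leave room for shorter words but a small vocabulary, while triplets enlarge the vocabulary at the cost of requiring longer words.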
This is all fine and well, but it doesn’t address the central question of a book code... that being which book is to be used? That’s where the clues and other observations come into play.
Pre-Step Observations:
Regardless of which idea from above you prefer to move forward with, or indeed if you have a thought of your own that could break the dilemma, I think that we must appeal to the remaining clues and/or easter eggs. Let’s begin with the clues.
In particular, let’s begin with the Morse Code snippets that have alternately provided confirmation and guidance for steps 1 through 9 above.
If a person were to stand in front of the CIA headquarters facing South and looking at the Morse Code symbols drilled into the copper at the South and North strata, (s)he would find in this order (reading from top to bottom, i.e. South to North, and left to right):
• DIGETALEEE
  INTERPRETATU
• TISYOUR
  POSITION
• EEVIRTUALLYE
  EEEEEEINVISIBLE
• EESHADOWEE
  FORCESEEEE
• LUCIDEEE
  MEMORYE
• RQ
• SOS
In my earlier work, I was able to make associations, some weak and others strong, between these Morse Code snippets and the methods employed for some of the steps. Looking again at these snippets, it appears that the associations that were made occurred in this particular order. In other words, we have the following potential mapping:
DIGETALEEE INTERPRETATU: Step (4) Dig Tale Interpretation
TISYOUR POSITION: Step (6) (Wha)T is Your Position
VIRTUALLY INVISIBLE: Step (7) The Underground, because the top layer is “transparent”, or “virtually invisible”
SHADOW FORCES: ? Step (8) The Latitudes ?
LUCID MEMORY: Step (9) Keyword for Quagmire III
RQ: To Be Determined
SOS: To Be Determined
So assuming this mapping were intentional, we would surmise that we were expected to use the “palimpsest” keyword, the plaintext of K1, and the (locations of the) misspelled letters to find our way to the “doorway” and then begin to proceed through the Morse Code to help guide us on the “straight and narrow” path of the puzzle. Note that I have included question marks on my proposed mapping between “Shadow Forces” and Step (8). I proposed this mapping because that is the step that occurs at the appropriate place and because the transposition I discovered was found via brute force, something which makes many people discount the step and also makes me suspect that I missed a clue. Note that each of the words “shadow” and “forces” is of the appropriate length to be a key for a transposition employing 6 columns, and that there are also diagonal lines cut into the copper that might suggest a diagonal transposition route similar to what I employed. But to date I have not spent a significant amount of time trying to reconcile these clues with my result. If anyone could find an elegant way to perform a transposition that uses “Shadow” and/or “Forces” as a key (or a clue for a key) and obtain results similar to those of my step 8, then that would really “seal the deal” on this process and we would create many converts to “APEX Theory”. I am already a believer, so I haven’t spared the time. If anyone takes on this task and is successful, I would be greatly in your debt.
At some point I became intrigued by the notions described above, and so I found myself confronted with the APEX sequence from step (9) and the next snippet of Morse Code: “RQ”.
In Morse Code, the word “RQ” stands for “the previous was an interrogative”. You know how sometimes, while speaking, you might phrase a sentence in such a way that it appears to be a statement, but listeners know it is actually a question because of your rising tonal inflection at the end? Well, you can’t transmit tonal inflection in Morse Code, so sometimes you need a way to say “by the way, what I just said was a question, not a statement”. That’s what “RQ” is for.
So given that the only thing of interest after step (9) is the potential clue word “APEX”, I decided to see what I could glean by making it a question. And a whole new door of possible clues was subsequently thrown open.
The storm of new clues began with the most obvious theme. Note the following:
• We have the word “APEX” formed by the histogram peaks
• Jim Sanborn said: “The final part is obviously the, you know, the apex of the pyramid there.” (Wired Magazine)
• K3 comes from a book about King Tut’s Tomb (in ancient Egypt)
• K3 and K4 are suggestively paired together, set off from the rest, on a subsequent Kryptos-related Sanborn piece known as “Antipodes”
Given these observations, I was inspired to consider whether we were indeed supposed to think about pyramids from ancient Egypt. So when I thought about the word “APEX”, with a question mark, and in the context of a pyramid, it was rather natural to consider the Great Pyramid. For those that don’t know, the Great Pyramid of Cheops is missing its apex. In fact, it has been missing its apex since the earliest written records, and so what actually became of it is quite the mystery. Some theories assert that it was never completed, through design or by circumstance. Other theories assert that it was removed by either humans, erosion, or other natural causes. Any way you look at it, the question “APEX?”, along with the other context given above, really seems to point toward the Great Pyramid.
So I began to research the Great Pyramid, acquiring electronic books online as well as classical texts via Amazon. The more I learned, the more clues came to light. Rather than bore you with a paragraph on each, I’m going to list the observations that I think are clues as opposed to coincidences:
• The main component of the Great Pyramid is limestone, and there is a chunk of limestone on the ground next to the Kryptos sculpture.
• Interior chambers of the Great Pyramid were fashioned from “polished red granite” (quoting the exact words used by Tompkins, author of “Secrets of the Great Pyramid”, 1978, page 15) and “polished red granite” was also used by Sanborn at the CIA (quoting a description, most likely provided by Sanborn, found at the CIA website).
• Two other Kryptos materials, petrified wood (from a forest East of Cairo) and copper, were also found in the Great Pyramid.
• There is a “Khufu-looking-pyramid” structure in the strata, according to Monet’s Kryptos Observations. (Note that “Khufu” is another name for “Cheops”, the creator of the Great Pyramid.)
• Sanborn reportedly stated his intention to have the “line” defined by the courtyard strata (with pond) and the Kryptos sculpture be parallel to the “line” defined by the external strata. (He was a bit disturbed to learn that they were not perfectly so.) What purpose do parallel lines serve, if not to draw attention to their orientation? From aerial imagery, I have estimated that the line formed by the sculpture and courtyard strata forms about a 50.5 degree angle, plus or minus 2 degrees, with the axis of the CIA headquarters building and courtyard (i.e. the walkway into and through it). The Cheops pyramid faces are sloped at an angle of 51 and a half degrees (or so). The above angles are consistent to within Sanborn’s ability to establish proper placement and my ability to measure angles based on low-resolution (and not quite nadir) imagery, especially given that the choice of endpoints on the strata and/or sculpture is somewhat arbitrary.
To sum up this section, I believe that a strong candidate for the next step is a book code that utilizes a book whose topic is the Great Pyramid. It is my further conjecture that the final Morse Code snippet (“SOS”) is either a clue about which precise book to use, or a clue about how to use the APEX sequence with the book. (And if you like the original idea above, i.e. mono-alphabetic substitution with acronyms, then you might consider some way to use the words “Great Pyramid” as keys for the process. This would imply a great deal of contriving on Sanborn’s part to fashion a plain text that is replete with specific letters, but I’m not against that line of thought.)
Suggested Step Process:
I have identified a number of books that seem like potential candidates for this step. However, none of them resonates with “SOS” in any obvious way. I had hoped to find a book for which the title, author, or publisher would match that acronym, but so far no such luck. I can say that the author of one book from the 1800s (Charles Piazzi Smyth) was an astronomer, an avid sailor, and owned a sailboat named “Titania”. I figured that the relationship to “SOS” was probably too obscure (and wishful), but I downloaded digital copies of his books anyway and took a quick stab at going for the solution. No luck, of course. (These books are now out of print and available only in a few libraries. The online copies were scanned in by Google or Microsoft, etc.)
If you are interested in carrying on the effort, here are a few books that I found to be of interest:
Taylor, John. The Great Pyramid; Why Was it Built: & Who Built It?, Longman, Green, Longman, and Roberts, 1859 (London).
Smyth, Charles Piazzi. Our Inheritance in the Great Pyramid, A. Strahan, 1864 (London).
Smyth, Charles Piazzi. Life and Work at the Great Pyramid During the Months of January, February, March, and April, A.D. 1865; With a Discussion of the Facts Ascertained, Edmonston and Douglas, 1867 (Edinburgh).
Tompkins, Peter. Secrets of the Great Pyramid, Harper & Row, 1978 (New York).
Jones, Bernard E. Freemason’s Guide and Compendium, 1950.
Some of the above books have multiple editions, which may be pretty important for a book code, depending upon how it is implemented. Note that these are only a few of the possible book titles, so you could help out by expanding the list and/or trying to crack the code yourself.
And by the way, my APEX sequence actually has a couple of minor variations. Referencing back to step 8, a convention was chosen about where to begin the x-axis. The convention was important, because some letters appear twice along the abscissa, and that presented a dilemma when indexing into the array. Well, the bottom line is that there are two slight changes to the convention that would result in minor changes to the APEX sequence and yet still preserve the qualities that drew my attention to begin with (the histogram and the peaks at “A”, “P”, “E”, and “X”). Here are the possible variations, with the affected characters indicated using a red font (positions 3, 22, and 35 in APEX 2; those plus positions 12 and 16 in APEX 3):
APEX 1:
“EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN”
APEX 2:
“EBLAJRAAJPPBEHBLDVEPEAXDXOVFXPARFZSEFACAGPEDPRXN”
APEX 3:
“EBLAJRAAJPPKEHBMDVEPEAXDXOVFXPARFZSEFACAGPEDPRXN”
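For anyone working from a rendering in which the highlighting is hard to see, the differing positions can be recovered by direct comparison. The sketch below (helper names are hypothetical) lists the 1-based positions at which the variants differ and checks that A, P, E, and X remain the four most frequent letters in each.

```python
from collections import Counter

APEX_1 = "EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN"
APEX_2 = "EBLAJRAAJPPBEHBLDVEPEAXDXOVFXPARFZSEFACAGPEDPRXN"
APEX_3 = "EBLAJRAAJPPKEHBMDVEPEAXDXOVFXPARFZSEFACAGPEDPRXN"

def differences(a, b):
    """1-based positions (with both letters) where two equal-length strings differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

def top_four_letters(text):
    """Set of the four most frequent letters in text."""
    return {letter for letter, _ in Counter(text).most_common(4)}

print(differences(APEX_1, APEX_2))  # [(3, 'X', 'L'), (22, 'X', 'A'), (35, 'P', 'S')]
print(differences(APEX_2, APEX_3))  # [(12, 'B', 'K'), (16, 'L', 'M')]
for variant in (APEX_1, APEX_2, APEX_3):
    print(sorted(top_four_letters(variant)))  # ['A', 'E', 'P', 'X'] each time
```

Note that in APEX 2 and APEX 3 the count for X drops from 6 to 4, so the four peaks are less evenly matched than in APEX 1, although they remain the top four.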
Also, since it will probably be necessary to resolve the characters of these sequences into numbers, don’t forget to try both the standard English alphabet and the “Kryptos” ordering of the alphabet, as suggested on the sculpture.
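As a starting point, here is a minimal sketch of that conversion under both orderings, with the numbers then grouped into pairs or triplets for use as book locations. The 1-based numbering follows the "1 through 26" example given earlier. How each group maps onto a page, paragraph, or word is an open assumption, and the function names are hypothetical.

```python
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
KRYPTOS = "KRYPTOSABCDEFGHIJLMNQUVWXZ"  # keyword KRYPTOS, then the unused letters

APEX = "EBXAJRAAJPPBEHBLDVEPEXXDXOVFXPARFZPEFACAGPEDPRXN"

def to_numbers(text, alphabet):
    """Map each letter to its 1-based position in the given alphabet."""
    index = {letter: i + 1 for i, letter in enumerate(alphabet)}
    return [index[letter] for letter in text]

def grouped(numbers, size):
    """Split a list of numbers into consecutive tuples of the given size."""
    return [tuple(numbers[i:i + size]) for i in range(0, len(numbers), size)]

for name, alphabet in (("standard", STANDARD), ("kryptos", KRYPTOS)):
    numbers = to_numbers(APEX, alphabet)
    print(name, grouped(numbers, 2)[:3], grouped(numbers, 3)[:2])
```

The same two functions can be applied to the APEX 2 and APEX 3 variants, so each candidate book can be tested against six number streams (three sequences under two orderings) per grouping size.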
Word of warning: when you try various ways of extracting words from a book, don’t be surprised to commonly find phrases that seem very close to having some sort of meaning. Historical books, such as the ones I indicated above, are typically thematic so you can expect to find words that seem to belong together even when pulling at random. I believe that we are looking for plain text with very clear meaning, so if you have to bend your mind a bit to make something work, it probably isn’t correct. Don’t get too caught up in an “almost” answer.
Additional (entertaining) notion: many authors have suggested a correspondence between the units of measure used in construction of the Great Pyramid and the ancient units of measure provided in the Bible in reference to Noah’s Ark. And if there was ever a situation in ancient times that cried out the need for an SOS, Noah’s plight in the Great Flood certainly was it. Just food for more thought. Is the book the Bible? Which version, which chapter, what method? (Ha... King James seems appropriate.)
Problem: In my hunt for information regarding the Great Pyramid, I have discovered that the topic is insufficiently constrained to allow for confidence in any book that is selected for decryption trials. The bottom line is that the mysteries surrounding the Great Pyramid have spawned much interest and influenced numerous societies (including secret societies) throughout the ages since its construction. As a result, I found myself making numerous possible connections and inferences, resulting in a large array of potential books to choose from. In the next step, I consider methods to reduce the scope of my investigation and to identify a few specific good candidates.