Warning: We are moving into an aspect of my work that requires either a deep understanding of statistics or quite a bit of faith. The mathematical easter eggs here are even more compelling than the earlier visual ones, but in my experience, someone who doesn’t fundamentally understand the significance of mathematical expressions tends to deem them unimportant. To do that now would guarantee that you will not appreciate my final results. So either get some faith or get an expert. But don’t dismiss this.
Motivation:
According to K2, the “information” was supposedly transmitted underground to some unknown location. Later, when that location is mentioned again, “WW” is referenced as the person who knows the exact location, and then a message from him is delivered in the form of geodetic coordinates (i.e. lats and longs). Here is that passage:
• “...Who knows the exact location? Only WW. This was his last message. x. Thirty-eight degrees fifty seven minutes six point five seconds north seventy seven degrees eight minutes forty four seconds west xlayertwo...”
-Jim Sanborn, Kryptos Part Two
I postulated that this might be a clue regarding the nature of K4: instead of representing encrypted information, it describes the location of the information within the “original matrix”. Actually, I realize that such a construct would still, technically, be encrypted information, but I want to distinguish this technique as an entirely custom procedure, unique only to this problem. If this is truly the correct path for solving K4, then it is highly unlikely that any other solution path would result in success. Conversely, if any other distinctly different path works, then the path I am on is nothing more than a highly imaginative random character generator... with very cool Easter Eggs and mathematical oddities (as you will soon see). Some have used this individualistic nature of my proposed path as a “knock” against my approach, but I think that it resonates well with the following quote:
• “It's also well-known that I did use some matrix codes Ed gave me, and I have also designed visual systems for encoding, which are much harder for cryptographers to crack because they're individualistic.”
-Jim Sanborn, excerpt from interview in Wired Magazine
As a two-dimensional structure, a matrix requires two pieces of information to locate a single datum. Given the above observations, I eventually considered the notion that K4 might behave like geo-coordinates - a series of (x, y) pairs - into the “original matrix”, with which we could extract the final answer. The fact that K4 was of odd length was merely an obstacle to this premise, not a show-stopper. (Mr. Sanborn had already deleted characters in K2 and K3, so why not drop or add an odd number of them here as well?)
Truthfully, what I was hoping to find is that K4 alternated between characters that belonged on the X-axis (i.e. “longitudes”) of layer two and characters that belonged on the Y-axis (i.e. “latitudes”). With regard to the X-axis, this is not a hard condition to meet, because it contains at least one instance of each of the alphabetic characters. But the Y-axis contains only 19 of the 26. (See the boldfaced characters of the “Prime Meridian” of the “Underground” array, for example.) Therefore, if it just so happened that every other character of K4 was a “legitimate latitude”, i.e. it belongs to the set of 19 permissible characters, then one could use K4 to index by pairs into this array (although you will need to “disambiguate” any x coordinate that is represented more than once on the X-axis). The likelihood of that happening by pure chance is (19/26)^48, which is about 1 in 3.5 million. And sadly, it didn’t happen here, but that should come as no surprise. There may have already been a lot of steps along this path, but there has been no cryptography. Based on the earlier mentioned quotes from Mr. Sanborn, regarding use of Ed’s matrix systems and the number of cryptographic systems used in total (5-6), it seems like we are overdue for some left-brain work. Also, these steps are supposed to involve a little effort to see the next “door”... they can’t just fall in our laps for free. The relevant quote again:
• “There are lots of doors to go through to get to the meaning of the code. Every time you enter one doorway you might, in the distance, see another door. You go through that doorway and then you go through another doorway. It unfolds as it's deciphered.”
-Jim Sanborn (from November, 2005 CNN Jamie McIntyre video)
The concept that K4 was possibly a set of coordinates into the original matrix was still attractive, so I decided to explore a bit.
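The “1 in 3.5 million” figure quoted above can be checked with a few lines of Python. This is only a sketch: it assumes, as the estimate itself does, that every letter of the alphabet is equally likely in each of the 48 y-positions. The names P_LEGIT and PAIRS are illustrative, not taken from the spreadsheet.

```python
from fractions import Fraction

# 19 of the 26 letters are usable as y-coordinates (figure taken from the text).
P_LEGIT = Fraction(19, 26)
# 96 alternating characters give 48 positions that must hold a legitimate latitude.
PAIRS = 48

# Assumption: letters are uniformly distributed and positions are independent.
p_all_legit = float(P_LEGIT ** PAIRS)
odds_against = 1 / p_all_legit

print(f"probability = {p_all_legit:.3e}")
print(f"about 1 in {odds_against:,.0f}")
```

The result lands a little under 3.5 million, which agrees with the rounded figure in the text.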
Pre-Step Observations:
The set of characters that are “legitimate latitudes” for the original matrix, assuming we use the boldface column that is the “Prime Meridian” for indexing, is
(B, E, F, H, J, K, L, M, N, O, P, Q, R, S, T, V, W, X, Z).
And the characters that are NOT permissible as y-coordinates, because they don’t exist on the Prime Meridian, are
(A, C, D, G, I, U, Y).
To help with observation, I created a spreadsheet formula that counts how many occurrences of a given letter appear in a given column of the original matrix. If you look at the downloadable Excel file, “PathofKryptos.xls”, there is a section entitled “The Latitudes”. Here, you will find that the 97 characters of K4 have been written out in two rows (in order to fit the page) and that beneath each character is a number that tells how many entries that character has on the Prime Meridian. (Actually, if you change the y-axis selector to a value other than 28, you can explore the prospects of indexing into the original matrix using ANY column.) As long as you have left the y-axis selector at the default value of 28, the result is as shown in the figure below.

Here, you should see something fascinating: there are two “runs” of characters that have an amazing number of ones in them (circled in blue) and there are likewise two runs of characters that have an overabundance of zeros (circled in red). By “amazing” and “overabundance”, I mean seemingly more than predicted by chance.
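For readers without the spreadsheet, the ones-and-zeros view can be approximated in Python. This is a sketch under two assumptions: K4 is the commonly published 97-character transcription, and each character is flagged 1 or 0 for membership in the set of 19 legitimate latitudes (the spreadsheet counts occurrences in a column, so it may show values above 1 where this sketch shows 1).

```python
# Commonly published transcription of K4 (97 characters).
K4 = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBN"
      "YPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR")

# The 19 "legitimate latitudes" listed in the text.
LEGIT = set("BEFHJKLMNOPQRSTVWXZ")

# 1 = character appears on the Prime Meridian, 0 = it does not.
flags = "".join("1" if ch in LEGIT else "0" for ch in K4)

# The first "blue" run: from "FBB..." through "...ZZW".
start = K4.index("FBB")
end = K4.index("ZZW") + 3
first_run = K4[start:end]
bad_in_run = [ch for ch in first_run if ch not in LEGIT]

print(K4)
print(flags)
print(len(first_run), "characters in the first run, bad:", bad_in_run)
```

Printing K4 above its flag string makes the long stretches of ones and zeros easy to spot by eye.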
In the following discussion, I seek to show that the probability of these runs occurring by pure chance is exceptionally low. Looking at the first blue run, which begins with “FBB…” and ends with “…ZZW”, you can count 32 characters of which only one is not a legitimate latitude. The probability of this occurring by chance is
$$P(31 \text{ of } 32) = \binom{32}{31}\left(\frac{19}{26}\right)^{31}\left(\frac{7}{26}\right)^{1} \approx 0.000516$$
or about 1 in 1939. This probability is given by the Binomial Distribution, which can be used for computing the probability of observing any number of “good” or “bad” latitudes. A representative plot is given below. Although this is a discrete distribution, I use a continuous curve only because it is easier on the eyes.
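The binomial figure can be reproduced with the standard library alone. This is a minimal sketch, assuming a per-character success probability of 19/26; binom_pmf is an illustrative helper, not something from the spreadsheet.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_RUN = 32          # length of the first blue run
P_LEGIT = 19 / 26   # chance that a random letter is a legitimate latitude

# Exactly 31 legitimate latitudes out of 32 characters.
p_31 = binom_pmf(31, N_RUN, P_LEGIT)
print(f"P(31 of 32) = {p_31:.6f}, about 1 in {1 / p_31:.0f}")

# The whole distribution, and its most likely outcome.
pmf = [binom_pmf(k, N_RUN, P_LEGIT) for k in range(N_RUN + 1)]
mode = max(range(N_RUN + 1), key=pmf.__getitem__)
print(f"most likely count = {mode}, probability = {pmf[mode]:.4f}")
```

The list pmf holds the values that the representative plot draws as a smooth curve.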

Note, however, that even the most likely observation of N=24 has a probability of less than 0.16, so in order not to understate the odds of any given observation, I often compute the cumulative probability (i.e. the area under the curve) between the observation (N) and the nearest tail of the distribution. Then I multiply that value by two. The resulting “probability” is associated with the entire region of values whose median is the observation of interest (N). In the next figure, I illustrate the concept using a normal distribution with mean 0, standard deviation 0.25, and an observation of 0.5. This distribution and the associated values were chosen only for easier visualization of the concept. The computed probability using this methodology is associated with the shaded region under the curve, and the observation (0.5) is “representative of” this range of values. This leads to two nice qualities: (1) the median observation of the entire distribution (in this case, 0) will yield a probability of occurrence of 1, rather than something much smaller, and (2) the results of this computation are conservative. That is, it yields results that suggest the observations are perhaps more likely than they truly are, so we aren’t so prone to attributing significance where there really is none. Conversely, if the probabilities obtained in this manner are still very small, then significance is virtually assured.
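The doubled-tail idea can be sketched for the normal illustration described above (mean 0, standard deviation 0.25, observation 0.5). The function name doubled_tail_normal is hypothetical, and the sketch assumes “nearest tail” means the tail on the same side of the mean as the observation.

```python
from math import erfc, sqrt

def doubled_tail_normal(x, mean=0.0, sd=1.0):
    """Twice the area between x and the nearest tail of a normal curve."""
    z = abs(x - mean) / sd
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2))

p_obs = doubled_tail_normal(0.5, mean=0.0, sd=0.25)
p_median = doubled_tail_normal(0.0, mean=0.0, sd=0.25)

print(f"observation 0.5 -> {p_obs:.4f}")
print(f"observation 0.0 -> {p_median:.4f}")
```

An observation two standard deviations out gives roughly 0.0455, while an observation at the median gives 1, which is the first of the two “nice qualities” mentioned above.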

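Using this methodology, but with the discrete Binomial Distribution rather than a continuous one, a new probability for the first blue run in K4 can be obtained:
$$P = 2\sum_{k=31}^{32}\binom{32}{k}\left(\frac{19}{26}\right)^{k}\left(\frac{7}{26}\right)^{32-k} \approx 0.00112$$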
or about 1 in 894, which may not be that impressive all by itself, but there are three other anomalous runs to account for. (Also remember: this is most likely conservative.) Following the same methodology for the other three runs (from left to right in K4) we find that their probabilities are 0.0233, 0.236661, and 0.00048. If these sequences are viewed as independent, then the total probability of all of these occurring simultaneously is the product of the individual probabilities, which is about 0.000000003, or 1 in about 337 million. Unfortunately, I can’t claim independence of these runs, because I cherry-picked their boundaries. Note in the first figure from the top of this page, on the second row of K4, that if I had counted the characters “NFBN” as being part of the run to the left instead of the run to the right, both runs’ probabilities would have increased. A similar fortuitous choice occurred regarding the character “J” that exists at the boundary of the last two runs. So because of the freedom to choose what worked best for my theory, I have lost independence of the observations and I cannot simply multiply their individual results. However, there is another observation to be made: the positions of all of these “runs” within K4 are of peculiar nature. There seems to be a tendency for them to be grouped in sets of (roughly) 16 characters. I didn’t know how to calculate odds for that, but my sense was that this is a very significant observation, so I explored it.
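The “1 in 894” figure and the naive product can be checked as follows. This is a sketch only: doubled_upper_tail is an illustrative helper, the first run is recomputed from the binomial distribution, and the other three probabilities are copied from the text rather than recomputed, since the exact run boundaries are shown only in the figure.

```python
from math import comb, prod

def doubled_upper_tail(k, n, p):
    """Twice the probability of seeing k or more successes out of n."""
    tail = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
    return min(1.0, 2 * tail)

# First blue run: 31 or more legitimate latitudes out of 32.
p_first = doubled_upper_tail(31, 32, 19 / 26)
print(f"first run: {p_first:.5f}, about 1 in {1 / p_first:.0f}")

# Probabilities quoted in the text for the other three runs (not recomputed).
quoted_other_runs = [0.0233, 0.236661, 0.00048]

# Naive product, valid only if the four runs were independent.
naive_product = p_first * prod(quoted_other_runs)
print(f"naive product: {naive_product:.2e}, about 1 in {1 / naive_product:,.0f}")
```

As the text notes, this product overstates the case because the run boundaries were chosen after looking at the data.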
Suppose for the moment that the plaintext of K4 was 48 characters long. In that case, Mr. Sanborn could have selected locations from the original matrix for each of these characters, which would have resulted in 48 coordinate pairs, or 96 cipher text letters. Then he could append any arbitrary character at the end (to cause confusion) and subsequently write out the sequence of (x,y) pairs in rows of 6 columns, resulting in an incomplete array that is 17 rows tall by 6 columns wide. (See the example transposition pattern in the figure below).

Each of the y coordinates must be contained in the set of “legitimate latitudes”, or else the indexing cannot be performed. The x coordinates, on the other hand, are free to be any of the 26 alphabet characters. The additional “confuser” character that sits alone on the 17th row also gets the symbol “x” because it can be any character and not harm the indexing. Extracting these characters by some other route would scramble the array so that the “y”s end up in new locations, imposing the “legitimate latitude” constraints in those positions. So the (x,y) pairs would be effectively separated, and the decryption becomes a greater challenge.
For example, suppose Mr. Sanborn extracted the characters by columns, using a numeric key such as (1, 2, 4, 3, 6, 5). Then the result would basically be a string of 17 x’s, followed by 16 y’s, then 16 y’s, then 16 x’s, then 16 y’s, and 16 x’s. (Recall that x’s can be any character, so it doesn’t matter how many ones and zeros there are in those subsets, but chance would suggest that the number of zeros in these sets should be in the vicinity of 3-6.) In the next figure, I have placed dividers at the locations implied by this type of transposition.
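The notional transposition can be sketched by tracking only the role of each position (x or y) rather than actual letters. The sketch assumes the key (1, 2, 4, 3, 6, 5) means “read the columns top to bottom in that order”; the variable names are illustrative.

```python
from itertools import accumulate

COLS = 6
KEY = (1, 2, 4, 3, 6, 5)   # assumed to be the order in which columns are read

# 48 (x, y) pairs followed by one "confuser", which behaves like an x.
roles = ["x" if i % 2 == 0 else "y" for i in range(97)]

# Writing the roles row by row into 6 columns puts every 6th entry in a column.
columns = {c: roles[c - 1::COLS] for c in range(1, COLS + 1)}

# Reading the columns in key order gives the role and length of each segment.
segments = [(columns[c][0], len(columns[c])) for c in KEY]
boundaries = list(accumulate(length for _, length in segments))

print(segments)
print(boundaries)
```

The boundaries are the positions where dividers would fall in K4 under this scheme.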

Notice that if it weren’t for the “Bad G” in the top row and the “Bad GD” in the bottom, K4 would have fit this notional transposition scheme perfectly.
So we see that there may be a reason that there are so many legitimate latitudes in 3 out of the 6 divisions of K4: they may belong on the Y-axis, so there must be long sequences of ones. But what about the other 3 sections, i.e. the ones that belong on the X-axis? Since the X-axis contains every possible character (and some of them twice) why should there be an overabundance of “bad latitudes” in any of these sections? For instance, the last section, which has 10 zeros, turns out to be even more unlikely than the third section, which consists entirely of ones. My postulate is that Mr. Sanborn is trying to make up for the lack of representation of “bad latitudes” in the y coordinates by biasing towards their selection when an x-coordinate is called for. (After all, if you are going to so much trouble to mask English frequencies, then the last thing you want to do is leave a detectable signature!)
Now that we have a candidate transposition that imposes its own boundaries, we can re-compute the odds for this arrangement without “cherry-picking” the runs, so independence of the observations becomes a valid assumption. Performing the same type of calculation as above, but this time for each of the 6 sets of characters (grouped as in Figure 11), we obtain the probabilities 0.9987, 0.0912, 0.0132, 0.0827, 0.3067, and 0.0011. Multiplying these independent probabilities together gives 0.000000035, or about 1 chance in 28.4 million. And recall: that calculation is conservative. Something is going on here that is suspiciously similar to the notional transposition.
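Multiplying the six probabilities exactly as printed gives a value slightly below 0.000000035. The sketch below suggests the gap is within rounding: it assumes each printed value is rounded to four decimal places and brackets the product accordingly. The spreadsheet presumably used unrounded values.

```python
from math import prod

# The six group probabilities as printed in the text (four decimal places).
group_p = [0.9987, 0.0912, 0.0132, 0.0827, 0.3067, 0.0011]

combined = prod(group_p)

# Each printed value could be off by up to half a unit in the last place.
HALF_UNIT = 0.00005
low = prod(p - HALF_UNIT for p in group_p)
high = prod(p + HALF_UNIT for p in group_p)

print(f"product of printed values: {combined:.3e}")
print(f"range allowed by rounding: {low:.3e} to {high:.3e}")
```

The quoted 0.000000035 falls inside that range, and most of the spread comes from the last value, 0.0011, which carries only two significant digits.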
So, we have a candidate transposition that almost perfectly matches K4. But “close” is not good enough in a transposition methodology. The bottom line is that this notional transposition is not correct, but the odds that were computed are very strong evidence that we are near to the right path. If only we could figure out how to “tweak” this transposition scheme so that those bad characters aren’t bad anymore! What similar transposition scheme(s) would yield long runs of good and bad latitudes like these but allow for some anomalies?
Since there is only one unbroken run of 16 consecutive “legitimate latitudes”, it is clear that traditional columnar transposition could not have been used, regardless of the order in which the columns are extracted upon encryption. However, I considered that a variant of columnar transposition in which (x,y) pairs are entered by diagonals starting from one of the corners might suffice, since the repetitive nature of the columns would be upset near the top and bottom of the array. I also considered that the pairs could be entered in (y,x) format as opposed to (x,y) and that “shifts” of information could be employed. Under these hypotheses, with the aid of a computer, it turns out that exactly one solution exists. Actually, depending upon the amount of the shift, the diagonal direction and starting corner, the set of possible 6-column permutations, and whether the pairs are entered as (x,y) or (y,x), there are a number of different solutions, but they are all equivalent with regard to the decrypted (x,y) pairs that result. Hence my assertion that there is only one solution for the “6 column diagonal entry” paradigm. Given that there were over 54 million trial solutions (even though there are equivalence classes), this seems significant. I ran the same tests using other columns of the original matrix to indicate “legitimate latitudes” (i.e. NOT the prime meridian) as well as other randomly selected sets of 19 characters. No solutions were found. Bottom line: it isn’t easy to meet the requirements imposed by this hypothesis by pure chance. After having arrived at this point by following the clues and observing the Easter Eggs, do you really believe that pure chance accomplished the feat this time?
Nevertheless, for completeness’ sake, I investigated alternative transpositions, such as keyed rail fence transpositions, double columnar transpositions, etc. I considered whether Mr. Sanborn added or deleted 1 or 3 characters and whether he did so before, after, or even during the transposition. I also considered whether he employed circular shifts before and/or after. The possible methodologies were endless, and with the help of a computer, I found many potential solutions, but they all had several fatal flaws: they were extremely arbitrary and complicated, did not adequately explain the “groups of 16” observation, and resulted in too many permutations to clearly “stay on path”.
By contrast, the lone “6 column diagonal” solution is simple, explains the “groups of 16” observation, has no variations/permutations, and also seems motivated by a relationship with the raised K3 characters “YAR”. The process follows.
Step Process:
After much study, it became apparent that the real obstacle in the set of K4 characters is the “Y” in position 65. If one deletes that character then a consecutive sequence of 14 legitimate latitudes is created. It also turns out that if one performs a “circular shift” of K4 two positions to the right (i.e. so that the “AR” at the end moves to the front and all other characters slide back two spots) then there is a simple way to transpose these characters into a legitimate set of (x,y) pairs. Recall that the odds of this occurring by chance are 1 in 3.5 million, so the fact that a relatively simple transposition (that is motivated by clues) achieves this quality is very compelling. (Try it yourself by hand with any other route… and good luck.) The new transposition process is depicted in the next figure and works as follows:
(1) Delete the “Y” and shift “AR” to the front
(2) Write the result into columns using the key (1, 2, 4, 3, 6, 5)
(3) Extract by diagonals according to the template provided

The resulting 96 character string is:
“AORZLOGZIBXKWFKBTSABRNJSTBUPCOJW
OVDTKFXTIWLLOTGTURGMKQDVHZUSIQUF
HJAQLPUQWPBKASIRSWUSNNGEEFDKKKCZ”,
in which I have colored all “legitimate latitudes” in green and the “illegitimate” ones in red. And, when formed into (x,y) pairs the result is:
(A, O), (R, Z), (L, O), (G, Z), (I, B), (X, K), (W, F), (K, B),
(T, S), (A, B), (R, N), (J, S), (T, B), (U, P), (C, O), (J, W),
(O, V), (D, T), (K, F), (X, T), (I, W), (L, L), (O, T), (G, T),
(U, R), (G, M), (K, Q), (D, V), (H, Z), (U, S), (I, Q), (U, F),
(H, J), (A, Q), (L, P), (U, Q), (W, P), (B, K), (A, S), (I, R),
(S, W), (U, S), (N, N), (G, E), (E, F), (D, K), (K, K), (C, Z)
Once again, there is no other transposition that meets the requirements, explains all of the observations, and is simple enough to remain on the “straight and narrow” path. (Not to mention the fortuitous association with “YAR”.)
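Step (1) and the properties claimed for the 96 character result can be checked in Python. This sketch does not reproduce steps (2) and (3), because the diagonal template exists only in the figure. It assumes the commonly published transcription of K4, and it confirms only that the result string uses the same letters as the shifted text and that every y-coordinate is a legitimate latitude.

```python
from collections import Counter

# Commonly published transcription of K4 (97 characters).
K4 = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBN"
      "YPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR")
LEGIT = set("BEFHJKLMNOPQRSTVWXZ")

# The 96 character result quoted in the text.
RESULT = ("AORZLOGZIBXKWFKBTSABRNJSTBUPCOJW"
          "OVDTKFXTIWLLOTGTURGMKQDVHZUSIQUF"
          "HJAQLPUQWPBKASIRSWUSNNGEEFDKKKCZ")

# Step (1): delete the "Y" in position 65, then move the final "AR" to the front.
without_y = K4[:64] + K4[65:]
shifted = without_y[-2:] + without_y[:-2]

# Split the result into (x, y) pairs and test the claimed properties.
pairs = list(zip(RESULT[0::2], RESULT[1::2]))
same_letters = Counter(shifted) == Counter(RESULT)
all_y_legit = all(y in LEGIT for _, y in pairs)
bad_x_count = sum(x not in LEGIT for x, _ in pairs)

print(shifted)
print("same letters:", same_letters, "| all y legitimate:", all_y_legit)
print("bad latitudes appearing as x:", bad_x_count)
```

A permutation check like this does not prove the transposition is the intended one, but it does show that no letters were added or lost along the way.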
Post-Step Observations:
Note that all red-colored characters (i.e. “bad latitudes”) only appear in x-coordinates, as required in order to properly index into the “original matrix”. Speaking of indexing, we still have to “disambiguate” the X-axis. In other words, since the characters (A, B, C, D, E, F, G, H, I, J, L) appear twice along the x-axis, we have to decide which occurrences to use. It turns out that if you adopt the convention to always use the second appearance of these letters, so that the “coordinates” used are the ones with the black border around them on “The Underground” array, then subsequent steps yield surprising results. Indexing into the “original matrix” using these conventions and the coordinate pairs above gives the following 48 character result:
“TMHTIPRKYAFDTQSOWSTXTKOCHWFXONRZVNFAVKOTUNTJIEOK”
This is the “information” that was apparently transmitted underground. Nope, it’s not plain English, but it’s not without its interesting qualities either. But that is the subject of the following step...
(Side note: with this over-determined quality of the proposed encryption method, i.e. the freedom to pick amongst multiple locations in the “original matrix” for any given letter, Mr. Sanborn could arbitrarily introduce meaningless periods, words like “DIG”, and/or repetitive digraphs. I think he did so, and we’ve been barking up those trees ever since.)