The π Code

Mike Keith
April 1999


 

  Martin Gardner's fictional "Doctor Matrix" used to say that,
properly interpreted, the number π (the ratio of the circumference
of a circle to its diameter, whose decimal expansion begins
3.14159265358979323846...) contains the entire history of mankind.
In this article I give some results of looking at π in a
relatively new way: as an infinite string of letters derived from
its expansion in base 26 or base 27.

  (Side note: Ivars Peterson's MathTrek column for April 2000 reports
on some of these findings in a tongue-in-cheek style very reminiscent
of the aforementioned Dr. Matrix.)

Base 26

  Base 26 is one of two fairly natural ways of representing numbers
as text using a 26-letter alphabet.  The number of interest is
expressed numerically in base 26, and then the 26 different base-26
digits are identified with letters as 0=A, 1=B, 2=C, ... 25=Z.
Here are the first 100 digits of pi expressed in this way:

D.DRSQLOLYRTRODNLHNQTGKUDQGTUIRXNEQBCKBSZIVQQVGDMELM
  UEXROIQIYALVUZVEBMIJPQQXLKPLRNCFWJPBYMGGOHJMMQISMS...

  Lo!  Starting at the 5th digit after the point we find a two-letter word (LO), and only a
few digits later we find the three-letter ROD embedded in the four-letter
TROD.  How many other English words can be found if we continue looking?
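
  The letter string shown above can be reproduced with a short program.
The following is only a minimal sketch, not necessarily how the digits in
this article were computed: it obtains π from Machin's formula using
integer arithmetic, and the names pi_scaled and pi_letters, as well as the
number of guard digits, are assumptions made for the example.

```python
def pi_scaled(base, ndigits, guard=10):
    """Approximate floor(pi * base**ndigits) using integers only.

    The guard digits make an error in the last returned digit very unlikely.
    """
    scale = base ** (ndigits + guard)

    def arctan_inv(x):
        # scale * arctan(1/x) via the series 1/x - 1/(3x^3) + 1/(5x^5) - ...
        power = scale // x
        total = power
        k, sign = 1, -1
        while power:
            power //= x * x
            k += 2
            total += sign * (power // k)
            sign = -sign
        return total

    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return pi // base ** guard


def pi_letters(ndigits, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Return (integer digit, first ndigits fractional digits) as letters."""
    base = len(alphabet)
    n = pi_scaled(base, ndigits)
    letters = []
    for _ in range(ndigits + 1):  # +1 for the single integer digit (3)
        n, d = divmod(n, base)
        letters.append(alphabet[d])
    letters.reverse()
    return letters[0], "".join(letters[1:])


whole, frac = pi_letters(100)
print(whole + "." + frac)
```

  Passing a 27-character alphabet that begins with a space, as in
pi_letters(100, " ABCDEFGHIJKLMNOPQRSTUVWXYZ"), gives the base-27 form
discussed later in the article.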

  First, a few π facts are in order.  The digits of π (in any base)
not only go on forever but behave statistically like a sequence of
uniform random numbers.  (Mathematically proving that this is the
case - the "π is normal" conjecture - is a deep unsolved problem,
but numerical analysis of several billion digits suggests that it is
true.)  Consequently, π in base 26 emulates the mythical army of
typing monkeys spewing out random letters.  Among other things, this
implies that any text, no matter how long, should eventually appear
in the base-26 digits of π!

  We can use the seemingly random nature of π's digits to estimate how
many words of various lengths we can expect to find in its first
million digits (letters).  For example, for 4-letter words: each
group of consecutive 4 letters in π is equally likely to be one of
the 26^4 = 456,976 possible combinations.  My dictionary has roughly 5600
4-letter words, so on average there should be a valid 4-letter word
about once every (26^4)/5600 ≈ 81 digits.

  Here are the corresponding estimates (of how many digits we should
expect to scan before finding an N-letter word) for N=2 to 10:

N   #digits
--  -------
2   4
3   13
4   81
5   1000
6   14800
7   272000
8   5.7 Million
9   140 M
10  3900 M

  Dividing these numbers into 1,000,000 gives an estimate of how
many N-letter words should be expected in the first million base-26
digits.  For N=7 this gives 1000000/272000 = 3.67, and
indeed we found three 7-letter words: SUBPLOT at digit 115042,
CONJURE at 246556, and DEWFALL at 883265.  Counts for the other
lengths were also as expected.  No 8-letter or longer words were
found.
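
  The arithmetic behind these estimates is simple enough to sketch in a
few lines.  The function names below are made up for the illustration;
the only dictionary size taken from the text is the figure of roughly
5600 four-letter words, and the 272000 is the N=7 entry of the table.

```python
def expected_gap(n, num_words, alphabet_size=26):
    """Average number of digits scanned per valid n-letter word."""
    return alphabet_size ** n / num_words


def expected_count(total_digits, gap):
    """Expected number of hits in total_digits digits, given the gap."""
    return total_digits / gap


# About 5600 four-letter words, as stated in the text.
print(int(expected_gap(4, 5600)))
# N=7 gap of 272000 digits, taken from the table.
print(round(expected_count(1_000_000, 272_000), 2))
```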

  The estimates above are for finding any N-letter word; of course, a
specific N-letter word should only occur on average once every 26^N
digits.  We should expect to need about 2.5 x 10^18 letters in order
to find the phrase TO BE OR NOT TO BE (without the spaces) once.
We can only get as far as TO BE in the first million.
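
  As a quick check of that figure, the phrase has 13 letters once the
spaces are removed, so the estimate is 26^13.  A small sketch (the
helper name expected_wait is hypothetical):

```python
def expected_wait(text, alphabet_size=26):
    """Average number of letters scanned per occurrence of a specific text."""
    return alphabet_size ** len(text)


phrase = "TOBEORNOTTOBE"  # TO BE OR NOT TO BE without the spaces
print(len(phrase), f"{expected_wait(phrase):.2e}")
```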

  The very first N-letter word in base-26 π (for each N) is
notable; remarkably, those words from N=1 to N=7 almost make a
little poem:

  O, lo -
  Rod trod steel.
  (Oxygen subplot.)

  These words occur at digits 6, 5, 11, 10, 6570, 11582, and 115042.
The only possible contender for an earlier word that we found is
the OED word (marked "obs.") HELLY ["pertaining to hell"], which
occurs at digit 5458.  

  That the first 6-letter word is OXYGEN suggests that π is truly the
very stuff of life!  Here are all the 6-letter words we found, in
order of appearance (reading across the rows):

OXYGEN  SALIFY  MEDICS  PANNES  CLEDGY  VIRIAL  REVETE  PRINKY  LIBYAN  THINGY
AMPLER  UPSTEP  REBUTS  POLITY  TEENSY  HURROO  AVOWER  CORVES  EXARCH  FOGDOM
CUPHEA  BOGOTA  ADHAKA  SOPHIC  HAVANA  RISSOA  CLANGS  CHINOL  BAKUTU  UPTUBE
GRANNY  SNUDGE  DEIFIC  ALTERS  DESIRE  BEGGAR  URATIC  WORMER  MACANA  REFLEE
OPTICS  URNISM  OVIBOS  POTGUN  AMOUNT  DROVER  OCTOPI  BISLEY  ANCONE  MURING
SOZZLE  DEFIED  WARTED  WHILST  LIVERY  MINTER  AMBURY  ASARON  ORGIES  STRACK
GEOMYS  ZENITH  APONIA  RETUNE  TUNFUL  UNFULL  EMPERY  MUTATE  VOICER  KUBERA
ALFURO  DOOLIE  BALDIE  BUSHER  CAMPER  BULLAN  SCROFF  EXCEED  CHEERY  SKIERS

  We can also look for words that appear as consecutive letters but
running backwards.  The first backwards N-letter words we found are:

N  Word   Position of 1st letter
-- ----   ----------------------
1  O       6
2  OR      12
3  TRY     10
4  FILM    140
5  FILMY   140
6  FLOUTS  6254
7  ALPHORN 458071

and the distribution of these, as expected, is similar to the distribution
of the forward words.  For example, we found three backwards 7-letter
words, and no 8-letter ones, just like in the forward case.  The other
two 7-letter backwards words are FULLEST (at 408089) and HYLIDAE (at 695340).
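
  Both kinds of search can be sketched as follows.  This is an
illustration rather than the program actually used: the word list is a
tiny hand-picked sample, the function name is invented, and positions
are assumed to be 1-based counts from the first digit after the point,
with a backwards word located by the position of its first letter (as in
the table above).

```python
def first_positions(text, words):
    """Return (forward, backward) dicts of 1-based first-letter positions.

    A word that does not occur maps to None.
    """
    forward, backward = {}, {}
    for w in words:
        i = text.find(w)
        forward[w] = i + 1 if i >= 0 else None
        # A backwards word appears reversed in the text; its first
        # letter is then the last letter of the match.
        j = text.find(w[::-1])
        backward[w] = j + len(w) if j >= 0 else None
    return forward, backward


# The first 100 base-26 digits of pi, as given earlier in the article.
PI26 = ("DRSQLOLYRTRODNLHNQTGKUDQGTUIRXNEQBCKBSZIVQQVGDMELM"
        "UEXROIQIYALVUZVEBMIJPQQXLKPLRNCFWJPBYMGGOHJMMQISMS")

fwd, bwd = first_positions(PI26, ["O", "LO", "ROD", "TROD", "OR", "TRY"])
print(fwd["LO"], fwd["TROD"], bwd["OR"], bwd["TRY"])
```

  For a full dictionary it would be more efficient to slide over the
text and look each substring up in a set, but the idea is the same.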

  Before venturing into two dimensions, we mention one more
recreation involving the linear string of base-26 π digits, inspired
by noting the first appearance of the word ONE at position 10087.
Where, we ask, does the number N appear in words, for each N?  Here are
most of the answers up to N=10:

ZERO  389247
ONE    10087
TWO    13463
THREE   ---
FOUR   11324
FIVE   64838
SIX    14295
SEVEN 786958
EIGHT   ---
NINE  175372
TEN    15276

  No number words larger than TEN appear in the first million base-26
digits, nor do THREE or EIGHT appear (SEVEN appears exactly once).
Instead of just looking for the first occurrence, we can note a
number word each time it appears.  Those which appear in the first
million digits, in order, are:

14261226622521221122121666116612192221261122122666
61162122616261266629221616122066662612201226622156
26226112266611261266222611121266116111666121121722
226522216626221220222611616662262666

(where we have written 1 for ONE, etc).  Since SIX only has three
letters it appears a lot, which means the Beast Number 666 also
appears frequently in the string above.

  Hans Haverman extended the search to 5 million base-26 digits and
discovered the word THREE at position 1556763 and FIFTY at 2300987
(and again at 4896456!).  No other new number words occur in that
range.  He also found the first eight-letter word: ARMAGNAC, listed in
Webster's Third Unabridged, at position 3095146.


The Next Dimension

  We can provide another "degree of freedom" by arranging the
base-26 digits of π in a two-dimensional array.  There are many
ways to do this (a spiral, a diagonal zigzag filling the
quarter-infinite plane, and so on), but for now we just employ one
method, which is to fill an infinite vertical strip S units wide, by
writing the first S digits in one horizontal row, then the next S
digits in the row below that, and so on.  We can then select any
portion of the array and look for words that occupy consecutive
letters and run in any of the eight possible directions (like a
word-search puzzle).  Perhaps some of the words will even interlock.
Perhaps the words will have something in common.  Perhaps we will
unlock the Pi code!

  Of course, this is the same thing that was done to "discover" the
infamous "Bible code".  Since we can choose the letter distance
between rows (S), this gives us many (in our case, a million or so)
different ways of looking at the letter string under study, so the
possibility of finding "interesting" arrangements of words is
considerably increased, compared to a one-dimensional search.
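
  The strip arrangement and the eight-direction search might be sketched
like this.  It is a minimal illustration under stated assumptions: Pos
is taken to be the 1-based digit position of the window's upper-left
letter, the letter string is assumed long enough to fill the window, and
the function names are invented for the example.

```python
# The eight word-search directions as (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def make_grid(letters, pos, s, rows, cols):
    """Rows of a window whose upper-left letter is digit `pos` (1-based).

    Consecutive rows start s letters apart in the linear string.
    """
    start = pos - 1
    return [letters[start + r * s: start + r * s + cols] for r in range(rows)]


def find_word(grid, word):
    """Return (row, col, (drow, dcol)) for each straight-line occurrence."""
    hits = []
    rows, cols = len(grid), len(grid[0])
    last = len(word) - 1
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                end_r, end_c = r + dr * last, c + dc * last
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # the word would run off the window
                if all(grid[r + k * dr][c + k * dc] == ch
                       for k, ch in enumerate(word)):
                    hits.append((r, c, (dr, dc)))
    return hits


# Toy data (not digits of pi), just to show the calling convention.
toy = make_grid("ABCDEFGHIJKLMNOP", pos=1, s=4, rows=4, cols=4)
print(toy, find_word(toy, "AFKP"))
```

  Trying every S for a fixed window size then amounts to calling
make_grid in a loop, which is where the million or so different views of
the same string come from.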

  For instance, here's a grid we found (with parameters as shown
at the bottom, where "Pos" denotes the digit position of the
upper left corner):

u r r n d a c i v r g c w n p e u b
w f p r z c v k m m p u d p w g l y
v y u q s v b m u y s n v m r k l i
z a k x u g v s e d h m p l l x l d
g n j d u v m x w e s y i e g q i z
f q o p q t k u s j s r z o k v v d
k e n z k a y j n d c f t s r n b r
s m i z d h p s i r d u f u w f o i
p u f r e c n l f f z f o q l h b j
c h n y e a h O M E G A r d r x k p
w a e a i p l d f l o a H p i u s q
o n i q e n z b r i n d t P v k z h
e q g p l s c r v a s g j s L j l v
p i e x s z t t z y v j k p u A l s
y g n q q e j l e l v k v w o o h m
D r q f n g x k b r p i v e x m f f
O y e q h z a x v b q r t p k r s c
G a k z d h s j j o q x f m b h e i
Pos = 148655   S = 14061

This contains the words ALPHA (shown in capitals going diagonally
starting in the lower right) connected with OMEGA (in the center,
horizontally), with GOD (lower left corner) nearby!  On the other
side of the coin, consider:

a c j w c t h v g r o f d k h c
l c o h i t n r c z t y a r k d
u l n j t j c w h w z z e k q p
i b d q u h k e h b d e e d w p
w f j j k c x u c z S n h c g a
c x m t c m m m i h A r l c q j
z w o x r w x z h m T r r e w q
g k t t k a y a c m A o k d v q
z j n a u a D E M O N a k i s n
v z s v y h d i f f b w x b a s
t a n k g h e h j e j h j y u u
j x v y i m d h u t q v g j x u
f y c d q z o u y l k d v j q t
p i j c m j w z u t w g m t t e
c c s b v n g a j h q c w x w j
a h y r D E A R j k x z r u c v
Pos = 255717   S = 13771

which has DEMON and SATAN interlocked, with DEAR on the bottom row.
In this case no diagonals are used, which is even more remarkable.
Many other words are present in both of these arrays; we merely noted
the ones that seemed to have a common theme.

  Words don't even have to be in straight lines, if that fits
our purpose.  Consider the S=2736 array in the vicinity of the
word CONJURE (one of the three 7-letter words in linear π):

e a e b y t n q t d 
v t h h p j H q a t 
o w f z z P O b b d 
o s c i b x C v p l 
h l C O N J U R E u 
b t s j g r S n z v 
w w g r e h j s b u 

Connected with CONJURE is HOCUS (going vertically) and POCUS
(in an L-shape) - two quite appropriate words.

  We can explore the arts as well as the sciences.  At pos=505070 with S=3999
the following array appears:

j g s o t w q q c c d r h k e k
g v a x y f l f j l k u a f e l
o z t d l g b l M p w i h s D M
j n z g c t j D M z s r q d c n
d o e q l j d o r m r w l u z g
i l u i z n s q x l s y g p a q
x g t z s o k i b z l v b l r i
t D M v s d D I G M y c p o i q
h p n l j u B E B O P d t h g m
j e t c r y q d w D f e n i l y
n n e u z i c v e A l j q b l n
u x p l z v j l d L i p f v i o
j y y t f c y b q g x p d h e p

  In the center we are exhorted to DIG MODAL BEBOP, a popular form of
jazz from the 1950's.  If we do we'll certainly feel GLAD (start at the
'g' below MODAL and read upwards).  One of the giants of modal jazz
was Miles Davis, whose initials appear no less than four times in
the grid.

  Some grids are rich enough to contain entire sentences or poems.  
This array is at pos=554766 with S=1058:

T L Y P T S W W I B B B D M O T
N B T S U N I L L S X A Q H F J
U L K R G X K F C D K R O U Q U
Y C K D Y Z K U A M A H S I T Z
H N Y O E M C H D F E E M K P C
V L X I T Y B M Q P M R R Z B R
V S B A C Z W U B P D O Z M S S
Z R D Z B E J I V Z C N P Q S H
Q P I N M M W A T E R X P H W Y
V D R P X I T V V F T X Z L N G
O R G D A V P X F S T U V N V X
V O I C D L N V J J H C K I T L
I F E S H W W S F U C A U Z G C
V Y B L I M H I C T Q A B C M I
P M G O K J J G R Q B O U Z W K
E R Y K Z O I K V G W H P G V L

and is fruitful enough that we can write a complete 5-7-5 haiku
using only words found in the grid:

    Sun, elk in water;
  Oho! For her I'll try to
    Be a hero yet.

Another interesting type of grid is illustrated by this one:

f s z u y x h t p p
d n u e a q o p c i
u e q o m a x x g v
a b w D A W N r a w
i p e M E A L n z m
m f y L E R P r g v
c t c A R I L g j a
l e i L I C K q t c
Pos = 65340   S = 103986

  Note the five four-letter horizontal words grouped into
a 4x5 rectangle.  This is the largest such rectangle we found for
any values of Pos and S less than a million, and it's even more
remarkable because the five words have a similar theme (since ARIL
is a seed covering and LERP is an edible insect deposit on a plant).
Thus we could say, "For my meal at dawn, I will lick lerp from an aril."

  The next step is to look for an NxN word square with words both
horizontally and vertically.  By choosing (Pos,S) it is easy to find
3x3 word squares, so we attempted the more difficult feat of
finding a 4x4 square.  Alas, the best we found is the following
near-miss that has 7 of the required 8 words:

o h e h r a l w p o
r z p W I S T k i x
x d r O V E R d m g
f t q R A T E f q y
h z i D U S T y c a
Pos = 173387  S = 199449

  This square contains WIST, OVER, RATE, and DUST horizontally, plus
WORD, SETS, and TRET ("an allowance made for damage in goods
during transit" [OED]) vertically.  A perfect 4x4 square does not appear to
exist in the first million letters of π (regardless of the value of S),
but since it all depends on the completeness of one's dictionary it is
hard to be sure.

  The 6x6 square below is, perhaps, an indication that it's time to
stop this discussion and move on to something else:

M I K E u r
z K i n h l
u E q c b m
j I u p r f
s T b y j f
m H h b o h
x k c i s k
Pos=278978  S=18909

After all, it's obvious to whom the square speaks, and it clearly
spells out the message "U R" (see upper right corner) "SICK" (bottom
row, backwards).


Base 27

  Another way to look at the digits of π is to express it in base 27,
with the extra digit assigned to a space, so that we get a series
of words (strings of letters surrounded by spaces), not just letters.
In the May 1993 issue of "Word Ways", Lee Sallows suggested that the
most natural assignment is 0=space, so that all the letters are
assigned non-zero values.  (Otherwise, one of the letters (say, A)
will have to have the value zero, which leads to word pairs like
WAKE and AWAKE, which have the same numerical value, even though
it seems more natural for them not to.)  Given 0=space, the most
obvious scheme for the letters is A=1, B=2, and so on.

  The beginning of π in this system is:

c.cvezcvbmlyzxmswprpiijzhweemupdrxou jhcfmobyhsijlpjsca 
  zgxlhqunzwkhdfphtstzoprsnu nhawsjlquvbnqpvzqlwwliytpdauuddkzfgmpcu 
  fnwsavktwroffceijqrhtlvuqlqnox mjrjmq sqmqscvymhqwjrzkwqdathn 
  fmwfr fzugxgdjsqpk ckjirtxtiq c 

where we have divided the lines at word boundaries (i.e., there is
a space at the end of each line).
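
  Pulling the candidate words out of such a string is a matter of
finding runs of letters with a space on each side.  A small sketch,
assuming lowercase letters, a 1-based position for the first letter of
each run, and an invented function name:

```python
import re


def spaced_words(text):
    """Yield (position, word) for letter runs with a space on both sides."""
    for m in re.finditer(r"(?<= )[a-z]+(?= )", text):
        yield m.start() + 1, m.group()


# The start of base-27 pi (after the point), as given above.
sample = "cvezcvbmlyzxmswprpiijzhweemupdrxou jhcfmobyhsijlpjsca zgx"
print(list(spaced_words(sample)))
```

  Each run found this way still has to be checked against a dictionary;
the opening run is skipped here because no space precedes it.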

  It is harder to construct an N-letter word in base 27 than in base 26,
because we have to find an (N+2)-letter string consisting of the
word with a space preceding and following it.  If there are W(N)
N-letter words in our dictionary, then in D base-27 digits of pi
we should expect to find W(N)·D/27^(N+2) N-letter words.  For
D=1000000 this works out to be (for N=1, 2, ...) approximately

  152, 291, 94, 14, 1, 0.07, ...

whereas the actual number of words we found was

  137, 244, 83, 10, 0, ...
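
  The estimate is easy to evaluate for any dictionary.  In the sketch
below the function name is hypothetical, and the value W(1) = 3 (for the
one-letter words A, I and O) is an assumption, since the dictionary
counts are not listed here; with it the formula gives about 152 for N=1.

```python
def expected_words_base27(num_words, total_digits, n):
    """W(N) * D / 27**(N+2): the word plus a space on each side."""
    return num_words * total_digits / 27 ** (n + 2)


# Assumed W(1) = 3 and D = 1,000,000.
print(round(expected_words_base27(3, 1_000_000, 1)))
```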

The 10 four-letter words that appear in the first million letters
of base-27 pi are:

Pos    Word
---    ----
27074  AWRY
168376 FULL
186597 WAAR  alt. form for WARE [OED]
263682 BUSS
318822 PUPA
554259 BALE
575129 CHIC
695434 KAYO
822868 KISH  "a wicker basket" [OED]
943143 RUSE

  The first 1-letter word, O, is at digit 6456.  The first two-letter
word, the Greek letter NU, appears at digit 10351, followed not much
later by US at 10868.

  The first 3-letter word is a bit of a poser, because we find a
great number of obscure specimens before finally hitting on a common
one (WHO) at digit 115288.  The earlier possibilities include LIV
(short for Olivia) at 29998, DUP (OED: contraction for "do up") at
41107, AAM (a liquid measure), YAD (OED: obsolete past tense of
"go"), DAR (OED: obsolete form of "dare") at 85782, and GES (OED:
obsolete form of "guess") at 95679.  Which of these should be
considered the first three-letter word in π is left to the reader
to decide.

  The first 4-letter word is, as shown above, AWRY.  Five-letter
or longer words do not, apparently, show up until after the
first million digits.  The word PI occurs twice, at positions
212659 and 979046.

  The first backward words of each length are O (6456), TO (696),
PUD (41107 - the reversal of DUP mentioned above), and VETO (10354).


Pi as Cipher Text

  Another interesting way of looking at base-27 π is to consider it
as a text encoded with a substitution cipher.  As with the
two-dimensional approach to base-26 π, this way of looking at the
digits allows us to find a lot more syntactically-correct English
texts.  It might seem that this would produce many long strings
of words (after all, there are 26! ways of assigning letters in
a substitution cipher), but as we add more words the letter-pattern
constraints they induce rapidly curtail the number of possible decodings.

  Here are a few two- and three-word ciphers, with plausible English
translations:

57029:  rfsyrcllx eugtyocv = fearfully misproud
(How we should feel as we contemplate the mysteries of pi?)

155865: dlahfwi dswzavznr = Wolfram weanlings
(An indication that π was invented by Stephen Wolfram?  Or maybe
several exceptionally young members of his company?)

76615: iaig fdbizsrqz lvfrixma = eyes numerator hintedly
(What the math student does on seeing the fraction 355/113.)

592835: eupplcycw ch = snookered 'em
(What pi did to everyone who tried to plumb its mysteries.)

  Some of pi's short ciphertexts only have one solution using
Unabridged Merriam-Webster words.  Two such are:

edemymksb u rqoqhibut = Anacyclus, I redeposit.
(The customer addresses the bank teller's plant.)

vtm rrpgegtmt = Psi oogenesis.
(The psychic farmer says he can increase egg production via brain waves.)
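
  The decodings above can be checked mechanically: a translation is
valid if a single one-to-one letter substitution, with spaces left
alone, turns the ciphertext into the plaintext.  A minimal sketch of
that test follows (the function name is invented, and punctuation such
as the apostrophe in 'em is assumed to be stripped beforehand):

```python
def is_valid_decoding(cipher, plain):
    """True if one one-to-one letter substitution maps cipher to plain."""
    if len(cipher) != len(plain):
        return False
    enc, dec = {}, {}
    for c, p in zip(cipher.lower(), plain.lower()):
        if (c == " ") != (p == " "):
            return False  # spaces are not enciphered and must line up
        if c == " ":
            continue
        # Each cipher letter needs one plain letter, and vice versa.
        if enc.setdefault(c, p) != p or dec.setdefault(p, c) != c:
            return False
    return True


print(is_valid_decoding("vtm rrpgegtmt", "psi oogenesis"))
print(is_valid_decoding("eupplcycw ch", "snookered em"))
```

  Finding decodings, rather than checking them, means running this kind
of test against every dictionary word with the same letter pattern,
which is why each added word prunes the possibilities so quickly.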

  The longest solvable ciphertext we found has five words, but none
of its solutions are grammatically interesting.  The longest single
word with a valid English counterpart is

814790: wpbjngstikmnuydo = VENTRICULOGRAPHY

and this is the only 16-letter specimen we found.  The 15-letter
ones we discovered are:

lmrqvdbzyoianjp  
bgcnmqpjruylkhx  
kiaptmhgzfyvxwo  
gdqaborwyjxkput  
ezlfybqgpkasoiu
sypudkxgfmictob = DERMATOGLYPHICS 

qmerjilgmaudxuv
sgvyhqpngwcdtcb = POLYDAEMONISTIC 

bvtxuhzpoyperwg
vyzmqngwhuwjkde = SULPHOCARBAMIDE 

vmbhpewspojgwdt = AMPHIBOLIFEROUS 
mqnykhpqwoajkut = HYPERGLYCOSURIA 
ksinnpdchrifaeg = UNAPPROXIMATELY 
aciawzysqhaxmni = ENCEPHALOMETRIC 
vlvqejfjclsumia = MEMBRANACEOUSLY 
jqhjtpubsrxkagv = SWASHBUCKLERING 
bwnyikdimofctdj = INTERPAROXYSMAL 
fobmxdpzsaukujb = PHENYLCARBIMIDE 
ojsxfwrzbtknvus = PREDISCOUNTABLE 

  DERMATOGLYPHICS appears six times because the cipher words
associated with it are 15-letter isograms (no letter repeats), and it is
well known that DERMATOGLYPHICS is the longest isogram in the unabridged
Merriam-Webster.
Two other letter patterns appear twice, as shown in the list above.

  In this article we have just scratched the surface in exploring
the digits of π as text.  Many challenges remain, including extending
the search past 1,000,000 letters, searching for text in other languages,
and using non-Roman alphabets.


Post Script

  Though this does not seem to be a useful way of looking at all the digits
of π, we mustn't fail to note one last logological property.  Write π as
usual in decimal, and group the digits as follows:

  3. 14 15 9 26 5...

and then make the obvious substitution A=1, B=2, etc.  You get C.NOIZE, which
is rather fitting, because the random nature of π's digits means that when
you look at it you SEE NOISE!
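
  The substitution is short enough to spell out in code.  In this small
sketch the grouping of the digits is the one shown above and the
function name is made up:

```python
def decode(groups):
    """Map the numbers 1..26 to the letters A..Z."""
    return "".join(chr(ord("A") + g - 1) for g in groups)


print(decode([3]) + "." + decode([14, 15, 9, 26, 5]))
```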