
Crypto Analysis to Crack Vigenere Ciphers

(This post assumes some familiarity with both Vigenere and Caesar Shift Ciphers.  You can do some background reading on them here first).

We can crack a Vigenere Cipher using mathematical analysis.  Vigenere Ciphers are more difficult to crack than Caesar Shifts, but they are still susceptible to mathematical techniques.  As an example, say we receive the code:

VVLWKGDRGLDQRZHSHVRAVVHZKUHRGFHGKDKITKRVMG

If we know it is a Vigenere Cipher encoded with the word CODE then we can create the following decoding table.

[Figure: decoding table with four shifted alphabets, one row for each letter of CODE]

Here we have 4 alphabets, each starting with one of the letters of the code word.  To decode we cycle through the alphabets.  The first code letter is V so we find this in the C row and then look at the letter at the top of the column – this is T.  This is our first letter.  Next the second code letter is also V, but this time we find it in the O row.  The column letter corresponding to this is H.  We continue this method, which gives the decoded sentence:

THIS IS AN EXAMPLE OF HOW THE VIGENERE CIPHER WORKS
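The table lookup described above can be sketched in a few lines of Python. This is only an illustration of the row-and-column method, assuming upper-case letters with no spaces; the function name `vigenere_decrypt` is made up for this example.

```python
from string import ascii_uppercase as ALPHABET

def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """Decode by cycling through the keyword, one shifted alphabet per letter."""
    plain = []
    for i, letter in enumerate(ciphertext):
        key_letter = keyword[i % len(keyword)]
        start = ALPHABET.index(key_letter)
        # The row for key_letter is the alphabet rotated to start at key_letter.
        row = ALPHABET[start:] + ALPHABET[:start]
        # The column where the code letter sits in that row gives the plain letter.
        plain.append(ALPHABET[row.index(letter)])
    return "".join(plain)

message = "VVLWKGDRGLDQRZHSHVRAVVHZKUHRGFHGKDKITKRVMG"
print(vigenere_decrypt(message, "CODE"))
# THISISANEXAMPLEOFHOWTHEVIGENERECIPHERWORKS
```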

How do we know what cipher to use? 

In any kind of cryptanalysis we need to decide which technique has been used.  Say, for example, we receive the message:

GZEFWCEWTPGDRASPGNGSIAWDVFTUASZWSFSGRQOHEUFLAQVTUWFV
JSGHRVEEAMMOWRGGTUWSRUOAVSDMAEWNHEBRJTBURNUKGZIFOHR
FYBMHNNEQGNRLHNLCYACXTEYGWNFDRFTRJTUWNHEBRJ

In real code breaking we won’t have a message alongside it saying, “Use a Vigenere Cipher.”  A large part of the skill of code breaking is deciding which encoding technique has been used.  For our received message we have the following letter frequencies:

[Figure: bar chart of letter frequencies in the received message]

So, in this case, is it best to look for a Caesar Shift or a Vigenere Cipher?  To find out, we need to measure how “smooth” the bar chart is and how it compares with the expected frequencies.  The expected values in English are:

[Figure: bar chart of expected letter frequencies in English]

A Caesar Shift simply shifts every letter in the message by a given number of letters in the alphabet, so we would expect a frequency bar chart for a Caesar Shift to have the same peaks and troughs (just shifted along).  The Vigenere makes frequency analysis more difficult because it “smooths out” the frequencies – this means that the bar chart for the frequency will be less spiky and more uniform.
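As a small check of the first claim, the sketch below shows that a Caesar Shift only relabels the bars: the list of letter counts, sorted from largest to smallest, is unchanged. The helper names `caesar_shift` and `frequency_profile` are invented for this illustration, and the sample text is the decoded sentence from earlier in the post.

```python
from collections import Counter
from string import ascii_uppercase as ALPHABET

def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter forward by `shift` places, wrapping round after Z."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in text)

def frequency_profile(text: str) -> list:
    """Letter counts sorted from most to least common, ignoring which letter is which."""
    return sorted(Counter(text).values(), reverse=True)

plain = "THISISANEXAMPLEOFHOWTHEVIGENERECIPHERWORKS"
shifted = caesar_shift(plain, 3)

# Same peaks and troughs, just attached to different letters.
print(frequency_profile(plain) == frequency_profile(shifted))  # True
```

Running `frequency_profile` on the Vigenere version of the same sentence (the CODE example above) gives a different, flatter list, because each plain letter is spread across several cipher letters.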

Index of Coincidence

A mathematical way to check how smooth the bar chart is uses the Index of Coincidence (I.C.).  This method is outlined in this post on Practical Cryptography, and uses this formula:

[Figure: Index of Coincidence formula]

There is also a script on the site to work out the I.C. for us.  If we enter our received code we get an I.C. of 0.045.  We would expect an I.C. of around 0.067 for a regular distribution of English letters (which we would find in a Caesar Shift, for example), and around 0.038 for letters chosen uniformly at random.  Our value sits well below 0.067, which is a clue that we have a Vigenere Cipher rather than a Caesar Shift.
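For readers who would rather compute this themselves, here is a minimal sketch. It assumes the usual definition of the Index of Coincidence, the sum of n(n − 1) over each letter count n, divided by N(N − 1) for a text of N letters; that this matches the formula on the Practical Cryptography page is an assumption, and the function name is made up.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """Chance that two letters picked at random from the text are the same."""
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    # Assumed definition: sum of n(n-1) over letter counts, divided by N(N-1).
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))

code = (
    "GZEFWCEWTPGDRASPGNGSIAWDVFTUASZWSFSGRQOHEUFLAQVTUWFV"
    "JSGHRVEEAMMOWRGGTUWSRUOAVSDMAEWNHEBRJTBURNUKGZIFOHR"
    "FYBMHNNEQGNRLHNLCYACXTEYGWNFDRFTRJTUWNHEBRJ"
)
print(round(index_of_coincidence(code), 3))
```

The printed value can be compared with the 0.045 quoted above. A spikier bar chart gives a larger value, a flatter one a smaller value.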

Exploiting the cyclic nature of the Vigenere Cipher

So, we suspect it is a Vigenere Cipher.  Next we want to find the code word that was used to generate the code table.  To do this we can look at the received code for repeating groups of letters.  There is a cyclic nature to the Vigenere Cipher, so there will also be a cyclic nature to the encoded message.

Using the site Crypto Corner we can analyse the text for repeating patterns of letters.  This gives us:

[Figure: Crypto Corner output showing repeated letter patterns in the message]

This clearly indicates that a lot of letters repeat with a period of 3.  Therefore it is a good guess that the keyword also has length 3.
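One common way to look for this kind of period by hand is to find repeated groups of letters and measure the gaps between them. The sketch below does that for groups of three letters and then counts which small keyword lengths divide the gaps. It is not necessarily what the Crypto Corner tool does, and the function names are invented for this example.

```python
from collections import Counter, defaultdict

def repeat_spacings(text: str, size: int = 3) -> dict:
    """Map each repeated group of `size` letters to the gaps between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(text) - size + 1):
        positions[text[i:i + size]].append(i)
    return {group: [b - a for a, b in zip(where, where[1:])]
            for group, where in positions.items() if len(where) > 1}

def factor_votes(spacings: dict, max_length: int = 10) -> Counter:
    """Count, for each candidate keyword length, how many gaps it divides exactly."""
    votes = Counter()
    for gaps in spacings.values():
        for gap in gaps:
            for length in range(2, max_length + 1):
                if gap % length == 0:
                    votes[length] += 1
    return votes

code = (
    "GZEFWCEWTPGDRASPGNGSIAWDVFTUASZWSFSGRQOHEUFLAQVTUWFV"
    "JSGHRVEEAMMOWRGGTUWSRUOAVSDMAEWNHEBRJTBURNUKGZIFOHR"
    "FYBMHNNEQGNRLHNLCYACXTEYGWNFDRFTRJTUWNHEBRJ"
)
spacings = repeat_spacings(code)
print(spacings["WNH"])                      # [57], and 57 = 3 x 19
print(factor_votes(spacings).most_common(3))
```

A gap that is a multiple of the keyword length is what we would expect when the same plain text happens to line up with the same part of the keyword twice.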

So, next we can split the received message into 3 separate messages:

GFEPRPGAVUZFRHFQUVGVAOGURADEHRBNGFRBNQRNYXYNRRUHR
ZWWGAGSWFAWSQELVWJHEMWGWUVMWEJUUZOFMNGLLATGFFJWEJ
ECTDSNIDTSSGOUATFSREMRTSOSANBTRKIHYHENHCCEWDTTNB

Here we have simply generated the first line by taking the first, fourth, seventh, tenth etc. letters, the second line by starting from the second letter, and the third line by starting from the third.
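In Python this split is a one-line slice. The sketch below is just an illustration; the name `split_columns` is made up.

```python
def split_columns(text: str, period: int) -> list:
    """Take every `period`-th letter, once for each possible starting position."""
    return [text[start::period] for start in range(period)]

code = (
    "GZEFWCEWTPGDRASPGNGSIAWDVFTUASZWSFSGRQOHEUFLAQVTUWFV"
    "JSGHRVEEAMMOWRGGTUWSRUOAVSDMAEWNHEBRJTBURNUKGZIFOHR"
    "FYBMHNNEQGNRLHNLCYACXTEYGWNFDRFTRJTUWNHEBRJ"
)
for line in split_columns(code, 3):
    print(line)
```

Each line was encoded with a single letter of the keyword, so each one is an ordinary Caesar Shift on its own.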

Cracking the code

Now we can do three separate Caesar Shift tests on these separate lines:

The first line has frequency:

[Figure: letter frequencies for the first line]

which strongly suggests that R in the cipher text is going to E (R is the most common letter in this line, and E is the most common letter in English).  This gives us the following Caesar Shift:

[Figure: Caesar Shift table mapping cipher text R to plain text E]

The second line has the following frequency:

[Figure: letter frequencies for the second line]

which strongly suggests that W in the cipher text is going to E.  This gives us:

[Figure: Caesar Shift table mapping cipher text W to plain text E]
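The "most common letter is probably E" rule used for these two lines can be written down directly. This is a rough sketch of that one heuristic, not a full frequency analysis, and the function name `guess_key_letter` is hypothetical.

```python
from collections import Counter

def guess_key_letter(line: str, assumed_plain: str = "E") -> str:
    """Guess the keyword letter by assuming the most common cipher letter is E."""
    most_common = Counter(line).most_common(1)[0][0]
    shift = (ord(most_common) - ord(assumed_plain)) % 26
    # A shift of 0 corresponds to the key letter A, 1 to B, and so on.
    return chr(ord("A") + shift)

first = "GFEPRPGAVUZFRHFQUVGVAOGURADEHRBNGFRBNQRNYXYNRRUHR"
second = "ZWWGAGSWFAWSQELVWJHEMWGWUVMWEJUUZOFMNGLLATGFFJWEJ"
print(guess_key_letter(first), guess_key_letter(second))  # N S
```

The rule is not reliable on every short text. In the third line the most common letter is T rather than the encoding of E, so some other clue is needed there.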

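Lastly, we notice that this will give us the codeword NS_: a shift that sends R to E corresponds to the key letter N, and a shift that sends W to E corresponds to the key letter S.  NSA (the American digital spy agency) would be a good guess, so for the third Caesar Shift we try: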

[Figure: Caesar Shift table for the third line, using the key letter A]

Putting these together we have the Vigenere Cipher:

[Figure: Vigenere decoding table for the keyword NSA]

and this decodes our received code as:

THE SECRET CODE IS CONTAINED IN THIS MESSAGE.  YOU MUST ADD THE FIRST PRIME NUMBER TO THE SECOND SQUARE NUMBER TO CRACK THIS. WHEN YOU HAVE DONE THAT CLICK BELOW AND ENTER THE NUMBER.
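The whole process can be checked by undoing the three Caesar Shifts separately and weaving the results back together. The sketch below assumes the keyword NSA found above; the function names are made up for this illustration.

```python
from itertools import zip_longest

def caesar_decrypt(text: str, key_letter: str) -> str:
    """Undo a Caesar Shift whose size is given by a key letter (A = 0, B = 1, ...)."""
    shift = ord(key_letter) - ord("A")
    return "".join(chr((ord(c) - ord("A") - shift) % 26 + ord("A")) for c in text)

def decrypt_by_columns(ciphertext: str, keyword: str) -> str:
    """Decode each every-nth-letter line with its own shift, then re-interleave."""
    period = len(keyword)
    lines = [caesar_decrypt(ciphertext[i::period], k) for i, k in enumerate(keyword)]
    # zip_longest copes with the last line being one letter shorter.
    return "".join("".join(group) for group in zip_longest(*lines, fillvalue=""))

code = (
    "GZEFWCEWTPGDRASPGNGSIAWDVFTUASZWSFSGRQOHEUFLAQVTUWFV"
    "JSGHRVEEAMMOWRGGTUWSRUOAVSDMAEWNHEBRJTBURNUKGZIFOHR"
    "FYBMHNNEQGNRLHNLCYACXTEYGWNFDRFTRJTUWNHEBRJ"
)
print(decrypt_by_columns(code, "NSA"))
```

The output has no spaces or punctuation, so the word breaks in the decoded message above still have to be added by eye.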

We have done it!  We have cracked the Vigenere Cipher using a mixture of statistics, logic and intuition.  The method may seem long, but this was a cipher that was thought to be unbreakable – and indeed took nearly 300 years to crack.  Today, using statistical algorithms it can be cracked in seconds.  Codes have moved on from the Vigenere Cipher – but maths remains at the heart of both making and breaking them.

If you enjoyed this post you might also like:

The Maths Code Challenge – three levels of codes to attempt, each one providing a password to access the next code in the series.  Can you make it onto the leaderboard?

RSA public key encryption – the code that secures the internet.