Belmont Club

By Richard Fernandez

A Comment About

Dancing In the Dark

July 25, 2010 - 6:59 pm - by Richard Fernandez
Paul Milenkovic
2010-07-26 06:36:04

Claude Shannon’s great contribution was the idea that statistical entropy is a measure of information.

Suppose I want to transmit one of 256 possible codes. I can do this by transmitting one 8-bit character (a byte, as in extended ASCII), since log_2(256) = 8 bits. Suppose I want to transmit one of 512 codes: that takes one more bit, for a total of 9 bits.
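The arithmetic above can be checked with a minimal sketch. The function name bits_needed is made up for illustration, and it assumes a fixed-length code in which every code is equally likely and gets the same whole number of bits.

```python
def bits_needed(num_codes: int) -> int:
    """Whole bits needed to label num_codes distinct codes with a fixed-length code.

    Hypothetical helper for illustration. Equivalent to ceil(log2(num_codes)),
    computed with integer arithmetic to avoid floating-point rounding.
    """
    if num_codes < 1:
        raise ValueError("need at least one code")
    # The largest label is num_codes - 1; its bit length is the width required.
    return (num_codes - 1).bit_length()


print(bits_needed(256))  # 8
print(bits_needed(512))  # 9
```

Doubling the number of codes adds exactly one bit, which is why 512 codes cost 9 bits rather than 16.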

Suppose the codes I want to transmit are not all equally likely: “e”, “s”, and “t” are transmitted all the time (think of the letter-guessing strategy in Wheel of Fortune), while “@” is transmitted rarely, if at all. A symbol that occurs with probability p_i behaves like one of 1/p_i equally likely possibilities, so it carries log_2(1/p_i) bits. Weighting each symbol’s bit count by its probability gives the average number of bits per symbol, which is the entropy of the information source:

H = sum_i p_i log_2(1/p_i)
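As a rough sketch of the formula, the snippet below computes H for two made-up distributions. The name entropy_bits and the example probabilities are assumptions for illustration, not measurements of any real source.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = sum of p_i * log2(1/p_i), in bits per symbol.

    Hypothetical helper for illustration. Symbols with zero probability
    are skipped, since they contribute nothing to the sum.
    """
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


# 256 equally likely codes: every symbol carries the full 8 bits.
uniform = [1 / 256] * 256

# A skewed four-symbol source (illustrative numbers only).
skewed = [0.5, 0.25, 0.125, 0.125]

print(entropy_bits(uniform))  # 8.0
print(entropy_bits(skewed))   # 1.75
```

The uniform case recovers the 8 bits of the equally likely example, while the skewed source averages 1.75 bits per symbol, below the 2 bits a fixed-length code for four symbols would use. The more predictable the source, the lower H falls.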

What we are dealing with in JournoList, ClimateGate, and now WikiLeaks are sources with very low entropy H. There is very little information in these sources as they all appear to be mappings of the same underlying code book with very few distinct symbols in it.