In the 1930s, “computer” was a job description: someone, usually a woman of mathematical bent, with an adding machine and a big sheet of columnar paper who performed a rigorous routine of hand calculations, using paper and pencil, slide rules and tables of logarithms. Stone knives and bearskins weren’t involved, but to modern eyes they might as well have been.

Large research organizations and the Department of War had a few special-purpose mechanical computers intended to integrate differential equations. Vannevar Bush (who deserves his own article someday) brought a young grad student to MIT to work on the differential analyzer, a relatively advanced version of these. A version of the differential analyzer appears in the film *Earth vs. the Flying Saucers*, where it is applied to a problem for which it was utterly unsuited.

This young man, a recent graduate of the University of Michigan, was named Claude Shannon, Jr. Shannon, while working on the differential analyzer, had the insight that these same computations could be done using combinations of a few simple circuits that performed basic logical operations on true and false values. He described how this could be done, and in doing so invented the whole concept of digital circuits, which derive from Shannon’s Master’s thesis on what he called *switching theory*.
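To make the idea concrete, here is a minimal sketch in Python of arithmetic built from nothing but true/false operations. The function names (`half_adder`, `full_adder`, `add_bits`, `to_bits`, `from_bits`) are illustrative assumptions, not anything taken from Shannon’s thesis, and Python’s boolean operators stand in for the physical switching circuits.

```python
def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two one-bit values; return (sum, carry)."""
    return a ^ b, a and b  # XOR gives the sum bit, AND gives the carry


def full_adder(a: bool, b: bool, carry_in: bool) -> tuple[bool, bool]:
    """Add two one-bit values plus an incoming carry; return (sum, carry_out)."""
    partial, carry1 = half_adder(a, b)
    total, carry2 = half_adder(partial, carry_in)
    return total, carry1 or carry2


def add_bits(x: list[bool], y: list[bool]) -> list[bool]:
    """Add two equal-length bit lists, least significant bit first.

    The result has one extra bit to hold the final carry.
    """
    result, carry = [], False
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)
    return result


def to_bits(n: int, width: int) -> list[bool]:
    """Hypothetical helper: turn an integer into `width` true/false values."""
    return [bool((n >> i) & 1) for i in range(width)]


def from_bits(bits: list[bool]) -> int:
    """Hypothetical helper: turn true/false values back into an integer."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)
```

For instance, `from_bits(add_bits(to_bits(5, 4), to_bits(6, 4)))` adds 5 and 6 using only AND, OR, and XOR on true and false values.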

At about the same time, Alan Turing wrote his series of famous papers on computability; those papers included an idea of how a computer with memory might work, but without Shannon’s switching theory, no one knew how to actually *build* one. (Google did a great Google Doodle for Turing’s 100th birthday.)

Vannevar Bush then sent Shannon to the Cold Spring Harbor Laboratory. Shannon worked with biologists and geneticists and (remember, this was before the structure of DNA had been worked out) described how genetics could be understood as an *algebra* using a small collection of symbols. This was enough to get Shannon a Ph.D. but attracted little attention at the time. However, his Ph.D. thesis is now recognized as prefiguring what we call *bioinformatics*.

During the war, Shannon, still working for the War Department, was put to work on cryptography, where he merely invented a general mathematical basis for nearly all cryptography, and along the way proved that there is one and exactly one method of making an unbreakable cipher. This is called a *one-time pad*.
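As a rough illustration of how a one-time pad works, here is a minimal Python sketch. It assumes the common modern formulation in which each byte of the message is combined with one byte of the pad using XOR; the names `make_pad` and `xor_bytes` and the sample message are hypothetical. The unbreakability depends on the pad being truly random, at least as long as the message, kept secret, and never reused.

```python
import secrets


def make_pad(length: int) -> bytes:
    """Generate a fresh random pad exactly as long as the message."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of `data` with the matching byte of `pad`.

    The same function both encrypts and decrypts, because XOR-ing
    twice with the same pad byte gives back the original byte.
    """
    if len(pad) != len(data):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"ATTACK AT DAWN"            # illustrative message, an assumption
pad = make_pad(len(message))           # use once, then destroy
ciphertext = xor_bytes(message, pad)   # what gets sent
recovered = xor_bytes(ciphertext, pad) # what the receiver computes
```

Without the pad, every message of the same length is an equally plausible decryption of `ciphertext`, which is the sense in which the cipher cannot be broken.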

But this wasn’t enough. He went to work for Bell Labs, and began thinking about radio or telephone signaling. (His original switching theory was already the basis for new telephone switches; direct telephone dialing depended on Shannon’s Master’s thesis.) What was common to all these different ways of signaling we already used: telegraph, telephone, radio, and that new-fangled thing television? Shannon had a surprising insight: what made a signal a signal was whether or not you could predict it.

To understand this, think about a game of 20 questions. You and an opponent are playing. Your opponent thinks of something, you ask the standard first question of “animal, vegetable, or mineral?”, and then you have to guess your opponent’s “thing” with no more than 19 further questions. The only other rules are that your opponent can’t lie, and the questions have to be yes or no questions. If you guess it correctly, you win; if you run out of questions, your opponent wins.

Surprisingly often, a skillful player can guess in considerably fewer than 20 questions, as each question reduces the collection of possible answers.

Now, here’s Shannon’s big insight (and if it doesn’t seem big now, just wait a minute): if you have no more than about 1.6 million choices (really, 1,572,864: three possible answers to the first question, times 2 to the 19th power for the 19 yes or no questions that follow) then you can *always* find the answer in 20 questions, or looking at it the other way, a game of 20 questions can distinguish about 1.6 million possible guesses. So getting a 20 questions game right on the first guess is literally a million to one shot.
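The arithmetic behind that figure is easy to check. The sketch below assumes the game as described above: one three-way question followed by 19 yes or no questions, each of which can at best double the number of things you can tell apart. The function name `max_distinguishable` is a hypothetical label for illustration.

```python
def max_distinguishable(first_answers: int = 3, yes_no_questions: int = 19) -> int:
    """Largest number of possibilities the game can always tell apart.

    The first question splits the field `first_answers` ways; every
    yes/no question after that can at best double the count.
    """
    count = first_answers
    for _ in range(yes_no_questions):
        count *= 2
    return count


total = max_distinguishable()  # 3 * 2**19
```

Here `total` comes out to 1,572,864, the figure quoted above. With 20 pure yes or no questions and no three-way opener, `max_distinguishable(1, 20)` gives 1,048,576.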

So, if you have two choices, say Republican or Democrat, then you can predict the answer after one question.

With three or four choices, say Ford, Mercedes-Benz, Volkswagen, or Chrysler, you can be sure you have the answer after two questions. Eight choices means three questions.
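One way to convince yourself of these numbers is to play the game mechanically. The following is a minimal sketch, assuming the choices are numbered from 0 and that every question has the form “is it in the lower half of what’s left?”; the names `questions_to_find` and `worst_case` are hypothetical.

```python
def questions_to_find(secret: int, choices: int) -> int:
    """Count the yes/no questions a halving strategy asks to pin down
    `secret` among the numbered choices 0 .. choices-1."""
    low, high = 0, choices  # remaining candidates are low .. high-1
    asked = 0
    while high - low > 1:
        mid = (low + high) // 2
        asked += 1  # ask: "is it less than mid?"
        if secret < mid:
            high = mid
        else:
            low = mid
    return asked


def worst_case(choices: int) -> int:
    """Most questions the halving strategy ever needs for this many choices."""
    return max(questions_to_find(s, choices) for s in range(choices))
```

Calling `worst_case` with 2, 3, 4, and 8 choices gives 1, 2, 2, and 3 questions, matching the counts above.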

So, if you have two choices, and you guess right the first time, you’re not very surprised. With eight, if you guess right the first time, you’re more surprised. With 1,572,864, if you get it first guess you’re *very* surprised.

Shannon’s first insight was that what we call “information” was basically a measure of the *size* of the surprise, and he could measure that with the number of yes or no questions you need to ask to distinguish among all the possibilities.

We measure “the size of the surprise” by that count of yes/no questions, and we call the unit, one yes/no question’s worth of information, a *bit*.

Information theory shows up in communications, too. In communications, the idea is to think of the amount of information as how well you can predict what the next message will be. You can see this every day on the news: watching MSNBC is usually very predictable, but other channels are less so. By the way, when something is *completely* unpredictable, we call it “random”. A random number is like throwing a fair die: getting a 5 *shouldn’t* give you any information about what the next throw will be.

Mathematically, the number of bits is the logarithm to the base 2 of the number of different, equally likely possibilities, but if that doesn’t mean anything to you (what do they *teach* kids in school these days?) don’t worry about it. What matters is that this one insight is the basis of what’s now called *information theory*, and as time has gone on, information theory shows up over and over again in describing the real world.
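For readers who do want the logarithm, here is a short Python sketch. `bits_for` follows the description above directly. `surprise_bits` is an added assumption that goes slightly beyond the text: it is the standard extension to outcomes that are not equally likely, where an outcome with probability p carries log2(1/p) bits. Both function names are hypothetical.

```python
import math


def bits_for(possibilities: int) -> float:
    """Bits needed to single out one of `possibilities` equally likely options."""
    return math.log2(possibilities)


def surprise_bits(probability: float) -> float:
    """Bits of surprise in seeing an outcome that had the given probability.

    Assumption beyond the article: the standard log2(1/p) measure.
    """
    return -math.log2(probability)


coin = bits_for(2)         # one yes/no question
eight = bits_for(8)        # three yes/no questions
die = bits_for(6)          # a fair die: between two and three questions
game = bits_for(1572864)   # the 20 questions game
```

The fractional values are the reason real games need the count rounded up: the fair die works out to about 2.585 bits, so three yes or no questions are needed in the worst case. A highly predictable message, one with probability close to 1, has a `surprise_bits` value close to zero.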

This one man, Claude Shannon, is directly responsible for computers, the internet, CDs, digital TV, really for digital *anything*. Although I haven’t gone into it here, information theory shows up in communications — Shannon’s information theory is directly responsible for the way cell phones and communications with space probes work — in biology, in finance, even in physics, where information theory is at the heart of much of what Stephen Hawking has been doing for the last 20 years. Nearly every bit of technology we use today that’s more complicated than a Phillips head screw is based on what Shannon did. And yet, most people have never heard of him.

Well, now you have.

******

*image courtesy shutterstock / ollyy*