Everyone is at least faintly familiar with the normal terms of classification: UNCLASSIFIED, CONFIDENTIAL, SECRET, and TOP SECRET. They’re defined by executive order, the most recent one being from 2009, but the standards have been more or less the same for many years:

TOP SECRET information is information that, if released, would cause “exceptionally grave” damage to national security;
SECRET information would cause “serious damage” to national security;
CONFIDENTIAL information would merely “damage” national security; and,
UNCLASSIFIED information, you guessed it, wouldn’t damage national security if released. (Strictly, UNCLASSIFIED isn’t a classification; it’s not defined in the executive order. But something that isn’t classified is marked (U), so it’s a very pedantic distinction.)

But these classifications are so not the whole story. Understanding what really happened with Snowden and the NSA data requires looking a bit deeper.

Risk

There is a nicely elegant way to measure how critical a piece of information really is, in dollars. We simply define the risk of some bad thing happening as the product of the cost associated with that bad thing, technically called the hazard, and the probability of it happening.

Risk = Probability × Hazard

All the sensitivity categories and all the rules are based on trying to reduce the risk. When what you’re trying to evaluate is classified under this U.S. government system, though, those risk numbers can get pretty astronomical. “Exceptionally grave” damage? The 9/11 attack cost the U.S. economy something like a trillion dollars. Let’s work some examples:

If there’s one chance in a thousand of the bad thing happening, then the risk is a billion dollars.
If there’s one chance in a million, then the risk is still a million dollars.
If there’s one chance in a billion, the risk is still a thousand dollars.
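
To make the arithmetic concrete, here is a minimal Python sketch of the same three examples. The trillion-dollar hazard is the rough 9/11 figure from the text; the function name `risk` and the use of exact fractions are just illustrative choices.

```python
from fractions import Fraction

HAZARD_DOLLARS = 10**12  # rough figure from the text: about a trillion dollars


def risk(probability, hazard):
    """Risk = Probability x Hazard, i.e. the expected cost in dollars."""
    return probability * hazard


# The three cases worked in the text: one chance in a thousand, million, billion.
for one_in in (10**3, 10**6, 10**9):
    dollars = risk(Fraction(1, one_in), HAZARD_DOLLARS)
    print(f"1 chance in {one_in:,}: risk is ${int(dollars):,}")
```

Exact fractions are used only so that tiny probabilities don’t pick up floating-point rounding noise; ordinary floats would give the same answers to the nearest dollar.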

And 9/11, as traumatic as it was, is relatively small compared to what might happen in the case of “exceptionally grave” damage. When you quantify the risk, it’s easy to see why these secrets are worth all the effort.

Rules of the Game

Dealing with classified information has a lot of rules associated with it, of course. (The Federation of American Scientists has a nice set of slides on the rules on the web here, and there’s another good page here.) There are standards for how it’s stored, how it’s transmitted, and how — and where — it can be used, all based on trying to protect sensitive information according to its level of sensitivity. Those rules are based around some basic assumptions: the fewer people who know the information the better; the better we understand who has had access to information, the more likely we are to be able to protect it; and at any moment, there is some individual responsible for any piece of classified information.

Because of these rules, managing sensitive information is difficult, and things that are difficult are hard. And expensive. So there are tradeoffs between the cost and difficulty of managing the information and the desire to protect it.

So what are these rules?

First of all, you need to try to make sure that the people you make responsible for sensitive information are trustworthy. So you do more and more extensive checks of the background of people who get that responsibility. More on that shortly.

Second, you reduce the number of people who have access to any particular piece of information. There is a lot of information classified TOP SECRET, and even more at lower sensitivity levels. I don’t think I’ve ever seen real numbers, but based on my experience I’d guess that there is ten times as much SECRET information as TOP SECRET, and ten times as much CONFIDENTIAL as there is SECRET.

But that doesn’t tell the whole story either, for several reasons. First, classification is “catching” — documents are classified on a paragraph by paragraph basis. If there’s one piece of TOP SECRET in a paragraph, that whole paragraph is classified TOP SECRET, marked by putting a (TS) at the beginning of the paragraph. If there’s a paragraph, or part of a paragraph, marked (TS) on a page, the whole page is marked TOP SECRET at the top. If there’s a page of TOP SECRET in a document, the whole thing is marked TOP SECRET.
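
The “catching” rule amounts to taking a running maximum. Here is a minimal sketch, assuming each paragraph already carries its own marking and that a document is simply a list of pages. The `Level` enum and the function names are made up for illustration and don’t correspond to any real marking tool.

```python
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def page_marking(paragraph_levels):
    """A page is marked at the level of its most sensitive paragraph."""
    return max(paragraph_levels, default=Level.UNCLASSIFIED)


def document_marking(pages):
    """A document is marked at the level of its most sensitive page."""
    return max((page_marking(p) for p in pages), default=Level.UNCLASSIFIED)


# Hypothetical three-page document; each inner list holds paragraph markings.
pages = [
    [Level.UNCLASSIFIED, Level.UNCLASSIFIED],
    [Level.UNCLASSIFIED, Level.TOP_SECRET, Level.CONFIDENTIAL],
    [Level.SECRET],
]

print([page_marking(p).name for p in pages])
print(document_marking(pages).name)
```

One (TS) paragraph on the second page is enough to make the whole document TOP SECRET, even though most of its paragraphs are unclassified.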

Add to that, no one was ever fired for classifying something too highly. Oh, there are counter-pressures, the biggest one being that something that’s highly classified is what is known in the trade as “a pain in the ass” or PITA. But still, it’s better to err on the side of caution.

Of course, these two things mean that there’s a lot of material out there classified (TS) that isn’t particularly sensitive, but it requires a process, with forms and signatures and such, to reduce the classification of a document. (Which, just so you can sound knowledgeable for your friends, is called “downgrading” the document. Preparing a new document with the sensitive stuff removed or blocked out is known as “sanitizing” the document.)

The second major issue, though, is something known as the aggregation problem. Simply put, the problem is this: the more information you have, the more likely you are to be able to deduce something really sensitive from it.

If you’re a bad guy, a Black Hat, and you know that a particular person works for the Department of Defense, that’s not particularly interesting. There are a lot of people in the DC area who work for the Department of Defense. But if you find out that this same person works at Fort George Meade in Maryland, it becomes more interesting: basically, they’re either working at NSA, or at the DoD side of the intelligence world, the Defense Intelligence Agency, or they’re in the Army Band.

If you then find out they’re tone-deaf, you’ve got something.

Compartmented Information

Because of aggregation and some related issues that I’ll mercifully not explain, another big part of classification is something called compartmentalization: you break the information down into related groups by project, or a target of interest, or by source, or by some technical factor like how it was obtained. There are also compartments involved with how the information can be transmitted, up to and including the famous guy with the briefcase handcuffed to his wrist.

Often these compartments have names that are literally words drawn at random from a pool of possible words. These are called codewords.

When someone tells you about something classified “above TOP SECRET” they’re not quite telling the truth — there is no classification level above TOP SECRET — but what they mean is that it’s information that has to be handled by special channels, or it’s peculiarly sensitive, like information about codes and cryptography, or it’s under a restrictive codeword.

Access to Compartmented Information

I told you we’d come back to the problem of determining if you’re actually trustworthy; here we are. For a low-level clearance, like for CONFIDENTIAL or SECRET, all that’s done is a National Agency Check. Basically they look at public records to see if you have a criminal record, or something similar.

For codeword information though, you have a much more thorough examination: they not only look at any open records, but they check back on places you’ve lived, and literally send people out to talk to people you’ve known in order to confirm that everything you told them is true, to check if you ever had any questionable ties (like joining a Tea Party group I guess), and to build up a picture of your life in order to look for anything that makes you seem untrustworthy.

Survive that and you then get interrogated by a specially trained agent while on a polygraph.

This whole process is very expensive, probably now upwards of a million dollars, and people who have a full polygraph clearance are quite rare.

Edward Snowden, Remember Him?

This finally takes us back to Snowden, the heroic traitorous whistleblower spy. It’s a lot easier to understand what happened now that we understand just what the context of all this really is. In Snowden’s case, it’s not just the talk about the NSA leaks, but the question of how a mere contractor got cleared for this stuff and how he got access to whatever information he actually has.

So what do we know of his history? He enlisted in the Army, apparently intending to go for the Special Forces, but was discharged after breaking both legs. (It’s been widely reported he enlisted in the Special Forces, but it doesn’t work like that. Four months in, he was in his initial training for whatever military specialty he was looking for.) However, by the time he’d been in the Army that long, he had at least a CONFIDENTIAL and probably a SECRET clearance, those being about as exclusive as a thunderstorm.

This undoubtedly helped him get work as a security guard at the Center for Advanced Study of Language of the University of Maryland. NSA does quite a lot of research in teaching and acquiring languages, because the people they listen to stubbornly refuse to speak English, but this is a very open organization. Still, I imagine it was while he was working there that he got started in the process to get the extended clearance, which took me a year back in the ’70s. Then he was hired by CIA to “work in Computer Security.”

Now we get to one of the things that a lot of people have said: “how did a guy with no college degree and just a GED get hired to work in IT security?” But there’s a basic misunderstanding there: not everyone “working in IT security” is doing research. CIA has a bunch of computers, and they need a bunch of systems administrators.

Would CIA hire someone with a clearance and some computer skills, but no degree? You bet your ass they would. Especially for a sysadmin job. But it’s also the sysadmin job that explains what Snowden may have been able to get access to (and what he probably didn’t really have access to). To explain that we’re going to have to do a little more exposition, this time on computer security in this world.

Like everything in computer science, there’s some math involved, but I promise I won’t go into it much. Here’s the basic idea. When you examine all the rules for sensitivity levels and classifications, it turns out that the real classification of some chunk of data is composed of three parts:

(sensitivity, channel, codeword)

So one file on a computer might be marked “UNCLASSIFIED, carrier pigeon only, alpha” and another file might be “SECRET, regular channels, none.” All of these lists of three things, “triples,” exist in a relationship technically called dominates: if one triple is “more classified” than another triple, the first triple “dominates” the second.

There is an example of this in the NSA slides Snowden released; the Washington Post has published some of them. This one is marked “Top Secret, SI, NOFORN,” where SI is the channel.

These form a partial order, which just means that we can’t always compare two elements and see which dominates the other. That Wikipedia article has a good example in street addresses: you know that 300 Pike Street is a higher address than 100 Pike Street, but you don’t necessarily know where 300 Pike Street is compared to 300 Walnut Street.

What’s important, though, is that in general, to have access to information marked (X,Y,Z) you have to be cleared for X, have access to special channel Y, and you have to be individually “read into” information for codeword Z. But it turns out to be convenient to define two special markings, system high and system low. System high is simply defined to be higher than everything else: it dominates anything. System low is dominated by anything. UNIX geeks will recognize system high as being just like root or “superuser” access.
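
A rough sketch of the dominates relation in Python may help here. It assumes the usual textbook lattice formulation: one marking dominates another if its sensitivity is at least as high and it covers all of the other’s channels and codewords. Several things are assumptions made purely for illustration: modeling channels and codewords as sets, treating “regular channels” and “none” as empty sets, and the names `Marking`, `dominates`, `SYSTEM_HIGH`, and `SYSTEM_LOW`. None of this is how any real multilevel system stores its labels.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Marking:
    level: Level
    channels: frozenset = frozenset()   # empty set: regular channels (assumption)
    codewords: frozenset = frozenset()  # empty set: no codeword (assumption)


# Special markings: one above everything, one below everything.
SYSTEM_HIGH = object()
SYSTEM_LOW = object()


def dominates(a, b):
    """True if marking a is at least as restrictive as marking b in every part."""
    if a is SYSTEM_HIGH or b is SYSTEM_LOW:
        return True
    if a is SYSTEM_LOW or b is SYSTEM_HIGH:
        return False
    return (
        a.level >= b.level
        and a.channels >= b.channels      # superset test on sets
        and a.codewords >= b.codewords
    )


# The two example files from the text.
pigeon = Marking(Level.UNCLASSIFIED, frozenset({"carrier pigeon"}), frozenset({"alpha"}))
regular = Marking(Level.SECRET)

print(dominates(pigeon, regular))  # False: lower sensitivity
print(dominates(regular, pigeon))  # False: lacks the channel and the codeword
print(dominates(SYSTEM_HIGH, pigeon), dominates(pigeon, SYSTEM_LOW))  # True True
```

Neither example marking dominates the other, which is exactly what makes this a partial order rather than a simple ranking. In this sketch a person’s clearance can be written as a `Marking` too, and access is allowed only when the clearance dominates the file’s marking.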

In a real system implementing this scheme, what’s called a multilevel secure system, in theory you control the access to each marking, each compartment, separately. So you restrict the access privileges for even the system administrators so they can only deal with certain information.

This quickly runs into the PITA problem, though, as you get more markings and time goes on. The sysadmin for (A,B,C) goes on maternity leave, and someone has to take up the slack; a new compartment is added, so someone has to get access, which normally means being “read into” the particular program or codeword, which takes paperwork; the ambassador forgets his password, and doesn’t have time to go through the secure process so someone has to be able to set the password for his account for him. Bit by bit, as people get sloppy over time, it turns out that there’s usually someone who has system-high access, “root” access. (One of the few commercial systems that supports this completely is Oracle’s Solaris operating system. I’ve set up a number of systems with the full access control, which means there’s not even a root account; I don’t think that has lasted a day on any system I’ve seen.)

I’m willing to bet that Snowden ended up — possibly with some quiet pushing on his part — with access to a system-high account, from which he could see anything on the system.

Doubts

That said, though, I seriously doubt a lot of what he’s told Glenn Greenwald. Take for example the story he told about being able to tap any phone, even the president’s. The story has a bunch of issues, but let’s just talk about the technical points.

When we talk about “compartmented” information, it’s not just a metaphor. These sorts of data are worked with in literal compartments, physically separate rooms with thoroughly annoying safeguards to keep things from leaking from one to the next. (For example, they always play background music to make it harder to overhear conversations. Where I worked, they had two tapes: one was the Christmas music. The other one wasn’t.) These are called sensitive compartmented information facilities or SCIFs. So different compartments aren’t just marked differently; in many cases, they’re physically separated, with no connections, including network connections.

It turns out that what Snowden actually said was he could “wiretap anyone, even the president, if he had an email address,” which makes me wonder if he really meant “wiretap” at all; whatever else we might call it, collecting emails from someone isn’t wiretapping, and to listen in on a phone conversation the email isn’t much immediate help.

If we look at the slides, though, they themselves don’t have real intelligence contents — they’re slides about a program. The actual data would very likely be in another compartment, one we hope would be pretty strictly controlled. Add to that the point that we’ve never heard that the NSA was actually collecting the contents of phone calls — at least in a reliable way, as Rep Jerrold Nadler reported that once and then backed off, probably because he didn’t realize what a wiretap is either — and this looks very very unlikely.

So let’s think about Snowden himself for a minute. Here’s the biography paragraph from his Wikipedia entry:

By 1999, Snowden had moved with his family to Ellicott City, Maryland, where he studied computing at Anne Arundel Community College to gain the credits necessary to obtain a high school diploma, but he did not complete the coursework. Snowden’s father explained that his son missed several months of school owing to illness and, rather than return, took and passed the tests for his GED at a local community college. Snowden worked online toward a Master’s Degree at the University of Liverpool in 2011. Having worked at a US military base in Japan, Snowden reportedly had a deep interest in Japanese popular culture and studied the Japanese language. He also said he had a basic understanding of Mandarin, was deeply interested in martial arts, and listed Buddhism as his religion.

There’s nothing very suspicious in this — and it’s creepily similar to what my bio would have read at 29, except that I never got a gig in Japan — but it does give us an image of a guy who, at 29, felt like he hadn’t quite measured up and was looking for a way to Prove Everyone Wrong.

Also creepily similar to me at 29.

I went to graduate school; it looks like Snowden got the idea in his head to be a public figure. Once he did, showing off the slides and other things, I suspect he’s doing a lot of self-aggrandizement, especially to Greenwald, who will apparently believe anything as long as it proves the U.S. is Evil Incarnate.

That’s not to say Snowden did no damage: these slides are literally uncovering sources and methods, and while I have my own qualms about this kind of generalized data collection, he’s given a lot of details to Black Hats (Black Turbans?) who don’t want their data collected, or who want to introduce disinformation into the collection.

When I was cleared, I got a lengthy lecture, and took an oath, saying that I wouldn’t reveal the information I found out and reminding me that the maximum punishment for leaking something TOP SECRET was death. (It is. Look it up.) I’m sure Snowden got the same lecture, and took the same oath, and he chose to break both the oath and the law. If we can catch him, he should be punished. And don’t buy his paranoia about being killed: if we didn’t kill Philip Agee, we’re not going to bother with Snowden. That’s just one of those fairy tales that get told, like the ones about an agent in an operating room, prepared to kill a surgical patient if he starts to talk in his sleep.

Snowden’s now in a pickle, though — I think he did this primarily to feel important, and now that he is important, he has to keep the story going. Whether what he tells Greenwald is true or not. As the stories get more interesting, I think we should look at any further revelations very carefully.
