Flight delays are being reported up and down the East Coast, courtesy of a glitch in the new air traffic control computer system. The system went down Saturday morning and initially affected airports in Washington, D.C., and New York. But the outage quickly spread to Baltimore and other points, causing the cancellation of more than 200 flights and leaving passengers scrambling on a busy weekend. The Washington Post describes the system affected:
ERAM stands for En Route Automation Modernization and is the computer “system that processes flight and surveillance data, provides communications and generates display data to air traffic controllers.”
ERAM is part of a larger infrastructure update called NextGen, with the rollout beginning about 18 months ago. It was designed to allow air-traffic controllers the ability to handle many more flights and to improve guidance of flights throughout the country.
NBC News reports that the computer system, years behind schedule and massively over budget, was installed in the last 18 months. Aviation Today reported in 2012 on the problems being experienced by ERAM that set back installation of the system by 3 years. The emphasis is mine. Tell me these guys didn’t help design the healthcare.gov website:
Noting Lockheed Martin’s “unique expertise” in the en route automation environment, FAA intended to award it the ERAM upgrade in 2002 as a sole source, 10-year contract. However, Raytheon protested, causing FAA to establish an inquiry that subsequently upheld the protest, with Raytheon becoming a Lockheed Martin team member when the $2.1 billion contract was finally awarded in 2003.
The inquiry report revealed, however, that neither FAA nor Lockheed Martin appeared sure how the upgrade was to be accomplished, other than it would be an “incremental decomposition,” a term that evaded clear definition. Three officials gave three different views, while FAA’s integrated product team lead for en route said she did not understand how it could be accomplished or how often one would have to go into the system at any or all of the ARTCCs to perform the incremental modifications. “There’s nothing practical about this,” the FAA integrated product team lead testified. “This is the most complex thing that the agency will ever undertake.”
Her ERAM product team lead also did not know how the “fairly complex technical task” of decomposing old software to get it to run on a new platform in a modern language might be performed. FAA did not conduct a risk assessment to examine the potential costs and benefits of incremental decomposition, nor did it know exactly what had to be incrementally decomposed, although it did acknowledge that “this will necessitate multiple transitions of deployed ERAM functionality.”
Yet, as the integrated product team lead explained, one must know what the software is in order to decompose it. Unfortunately, the Host system had been built in stages, with new functions and capabilities progressively added.
Consequently, it consisted of a set of separate hardware and software components physically interfaced together, but without a common design, infrastructure or software environment: a software “bowl of spaghetti” as both FAA’s integrated product team lead and the associate administrator for research and acquisitions described it to the inquiry.
I will agree with the project lead on this point: it’s an enormously complex system that’s bound to have a lot of bugs to work out. But don’t you get the feeling reading the above that the FAA made this more complex than it should have been? That it proceeded without a full understanding of what it wanted to accomplish? That it was flying by the seat of its pants, so that errors were magnified?
The FAA’s chickens came home to roost today on the East Coast.