Delta Outage Spotlights Technology Risks


Delta’s computer outage on Jan. 29 was over by midnight, but its effects extended into the week: 170 flights were canceled on Sunday, more than 100 were grounded on Monday, and many others were delayed. Adding to the frustration, the company’s mobile apps were also not working.

This latest incident follows another computer outage for Delta in mid-August, when flights were canceled for two days, leaving thousands of passengers stranded.

Such outages can be costly. A Southwest Airlines outage in July caused more than 2,000 flights to be canceled and cost about $54 million. The August Delta outage, which involved a fire, resulted in the cancellation of 2,300 flights over three days and cost the airline $150 million in lost revenue, according to USA Today.

Jim Corridore, an analyst at CFRA Research, told USA Today on Monday that Delta’s computer outage puts a “spotlight on risks of airline technology infrastructure, much of which is old and patched with differing systems.” He said that airlines build new programming over old software, especially after a merger, when computer languages may differ. Programmers’ assumptions about how software will work are sometimes wrong.

Large companies such as Delta would have fewer outages if they tested their systems more thoroughly, but doing so is an expensive proposition.

According to USA Today:

Gil Hecht, CEO of Continuity Software, which tests computer systems for large banks and insurers, compared the construction of complex computer systems to a layer cake, with web servers, database software, storage and possibly interaction with other systems such as government computers that check whether passengers are allowed to fly.

“Testing should be done by every single layer and every single business service that participates in the critical infrastructure, and some of them are simply not under the airline’s control,” Hecht said.

He compared one way of testing to running a car into a tree to see whether the airbags work, which isn’t possible while keeping a computer system working. Instead, testing for a large financial institution or airline must confirm that each layer is configured to work well with all the others, he said.

“In order to do that, critical infrastructure operators must do much more testing, whether it’s manual by humans or by technology or by any means possible,” Hecht said. “Yes, it costs money. Quite a lot. But if more money and more effort will be driven into testing, we will have far less down time and data-loss events.”
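
Hecht's point about confirming that each layer is configured to work with the others, without crashing the live system, can be illustrated with a small static check. The sketch below is only an illustration under stated assumptions: the Layer class, the find_config_gaps function and the layer names are hypothetical and do not describe Delta's or Continuity Software's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of the 'cake' (hypothetical model, not from the article)."""
    name: str
    depends_on: tuple = ()      # names of layers this one relies on
    has_backup: bool = False    # whether a failover target is configured


def find_config_gaps(layers):
    """Inspect declared configuration only; nothing is taken offline."""
    known = {layer.name for layer in layers}
    problems = []
    for layer in layers:
        # A dependency nobody has declared is a layer nobody is testing.
        for dep in layer.depends_on:
            if dep not in known:
                problems.append(
                    f"{layer.name}: depends on undeclared layer '{dep}'"
                )
        # A layer with no backup is a single point of failure.
        if not layer.has_backup:
            problems.append(f"{layer.name}: no backup configured")
    return problems


# Illustrative inventory; every name here is an assumption.
layers = [
    Layer("web", depends_on=("database",), has_backup=True),
    Layer("database", depends_on=("storage",), has_backup=True),
    Layer("storage", has_backup=False),
    Layer("check-in", depends_on=("database", "government-screening"),
          has_backup=True),
]

for problem in find_config_gaps(layers):
    print(problem)
```

In this made-up inventory the check flags the storage layer for lacking a backup and the check-in layer for relying on an external screening system that is outside the declared inventory, which mirrors Hecht's remark that some layers are not under the airline's control.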

Delta Limping Back to Normalcy

After two days of cancellations caused by a system-wide outage that left thousands of customers stranded, Delta announced today that it will return to normal operations by mid-to-late afternoon. It added a caveat, however, that “a chance of scattered thunderstorms expected in the eastern U.S. may have the potential to slow the recovery.”

Delta said that by late morning on Wednesday it had canceled 255 flights while 1,500 departed. About 800 flights were canceled on Tuesday and there were around 1,000 cancellations on Monday. It also extended its travel waiver and continued to provide hotel vouchers, of which more than 2,300 were issued Tuesday night in Atlanta alone.

“The technology systems that allow airport customer service agents to process check-ins, conduct boarding and dispatch aircraft are functioning normally with the bulk of delays and cancellations coming as a result of flight crews displaced or running up against their maximum allowed duty period following the outage,” Delta said.

The company’s chief operating officer, Gil West, said on Aug. 9:

Monday morning a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power. The universal power was stabilized and power was restored quickly. But when this happened, critical systems and network equipment didn’t switch over to backups. Other systems did. And now we’re seeing instability in these systems. For example we’re seeing slowness in a system that airport customer service agents use to process check-ins, conduct boarding and dispatch aircraft. Delta agents today are using the original interface we designed for this system while we continue with our resetting efforts.

Reuters reported:

Like many large airlines, Delta uses its proprietary computer system for its bookings and operations, and the fact that other airlines appeared unaffected by the outage also pointed to the company’s equipment, said independent industry analyst Robert Mann.

Critical computer systems have backups and are tested to ensure high reliability, he said. It was not clear why those systems had not worked to prevent Delta’s problems, he said.

“That suggests a communications component or network component could have failed,” he said.

The airline has not yet detailed the financial impact of the event.