Perhaps Intrado Should Have Tested More
You could think of it as the "40 million call bug".
A newly released study by the Federal Communications Commission explains how a software glitch caused a massive outage of 911 services last April 9th. At the center of the disruption was a system maintained by Intrado, a company based in Longmont, Colorado.
Each of the calls to the Intrado system had a counter assigned to it. Unfortunately, once the number of calls reached 40 million, the system ran out of counters and stopped accepting new calls.
Intrado's servers noticed that they had run out of counters. But the "out of counters" log entries were categorized as a "low priority" state, and thus no alerts were sent to any humans.
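To make the failure mode concrete, here is a minimal Python sketch. It is not Intrado's code, and every name in it (`CallIdAllocator`, `Severity`, `pages`) is hypothetical. It assumes a single sequential call counter with a fixed ceiling, and it treats "alerting a human" as appending a message to a list. With the exhaustion severity left at its default of LOW, calls past the limit are silently rejected and only an informational log line is written.

```python
import logging
from enum import IntEnum
from typing import List, Optional

logger = logging.getLogger("call_router")


class Severity(IntEnum):
    LOW = 1
    HIGH = 2


class CallIdAllocator:
    """Hypothetical allocator: hands out sequential call IDs up to a fixed ceiling."""

    def __init__(self, limit: int = 40_000_000,
                 exhaustion_severity: Severity = Severity.LOW) -> None:
        self.limit = limit                    # the hard-coded "magic limit"
        self.next_id = 0
        # Defaulting to LOW mirrors the article: the condition is logged, nobody is told.
        self.exhaustion_severity = exhaustion_severity
        self.pages: List[str] = []            # stand-in for paging an on-call human

    def accept_call(self) -> Optional[int]:
        """Return a new call ID, or None if the call is rejected."""
        if self.next_id >= self.limit:
            self._report(f"out of call IDs (limit={self.limit})")
            return None
        call_id = self.next_id
        self.next_id += 1
        return call_id

    def _report(self, message: str) -> None:
        if self.exhaustion_severity >= Severity.HIGH:
            logger.critical(message)
            self.pages.append(message)        # only HIGH severity reaches a person
        else:
            logger.info(message)              # recorded, but no alert is sent


quiet = CallIdAllocator(limit=3)
print([quiet.accept_call() for _ in range(5)])  # [0, 1, 2, None, None]
print(quiet.pages)                              # []
```

Shrinking the limit to 3 is the kind of boundary test that exposes both problems in seconds: the fourth call is rejected, and with the default severity `pages` stays empty, so no human would ever hear about it.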
- When the 40 millionth call was received, the 911 system stopped working.
- The outage affected 81 call dispatch centers.
- More than 11 million people across seven states were affected.
- The entire state of Washington was denied access to 911 services.
- The outage lasted six hours.
- More than 6,600 people tried, and failed, to reach help.
- The out-of-counters condition was logged as a low-level incident because that was the default level.
Perhaps Intrado thought that embedding a magic limit in the software was a good idea. Perhaps Intrado figured that the limit of 40 million counters would never be reached. Perhaps they figured that even if they ran out of counters, it wasn't very important. Perhaps all of the 6,600 people who couldn't reach 911 emergency services didn't really need help quickly.
Perhaps Intrado should have tested more.
This article originally appeared on my blog: All Things Quality
My name is Joe Strazzere and I'm currently a Director of Quality Assurance.
I like to lead, to test, and occasionally to write about leading and testing.
Find me at http://AllThingsQuality.com/.