FREE BOOKS

Author's List

PREV. NEXT
|< 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 >> >|

rid itself of all its calls, drop everything temporarily, and re-boot its software from scratch. Starting over from scratch will generally rid the switch of any software problems that may have developed in the course of running the system. Bugs that arise will be simply wiped out by this process. It is a clever idea. This process of automatically re-booting from scratch is known as the "normal fault recovery routine." Since AT&T's software is in fact exceptionally stable, systems rarely have to go into "fault recovery" in the first place; but AT&T has always boasted of its "real world" reliability, and this tactic is a belt-and-suspenders routine. The 4ESS switch used its new software to monitor its fellow switches as they recovered from faults. As other switches came back on line after recovery, they would send their "OK" signals to the switch. The switch would make a little note to that effect in its "status map," recognizing that the fellow switch was back and ready to go, and should be sent some calls and put back to regular work. Unfortunately, while it was busy bookkeeping with the status map, the tiny flaw in the brand-new software came into play. The flaw caused the 4ESS switch to interact, subtly but drastically, with incoming telephone calls from human users. If--and only if--two incoming phone-calls happened to hit the switch within a hundredth of a second, then a small patch of data would be garbled by the flaw. But the switch had been programmed to monitor itself constantly for any possible damage to its data. When the switch perceived that its data had been somehow garbled, then it too would go down, for swift repairs to its software. It would signal its fellow switches not to send any more work. It would go into the fault-recovery mode for four to six seconds. And then the switch would be fine again, and would send out its "OK, ready for work" signal. However, the "OK, ready for work" signal was the VERY THING THAT HAD CAUSED THE SWITCH TO GO DOWN IN THE FIRST PLACE. And ALL the System 7 switches had the same flaw in their status-map software. As soon as they stopped to make the bookkeeping note that their fellow switch was "OK," then they too would become vulnerable to the slight chance that two phone-calls would hit them within a hundredth of a second. At approximately 2:25 P.M. EST on Monday, January 15, one of AT&T's 4ESS toll switching systems in New York City had an actual,

PREV. NEXT
|< 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 >> >|

Top keywords: