rid itself of all its calls, drop everything temporarily, and re-boot its software from scratch. Starting over from scratch will generally rid the switch of any software problems that may have developed in the course of running the system. Bugs that arise will be simply wiped out by this process. It is a clever idea.
This process of automatically re-booting from scratch is known as the "normal fault recovery routine." Since AT&T's software is in fact exceptionally stable, systems rarely have to go into "fault recovery" in the first place; but AT&T has always boasted of its "real world" reliability, and this tactic is a belt-and-suspenders routine.
The 4ESS switch used its new software to monitor its fellow switches as they recovered from faults. As other switches came back on line after recovery, they would send their "OK" signals to the switch. The switch would make a little note to that effect in its "status map," recognizing that the fellow switch was back and ready to go, and should be sent some calls and put back to regular work.
Unfortunately, while it was busy bookkeeping with the status map, the tiny flaw in the brand-new software came into play. The flaw caused the 4ESS switch to interact, subtly but drastically, with incoming telephone calls from human users. If--and only if--two incoming phone-calls happened to hit the switch within a hundredth of a second, then a small patch of data would be garbled by the flaw.
But the switch had been programmed to monitor itself constantly for any possible damage to its data. When the switch perceived that its data had been somehow garbled, then it too would go down, for swift repairs to its software. It would signal its fellow switches not to send any more work. It would go into the fault-recovery mode for four to six seconds. And then the switch would be fine again, and would send out its "OK, ready for work" signal.
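The sequence just described can be pictured as a toy model. What follows is only an illustrative sketch, not AT&T's 4ESS code: the names `Switch`, `receive_ok`, `calls_collide` and `RACE_WINDOW_S` are hypothetical, and the flaw is reduced to one assumed rule, namely that data is garbled when two calls arrive less than a hundredth of a second apart while the status-map note is being made.

```python
from dataclasses import dataclass, field

# Assumption: "within a hundredth of a second" modelled as a gap under 10 ms.
RACE_WINDOW_S = 0.01


def calls_collide(call_times, window=RACE_WINDOW_S):
    """Return True if any two call arrival times (seconds) are closer than window."""
    ordered = sorted(call_times)
    return any(later - earlier < window
               for earlier, later in zip(ordered, ordered[1:]))


@dataclass
class Switch:
    """Hypothetical stand-in for one switch; not the real software."""
    name: str
    status_map: dict = field(default_factory=dict)
    data_ok: bool = True
    recoveries: int = 0

    def receive_ok(self, sender, call_times):
        """Note that sender is back in service.

        call_times holds the arrival times of calls that hit this switch
        while it is busy with the bookkeeping. Returns True if the switch
        went through fault recovery and must now send its own "OK" signal.
        """
        self.status_map[sender] = "OK"      # the bookkeeping note
        if calls_collide(call_times):       # the flaw comes into play
            self.data_ok = False            # a small patch of data is garbled
        return self.self_check()

    def self_check(self):
        """Constant self-monitoring: recover if data damage is detected."""
        if self.data_ok:
            return False
        self.recoveries += 1                # fault recovery (four to six seconds)
        self.data_ok = True                 # software restarted from scratch
        return True                         # ready to send "OK, ready for work"


switch_b = Switch("B")
print(switch_b.receive_ok("A", [0.100, 0.500]))  # calls far apart
print(switch_b.receive_ok("A", [0.100, 0.104]))  # calls 4 ms apart
```

In this sketch the first call to `receive_ok` leaves the switch untouched, while the second sends it through recovery, after which it would announce itself as ready again.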
However, the "OK, ready for work" signal was the VERY THING THAT HAD CAUSED THE SWITCH TO GO DOWN IN THE FIRST PLACE. And ALL the System 7 switches had the same flaw in their status-map software. As soon as they stopped to make the bookkeeping note that their fellow switch was "OK," then they too would become vulnerable to the slight chance that two phone-calls would hit them within a hundredth of a second.
At approximately 2:25 P.M. EST on Monday, January 15, one of AT&T's 4ESS toll switching systems in New York City had an actual,