The Therac-25 killed three people because of a race condition
An experienced typist could overflow a one-byte counter and deliver 100 times the prescribed radiation dose.
At 8:30 a.m. on March 21, 1986, Ray Cox lay on the treatment table at the East Texas Cancer Center in Tyler. He was there for a tumor on his shoulder. The Therac-25 medical linear accelerator was supposed to deliver a routine 180-rad electron beam. Cox felt "a tremendous force, like an electrical shock." A second jolt followed when the operator tried again. He walked out of the room with a burning back, and died five months later of radiation poisoning. The cause was a software race condition.
The Therac-25 was built by Atomic Energy of Canada Limited (AECL) starting in 1982. It had two operating modes: a low-power electron beam aimed directly at the patient, and a high-power x-ray beam where a tungsten target absorbed most of the dose. Mode selection was the software's responsibility for the first time. The earlier Therac-6 and Therac-20 had hardware interlocks that physically blocked the high-power beam unless the tungsten target was in place. AECL removed them in the Therac-25. The software, ported from the Therac-20 without independent review, was now the sole safeguard.
The bug worked like this. If an operator selected X-ray mode, then within about eight seconds pressed up-arrow and switched to electron mode, the software registered "set up complete" before the magnetic field had finished moving the target out of the beam path. A counter variable named Class3 was incremented every machine cycle. When it overflowed an 8-bit boundary back to zero, a downstream safety check thought no fault had occurred. The patient received an unattenuated electron beam at full x-ray intensity — about 100 times the prescribed dose.
What made the bug nearly invisible was that the same defect existed in the Therac-20. There it had been masked by a microswitch backstop: when the race fired, a fuse blew and the machine halted. AECL had logged the blown fuses for years without ever investigating their cause. When they removed the hardware interlocks for the Therac-25, they removed the only thing that had been hiding the bug.
AECL's response to the early reports was denial. After Katherine Yarbrough was injured at Kennestone Regional Oncology Center in Marietta, Georgia on June 3, 1985, the company told her physicians "no malfunction has been identified." They were not lying. Their internal tests could not reproduce the bug because their test typists were not fast enough. The fault required an experienced technician with muscle memory on the keyboard.
Nancy Leveson and Clark Turner published the canonical post-mortem in IEEE Computer in July 1993. Their finding was not just "race condition." AECL had not unit-tested the software, had no independent safety review, dismissed operator reports as user error, and treated the absence of bug reports as evidence of correctness. "The basic mistakes here involved poor software-engineering practices and building a machine that relies on the software for safe operation," they wrote. The whole field of safety-critical software engineering grew out of that paper.
Three confirmed deaths came from the Therac-25 — Ray Cox in Tyler, Verdon Kidd at the same Tyler clinic the following month, and one patient at Yakima Valley Memorial in late 1985. Several more were grievously injured. It is why your modern medical accelerator has hardware interlocks again. It is also why the FDA's 1996 software-validation guidance requires independent source-code review for life-critical devices, and why every safety-engineering syllabus opens with a one-byte counter named Class3.
The other lasting lesson is operational. After the Tyler incidents, the FDA, AECL, and the user community spent months in correspondence trying to characterize the bug. The most damning finding was not the race condition itself but the institutional inability to recognize one. AECL's engineers, asked again and again whether the machine could deliver an unattenuated beam in electron mode, kept answering no — because their fault tree said no. Their fault tree did not include software. The post-Therac convention that software faults must be enumerated alongside mechanical and electrical faults is now standard, IEC 62304 and ISO 14971 are built around it, and it took three deaths to get there.
The Therac-25 was decommissioned shortly after the Tyler incidents and AECL stopped making medical accelerators in the early 1990s. The source code, never independently reviewed, was never publicly released. What survives is the Leveson-Turner paper and a one-line cautionary tale about reusing software you have not re-verified for the system you are about to ship.
Make Recess yours.
Sign in to save the ones you loved, never see the same thing twice, and tell us what you want more of.