TECHNOLOGY · BITE · 2 MIN · INTERMEDIATE

The Apollo Computer Was Designed to Fail Loudly and Keep Going

When the 1202 alarm fired during the lunar descent, the software didn't crash. It dropped its lowest-priority task and finished the landing.

Minutes from touchdown on July 20, 1969, the Apollo Guidance Computer (AGC) in the lunar module flashed alarm code 1202. Later in the descent came a 1201. Both meant the same thing, "executive overflow": the computer was being asked to do more work than it could schedule. Mission control in Houston had to decide, in seconds, whether to abort.

They didn't, because the computer wasn't actually failing. The rendezvous radar, left switched on as an abort precaution, was stealing a steady ~13% of the AGC's cycles with bogus interrupts. Buzz Aldrin had then asked the computer to display DELTAH, an altitude check, which added roughly another 10% of load. Past 100%, the executive ran out of "core sets" and "VAC areas" (its workspace pools) and threw the alarm.
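The overflow itself is easy to picture as a fixed-size pool running dry. The sketch below is an illustration only, not AGC code: the class and function names are invented for this example, and the pool size of 7 is a placeholder rather than a documented figure. It assumes the usual reading of the two codes, where 1202 means no free core sets and 1201 means no free VAC areas.

```python
class ExecutiveOverflow(Exception):
    """Raised when a workspace pool has no free slot left."""

    def __init__(self, code: str):
        super().__init__(f"PROGRAM ALARM {code}")
        self.code = code


class WorkspacePool:
    """A fixed-size pool of job workspaces. It never grows."""

    def __init__(self, size: int, alarm_code: str):
        self.free = size
        self.alarm_code = alarm_code

    def acquire(self) -> None:
        # No free slot means the executive cannot schedule the job.
        if self.free == 0:
            raise ExecutiveOverflow(self.alarm_code)
        self.free -= 1

    def release(self) -> None:
        self.free += 1


def start_jobs(pool: WorkspacePool, n_jobs: int):
    """Try to start n_jobs before any of them finishes.

    Returns the alarm code if the pool runs dry, otherwise None.
    """
    try:
        for _ in range(n_jobs):
            pool.acquire()
    except ExecutiveOverflow as err:
        return err.code
    return None


# Pool size is a hypothetical value chosen for the demo.
core_sets = WorkspacePool(size=7, alarm_code="1202")
print(start_jobs(core_sets, 7))  # None: every job found a slot
print(start_jobs(core_sets, 1))  # 1202: the pool is already full
```

The point of the sketch is that the alarm reports a scheduling limit, not a hardware fault: jobs were piling up faster than they finished, so a new one found no slot.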

Margaret Hamilton's team at MIT had built the AGC software around priority scheduling. When the executive overflowed, it didn't halt; it discarded the lowest-priority pending tasks (the radar display being one) and reran the cycle from a clean state. Guidance and control kept running. The astronauts kept descending.
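As a rough sketch of that idea, the Python below sheds the lowest-priority work until the remaining jobs fit the available budget, and raises an alarm instead of stopping. It is a simplification under stated assumptions: the job names, priorities, and the 60% and 20% costs are hypothetical numbers chosen for the demo. Only the ~13% stolen by the radar and the ~10% for DELTAH come from the account above. The real AGC recovery was more involved than a sorted list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    priority: int  # higher number = more important
    cost: int      # share of one cycle, in percent


def shed_load(jobs, budget):
    """Keep the highest-priority jobs that fit in the budget.

    Returns (kept_names, dropped_names, alarm). The alarm is set
    whenever anything had to be dropped, so the overload is announced
    rather than hidden.
    """
    kept = sorted(jobs, key=lambda j: j.priority, reverse=True)
    dropped = []
    alarm = None
    while kept and sum(j.cost for j in kept) > budget:
        alarm = "1202"
        dropped.append(kept.pop())  # lowest priority sits at the end
    return [j.name for j in kept], [j.name for j in dropped], alarm


# Hypothetical workload; only the 13 and 10 come from the article.
radar_theft = 13
jobs = [
    Job("guidance", priority=30, cost=60),
    Job("control", priority=25, cost=20),
    Job("DELTAH display", priority=5, cost=10),
]

kept, dropped, alarm = shed_load(jobs, budget=100 - radar_theft)
print(kept)     # ['guidance', 'control']
print(dropped)  # ['DELTAH display']
print(alarm)    # 1202
```

With these numbers the jobs need 90% but only 87% is left after the radar takes its share, so the display is dropped, the alarm is raised, and guidance and control keep their slots.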

This was unusual for 1969. The dominant model for embedded software, where it existed at all, was a flat sequence of tasks: if one wedged, the system stopped. Hamilton had insisted that the AGC distinguish between essential and discardable work, and that the computer announce the overload rather than hide it. The 1202 alarm was the system saying, in effect, "I'm dropping things on purpose; don't trust the radar display, but the landing is fine."

#apollo-11 #software-history #margaret-hamilton #real-time-systems #nasa

Sources

Wikipedia, MIT News, Apollo11Space
