How UTF-8 Was Designed on a Diner Placemat
Ken Thompson and Rob Pike sketched UTF-8 on a placemat in a New Jersey diner. It now carries 98% of the web.
On the evening of September 2, 1992, Ken Thompson and Rob Pike sat in a New Jersey diner with a problem. Plan 9, their operating system at Bell Labs, had to deal with Unicode, a newly proposed text standard that used 16-bit characters. Stored that way, plain ASCII text would double in size and fill with zero bytes, which C-style string handling treats as end-of-string, so existing files and tools would break. They needed an encoding that could represent Unicode without breaking existing ASCII files and the software that read them.
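They sketched the answer on a placemat. A byte starting with a 0 bit would be plain ASCII, unchanged. A byte starting with 110, 1110, or 11110 would begin a 2-, 3-, or 4-byte sequence, and every continuation byte would start with 10. (The original scheme extended the same pattern to 5- and 6-byte sequences; the current standard stops at four.) The design had two guarantees baked in: every existing ASCII file was already valid UTF-8, and the encoding was self-synchronizing. A decoder dropped into the middle of a stream, or resuming after a lost byte, only has to skip continuation bytes until it reaches the start of the next character, which in the four-byte form means skipping at most three bytes.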
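The bit patterns above can be sketched as a small encoder. This is an illustrative Python version of the four-byte form, not the original Plan 9 code; the function name utf8_encode is made up for this example, and the range and surrogate checks follow the modern standard rather than the 1992 design.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point with the lead/continuation bit patterns."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x80:
        # 0xxxxxxx: plain ASCII, one byte, unchanged
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's built-in encoder for 1-, 2-, 3-, and 4-byte cases.
for ch in ["A", "é", "€", "😀"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

Each continuation byte carries six payload bits, which is why the shifts move in steps of six.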
Thompson had the encoder and decoder running in Plan 9 within days. They called the scheme FSS-UTF at first; it was renamed UTF-8 a few months later. It was presented at the January 1993 USENIX conference and standardized as RFC 2279 in 1998, which RFC 3629 superseded in 2003.
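The main competing proposal, UTF-1, fell short on both counts. It kept ASCII characters as single bytes, but its multibyte sequences could contain byte values that look like printable ASCII, including the slash (/) that Unix uses as a path separator, and a decoder could not reliably resynchronize after a lost byte. UTF-8 never reuses ASCII byte values inside a multibyte character, so operating systems that had never heard of Unicode could pass Unicode text through unchanged. That is the property that made it win.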
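Both properties are easy to check in a few lines. The sketch below is illustrative only: next_boundary is a hypothetical helper name, and it assumes the input is otherwise well-formed UTF-8.

```python
def next_boundary(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after pos that is not a continuation byte."""
    # Continuation bytes always match the bit pattern 10xxxxxx.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


encoded = "a€b".encode("utf-8")       # b'a\xe2\x82\xacb'
start = next_boundary(encoded, 2)     # index 2 lands inside the euro sign
print(encoded[start:].decode("utf-8"))  # prints "b"

# Every byte of a non-ASCII character is 0x80 or above, so a tool that
# only looks for ASCII bytes such as '/' never finds a false match.
print(all(b >= 0x80 for b in "€é😀".encode("utf-8")))  # prints True
```

A program that knows nothing about Unicode can therefore split paths on "/" or scan for newlines in UTF-8 text without corrupting any character.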
As of 2024, UTF-8 encodes roughly 98 percent of web pages. Pike has occasionally shown a photograph of the original placemat at talks — folded, coffee-stained, and legibly carrying the skeleton of the modern internet's entire text layer.