TECHNOLOGY · BITE · 2 MIN · INTERMEDIATE

How UTF-8 Was Designed on a Diner Placemat

Ken Thompson and Rob Pike sketched UTF-8 on a placemat in a New Jersey diner. It now carries 98% of the web.

On the evening of September 2, 1992, Ken Thompson and Rob Pike sat in a New Jersey diner with a problem. Plan 9, their operating system at Bell Labs, had to deal with Unicode, a newly proposed text standard that used 16-bit characters. Stored that way, text would be unreadable to every existing program that expected ASCII. They needed an encoding that could represent Unicode without breaking existing files and tools.

They sketched the answer on a placemat. Bytes starting with a zero bit would be plain ASCII, unchanged. Bytes starting with 110, 1110, or 11110 would kick off a 2-, 3-, or 4-byte sequence, with continuation bytes starting with 10. (The original scheme carried the pattern on to six bytes; the current standard stops at four.) The design had two guarantees baked in: every existing ASCII file was already valid UTF-8, and the encoding was self-synchronizing. A decoder that lands mid-character, after truncation or a lost byte, can find the start of the next character by skipping at most three continuation bytes.
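The bit patterns are compact enough to sketch in a few lines of Python. What follows is a minimal illustration of the scheme as described above, not Thompson's original code; the function names encode_code_point and next_boundary are made up for this example, and the range checks follow the modern four-byte limit.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point with the prefix patterns described above."""
    # Assumption: modern limits (max U+10FFFF, no surrogates), not the 1992 draft.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx: plain ASCII, unchanged
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def next_boundary(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after pos that starts a character."""
    # Continuation bytes look like 10xxxxxx; skip them until a lead byte appears.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


if __name__ == "__main__":
    print(encode_code_point(ord("A")).hex())   # 41
    print(encode_code_point(0x20AC).hex())     # e282ac (the euro sign)
    data = "a€b".encode("utf-8")               # 61 e2 82 ac 62
    print(next_boundary(data, 2))              # 4: skips two continuation bytes
```

The resynchronization step never has to look backward or consult a table: the top two bits of each byte say whether it begins a character or continues one.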

Thompson had the encoder and decoder running in Plan 9 by the next week. They called the scheme FSS-UTF at first; it was renamed UTF-8 a few months later. It was presented at the January 1993 USENIX conference, first published as an RFC in 1996 (RFC 2044), revised as RFC 2279 in 1998, and is now specified by RFC 3629.

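The main competing proposal, UTF-1, was weaker on both counts. It kept ASCII characters as single bytes, but bytes in the printable ASCII range, including the "/" that Unix uses as a path separator, could also turn up inside multi-byte sequences, and you could not reliably resynchronize after a lost byte. UTF-8 never reuses an ASCII byte value inside a multi-byte character, so software that had never heard of Unicode could pass Unicode text through unchanged. That is the property that made it win.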
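A short Python sketch makes the pass-through property concrete. The helper split_on_slash and the sample path are invented for illustration; the point is only that code which knows nothing about Unicode, and splits raw bytes on the ASCII slash, still cuts the text at character boundaries.

```python
def split_on_slash(raw: bytes) -> list[bytes]:
    """Byte-oriented split, the way a Unicode-unaware program would do it."""
    return raw.split(b"/")


path = "docs/日本語/résumé.txt"   # hypothetical path mixing ASCII and non-ASCII
raw = path.encode("utf-8")

# Every byte of a multi-byte sequence has its high bit set,
# so none of them can be mistaken for "/" (0x2F) or any other ASCII byte.
ascii_bytes = bytes(b for b in raw if b < 0x80)

parts = [p.decode("utf-8") for p in split_on_slash(raw)]

if __name__ == "__main__":
    print(ascii_bytes.decode("ascii"))   # docs//rsum.txt
    print(parts)                         # ['docs', '日本語', 'résumé.txt']
```

Each piece decodes cleanly because the split can only land on a real slash, never in the middle of a character.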

As of 2024, UTF-8 encodes roughly 98 percent of web pages. Pike has occasionally shown a photograph of the original placemat at talks — folded, coffee-stained, and legibly carrying the skeleton of the modern internet's entire text layer.

#unicode #utf-8 #encoding #bell-labs #plan-9

