How Git Stores Your Code Without Ever Storing Files
Git doesn't track files as such: it stores a graph of objects addressed by content hash, and a file name is just an entry in a tree that points into that graph.
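Git does not keep your files on disk under their own names in the way you might expect. Every object in a Git repository, including every version of every file you've ever committed, lives in .git/objects/ under a name derived from a hash (SHA-1 by default) of a short header plus its contents. A loose object is zlib-compressed and stored in a directory named after the first two hex digits of the hash, in a file named after the remaining 38; Git may later bundle objects into packfiles. Either way, the content is the address.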
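As a minimal sketch of that addressing scheme, the following Python computes the ID Git would give a blob and the path a loose object would occupy. The function names are hypothetical helpers for illustration, not part of Git, and the sketch assumes a SHA-1 repository.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object: the first two hex digits name the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes stored for a loose blob: zlib-compressed header plus content."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


if __name__ == "__main__":
    oid = git_blob_id(b"hello world\n")
    print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed ID should match what `git hash-object` reports for a file containing the same bytes.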
The object model has four types. A blob holds raw file content: no filename, no metadata, just bytes. A tree is like a directory listing: its entries map names and file modes to the hashes of blobs or other trees. A commit records a tree hash (the snapshot of the project at that moment), the hashes of zero or more parent commits, author and committer metadata, and the commit message. An annotated tag points to another object, usually a commit, and carries the tagger's identity and a message.
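To make the tree and commit layouts concrete, here is a rough sketch that serializes both and hashes them the way Git frames every object. The helper names and the author string are illustrative assumptions, and the entry sorting is simplified compared with what Git actually does.

```python
import hashlib
from typing import Sequence, Tuple


def object_id(kind: str, payload: bytes) -> str:
    """Hash "<kind> <size>\\0" + payload, the framing used for every object type."""
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries: Sequence[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries as "<mode> <name>\\0" + 20 raw hash bytes."""
    out = b""
    # Simplification: plain byte-wise sort by name. Git sorts subdirectory
    # names as if they ended in "/", which this sketch ignores.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def commit_payload(tree_id: str, parent_ids: Sequence[str],
                   author: str, message: str) -> bytes:
    """Serialize a commit: tree line, parent lines, identities, blank line, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines.append(f"author {author}")
    lines.append(f"committer {author}")  # assumption: same person in both roles
    return ("\n".join(lines) + "\n\n" + message).encode()


# Hypothetical one-file project: blob -> tree -> commit.
readme_id = object_id("blob", b"hello world\n")
tree_id = object_id("tree", tree_payload([("100644", "README", readme_id)]))
commit_id = object_id("commit", commit_payload(
    tree_id, [], "A U Thor <author@example.com> 0 +0000", "initial\n"))
```

Note that the blob never learns it is called README; the name exists only in the tree entry that points at it.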
The practical consequence of this design is elegant: if you have two identical files anywhere in a repository — say a LICENSE file copied into a subdirectory — Git stores the content exactly once. Both tree entries point to the same blob hash. This is not a compression trick; it's a structural property of the object model.
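The deduplication falls out of keying storage by hash, as this toy sketch suggests. `ObjectStore` is a hypothetical in-memory stand-in for .git/objects, not Git's implementation, and the license text is placeholder data.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects, keyed by content hash."""

    def __init__(self) -> None:
        self.objects = {}

    def put_blob(self, content: bytes) -> str:
        data = b"blob %d\x00" % len(content) + content
        oid = hashlib.sha1(data).hexdigest()
        # Storing the same content again lands on the same key, so it is a no-op.
        self.objects.setdefault(oid, data)
        return oid


store = ObjectStore()
license_text = b"Placeholder license text.\n"

# Two tree entries with different paths, one underlying blob.
root_entry = ("LICENSE", store.put_blob(license_text))
vendor_entry = ("vendor/LICENSE", store.put_blob(license_text))
```

After both calls, `store.objects` holds a single entry and both tree entries carry the same ID.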
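When you run git diff between two commits, Git walks their trees and compares the hash recorded for each path before it looks at any content. If the hashes match, the content is identical and there's nothing to show; a matching tree hash lets Git skip an entire subdirectory at once. If they differ, Git decompresses both blobs and runs a line-by-line comparison.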
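The hash-first shortcut can be sketched as follows. This is not Git's diff machinery: it uses Python's difflib for the line comparison, models each snapshot as a flat {path: blob ID} dictionary, and ignores added or deleted paths. All names are hypothetical.

```python
import difflib
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def diff_trees(old_tree: dict, new_tree: dict, objects: dict) -> list:
    """Diff two {path: blob_id} snapshots, reading content only when IDs differ."""
    output = []
    for path in sorted(old_tree.keys() & new_tree.keys()):
        old_id, new_id = old_tree[path], new_tree[path]
        if old_id == new_id:
            continue  # same hash, same content: never touch the blobs
        output.extend(difflib.unified_diff(
            objects[old_id].decode().splitlines(),
            objects[new_id].decode().splitlines(),
            fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
        ))
    return output


# Hypothetical two-file project where only a.txt changes.
objects = {}


def put(content: bytes) -> str:
    oid = blob_id(content)
    objects[oid] = content
    return oid


old_tree = {"a.txt": put(b"one\ntwo\n"), "b.txt": put(b"same\n")}
new_tree = {"a.txt": put(b"one\nthree\n"), "b.txt": put(b"same\n")}
changes = diff_trees(old_tree, new_tree, objects)
```

Here `changes` contains diff lines for a.txt only; b.txt is skipped on the ID comparison alone.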
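Linus Torvalds described Git's core insight in a 2007 talk at Google: "I'm a firm believer in content-addressable storage. The whole Git design is based around 'I don't care about files, I care about content.'" The directed acyclic graph (DAG) of commit objects is what makes branching cheap and history effectively immutable: a branch is just a name pointing at a commit hash, and because each commit's hash covers the hashes of its parents, altering any past commit changes the hash of every commit that descends from it.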
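A small sketch of both properties, under simplifying assumptions: the commit payload here omits the author and committer lines that real commits carry, refs are modeled as a plain dictionary, and every commit reuses the empty tree. The names are illustrative only.

```python
import hashlib


def commit_id(tree_id: str, parent_ids: list, message: str) -> str:
    """Simplified commit hash; real commits also include author and committer lines."""
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parent_ids]
    payload = ("\n".join(lines) + "\n\n" + message).encode()
    return hashlib.sha1(b"commit %d\x00" % len(payload) + payload).hexdigest()


# ID of the empty tree, used so the sketch needs no file content.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

first = commit_id(EMPTY_TREE, [], "first\n")
second = commit_id(EMPTY_TREE, [first], "second\n")

# A branch is only a name bound to a commit ID, so creating one copies 40 characters.
refs = {"main": second}
refs["feature"] = refs["main"]

# Rewording the first commit changes its ID, and therefore its child's ID as well.
reworded = commit_id(EMPTY_TREE, [], "first (reworded)\n")
second_rebuilt = commit_id(EMPTY_TREE, [reworded], "second\n")
```

The original `first` and `second` are untouched by the rewording; the rewritten history is a new pair of objects with new IDs.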