TECHNOLOGY · BITE · 2 MIN · INTERMEDIATE

How Git Stores Your Code Without Ever Storing Files

Git doesn't track files — it stores a directed acyclic graph of content hashes, and a file is just a pointer into that graph.

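Git does not store your project as the files and folders you see in your working directory. Every object in a repository, including every version of every file you have ever committed, lives under .git/objects/ as a zlib-compressed object named by the SHA-1 hash of its contents plus a short type-and-size header. (Older objects may later be bundled into packfiles under .git/objects/pack/.) The content is the address.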
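To make "the content is the address" concrete, here is a minimal sketch of how a blob's ID can be computed outside Git. It assumes the default SHA-1 object format; the function names are illustrative, not part of any Git API.

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    """Prefix the content with the 'blob <size>' header and a NUL byte."""
    return b"blob " + str(len(content)).encode("ascii") + b"\x00" + content


def blob_id(content: bytes) -> str:
    """SHA-1 over header + content: the name Git gives the object."""
    return hashlib.sha1(blob_object(content)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Loose objects live in a directory named by the first two hex digits."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """What would be written to disk: the same header + content, zlib-compressed."""
    return zlib.compress(blob_object(content))


oid = blob_id(b"hello world\n")
print(oid)
print(loose_object_path(oid))
```

The printed ID should agree with what `git hash-object` reports for a file containing the same bytes, which is an easy way to check the sketch against a real installation.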

The object model has four types. A blob holds raw file content: no filename, no metadata, just bytes. A tree is like a directory listing: a set of entries that map names and file modes to the SHA-1 hashes of blobs or other trees. A commit records a tree hash (the snapshot of the project at that moment), the hashes of zero or more parent commits, author and committer metadata, and the commit message. An annotated tag is an object that names another object, usually a commit, and carries a tagger and a message.
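The way these objects reference each other can be sketched in a few lines. This is a simplified illustration, assuming SHA-1 and a flat tree containing only files (real Git has an extra sorting rule for subdirectories); the helper names and the author string are made up for the example.

```python
import hashlib


def hash_object(kind: str, body: bytes) -> str:
    """Every object type is hashed the same way: '<type> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: (mode, name, hex object id). Each entry stores the raw 20-byte id."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def commit_body(tree_id, parent_ids, who, message):
    """A commit is plain text: a tree line, parent lines, metadata, then the message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [f"author {who}", f"committer {who}", "", message]
    return "\n".join(lines).encode() + b"\n"


# Hypothetical identity and timestamp, for illustration only.
who = "A U Thor <author@example.com> 0 +0000"

readme_id = hash_object("blob", b"hello world\n")
tree_id = hash_object("tree", tree_body([("100644", "README", readme_id)]))
root_id = hash_object("commit", commit_body(tree_id, [], who, "initial commit"))
child_id = hash_object("commit", commit_body(tree_id, [root_id], who, "second commit"))

print(readme_id, tree_id, root_id, child_id, sep="\n")
```

Note that the filename README appears only in the tree body, never in the blob, and that the commit refers to the whole snapshot through a single tree hash.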

The practical consequence of this design is elegant: if you have two identical files anywhere in a repository — say a LICENSE file copied into a subdirectory — Git stores the content exactly once. Both tree entries point to the same blob hash. This is not a compression trick; it's a structural property of the object model.
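A toy content-addressed store shows why the duplicate costs nothing. This is a sketch of the idea using an in-memory dictionary, not how Git lays objects out on disk; `ObjectStore` and the paths are hypothetical.

```python
import hashlib


class ObjectStore:
    """Minimal content-addressed store: the key is derived from the bytes."""

    def __init__(self):
        self.objects = {}

    def put_blob(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
        oid = hashlib.sha1(header + content).hexdigest()
        # Identical content produces the same key, so it is kept only once.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
license_text = b"Permission is hereby granted, free of charge...\n"

# Two tree entries with different paths, both built from the same bytes.
root_entry = ("LICENSE", store.put_blob(license_text))
vendored_entry = ("vendor/lib/LICENSE", store.put_blob(license_text))

print(root_entry[1] == vendored_entry[1])
print(len(store.objects))
```

Both entries end up holding the same blob ID, and the store holds a single object.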

When you run git diff between two commits, Git first compares blob hashes path by path rather than file contents. If the hashes for a path match, the content is identical and there is nothing to show. If they differ, Git decompresses both blobs and runs a line-by-line comparison.
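The hash-first shortcut can be sketched as follows. This is an illustration under simplifying assumptions: snapshots are flat dictionaries, only paths present in both snapshots are compared, and Python's `difflib` stands in for Git's own diff algorithm.

```python
import difflib
import hashlib


def blob_id(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files):
    """Return (tree, store): tree maps path -> blob id, store maps blob id -> bytes."""
    tree, store = {}, {}
    for path, content in files.items():
        oid = blob_id(content)
        tree[path] = oid
        store[oid] = content
    return tree, store


def diff_snapshots(old_tree, new_tree, store):
    """Read blob contents only for paths whose hashes differ."""
    changes = {}
    for path in sorted(old_tree.keys() & new_tree.keys()):
        if old_tree[path] == new_tree[path]:
            continue  # same hash means same content: nothing to read or compare
        old_lines = store[old_tree[path]].decode().splitlines()
        new_lines = store[new_tree[path]].decode().splitlines()
        changes[path] = list(
            difflib.unified_diff(
                old_lines, new_lines,
                fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
            )
        )
    return changes


old_tree, store = snapshot({"README": b"hello\n", "LICENSE": b"MIT\n"})
new_tree, new_store = snapshot({"README": b"hello world\n", "LICENSE": b"MIT\n"})
store.update(new_store)

changes = diff_snapshots(old_tree, new_tree, store)
for path, lines in changes.items():
    print("\n".join(lines))
```

Only README is reported here; the unchanged LICENSE is skipped on the hash comparison alone.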

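Linus Torvalds described Git's core insight in a 2007 talk at Google: "I'm a firm believer in content-addressable storage. The whole Git design is based around 'I don't care about files, I care about content.'" The DAG structure (a directed acyclic graph of commit objects) is what makes branching nearly free and history effectively immutable: a branch is just a name pointing at a commit hash, and because each commit's hash covers its tree and its parents, changing any past commit changes the hash of every commit after it.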
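Both properties follow from the hashing scheme and can be sketched in a few lines. The commit format below is deliberately simplified (real commits also carry author and committer lines, so these IDs will not match ones Git produces), and the `refs` dictionary is a stand-in for Git's branch references.

```python
import hashlib

# The well-known ID of the empty tree under SHA-1.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree_id, parent_ids, message):
    """Hash a simplified commit: tree line, parent lines, blank line, message."""
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parent_ids] + ["", message]
    body = "\n".join(lines).encode() + b"\n"
    header = f"commit {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


first = commit_id(EMPTY_TREE, [], "first")
second = commit_id(EMPTY_TREE, [first], "second")

# Creating a branch adds one name that points at an existing commit.
# No objects are copied.
refs = {"refs/heads/main": second}
refs["refs/heads/feature"] = refs["refs/heads/main"]

# Editing an old commit gives it a new ID, and every descendant must be
# rebuilt with the new parent ID, so its ID changes as well.
first_edited = commit_id(EMPTY_TREE, [], "first (edited)")
second_rebuilt = commit_id(EMPTY_TREE, [first_edited], "second")

print(refs["refs/heads/feature"] == second)
print(second == second_rebuilt)
```

The original commits are not modified by the edit; the rewritten history is a new chain with new IDs, which is why altered history is detectable.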

#git #version-control #data-structures #software-engineering

Sources

Pro Git / git-scm.com
Google / YouTube
