Git Internals: What Actually Happens When You Commit

Most developers use Git every day and understand it well enough to be productive: branch, commit, push, merge. But when something goes wrong, or when you need to do something unusual, the command you need feels like a magic incantation you found on Stack Overflow.

That changes once you understand the object model. Git is, at its core, a content-addressable filesystem.

The .git Directory

Every Git repository has a .git directory. This directory is the entire repository; the working directory is just a checkout of whatever your current commit says the files should be.

.git/
  objects/   ← all your data lives here
  refs/      ← branches and tags (just pointers to commits)
  HEAD       ← which branch you're on
  index      ← the staging area

Four Object Types

Git stores everything as objects. Each object is identified by the SHA-1 hash of its contents (strictly, of a short type-and-size header followed by the contents). There are four types:

1. Blob

A blob stores raw file contents. Just the bytes, no filename, no metadata.

git hash-object README.md
# e.g. d670460b4b4aece5915caf5c68d12f560a9fe3e4

Two files with identical contents produce the same blob. Git deduplicates automatically.
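
To make the hashing concrete, here is a minimal Python sketch of how a blob ID can be reproduced outside Git. The function name blob_id is made up for this example, and it assumes a SHA-1 repository.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with these bytes."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The filename never enters the hash, so identical bytes always share one ID.
readme = blob_id(b"test content\n")
copy_of_readme = blob_id(b"test content\n")
print(readme, readme == copy_of_readme)
```

Because only the bytes are hashed, renaming a file does not create a new blob. The name lives in the tree that points at it.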

2. Tree

A tree maps filenames to blobs (or other trees). It's a directory snapshot.

git cat-file -p HEAD^{tree}
# 100644 blob d670460...  README.md
# 100644 blob 83baae6...  index.html
# 040000 tree 92b8935...  src

Each line lists the mode, object type, hash, and filename. Trees can contain other trees, which is how subdirectories work.
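
As a rough illustration, a tree can be serialized and hashed the same way. This Python sketch assumes SHA-1 and uses made-up helper names (object_id, tree_id). One detail worth knowing: the raw object stores a directory's mode as 40000, and cat-file -p pads it to 040000 for display.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> str:
    """Hash any Git object: '<type> <size>\\0' header plus the body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(entries) -> str:
    """entries: iterable of (mode, name, hex_object_id) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each record is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return object_id(b"tree", body)

readme = object_id(b"blob", b"test content\n")
print(tree_id([("100644", "README.md", readme)]))
```

Since a tree's ID depends on the IDs of everything beneath it, changing one file changes every tree on the path up to the root, while untouched subtrees keep their IDs and are reused.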

3. Commit

A commit points to a tree (the root snapshot) and to zero or more parent commits (none for the first commit, two or more for a merge), and it records author and committer metadata plus the message:

git cat-file -p HEAD
# tree 92b8935...
# parent a3f8c21...
# author Alex Peng <alexpeng014@gmail.com> 1675987200 -0500
# committer Alex Peng <alexpeng014@gmail.com> 1675987200 -0500
#
# Fix null pointer in user resolver

The commit doesn't store a diff. It stores a complete tree snapshot. The diff you see in git show is computed by comparing the commit's tree to its parent's tree.
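
A small Python sketch of that comparison, assuming each tree has already been flattened into a dict of path to blob ID. The function name and the short IDs in the example are placeholders, not real hashes.

```python
def diff_snapshots(parent: dict, child: dict) -> dict:
    """Compare two flattened trees, each mapping path -> blob ID."""
    changes = {}
    for path in sorted(parent.keys() | child.keys()):
        old, new = parent.get(path), child.get(path)
        if old == new:
            continue  # same blob ID means same contents, nothing to report
        if old is None:
            changes[path] = "added"
        elif new is None:
            changes[path] = "deleted"
        else:
            changes[path] = "modified"
    return changes

# Placeholder IDs for illustration only.
before = {"README.md": "aaa111", "index.html": "bbb222"}
after = {"README.md": "ccc333", "index.html": "bbb222", "src/app.js": "ddd444"}
print(diff_snapshots(before, after))
```

Only the IDs are compared here. Producing the line-by-line patch is a second step that reads the two blobs for each modified path.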

4. Tag

An annotated tag is an object that points to another object, almost always a commit, and carries a tagger, a date, and a message. Lightweight tags are just refs (pointers), not objects.

Branches Are Just Pointers

A branch is a ref under refs/heads/ that holds a single commit SHA. It is usually a plain file in .git/refs/heads/, although Git may pack refs into .git/packed-refs:

cat .git/refs/heads/main
# a3f8c21d8b92e3f44bcde9a0f1234567890abcde

When you commit, Git creates a new commit object and updates this file to point to it. That's all branching is. This is why branches in Git are cheap: creating one copies no files.

HEAD is a special pointer to whichever branch you're on:

cat .git/HEAD
# ref: refs/heads/main

In detached HEAD state, HEAD points directly to a commit SHA instead of a branch.
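
The lookup can be sketched in a few lines of Python. This is a simplified illustration with a made-up function name: it handles a loose ref file, the packed-refs file, and a detached HEAD, and it ignores other storage formats and linked worktrees.

```python
from pathlib import Path

def resolve_head(git_dir: str) -> str:
    """Return the commit ID that HEAD currently resolves to."""
    root = Path(git_dir)
    head = (root / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit ID directly

    ref_name = head[len("ref: "):]
    ref_file = root / ref_name
    if ref_file.exists():
        return ref_file.read_text().strip()

    # Fall back to packed-refs, which holds "<id> <refname>" lines.
    packed = root / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comment or peeled-tag line
            parts = line.split(" ", 1)
            if len(parts) == 2 and parts[1] == ref_name:
                return parts[0]
    raise LookupError(f"{ref_name} does not point to a commit yet")
```

Calling resolve_head(".git") from the top of a repository should give the same ID as git rev-parse HEAD in the common cases.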

What Happens on Commit

  1. The blobs for new and modified files already exist: git add wrote them when you staged the files
  2. Git builds tree objects from the current index (staging area), one per directory
  3. It creates a commit object pointing to the root tree and to the previous commit as its parent
  4. It updates the current branch ref to point to the new commit

Nothing is deleted right away. Objects that become unreachable stick around until garbage collection prunes them.
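
The whole sequence fits in a short in-memory model. The Python below is a sketch, not Git's implementation: the dicts stand in for .git/objects and .git/refs, the tree is flat (no subdirectories), and the author identity is a placeholder.

```python
import hashlib

objects = {}  # object ID -> (type, body); stands in for .git/objects
refs = {}     # ref name -> commit ID; stands in for .git/refs

def write_object(kind: str, body: bytes) -> str:
    header = f"{kind} {len(body)}".encode() + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    objects[oid] = (kind, body)  # storing identical content again changes nothing
    return oid

def commit(index: dict, message: str, branch: str = "refs/heads/main") -> str:
    """index maps filename -> blob ID, as recorded at staging time."""
    # Build one flat tree from the index (ASCII names assumed for sorting).
    tree_body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
        for name, blob in sorted(index.items())
    )
    tree = write_object("tree", tree_body)
    # Create the commit: tree, optional parent, identity lines, message.
    lines = [f"tree {tree}"]
    parent = refs.get(branch)
    if parent:
        lines.append(f"parent {parent}")
    who = "Example Author <author@example.com> 0 +0000"  # placeholder identity
    lines += [f"author {who}", f"committer {who}", "", message]
    commit_id = write_object("commit", "\n".join(lines).encode() + b"\n")
    # Move the branch ref to the new commit.
    refs[branch] = commit_id
    return commit_id

index = {"README.md": write_object("blob", b"hello\n")}  # the staging step
first = commit(index, "Initial commit")
index["README.md"] = write_object("blob", b"hello, world\n")
second = commit(index, "Update README")
print(refs["refs/heads/main"] == second)
```

After the second commit the first commit, its tree, and the old blob are all still in objects. The only thing that was overwritten is the ref.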

Why This Matters

Once you have this model, operations that seemed confusing become obvious:

  • git reset --soft HEAD~1: moves the branch pointer back one commit. The objects still exist. The index and working tree don't change.
  • git rebase: creates new commit objects (same changes, different parent), then moves the branch pointer. The old commits are still there until GC.
  • git reflog: the log of where HEAD has been, which lets you recover from almost any mistake, because the objects persist.
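
These three behaviours can be imitated with a toy model. The Python below is an illustration only: ToyRepo and its methods are invented for this example, commit IDs are a hash of the parent ID plus a change label rather than real Git objects, and there is a single branch.

```python
import hashlib

class ToyRepo:
    def __init__(self):
        self.commits = {}   # commit ID -> (parent ID, change); entries are never edited
        self.branch = None  # the one mutable pointer
        self.reflog = []    # every commit the pointer has referenced, oldest first

    def _move(self, target):
        self.branch = target
        self.reflog.append(target)

    def add_commit(self, parent, change):
        # The parent ID is hashed together with the change, so the same
        # change on a different parent gets a different ID.
        oid = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()
        self.commits[oid] = (parent, change)
        return oid

    def commit(self, change):
        self._move(self.add_commit(self.branch, change))
        return self.branch

    def reset_to_parent(self):
        # Like `git reset --soft HEAD~1`: only the pointer moves.
        self._move(self.commits[self.branch][0])

    def rebase(self, new_base, upstream):
        # Collect the changes made since `upstream`, then replay them on `new_base`.
        changes, cur = [], self.branch
        while cur != upstream:
            parent, change = self.commits[cur]
            changes.append(change)
            cur = parent
        tip = new_base
        for change in reversed(changes):
            tip = self.add_commit(tip, change)
        self._move(tip)

repo = ToyRepo()
a = repo.commit("A")
b = repo.commit("B")
repo.reset_to_parent()
print(repo.branch == a, b in repo.commits, repo.reflog == [a, b, a])
```

Every operation either adds entries to commits or moves the pointer. None of them edits or removes an existing commit, which is why the reflog entry for b still leads somewhere.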

Knowing that commits are immutable objects linked by parent pointers, and that branches are mutable pointers into that graph, makes Git feel a lot less like magic and a lot more like a data structure.