Git Internals
So far, we have looked at how you can use Git to manage the different states of your code. Now we want to show you the data and storage models that underlie Git.
Data Model
You will be able to use Git more effectively once you understand the data model. Git manages four types of data:
Objects
All commits, trees, blobs, and tags in a Git repository are stored as Git objects. Objects never change after they are created, and each has a unique ID, such as 3a5c279ea2f5d18498b61c229571d2449305a0. This means that you can use an object's ID to restore its contents at any time, as long as the object has not been deleted.
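The way such an ID is derived can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses the default SHA-1 object format (repositories created with the SHA-256 format hash differently); the function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to an object with this type and content."""
    # Git hashes a short header ("<type> <size>" plus a NUL byte)
    # followed by the raw content of the object.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same result as: echo 'hello world' | git hash-object --stdin
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on the type and the content, identical content always yields the same ID, and any change to the content yields a different one.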
Commit
A commit is a snapshot of the entire Git repository that can be uniquely identified by a SHA value and contains at least the following information:
Directory structure of all files in this version of the repository and the contents of each file, stored as the tree ID of the top-level directory of the commit.
ID(s) of the parent commit(s). The first commit of a repository has no parent commits, regular commits have one parent commit, merge commits have two or more parent commits.
Author and time when the commit was created.
Committer and time when the commit was committed.
Commit message
Example:
    $ git cat-file -p main
    tree 47cc0283b10bd5e4e8a0d61537d13bba3bfad916
    parent 63825a43e213ef8a7904a8994976ac86284d32bd
    author veit <veit@cusy.io> 1770370977 +0100
    committer veit <veit@cusy.io> 1770370977 +0100

    :memo: Add links to Python speed

Like all other objects, commits cannot be changed after they have been created. So if you want to change a commit with git commit --amend, Git actually creates a new commit with the same parent. And even when you display a commit with git show, the diff is not stored in the commit; it is calculated at that moment from the snapshot of the commit and that of its parent.
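To make the layout of a commit object more tangible, the following sketch splits the text shown by git cat-file -p into its header fields and the commit message. It is a simplification: the function name parse_commit is made up for this example, and multi-line headers such as signatures are not handled.

```python
def parse_commit(text: str) -> dict:
    """Split the text of a commit object into header fields and message."""
    # Headers and message are separated by the first blank line.
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # Merge commits have several parent lines, so collect them in a list.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


EXAMPLE = """\
tree 47cc0283b10bd5e4e8a0d61537d13bba3bfad916
parent 63825a43e213ef8a7904a8994976ac86284d32bd
author veit <veit@cusy.io> 1770370977 +0100
committer veit <veit@cusy.io> 1770370977 +0100

:memo: Add links to Python speed
"""

commit = parse_commit(EXAMPLE)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

A commit without any parent line would be the first commit of a repository, and one with two or more parent lines a merge commit.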
Tree
A tree represents a directory in Git and can contain files or other trees (subdirectories). For each entry, the tree lists the following:
File name
File type:
normal file
executable file
symbolic link
directory
gitlink (for submodules)
Object ID with the contents of the file, directory or gitlink
Example:
    $ git cat-file -p main^{tree}
    040000 tree 2f59a223f7dc767f4776e77762d208fa72bfd343 .dvc
    040000 tree 75833fd33271db55b6f1c96915f60f98a60b51a0 .github
    100644 blob 36d2dc5a5228cbf65b8cfe913565c9be49db1a3d .gitignore
    ...
    $ git cat-file -p 2f59a223f7dc767f4776e77762d208fa72bfd343
    100644 blob 669784da1fe0818e9abb795f73b7faf393832f2e .gitignore
    100644 blob 0a66f9a9ab72e3a99994803de8337f523b1b93d0 config
    $ git cat-file -p 36d2dc5a5228cbf65b8cfe913565c9be49db1a3d
    # SPDX-FileCopyrightText: 2019 cusy GmbH
    #
    # SPDX-License-Identifier: BSD-3-Clause
    ...

Hint
The first column of a tree entry is roughly based on Unix file permissions, but Git does not manage full Unix file permissions; it only records whether a file is executable. Extensions such as etckeeper are required for anything beyond that.
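Internally, a tree object is stored in a compact binary form rather than the text that git cat-file -p prints: each entry consists of the mode, a space, the name, a NUL byte, and the 20-byte binary object ID. The following sketch decodes that form, assuming SHA-1 object IDs; the names parse_tree and encode_entry are made up for this example, and the sample data is built from IDs shown above.

```python
FILE_TYPES = {
    "100644": "normal file",
    "100755": "executable file",
    "120000": "symbolic link",
    "040000": "directory",
    "160000": "gitlink",
}


def parse_tree(data: bytes) -> list[tuple[str, str, str, str]]:
    """Decode the raw body of a tree object into (mode, type, ID, name) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        nul = data.index(b"\0", space)
        # Modes are stored without leading zeros, so "40000" becomes "040000".
        mode = data[pos:space].decode().zfill(6)
        name = data[space + 1:nul].decode()
        # The object ID follows as 20 raw bytes (SHA-1), not as hex text.
        object_id = data[nul + 1:nul + 21].hex()
        entries.append((mode, FILE_TYPES.get(mode, "unknown"), object_id, name))
        pos = nul + 21
    return entries


def encode_entry(mode: str, name: str, object_id: str) -> bytes:
    """Helper for this example only: build one raw tree entry."""
    return mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(object_id)


raw = encode_entry("40000", ".dvc", "2f59a223f7dc767f4776e77762d208fa72bfd343")
raw += encode_entry("100644", ".gitignore", "36d2dc5a5228cbf65b8cfe913565c9be49db1a3d")

for entry in parse_tree(raw):
    print(*entry)
```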
Blob
A blob object contains the contents of a file.
With each commit, Git stores the entire contents of each file you have changed as a blob. Files that have not changed are not stored again; the new tree simply refers to their existing blobs. For example, if you have a commit that changes two files in a repository, that commit creates only two new blobs, so commits take up relatively little storage space even in very large repositories.
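The effect of this content-based storage can be simulated with a plain dictionary standing in for the object database. This is only a sketch: store_blob and snapshot are hypothetical helpers, the file names and contents are invented, and real Git additionally writes tree and commit objects.

```python
import hashlib


def store_blob(objects: dict[str, bytes], content: bytes) -> str:
    """Store content under its blob ID; identical content is stored only once."""
    header = f"blob {len(content)}".encode() + b"\0"
    object_id = hashlib.sha1(header + content).hexdigest()
    objects.setdefault(object_id, content)
    return object_id


def snapshot(objects: dict[str, bytes], files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the ID of the blob that holds its contents."""
    return {path: store_blob(objects, content) for path, content in files.items()}


objects: dict[str, bytes] = {}
first = snapshot(objects, {"README": b"v1\n", "a.py": b"print(1)\n", "b.py": b"print(2)\n"})
blobs_after_first = len(objects)

# Only README changes in the second snapshot.
second = snapshot(objects, {"README": b"v2\n", "a.py": b"print(1)\n", "b.py": b"print(2)\n"})
new_blobs = len(objects) - blobs_after_first

print(blobs_after_first, new_blobs)
# 3 1
```

Both snapshots refer to the same blob IDs for a.py and b.py, so only the changed README adds a new entry to the store.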
Tag
Tag objects contain at least the following fields:
ID of the object to which it refers
Type of the object to which it refers
Tag message
Tagger and tag date
Example:
    $ git cat-file -p 24.3.0
    object aa366cc9af3497544338482f82bdeb21f1dd3c21
    type commit
    tag 24.3.0
    tagger Veit Schiele <veit@cusy.io> 1732086922 +0100
References
References are a way to give commits a name that is easier to remember, such as for branches, tags, remote branches, and so on. Git often uses ref as an abbreviation for such references. A reference can refer to
an object ID, usually a commit ID
another reference, in which case it is called a symbolic reference
The most important references are:
.git/refs/heads/BRANCHNAME
A branch refers to the ID of the latest commit on that branch. To retrieve the history of commits on a branch, Git starts with the commit ID that the branch points to and then follows the parent commits.
.git/refs/tags/TAGNAME
A tag refers to a commit ID, a tag object ID, or another object ID.
.git/HEAD
HEAD is where Git stores your current branch. HEAD can be either
a symbolic reference to your current branch, for example ref: refs/heads/main
a direct reference to a commit ID if there is no current branch, that is, in a detached HEAD state
.git/refs/remotes/REMOTE/BRANCHNAME
A remote tracking branch refers to a commit ID. You can update it with git fetch if necessary. When git status outputs Your branch is up to date with 'origin/main', it is comparing your branch with this remote tracking branch, not with the current state of the remote repository.
refs/remotes/{REMOTE}/HEAD is a symbolic reference to the default branch of the remote repository.
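How a symbolic reference such as HEAD is followed to a commit ID can be sketched as below. This is a simplified illustration, not how Git itself is implemented: resolve_ref is a made-up name, the sketch only reads loose reference files and ignores references stored in .git/packed-refs, and the demonstration uses a temporary directory instead of a real repository. In practice, git rev-parse HEAD does this reliably.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic references until an object ID is reached."""
    for _ in range(max_depth):
        value = (git_dir / name).read_text().strip()
        if not value.startswith("ref: "):
            # A direct reference: the file contains the object ID itself.
            return value
        # A symbolic reference: the file names another reference.
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic references")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(
        "63825a43e213ef8a7904a8994976ac86284d32bd\n"
    )
    resolved = resolve_ref(git_dir)

print(resolved)
```

In a detached HEAD state, the HEAD file would contain the commit ID directly, and the loop would return after the first read.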
Index
The index is a list of files and their contents, stored as blobs. With git add, you can add files to the index or update the contents of a file in the index.
Unlike a tree, the index is a flat list of files. When you commit, Git converts the list of files in the index into a directory tree and uses that tree for the new commit. Each index entry, as shown by git ls-files --stage, has four fields:
One of the following four file types:
regular file
executable file
symbolic link
gitlink (for submodules)
Blob ID of the file or commit ID of the submodule
Staging number, usually 0. However, in the event of a merge conflict, there may be multiple versions of the same file name in the index.
File path
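The conversion from the flat index to a directory tree can be pictured with nested dictionaries. This is a conceptual sketch only: index_to_tree is a made-up name, the entries reuse IDs from the tree example above, and real Git writes one tree object per directory instead of building dictionaries.

```python
def index_to_tree(index: list[tuple[str, str, int, str]]) -> dict:
    """Turn flat (mode, object ID, stage, path) entries into a nested structure."""
    root: dict = {}
    for mode, object_id, stage, path in index:
        if stage != 0:
            # Entries with a non-zero staging number belong to a merge conflict.
            raise ValueError(f"unresolved merge conflict: {path}")
        *directories, filename = path.split("/")
        node = root
        for directory in directories:
            # Each path component becomes a subtree (a nested dictionary).
            node = node.setdefault(directory, {})
        node[filename] = (mode, object_id)
    return root


index = [
    ("100644", "36d2dc5a5228cbf65b8cfe913565c9be49db1a3d", 0, ".gitignore"),
    ("100644", "669784da1fe0818e9abb795f73b7faf393832f2e", 0, ".dvc/.gitignore"),
    ("100644", "0a66f9a9ab72e3a99994803de8337f523b1b93d0", 0, ".dvc/config"),
]

tree = index_to_tree(index)
print(sorted(tree))
print(sorted(tree[".dvc"]))
```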
Reflog
Every time a branch, a remote tracking branch, or HEAD is updated, Git updates a log called reflog for that reference, for example in .git/logs/refs/heads/main:

    0000000000000000000000000000000000000000 492e16edcf9cdb3371492be59735e517a17cc86c veit <veit@cusy.io> 1739549686 +0100 clone: from github.com:cusyio/Python4DataScience-de.git
    492e16edcf9cdb3371492be59735e517a17cc86c c40bfa2a238e824b619f760494ce5ce0769851c3 veit <veit@cusy.io> 1739549907 +0100 commit: Update git docs
    c40bfa2a238e824b619f760494ce5ce0769851c3 fa39661bb7fa93b420870845cb174529e8d62552 veit <veit@cusy.io> 1739549971 +0100 rebase (finish): refs/heads/main onto b7214df753ecbd01acd90d8f3dcd359e02441249
    ...

Each entry in the reflog contains:
Commit ID before the change
Commit ID after the change
Name of the user who made the change
Email address
Timestamp when the change was made
Log message, for example:
clone: from REMOTE-URL
commit: COMMIT-MESSAGE
rebase (finish): refs/heads/main onto BASE-COMMIT-ID
Reflogs log changes made in your local repository. However, they are not shared with the remote repository.
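A reflog line can be taken apart with a regular expression, as in the following sketch. In the file itself, the log message is separated from the preceding fields by a tab character, which the pattern relies on. The names REFLOG_LINE and parse_reflog_line are made up for this example; for everyday use, git reflog presents the same information.

```python
import re

REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<timestamp>\d+) (?P<tz>[+-]\d{4})\t(?P<message>.*)"
)


def parse_reflog_line(line: str) -> dict:
    """Split one reflog line into its fields."""
    match = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a reflog line: {line!r}")
    entry = match.groupdict()
    entry["timestamp"] = int(entry["timestamp"])
    # An old ID consisting only of zeros means the reference was just created.
    entry["created"] = set(entry["old"]) == {"0"}
    return entry


line = (
    "492e16edcf9cdb3371492be59735e517a17cc86c "
    "c40bfa2a238e824b619f760494ce5ce0769851c3 "
    "veit <veit@cusy.io> 1739549907 +0100\tcommit: Update git docs\n"
)
entry = parse_reflog_line(line)
print(entry["new"], entry["message"])
```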
Storage model
Packfiles
The format in which Git initially stores objects on the hard drive is called the loose object format. However, to save space and work more efficiently, Git occasionally packs several of these objects into a single binary file called a packfile. Git does this on its own from time to time, for example when transferring objects with git push, and you can trigger it manually with git gc. Running git gc deletes most of the loose objects in .git/objects/ and creates a new pair of files:

    $ find .git/objects -type f
    .git/objects/pack/pack-e9282cda3898f806f7bd108a3675c9e4d236915c.pack
    .git/objects/pack/pack-e9282cda3898f806f7bd108a3675c9e4d236915c.idx

*.pack contains the contents of all objects that were removed from .git/objects/ as loose objects.
*.idx contains the offsets into this packfile, allowing Git to quickly jump to a specific object.
Any remaining loose objects are blobs that are not referenced by any commit, known as dangling objects, such as file contents that were added to the index but never committed.
When Git packs objects, it looks for files with similar names and sizes and only stores the deltas from one version of the file to the next. With git verify-pack, you can inspect the packfile and see how Git saved storage space:

    $ git verify-pack -v .git/objects/pack/pack-e9282cda3898f806f7bd108a3675c9e4d236915c.pack
    ...
    dd1827ebf73b22d9f5828eec005eda4d79520f57 blob 147 140 389838
    0a66f9a9ab72e3a99994803de8337f523b1b93d0 blob 31 43 389978 1 dd1827ebf73b22d9f5828eec005eda4d79520f57
    ...
    .git/objects/pack/pack-e9282cda3898f806f7bd108a3675c9e4d236915c.pack: ok

Blob 0a66f9a is stored as a delta against blob dd1827e, which is named in its last column.
The third column indicates the size of the stored object data, which for a delta is the size of the delta itself, so you can see that dd1827e takes up 147 bytes, while 0a66f9a only takes up 31 bytes.
Here, dd1827e is stored in full, while 0a66f9a is stored only as a delta. Git often keeps the newer version of a file in full and stores older versions as deltas, which allows faster access to the latest version of a file.
The general format of each object line in the output of git verify-pack -v is:

    OBJECT-ID TYPE SIZE SIZE-IN-PACKFILE OFFSET-IN-PACKFILE [DEPTH BASE-ID]
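The line format above can be read into named fields as follows. This is a small sketch with a made-up function name, parse_verify_pack_line; it only accepts object lines and rejects other lines of the output, such as the final line ending in ok. The two sample lines are taken from the output shown earlier.

```python
def parse_verify_pack_line(line: str) -> dict:
    """Split one object line of `git verify-pack -v` output into named fields."""
    fields = line.split()
    if len(fields) not in (5, 7):
        raise ValueError(f"not an object line: {line!r}")
    entry = {
        "object_id": fields[0],
        "type": fields[1],
        "size": int(fields[2]),
        "size_in_packfile": int(fields[3]),
        "offset_in_packfile": int(fields[4]),
        "depth": None,
        "base_id": None,
    }
    if len(fields) == 7:
        # The two optional fields are only present for objects stored as deltas.
        entry["depth"] = int(fields[5])
        entry["base_id"] = fields[6]
    return entry


full = parse_verify_pack_line(
    "dd1827ebf73b22d9f5828eec005eda4d79520f57 blob 147 140 389838"
)
delta = parse_verify_pack_line(
    "0a66f9a9ab72e3a99994803de8337f523b1b93d0 blob 31 43 389978 1 "
    "dd1827ebf73b22d9f5828eec005eda4d79520f57"
)

print(full["size"], delta["size"], delta["base_id"] == full["object_id"])
# 147 31 True
```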