Git for binary files
git diff can be configured to display meaningful diffs for binary files as
well.
… for Excel files
For this we need openpyxl and pandas:
$ uv add openpyxl pandas
Then we can use pandas.DataFrame.to_csv in
exceltocsv.py to convert the Excel files:
# SPDX-FileCopyrightText: 2023 cusy GmbH
#
# SPDX-License-Identifier: BSD-3-Clause
import sys
from io import StringIO
import pandas as pd
# Convert each sheet to CSV so that git diff can compare it as text
for sheet_name in pd.ExcelFile(sys.argv[1]).sheet_names:
    output = StringIO()
    print("Sheet: %s" % sheet_name)
    pd.read_excel(sys.argv[1], sheet_name=sheet_name).to_csv(
        output, header=True, index=False
    )
    print(output.getvalue())
Now add the following section to your global Git configuration
~/.config/git/config:
[diff "excel"]
textconv=python3 /PATH/TO/exceltocsv.py
binary=true
Finally, in the global ~/.config/git/attributes file, our excel
converter is linked to *.xlsx files:
*.xlsx diff=excel
… for PDF files
For this, pdftohtml is also required. It can be installed with apt on Debian/Ubuntu or with Homebrew on macOS:
$ sudo apt install poppler-utils
$ brew install pdftohtml
Add the following section to the global Git configuration
~/.config/git/config:
[diff "pdf"]
textconv=pdftohtml -stdout
Finally, in the global ~/.config/git/attributes file, our pdf
converter is linked to *.pdf files:
*.pdf diff=pdf
Now, when git diff is called, the PDF files are first converted and then a
diff is performed over the outputs of the converter.
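The convert-then-diff behaviour described above can be imitated in a few lines of Python. This is only a sketch of the idea, not of Git's implementation: `textconv_diff` and `fake_convert` are hypothetical names, and `fake_convert` is a stand-in for a real converter such as pdftohtml that simply treats NUL bytes as line separators.

```python
import difflib


def textconv_diff(old_bytes, new_bytes, convert, name="file.pdf"):
    """Convert both versions to text, then diff the converter outputs."""
    old_lines = convert(old_bytes).splitlines(keepends=True)
    new_lines = convert(new_bytes).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
    )


def fake_convert(data):
    """Hypothetical converter: NUL-separated records become text lines."""
    return data.decode("ascii").replace("\x00", "\n")


# Two made-up "binary" versions of the same file
old_version = b"Title\x00First paragraph\x00"
new_version = b"Title\x00Second paragraph\x00"
print(textconv_diff(old_version, new_version, fake_convert))
```

The diff only ever sees the converter output, so the quality of the result depends entirely on how readable and how stable that output is.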
… for documents
Differences in documents can also be displayed. Pandoc can be used for this purpose; it must be installed first.
Then add the following section to your global Git configuration
~/.config/git/config:
[diff "pandoc-to-markdown"]
textconv = pandoc --to markdown
cachetextconv = true
Finally, in the global ~/.config/git/attributes file, our
pandoc-to-markdown converter is linked to *.docx, *.odt and
*.rtf files:
*.docx diff=pandoc-to-markdown
*.odt diff=pandoc-to-markdown
*.rtf diff=pandoc-to-markdown
Tip
Jupyter Notebooks are stored as JSON files (*.ipynb), which are quite dense and difficult to read, especially in diffs. Pandoc's Markdown representation simplifies this:
*.ipynb diff=pandoc-to-markdown
The same procedure can be used to obtain useful diffs from other binaries, for
example *.zip, *.jar and other archives with unzip, or for changes in
the meta information of images with exiv2. There are also conversion tools
for converting *.odt, *.doc and other document formats into plain text.
For binary files for which there is no converter, the strings utility is often
sufficient.
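Where no suitable command-line converter is at hand, a small script in the style of exceltocsv.py can play that role. The following sketch for ZIP-based archives uses only the standard library; the function name `describe_archive`, the output format and the idea of saving it as a script such as `ziptotext.py` are assumptions made for illustration and are not part of the setup described above.

```python
import sys
import zipfile


def describe_archive(path):
    """Return one line per archive member: name, size and CRC-32."""
    lines = []
    with zipfile.ZipFile(path) as archive:
        # Sort by name so that a changed member order does not create noise
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            lines.append(f"{info.filename}\t{info.file_size}\t{info.CRC:08x}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git passes the path of the file to convert as the last argument
    print(describe_archive(sys.argv[1]))
```

Such a script could be registered with a textconv entry analogous to the Excel example. It only reports which members were added, removed or changed, not what changed inside them.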
… for media files
ExifTool can be used to convert the metadata of media files to text.
$ sudo apt install libimage-exiftool-perl
$ brew install exiftool
> choco install exiftool
You can then add the following section to the global Git configuration file
~/.config/git/config:
[diff "exiftool"]
textconv = exiftool --composite -x 'Exiftool:*'
cachetextconv = true
xfuncname = "^-.*$"
Finally, in ~/.config/git/attributes the exiftool converter is
linked to the file extensions of media files:
*.avif diff=exiftool
*.bmp diff=exiftool
*.gif diff=exiftool
*.jpeg diff=exiftool
*.jpg diff=exiftool
*.png diff=exiftool
*.webp diff=exiftool
See also
exiftool can process many more media files. You can find a complete list
in Supported File Types.