A diff is a description of the changes between two versions of a file. When you run git diff or look at a pull request, you see added lines in green (prefixed with +) and removed lines in red (prefixed with -). This output is human-readable, but it is generated by an algorithm that solves a non-trivial problem: given two sequences of lines, find the smallest set of changes that transforms one into the other.
The Longest Common Subsequence Problem
At the core of most diff algorithms is the Longest Common Subsequence (LCS) problem. Given two sequences, find the longest sequence of elements that appears in both in the same order (but not necessarily consecutively). Once an LCS has been chosen, the lines that belong to it are the lines that did not change. Lines of the first sequence that fall outside the LCS are deletions; lines of the second sequence that fall outside it are additions.
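As a rough illustration of how an LCS turns into a diff, here is a minimal sketch of the textbook dynamic-programming approach. The function names lcs_table and line_diff and the sample data are made up for this example; real diff tools do not work this way internally.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Return (tag, line) pairs: ' ' for unchanged, '-' for deleted, '+' for added."""
    table = lcs_table(a, b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # The line is part of the LCS, so it is unchanged.
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps the LCS at least as long: a deletion.
            result.append(("-", a[i]))
            i += 1
        else:
            # Otherwise b[j] is a line that only the new file has: an addition.
            result.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is pure deletion or pure addition.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result


old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry", "date"]
diff = line_diff(old, new)
for tag, line in diff:
    print(tag + line)
# prints:
#  apple
# -banana
# +blueberry
#  cherry
# +date
```

Here the LCS is apple, cherry, so those two lines are reported as unchanged and everything else becomes a deletion or an addition.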
For files with thousands of lines, computing the exact LCS is expensive: the classic dynamic-programming algorithm takes O(n*m) time and space, where n and m are the lengths of the two files. Practical diff tools use optimized algorithms. The Myers diff algorithm, the default in git, runs in O(N*D) time, where N is the combined length of the two files and D is the number of differences (the size of the shortest edit script), making it fast when files are similar (small D) even if they are long.
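To make the idea of doing work in proportion to the number of differences concrete, here is a simplified sketch of the greedy search at the heart of the Myers algorithm. It only computes how many insertions and deletions are needed, not the edit script itself, and it is not git's actual implementation. The name shortest_edit_distance is an assumption for this example.

```python
def shortest_edit_distance(a, b):
    """Return the minimum number of line insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    # furthest[k] is the furthest x reached so far on diagonal k = x - y,
    # where x counts consumed lines of a and y counts consumed lines of b.
    furthest = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # step down: insert a line from b
            else:
                x = furthest[k - 1] + 1  # step right: delete a line from a
            y = x - k
            # Follow matching lines for free; they cost no edits.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return max_d


old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry", "date"]
distance = shortest_edit_distance(old, new)
print(distance)  # prints 3: delete banana, add blueberry, add date
```

The outer loop tries edit counts 0, 1, 2, and so on, and stops as soon as both files are fully consumed, so two nearly identical files finish after only a few rounds regardless of their length.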
The Unified Diff Format
The output format you see from git diff is called unified diff format. For each file, it has a --- header (old file) and a +++ header (new file); git places a diff --git line and some metadata before them. Then it shows one or more hunks, which are sections of the file where changes occur. Each hunk starts with a @@ line that gives the starting line number and line count of the hunk in the old and new file, for example @@ -1,3 +1,4 @@. The hunk body consists of context lines (prefixed with a space), deleted lines (-), and added lines (+). By default, three lines of context are shown around each change.
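One way to see the format without involving git is Python's standard difflib module, which can produce unified diffs. The sketch below uses made-up file names and contents; the regular expression for the @@ line is an illustrative assumption that covers the common form of the header, not a complete parser.

```python
import difflib
import re

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n", "date\n"]

# n=3 asks for three lines of context around each change.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/fruits.txt", tofile="b/fruits.txt", n=3)
)
print("".join(diff_lines), end="")
# prints:
# --- a/fruits.txt
# +++ b/fruits.txt
# @@ -1,3 +1,4 @@
#  apple
# -banana
# +blueberry
#  cherry
# +date

# Pull the line ranges out of the hunk header. A missing count means 1.
match = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", diff_lines[2])
old_start, old_count, new_start, new_count = (
    int(group) if group is not None else 1 for group in match.groups()
)
print(old_start, old_count, new_start, new_count)  # prints 1 3 1 4
```

The hunk header says the hunk covers 3 lines starting at line 1 of the old file and 4 lines starting at line 1 of the new file.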
Conflict Markers
When git cannot automatically merge two branches because both modify the same region of a file, it inserts conflict markers into the file. <<<<<<< HEAD marks the start of the version on your current branch, ======= separates it from the incoming version, and >>>>>>> branch-name marks the end. Both versions are left in the file between the markers, and you resolve the conflict by editing the file to keep the correct content and removing the markers.
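The marker layout is regular enough to scan mechanically. Here is a minimal sketch of a scanner that collects both sides of each conflict. It assumes the default two-sided marker style described above (other conflict styles add a section for the common ancestor, which this sketch does not handle), and the function name parse_conflicts and the sample file content are made up for illustration.

```python
def parse_conflicts(text):
    """Return a list of (ours, theirs) pairs, each a list of lines from one conflict."""
    conflicts = []
    ours, theirs = [], []
    state = "outside"  # one of "outside", "ours", "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            # Start of a conflict: what follows is the current branch's version.
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            # Separator: what follows is the incoming version.
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            # End of the conflict.
            conflicts.append((ours, theirs))
            state = "outside"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


sample = """\
title = "Report"
<<<<<<< HEAD
author = "Alice"
=======
author = "Bob"
>>>>>>> feature-branch
version = 2
"""

conflicts = parse_conflicts(sample)
print(conflicts)
# prints [(['author = "Alice"'], ['author = "Bob"'])]
```

A script like this can report which regions still need attention, but choosing the correct content remains a manual decision.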
Word-Level and Character-Level Diffs
Standard git diffs operate at the line level: an entire line is marked as changed even if only one word within it was modified. For prose documents or long lines, this can make changes hard to read. git diff --word-diff computes the diff at the word level and marks which words were removed or added within a line. Many code review tools go further and highlight the specific characters that changed within a modified line, which makes small edits easier to spot.
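The same sequence-comparison idea applies at the word level if each line is split into words first. The sketch below uses Python's difflib.SequenceMatcher, which has its own matching heuristic and is not what git uses. The [-...-] and {+...+} markers imitate the plain style of git diff --word-diff, and the function name word_diff is an assumption for this example.

```python
import difflib


def word_diff(old_line, new_line):
    """Return one line with removed words in [-...-] and added words in {+...+}."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


result = word_diff("the quick brown fox", "the quick red fox")
print(result)  # prints: the quick [-brown-] {+red+} fox
```

Splitting into characters instead of words gives a character-level diff with the same code, at the cost of noisier output when a whole word is replaced.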