A Case for Snapshot Testing
Have you ever written a test only to realize the expected output is tedious to specify—or worse, changes with every legitimate feature update? Traditional unit tests force you to manually define every expectation. Snapshot testing flips this around: capture the output once, then let changes speak for themselves. It’s low friction, high value, and surprisingly simple to implement.
Snapshot testing
The process of snapshot testing (also known as golden file testing) works as follows: first, generate a “ground truth” or golden file. Check that the test is green. Now, make the changes you want—maybe add a new feature. Run your test again, which checks the output of the program against the golden file. Either it looks the same, or you can now see the difference between them.
Now you can decide whether the change is intentional or whether you have found an error. The best part: you write the test once and never have to manually update expectations again. The output updates and changes automatically whenever you “promote” the new version.
It is a low-cost and, with a bit of discipline, a high-value addition to any project that has predictable output for known input. Furthermore, you can use snapshot tests to set up a feedback loop for agents in seconds.
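At its core, the loop is just “capture, compare, promote,” and a shell is enough to sketch it (the command name mycli and the file expected.txt are placeholders, not any particular tool):
# 1. Capture: record today's output as the golden file.
mycli --report > expected.txt
# 2. Compare: on every later run, diff fresh output against the golden file.
diff -u expected.txt <(mycli --report)
# 3. Promote: if the diff shows an intentional change, overwrite the golden file.
mycli --report > expected.txt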
Why snapshot testing?
Testing is a great thing. Let’s take unit tests. You know the input and you know the output of a program, so why not write a unit test for it? It ensures things work after every iteration and prevents regressions.
This works great until you recognize that you have to change the expected output—perhaps a user wants flashy headings in their terminal. This is tedious, and it might discourage you from writing unit tests in the future. Many developers will only write a unit test when they are certain that the expected output won’t change, or they spend too much time testing only the “right” things: the atomic components that won’t change much. Or, fearing the output might change, they might test a whole lot less.
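To picture the friction, here is what a hand-written expectation for a hypothetical greet CLI looks like; every wording change means retyping the string by hand:
# Hand-maintained expectation: any change to the output wording breaks this line,
# and a human has to edit it. (greet is a made-up example command.)
[ "$(./greet Alice)" = "Hello, Alice!" ] || { echo "greet test failed"; exit 1; }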
But wait—what if you could have it all? You can have tests that automatically adapt to your program while still catching regressions. These tests can be checked into version control to serve as living documentation of how the program works.
Especially now, you can point an AI agent at a test and say: “make this change, and check that it didn’t mess up the program using the test suite.” Tests give you confidence that the program is still doing what you want, but they shouldn’t be a time sink.
When requirements change, traditional tests can make your life miserable because you have to rewrite every manual expectation. You might say, “Well, let the coding agent write the tests.” I would say yes and no to that. An agent can write tests fast, but those tests don’t always give you real confidence. Agents and LLMs don’t understand program behavior the way humans do; they might check that an array has 10 members just to have a checkmark, even if that array is a variable configuration that changes constantly. That results in high coverage but zero visibility. You want valuable tests that find bugs before production, not just a “magical” 80% coverage number.
The challenge is finding the right spot to test without wasting time. That is where snapshot tests come in.
The Workflow
The workflow looks like this:
# have a test for add :: (a: int, b: int) -> string;
$ cat golden_file.txt
Input: 2, 3
Output: The answer to 2 + 3 is 5
$ snaptest add_test
✔ all tests pass
# you change the output in your code
$ snaptest add_test
❌ the tests failed:
=== add_test
< Output: The answer to 2 + 3 is 5
---
> Output: The sum of 2 + 3 is 5
$ snaptest promote
# here you declared the new output is correct
$ snaptest add_test
✔ all tests pass
Implementation
And how can we accomplish this? It is astonishingly easy:
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TEST_DIR="$SCRIPT_DIR/<path to your test dir>"
BINARY="$SCRIPT_DIR/<path to your executable or command to execute>"
FIXTURE_FILE="$TEST_DIR/test.out.fixture"
CI_MODE="${CI_MODE:-false}"
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${YELLOW}Generating current output...${NC}"
OUTPUT=$("$BINARY" -path "$TEST_DIR" 2>&1)
echo ""
if [ ! -f "$FIXTURE_FILE" ]; then
    echo -e "${YELLOW}No fixture found. Creating first fixture...${NC}"
    echo "$OUTPUT" > "$FIXTURE_FILE"
    echo -e "${GREEN}Fixture created at: $FIXTURE_FILE${NC}"
    echo "Run the test again to verify output."
    exit 0
fi
echo -e "${YELLOW}Comparing with fixture...${NC}"
echo ""
if diff -u "$FIXTURE_FILE" <(echo "$OUTPUT") > /dev/null 2>&1; then
    echo -e "${GREEN}✓ Test passed! Output matches fixture.${NC}"
    exit 0
else
    echo -e "${RED}✗ Test failed! Output differs from fixture.${NC}"
    echo ""
    echo -e "${YELLOW}Diff (showing changes):${NC}"
    echo "---"
    diff -u --color=always "$FIXTURE_FILE" <(echo "$OUTPUT") || true
    echo "---"
    echo ""
    if [ "$CI_MODE" = "true" ]; then
        echo -e "${RED}CI mode: fixture not updated. Test failed.${NC}"
        exit 1
    fi
    read -p "Do you want to update the fixture? (y/n) " -n 1 -r
    echo
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        echo "$OUTPUT" > "$FIXTURE_FILE"
        echo -e "${GREEN}✓ Fixture updated at: $FIXTURE_FILE${NC}"
        exit 0
    else
        echo -e "${YELLOW}Fixture not updated. Test failed.${NC}"
        exit 1
    fi
fi
The script works as follows:
- Check for fixture: If no fixture exists for the input, we create one and notify the user.
- Comparison: If it exists, we run the program and match the current output against the saved fixture.
- User Feedback: If they match, the test passes. If they differ, the user is shown a diff and asked if the change was intentional.
- Promotion: If intentional, the fixture is updated. If not, the developer must fix the code to make the test green.
- CI/CD: The CI_MODE flag ensures the script fails immediately in automated environments without waiting for user input (a minimal CI invocation is sketched below).
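Assuming the script above is saved as tests/snapshot_test.sh (the name and path are just examples), a CI job only needs a single line:
# In CI, a mismatch fails the build; the fixture is never rewritten and no prompt appears.
CI_MODE=true ./tests/snapshot_test.sh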
Key Advantages
- Framework Agnostic: You don’t need a specific library. As shown above, a simple shell script can handle it.
- Version Control: Golden files are just plain text. You can track exactly how output has evolved over time via Git.
- Flexibility: Adding metadata or scrubbing non-deterministic data (like timestamps) is just a matter of a few more lines of scripting.
- Scrubbing Example: You can handle timestamps or UUIDs by filtering them out before comparison (a slightly fuller sketch follows this list):
OUTPUT=$(echo "$OUTPUT" | sed 's/"timestamp":"[^"]*"//g')
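For instance, both fields can be replaced with stable placeholders before the diff runs. The patterns below are illustrative only; adjust them to your actual output format:
# Replace ISO-style timestamps and UUIDs with placeholders so the diff
# only highlights meaningful changes. These regexes are examples, not a spec.
OUTPUT=$(echo "$OUTPUT" \
  | sed -E 's/"timestamp":"[^"]*"/"timestamp":"<scrubbed>"/g' \
  | sed -E 's/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/<uuid>/g')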
When NOT to use snapshots
Snapshots are powerful, but they’re not a silver bullet. Skip them if:
- Output is truly non-deterministic. If you’re scrubbing 90% of the file to get a match, you’ve lost the value.
- Files are large or binary. Diffs become unreadable and Git history becomes bloated. Use structured comparisons instead.
- Security-sensitive data. Never commit snapshots containing API keys, passwords, or PII.
- The output changes constantly. If you find yourself promoting every single run without looking at the diff, you aren’t testing—you’re just recording noise.
Ready-to-use Frameworks
Before rolling your own, check if your ecosystem has a battle-tested library:
- Jest (JavaScript/TypeScript)
- insta (Rust)
- golden (Go)
- pytest-snapshot (Python)
Real-world example
Imagine a CLI tool that formats code. With 10 input files, the snapshot workflow is:
- Run the formatter on each input and save to fixtures.
- When the formatting logic changes, run the tests. Diffs show exactly how the indentation or spacing changed across all 10 files.
- If the changes are correct, promote them all.
- Commit the new golden files (a minimal loop over the fixtures is sketched below).
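Such a harness could be as small as the following loop; the ./formatter command and the tests/cases layout are assumptions for illustration:
# Compare each formatted input against its stored golden file.
# To promote an intentional change: ./formatter "$input" > "$golden"
for input in tests/cases/*.input; do
    golden="${input%.input}.golden"
    diff -u "$golden" <(./formatter "$input") || echo "MISMATCH: $input"
done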
No manual labor. Just capture, compare, and promote. For a great example of this in the wild, look at Ruff, the Python linter/formatter, which uses insta for its extensive test suite.
Summary
Snapshot testing simplifies maintenance by comparing program output against golden files. Instead of manually writing expected outputs, you generate them once and update them when changes are intentional. This reduces friction, catches regressions early, and adapts gracefully to changing requirements.