Here is a quick method for checking parity between two directories.
Let’s say we have two directories, dir1 and dir2. They are large and have thousands of files and subdirectories. How can we check that they have the same contents? I found myself in this situation recently during a server migration.
Method 1: rsync
To move the files in the first place we can do
rsync -av --checksum path/to/dir1/ path/to/dir2
which will copy the contents of dir1 into the dir2 directory. The --checksum flag makes rsync decide which files to update by comparing checksums rather than size and modification time. If dir2 is a new directory, or you use the --delete flag to erase anything else in dir2, then a completed transfer is also a checked one: barring any errors, both directories will be the same. If you just want to see what the command would do, use --dry-run.
The rsync command can be run again to check that everything went OK and that there are no more files to update or change. The problem is that with a large directory that has many files and subdirectories, this is slow, especially over a network connection.
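The re-run check can be scripted. Below is a minimal Python sketch, assuming rsync is installed and on the PATH. The function names are hypothetical, and adding --delete and --itemize-changes to the dry run is my own choice so that extra files in dir2 are reported too. An empty result should mean rsync has nothing left to update.

```python
import subprocess


def rsync_check_command(src, dst):
    """Build an rsync dry-run command that lists what would change."""
    # A trailing slash on the source means "the contents of src",
    # matching path/to/dir1/ in the command above.
    src = src.rstrip("/") + "/"
    return [
        "rsync",
        "-a",
        "--checksum",         # compare by checksum, not size and mtime
        "--delete",           # assumption: also report extras in dst
        "--dry-run",          # change nothing, only report
        "--itemize-changes",  # one output line per pending change
        src,
        dst,
    ]


def pending_changes(src, dst):
    """Run the dry run and return one itemized line per pending change."""
    result = subprocess.run(
        rsync_check_command(src, dst),
        capture_output=True,
        text=True,
        check=True,  # raise if rsync itself reports an error
    )
    return result.stdout.splitlines()
```

Calling pending_changes('path/to/dir1', 'path/to/dir2') still has to checksum every file, so it is no faster than running rsync by hand.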
Method 2: diff
Using diff to check for differences between the contents of dir1 and dir2 is a great way to check parity.
diff -rq path/to/dir1 path/to/dir2
This will list any files that are present/absent in one directory or the other and it will also show which files have changed.
Again the problem here is speed. If the diff is done in the shell of a computer with two network volumes, particularly over a slow connection, this can take a long time.
Method 3: python
Python has a library filecmp which could help us.
import filecmp
dir1 = 'path/to/dir1'
dir2 = 'path/to/dir2'
filecmp.dircmp(dir1, dir2).report_full_closure()
We can get lists of contents and it works fast, but the results are difficult to make sense of.
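One way to make the filecmp results easier to read is to flatten them into three lists. This is a minimal sketch, not part of the original method: summarize is a hypothetical helper that walks the dircmp object recursively, using its left_only, right_only, diff_files and subdirs attributes.

```python
import filecmp
import os


def summarize(dcmp, prefix=""):
    """Flatten a filecmp.dircmp object into three sorted lists.

    Returns (left_only, right_only, changed) as paths relative to the
    two directories being compared.
    """
    left_only = [os.path.join(prefix, name) for name in dcmp.left_only]
    right_only = [os.path.join(prefix, name) for name in dcmp.right_only]
    changed = [os.path.join(prefix, name) for name in dcmp.diff_files]
    # Recurse into subdirectories that exist on both sides.
    for name, sub in dcmp.subdirs.items():
        sub_left, sub_right, sub_changed = summarize(sub, os.path.join(prefix, name))
        left_only += sub_left
        right_only += sub_right
        changed += sub_changed
    return sorted(left_only), sorted(right_only), sorted(changed)
```

It would be called as summarize(filecmp.dircmp(dir1, dir2)). By default dircmp treats files with the same type, size and modification time as equal without reading them, which is part of why it is fast.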
Method 4: easy/quick method
Let’s say we are pretty sure that the directories are the same, but we just want to make sure. Imagine you did the migration a few days ago but weren’t sure if anything has changed in dir2 in that time. If others have read-write access to dir2 there might be files added or removed that you don’t know about.
Here’s a simple solution in zsh.
Get a list of files in both directories and put them in two text files on the desktop.
cd path/to/dir1
find -L . > ~/Desktop/dir1.txt
cd path/to/dir2
find -L . > ~/Desktop/dir2.txt
Changing directory first means that the relative paths of all files and folders listed in the two text files are comparable. The two outputs we get for a dummy set of files (created using this script) look like this:
.
./not_the_same
./dir_only_in_dir1
./file_only_in_dir1
./common_file
./file_in_dir1
./common_dir
./common_dir/dir2
./common_dir/dir2/not_the_same
./common_dir/dir2/common_file
./common_dir/dir2/file_in_dir1
./common_dir/dir2/dir_only_in_dir2
./common_dir/dir2/common_dir
./common_dir/dir2/file_only_in_dir2
./common_dir/dir1
./common_dir/dir1/not_the_same
./common_dir/dir1/dir_only_in_dir1
./common_dir/dir1/file_only_in_dir1
./common_dir/dir1/common_file
./common_dir/dir1/file_in_dir1
./common_dir/dir1/common_dir

.
./not_the_same
./common_file
./file_in_dir1
./dir_only_in_dir2
./common_dir
./common_dir/dir2
./common_dir/dir2/not_the_same
./common_dir/dir2/common_file
./common_dir/dir2/file_in_dir1
./common_dir/dir2/dir_only_in_dir2
./common_dir/dir2/common_dir
./common_dir/dir2/file_only_in_dir2
./common_dir/dir1
./common_dir/dir1/not_the_same
./common_dir/dir1/dir_only_in_dir1
./common_dir/dir1/file_only_in_dir1
./common_dir/dir1/common_file
./common_dir/dir1/file_in_dir1
./common_dir/dir1/common_dir
./file_only_in_dir2
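For anyone who would rather stay in Python, a listing like the ones above can be approximated with os.walk. This is a rough sketch and not the method used in this post: list_tree is a hypothetical name, and followlinks=True is my stand-in for the -L flag of find.

```python
import os


def list_tree(root):
    """Return sorted paths of everything under root, written relative to
    root in the same ./style as `find -L .`."""
    paths = [os.curdir]
    # followlinks=True descends into symlinked directories, like find -L.
    # As with find -L, a symlink pointing back up the tree can cause a loop.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        rel_dir = os.path.relpath(dirpath, root)  # "." for root itself
        for name in dirnames + filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            paths.append(os.path.join(os.curdir, rel_path))
    return sorted(paths)
```

The list comes back already sorted. Python sorts by character code while sort follows the shell locale, so the ordering may differ from the sorted text files, but two listings produced the same way are still directly comparable.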
This step is seriously fast. On the large directories I was scanning, it took just a few minutes for find versus hours for the slower methods above.
Now we can sort the text files into alphabetical order and write each result to a new file:
cd ~/Desktop
sort dir1.txt > dir1sort.txt
sort dir2.txt > dir2sort.txt
Then we can just diff the text files by doing
diff dir1sort.txt dir2sort.txt
However, I really like diff2html for a more visual way to observe the output. Using:
diff -u dir1sort.txt dir2sort.txt | diff2html -i stdin
We get this:
The sorting step helps with diff, because contiguous blocks of files in subdirectories will be grouped together following the sort.
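The same comparison of two listings can be sketched in Python with sets and the standard difflib module. The function names here are hypothetical; the default file names simply mirror the sorted text files above.

```python
import difflib


def compare_listings(lines1, lines2):
    """Return (only_in_first, only_in_second) as sorted lists of paths."""
    set1, set2 = set(lines1), set(lines2)
    return sorted(set1 - set2), sorted(set2 - set1)


def unified(lines1, lines2, name1="dir1sort.txt", name2="dir2sort.txt"):
    """Return a unified diff of the two listings as a single string."""
    diff = difflib.unified_diff(
        sorted(lines1),  # sort first so related paths stay grouped
        sorted(lines2),
        fromfile=name1,
        tofile=name2,
        lineterm="",     # input lines carry no trailing newline
    )
    return "\n".join(diff)
```

With the two text files read into lists using splitlines(), compare_listings gives the paths unique to each side. For the dummy set above, that is dir_only_in_dir1 and file_only_in_dir1 on one side, and dir_only_in_dir2 and file_only_in_dir2 on the other. Like the text-file approach, this compares names only, so a file whose contents changed but whose path did not will not show up.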
The example I’m showing here is very simple and the power of this method comes when a complicated directory tree needs to be quickly compared.
This post is part of an occasional series of tech tips.