Saturday, June 23, 2007

Comparing two files using comm

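comm compares the contents of two files line by line. Its output has 3 columns: the lines only in file 1, the lines only in file 2, and the lines in both. The -1, -2 and -3 options each suppress the matching column, so -23 leaves only column 1, and so on. You'll need to sort both files first.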

sort test1.txt > test1-sorted.txt
sort test2.txt > test2-sorted.txt
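
If you are not sure whether a file is already sorted, sort -c can check it: it exits with status 0 when the input is in order and non-zero when it is not. A minimal sketch, assuming a made-up three-line test1.txt in a scratch directory (the file contents and the variable names are only for illustration):

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' banana apple cherry > "$tmp/test1.txt"
sort "$tmp/test1.txt" > "$tmp/test1-sorted.txt"

# sort -c only checks the order; it writes a diagnostic to stderr
# and returns a non-zero status if the file is out of order.
if sort -c "$tmp/test1.txt" 2>/dev/null; then
    original_sorted=yes
else
    original_sorted=no
fi

if sort -c "$tmp/test1-sorted.txt" 2>/dev/null; then
    copy_sorted=yes
else
    copy_sorted=no
fi

echo "test1.txt sorted: $original_sorted"
echo "test1-sorted.txt sorted: $copy_sorted"

rm -rf "$tmp"
```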

This will show you lines only in test1.txt:

comm -23 test1-sorted.txt test2-sorted.txt

This will show you lines only in test2.txt:

comm -13 test1-sorted.txt test2-sorted.txt

This will show you only the lines common to both files:

comm -12 test1-sorted.txt test2-sorted.txt
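
To see the three commands side by side, here is a small sketch that runs them on made-up data. The file contents are assumptions for illustration only. Setting LC_ALL=C is a precaution rather than something the commands above require: sort and comm have to agree on the ordering, and forcing the same locale for both makes sure they do.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' banana apple cherry > "$tmp/test1.txt"
printf '%s\n' cherry date banana > "$tmp/test2.txt"

# Sort both inputs with the same collation that comm will use.
LC_ALL=C sort "$tmp/test1.txt" > "$tmp/test1-sorted.txt"
LC_ALL=C sort "$tmp/test2.txt" > "$tmp/test2-sorted.txt"

# -23 hides columns 2 and 3: lines only in the first file.
only1=$(LC_ALL=C comm -23 "$tmp/test1-sorted.txt" "$tmp/test2-sorted.txt")
# -13 hides columns 1 and 3: lines only in the second file.
only2=$(LC_ALL=C comm -13 "$tmp/test1-sorted.txt" "$tmp/test2-sorted.txt")
# -12 hides columns 1 and 2: lines present in both files.
both=$(LC_ALL=C comm -12 "$tmp/test1-sorted.txt" "$tmp/test2-sorted.txt")

echo "only in test1.txt:"; echo "$only1"
echo "only in test2.txt:"; echo "$only2"
echo "in both:";           echo "$both"

rm -rf "$tmp"
```

With this data, apple is only in test1.txt, date is only in test2.txt, and banana and cherry are in both.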

We can also do some neat tricks with uniq/sort -u:

sort test1.txt > test1-sorted.txt
sort -u test1.txt > test1-sorted-u.txt

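This will show you lines only in test1-sorted.txt. Since sort -u keeps exactly one copy of every line, the only lines left over are the extra copies, which means those are the lines that appear multiple times in your original test1.txt file (a line that appears three times is printed twice):

comm -23 test1-sorted.txt test1-sorted-u.txt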
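
Here is a small worked example of the duplicate check on made-up data (the file contents are an assumption for illustration). When the full sorted file is compared against its sort -u version, every line pairs up once and the extra copies are left over in the first file's column, so -23 is the option that prints them. The sketch also shows uniq -d, a different route to a similar answer that prints each repeated line once.

```shell
# Hypothetical sample data: apple appears twice, pear three times.
tmp=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$tmp/test1.txt"

LC_ALL=C sort "$tmp/test1.txt" > "$tmp/test1-sorted.txt"
LC_ALL=C sort -u "$tmp/test1.txt" > "$tmp/test1-sorted-u.txt"

# Lines only in the full sorted file: one line per extra copy.
extras=$(LC_ALL=C comm -23 "$tmp/test1-sorted.txt" "$tmp/test1-sorted-u.txt")

# Alternative: uniq -d prints one copy of each repeated line.
dups=$(LC_ALL=C sort "$tmp/test1.txt" | uniq -d)

echo "extra copies:";   echo "$extras"
echo "repeated lines:"; echo "$dups"

rm -rf "$tmp"
```

Here comm prints apple once and pear twice, while uniq -d prints apple and pear once each.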

Neat, huh?
