This snippet removes duplicate lines, keeping the first occurrence of each and preserving the original order:
$ awk '!seen[$0]++'
But how?! Let’s break it down:
An awk program is made of rules, each with two parts: condition { action } (awk's documentation calls the condition a "pattern"). In awk '!seen[$0]++', only the condition
part is specified, so the action defaults to printing the line for which the condition holds true.
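To see the default action on its own, here is a small sketch using a made-up input (the numbers 1 through 4, one per line) and a simpler condition, NR % 2 == 1, which is true on odd-numbered lines. The variable names pattern_only and explicit are just for illustration:

```shell
# Hypothetical sample input: the numbers 1 to 4, one per line.
# Condition only: awk falls back to the default action, { print $0 }.
pattern_only=$(printf '1\n2\n3\n4\n' | awk 'NR % 2 == 1')

# Same condition with the action written out explicitly.
explicit=$(printf '1\n2\n3\n4\n' | awk 'NR % 2 == 1 { print $0 }')

# Both variables should hold the odd-numbered lines: 1 and 3.
printf '%s\n' "$pattern_only"
```

Both forms should print the same lines, which is why the one-liner can leave the action out entirely.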
$0 is a special variable that holds the entire current line (while $1 through $NF hold its individual fields, split on whitespace by default)
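seen[$0] - seen is an associative array (awk's version of a dictionary), which uses the line content as the key. No declaration is needed: awk creates the array and the entry the first time they are referenced.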
seen[$0]++ - every time we encounter a line, we increment the value for its key:
- Before the first occurrence, the value is uninitialized, which awk treats as an empty string or 0 (falsy).
- After each occurrence it will become 1, 2, 3… - a truthy value.
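!seen[$0]++ - the post-increment returns the value from before the increment, and the negation operator flips it. So if the line was NOT seen before (falsy value), the condition is true and the line is printed; a repeated line (truthy value) is skipped.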
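If the one-liner still feels too dense, the breakdown above can be sketched in long form. This is an illustrative rewrite rather than anything awk requires, and the function name dedupe is made up for the example:

```shell
# dedupe is a hypothetical helper name; it spells out !seen[$0]++ step by step.
dedupe() {
  awk '{
    # An entry that was never set compares equal to 0,
    # so this branch runs only on the first occurrence of a line.
    if (seen[$0] == 0) {
      print $0
    }
    # Count this occurrence so that later repeats are skipped.
    seen[$0] = seen[$0] + 1
  }'
}

# Sample input: five lines of fruit, two of them repeated.
result=$(printf 'banana\norange\napple\norange\napple\n' | dedupe)
printf '%s\n' "$result"
```

The check happens before the counter is bumped, which mirrors how the post-increment hands back the old value to the negation operator.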
Don’t believe it? Try it out yourself :)
$ awk '!seen[$0]++' << EOF
banana
orange
apple
orange
apple
EOF