This snippet removes duplicate lines, keeping the first occurrence of each line in its original position:

$ awk '!seen[$0]++'

But how?! Let’s break it down:

A basic awk program consists of two parts: condition {action} (the awk documentation calls the condition a pattern). In awk '!seen[$0]++', only the condition is specified, so the action defaults to printing the line for which the condition holds true.
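To make the default action visible, here is a small sketch that runs the one-liner twice on the same made-up input (the a/b/a lines are just sample data): once with the condition alone, and once with the print action spelled out. Both forms should behave identically.

```shell
# Condition only: awk falls back to the default action, which prints the line.
implicit=$(printf 'a\nb\na\n' | awk '!seen[$0]++')

# Same condition with the action written out explicitly.
explicit=$(printf 'a\nb\na\n' | awk '!seen[$0]++ { print $0 }')

echo "$implicit"
echo "$explicit"
```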

$0 is a special variable that holds the entire current line (in awk terms, the record), while $1..$n hold its individual fields, split on whitespace by default.
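A quick way to see the difference between $0 and the numbered variables, using a made-up line of three words and the default whitespace splitting:

```shell
# $0 is the whole line; $1 and $3 are its first and third fields.
fields=$(echo 'red green blue' | awk '{ print "line:  " $0; print "first: " $1; print "third: " $3 }')
echo "$fields"
```

Since the one-liner keys on $0, two lines count as duplicates only when they match in full, whitespace included.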

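seen[$0] - seen is an associative array (awk's version of a dictionary). It needs no declaration: awk creates it the first time it is referenced. Here the line content is used as the key.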

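seen[$0]++ - every time we encounter a line, we increment the value for its key. Since ++ comes after the operand (post-increment), the expression evaluates to the value from before the increment:

  • On the first occurrence, the value is uninitialized, which awk treats as 0 or an empty string (falsy).
  • On every later occurrence it is 1, 2, 3… - a truthy value.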
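You can watch the counter at work by printing what seen[$0]++ evaluates to next to each line. This is only an illustration on made-up input; note that the expression yields the count from before the increment, so the first occurrence of any line shows 0.

```shell
# Print the value of seen[$0]++ for each line, followed by the line itself.
counts=$(printf 'apple\norange\napple\napple\n' | awk '{ print seen[$0]++, $0 }')
echo "$counts"
# 0 apple
# 0 orange
# 1 apple
# 2 apple
```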

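!seen[$0]++ - we use a negation operator, so if the line was NOT seen before (falsy value), the condition is true and the line is printed. For a repeated line the count is truthy, the negation makes the condition false, and the line is skipped.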
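If the compact form still looks like magic, here is a longhand sketch of the same logic. The dedupe function name is made up for this example, and the body uses an explicit membership test instead of the counter, so it is an equivalent rewrite rather than what the one-liner literally does.

```shell
# Hypothetical helper: longhand version of awk '!seen[$0]++'.
dedupe() {
    awk '{
        if (!($0 in seen)) {   # first time this exact line appears
            print $0
        }
        seen[$0] = 1           # remember the line for later
    }' "$@"
}

unique=$(printf 'banana\norange\napple\norange\napple\n' | dedupe)
echo "$unique"
```

The one-liner does the lookup, the bookkeeping, and the print decision in a single expression, which is why it fits in so few characters.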

Don’t believe it? Try it out yourself :)

$ awk '!seen[$0]++' << EOF
banana
orange
apple
orange
apple
EOF

Each line is printed once, in the order it first appeared:

banana
orange
apple