Clever Way to Remove Duplicate Lines With AWK
This snippet removes duplicate lines, keeping only the first occurrence of each line and preserving the original order:
$ awk '!seen[$0]++'
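Like most awk programs, the one-liner reads the files named after it, or standard input if none are given, and writes to standard output. Below is a minimal sketch of both uses. The file names fruits.txt and fruits.dedup.txt are hypothetical and live in a temporary directory created only for this demo.

```shell
# Work in a throwaway directory so the demo leaves nothing behind.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical input file containing some repeated lines.
printf 'banana\norange\napple\norange\napple\n' > "$tmpdir/fruits.txt"

# 1. De-duplicate a file and write the result to a new (hypothetical) file.
awk '!seen[$0]++' "$tmpdir/fruits.txt" > "$tmpdir/fruits.dedup.txt"
cat "$tmpdir/fruits.dedup.txt"

# 2. De-duplicate the output of another command through a pipe.
printf 'b\na\nb\n' | awk '!seen[$0]++'
```

Because the program only remembers lines it has already printed, it does not need sorted input, unlike tools that only collapse adjacent duplicates.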
But how?! Let’s break it down:
A basic awk statement consists of two parts: condition {action}. In the case of awk '!seen[$0]++', only the condition part is specified, so the action defaults to printing the current line whenever the condition is true.
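One way to check that only the action has been left out is to compare the one-liner with versions that spell out the default action. In awk, a condition with no action behaves like { print }, and a bare print prints $0, so all three commands in this sketch should produce identical output. The variable names input, a, b and c are just for the demo.

```shell
input='banana
orange
apple
orange
apple'

# Condition only: the default action prints the current line.
a=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# The same rule with the default action written out explicitly.
b=$(printf '%s\n' "$input" | awk '!seen[$0]++ { print }')

# A bare print is shorthand for print $0.
c=$(printf '%s\n' "$input" | awk '!seen[$0]++ { print $0 }')

printf '%s\n' "$a"
if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
  echo "all three match"
fi
```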
$0 - a special variable that holds the entire current line (while $1..$n hold the individual fields of that line, split on the field separator, which is whitespace by default)
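seen[$0] - seen is an associative array (awk's version of a dictionary) that uses the line content as the key; it needs no declaration and is created the first time it is referenced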
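seen[$0]++ - every time we encounter a line, we increment the value stored under its key. Because ++ is a post-increment, the expression evaluates to the value from before the increment:
- On a line's first occurrence, the value is still uninitialized (blank, which awk treats as 0), so it is falsy.
- On every later occurrence, it is 1, 2, 3… which is truthy.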
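!seen[$0]++ - the negation operator flips that result, so the condition is true only when the line has NOT been seen before (falsy value), and only then is the line printed. For a repeated line the value is truthy, the condition is false, and the line is skipped.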
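To watch the counter at work, a small variation of the program can print the value that seen[$0]++ evaluates to next to each line. Since ++ is a post-increment, the number shown is the count from before the increment, so 0 marks a line's first appearance, and only those lines make !seen[$0]++ true. This sketch uses a helper variable n and a variable trace that are purely illustrative.

```shell
# For every input line, store what seen[$0]++ evaluates to
# (the count before incrementing), then print it with the line.
trace=$(printf 'banana\norange\napple\norange\napple\n' |
  awk '{ n = seen[$0]++; print n, $0 }')
printf '%s\n' "$trace"
```

Expected output:

```
0 banana
0 orange
0 apple
1 orange
1 apple
```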
Don’t believe it? Try it out yourself :)
$ awk '!seen[$0]++' << EOF
banana
orange
apple
orange
apple
EOF