Oneliner to remove duplicates while maintaining the original order

If you have got a list like this:

one
two
one
three
one
two
four

and you want to remove the duplicates from the list, chances are that you will end up with this result:

four
one
three
two

Because you are using a command like:

sort -u < list.txt

or the longer form:

cat list.txt | sort | uniq

There is an easy way to keep the original order of the list and remove the duplicates in a oneliner.
For this you need to number the entries in the list with this command:

nl list.txt

If you don't have nl on your system, you can use cat -n or whatever tickles your fancy.
This will give you the list:

     1  one
     2  two
     3  one
     4  three
     5  one
     6  two
     7  four
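
One detail worth knowing before building on this step: by default nl numbers only non-empty lines, whereas cat -n numbers every line. If the list can contain blank lines, nl -b a keeps the numbering complete. A minimal sketch, where number_lines is a hypothetical helper name and the sample input is made up for illustration:

```shell
# Hypothetical helper (not from the article): number every line of stdin,
# blank ones included. "-b a" selects "all lines" as the body numbering style.
number_lines() {
    nl -b a
}

# Made-up sample with an empty line in the middle.
numbered=$(printf 'one\n\ntwo\n' | number_lines)
printf '%s\n' "$numbered"
```

With this helper the empty line gets its own number, so it can be put back in its original position later like any other entry.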

We will use the numbering to restore the original order when we are done removing the duplicates.
The next step is to sort the list on the second field:

nl list.txt | sort -k2

     7  four
     1  one
     3  one
     5  one
     4  three
     2  two
     6  two

and tell sort to keep only one line for each distinct value of that field:

nl list.txt | sort -k2 -u

     7  four
     1  one
     4  three
     2  two
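
Which of the duplicate lines survives matters here, because the numbering can only restore the original order if the lowest-numbered copy of each entry is kept. GNU sort documents -u as keeping the first of a run of equal lines, but POSIX only promises that one line per set of equal keys is kept. The following is a sketch of a variant that makes the choice explicit, assuming a POSIX sort and uniq: sort on the text first and on the line number second, then let uniq -f 1 ignore the number field while comparing. The temporary file is made up for illustration.

```shell
# Made-up sample file for illustration.
list=$(mktemp)
printf '%s\n' one two one three one two four > "$list"

# Sort by text (field 2 onward), breaking ties by the line number in field 1,
# so the earliest copy of each entry comes first within its group.
# uniq -f 1 skips the number field when comparing and keeps the first line
# of each group of equal entries.
kept=$(nl "$list" | sort -k2 -k1,1n | uniq -f 1)
printf '%s\n' "$kept"
rm -f "$list"
```

The surviving line numbers are 7, 1, 4 and 2, the same lines as in the listing above, but without depending on how a particular sort implements -u.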

All that is left is to restore the original order:

nl list.txt | sort -k2 -u | sort -n

     1  one
     2  two
     4  three
     7  four

and get rid of our inserted numbering:

nl list.txt | sort -k2 -u | sort -n | cut -f2-

one
two
three
four
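
For reuse, the finished pipeline can be wrapped in a small shell function. This is only a sketch: dedupe_keep_order is a hypothetical name, the sample file is made up, and the awk function next to it is a different, commonly used technique (not the method described here), included purely as a cross-check that both give the same result.

```shell
# Hypothetical wrapper around the pipeline above; reads the file given as $1.
# Assumes sort -u keeps the earliest copy of each entry, as GNU sort does.
dedupe_keep_order() {
    nl "$1" | sort -k2 -u | sort -n | cut -f2-
}

# Alternative technique for comparison: awk prints a line only the first
# time it sees it, using an associative array as a "seen" set.
dedupe_awk() {
    awk '!seen[$0]++' "$1"
}

# Made-up sample file for illustration.
list=$(mktemp)
printf '%s\n' one two one three one two four > "$list"

result=$(dedupe_keep_order "$list")
check=$(dedupe_awk "$list")
rm -f "$list"

printf '%s\n' "$result"
```

If the two results ever differ on your system, the most likely cause is a sort whose -u keeps a later copy of a duplicate, or blank lines in the input that nl left unnumbered.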

Imagine trying to do this on a Windows box. I wouldn't know where to start 😉

Author: Ewald
