If you have a list like this, with one entry per line:
one
two
one
three
one
two
four
and you want to remove the duplicates from the list, chances are that you will end up with this result:
four
one
three
two
Because you are using a command like:
sort -u < list.txt
or the longer form:
cat list.txt | sort | uniq
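To reproduce the problem, here is a minimal sketch. It assumes the list is stored one entry per line in list.txt, the file name used in the commands above; the variable name sorted_unique is only there for illustration.

```shell
# Assumption: list.txt holds the sample list, one entry per line.
printf '%s\n' one two one three one two four > list.txt

# sort -u removes the duplicates, but only by sorting the entries first,
# so the original order is lost.
sorted_unique=$(sort -u < list.txt)
printf '%s\n' "$sorted_unique"
```

The entries come out alphabetically, which is exactly the reordering we want to avoid.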
There is an easy way to keep the original order of the list and remove the duplicates in a one-liner.
For this you need to number the entries in the list with this command:
nl list.txt
If you don't have nl on your system, you can use cat -n or whatever tickles your fancy.
This will give you the list:
1 one
2 two
3 one
4 three
5 one
6 two
7 four
We will use the numbering to restore the original order when we are done removing the duplicates.
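One detail is worth checking before relying on the numbers: by default nl does not number empty lines, so a list containing blank lines would end up with unnumbered entries. A minimal sketch, assuming the same list.txt as above, that asks nl to number every line with -b a (the variable name numbered is just for illustration):

```shell
# Assumption: list.txt holds the sample list, one entry per line.
printf '%s\n' one two one three one two four > list.txt

# -b a numbers all lines, including empty ones (the default style skips them).
# nl puts a tab between the number and the text, which is what cut -f2-
# splits on when the numbers are removed again.
numbered=$(nl -b a list.txt)
printf '%s\n' "$numbered"
```

For the sample list the result is the same as with plain nl, since it contains no empty lines.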
The next step is to sort the list on the second field:
nl list.txt | sort -k2
7 four
1 one
3 one
5 one
4 three
2 two
6 two
and tell sort, with -u, to keep only one line for each distinct value of the second field:
nl list.txt | sort -k2 -u
7 four
1 one
4 three
2 two
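Which of the duplicate lines sort -u keeps deserves a second look, because the final order depends on it. The result above keeps the first occurrence (1 one rather than 3 one), which is what GNU sort does, but as far as I know POSIX only says that all but one of the equal lines are suppressed. A hedged alternative, not part of the original recipe, is to sort on the entry and then on the number, and let uniq -f 1 drop the later duplicates (the variable name first_seen is just for illustration):

```shell
# Assumption: list.txt holds the sample list, one entry per line.
printf '%s\n' one two one three one two four > list.txt

# Sort on the entry (field 2 to end of line), then numerically on the line
# number, so the earliest occurrence comes first within each group.
# uniq -f 1 skips the number field when comparing and keeps the first
# line of each run of equal entries.
first_seen=$(nl list.txt | sort -k2 -k1,1n | uniq -f 1)
printf '%s\n' "$first_seen"
```

This costs one extra command, but it no longer depends on how a particular sort implementation picks the surviving line.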
All that is left is to restore the original order:
nl list.txt | sort -k2 -u | sort -n
1 one
2 two
4 three
7 four
and get rid of our inserted numbering:
nl list.txt | sort -k2 -u | sort -n | cut -f2-
one
two
three
four
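Put together, the whole recipe fits in a small shell function. This is a sketch under a few assumptions: the function name dedupe_keep_order is made up for this example, the input has one entry per line with no empty lines, and your sort keeps the first line of each set of duplicates when given -u (GNU sort does).

```shell
# Hypothetical helper: print the lines of file "$1" without duplicates,
# in order of first appearance.
# Steps: number the lines, keep one line per distinct entry,
# restore the original order via the numbers, strip the numbers again.
dedupe_keep_order() {
    nl "$1" | sort -k2 -u | sort -n | cut -f2-
}

# Assumption: list.txt holds the sample list, one entry per line.
printf '%s\n' one two one three one two four > list.txt
dedupe_keep_order list.txt
```

Under those assumptions, calling it on the sample list should print one, two, three and four, each on its own line.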
Imagine trying to do this on a Windows box; I wouldn't know where to start 😉