Monday, May 16, 2011

gcol == awk++

A while back Will showed you how to ditch Excel for awk, a handy Unix command line tool for extracting certain rows and columns from a text file. While I was browsing the documentation on the previously mentioned PLINK/SEQ library, I came across gcol, another utility for extracting columns from a tab-delimited text file. It can't do anything that awk can't, but it's easier and more intuitive to use for simple text munging tasks. Take a quick look at the gcol examples to see what I mean. And remember, if you need to convert a simple CSV to a tab-delimited file, just use sed with a regular expression: sed 's/,/\t/g' myfile.csv (this relies on GNU sed, which understands \t in the replacement, and it replaces every comma, so it will mangle quoted fields that contain commas).
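Here is a rough sketch of that CSV-to-tab conversion followed by a simple awk column pull, for comparison with the gcol examples. The file names and the three columns (snp, chr, pval) are made up for illustration, and the CSV is assumed to have no quoted fields. Note that the sketch swaps in tr for the sed one-liner, since tr handles '\t' the same way on GNU and BSD systems.

```shell
# Work in a scratch directory so nothing real gets overwritten.
tmpdir=$(mktemp -d)

# A tiny, made-up CSV (hypothetical column names and values).
printf 'snp,chr,pval\nrs1,1,0.04\nrs2,7,0.20\n' > "$tmpdir/sample.csv"

# Turn every comma into a tab. This assumes no quoted fields with embedded commas.
tr ',' '\t' < "$tmpdir/sample.csv" > "$tmpdir/sample.tsv"

# Pull out columns 1 and 3, keeping the output tab-delimited.
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $3 }' "$tmpdir/sample.tsv" > "$tmpdir/subset.tsv"

cat "$tmpdir/subset.tsv"
```

The awk call is the part that gcol is meant to make friendlier: with awk you have to refer to columns by position ($1, $3) and set the field separators yourself.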

For a demonstration of several other "data science hand tools", check out this post at O'Reilly, which covers handy Unix utilities including grep, colrm, awk, find, xargs, sort, and uniq.

gcol - get columns text utility


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.