Tuesday, July 6, 2010

Convert PLINK output to CSV Revisited

A while back, Stephen wrote a very nice post about converting PLINK output to a CSV file. If you are like me, you have used this a thousand times -- enough to get tired of typing lots of sed commands.

I just crafted a little Bash function that accomplishes the same thing with a single, easy-to-type command. Insert the following text into your .bashrc file, then open a new terminal or run 'source ~/.bashrc' to load it. This file is generally hidden in your UNIX home directory (you can see it if you type 'ls -al').

This version converts the input file to tab-delimited output.

function cleanplink
{
# Collapse whitespace runs to tabs, drop the leading tab, turn whole-word NA into \N
sed -r 's/\s+/\t/g' "$1" | sed -r 's/^\t//' | sed -r 's/\bNA\b/\\N/g' > "$1.txt"
}

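And this version converts to a CSV file. All of the versions here share the name cleanplink, so put only one of them in your .bashrc (or give them different names), since a later definition replaces an earlier one.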


function cleanplink
{
# Collapse whitespace runs to commas, drop the leading comma, turn whole-word NA into \N
sed -r 's/\s+/,/g' "$1" | sed -r 's/^,//' | sed -r 's/\bNA\b/\\N/g' > "$1.csv"
}


Both versions above also convert "NA" to \N, which MySQL reads as NULL when loading a file. You can remove that bit if you'd like:

function cleanplink
{
# Same as the CSV version, but NA values are left untouched
sed -r 's/\s+/,/g' "$1" | sed -r 's/^,//' > "$1.csv"
}


You use this function as follows:

bush@queso:~$ cleanplink plinkresults.assoc

and it produces a file with the same name, but with ".csv" or ".txt" added to the end (here, plinkresults.assoc.csv or plinkresults.assoc.txt).
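
To see what the conversion actually does, here is a minimal sketch that runs the CSV version on a tiny made-up file. The file name demo.assoc, its contents, and the temporary directory are hypothetical and only there for illustration; real PLINK output has more columns. The sketch assumes GNU sed, folds the sed steps into a single invocation as suggested in the comments, and matches NA only as a whole word so that an identifier such as rsNA45 is left alone.

```shell
#!/usr/bin/env bash

# CSV variant of cleanplink with the sed steps combined (GNU sed assumed).
cleanplink() {
    sed -r 's/\s+/,/g; s/^,//; s/\bNA\b/\\N/g' "$1" > "$1.csv"
}

# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical space-padded association output, not real results.
printf ' CHR    SNP      BP        P\n   1  rs123   10583   0.0412\n   1  rsNA45  10611       NA\n' > "$workdir/demo.assoc"

cleanplink "$workdir/demo.assoc"
cat "$workdir/demo.assoc.csv"
# Prints:
# CHR,SNP,BP,P
# 1,rs123,10583,0.0412
# 1,rsNA45,10611,\N
```

The leading spaces on each line are why the second substitution is needed: without it, every row would start with an empty first field.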

5 comments:

  1. Thank you - I always enjoy learning things I need for my work from my favourite blog!

    For importing PLINK fixed-width output files directly into R (without editing the output file first), I simply use the command
    read.table() with the two parameters sep="" and strip.white=T

    Greetings! Holger

  2. Thanks Holger. read.table() works just fine for the default PLINK output, but when you are trying to load variable-space-delimited files into a MySQL database, or even just opening small output files in Excel, it can be very useful to have the output comma delimited. Glad you enjoy reading GGD!

  3. First of all, thank you guys so much for doing this!

    And a stupid question: how does one do the conversion in Windows? Also, how do you view the .assoc files?

  4. Yauheniya: I use Perl to do what Will showed you how to do here. I'll post this week how to do this conversion using Perl, which will work on Windows.

  5. You can also combine the two sed commands without having to pipe the output.
    From this:
    sed -r 's/\s+/,/g' $1 | sed -r 's/^,//g' > $1.csv
    to this:
    sed -r 's/\s+/,/g;s/^,//g' $1 > $1.csv

    It might be faster, or at least save some typing.


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.