I just crafted a little Bash script that accomplishes the same effect with a single easy-to-type command. Insert the following text into your .bashrc file. This file is generally hidden in your UNIX home directory (you can see it if you type 'ls -al').
This version converts the infile to a tab-delimited output.
function cleanplink
{
# Collapse whitespace runs to tabs, drop the leading tab, turn NA into \N.
sed -r 's/\s+/\t/g' "$1" | sed -r 's/^\t//g' | sed -r 's/NA/\\N/g' > "$1.txt"
}
And this version converts to a CSV file.
function cleanplink
{
# Collapse whitespace runs to commas, drop the leading comma, turn NA into \N.
sed -r 's/\s+/,/g' "$1" | sed -r 's/^,//g' | sed -r 's/NA/\\N/g' > "$1.csv"
}
I also converted the "NA" to a null value (\N) for easy loading into MySQL; you can remove that bit if you'd like:
function cleanplink
{
# Same as the CSV version, but NA values are left untouched.
sed -r 's/\s+/,/g' "$1" | sed -r 's/^,//g' > "$1.csv"
}
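Note that all three definitions above share the name cleanplink, so if you paste more than one into .bashrc, only the last one takes effect. One way around that is a single function that takes the output format as an argument. The following is only a sketch: the name cleanplink2 and its "csv|tsv" and "keepna" arguments are made up for illustration, and it assumes GNU sed (for -r, \s and \t), as the functions above do.

```bash
# Hypothetical helper (not from the original post):
#   cleanplink2 FILE [csv|tsv] [keepna]
function cleanplink2
{
    local infile="$1"
    local format="${2:-csv}"    # default to CSV when no format is given
    local delim ext
    case "$format" in
        csv) delim=',';  ext='csv' ;;
        tsv) delim='\t'; ext='txt' ;;   # GNU sed expands \t to a tab
        *)   echo "usage: cleanplink2 FILE [csv|tsv] [keepna]" >&2; return 1 ;;
    esac

    # Collapse whitespace runs to the delimiter, then drop the leading one.
    local -a script=(-e "s/\s+/${delim}/g" -e "s/^${delim}//")

    # Unless asked to keep NA, rewrite it as \N (read as NULL by MySQL).
    [ "$3" = "keepna" ] || script+=(-e 's/NA/\\N/g')

    sed -r "${script[@]}" "$infile" > "$infile.$ext"
}
```

With this in place, "cleanplink2 plinkresults.assoc tsv" would write plinkresults.assoc.txt, and "cleanplink2 plinkresults.assoc csv keepna" would write plinkresults.assoc.csv with the NA values left alone.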
You use this function as follows:
bush@queso:~$ cleanplink plinkresults.assoc
and it produces a file with the same name, but with a ".csv" or a ".txt" on the end.
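To make the effect concrete, here is a small worked example. The input is a made-up stand-in for a PLINK results file (the values are not real results), and the pipeline is the same three sed steps as the CSV version of cleanplink, run in a scratch directory. It assumes GNU sed.

```bash
# Work in a scratch directory so nothing in the current directory is touched.
demo_dir=$(mktemp -d)

# A tiny stand-in for a PLINK results file (made-up values).
printf '%s\n' \
    ' CHR   SNP    BP   A1      P' \
    '   1   rs1   100    A     NA' \
    '   1   rs2   200    G   0.05' > "$demo_dir/demo.assoc"

# The same three sed steps as the CSV version of cleanplink.
sed -r 's/\s+/,/g' "$demo_dir/demo.assoc" \
    | sed -r 's/^,//g' \
    | sed -r 's/NA/\\N/g' > "$demo_dir/demo.assoc.csv"

cat "$demo_dir/demo.assoc.csv"
# CHR,SNP,BP,A1,P
# 1,rs1,100,A,\N
# 1,rs2,200,G,0.05
```

One caveat worth knowing: 's/NA/\\N/g' replaces the letters NA wherever they occur, so an identifier that happens to contain them would be altered too. If that matters for your data, GNU sed's word-boundary form 's/\bNA\b/\\N/g' only matches NA as a whole word.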
Thank you - I always enjoy learning things I need for my work from my favourite blog!
If it is about importing PLINK fixed-width output files directly into R (without editing the output file first), I simply use the command
read.table() with the two parameters sep="" and strip.white=T
Greetings! Holger
Thanks Holger. read.table() works just fine for the default PLINK output, but when you are trying to load variable-space-delimited files into a MySQL database, or even just opening small output files in Excel, it can be very useful to have the output comma-delimited. Glad you enjoy reading GGD!
First of all, thank you guys so much for doing this!
And a stupid question: how does one do the conversion in Windows? Also, how do I view the .assoc files?
Yauheniya: I use Perl to do what Will showed you how to do here. I'll post this week on how to do this conversion using Perl, which will work on Windows.
You can also combine the two sed commands without having to pipe the output.
From this:
sed -r 's/\s+/,/g' "$1" | sed -r 's/^,//g' > "$1.csv"
to this:
sed -r 's/\s+/,/g;s/^,//g' "$1" > "$1.csv"
It might be faster, and at the very least it saves some typing.
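If you want to convince yourself that the two forms are equivalent before switching, a quick check along these lines should do. It is a sketch on a made-up two-line sample file, assuming GNU sed; the file names are arbitrary.

```bash
# Scratch directory and a made-up sample in PLINK's space-padded layout.
tmp=$(mktemp -d)
printf '%s\n' ' CHR   SNP    P' '   1   rs1   NA' > "$tmp/demo.assoc"

# Piped form: two sed processes.
sed -r 's/\s+/,/g' "$tmp/demo.assoc" | sed -r 's/^,//g' > "$tmp/piped.csv"

# Combined form: two expressions separated by ';' in one sed process.
sed -r 's/\s+/,/g;s/^,//g' "$tmp/demo.assoc" > "$tmp/combined.csv"

# cmp -s is silent and exits 0 only when the files are byte-for-byte identical.
if cmp -s "$tmp/piped.csv" "$tmp/combined.csv"; then
    echo "identical"
else
    echo "different"
fi
```

The combined form works because sed applies its expressions in order to each line, so the leading comma created by the first substitution is already there when the second one runs.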