Last week Will showed you a
bash script version of a
sed command covered here a while back that would convert PLINK output from the default variable space-delimited format to a more database-loading-friendly tab or comma delimited file. A commenter asked how to do this on windows, so I'll share the way I do this using a perl script which you can use on windows after installing
ActivePerl. First copy the code below and save the file as cleanplink.pl somewhere in your path.
#!/usr/bin/perl
# cleanplink.pl
# (c) Stephen D. Turner 2010 http://www.stephenturner.us/
# This is free open-source software.
# See http://gettinggeneticsdone.blogspot.com/p/copyright.html
my $help = "\nUsage: $0 <input whitespace file> <tab or comma>\n\n";
die $help if @ARGV<2;
$delimiter=pop(@ARGV);
die $help unless ($delimiter=~/tab/i|$delimiter=~/comma/i);
@inputfiles=@ARGV;
if ($delimiter =~ /comma/i) {
foreach (@inputfiles) {
open (IN,"<$_");
open (OUT,">$_.csv");
while (<IN>) {
chomp;
$_ =~ s/^\s+//; #Trim whitespace at beginning
$_ =~ s/\s+$//; #Trim whitespace at end
$_ =~ s/\s+/,/g; #Remaining whitespace into commas
#$_ =~ s/NA/-9/g;#If you want to recode NA as -9
print OUT "$_\n";
}
}
} elsif ($delimiter =~ /tab/i) {
foreach (@inputfiles) {
open (IN,"<$_");
open (OUT,">$_.tab");
while (<IN>) {
chomp;
$_ =~ s/^\s+//; #Trim whitespace at beginning
$_ =~ s/\s+$//; #Trim whitespace at end
$_ =~ s/\s+/\t/g;#Remaining whitespace into commas
#$_ =~ s/NA/-9/g;#If you want to recode NA as -9
print OUT "$_\n";
}
}
} else {
die $help;
}
Run the program with the first argument(s) as the PLINK output file(s) you want to convert, and the last argument being either "comma" or "tab" without the quotes. It'll create another file in the current directory ending with either .csv or .tab. Look below to see cleanplink.pl in action.
turnersd@provolone:~/tmp$ ls
plink.qassoc
turnersd@provolone:~/tmp$ cat plink.qassoc
CHR SNP BP NMISS BETA SE R2 T P
1 rs3094315 742429 3643 -0.2461 0.2703 0.0002275 -0.9102 0.3628
1 rs12562034 758311 3644 -0.1806 0.3315 8.149e-05 -0.5448 0.5859
1 rs3934834 995669 3641 0.04591 0.2822 7.271e-06 0.1627 0.8708
1 rs9442372 1008567 3645 0.1032 0.2063 6.868e-05 0.5002 0.6169
1 rs3737728 1011278 3644 0.1496 0.2268 0.0001195 0.6598 0.5094
1 rs6687776 1020428 3645 -0.5378 0.2818 0.000999 -1.909 0.05639
1 rs9651273 1021403 3643 0.2002 0.2264 0.0002149 0.8847 0.3764
1 rs4970405 1038818 3645 -0.4994 0.3404 0.0005903 -1.467 0.1425
1 rs12726255 1039813 3645 -0.4515 0.2956 0.0006398 -1.527 0.1268
turnersd@provolone:~/tmp$ cleanplink.pl plink.qassoc comma
turnersd@provolone:~/tmp$ ls
plink.qassoc plink.qassoc.csv
turnersd@provolone:~/tmp$ cat plink.qassoc.csv
CHR,SNP,BP,NMISS,BETA,SE,R2,T,P
1,rs3094315,742429,3643,-0.2461,0.2703,0.0002275,-0.9102,0.3628
1,rs12562034,758311,3644,-0.1806,0.3315,8.149e-05,-0.5448,0.5859
1,rs3934834,995669,3641,0.04591,0.2822,7.271e-06,0.1627,0.8708
1,rs9442372,1008567,3645,0.1032,0.2063,6.868e-05,0.5002,0.6169
1,rs3737728,1011278,3644,0.1496,0.2268,0.0001195,0.6598,0.5094
1,rs6687776,1020428,3645,-0.5378,0.2818,0.000999,-1.909,0.05639
1,rs9651273,1021403,3643,0.2002,0.2264,0.0002149,0.8847,0.3764
1,rs4970405,1038818,3645,-0.4994,0.3404,0.0005903,-1.467,0.1425
1,rs12726255,1039813,3645,-0.4515,0.2956,0.0006398,-1.527,0.1268
turnersd@provolone:~/tmp$ cleanplink.pl plink.qassoc tab
turnersd@provolone:~/tmp$ ls
plink.qassoc plink.qassoc.csv plink.qassoc.tab
turnersd@provolone:~/tmp$ cat plink.qassoc.tab
CHR SNP BP NMISS BETA SE R2 T P
1 rs3094315 742429 3643 -0.2461 0.2703 0.0002275 -0.9102 0.3628
1 rs12562034 758311 3644 -0.1806 0.3315 8.149e-05 -0.5448 0.5859
1 rs3934834 995669 3641 0.04591 0.2822 7.271e-06 0.1627 0.8708
1 rs9442372 1008567 3645 0.1032 0.2063 6.868e-05 0.5002 0.6169
1 rs3737728 1011278 3644 0.1496 0.2268 0.0001195 0.6598 0.5094
1 rs6687776 1020428 3645 -0.5378 0.2818 0.000999 -1.909 0.05639
1 rs9651273 1021403 3643 0.2002 0.2264 0.0002149 0.8847 0.3764
1 rs4970405 1038818 3645 -0.4994 0.3404 0.0005903 -1.467 0.1425
1 rs12726255 1039813 3645 -0.4515 0.2956 0.0006398 -1.527 0.1268
perl -lpe 's/"/""/g; s/^|$/"/g; s/\t/","/g < input > output
ReplyDeletehttp://stackoverflow.com/questions/2535255/fastest-way-convert-tab-delimited-file-to-csv-in-linux
That should work on Windows too. If you quote everything you wouldn't need to trim spaces. And for Linux/Mac
tr \\t , < input > output
although you will need some modifications to trim spaces as this is not quoted
Extremely useful, many many thanks!!!
ReplyDeleteNice! PLINK fixed-width format is awful. Thanks!!!
ReplyDelete