Wednesday, September 16, 2009

R clinic this week: Regression Modeling Strategies in R

At this week's R clinic Frank Harrell will unveil the new rms (Regression Modeling Strategies) package that is a replacement for the R Design package.  He will demonstrate the differences with Design, especially related to enhanced graphics for displaying effects in regression models.  Frank will also discuss the implementation of quantile regression in rms.  The rms package website has links to the manual, examples of graphical output, and printable reference cards for many of the package's commands.  It also makes a point that many of rms's graphics capabilities are modular and will play nicely with previously mentioned ggplot2.

To install the rms package, start R and type:

install.packages("rms", dependencies=TRUE)

Then to load it any time thereafter,

library(rms)

The R clinic is held by the Vanderbilt biostatistics department every Thursday 2-3pm and free to anyone who wants to attend.  More information here.

Monday, September 14, 2009

Find the function you're looking for in R

Any R user no matter what level of experience has had trouble finding the package or the function to do what you want to do and then figuring out how to use it.  The sos package in R just made that a lot easier.

First, fire up R, then install the sos package (don't omit the quotes):

install.packages("sos")

It'll ask you to choose a mirror.  Choose the closest one.  After it installs, load the package (omit the quotes this time):

library(sos)

This loaded all the functions that come with the sos package, including a particularly useful one called findFn.  It scans the "function" entries in Jonathan Baron's "R site search" database.  Give it a try, using "epistasis" with the quotes as the keyword.

findFn("epistasis")

This should open up a web browser that displays relevant functions, the package you need to download (using the above procedure) to use the function, and a link to the help page for that function.


You can also use ??? as an alias for findFn.  Try it like this (use the quotes):

???"genome wide"

Once you have the sos package installed, type vignette("sos") for more information on how to use various functions in this package.

If you still can't find what you're looking for, check out my previous post on finding help on R, and if all else fails, don't forget about Theresa Scott's free weekly R clinic / Q&A sessions.

Thursday, September 10, 2009

Machine Learning in R

Revolutions blog recently posted a link to R code by Joshua Reich with self-contained examples of using machine learning techniques in R, including various clustering methods (k-means, nearest neighbor, and kernel), recursive partitioning (CART), principle components analysis, linear discriminant analysis, and support vector machines.  This post also links to some slides that go over the basics of machine learning.  Looks like a good place to start learning about ML before handrolling your own code.

Be sure to check out one of Will's previous post on hierarchical clustering in R.

Revolutions: Machine learning in R, in a nutshell

Wednesday, September 9, 2009

Sync your home directories on ACCRE and the local Linux servers (a.k.a. "the cheeses")

Vanderbilt ACCRE users with PCs only...

If you use ACCRE to run multi-processor jobs you'll be glad to know that they now allow you to map your home directory to your local desktop using Samba (so you can access your files through My Computer as you normally would with local files).  Just submit a help request on their website and they'll get you set up.

Now if you have both your ACCRE home and your home on the cheeses mapped, you can easily sync the files between the two.  Download Microsoft's free SyncToy to do the job.  It's pretty dead simple to set up, and one click will synchronize files between the two servers.


I didn't want to synchronize everything, so I set it up to only sync directories that contain perl scripts and other programs that I commonly use on both machines.  SyncToy also seems pretty useful for backing up your files too.

Microsoft SyncToy

Ask ACCRE to let you map your home

Tuesday, September 8, 2009

Get the full path to a file in Linux / Unix

In the last post I showed you how to point to a file in windows and get the full path copied to your clipboard.  I wanted to come up with something similar for a Linux environment.  This is helpful on Vampire/ACCRE because you have to fully qualify the path to every file you use when you submit jobs with PBS scripts. So I wrote a little perl script:

#!/usr/bin/perl
chomp($pwd=`pwd`);
print "$pwd\n" if @ARGV==0;
foreach (@ARGV) {print "$pwd/$_\n";}

You can copy this from me, just put it in your bin directory, like this:

cp /home/turnersd/bin/path ~/bin

Make it executable, like this:

chmod +x ~/bin/path

Here it is in action. Let's say I wanted to print out the full path to all the .txt files in the current directory.  Call the program with arguments as the files you want to print the path to:


[turnersd@vmps21 replicated]$ ls
parseitnow.pbs
parsing_program.pl
replic.txt
tophits.txt
 
[turnersd@vmps21 replicated]$ path *.txt
/projects/HDL/epistasis/replicated/replic.txt
/projects/HDL/epistasis/replicated/tophits.txt


Sure, it's only a little bit quicker than typing pwd, copying that, then spelling out the filenames.  But if you have long filenames or lots of filenames you want to copy, this should get things done faster.  Enjoy.

Friday, September 4, 2009

ClipPath copies filename and path from windows for loading into R

I wish I would have discovered this long ago.  Loading data into R or MySQL requires you to specify the full path to the file.  If you do this on a Windows machine there are two annoyances.  First, if you save something to your desktop the path to your desktop is really long.  Second, windows by default uses backslashes "\" in the file path, while R or other software requires forward slashes "/".  ClipPath is a tiny program that adds an entry to your right-click menu to copy the full file path with a forward slash, then you can paste the filename into whatever program you're using.



Download the zipfile from the website below (here's a direct link to the zip file).  Extract it's contents, right click on ClipPath.inf, and choose install.  You can always uninstall later through the control panel.

ClipPath Shell Extension

Thursday, September 3, 2009

GGD posts now printer-friendly

A quick announcement about a formatting fix here - you can now print posts from GGD a little more cleanly.  When you print it should no longer include the title and sidebar, so most posts should now only use a page or two.

Wednesday, September 2, 2009

PCG Journal Club Articles, 8/28

Here are citations for the articles discussed at our most recent meeting (August 28). I have also appended a link to Nature Reviews: Genetics new series, Fundamental concepts in genetics, at the end. Our next meeting is scheduled for September 11.

~Julia


Gurwitz D, Fortier I, Lunshof JE, Knoppers BM. Research ethics: Children and population biobanks. Science. 2009 Aug 14; 325(5942):818-9

Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet. 2009 Aug; 10(8):551-64.

Holsinger KE, Weir BS. Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet. 2009 Sep; 10(9):639-50.

Koga Y, Pelizzola M, Cheng E, Krauthammer M, Sznol M, Ariyan S, Narayan D, Molinaro AM, Halaban R, Weissman SM. Genome-wide screen of promoter methylation identifies novel markers in melanoma. Genome Res. 2009 Aug; 19(8):1462-70.

Monier A, Pagarete A, de Vargas C, Allen MJ, Read B, Claverie JM, Ogata H. Horizontal gene transfer of an entire metabolic pathway between a eukaryotic alga and its DNA virus. Genome Res. 2009 Aug; 19(8):1441-9.

Raveh-Sadka T, Levo M, Segal E. Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 2009 Aug; 19(8):1480-96.

Rosenberg NA, Vanliere JM. Replication of genetic associations as pseudoreplication due to shared genealogy. Genet Epidemiol. 2009 Sep; 33(6):479-87.

Schmitz D, Netzer C, Henn W. An offer you can't refuse? Ethical implications of non-invasive prenatal diagnosis. Nat Rev Genet. 2009 Aug; 10(8):515.


Fundamental concepts in genetics: http://www.nature.com/nrg/series/fundamental/index.html
Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.