Friday, September 25, 2009
What happens when a consumer genetics company goes bankrupt?
Bankruptcy law authorizes the sale of the assets of a business in bankruptcy, and genomic data is likely the most valuable asset of any DTC genomics company. First, the authors dissect the privacy policies and terms of service of three major DTC companies: 23andMe, deCODE Genetics, and TruGenetics. Next comes a discussion of how the legal system would treat a DTC genomics company's bankruptcy. The series wraps up with a brief discussion of how this ultimately affects the average DTC genomics customer.
Genomics Law Report: What happens if a DTC Genomics Company Goes Belly-Up?
Wednesday, September 23, 2009
JBrowse: a JavaScript Based Genome Browser
Genome browsers are nothing new, but JBrowse is a new JavaScript-based genome browser that uses information from the UCSC genome browser and has the look and feel of Google Maps. It's extremely easy to zoom in and out and scroll around because all the "work" is being done by your computer rather than some server farm thousands of miles away. OpenHelix is calling it a game changer, and they have a nice video demonstration showing off some of JBrowse's features. Click the Drosophila or Homo sapiens genome and give JBrowse a spin for yourself!
The JBrowse genome browser
Monday, September 21, 2009
Comparison of plots using Stata, R base, R lattice, and R ggplot2, Part I: Histograms
First I'll start with the three graphing systems in R: base, lattice, and ggplot2. If you don't have the last two packages installed, go ahead and install them with install.packages(c("lattice", "ggplot2")).
Now load these two packages with library(lattice) and library(ggplot2), and download this fake dataset I made up containing 100 samples each from three different genotypes ("geno") and a continuous outcome ("trait").
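The dataset link did not survive in this copy of the post (the CSV URL still appears in the Stata section below). As a stand-in, here is a minimal sketch that simulates data of the same shape: 100 samples for each of three genotypes plus a continuous trait. The genotype coding (1, 2, 3), the per-genotype means, and the seed are all assumptions, not the values in the original file.

```r
# Stand-in for the post's dataset: values are simulated, not the original data.
set.seed(42)                                    # arbitrary seed, for reproducibility
geno  <- rep(1:3, each = 100)                   # assumed genotype coding: 1, 2, 3
trait <- rnorm(300, mean = c(0, 0.5, 1)[geno])  # assumed shift in mean per genotype
mydat <- data.frame(geno = geno, trait = trait)

str(mydat)          # structure: two columns, geno and trait
table(mydat$geno)   # number of samples per genotype
```

Any data frame with a "geno" column and a "trait" column should work with the plotting commands that follow.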
Now let's get started...
R: base graphics
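The original base-graphics code is missing from this copy. One way to get stacked per-genotype histograms in about four lines is sketched here; it is not necessarily the code used for the post, and the first two lines only create simulated stand-in data so the snippet runs on its own.

```r
set.seed(1)  # stand-in data (simulated); replace with the real mydat if you have it
mydat <- data.frame(geno = rep(1:3, each = 100), trait = rnorm(300))

par(mfrow = c(3, 1))                                  # three panels in one column
for (g in 1:3) {
  # one histogram per genotype, using hist()'s default bins and counts
  hist(mydat$trait[mydat$geno == g], main = paste("geno =", g), xlab = "trait")
}
```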
R: lattice
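The lattice code is also missing here. A plausible one-command equivalent, assuming the same "geno" and "trait" columns, uses histogram() with a conditioning formula. Treat this as a sketch rather than the post's exact command; the data are simulated stand-ins.

```r
library(lattice)  # lattice ships with standard R installations as a recommended package

set.seed(1)  # stand-in data (simulated)
mydat <- data.frame(geno = rep(1:3, each = 100), trait = rnorm(300))

# One panel per genotype, stacked in a single column.
p <- histogram(~ trait | factor(geno), data = mydat, layout = c(1, 3))
print(p)  # explicit print() so the plot also draws when run via source() or a script
```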
R: ggplot2
qplot(trait,data=mydat,facets=geno~.)
# Update Tuesday, September 22, 2009
# A commenter mentioned that this code did not work.
# If the above code does not work, try explicitly
# stating that you want a histogram:
qplot(trait,geom="histogram",data=mydat,facets=geno~.)
Stata
insheet using "http://people.vanderbilt.edu/~stephen.turner/ggd/2009-09-21-histodemo.csv", comma clear
histogram trait, by(geno, col(1))
Commentary
In my opinion ggplot2 is the clear winner. Again I'll concede that all of the above graphing systems give you an incredible amount of control over every aspect of the graph, but I'm only looking for what gives me the best out-of-the-box default plot using the shortest command possible.
R's base graphics give you a rather spartan plot, with very wide bins. It also requires 4 lines of code. (If you can shorten this, please comment.) By default, the base graphics system gives you counts (frequency) on the vertical axis.
The lattice package in R does a little better perhaps, but the default color scheme is visually less than stellar. Also, I'm not sure why the axis labels switch sides every other plot, and the ticks on top of the plot are probably unnecessary. I still think the bins are too wide: you lose some information, especially on the bottom plot towards the right tail. The vertical axis is proportion of total.
Stata's default plot looks very similar to lattice, but again uses a very unattractive color scheme. It uses density for the vertical axis, which may not mean much to non-statisticians.
The default plot made by ggplot2 is just hands-down good-looking. There are no unnecessary lines delimiting the bins, and the binwidth is appropriate. The vertical axis represents counts. The black bars on the light-gray background have a good data-ink ratio. And it required the second-shortest command, only 3 characters longer than the Stata equivalent.
I'm ordering the ggplot2 book (Amazon, ~$50), so as I figure out how to do more with ggplot2 I'll post more comparisons like this. If you use SPSS, SAS, MATLAB, or something else, post the code in a comment here and send me a picture or link to the plot and I'll post it here.
Wednesday, September 16, 2009
PCG Journal Club Articles, 9/11
~Julia
Kim S, Xing EP. Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 2009 Aug; 5(8):e1000587.
Zamar D, Tripp B, Ellis G, Daley D. Path: a tool to facilitate pathway-based genetic association analysis. Bioinformatics. 2009 Sep 15; 25(18):2444-6.
R clinic this week: Regression Modeling Strategies in R
To install the rms package, start R and type install.packages("rms").
Then to load it any time thereafter, type library(rms).
The R clinic is held by the Vanderbilt biostatistics department every Thursday 2-3pm and is free to anyone who wants to attend. More information here.
Monday, September 14, 2009
Find the function you're looking for in R
First, fire up R, then install the sos package with install.packages("sos") (don't omit the quotes).
It'll ask you to choose a mirror. Choose the closest one. After it installs, load the package with library(sos) (omit the quotes this time).
This loads all the functions that come with the sos package, including a particularly useful one called findFn. It scans the "function" entries in Jonathan Baron's "R site search" database. Give it a try with findFn("epistasis"), using "epistasis" with the quotes as the keyword.
This should open up a web browser that displays relevant functions, the package you need to download (using the above procedure) to use the function, and a link to the help page for that function.
You can also use ??? as an alias for findFn. Try it like this (use the quotes):
Once you have the sos package installed, type vignette("sos") for more information on how to use various functions in this package.
If you still can't find what you're looking for, check out my previous post on finding help on R, and if all else fails, don't forget about Theresa Scott's free weekly R clinic / Q&A sessions.
Thursday, September 10, 2009
Machine Learning in R
Be sure to check out one of Will's previous posts on hierarchical clustering in R.
Revolutions: Machine learning in R, in a nutshell
Wednesday, September 9, 2009
Sync your home directories on ACCRE and the local Linux servers (a.k.a. "the cheeses")
If you use ACCRE to run multi-processor jobs you'll be glad to know that they now allow you to map your home directory to your local desktop using Samba (so you can access your files through My Computer as you normally would with local files). Just submit a help request on their website and they'll get you set up.
Now if you have both your ACCRE home and your home on the cheeses mapped, you can easily sync the files between the two. Download Microsoft's free SyncToy to do the job. It's dead simple to set up, and one click will synchronize files between the two servers.
I didn't want to synchronize everything, so I set it up to only sync directories that contain perl scripts and other programs that I commonly use on both machines. SyncToy also seems pretty useful for backing up your files.
Microsoft SyncToy
Ask ACCRE to let you map your home
Tuesday, September 8, 2009
Get the full path to a file in Linux / Unix
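#!/usr/bin/perl
# Print the full path of each argument, or the current directory if none is given.
use strict;
use warnings;
chomp(my $pwd = `pwd`);
print "$pwd\n" if @ARGV == 0;
foreach (@ARGV) { print "$pwd/$_\n"; }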
You can copy this from me, just put it in your bin directory, like this:
Make it executable, like this:
chmod +x ~/bin/path
Here it is in action. Let's say I wanted to print out the full path to all the .txt files in the current directory. Call the program with the files whose paths you want printed as arguments:
[turnersd@vmps21 replicated]$ ls
parseitnow.pbs
parsing_program.pl
replic.txt
tophits.txt
[turnersd@vmps21 replicated]$ path *.txt
/projects/HDL/epistasis/replicated/replic.txt
/projects/HDL/epistasis/replicated/tophits.txt
Sure, it's only a little bit quicker than typing pwd, copying that, then spelling out the filenames. But if you have long filenames or lots of filenames you want to copy, this should get things done faster. Enjoy.
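If you are already inside an R session, the same idea can be sketched with base R alone. The helper name full_path is hypothetical (it is not part of the post or of any package); it simply mirrors what the Perl script does.

```r
# Hypothetical helper (the name full_path is mine, not from the post).
# With no arguments it returns the working directory, like the Perl script;
# otherwise it prefixes each file name with the working directory.
full_path <- function(files = character(0)) {
  if (length(files) == 0) return(getwd())
  file.path(getwd(), files)
}

full_path()                      # current directory
full_path(Sys.glob("*.txt"))     # .txt files here (just the directory if none match)
```

Like the Perl version, this only pastes the working directory in front of each name; it does not check that the files exist.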
Friday, September 4, 2009
ClipPath copies filename and path from Windows for loading into R
Download the zipfile from the website below (here's a direct link to the zip file). Extract its contents, right-click on ClipPath.inf, and choose install. You can always uninstall later through the control panel.
ClipPath Shell Extension
Thursday, September 3, 2009
GGD posts now printer-friendly
Wednesday, September 2, 2009
PCG Journal Club Articles, 8/28
~Julia
Gurwitz D, Fortier I, Lunshof JE, Knoppers BM. Research ethics: Children and population biobanks. Science. 2009 Aug 14; 325(5942):818-9.
Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet. 2009 Aug; 10(8):551-64.
Holsinger KE, Weir BS. Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet. 2009 Sep; 10(9):639-50.
Koga Y, Pelizzola M, Cheng E, Krauthammer M, Sznol M, Ariyan S, Narayan D, Molinaro AM, Halaban R, Weissman SM. Genome-wide screen of promoter methylation identifies novel markers in melanoma. Genome Res. 2009 Aug; 19(8):1462-70.
Monier A, Pagarete A, de Vargas C, Allen MJ, Read B, Claverie JM, Ogata H. Horizontal gene transfer of an entire metabolic pathway between a eukaryotic alga and its DNA virus. Genome Res. 2009 Aug; 19(8):1441-9.
Raveh-Sadka T, Levo M, Segal E. Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 2009 Aug; 19(8):1480-96.
Rosenberg NA, Vanliere JM. Replication of genetic associations as pseudoreplication due to shared genealogy. Genet Epidemiol. 2009 Sep; 33(6):479-87.
Schmitz D, Netzer C, Henn W. An offer you can't refuse? Ethical implications of non-invasive prenatal diagnosis. Nat Rev Genet. 2009 Aug; 10(8):515.