Thursday, January 13, 2011

So long Vanderbilt, and thanks for all the fish!

After finishing the final revisions on my dissertation, I was reminded of this spot-on graphical guide to what a Ph.D. is really all about.

Now that I'm finished, I'm leaving Vanderbilt to start a postdoc in genetic epidemiology with Dr. Loic Le Marchand at the University of Hawaii Cancer Center. Posts may be sparse over the next few weeks, but I plan on blogging as usual once I'm set up at my postdoc. Because I won't have the same level of statistical and bioinformatics support in Hawaii that I have now, I'll have much to figure out on my own, so I'll have even more to write about here. But for now, enjoy this Illustrated guide to a Ph.D., reproduced with permission from Matt Might, and follow me on Twitter (@genetics_blog).

...

Imagine a circle that contains all of human knowledge:



By the time you finish elementary school, you know a little:




By the time you finish high school, you know a bit more:




With a bachelor's degree, you gain a specialty:





A master's degree deepens that specialty:






Reading research papers takes you to the edge of human knowledge:



Once you're at the boundary, you focus:




You push at the boundary for a few years:





Until one day, the boundary gives way:







And, that dent you've made is called a Ph.D.:




Of course, the world looks different to you now:




So, don't forget the bigger picture:





Keep pushing!

Monday, January 10, 2011

R function for extracting F-test P-value from linear model object

I thought it would be trivial to extract the p-value on the F-test of a linear regression model (testing the null hypothesis R²=0). If I fit the linear model fit <- lm(y ~ x1 + x2), I can't find that p-value in names(fit) or names(summary(fit)). But summary(fit)$fstatistic does give you the F statistic and both degrees of freedom, so I wrote this function to quickly pull the p-value for this F-test out of an lm object, and added it to my R profile. If there's a built-in R function to do this, please comment!

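# Function to extract the overall F-test (ANOVA) p-value from a linear model object
lmp <- function(modelobject) {
  # inherits() is safe even when class() returns more than one value
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  # fstatistic holds the F value, numerator df, and denominator df
  f <- summary(modelobject)$fstatistic
  if (is.null(f)) stop("No F statistic available for this model")
  # the upper-tail probability of the F distribution is the p-value
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL  # drop the "value" name
  return(p)
}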
# simulate some data
set.seed(42)
n <- 20
d <- data.frame(x1 = rbinom(n, 2, .5), x2 = rbinom(n, 2, .5))
d <- transform(d, y = x1 + x2 + rnorm(n))
# fit the linear model
fit <- lm(y ~ x1 + x2, data = d)
summary(fit)         # shows that the F-test p-value is 0.006641
names(summary(fit))  # has fstatistic, but no element holding the p-value
names(fit)           # no p-value here either
lmp(fit)             # uses the above function to capture the F-test p-value
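One way to sanity-check this calculation is to compare it against anova() on an intercept-only model versus the full model, which should test the same null hypothesis. This is only a minimal sketch: the object names (null_fit, p_summary, p_anova) are my own for illustration, and the simulated data simply mirror the example above.

```r
# Simulated data mirroring the example above
set.seed(42)
n <- 20
d <- data.frame(x1 = rbinom(n, 2, 0.5), x2 = rbinom(n, 2, 0.5))
d$y <- d$x1 + d$x2 + rnorm(n)

# Full model and intercept-only (null) model
fit      <- lm(y ~ x1 + x2, data = d)
null_fit <- lm(y ~ 1, data = d)

# p-value computed from the F statistic and its two degrees of freedom
f <- summary(fit)$fstatistic
p_summary <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# p-value from the nested-model F-test; row 2 is the full-vs-null comparison
p_anova <- anova(null_fit, fit)[["Pr(>F)"]][2]

# The two should agree up to floating-point error
all.equal(p_summary, p_anova)
```

If the two values ever disagree, the likely culprit is that the models were fit to different rows (for example, because of missing data), not the p-value calculation itself.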
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.