Wednesday, December 30, 2009

Use plyr instead of _apply() in R

I've covered plyr once before, showing you how to get means and variances for two quantitative traits across multilocus genotypes. JD Long over at Cerebral Mastication recently posted a nice screencast illustrating how plyr "just works" as an alternative to R's family of apply functions. R has a set of functions (apply, sapply, lapply, tapply, eapply, and rapply) that apply a command or function to your data and return a hopefully useful result. However, for the non-programmers among us, choosing which apply function to use, and how to use it, can be mind-bogglingly confusing. I've never gotten one of these functions to work the way I wanted the first time around, and I often end up writing loops where a vectorized operation would be much faster.
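To make the confusion concrete, here is a minimal sketch in base R that computes the same per-group mean four different ways. The data frame `df` and its columns `genotype` and `trait` are made up for illustration. The point is that each approach hands back a differently shaped object, which is a large part of why picking the right one takes trial and error.

```r
# Toy data (hypothetical): one trait measured in three genotype groups.
df <- data.frame(
  genotype = c("AA", "AA", "AB", "AB", "BB", "BB"),
  trait    = c(1, 3, 2, 4, 5, 7),
  stringsAsFactors = FALSE
)

# tapply: returns a named one-dimensional array, not a data frame.
means_tapply <- tapply(df$trait, df$genotype, mean)

# sapply over the split data: returns a named numeric vector.
means_sapply <- sapply(split(df$trait, df$genotype), mean)

# lapply over the split data: returns a list with one element per group.
means_lapply <- lapply(split(df$trait, df$genotype), mean)

# The explicit loop many of us fall back on.
means_loop <- numeric(0)
for (g in unique(df$genotype)) {
  means_loop[g] <- mean(df$trait[df$genotype == g])
}

# Same numbers, different containers.
str(means_tapply)
str(means_sapply)
str(means_lapply)
str(means_loop)
```

None of these gives you a tidy data frame with the grouping variable as a column, so there is usually another reshaping step before the result is useful.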

Enter plyr.

As I mentioned previously, the plyr functions (ddply in particular) are intuitive and usually return the result you wanted. The ddply function splits your dataset by one or more grouping variables, applies a function or statistic to each piece, and combines the results into a single data frame.
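As a rough illustration of that split-apply-combine pattern, here is a small sketch using `ddply` with plyr's `summarise` helper. The data frame and its column names (`locus1`, `locus2`, `trait1`, `trait2`) are invented for this example and are not taken from the earlier post or from JD's screencast.

```r
library(plyr)  # install.packages("plyr") if it is not already installed

# Toy data (hypothetical): two loci and two quantitative traits.
df <- data.frame(
  locus1 = c("AA", "AA", "AB", "AB", "AA", "AA", "AB", "AB"),
  locus2 = c("CC", "CC", "CC", "CC", "CT", "CT", "CT", "CT"),
  trait1 = c(1, 3, 2, 4, 5, 7, 6, 8),
  trait2 = c(10, 14, 20, 24, 30, 34, 40, 44),
  stringsAsFactors = FALSE
)

# Split by the two-locus genotype, compute summaries within each piece,
# and combine everything into one data frame.
res <- ddply(df, c("locus1", "locus2"), summarise,
  mean_trait1 = mean(trait1),
  var_trait1  = var(trait1),
  mean_trait2 = mean(trait2),
  var_trait2  = var(trait2)
)

print(res)
```

The result has one row per genotype combination, with the grouping variables kept as ordinary columns, so it can go straight into plotting or further analysis without reshaping.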

Here's JD Long's screencast showing how plyr makes a task like this easy where the apply function fails.



Cerebral Mastication - Struggling with apply() in R

1 comment:

  1. Thanks for picking up my post. This was my first attempt at an R screencast. I hope to be doing more in the future, so stay tuned. Good job with the blog. I end up reading it twice now that you guys are on the R Bloggers feed! :)

    -JD

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.