Wednesday, July 24, 2013

Archival, Analysis, and Visualization of #ISMBECCB 2013 Tweets

As the 2013 ISMB/ECCB meeting winds down, I've archived and analyzed the 2,000+ tweets from the meeting using a set of bash and R scripts I previously blogged about.

The archive of all the tweets tagged #ISMBECCB from July 19-24, 2013 is, and will forever remain, here on GitHub. You'll find some R code to parse this text and run the analyses below in the same repository, explained in more detail in my previous blog post.

Number of tweets by date:


Number of tweets by hour:


Most popular hashtags, other than #ismbeccb. With separate hashtags for each session, this really shows which other SIGs and sessions were well-attended. It also shows the popularity of the unofficial ISMB BINGO card.


Most prolific users. I'm not sure who or what kind of account @sciencstream is - seems like spam to me.


And the obligatory word cloud:


Friday, July 12, 2013

Course Materials from useR! 2013 R/Bioconductor for Analyzing High-Throughput Genomic Data

At last week's 2013 useR! conference in Albacete, Spain, Martin Morgan and Marc Carlson led a course on using R/Bioconductor for analyzing next-gen sequencing data, covering alignment, RNA-seq, ChIP-seq, and sequence annotation using R. The course materials are online here, including R code for running the examples, the PDF vignette tutorial, and the course material itself as a package.




Tuesday, July 2, 2013

Customize your .Rprofile and Keep Your Workspace Clean

Like your .bashrc, .vimrc, or many other dotfiles you may have in your home directory, your .Rprofile is sourced every time you start an R session. On Mac and Linux, this file is usually located in ~/.Rprofile. On Windows it's buried somewhere in the R program files. Over the years I've grown and pruned my .Rprofile to set various options and define various "utility" functions I use frequently at the interactive prompt.
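If you're not sure where R looks for these files on your machine, you can ask R itself. This is a minimal sketch using only base R; the variable names are just illustrative, and the files may not exist yet on a fresh install.

```r
## Where R would look for a user-level .Rprofile (the file may not exist yet)
user_profile <- path.expand("~/.Rprofile")
file.exists(user_profile)

## The site-wide profile, if there is one, lives under the R installation
site_profile <- file.path(R.home("etc"), "Rprofile.site")
file.exists(site_profile)
```

On Windows, "~" generally expands to your user documents folder rather than the R program files, so it's worth checking both paths. Note that a .Rprofile in the current working directory is used in place of the one in your home directory.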

One of the dangers of defining too many functions in your .Rprofile is that your code becomes less portable and less reproducible. For example, if I were to define adf() as a shortcut to as.data.frame(), code that I send to other folks using adf() would fail with an error that R could not find the function adf. The same risk applies to setting the option stringsAsFactors=FALSE, and I'm fully aware of it, but it's a tradeoff I'm willing to accept for convenience. Most of the functions I define here are useful for exploring interactively. In particular, the n() function below is handy for getting a numbered list of all the columns in a data frame; lsp() and lsa() list all functions in a package, and list all objects and classes in the environment, respectively (and were taken from Karthik Ram's .Rprofile); and the o() function opens the current working directory in a new Finder window on my Mac. In addition to a few other functions that are self-explanatory, I also turn off those significance stars, set a default CRAN mirror so it doesn't ask me all the time, and load the biocLite() function for installing Bioconductor packages (note: this makes R require web access, which might slow down your R initialization).
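One way to limit the portability risk described above is to make any code you share independent of your startup settings. The sketch below is only an illustration with made-up data: it passes stringsAsFactors explicitly and spells out the base function rather than a personal shortcut.

```r
## Pass the option explicitly so the result doesn't depend on anyone's .Rprofile
df <- data.frame(id = c("a", "b"), n = 1:2, stringsAsFactors = FALSE)
is.character(df$id)

## In shared code, use the full base function instead of a shortcut like adf()
m <- matrix(1:4, nrow = 2)
as.data.frame(m)
```

Shortcuts stay at the interactive prompt; scripts that leave your machine use the long form.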

Finally, you'll notice that I'm creating a new hidden environment, and defining all the functions here as objects in this hidden environment. This allows me to keep my workspace clean, and remove all objects from that workspace without nuking any of these utility functions.
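Here is a minimal sketch of why the hidden environment helps. The names .demo_env, double_it, and scratch are hypothetical and not part of my .Rprofile; the point is only that ls() skips dot-prefixed names and that attached functions live on the search path, not in the workspace.

```r
## Put a utility function in a dot-prefixed (hidden) environment and attach it
.demo_env <- new.env()
.demo_env$double_it <- function(x) x * 2
attach(.demo_env, name = "demo_utils")

scratch <- 1:3      # an ordinary workspace object
rm(list = ls())     # the usual "clear everything" idiom

exists("scratch")   # the workspace object is gone
double_it(21)       # the attached utility function still works
```

Run detach("demo_utils") afterwards if you want to remove the demo functions from the search path.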

I used to keep my .Rprofile synced across multiple installations using Dropbox, but now I keep all my dotfiles in a single git-versioned directory, symlinked where they need to go (usually ~/). My .Rprofile is below: feel free to steal or adapt however you'd like.

## See http://gettinggeneticsdone.blogspot.com/2013/06/customize-rprofile.html
## Load packages
library(BiocInstaller)
## Don't show those silly significance stars
options(show.signif.stars=FALSE)
## Do you want to automatically convert strings to factor variables in a data.frame?
## WARNING!!! This makes your code less portable/reproducible.
options(stringsAsFactors=FALSE)
## Get the sqldf package to play nicely on OSX. No longer necessary with R 3.0.0
## From http://stackoverflow.com/questions/8219747/sqldf-package-in-r-querying-a-data-frame
## options(sqldf.driver="SQLite")
## options(gsubfn.engine = "R")
## Don't ask me for my CRAN mirror every time
options("repos" = c(CRAN = "http://cran.rstudio.com/"))
## Create a new invisible environment for all the functions to go in so it doesn't clutter your workspace.
.env <- new.env()
## Returns a logical vector TRUE for elements of X not in Y
.env$"%nin%" <- function(x, y) !(x %in% y)
## Returns names(df) in single column, numbered matrix format.
.env$n <- function(df) matrix(names(df))
## Single character shortcuts for summary() and head().
.env$s <- base::summary
.env$h <- utils::head
## ht==headtail, i.e., show the first and last 10 items of an object
.env$ht <- function(d) rbind(head(d,10),tail(d,10))
## Show the first 5 rows and first 5 columns of a data frame or matrix
## (is.matrix()/is.data.frame() are used because class(d) has length 2 for matrices in R >= 4.0)
.env$hh <- function(d) if (is.matrix(d) || is.data.frame(d)) d[1:5, 1:5]
## Read data on clipboard.
.env$read.cb <- function(...) {
  ismac <- Sys.info()[1]=="Darwin"
  if (!ismac) read.table(file="clipboard", ...)
  else read.table(pipe("pbpaste"), ...)
}
## Strip row names from a data frame (stolen from plyr)
.env$unrowname <- function(x) {
  rownames(x) <- NULL
  x
}
## List objects and classes (from @_inundata, mod by ateucher)
.env$lsa <- function() {
  ## Look up each object's class in the global environment
  obj_type <- function(x) class(get(x, envir = .GlobalEnv))
  foo = data.frame(sapply(ls(envir = .GlobalEnv), obj_type))
  foo$object_name = rownames(foo)
  names(foo)[1] = "class"
  names(foo)[2] = "object"
  return(unrowname(foo))
}
## List all functions in a package (also from @_inundata)
.env$lsp <- function(package, all.names = FALSE, pattern) {
  package <- deparse(substitute(package))
  ls(
    pos = paste("package", package, sep = ":"),
    all.names = all.names,
    pattern = pattern
  )
}
## Open Finder to the current directory on mac
.env$macopen <- function(...) if(Sys.info()[1]=="Darwin") system("open .")
.env$o <- function(...) if(Sys.info()[1]=="Darwin") system("open .")
## Attach all the variables above
attach(.env)
## .First() run at the start of every R session.
## Use to load commonly used packages?
.First <- function() {
  # library(ggplot2)
  cat("\nSuccessfully loaded .Rprofile at", date(), "\n")
}
## .Last() run at the end of the session
.Last <- function() {
  # save command history here?
  cat("\nGoodbye at ", date(), "\n")
}
Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.