Getting Genetics Done: RStudio Conference 2017 Recap

The first ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly excellent. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.

This is not a complete list by any means — with split sessions I could only go to half the talks at most. Here are some noncomprehensive notes and links to slides and resources for some of the awesome things are doing with R and RStudio that I learned about at the RStudio Conference.

Hadley Wickham kicked off the meeting with a keynote on doing data science in R. The talk focused on the tidyverse, and the notion of splitting functions into commands that do something, as compared to queries that calculate something, and how it’s generally a good idea to keep these different functionalties contained in their own separate functions. (Contrast this to things like lm that both computes values and does things, like printing those values to the screen, making it difficult to capture (see broom).

I asked Hadley after his talk about strategies to reduce issues getting Bioconductor data structures to play nicely with tidyverse tools. Within minutes David Robinson released a new feature in the fuzzyjoin package that leverages IRanges within this tidyverse-friendly package for efficiently doing things like joining on genomic intervals.

Another #rstudioconf-inspired addition to fuzzyjoin:

genome_join, for overlapping intervals on the same chromosome@genetics_blog #rstats pic.twitter.com/oUctyNYc09
— David Robinson (@drob) January 13, 2017

Charlotte Wickham’s 2-hour purrr tutorial was awesome. Here’s a link to a shared dropbox folder with code, challenges, slides, data, etc. The purrr package is a core package in the tidyverse, and I’ll be replacing many of the base ?apply and plyr ??ply functions that I still use here and there. The map_* functions are integral to working with nested list-columns in dplyr, and I think I’m finally starting to grok how to work with these.

Jenny Bryan gave a great talk on list columns. You can see her slides here. Jenny also put together this excellent tutorial with lots of worked examples and code snippets. And if you need some example list data structures for more practice or for teaching that aren’t foo/bar/iris/mtcars-level boring, see her repurrrsive package. Related to this, for more on list columns and purrr map functions, start reading at the “Many Models” section of Hadley’s R for Data Science book.

Julia Silge, data scientist at Stack Overflow, gave a great introduction to tidy text mining with R. You can read Julia and David’s Tidy Text Mining with R book here online (the book was authored in Rmarkdown using bookdown!).

Andrew Flowers, data journalist and former writer at FiveThirtyEight gave the second day’s keynote address on finding and telling stories using R. He gave a series of examples illustrating six motivating features that make data stories worth telling, along with potential danger inherent to each one:

Novelty (potential danger: triviality)
Outlier (spurious result; see also, p-hacking)
Archetype (oversimplification)
Trend (variance)
Debunking (confirmation bias)
Forecast (overfitting)

Yihui Xie led a two-hour tutorial on advanced RMarkdown. You can see his slides here. The rticles package has LaTeX Journal Article Templates for R Markdown for various journals. The tufte package now supports both PDF and HTML output. See an example here. Yihui’s xaringan package ports the remark.js library for slideshows into R. Careful. Yihui warns that you may not sleep after learning about how cool remark.js is. Yihui showed an early version of the in-development blogdown package that can build blog-aware static websites using the blazing-fast and well-documented Hugo static site generator. Finally, the bookdown package is just awesome. It takes multiple RMarkdown documents as input and renders into multiple output formats (screen-readable ebook, PDF, epub, etc.). It looks great for writing books and technical documentation with pushbutton publishing to multiple output formats with some nice built-in styles out of the box. Some examples:

bookdown.org/yihui/bookdown — The bookdown book, written in RMarkdown with bookdown. (whoa, meta)
r4ds.had.co.nz — Garrett Grolemund and Hadley Wickham’s R for Data Science book.
tidytextmining.com — Julia and David’s book on text mining
moderndive.com — an open-source introductory statistics class textbook

Finally, a few gems from other talks that I jotted down:

Chester Ismay gave a great talk on teaching introductory statistics using R, with the open-source course textbook written in RMarkdown using bookdown.
Bob Rudis talked about using pipes (%>%), and pipes within pipes, and best piping practices. See his slides here.
Hilary Parker talked about the idea of an analysis development, (and analysis developers), drawing similarities to software development/developers. Hilary discussed this once before on the excellent podcast that she and Roger Peng host, and you can probably find it in their Conversations On Data Science ebook that summarize and transcribe these conversations.
Simon Jackson introduced corrr package for exploring and manipulating correlations and correlation matrices in a tidy way.
Gordon Shotwell introduced the easymake package that generates Makefiles from a data frame using R.
Karthik Ram quickly introduced several of the (many) rOpenSci packages related to data publication, data access, scientific literature access, scalable & reproducible computing, databases, visualization, taxonomy, geospatial analysis, and many utility tools for data analysis and manipulation.

With split sessions I missed more than half the talks. Lots of people here are active on Twitter, and you can catch many more notes and tidbits on the #rstudioconf hashtag. The meeting was superbly organized, I learned a ton, and I enjoyed meeting in person many of the folks I follow on Twitter and elsewhere online. A few days of 80-degree weather in mid-January didn’t hurt either. I’ll definitely be coming again next year. Kudos to the rstudio::conf organizers and speakers!

All the talks were recorded and will supposedly find their way to rstudio.com at some point soon. I’ll update this post with a link when that happens.

Update Feb 16, 2017: All the talks have now been posted online here under the rstudio::conf2017 heading.

Day 1:

Day 2:

Getting Genetics Done

This blog has moved!

Saturday, January 14, 2017

RStudio Conference 2017 Recap

1 comment: