Thursday, August 6, 2015

Compiling RMarkdown from a Helper R Script

The problem

I was looking for a way to compile an RMarkdown document so that the filename of the resulting PDF or HTML report contains the name of the input data it processed. That is, if I compiled analysis.Rmd, and that file analyzed and reported on data001.txt, I’d want the output filename to look something like data001.txt.analysis.html. Even better would be to include a date stamp, so that if the analysis was compiled today, August 6, 2015, the resulting filename would be data001.txt.2015-08-06.analysis.html. I also wanted to implement the entire solution in R, without relying on fiddly makefiles or scripts that may behave differently depending on the OS or environment.
I found a near-solution as described on this SO post and detailed on this follow-up blog post, but neither really addressed my problem.

The solution

The simplest solution I could come up with involved creating two files:
  1. A .Rmd file that would actually do all the analysis and generate the compiled report.
  2. A second .R script to be used as a config file. Here you’d specify the input data (and potentially other analysis parameters).
When you call rmarkdown::render() from an R script, its envir argument (the environment in which the code chunks are evaluated during knitting) defaults to parent.frame(), that is, the environment render() was called from. Anything you define in the .R config file before calling render() is therefore visible to the code chunks in the .Rmd that is being compiled.
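To make that scoping rule concrete, here is a minimal sketch in plain R. `fake_render()` is a hypothetical stand-in written only for this illustration (it is not part of rmarkdown). It mimics the `envir = parent.frame()` default, so it shows the lookup behavior without needing rmarkdown or pandoc installed.

```r
# Hypothetical stand-in for rmarkdown::render(): evaluates a string of R code
# in `envir`, which defaults to the caller's environment.
fake_render <- function(code, envir = parent.frame()) {
  eval(parse(text = code), envir = envir)
}

# Defined in the calling script, as config.R does with `infile`.
infile <- "data001.txt"

# With the default, the evaluated code can see the caller's variables.
seen_by_default <- fake_render("infile")

# Names defined in an explicitly supplied environment are found there first.
report_env <- new.env()
assign("infile", "data002.txt", envir = report_env)
seen_in_env <- fake_render("infile", envir = report_env)
```

The real render() also accepts an explicit envir argument, so the same idea applies if you would rather hand the report a dedicated environment than rely on the default.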
Here’s what it looks like in practice.
First, the analysis.Rmd file that actually runs the analysis:
 ---
 title: "Analysis Markdown document"
 author: "Stephen Turner"
 date: "August 6, 2015"
 output: html_document
 ---

 This is the Rmarkdown document that runs the analysis.
 Some narrative text goes here. 
 Maybe we'll do some analysis here. The `infile` variable is passed 
 in from the config script. You could pass in other variables too.

 ```{r}
 # check that you defined infile from the config and that 
 # the file actually exists in the current directory
 stopifnot(exists("infile"))

 stopifnot(file.exists(infile))

 # read in the data (assumes the file has a header row naming a `value` column)
 x = read.table(infile, header=TRUE)

 # do some stuff, make a plot, etc.
 result = mean(x$value)
 hist(x$value)
 ```

 Here is some conclusion narrative text. Maybe show some notes:

 - Input file used for this report: `r infile`
 - This report was compiled: `r Sys.Date()`
 - The mean of the `value` column is: `r result`

 Also, never forget to show your...

 ```{r}
 sessionInfo()
 ``` 
And the config.R helper script:
#-------- define the input filename --------#
infile = "data001.txt"
#----- Now just hit the source button! -----#

# check that the input file actually exists!
stopifnot(file.exists(infile))

# create the output filename
outfile = paste(infile, Sys.Date(), "analysis.html", sep=".")

# compile the document
rmarkdown::render(input="analysis.Rmd", output_file=outfile)
All I’d need to do now is open up the config.R script, edit the infile variable, and hit the source button in RStudio. This runs analysis.Rmd as shown above on the input (data001.txt in this example) and saves the resulting compiled report as data001.txt.2015-08-06.analysis.html.
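If several config scripts end up sharing the same naming scheme, the filename logic could be pulled into a small helper. This is only a sketch and not part of the original scripts: `make_outfile()` and its `suffix` argument are hypothetical names, and stripping the directory with basename() is an added assumption so that an input path such as data/data001.txt still yields a plain filename.

```r
# Hypothetical helper: build "<input name>.<date>.<suffix>" as config.R does.
make_outfile <- function(infile, date = Sys.Date(), suffix = "analysis.html") {
  # basename() drops any leading directories (an assumption beyond config.R)
  paste(basename(infile), format(date, format = "%Y-%m-%d"), suffix, sep = ".")
}

# A fixed date is used here only so the result is reproducible.
example_outfile <- make_outfile("data001.txt", date = as.Date("2015-08-06"))
```

In config.R, the assignment would then become outfile = make_outfile(infile), with the render() line left unchanged.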
(Crosspost at RPubs).
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.