Tuesday, September 9, 2025

Repost: Make your development environment portable and reproducible

Reposted from the original at https://blog.stephenturner.us/p/development-environment-portable-reproducible.

You upgrade your old Intel MacBook Pro for a new M4 MBP. You’re setting up a new cloud VM on AWS after migrating away from GCP. You get an account on your institution’s new HPC. You have everything just so in your development environment, and now you have to remember how to set everything up again.

I just started a new position, and I’m doing this right now.

Setting up a reproducible and portable development environment that works seamlessly across different machines and cloud platforms can save you time and headaches. These are a few of the strategies I use [1] to quickly reproduce my development environment across machines.

  1. Dotfiles in a GitHub repo

  2. New VM setup script in a GitHub repo

  3. R “verse” package on GitHub

  4. Dev containers in VS Code

Keep your dotfiles in a private GitHub repo

Dotfiles are the hidden configuration files in your home directory. Examples include .vimrc for Vim, .tmux.conf for tmux, or .bashrc for your shell environment. I have a long list of aliases and little bash functions in a .aliases.sh file that my .bashrc sources. I also have a .dircolors, global .gitignore, a .gitconfig, and a minimal .Rprofile.

Keeping these files in a GitHub repository makes it easy to quickly reproduce your development environment on another machine. If you search GitHub for “dotfiles” or look at the awesome-dotfiles repo, you’ll see many people keep their dotfiles in a public repo. I use a private repo, because I’m too scared I might accidentally commit secrets, such as API tokens in my .Renviron or PyPI credentials in .pypirc.

Whenever you get a new machine or VM, getting things set up is easy:

# Your private dotfiles repo
git clone https://github.com/<yourusername>/dotfiles ~/dotfiles
cd ~/dotfiles

# A script to symlink things to your home
./install.sh  
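The contents of install.sh aren't shown above. A minimal sketch of what such a script could look like, assuming the repo stores dotfiles at its top level under their real names. The link_dotfiles helper and the .bak backup convention are hypothetical choices for illustration, not the actual script:

```shell
#!/bin/sh
# Hypothetical install.sh sketch: symlink every top-level dotfile in a
# repo into a destination directory (normally $HOME).
set -eu

# link_dotfiles SRC_DIR DEST_DIR  (illustrative helper name)
link_dotfiles() {
    src_dir=$(cd "$1" && pwd)   # absolute path so the symlinks resolve
    dest_dir=$2
    for src in "$src_dir"/.[!.]*; do
        [ -e "$src" ] || continue            # glob matched nothing
        name=$(basename "$src")
        case $name in
            .git) continue ;;                # skip the repo's own metadata
        esac
        dest=$dest_dir/$name
        # Keep a backup of any real file or directory already in place
        if [ -e "$dest" ] && [ ! -L "$dest" ]; then
            mv "$dest" "$dest.bak"
        fi
        ln -sfn "$src" "$dest"
    done
}

# Typical use after cloning:
#   link_dotfiles "$HOME/dotfiles" "$HOME"
```

Symlinking rather than copying means edits made in your home directory show up as changes in the repo, ready to commit.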

Keep a fresh cloud VM setup script

I started playing with computers in the 1990s. I’ve experienced enough hard drive failures, random BSODs, and other critical failures that I treat my computer as if it could spontaneously combust at any moment, taking all of my unsaved, un-backed-up work with it. I treat my cloud VMs the same way, as if they’re disposable (many times they are disposable, by design).

Imagine you launch a new cloud VM starting from a clean Ubuntu image. Now you need all the tools you use every day on this machine - vim, tmux, RStudio, conda, Docker, gcloud/gsutil, etc. Additionally, while I use conda to create virtual environments for installing tools for specific tasks, there are some domain-specific tools I use so often every day for exploratory analysis that I actually prefer having a local installation on the machine — things like bedtools, seqtk, samtools, bcftools, fastp, Nextflow, and a few others — instead of having to load a conda environment or use Docker every time I want to do something simple.

I keep a script on GitHub that will install all the software I need on a fresh VM. Here’s an example setup script I use as a GitHub gist.

I know this isn’t completely Reproducible™ in the sense that a Docker container might be, because I’m not controlling the version of every tool and library I’m installing, but it’s good enough to get me up and running for development and interactive data analysis and exploration.
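The gist itself isn't reproduced here. A minimal sketch of the same idea for a clean Ubuntu image, assuming these tools are available as apt packages under these names (worth checking for your release). The names setup_vm, run, and DRY_RUN are hypothetical, and the sketch defaults to printing commands rather than running them:

```shell
#!/bin/sh
# Hypothetical fresh-VM setup sketch for Ubuntu. Not the gist linked above.
set -eu

# Print the command when DRY_RUN=1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_vm() {
    # Assumed apt package names; versions are NOT pinned.
    set -- vim tmux git curl bedtools samtools bcftools seqtk fastp
    run sudo apt-get update
    run sudo apt-get install -y "$@"
}

# Preview:  DRY_RUN=1, then call setup_vm
# Install:  DRY_RUN=0, then call setup_vm
```

Tools that aren't packaged for apt (RStudio, conda, Nextflow, and so on) would each need their own install step added to setup_vm.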

R: Custom “verse” package on GitHub

The tidyverse is probably the best known meta-package that installs lots of other packages for data science. Take a look at the tidyverse package DESCRIPTION file. When you run install.packages("tidyverse"), it will install all the packages listed in the Imports field, including dplyr, tidyr, purrr, ggplot2, and others.

You can use this pattern to create your own “verse” package that installs all your favorite packages. This is helpful for setting up a new machine, or re-installing all the R packages you use whenever you upgrade to a new major version of R.

Take a look at my Tverse package on GitHub at github.com/stephenturner/Tverse, specifically at the DESCRIPTION file. In the Imports field I include all the packages I know I’ll use routinely. Note that this also includes several Bioconductor packages (which requires including the biocViews: directive in the DESCRIPTION), as well as one of my favorite packages, breakerofchains, that is only available from GitHub (requiring the Remotes: entry).

Once this package is pushed to GitHub I can easily install all those packages and their dependencies:

devtools::install_github("stephenturner/Tverse")
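To make the pattern concrete, here is a rough sketch that writes a minimal DESCRIPTION for a made-up meta-package. Everything in it is a placeholder for illustration: the package name myverse, the author details, the short Imports list, and the GitHub-only package someuser/somepkg. It is not a copy of the Tverse DESCRIPTION.

```shell
#!/bin/sh
# Hypothetical sketch: write a DESCRIPTION for a personal "verse" package.
set -eu

write_description() {
    pkg_dir=$1
    mkdir -p "$pkg_dir"
    # Quoted heredoc delimiter: the contents are written out literally.
    cat > "$pkg_dir/DESCRIPTION" <<'EOF'
Package: myverse
Title: Installs My Favorite Packages
Version: 0.0.1
Authors@R: person("First", "Last", email = "first.last@example.com",
    role = c("aut", "cre"))
Description: A personal meta-package whose only job is to pull in the
    packages listed under Imports.
License: GPL-3
biocViews:
Imports:
    dplyr,
    ggplot2,
    GenomicRanges,
    somepkg
Remotes:
    someuser/somepkg
EOF
}

# Example: write_description ./myverse
```

The empty biocViews: field is what lets the installer look for GenomicRanges on Bioconductor, and Remotes: tells it where to find the GitHub-only package. A real package also needs a NAMESPACE file, which usethis::create_package() generates for you. Once pushed to GitHub, a package like this could be installed with remotes::install_github("<yourusername>/myverse").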

Dev containers in VS Code

Development containers (dev containers) allow you to create and use consistent development environments using Docker containers. They let you open any folder inside (or mounted into) a container and take advantage of Visual Studio Code's full feature set. This is particularly useful when working with teams or switching between projects with different dependencies.

The dev container docs and tutorial are both good places to start. You’ll need to have Docker running, and install the Dev Containers VS Code extension.

From Microsoft’s documentation:

Workspace files are mounted from the local file system or copied or cloned into the container. Extensions are installed and run inside the container, where they have full access to the tools, platform, and file system. This means that you can seamlessly switch your entire development environment just by connecting to a different container.

Figure: Container architecture, from Microsoft’s dev containers documentation.

Using a dev container template

You can use any pre-built dev container templates available on registries like Docker Hub or Microsoft’s container registry. Here’s an example that uses the Rocker image with R version 4.4.1 and adds a few extensions to VS Code running in the container. You could also create your own container for development, put that on Docker Hub, then use that image.

{
    "image": "rocker/r-ver:4.4.1",
    "customizations": {
        "vscode": {
            "extensions": [
                "REditorSupport.r",
                "ms-vscode-remote.remote-containers"
            ]
        }
    }
}

Using a custom Dockerfile

You can use a custom Dockerfile to create your dev container. First, create a .devcontainer/ directory in your project with a Dockerfile and a devcontainer.json file. Define your development environment in the Dockerfile (base image, installed packages and configuration). In the JSON replace the image property with build and dockerfile properties:

{
    "build": {
        "dockerfile": "Dockerfile"
    }
}
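As a rough sketch of that layout, the following scaffolds a .devcontainer/ directory with both files. The scaffold_devcontainer name, the "r-dev" container name, and the choice of git and the languageserver R package as extras are assumptions for illustration. Adjust the Dockerfile to whatever your project needs.

```shell
#!/bin/sh
# Hypothetical sketch: scaffold .devcontainer/ with a custom Dockerfile.
set -eu

scaffold_devcontainer() {
    project_dir=$1
    mkdir -p "$project_dir/.devcontainer"

    # The development environment: base image plus a few extras.
    cat > "$project_dir/.devcontainer/Dockerfile" <<'EOF'
FROM rocker/r-ver:4.4.1
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
RUN R -e "install.packages('languageserver')"
EOF

    # Point devcontainer.json at the Dockerfile instead of a pre-built image.
    cat > "$project_dir/.devcontainer/devcontainer.json" <<'EOF'
{
    "name": "r-dev",
    "build": {
        "dockerfile": "Dockerfile"
    },
    "customizations": {
        "vscode": {
            "extensions": ["REditorSupport.r"]
        }
    }
}
EOF
}

# Example: scaffold_devcontainer .
```

The dockerfile path is relative to devcontainer.json, so keeping both files in .devcontainer/ means the bare filename is enough.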

Start VS Code running the container

After you create your devcontainer.json file (either from a template or completely custom), open the folder in the container using the command palette (the Dev Containers: Reopen in Container command).

And prove to yourself that your VS Code environment is indeed using the container (I’m using rocker R 4.4.1 here). Running whoami shows I’m root inside the container (not my own username), and I’m indeed running R version 4.4.1.

1

Vagrant and Ansible are powerful tools for managing development environments and automating configurations. Vagrant allows you to create and configure lightweight, reproducible, and portable virtual environments, while Ansible automates complex system setups and deployments across multiple machines. However, they can be overkill for simple or personal development environments, so I'm focusing on lighter, more straightforward solutions.

 

Using OpenAI Codex in Positron

Reposted from the original at https://blog.stephenturner.us/p/codex-positron.

Last month I wrote about agentic coding in Positron using Positron assistant, which uses the Claude API on the back end.

Yesterday OpenAI announced a series of updates to Codex, the biggest being an IDE extension to allow you to use Codex in VS Code, Cursor, Windsurf, etc. More details at developers.openai.com/codex. And Codex is available in the Open VSX Registry, meaning you can install it in Positron.

Demo: creating an R package with Codex

I tried doing the same thing here with Codex as I did with Positron Assistant in the previous post. I used usethis::create_package() to give me a basic package skeleton, then I fired up Positron, hit the Codex extension in the side panel, and gave it a simple prompt.

write a simple function in this R package to reverse complement a DNA sequence (i.e. A>T, C>G, G>C, T>A). Document it with Roxygen, and write unit tests with testthat. Do not add any external package dependencies other than testthat.
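For anyone unfamiliar with the operation the prompt describes, here is a quick shell illustration. It is not the R code Codex wrote (that lives in the linked repo), and revcomp is just an illustrative name. It reverses the sequence and then swaps each base for its complement:

```shell
#!/bin/sh
# Reverse complement a DNA sequence: reverse the string, then swap
# A<->T and C<->G (case preserved). Illustrative only.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp "ATGC"   # prints GCAT
```

Characters outside A, C, G, and T pass through unchanged here, which is one of the edge cases a real implementation and its unit tests would need to make a decision about.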

Then I sat back and watched it work.

After running devtools::document() and devtools::test(), my tests failed. I asked Codex to fix those tests. I had to do this twice, and the second time around it ran the tests locally and diagnosed what was happening.

The third time around all my tests pass.

And devtools::check() yields no errors, warnings, or notes.

The code is on the same GitHub repo, on the codex branch.

Why Codex instead of Positron Assistant?

I haven’t used either agent enough to know their failure modes, and which might be better in certain circumstances. As of last week, GPT-5 seems to outperform Claude for writing R code, and Codex uses GPT-5 under the hood.

Another factor might be cost. Instead of using API credits, Codex uses your existing ChatGPT Plus, Team, Pro, Edu, or Enterprise subscription. In my post on Positron Assistant I showed that the entire package development experiment (admittedly simple) cost about $0.09. But if you’re relying on this daily and using it for heavier tasks, you might run up a decent bill. If you’re already paying $20/month for ChatGPT Plus, using Codex doesn’t cost you any more.

Finally, there’s the original selling point behind Codex before it was ever available in an IDE: You can wire up Codex to your GitHub account and ask Codex to read, write, and execute code in your repositories to answer questions or draft PRs. I haven’t tried this yet, but you can read more at developers.openai.com/codex/cloud.

 

Positron Assistant: GitHub Copilot and Claude-Powered Agentic Coding in R

Reposted from the original at https://blog.stephenturner.us/p/positron-assistant-copilot-chat-agent 

I have a little hobby project I’m working on and I wanted to use the opportunity to fully make the switch to Positron from RStudio. I used Positron here and there when it first came out, but now that it’s out of beta and has a more complete feature set (like remote SSH sessions!) I have everything I need to switch and not look back. The most exciting new addition is the new Positron Assistant.

Positron Assistant

I wrote a post last year about AI code completion in Positron. GitHub Copilot wouldn’t work in Positron at the time, so I tried out Codeium, Tabnine, and Continue.

Using a third-party plugin is no longer necessary. One of the more exciting new features in Positron is Positron Assistant [1]. From the description:

Positron Assistant is an AI client that provides LLM integration within Positron, both for chat and for inline completions. Use Positron Assistant to generate or refactor code, ask questions, get help with debugging, and get suggestions for next steps in your data science projects.

Positron Assistant allows you to use GitHub Copilot for inline code completions, and Anthropic Claude for chat and agent mode. The documentation has instructions for getting this set up so I won’t go into those details. You make a configuration change in Positron, then sign into your GitHub account with OAuth, and put in your Anthropic API key, and you’re off to the races.

Press Cmd-Shift-P to bring up the command palette in Positron, then search for “Positron Assistant: Configure Language Model Providers.”

Code completion with GitHub Copilot

This isn’t anything new. GitHub Copilot has been available in VS Code and RStudio for years. But it’s nice to have it available in Positron now.

Here’s a demo where I’m starting with a blank R script, and write comments in the code describing what I want, then let Copilot take it away as I just hit the tab key to accept the suggestions. Here I’m asking for a function to reverse complement a DNA sequence. Here’s the code it produced.

Agent mode to create an R package

When Positron first came out I wrote about using it for R package development.

I wanted to try out Positron Assistant’s agent mode to see how it works with R packages. Cursor and Claude Code seem to be all the rage on all the tech podcasts, Twitter feeds, and blogs I follow, but I’ve been reluctant to switch IDEs (or, in the case of Claude Code, ditch the IDE altogether).

Activate the Assistant in Positron’s sidebar, then select Agent mode.

I started up a fresh R session and ran usethis::create_package() to create a blank package. This just creates the bare minimum (DESCRIPTION, NAMESPACE, etc.) needed for a skeleton R package. Then I activated Positron Assistant in agent mode, asked it to write a function in the package to reverse complement a DNA sequence, document it with Roxygen, and write unit tests with testthat.

It’s fun to sit back and watch the agent work. It scans the directory structure, finds the R version, creates the function, writes the documentation, writes the tests, then presents a modal asking me whether I want to run the tests that it just created. It wrote everything in one shot with all tests passing and no errors on devtools::check().

Everything you see here cost $0.09 using the Claude 4 Sonnet API [2].

The one thing I had to fix was the License field in the DESCRIPTION file with a simple usethis::use_mit_license(). The default for this field came in from usethis::create_package() and was simply boilerplate telling me that I needed to choose a license. Once I fixed this all tests passed, and the package check came out clean with 0 errors, warnings, or notes. I uploaded the package here on GitHub.


It was honestly pretty mesmerizing to sit back and watch the agent do its thing, inspecting the environment, writing code, docs, and tests.

Obviously this was a simple greenfield example, and I’d be curious to see how the agent handles larger codebases with complex dependencies and newer coding paradigms (like R’s new S7 OOP system) that won’t have good training data from Stack Overflow or elsewhere.

1

At the time I’m writing this (July 2025) Positron Assistant is still in preview, meaning that features might change by the time you’re reading this. For instance, currently only GitHub Copilot is available for inline code completions, and only Anthropic Claude is available for chat and agent mode. I’m sure both of these will expand in the near future to allow for other model providers (although Claude 4 consistently ranks at the top for R coding capabilities).

2

So many people I’ve talked to have no issue paying $20/month for ChatGPT Plus or Claude Pro, but are reluctant to buy API credits. I’m not sure how to rationalize this. I think there might be a misunderstanding that it works like AWS, where you put in a credit card and could accidentally rack up a huge bill. It doesn’t work like this. It’s a prepaid service. I put $5 on my account months ago just to experiment around a bit and I still haven’t used it all. You can set rate limits and email notifications on your API keys if you’re worried about spending more than a few pennies trying out something like what you see in this post.

 

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.