Friday, May 22, 2026

Repost: ctrlvee: Extract external R code and insert inline

Reposted from the original at https://blog.stephenturner.us/p/ctrlvee-extract-external-r-code-insert-inline-positron-rstudio-addin. 

Ever find yourself looking through a pkgdown page or a Quarto book, copying and pasting code chunks from your browser into your IDE? I do, and it’s a minor annoyance.

My friend and colleague VP Nagraj published a new R package called ctrlvee that makes this a lot easier.

It does one thing. Put your cursor anywhere in an R script in Positron or RStudio, call the add-in, provide a URL, and a few milliseconds later you’ll have all the code from that page in your editor, separated by chunk boundaries (along with some metadata and a note to check the license!).

The package README provides a demonstration using the “Data Validation and QA” chapter of my Data Science Team Training book (dstt.stephenturner.us).

  1. Install the package: install.packages("ctrlvee")

  2. Run the add-in. In Positron you’ll open the command palette, search for Run RStudio Addin, then extract external R code and insert inline. You’ll get a modal asking you for a URL.

  3. Paste one in. E.g., https://dstt.stephenturner.us/validation.html

  4. The R code from the website appears in your editor 🚀
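To give a feel for what extraction from a rendered page involves, here is a minimal base R sketch. It is not ctrlvee's implementation: the HTML string is a made-up stand-in for a real page, and the regular expression assumes plain `<pre><code>` blocks with no attributes or syntax-highlighting markup, which real pkgdown and Quarto pages usually have.

```r
# Hypothetical stand-in for a rendered page; nothing is fetched from a real URL.
html <- paste0(
  "<p>Intro</p>",
  "<pre><code>x &lt;- 1 + 1</code></pre>",
  "<p>More text</p>",
  "<pre><code>print(x)</code></pre>"
)

# Match each <pre><code>...</code></pre> block (non-greedy, dot matches newlines).
pattern <- "(?s)<pre><code>(.*?)</code></pre>"
matches <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]

# Keep only the captured code between the tags.
chunks <- sub(pattern, "\\1", matches, perl = TRUE)

# Unescape the few HTML entities that commonly show up in R code.
unescape <- function(x) {
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  gsub("&amp;", "&", x, fixed = TRUE)
}
chunks <- unescape(chunks)

# Join the chunks with a boundary comment between them.
script <- paste(chunks, collapse = "\n\n# ---- chunk boundary ----\n\n")
cat(script, "\n")
```

A real tool would use a proper HTML parser rather than a regular expression, which is one reason to reach for the add-in instead of rolling your own.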

Here’s a demo.

Here’s what the extracted/inserted code looks like, from this source.

# -----------------------------------------------------------------
# Chunks fetched by ctrlvee from: https://dstt.stephenturner.us/validation.html
# Strategy: Rendered HTML page
# Date: 2026-05-16 05:14:44
# Chunks: 8
# NOTE: Check the source license before reusing this code.
# -----------------------------------------------------------------

flu <- data.frame(
    week = c(1, 2, 3, 4, 4),
    county = c("Fairfax", "Arlington", NA, "Loudoun", "Loudoun"),
    disease = c("Flu", "Flu", "Flu", "Flu", "Flu"),
    cases = c(23, 41, 18, -5, 12),
    rate = c(2.1, 3.8, 1.6, NA, 1.1)
)

flu

# ---- chunk boundary ----

if (any(flu$cases < 0, na.rm = TRUE)) {
    stop("Negative case counts detected. Inspect raw data before proceeding.")
}

# ---- chunk boundary ----

stopifnot(
    "Negative case counts" = all(flu$cases >= 0, na.rm = TRUE),
    "Missing county values" = !anyNA(flu$county),
    "Duplicate records" = !anyDuplicated(flu[, c("week", "county")])
)

# ---- chunk boundary ----

install.packages("pointblank")

# ---- chunk boundary ----

library(pointblank)

agent <- create_agent(tbl = flu, label = "Weekly flu surveillance") |>
    col_vals_gte(
        columns = cases,
        value = 0,
        label = "Case counts must be non-negative"
    ) |>
    col_vals_not_null(
        columns = c(week, county),
        label = "Week and county cannot be missing"
    ) |>
    rows_distinct(
        columns = c(week, county),
        label = "No duplicate week/county records"
    ) |>
    interrogate()

agent

# ---- chunk boundary ----

create_agent(tbl = flu, label = "Weekly flu surveillance — extended") |>
    col_is_numeric(
        columns = c(cases, rate),
        label = "Case count and rate must be numeric"
    ) |>
    col_vals_in_set(
        columns = disease,
        set = c("Flu", "COVID-19", "RSV"),
        label = "Disease must be from the approved list"
    ) |>
    col_vals_between(
        columns = week,
        left = 1,
        right = 52,
        label = "Week must be between 1 and 52"
    ) |>
    col_vals_gte(
        columns = rate,
        value = 0,
        na_pass = TRUE,
        label = "Rate must be non-negative (NAs allowed)"
    ) |>
    interrogate()

# ---- chunk boundary ----

if (!all_passed(agent)) {
    stop("Data validation failed. Review the agent report before proceeding.")
}

# ---- chunk boundary ----

library(readr)
library(pointblank)

flu <- read_csv("data/flu-2024.csv")

# Validate immediately after reading
agent <- create_agent(tbl = flu, label = "flu-2024 validation") |>
    col_vals_gte(columns = cases, value = 0, label = "No negative counts") |>
    col_vals_not_null(columns = c(week, county), label = "No missing keys") |>
    rows_distinct(columns = c(week, county), label = "No duplicate records") |>
    interrogate()

if (!all_passed(agent)) {
    stop("Validation failed — see agent report above.")
}

 

Wednesday, February 4, 2026

Repost: OpenAI Codex App: Vibe-updating my old qqman R package to ggplot2 with plan+execute.

Reposted from the original at https://blog.stephenturner.us/p/openai-codex-app-qqman.

Earlier this week I reposted a Bluesky post from Ethan Mollick. I’m on the same page here — there are problems for sure, but these tools have irreversibly changed the practice of software development (and IMHO, for the better on balance).

Claude Code has gotten everyone excited recently (including me), even going so far as to make it into the popular press (see the Atlantic article on CC). Codex has been around for a while, but this looks like OpenAI’s attempt to challenge CC’s position.

The new Codex desktop app

This week OpenAI launched a dedicated Codex desktop app positioned as a focused UI for running multiple agents in parallel, keeping changes isolated via built-in worktrees, and extending behavior with skills and scheduled automations. Like the demo I highlighted in my previous post on Claude Code, Codex uses worktrees to isolate tasks/PRs as primitives for conflict resolution, has a plan mode (/plan) to force upfront decomposition and questions, can use skills as reusable bundles that can connect to external services, and has automations for recurring background jobs.

Download it at openai.com/codex. It’s free to use for some unspecified time, or it works with your Plus/Enterprise/Edu/whatever account.

Demo: updating qqman to use ggplot2

My most highly cited paper is about some R code I wrote 15 years ago, and later turned into a package, for creating Manhattan Plots from GWAS data. The paper in JOSS has been cited ~1000 times, and the preprint another ~2100 times.

Turner, S. D. (2018). qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. Journal of Open Source Software, 3(25), 731. https://doi.org/10.21105/joss.00731.

I wrote the original code using ggplot2, then refactored everything to use base R, because at the time, at least with how I wrote the package, plotting millions of points with ggplot2 was agonizingly slow.

I’ve gotten lots of feature requests for qqman, and unfortunately I don’t have time to respond to them. Many of these requests would be easy to implement if the package did in fact use ggplot2, because it’s been so long since I’ve done anything in base R that I’ve forgotten how to do pretty much anything.

So I fired up the Codex app, put it into “plan” mode, and asked for help. Along the way it asked me questions about my preferences.

Here’s a screenshot of the plan (truncated).

#YOLO, implement the plan.

When I rebuilt the package, there were a few of those “no visible global binding for global variable…” notes that I had to fix, but otherwise everything worked. Everything now uses ggplot2. The function returns a ggplot object. Column names are no longer as strict. It’s fast. It uses ggrepel to label points so that they don’t overlap. Documentation was updated. The README was updated. The vignettes were updated. It went off and did this in about 5 minutes.

Here’s one of the plots that comes out of this.
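For a sense of what a ggplot2-based Manhattan plot involves, here is a minimal sketch on simulated data. This is not the code Codex wrote and not qqman's implementation; the data are random, and the cumulative-position trick is just one common way to lay chromosomes end to end. The column names CHR, BP, and P follow qqman's defaults.

```r
library(ggplot2)

# Simulated GWAS-like results: 3 chromosomes, 200 positions each, random p-values.
set.seed(1)
gwas <- data.frame(
  CHR = rep(1:3, each = 200),
  BP  = rep(seq_len(200), times = 3),
  P   = runif(600)
)

# Offset each chromosome by the combined length of the chromosomes before it,
# so positions run continuously along the x-axis.
chr_len <- tapply(gwas$BP, gwas$CHR, max)
offsets <- c(0, cumsum(chr_len))[gwas$CHR]
gwas$pos <- gwas$BP + offsets

# Build the plot; the result is an ordinary ggplot object that can be
# extended with further layers, themes, or labels.
p <- ggplot(gwas, aes(x = pos, y = -log10(P), colour = factor(CHR))) +
  geom_point(size = 0.8) +
  labs(x = "Genomic position", y = "-log10(p)") +
  theme(legend.position = "none")

p
```

Because the result is a ggplot object, something like point labels via ggrepel becomes a matter of adding one more layer.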

I’m not going to update the package, because I still don’t have time to maintain it or respond to feature requests. But that shouldn’t stop you if there’s something you want to do with qqman or any other open-source R/Python package that the package doesn’t currently do.

 

 

Thursday, October 2, 2025

Repost: Construct objects with idiomatic R code

Reposted from the original at https://blog.stephenturner.us/p/construct-objects-with-idiomatic-r-code

---

Today I discovered the constructive package and the construct() function for creating R objects with idiomatic R code to make human-readable reproducible examples.

Imagine you want to create a reproducible example and you need to create an object you can share somewhere like StackOverflow. Here let’s take the starwars data that ships with dplyr, and get just the first few rows, and only a few select columns.

library(dplyr)

swpartial <- 
  starwars |> 
  head(4) |> 
  select(name, species, films)

swpartial

This partial dataset has 4 rows, 3 columns, and one of those is a list-column.

# A tibble: 4 × 3
  name           species films    
  <chr>          <chr>   <list>   
1 Luke Skywalker Human   <chr [5]>
2 C-3PO          Droid   <chr [6]>
3 R2-D2          Droid   <chr [7]>
4 Darth Vader    Human   <chr [4]>

The built-in dput() function gives you the code to recreate this object:

dput(swpartial)

But the output is hardly human readable:

structure(list(name = c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader"
), species = c("Human", "Droid", "Droid", "Human"), films = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", 
    "Revenge of the Sith", "The Force Awakens"), c("A New Hope", 
    "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", 
    "Attack of the Clones", "Revenge of the Sith"), c("A New Hope", 
    "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", 
    "Attack of the Clones", "Revenge of the Sith", "The Force Awakens"
    ), c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", 
    "Revenge of the Sith"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Alternatively, load the constructive package and use the construct() function:

library(constructive)
construct(swpartial)

The code you’ll get will create an identical object, but it’s much more human readable, and it’s probably what you would actually write out if you were constructing this object manually.

tibble::tibble(
  name = c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader"),
  species = c("Human", "Droid", "Droid", "Human"),
  films = list(
    c(
      "A New Hope", 
      "The Empire Strikes Back", 
      "Return of the Jedi",
      "Revenge of the Sith", 
      "The Force Awakens"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "The Phantom Menace",
      "Attack of the Clones",
      "Revenge of the Sith"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "The Phantom Menace",
      "Attack of the Clones",
      "Revenge of the Sith",
      "The Force Awakens"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "Revenge of the Sith"
    )
  )
)
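If you want to convince yourself that generated code really does rebuild the same object, you can evaluate it and compare. The sketch below does this with base R's deparse() (the same text dput() prints) on a small made-up data frame, so it runs without dplyr or constructive installed; the object and its column names are placeholders.

```r
# A small stand-in object; the names and values are made up for illustration.
x <- data.frame(
  name    = c("Luke Skywalker", "C-3PO"),
  n_films = c(5L, 6L)
)

# deparse() returns, as character strings, the code that dput() would print.
code <- deparse(x)

# Evaluate that code to rebuild the object.
rebuilt <- eval(parse(text = code))

# Compare the rebuilt object with the original.
same <- isTRUE(all.equal(rebuilt, x))
same
```

The same check applies to whatever code you paste into a reproducible example: run it in a fresh session and compare against the original object.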

There are lots of other features and use cases. See the documentation for details.

 

 

Tuesday, September 9, 2025

Repost: Make your development environment portable and reproducible

Reposted from the original at https://blog.stephenturner.us/p/development-environment-portable-reproducible.

You upgrade your old Intel Macbook Pro for a new M4 MBP. You’re setting up a new cloud VM on AWS after migrating away from GCP. You get an account on your institution’s new HPC. You have everything just so in your development environment, and now you have to remember how to set everything up again.

I just started a new position, and I’m doing this right now.

Setting up a reproducible and portable development environment that works seamlessly across different machines and cloud platforms can save you time and headaches. These are a few of the strategies I use1 to quickly reproduce my development environment across machines.

  1. Dotfiles in a GitHub repo

  2. New VM setup script in a GitHub repo

  3. R “verse” package on GitHub

  4. Dev containers in VS Code

Keep your dotfiles in a private GitHub repo

Dotfiles are the hidden configuration files in your home directory. Examples include .vimrc for Vim, .tmux.conf for tmux, or .bashrc for your shell environment. I have a long list of aliases and little bash functions in a .aliases.sh file that my .bashrc sources. I also have a .dircolors, global .gitignore, a .gitconfig, and a minimal .Rprofile.
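A minimal .Rprofile might look something like the sketch below. The specific options are illustrative assumptions, not a copy of any particular file.

```r
# Hypothetical minimal ~/.Rprofile; every setting here is an illustrative choice.

# Use a fixed CRAN mirror so install.packages() never prompts for one.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Keep numbers out of scientific notation in most printed output.
options(scipen = 999)
```

Keeping this file small is deliberate: anything set here silently changes how scripts behave on your machine compared with someone else's.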

Keeping these files in a GitHub repository makes it easy to quickly reproduce your development environment on another machine. If you search GitHub for “dotfiles” or look at the awesome-dotfiles repo, you’ll see many people keep their dotfiles in a public repo. I use a private repo, because I’m too scared I might accidentally commit secrets, such as API tokens in my .Renviron or PyPI credentials in .pypirc.

Whenever you get a new machine or VM, getting things set up is easy:

# Your private dotfiles repo
git clone https://github.com/<yourusername>/dotfiles ~/dotfiles
cd ~/dotfiles

# A script to symlink things to your home
./install.sh  

Keep a fresh cloud VM setup script

I started playing with computers in the 1990s. I’ve experienced enough hard drive failures, random BSODs, and other critical failures that I treat my computer as if it could spontaneously combust at any moment, taking all of my unsaved, un-backed-up work with it. I treat my cloud VMs the same way, as if they’re disposable (many times they are disposable, by design).

Imagine you launch a new cloud VM starting from a clean Ubuntu image. Now you need all the tools you use every day on this machine - vim, tmux, RStudio, conda, Docker, gcloud/gsutil, etc. Additionally, while I use conda to create virtual environments for installing tools for specific tasks, there are some domain-specific tools I use so often every day for exploratory analysis that I actually prefer having a local installation on the machine — things like bedtools, seqtk, samtools, bcftools, fastp, Nextflow, and a few others — instead of having to load a conda environment or use Docker every time I want to do something simple.

I keep a script on GitHub that will install all the software I need on a fresh VM. Here’s an example setup script I use as a GitHub gist.

I know this isn’t completely Reproducible™ in the sense that a Docker container might be, because I’m not controlling the version of every tool and library I’m installing, but it’s good enough to get me up and running for development and interactive data analysis and exploration.

R: Custom “verse” package on GitHub

The tidyverse is probably the best known meta-package that installs lots of other packages for data science. Take a look at the tidyverse package DESCRIPTION file. When you run install.packages("tidyverse"), it will install all the packages listed in the Imports field, including dplyr, tidyr, purrr, ggplot2, and others.

You can use this pattern to create your own “verse” package that installs all your favorite packages. This is helpful for setting up a new machine, or re-installing all the R packages you use whenever you upgrade to a new major version of R.

Take a look at my Tverse package on GitHub at github.com/stephenturner/Tverse, specifically at the DESCRIPTION file. In the Imports field I include all the packages I know I’ll use routinely. Note that this also includes several Bioconductor packages (which requires including the biocViews: directive in the DESCRIPTION), as well as one of my favorite packages, breakerofchains, that is only available from GitHub (requiring the Remotes: entry).
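To make the pattern concrete, the sketch below writes a bare-bones DESCRIPTION for a hypothetical meta-package called myverse using base R's write.dcf(), then reads the Imports field back. The package name and the packages listed are placeholders, and a real DESCRIPTION needs more fields (authors, license, and so on) before the package will pass a check.

```r
# Fields for a hypothetical "verse" package; all values are placeholders.
desc <- data.frame(
  Package = "myverse",
  Title   = "Installs My Favorite Packages",
  Version = "0.0.1",
  Imports = "dplyr, ggplot2, readr",
  stringsAsFactors = FALSE
)

# Write the DESCRIPTION into a temporary directory rather than a real package.
path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = path)

# Read the Imports field back; installing the package pulls in everything listed.
imports <- read.dcf(path, fields = "Imports")[1, 1]
imports
```

In practice you would edit the DESCRIPTION file by hand or with usethis; the point is only that the Imports field is the whole mechanism.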

Once this package is pushed to GitHub I can easily install all those packages and their dependencies:

devtools::install_github("stephenturner/Tverse")

Dev containers in VS Code

Development containers (dev containers) allow you to create and use consistent development environments using Docker containers. They allow you to open any folder inside (or mounted into) a container and take advantage of Visual Studio Code's full feature set. This is particularly useful when working with teams or switching between projects with different dependencies.

The dev container docs and tutorial are both good places to start. You’ll need to have Docker running, and install the Dev Containers VS Code extension.

From Microsoft’s documentation:

Workspace files are mounted from the local file system or copied or cloned into the container. Extensions are installed and run inside the container, where they have full access to the tools, platform, and file system. This means that you can seamlessly switch your entire development environment just by connecting to a different container.

[Figure: Container architecture, from Microsoft’s dev containers documentation.]

Using a dev container template

You can use any pre-built dev container template available on registries like Docker Hub or Microsoft’s container registry. Here’s an example that uses the Rocker image for R version 4.4.1 and adds a few extensions to VS Code running in the container. You could also create your own container for development, put that on Docker Hub, then use that image.

{
    "image": "rocker/r-ver:4.4.1",
    "customizations": {
        "vscode": {
            "extensions": [
                "REditorSupport.r",
                "ms-vscode-remote.remote-containers"
            ]
        }
    }
}

Using a custom Dockerfile

You can use a custom Dockerfile to create your dev container. First, create a .devcontainer/ directory in your project with a Dockerfile and a devcontainer.json file. Define your development environment in the Dockerfile (base image, installed packages and configuration). In the JSON replace the image property with build and dockerfile properties:

{
    "build": {
        "dockerfile": "Dockerfile"
    }
}

Start VS Code running the container

After you create your devcontainer.json file (either from a template or completely custom), open the folder in the container using the command palette (the Dev Containers: Reopen in Container command).

And prove to yourself that your VS Code environment is indeed using the container (I’m using rocker R 4.4.1 here). Running whoami shows I’m root inside the container (not my own username), and I’m indeed running R version 4.4.1.
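You can run the same kind of check from the R console. This is a small sketch using base R only; the /.dockerenv test is a common heuristic for "am I inside a Docker container" and is an assumption here, not something guaranteed by every container runtime.

```r
# Collect a few facts about the current R session's environment.
env_info <- list(
  # The user R is running as; inside many containers this is "root".
  user = Sys.info()[["user"]],
  # The R version string, e.g. to confirm it matches the image tag.
  r_version = R.version.string,
  # Heuristic only: Docker normally creates this file inside containers.
  in_docker = file.exists("/.dockerenv")
)

str(env_info)
```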

1

Vagrant and Ansible are powerful tools for managing development environments and automating configurations. Vagrant allows you to create and configure lightweight, reproducible, and portable virtual environments, while Ansible automates complex system setups and deployments across multiple machines. However, they can be overkill for simple or personal development environments, so I'm focusing on lighter, more straightforward solutions.

 

Using OpenAI Codex in Positron

Reposted from the original at https://blog.stephenturner.us/p/codex-positron.

Last month I wrote about agentic coding in Positron using Positron assistant, which uses the Claude API on the back end.

Yesterday OpenAI announced a series of updates to Codex, the biggest being an IDE extension to allow you to use Codex in VS Code, Cursor, Windsurf, etc. More details at developers.openai.com/codex. And Codex is available in the Open VSX Registry, meaning you can install it in Positron.

Demo: creating an R package with Codex

I tried doing the same thing here with Codex as I did with Positron Assistant in the previous post. I used usethis::create_package() to give me a basic package skeleton, then I fired up Positron, hit the Codex extension in the side panel, and gave it a simple prompt.

write a simple function in this R package to reverse complement a DNA sequence (i.e. A>T, C>G, G>C, T>A). Document it with Roxygen, and write unit tests with testthat. Do not add any external package dependencies other than testthat.

Then I sat back and watched it work.

As you can see, after running devtools::document() and devtools::test(), my tests failed. I asked Codex to fix those tests. I had to do this twice, and the second time around it’s running those tests locally and diagnosing what’s happening.

The third time around all my tests pass.

And devtools::check() yields no errors, warnings, or notes.

The code is on the same GitHub repo, on the codex branch.
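For reference, here is a minimal base R sketch of the kind of function the prompt asks for. It is an illustration only, not the code Codex generated, and the function name revcomp is hypothetical.

```r
# Reverse complement a single DNA sequence using base R only.
# Hypothetical helper for illustration; not the Codex-generated function.
revcomp <- function(seq) {
  stopifnot(is.character(seq), length(seq) == 1L)

  # Split into individual uppercase bases.
  bases <- strsplit(toupper(seq), "", fixed = TRUE)[[1]]

  # Reject anything that is not A, C, G, or T.
  if (!all(bases %in% c("A", "C", "G", "T"))) {
    stop("Sequence may only contain A, C, G, and T.")
  }

  # Complement each base (A>T, C>G, G>C, T>A), then reverse the order.
  complemented <- chartr("ACGT", "TGCA", bases)
  paste(rev(complemented), collapse = "")
}

revcomp("ATGC")  # returns "GCAT"
```

Even for something this small, the edge cases (lowercase input, ambiguous bases like N, empty strings) are exactly what the unit tests in the prompt are meant to pin down.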

Why Codex instead of Positron Assistant?

I haven’t used either agent enough to know their failure modes, and which might be better in certain circumstances. As of last week, GPT-5 seems to outperform Claude for writing R code, and Codex uses GPT-5 under the hood.

Another factor might be cost. Instead of using API credits, Codex uses your existing ChatGPT Plus, Team, Pro, Edu, or Enterprise subscription. In my post on Positron Assistant I showed that the entire package development experiment (admittedly simple) cost about $0.09. But if you’re relying on this daily and using it for heavier tasks, you might run up a decent bill. If you’re already paying $20/month for ChatGPT Plus, using Codex doesn’t cost you any more.

Finally, there’s the original selling point behind Codex before it was ever available in an IDE: You can wire up Codex to your GitHub account and ask Codex to read, write, and execute code in your repositories to answer questions or draft PRs. I haven’t tried this yet, but you can read more at developers.openai.com/codex/cloud.

 

Positron Assistant: GitHub Copilot and Claude-Powered Agentic Coding in R

Reposted from the original at https://blog.stephenturner.us/p/positron-assistant-copilot-chat-agent 

I have a little hobby project I’m working on and I wanted to use the opportunity to fully make the switch to Positron from RStudio. I used Positron here and there when it first came out, but now that it’s out of beta and has a more complete feature set (like remote SSH sessions!) I have everything I need to switch and not look back. The most exciting new addition is the new Positron Assistant.

Positron Assistant

I wrote a post last year about AI code completion in Positron. GitHub Copilot wouldn’t work in Positron at the time so I tried out Codeium, Tabnine, and Continue.

Using a third-party plugin is no longer necessary. One of the more exciting new features in Positron is Positron Assistant.1 From the description:

Positron Assistant is an AI client that provides LLM integration within Positron, both for chat and for inline completions. Use Positron Assistant to generate or refactor code, ask questions, get help with debugging, and get suggestions for next steps in your data science projects.

Positron Assistant allows you to use GitHub Copilot for inline code completions, and Anthropic Claude for chat and agent mode. The documentation has instructions for getting this set up so I won’t go into those details. You make a configuration change in Positron, then sign into your GitHub account with OAuth, and put in your Anthropic API key, and you’re off to the races.

Cmd-Shift-P to bring up the command palette in Positron, then search for “Positron Assistant: Configure Language Model Providers.”

Code completion with GitHub Copilot

This isn’t anything new. GitHub Copilot has been available in VS Code and RStudio for years. But it’s nice to have it available in Positron now.

Here’s a demo where I start with a blank R script, write comments in the code describing what I want, then let Copilot take it away as I just hit the Tab key to accept the suggestions. Here I’m asking for a function to reverse complement a DNA sequence. Here’s the code it produced.

Agent mode to create an R package

When Positron first came out I wrote about using it for R package development.

I wanted to try out Positron Assistant’s agent mode to see how it works with R packages. Cursor and Claude Code seem to be all the rage on all the tech podcasts, Twitter feeds, and blogs I follow, but I’ve been reluctant to switch IDEs (or in the case of Claude Code, ditching the IDE altogether).

Activate the Assistant in Positron’s sidebar, then select Agent mode.

I started up a fresh R session and ran usethis::create_package() to create a blank package. This just creates the bare minimum (DESCRIPTION, NAMESPACE, etc.) needed for a skeleton R package. Then I activated Positron Assistant in agent mode, asked it to write a function in the package to reverse complement a DNA sequence, document it with Roxygen, and write unit tests with testthat.

It’s fun to sit back and watch the agent work. It scans the directory structure, finds the R version, creates the function, writes the documentation, writes the tests, then presents a modal asking me whether I want to run the tests that it just created. It wrote everything in one shot with all tests passing and no errors on devtools::check().

Everything you see here cost $0.09 using the Claude 4 Sonnet API.2

The one thing I had to fix was the License field in the DESCRIPTION file with a simple usethis::use_mit_license(). The default for this field came in from usethis::create_package() and was simply boilerplate telling me that I needed to choose a license. Once I fixed this all tests passed, and the package check came out clean with 0 errors, warnings, or notes. I uploaded the package here on GitHub.

It was honestly pretty mesmerizing to sit back and watch the agent do its thing, inspecting the environment, writing code, docs, and tests.

Obviously this was a simple greenfield example, and I’d be curious to see how the agent handles larger codebases with complex dependencies and newer coding paradigms (like R’s new S7 OOP system) that won’t have good training data from Stack Overflow or elsewhere.

1

At the time I’m writing this (July 2025) Positron Assistant is still in preview, meaning that features might change by the time you’re reading this. For instance, currently only GitHub Copilot is available for inline code completions, and only Anthropic Claude is available for chat and agent mode. I’m sure both of these will expand in the near future to allow for other model providers (although Claude 4 consistently ranks at the top for R coding capabilities).

2

So many people I’ve talked to have no issue paying $20/month for ChatGPT Plus or Claude Pro, but are reluctant to buy API credits. I’m not sure how to rationalize this. I think there might be a misunderstanding that it works like AWS, where you put in a credit card and could accidentally rack up a huge bill. It doesn’t work like this. It’s a prepaid service. I put $5 on my account months ago just to experiment around a bit and I still haven’t used it all. You can set rate limits and email notifications on your API keys if you’re worried about spending more than a few pennies trying out something like what you see in this post.

 

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.