Getting Genetics Done: 2025

Thursday, October 2, 2025

Repost: Construct objects with idiomatic R code

Reposted from the original at https://blog.stephenturner.us/p/construct-objects-with-idiomatic-r-code

---

Today I discovered the constructive package and the construct() function for creating R objects with idiomatic R code to make human-readable reproducible examples.

CRAN: https://cran.r-project.org/package=constructive
Source: https://github.com/cynkra/constructive/
Docs & vignettes: https://cynkra.github.io/constructive/

Imagine you want to create a reproducible example and you need to create an object you can share somewhere like StackOverflow. Here let’s take the starwars data that ships with dplyr, and get just the first few rows, and only a few select columns.

library(dplyr)

swpartial <- 
  starwars |> 
  head(4) |> 
  select(name, species, films)

swpartial

This partial dataset has 4 rows, 3 columns, and one of those is a list-column.

# A tibble: 4 × 3
  name           species films    
  <chr>          <chr>   <list>   
1 Luke Skywalker Human   <chr [5]>
2 C-3PO          Droid   <chr [6]>
3 R2-D2          Droid   <chr [7]>
4 Darth Vader    Human   <chr [4]>

The built-in dput() function gives you the code to recreate this object:

dput(swpartial)

But the output is hardly human readable:

structure(list(name = c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader"
), species = c("Human", "Droid", "Droid", "Human"), films = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", 
    "Revenge of the Sith", "The Force Awakens"), c("A New Hope", 
    "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", 
    "Attack of the Clones", "Revenge of the Sith"), c("A New Hope", 
    "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", 
    "Attack of the Clones", "Revenge of the Sith", "The Force Awakens"
    ), c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", 
    "Revenge of the Sith"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))
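To see that this kind of output really is runnable code, here is a minimal base-R sketch. It uses a small made-up data frame (an assumption for illustration, not the starwars data) and deparse(), which returns the same text that dput() prints, then evaluates that text and compares the result with the original.

```r
# A small, made-up data frame with a list-column (not the starwars data)
df <- data.frame(
  name = c("Luke Skywalker", "C-3PO"),
  species = c("Human", "Droid")
)
df$films <- list(
  c("A New Hope", "The Empire Strikes Back"),
  "A New Hope"
)

# deparse() returns the text that dput() would print
code <- deparse(df)

# Evaluating that text rebuilds the object
rebuilt <- eval(parse(text = code))
all.equal(rebuilt, df)
```

The round trip works, but the intermediate text is the same structure(list(...)) form shown above, which is the readability problem the rest of this post is about.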

Alternatively, load the constructive package and use the construct() function:

library(constructive)
construct(swpartial)

The code you’ll get will create an identical object, but it’s much more human readable, and it’s probably what you would actually write out if you were constructing this object manually.

tibble::tibble(
  name = c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader"),
  species = c("Human", "Droid", "Droid", "Human"),
  films = list(
    c(
      "A New Hope", 
      "The Empire Strikes Back", 
      "Return of the Jedi",
      "Revenge of the Sith", 
      "The Force Awakens"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "The Phantom Menace",
      "Attack of the Clones",
      "Revenge of the Sith"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "The Phantom Menace",
      "Attack of the Clones",
      "Revenge of the Sith",
      "The Force Awakens"
    ),
    c(
      "A New Hope",
      "The Empire Strikes Back",
      "Return of the Jedi",
      "Revenge of the Sith"
    )
  )
)

There are lots of other features and use cases. See the documentation for details.

Tuesday, September 9, 2025

Repost: Make your development environment portable and reproducible

Reposted from the original at https://blog.stephenturner.us/p/development-environment-portable-reproducible.

You upgrade your old Intel MacBook Pro for a new M4 MBP. You’re setting up a new cloud VM on AWS after migrating away from GCP. You get an account on your institution’s new HPC. You have everything just so in your development environment, and now you have to remember how to set everything up again.

I just started a new position, and I’m doing this right now.

Setting up a reproducible and portable development environment that works seamlessly across different machines and cloud platforms can save you time and headaches. These are a few of the strategies I use to quickly reproduce my development environment across machines.

Dotfiles in a GitHub repo
New VM setup script in a GitHub repo
R “verse” package on GitHub
Dev containers in VS Code

Keep your dotfiles in a private GitHub repo

Dotfiles are the hidden configuration files in your home directory. Examples include .vimrc for Vim, .tmux.conf for tmux, or .bashrc for your shell environment. I have a long list of aliases and little bash functions in a .aliases.sh file that my .bashrc sources. I also have a .dircolors, global .gitignore, a .gitconfig, and a minimal .Rprofile.

Keeping these files in a GitHub repository makes it easy to quickly reproduce your development environment on another machine. If you search GitHub for “dotfiles” or look at the awesome-dotfiles repo, you’ll see many people keep their dotfiles in a public repo. I use a private repo, because I’m too scared I might accidentally commit secrets, such as API tokens in my .Renviron or PyPI credentials in .pypirc.

Whenever you get a new machine or VM, getting things set up is easy:

# Your private dotfiles repo, cloned into your home directory
git clone https://github.com/<yourusername>/dotfiles ~/dotfiles
cd ~/dotfiles

# A script to symlink things to your home
./install.sh
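The install.sh script itself isn't shown here. As a rough illustration of what "symlink things to your home" means, here is the same idea sketched in R rather than shell. The link_dotfiles() helper is hypothetical, it assumes a Unix-like system where symlinks are allowed, and it skips any file that already exists in the target directory.

```r
# Hypothetical helper: symlink every dotfile in a repo checkout into a home directory
link_dotfiles <- function(repo_dir, home_dir = path.expand("~")) {
  # all.files = TRUE includes hidden files; no.. = TRUE drops "." and ".."
  files <- list.files(repo_dir, pattern = "^\\.", all.files = TRUE, no.. = TRUE)
  files <- setdiff(files, ".git")      # never link the repo's own metadata
  targets <- file.path(home_dir, files)
  todo <- !file.exists(targets)        # leave existing files alone
  if (!any(todo)) return(invisible(character(0)))
  sources <- normalizePath(file.path(repo_dir, files[todo]))
  linked <- file.symlink(sources, targets[todo])
  invisible(files[todo][linked])       # names of the files that were linked
}

# Try it on throwaway directories rather than a real home directory
repo <- tempfile("dotfiles_")
home <- tempfile("home_")
dir.create(repo)
dir.create(home)
writeLines("set number", file.path(repo, ".vimrc"))

linked <- link_dotfiles(repo, home)
print(linked)
```

Because the files in the home directory are links back into the repo, editing ~/.vimrc edits the tracked copy, so changes can be committed and pulled on the next machine.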

Keep a fresh cloud VM setup script

I started playing with computers in the 1990s. I’ve experienced enough hard drive failures, random BSODs, and other critical failures that I treat my computer as if it could spontaneously combust at any moment, taking all of my unsaved, un-backed-up work with it. I treat my cloud VMs the same way, as if they’re disposable (many times they are disposable, by design).

Imagine you launch a new cloud VM starting from a clean Ubuntu image. Now you need all the tools you use every day on this machine - vim, tmux, RStudio, conda, Docker, gcloud/gsutil, etc. Additionally, while I use conda to create virtual environments for installing tools for specific tasks, there are some domain-specific tools I use so often every day for exploratory analysis that I actually prefer having a local installation on the machine — things like bedtools, seqtk, samtools, bcftools, fastp, Nextflow, and a few others — instead of having to load a conda environment or use Docker every time I want to do something simple.

I keep a script on GitHub that will install all the software I need on a fresh VM. Here’s an example setup script I use as a GitHub gist.

I know this isn’t completely Reproducible™ in the sense that a Docker container might be, because I’m not controlling the version of every tool and library I’m installing, but it’s good enough to get me up and running for development and interactive data analysis and exploration.

R: Custom “verse” package on GitHub

The tidyverse is probably the best known meta-package that installs lots of other packages for data science. Take a look at the tidyverse package DESCRIPTION file. When you run install.packages("tidyverse"), it will install all the packages listed in the Imports field, including dplyr, tidyr, purrr, ggplot2, and others.

You can use this pattern to create your own “verse” package that installs all your favorite packages. This is helpful for setting up a new machine, or re-installing all the R packages you use whenever you upgrade to a new major version of R.

Take a look at my Tverse package on GitHub at github.com/stephenturner/Tverse, specifically at the DESCRIPTION file. In the Imports field I include all the packages I know I’ll use routinely. Note that this also includes several Bioconductor packages (which requires including the biocViews: directive in the DESCRIPTION), as well as one of my favorite packages, breakerofchains, that is only available from GitHub (requiring the Remotes: entry).

Once this package is pushed to GitHub I can easily install all those packages and their dependencies:

devtools::install_github("stephenturner/Tverse")
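To make the Imports mechanism concrete, here is a small base-R sketch that writes a made-up DESCRIPTION file for a hypothetical "myverse" package (not the actual Tverse file) to a temporary directory, reads its Imports field with read.dcf(), and reports which of those packages are not installed yet.

```r
# A made-up DESCRIPTION for a hypothetical personal meta-package
desc_lines <- c(
  "Package: myverse",
  "Title: Hypothetical Personal Meta-Package",
  "Version: 0.0.1",
  "Imports:",
  "    dplyr,",
  "    ggplot2,",
  "    tidyr"
)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_path)

# read.dcf() parses the key: value format used by DESCRIPTION files
imports <- read.dcf(desc_path, fields = "Imports")[1, "Imports"]

# Split the comma-separated field into individual package names
pkgs <- trimws(strsplit(imports, ",")[[1]])
print(pkgs)

# Which of these are not installed in the current library?
not_installed <- setdiff(pkgs, rownames(installed.packages()))
print(not_installed)
```

Installing the meta-package is what pulls in everything listed in Imports; the sketch only shows where that list lives and how to inspect it.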

Dev containers in VS Code

Development containers (dev containers) let you create and use consistent development environments using Docker containers. They allow you to open any folder inside (or mounted into) a container and take advantage of Visual Studio Code's full feature set. This is particularly useful when working with teams or switching between projects with different dependencies.

The dev container docs and tutorial are both good places to start. You’ll need to have Docker running, and install the Dev Containers VS Code extension.

From Microsoft’s documentation:

Workspace files are mounted from the local file system or copied or cloned into the container. Extensions are installed and run inside the container, where they have full access to the tools, platform, and file system. This means that you can seamlessly switch your entire development environment just by connecting to a different container.


Using a dev container template

You can use any pre-built dev container templates available on registries like Docker Hub or Microsoft’s container registry. Here’s an example that uses the Rocker image for R version 4.4.1 and adds a few extensions to VS Code running in the container. You could also create your own container for development, put that on Docker Hub, then use that image.

{
    "image": "rocker/r-ver:4.4.1",
    "customizations": {
        "vscode": {
            "extensions": [
                "REditorSupport.r",
                "ms-vscode-remote.remote-containers"
            ]
        }
    }
}

Using a custom Dockerfile

You can use a custom Dockerfile to create your dev container. First, create a .devcontainer/ directory in your project with a Dockerfile and a devcontainer.json file. Define your development environment in the Dockerfile (base image, installed packages and configuration). In the JSON replace the image property with build and dockerfile properties:

{
    "build": {
        "dockerfile": "Dockerfile"
    }
}

Start VS Code running the container

After you create your devcontainer.json file (either from a template or completely custom), open the folder in the container using the command palette (for example, with the Dev Containers: Reopen in Container command).

And prove to yourself that your VS Code environment is indeed using the container (I’m using rocker R 4.4.1 here). Running whoami shows I’m root inside the container (not my own username), and I’m indeed running R version 4.4.1.
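The same check can be run from the R console. This is a small sketch using only base R. The /.dockerenv test is a common heuristic for "am I inside a Docker container" and is an assumption here, not a guarantee.

```r
# Summarize who and what is running this R session
session_summary <- list(
  user = Sys.info()[["user"]],            # e.g. "root" inside many containers
  r_version = as.character(getRversion()),
  in_docker = file.exists("/.dockerenv")  # heuristic only; may be FALSE in some runtimes
)
str(session_summary)
```

Inside the rocker/r-ver:4.4.1 image used above, r_version should report 4.4.1, while the same code run in a local session reports whatever R is installed on the host.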

Vagrant and Ansible are powerful tools for managing development environments and automating configurations. Vagrant allows you to create and configure lightweight, reproducible, and portable virtual environments, while Ansible automates complex system setups and deployments across multiple machines. However, they can be overkill for simple or personal development environments, so I'm focusing on lighter, more straightforward solutions.

Using OpenAI Codex in Positron

Reposted from the original at https://blog.stephenturner.us/p/codex-positron.

Last month I wrote about agentic coding in Positron using Positron assistant, which uses the Claude API on the back end.

Related post: Positron Assistant: GitHub Copilot and Claude-Powered Agentic Coding in R (Stephen Turner, Jul 16)

Yesterday OpenAI announced a series of updates to Codex, the biggest being an IDE extension to allow you to use Codex in VS Code, Cursor, Windsurf, etc. More details at developers.openai.com/codex. And Codex is available in the Open VSX Registry, meaning you can install it in Positron.

Demo: creating an R package with Codex

I tried doing the same thing here with Codex as I did with Positron Assistant in the previous post. I used usethis::create_package() to give me a basic package skeleton, then I fired up Positron, hit the Codex extension in the side panel, and gave it a simple prompt.

write a simple function in this R package to reverse complement a DNA sequence (i.e. A>T, C>G, G>C, T>A). Document it with Roxygen, and write unit tests with testthat. Do not add any external package dependencies other than testthat.

Then I sat back and watched it work.

As you can see, after running devtools::document() and devtools::test(), my tests failed. I asked Codex to fix those tests. I had to do this twice, and the second time around it’s running those tests locally and diagnosing what’s happening.

The third time around all my tests pass.

And devtools::check() yields no errors, warnings, or notes.

The code is on the same GitHub repo, on the codex branch.
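For readers who don't want to open the repo, here is a minimal base-R sketch of the kind of function the prompt asks for. It is an illustration written for this post, not the code Codex generated, and the name reverse_complement() and its input checks are assumptions. The checks at the end use stopifnot() in place of testthat so the example has no dependencies.

```r
#' Reverse complement a DNA sequence
#'
#' @param seq A single character string containing only A, C, G, T (any case).
#' @return The reverse complement of `seq` as a character string.
reverse_complement <- function(seq) {
  stopifnot(is.character(seq), length(seq) == 1L, !is.na(seq))
  if (grepl("[^ACGTacgt]", seq)) {
    stop("`seq` must contain only A, C, G, or T")
  }
  # Complement each base (A<->T, C<->G), preserving case
  complemented <- chartr("ACGTacgt", "TGCAtgca", seq)
  # Reverse the order of the bases
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Lightweight checks standing in for testthat expectations
stopifnot(reverse_complement("ATGC") == "GCAT")
stopifnot(reverse_complement("aacg") == "cgtt")
stopifnot(inherits(try(reverse_complement("ATGX"), silent = TRUE), "try-error"))
```

In a real package the checks would live in tests/testthat/ as expect_equal() and expect_error() calls, which is what the prompt requests.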

Why Codex instead of Positron Assistant?

I haven’t used either agent enough to know their failure modes, and which might be better in certain circumstances. As of last week, GPT-5 seems to outperform Claude for writing R code, and Codex uses GPT-5 under the hood.

Another factor might be cost. Instead of using API credits, Codex uses your existing ChatGPT Plus, Team, Pro, Edu, or Enterprise subscription. In my post on Positron Assistant I showed that the entire package development experiment (admittedly simple) cost about $0.09. But if you’re relying on this daily and using it for heavier tasks, you might run up a decent bill. If you’re already paying $20/month for ChatGPT Plus, using Codex doesn’t cost you any more.

Finally, there’s the original selling point behind Codex before it was ever available in an IDE: You can wire up Codex to your GitHub account and ask Codex to read, write, and execute code in your repositories to answer questions or draft PRs. I haven’t tried this yet, but you can read more at developers.openai.com/codex/cloud.

Positron Assistant: GitHub Copilot and Claude-Powered Agentic Coding in R

Reposted from the original at https://blog.stephenturner.us/p/positron-assistant-copilot-chat-agent

I have a little hobby project I’m working on and I wanted to use the opportunity to fully make the switch to Positron from RStudio. I used Positron here and there when it first came out, but now that it’s out of beta and has a more complete feature set (like remote SSH sessions!) I have everything I need to switch and not look back. The most exciting new addition is the new Positron Assistant.

Positron Assistant

I wrote a post last year about AI code completion in Positron. GitHub Copilot wouldn’t work in Positron at the time so I tried out Codeium, Tabnine, and Continue.

Related post: AI code completion in Positron (Stephen Turner, October 1, 2024)

Using a third-party plugin is no longer necessary. One of the more exciting new features in Positron is Positron Assistant. From the description:

Positron Assistant is an AI client that provides LLM integration within Positron, both for chat and for inline completions. Use Positron Assistant to generate or refactor code, ask questions, get help with debugging, and get suggestions for next steps in your data science projects.

Positron Assistant allows you to use GitHub Copilot for inline code completions, and Anthropic Claude for chat and agent mode. The documentation has instructions for getting this set up so I won’t go into those details. You make a configuration change in Positron, then sign into your GitHub account with OAuth, and put in your Anthropic API key, and you’re off to the races.

Press Cmd-Shift-P to bring up the command palette in Positron, then search for “Positron Assistant: Configure Language Model Providers.”

Code completion with GitHub Copilot

This isn’t anything new. GitHub Copilot has been available in VS Code and RStudio for years. But it’s nice to have it available in Positron now.

Here’s a demo where I’m starting with a blank R script, and write comments in the code describing what I want, then let Copilot take it away as I just hit the tab key to accept the suggestions. Here I’m asking for a function to reverse complement a DNA sequence. Here’s the code it produced.

Agent mode to create an R package

When Positron first came out I wrote about using it for R package development.

Related post: R package development in Positron (Stephen Turner, July 29, 2024)

I wanted to try out Positron Assistant’s agent mode to see how it works with R packages. Cursor and Claude Code seem to be all the rage on all the tech podcasts, Twitter feeds, and blogs I follow, but I’ve been reluctant to switch IDEs (or, in the case of Claude Code, ditch the IDE altogether).

Activate the Assistant in Positron’s sidebar, then select Agent mode.

I started up a fresh R session and ran usethis::create_package() to create a blank package. This just creates the bare minimum (DESCRIPTION, NAMESPACE, etc.) needed for a skeleton R package. Then I activated Positron Assistant in agent mode, asked it to write a function in the package to reverse complement a DNA sequence, document it with Roxygen, and write unit tests with testthat.

It’s fun to sit back and watch the agent work. It scans the directory structure, finds the R version, creates the function, writes the documentation, writes the tests, then presents a modal asking me whether I want to run the tests that it just created. It wrote everything in one shot with all tests passing and no errors on devtools::check().

Everything you see here cost $0.09 using the Claude 4 Sonnet API.

The one thing I had to fix was the License field in the DESCRIPTION file with a simple usethis::use_mit_license(). The default for this field came in from usethis::create_package() and was simply boilerplate telling me that I needed to choose a license. Once I fixed this all tests passed, and the package check came out clean with 0 errors, warnings, or notes. I uploaded the package here on GitHub.

View the package code on GitHub

It was honestly pretty mesmerizing to sit back and watch the agent do its thing, inspecting the environment, writing code, docs, and tests.

Obviously this was a simple greenfield example, and I’d be curious to see how the agent handles larger codebases with complex dependencies and newer coding paradigms (like R’s new S7 OOP system) that won’t have good training data from Stack Overflow or elsewhere.

At the time I’m writing this (July 2025) Positron Assistant is still in preview, meaning that features might change by the time you’re reading this. For instance, currently only GitHub Copilot is available for inline code completions, and only Anthropic Claude is available for chat and agent mode. I’m sure both of these will expand in the near future to allow for other model providers (although Claude 4 consistently ranks at the top for R coding capabilities).

So many people I’ve talked to have no issue paying $20/month for ChatGPT Plus or Claude Pro, but are reluctant to buy API credits. I’m not sure how to rationalize this. I think there might be a misunderstanding that it works like AWS, where you put in a credit card and could accidentally rack up a huge bill. It doesn’t work like this. It’s a prepaid service. I put $5 on my account months ago just to experiment around a bit and I still haven’t used it all. You can set rate limits and email notifications on your API keys if you’re worried about spending more than a few pennies trying out something like what you see in this post.

Tuesday, July 15, 2025

Repost: Tidy RAG in R with ragnar

Reposted from the original at: https://blog.stephenturner.us/p/tidy-rag-in-r-with-ragnar

Retrieval augmented generation in R using the ragnar package. Demonstration: scraping text from relevant links on a website and using RAG to ask about a university's grant funding.

Note: After I wrote this post last week, the Tidyverse team released ragnar 0.2.0 on July 12. Everything here should still work, but take a look at the release notes to learn about some nice new features that aren’t covered here.

I’ve written a little about retrieval-augmented generation (RAG) here before. First, about GUIs for local LLMs with RAG:

Related post: GUIs for Local LLMs with RAG (Stephen Turner, Mar 14)

…and later on building a little RAG app to chat with a bunch of PDFs in your Zotero library using Open WebUI:

Related post: Build a local RAG application with Open WebUI to chat with your Zotero library (Stephen Turner, Apr 5)

In an oversimplified nutshell: LLMs can't help you with things that are not in their training data or are past their training cutoff date. With RAG, you can provide relevant snippets from those documents as context to the LLM so that its answers are grounded in a collection of known content from a trusted document corpus.

Even more oversimplified: RAG lets you “chat with your documents.”
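To make the retrieve-then-prompt idea concrete, here is a toy base-R sketch. Everything in it is an assumption for illustration: the three "chunks" are made-up text, and similarity is plain word-count cosine similarity rather than the learned embeddings a real RAG system (including ragnar) would use.

```r
# Toy corpus standing in for document chunks (made-up text)
chunks <- c(
  "The school lists its active research grants on a funding page.",
  "Admissions deadlines for the residential program are in January.",
  "Faculty research grants are sponsored by NSF and NIH."
)

# Lowercase and split on anything that is not a letter or digit
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  words[nzchar(words)]
}

# Word-count vector over a shared vocabulary
tf_vector <- function(text, vocab) {
  as.numeric(table(factor(tokenize(text), levels = vocab)))
}

cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(0)
  sum(a * b) / denom
}

# Return the top_k chunks most similar to the query
retrieve <- function(query, chunks, top_k = 2) {
  vocab <- unique(unlist(lapply(c(chunks, query), tokenize)))
  q <- tf_vector(query, vocab)
  scores <- vapply(chunks, function(ch) cosine(q, tf_vector(ch, vocab)), numeric(1))
  chunks[order(scores, decreasing = TRUE)][seq_len(min(top_k, length(chunks)))]
}

query <- "Which grants fund research?"
context <- retrieve(query, chunks, top_k = 2)

# The retrieved chunks are pasted into the prompt as grounding context
prompt <- paste0(
  "Answer using only this context:\n",
  paste("-", context, collapse = "\n"),
  "\n\nQuestion: ", query
)
cat(prompt)
```

The admissions chunk shares no words with the query, so it is left out of the prompt. Swapping the word counts for embedding vectors and the in-memory vector for a database gives the shape of the workflow built in the rest of this post.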

In this post I’ll demonstrate how to scrape text from a website and implement a RAG workflow in R using a new addition to the tidyverse: ragnar, along with functionality from ellmer to interact with LLM APIs through R.

Demonstration

Python has historically dominated the AI product development space, but with recent additions like ellmer, chores, gander, and mall, R is quickly catching up.

Here I’m going to use the new ragnar package in the tidyverse (source, documentation) to build a little RAG workflow in R that uses the OpenAI API.

I’m going to ingest information from the UVA School of Data Science (SDS) website at datascience.virginia.edu, then ask some questions that won’t have answers in the base model’s training data.

Setup

If you want to follow along you’ll need an OpenAI API key. You can set that up at platform.openai.com. Once you do that, run usethis::edit_r_environ() to add a new OPENAI_API_KEY environment variable, and restart your R session.
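After restarting, a quick way to confirm that R can see the key without printing it to the console is a check like the one below. The api_key_is_set() helper is a hypothetical convenience function, not part of ellmer or ragnar.

```r
# Hypothetical helper: TRUE if the environment variable exists and is non-empty
api_key_is_set <- function(var = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (api_key_is_set()) {
  message("OPENAI_API_KEY is set.")
} else {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Checking with nzchar() rather than printing Sys.getenv() keeps the secret out of your console history and any rendered output.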

In R I’m going to need the ellmer and ragnar packages. Because ragnar isn’t yet on CRAN, I'll have to install it with pak or devtools.

install.packages("ellmer")
pak::pak("tidyverse/ragnar")

Create a vector store

The first thing I want to do is to find all the links to other pages at datascience.virginia.edu, scrape all of that content, and stick it into a DuckDB database. Most of this is modified straight from the ragnar documentation, hence the context chunking still looks like I’m ingesting a book.

library(ragnar)

# Find all links on a page
base_url <- "https://datascience.virginia.edu/"
pages <- ragnar_find_links(base_url)

# Create and connect to a vector store
store_location <- "pairedends.ragnar.duckdb"
store <- ragnar_store_create(
  store_location,
  embed = \(x) ragnar::embed_openai(x),
  overwrite = TRUE
)

# Read each website and chunk it up
for (page in pages) {
  message("ingesting: ", page)
  chunks <- page |>
    ragnar_read(frame_by_tags = c("h1", "h2", "h3")) |>
    ragnar_chunk(boundaries = c("paragraph", "sentence")) |>
    # add context to chunks
    dplyr::mutate(
      text = glue::glue(
        r"---(
        # Excerpt from UVA School of Data Science (SDS) page
        link: {origin}
        chapter: {h1}
        section: {h2}
        subsection: {h3}
        content: {text}

        )---"
      )
    )
  ragnar_store_insert(store, chunks)
}
# Build the index
ragnar_store_build_index(store)
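If "chunking" is unfamiliar, the base-R sketch below shows the general idea on a plain string: split on blank lines, then pack paragraphs together until a size limit is reached. The chunk_text() function and its max_chars parameter are hypothetical, and this is not how ragnar implements chunking internally.

```r
# Hypothetical chunker: pack paragraphs into chunks of at most max_chars characters.
# A single paragraph longer than max_chars is kept whole as its own chunk.
chunk_text <- function(text, max_chars = 200) {
  paragraphs <- trimws(strsplit(text, "\n[[:space:]]*\n")[[1]])
  paragraphs <- paragraphs[nzchar(paragraphs)]
  chunks <- character(0)
  current <- ""
  for (p in paragraphs) {
    candidate <- if (nzchar(current)) paste(current, p, sep = "\n\n") else p
    if (!nzchar(current) || nchar(candidate) <= max_chars) {
      current <- candidate          # paragraph still fits in the current chunk
    } else {
      chunks <- c(chunks, current)  # close the current chunk, start a new one
      current <- p
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

# Three 80-character paragraphs: the first two fit together, the third does not
text <- paste(strrep("a", 80), strrep("b", 80), strrep("c", 80), sep = "\n\n")
chunks <- chunk_text(text, max_chars = 170)
nchar(chunks)
```

Splitting on paragraph boundaries keeps related sentences together, and prefixing each chunk with its link and headings (as the glue template above does) tells the model where a retrieved snippet came from.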

Retrieval

Now suppose we want to ask questions about research grant funding at the UVA School of Data Science (SDS). First, let’s see what ChatGPT tells us without providing any context at all when we ask it to tell us about SDS’s active grant funding.

I don’t have specific information about “SDS’s active grant funding” because “SDS” could refer to several different organizations or entities (e.g., Social and Decision Sciences departments, Sustainable Development Solutions, Students for a Democratic Society, or a company with the acronym SDS).

Now let’s use the OpenAI API providing context from the vector store we just created that should have information about UVA SDS’s active funding from their active grants listing page.

library(ragnar)
store_location <- "pairedends.ragnar.duckdb"
store <- ragnar_store_connect(store_location, read_only = TRUE)

# Get relevant chunks
text <- "Tell me about some of SDS's active grant funding."
relevant_chunks <- ragnar_retrieve_vss(
  store,
  text,
  top_k = 10
)
relevant_chunks

# Define the system prompt and chat client.
# Further down, an ellmer tool is registered so the LLM can retrieve chunks itself.
system_prompt <- stringr::str_squish(
  r"--(
  You are an expert about UVA's School of Data Science.
  You are concise. You always respond by first direct
  quoting material from the SDS website, then adding
  your own additional context and interpretation.
  Always include links to the source materials used.
  )--"
)
chat <- ellmer::chat_openai(
  system_prompt,
  model = "gpt-4.1-mini",
  params = ellmer::params(temperature = .5)
)

# Register a retrieve tool with ellmer
ragnar_register_tool_retrieve(chat, store, top_k = 10)

# Run the query
chat$chat(text)

Now when asking about some of SDS’s active grant funding, we get a better response:

Here are some examples of active grant funding at UVA's School of Data Science (SDS):
Resource and Data Exchange (RDE) has a grant award of $5,000,000 with a performance period from Feb 29, 2024 to Feb 27, 2029, currently 25% completed.
Biomedical Data Science Innovation Labs: An Intensive Research Project Development Program has an award amount of $1,943,352.
Supporting Pre-Service Teachers Mathematical Discourse through Co-Design of Teaching Simulation Tools is sponsored by the US NSF with a total award of $250,000.
Activity Aware Bayesian Deep Learning is sponsored by the US NSF.
Enhancing Security and Privacy of Community Cyberinfrastructures for Collaborative Research has a total award of $882,518 from the US NSF.
Evaluating the Efficacy of Repurposed Medications has an award amount of $86,134.
Accelerating racial and ethnic equity in SCI rehabilitation outcomes has an award amount of $53,170.
The application of data science on financial systems with an emphasis on safety and security has an award amount of $5,000.
These grants cover a range of topics from education, biomedical innovation, security, to social equity in rehabilitation.
Source: UVA SDS Active Grants

API cost and local models

As I’m writing this, GPT-4.1 mini is ridiculously cheap at $0.40 per million input tokens (see more on their API pricing page). The demonstration here cost me about $0.01 (the text embedding and vector storage cost a fraction of a penny in addition to the input/output completions).
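For a sense of scale, the arithmetic can be sketched as a small function. The input price is the $0.40 per million tokens quoted above. The output price of $1.60 per million tokens and the token counts in the example are assumptions for illustration, so check the pricing page for current numbers.

```r
# Estimate the cost in dollars of a chat completion from token counts
estimate_cost <- function(input_tokens, output_tokens,
                          input_price_per_million = 0.40,    # from the text above
                          output_price_per_million = 1.60) { # assumed; check pricing page
  input_tokens / 1e6 * input_price_per_million +
    output_tokens / 1e6 * output_price_per_million
}

# Hypothetical request: 20,000 tokens of retrieved context in, 1,000 tokens out
cost <- estimate_cost(input_tokens = 20000, output_tokens = 1000)
sprintf("$%.4f", cost)
```

Even with ten retrieved chunks stuffed into the prompt, a single question lands at around a penny under these assumptions.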

There are plenty of open/local models that support tool use, as well as open/local text embedding models, all of which can be run through Ollama. I tried the same exercise above using Nomic Embed through Ollama for text embedding, and tried several models with tool-calling abilities, including qwen3, mistral, llama3.1, llama3.2, llama3.3, and the new llama4, and the results were all terrible. I don't know whether this was due to the models themselves or to the embedding model I chose, which happens to be the most popular embedding model available in Ollama. Just put $1 on your OpenAI API account, get to work, and stop worrying about it.

Learning more

This recent webinar from Posit CTO Joe Cheng doesn’t cover RAG at all. In fact, he mentions near the top that RAG should not be your first choice when simply changing a system prompt would be good enough. It’s a good talk and I learned a few nice things along the way.

Tuesday, June 3, 2025

Repost: The Modern R Stack for Production AI

Reposted from the original at: https://blog.stephenturner.us/p/r-production-ai

...

Python isn't the only game in town anymore: R can interact with local and cloud LLM APIs, inspect and modify your local R environment and files, implement RAG, computer vision, NLP, evals, & much more.

There was a time in late 2023 to early 2024 when I and probably many others in the R community felt like R was falling woefully behind Python in tooling for development using AI and LLMs. This is no longer the case. The R community, and Posit in particular, have been on an absolute tear bringing new packages online to take advantage of all the capabilities that LLMs provide. Here are a few that I’ve used and others I’m keeping a close eye on as they mature.

ellmer: interact with almost any LLM in R

CRAN: https://cran.r-project.org/package=ellmer
GitHub: https://github.com/tidyverse/ellmer/
Documentation: https://ellmer.tidyverse.org/
Blog post: https://posit.co/blog/announcing-ellmer/

I can't remember when I first started using Ollama to interact with local LLMs like Llama and Gemma, but I first used the ollamar package (CRAN, GitHub, pkgdown) last summer, and wrote a post on using this package to ask Llama3.1 what’s interesting about a set of genes, or to summarize papers on bioRxiv.

Related post: Use R to prompt a local LLM with ollamar (Stephen Turner, August 14, 2024)

Shortly after that, I wrote an R package to summarize bioRxiv preprints with a local LLM using ollamar:

Related post: biorecap: an R package for summarizing bioRxiv preprints with a local LLM (Stephen Turner, August 24, 2024)

In addition to the ollamar package, Johannes Gruber introduced the rollama package (CRAN, GitHub, pkgdown) around the same time, though I haven’t used it myself.

Earlier this year Posit announced ellmer, a new package that allows you to interact with most major LLM providers, not just local models running via Ollama. The ellmer package supports ChatGPT, Claude, AWS Bedrock, Azure OpenAI, DeepSeek, Gemini, Hugging Face, Mistral, Perplexity, and others. It also supports streaming and tool calling. I wrote another post more recently demonstrating how to summarize Bluesky posts on a particular topic using ellmer:

Related post: Bluesky conversation analysis with local and frontier LLMs with R/Tidyverse (Stephen Turner, December 30, 2024)

The ellmer package’s documentation and vignettes are top-notch. Check it out.

chores: automate repetitive tasks

CRAN: https://cran.r-project.org/package=chores
GitHub: https://github.com/simonpcouch/chores/
Documentation: https://simonpcouch.github.io/chores/
Blog post: https://posit.co/blog/introducing-chores/

The chores package connects ellmer to your source editor in RStudio and Positron, providing a library of ergonomic LLM assistants designed to help you complete repetitive, hard-to-automate tasks quickly. These “assistants” let you do things like highlight some code and convert tests to testthat 3e (third edition), document functions with roxygen, or convert error handling to use cli instead of stop or rlang. There’s a nice demonstration screencast on the documentation website.

[Image: hex sticker for the chores package]

gander: allow an LLM to talk to your R environment

CRAN: https://cran.r-project.org/package=gander
GitHub: https://github.com/simonpcouch/gander/
Documentation: https://simonpcouch.github.io/gander/
Blog post: https://posit.co/blog/introducing-gander/

The gander package feels kind of like a Copilot but it also knows how to talk to objects in your R environment. It can inspect file contents from elsewhere in the project that you're working on, and it also has context about objects in your environment, like variables, data frames, and functions. There’s a nice demonstration screencast on the documentation website.

[Image: hex sticker for the gander package]

btw: describe your R environment to an LLM

CRAN: (Not yet on CRAN)
GitHub: https://github.com/posit-dev/btw/
Documentation: https://posit-dev.github.io/btw/
Blog post: https://posit.co/blog/custom-chat-app/

The btw package is brand new and still in development. You can use it interactively, where it assembles context on your R environment, package documentation, and working directory, copying the results to your clipboard for easy pasting into chat interfaces. It also allows you to wrap methods that can be easily incorporated into ellmer tool calls for describing various kinds of objects in R. I’d recommend reading Posit’s “Teaching chat apps about R packages” blog post.

A digital illustration on a pink background with white dots. A raven with a piece of paper in its beak in a colorful hexagon covered in various graphs and doodles. Dotted lines extend from the hexagon to four white rectangular documents with horizontal lines representing text. An 'AI' icon is in the upper right corner

ragnar: retrieval-augmented generation (RAG) in R

CRAN: https://cloud.r-project.org/package=ragnar
GitHub: https://github.com/tidyverse/ragnar/
Documentation: https://ragnar.tidyverse.org/

The ragnar R package helps implement Retrieval-Augmented Generation (RAG) workflows in R using ellmer to connect to any LLM you wish on the backend. It provides some really handy utility functions for reading files or entire websites into a data frame, converting files to markdown, and finding all links on a webpage to ingest. It helps you chunk text into sections, embed into a vector store (using duckdb by default), and retrieve relevant chunks to provide an LLM with context given a prompt.
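The chunk, embed, and retrieve steps described above can be sketched with a toy example in base R. This is not ragnar's API: word counts stand in for a real embedding model, a plain list stands in for the duckdb vector store, and every function name below is hypothetical.

```r
# Toy retrieval sketch in base R. This is NOT ragnar's API: word counts stand
# in for a real embedding model, and a list stands in for the vector store.

doc <- c(
  "ellmer connects R to many LLM providers.",
  "ragnar chunks documents and stores embeddings in duckdb.",
  "mall runs sentiment analysis on data frame columns."
)

# 1. Chunk: in this toy example each sentence is already one chunk
chunks <- doc

# 2. "Embed": a bag-of-words count vector over a shared vocabulary
tokenize <- function(x) strsplit(tolower(gsub("[^A-Za-z ]", "", x)), " +")[[1]]
vocab <- unique(unlist(lapply(chunks, tokenize)))
embed <- function(x) as.numeric(table(factor(tokenize(x), levels = vocab)))
store <- lapply(chunks, embed)

# 3. Retrieve: rank chunks by cosine similarity to the query
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
retrieve <- function(query, top_k = 1) {
  q <- embed(query)
  scores <- vapply(store, cosine, numeric(1), b = q)
  chunks[order(scores, decreasing = TRUE)][seq_len(top_k)]
}

top_chunk <- retrieve("Which package stores embeddings in duckdb?")
top_chunk
```

In a real RAG workflow the retrieved chunks would be added to the prompt sent to the LLM, and the embeddings would come from an actual embedding model rather than word counts.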

I’m working on another post right now with a deeper dive into using ragnar.

vitals: LLM evaluations in R

CRAN: (Not yet on CRAN)
GitHub: https://github.com/tidyverse/vitals/
Documentation: https://vitals.tidyverse.org/

The vitals package is a framework for LLM evaluation in R, aimed at ellmer users. It’s an R port of the widely adopted Python framework Inspect. As of this writing, the documentation notes that vitals is highly experimental and much of its documentation is aspirational.
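If "LLM evaluation" sounds abstract, the general shape is: a set of inputs with known targets, something that produces a response for each input, and a scorer that compares response to target. Below is a toy loop in base R under those assumptions. It is not the vitals interface, and `stub_solver()` is a hypothetical stand-in for a real model call.

```r
# Toy evaluation loop in base R. NOT the vitals API: `stub_solver()` is a
# hypothetical stand-in for a call to an LLM.

dataset <- data.frame(
  input  = c("What is 2 + 2?", "What is the capital of France?"),
  target = c("4", "Paris"),
  stringsAsFactors = FALSE
)

stub_solver <- function(prompt) {
  # A real solver would send `prompt` to a model and return its reply
  if (grepl("2 + 2", prompt, fixed = TRUE)) {
    "The answer is 4."
  } else {
    "I think it is Lyon."
  }
}

# Scorer: does the response contain the target string?
score_includes <- function(response, target) {
  grepl(target, response, fixed = TRUE)
}

dataset$response <- vapply(
  dataset$input, stub_solver, character(1), USE.NAMES = FALSE
)
dataset$correct <- mapply(
  score_includes, dataset$response, dataset$target, USE.NAMES = FALSE
)

# Fraction of inputs the "model" answered correctly
accuracy <- mean(dataset$correct)
accuracy
```

Swapping the stub for a real chat call and the substring check for a more careful scorer is where an evaluation framework earns its keep.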

kuzco: computer vision in R

CRAN: (Not yet on CRAN)
GitHub: https://github.com/frankiethull/kuzco
Documentation: (No pkgdown site yet)
Blog post: https://posit.co/blog/kuzco-computer-vision-with-llms-in-r/

The kuzco package is designed as a computer vision assistant, giving local models guidance on classifying images and returning structured data. The goal is to standardize outputs for image classification and use LLMs as an alternative option to keras or torch. It currently supports classification, recognition, sentiment, and text extraction.

mall: use an LLM for NLP on your data

CRAN: https://cloud.r-project.org/package=mall
GitHub: https://github.com/mlverse/mall
Documentation: https://mlverse.github.io/mall/

The mall package provides several convenience functions for sentiment analysis, text summarization, text classification, extraction, translation, and verification.

I recently used the mall package to run a quick sentiment analysis of #Rstats posts on Bluesky:

Bluesky conversation analysis with local and frontier LLMs with R/Tidyverse (Stephen Turner, December 30, 2024)

Functions in the mall package integrate smoothly with piped workflows in dplyr. For example:

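library(mall)
data("reviews")  # small example data frame that ships with mall
reviews |>
  llm_sentiment(review)  # classify the sentiment of the `review` column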

Other resources

I think this is just the tip of the iceberg, and I can’t wait to see what else Posit and others in the R community are doing in this space.

Here's a recording from a recent webinar Joe Cheng (Posit CTO) gave on harnessing LLMs for data analysis.

You might also take a look at the recordings from posit::conf(2024) which include a few AI/LLM-focused talks.

Also check out the posit::conf(2025) schedule at https://posit.co/conference/. There’s a workshop on Programming with LLM APIs: A Beginner’s Guide in R and Python, four talks in a session titled LLMs with R and Python, several lightning talks that will likely cover LLMs in R, and four more talks in a session titled Facepalm-driven Development: Learning From AI and Human Errors.

The R community has clearly stepped up. Whether you're building prototypes, shipping production tools, or just exploring what LLMs can do, R is now a real and robust option. I’m excited to see where we go from here.

