Wednesday, November 20, 2024

Expand your Bluesky network with R (repost)

This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.

---

I’ve been encouraging everyone I know online to join the scientific community on Bluesky, and in a recent post on that topic I linked to several starter packs: lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network.

I started by following accounts of people I knew from X and from a few starter packs I came across. One way to expand your network is to take all the accounts you follow and see who they follow that you don’t. You can rank that list in descending order by the number of your follows who follow each account, and use it to fill out your network.

Let’s do this with just a few lines of R code. The atrrr package (CRAN, GitHub, docs) is one of several packages that wrap the AT Protocol behind Bluesky, letting you interact with Bluesky through a set of R functions. It’s super easy to use and the docs are great.

The code below does this. It first authenticates with an app password, then retrieves all the accounts you follow. Next, it gets the accounts each of those accounts follows, and removes the accounts you already follow.1

library(dplyr)
library(atrrr)

# Authenticate first (switch out with your username)
bsky_username <- "youraccount.bsky.social"

# If you already have an app password:
bsky_app_pw <- "change-me-change-me-123"
auth(user=bsky_username, password=bsky_app_pw)

# Or, instead, be guided through the process interactively:
# auth()

# Get the people you follow
f <- get_follows(actor=bsky_username, limit=Inf)

# Get just their handles
fh <- f$actor_handle

# Get who your follows are following
ff <-
  fh |>
  lapply(get_follows, limit=Inf) |>
  setNames(fh)

# Make it a data frame
ffdf <- bind_rows(ff, .id="follow")

# Get counts, removing accounts you already follow,
# yourself, and handles that failed to resolve
ffcounts <-
  ffdf |>
  count(actor_handle, sort=TRUE) |>
  anti_join(f, by="actor_handle") |>
  filter(!actor_handle %in% c(bsky_username, "handle.invalid"))

# Join back to account info, add URL
ffcounts <-
  ffdf |>
  distinct(actor_handle, actor_name) |>
  inner_join(x=ffcounts, y=_, by="actor_handle") |>
  mutate(url=paste0("https://bsky.app/profile/",
                    actor_handle))

This returns a data frame of every account followed by the people you follow but not by you, sorted in descending order by how many of your follows follow it (a mouthful, I know).
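Under the hood this is just a count over a follow graph. As a language-agnostic sketch (in Python rather than R, with no Bluesky API calls), assume the follows have already been fetched into a dictionary mapping each handle to the set of handles it follows. The handles and the rank_follows_of_follows name are made up for illustration. The sketch also drops your own handle, since many of your follows presumably follow you back, along with placeholder handles reported as handle.invalid.

```python
from collections import Counter


def rank_follows_of_follows(me: str, graph: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank accounts followed by the people `me` follows, excluding accounts
    `me` already follows, `me` itself, and unresolvable handles."""
    my_follows = graph.get(me, set())
    counts = Counter()
    for friend in my_follows:
        # Each friend contributes at most one count per account they follow.
        counts.update(graph.get(friend, set()))
    excluded = my_follows | {me, "handle.invalid"}
    ranked = [(handle, n) for handle, n in counts.items() if handle not in excluded]
    # Sort by count descending, then handle ascending so ties are stable.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy follow graph (all handles are invented).
graph = {
    "me.bsky.social": {"ann.bsky.social", "bob.bsky.social", "cat.bsky.social"},
    "ann.bsky.social": {"me.bsky.social", "bob.bsky.social", "dan.bsky.social", "eve.bsky.social"},
    "bob.bsky.social": {"me.bsky.social", "dan.bsky.social"},
    "cat.bsky.social": {"dan.bsky.social", "eve.bsky.social", "handle.invalid"},
}

print(rank_follows_of_follows("me.bsky.social", graph))
# [('dan.bsky.social', 3), ('eve.bsky.social', 2)]
```

Here dan is followed by all three of your follows and eve by two, while bob (already followed), you, and handle.invalid are filtered out, mirroring the anti_join and filter steps in the R code.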

Optionally, you can use the gt package to turn this into a nicer table with clickable links.

# Optional, clean up and create a nice table
library(gt)
library(glue)
top <- 20L
ffcounts |>
  head(top) |>
  rename(Handle=actor_handle, N=n, Name=actor_name) |>
  mutate(Handle=glue("[{Handle}]({url})")) |>
  mutate(Handle=lapply(Handle, gt::md)) |>
  select(-url) |>
  gt() |>
  tab_header(
    title=md(glue("**My top {top} follows' follows**")),
    subtitle="Collected November 19, 2024") |>
  tab_style(
    style="font-weight:bold",
    locations=cells_column_labels()
  ) |>
  cols_align(align="left") |>
  opt_row_striping(row_striping = TRUE)

I can’t embed an HTML file here, but here’s what that output looks like. Click any handle to open that profile and follow the account if you find it useful.

You could also do this iteratively: follow your top follows’ follows, then rerun the process a few times to discover second-degree connections you might otherwise miss.

The code here essentially replicates what @theo.io’s Bluesky Network Analyzer is doing, but all locally using R. That web app is faster and easier to use, and does some smart caching and throttling to avoid API rate limits. See the footnote for more.
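To make the iterative idea concrete, here is a rough sketch on a hypothetical toy graph (again in Python, with invented handles and helper names): each round follows the top k suggestions and then re-ranks. Against the real API, every round means another batch of get_follows requests, which is where caching and rate limits start to matter.

```python
from collections import Counter


def suggestions(me, graph):
    """Rank accounts followed by my follows that I don't already follow."""
    mine = graph.get(me, set())
    counts = Counter()
    for friend in mine:
        counts.update(graph.get(friend, set()))
    ranked = [(h, n) for h, n in counts.items() if h not in mine and h != me]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


def expand_network(me, graph, k=1, rounds=3):
    """Follow the top-k suggestions each round, mutating graph[me] in place.

    Returns the list of handles added in each round.
    """
    added = []
    for _ in range(rounds):
        top = [handle for handle, _count in suggestions(me, graph)[:k]]
        if not top:
            break  # nothing left to discover
        graph[me] = graph.get(me, set()) | set(top)
        added.append(top)
    return added


# Hypothetical chain: "d" and "e" only surface after earlier rounds.
graph = {"me": {"a", "b"}, "a": {"c"}, "b": {"c"}, "c": {"d"}, "d": {"e"}}
print(expand_network("me", graph))  # [['c'], ['d'], ['e']]
```

Note that "d" never appears in a single pass, because none of your original follows follow it; it only shows up once "c" has been followed.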

Sunday, November 10, 2024

Build a Python CLI with Click+Cookiecutter (repost)

Reposted from the original at https://blog.stephenturner.us/p/python-cli-click-cookiecutter

---

In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem. If, like me, you’re primarily an R developer, I recently wrote about resources for R users who want to get better at Python.

Click is a really nice package for creating command line interfaces, and I like it better than argparse or other similar utilities. Simon Willison’s click-app cookiecutter template was really helpful for setting up the boilerplate for a Python package, and while I looked at build backends like Flit, Poetry, and uv, I ended up just using setuptools with a pyproject.toml.

For this demo I built a silly little Python tool called caffeinated (inspired by coffee-o-clock) that tells you how much caffeine you’ll still have in your system at bedtime based on how much you consume and when. You can install it from PyPI, and running caffeinated with the --help option (or without any arguments) prints the help. The code is on GitHub (github.com/stephenturner/caffeinated) if you want to follow along.

Demonstration of installing and running caffeinated (on GitHub and PyPI). First, pip install caffeinated, then run caffeinated --help for usage info. Run caffeinated -c 200 -b 9pm to see how much caffeine will remain in your system if you consume 200mg caffeine right now and go to bed at 9pm.

I’m not totally sure how accurate the formula is here, but I’m using this to calculate how much caffeine remains in circulation (and I’m going with 90mg for “a cup of coffee”).1

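$$N(t) = N_0 \left(\frac{1}{2}\right)^{t / t_{1/2}}$$

Where:

  • $N(t)$ = quantity of caffeine remaining after time $t$

  • $N_0$ = original amount of caffeine consumed

  • $t$ = time elapsed since consumption (hours)

  • $t_{1/2}$ = caffeine's half-life (6 hours)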
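As a quick sanity check on the formula, here is a small Python version. The caffeine_remaining name is mine, and I'm not claiming this is how caffeinated implements it; it simply applies the equation with a 6-hour half-life and the 90 mg “cup of coffee” from above. For 200 mg consumed 15 hours before bed, that's 200 × (1/2)^2.5 ≈ 35.4 mg, or about 39% of a cup, which matches the Docker example output later in this post.

```python
HALF_LIFE_HOURS = 6.0  # caffeine half-life used in the formula
CUP_MG = 90.0  # "a cup of coffee", per the post


def caffeine_remaining(initial_mg: float, hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """N(t) = N0 * (1/2) ** (t / half_life)."""
    return initial_mg * 0.5 ** (hours / half_life)


remaining = caffeine_remaining(200, 15)
print(f"{remaining:.1f} mg, {remaining / CUP_MG:.0%} of a cup")  # 35.4 mg, 39% of a cup
```

Each additional 6 hours halves what is left, so the same 200 mg taken 13 hours before bed leaves about 44.5 mg.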

Click

Click (Command Line Interface Creation Kit) is a Python package for creating command line interfaces in a composable way with as little code as necessary.

Why Click? Why not argparse, docopt, etc.? Good questions. The Click documentation has a section called “Why Click?” that includes “Why not Argparse?” and “Why not Docopt etc.?” I like Click because it makes it easy to create command line utilities with subcommands (e.g., mycommand subcommand ..., like bedtools intersect ...), supports file handling, makes it easier to handle options versus arguments, and supports ANSI coloring of the output.
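Here is a minimal sketch of the subcommand pattern. The tool group and its greet and add subcommands are hypothetical: @click.group() creates the top-level command, and each @tool.command() attaches a subcommand to it, in the style of bedtools intersect. It also contrasts positional arguments (click.argument) with an option (--shout) and uses click.style for ANSI color. Click's CliRunner runs the commands in-process, so nothing needs to be installed.

```python
import click
from click.testing import CliRunner


@click.group()
def tool():
    """A hypothetical top-level command with subcommands."""


@tool.command()
@click.argument("name")
@click.option("--shout", is_flag=True, help="Print the greeting in upper case.")
def greet(name, shout):
    """Greet NAME, a positional argument."""
    message = f"Hello, {name}!"
    # click.style adds ANSI color codes; click.echo strips them when
    # the output is not a terminal (e.g. when piped or under CliRunner).
    click.echo(click.style(message.upper() if shout else message, fg="green"))


@tool.command()
@click.argument("x", type=int)
@click.argument("y", type=int)
def add(x, y):
    """Add two integers X and Y."""
    click.echo(x + y)


# Run the subcommands in-process with Click's test runner.
runner = CliRunner()
print(runner.invoke(tool, ["greet", "Ada", "--shout"]).output, end="")  # HELLO, ADA!
print(runner.invoke(tool, ["add", "2", "3"]).output, end="")  # 5
```

In a package you would point a [project.scripts] entry at tool, the same way the pyproject.toml below does for caffeinated, and then run tool greet Ada --shout from the shell.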

Simple Click demo

First let’s set up a folder structure we’ll use to create a Python package. Make a new folder named whatever you’re calling the package (in this case, caffeinated), and in that directory create a pyproject.toml file. Create a new subfolder with the same name as the parent directory (caffeinated), and in that folder you’ll have three files. Directory structure should look like this:

  • caffeinated/

    • caffeinated/

      • __init__.py

      • __main__.py

      • cli.py

    • pyproject.toml

The pyproject.toml will have just the basics you need for a Python package:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "caffeinated"
version = "0.1.1"
dependencies = ["click"]

[project.scripts]
caffeinated = "caffeinated.cli:caffeinated"

The __init__.py will be empty, and __main__.py just imports the function from cli.py and calls it, which also lets you run the tool with python -m caffeinated:

from .cli import caffeinated

if __name__ == "__main__":
    caffeinated()

The cli.py file has the actual code for your command line tool. This is a really simple program that just echoes the amount of caffeine you consumed and your bedtime:

import click

@click.command()
@click.option("-c", "--caffeine", default=100)
@click.option("-b", "--bedtime", default=2100)
def caffeinated(caffeine, bedtime):
    click.echo(f"Caffeine consumed: {caffeine} mg")
    click.echo(f"Your bedtime is:   {bedtime}")

Now, pip install the package you just wrote:

pip install .

And now the caffeinated command line utility is ready to use. First, get some help. Notice how specifying a default value for each option lets Click infer its type, which shows up as INTEGER in the help output:

$ caffeinated --help
Usage: caffeinated [OPTIONS]

Options:
  -c, --caffeine INTEGER
  -b, --bedtime INTEGER
  --help                  Show this message and exit.

Now run it:

$ caffeinated -c 200 -b 2100
Caffeine consumed: 200 mg
Your bedtime is:   2100

The real caffeinated app

You can see the real code, generated from the cookiecutter template, at github.com/stephenturner/caffeinated. Everything important is in the cli.py file. It adds a few more options, picks up the version from the pyproject.toml, and defines functions that do the calculation, plus conveniences such as translating “9pm” into 2100 (hours).

Once you update the source (or just pip install caffeinated from PyPI), the tool will tell you approximately how much caffeine will remain in your system at your chosen bedtime after consuming a given amount. Run caffeinated --help to get help on the options.
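The real cli.py has its own helpers for this. As a hypothetical sketch of the “9pm” → 2100 idea (the function names and accepted formats are my assumptions, not the app's actual behavior), here is one way to parse a few common time spellings into an HHMM integer and work out the hours between a start time and bedtime. From 8am to 9pm is 13 hours, the same gap shown in the cowsay example later in the post.

```python
import re

# Hour, optional minutes (with or without a colon), optional am/pm suffix.
_TIME_RE = re.compile(r"^(\d{1,2})(?::?(\d{2}))?\s*(am|pm)?$")


def to_hhmm(text: str) -> int:
    """Parse '9pm', '9:30pm', '8am', '2100' or '21:00' into an HHMM integer."""
    match = _TIME_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    suffix = match.group(3)
    if suffix:
        if not 1 <= hour <= 12:
            raise ValueError(f"hour out of range for am/pm: {text!r}")
        # 12am -> 0, 12pm -> 12, 9pm -> 21
        hour = hour % 12 + (12 if suffix == "pm" else 0)
    if hour > 23 or minute > 59:
        raise ValueError(f"time out of range: {text!r}")
    return hour * 100 + minute


def hours_until(start: int, end: int) -> float:
    """Hours from start to end (both HHMM), wrapping past midnight."""
    start_min = (start // 100) * 60 + start % 100
    end_min = (end // 100) * 60 + end % 100
    return ((end_min - start_min) % (24 * 60)) / 60


print(to_hhmm("9pm"), to_hhmm("8am"), hours_until(to_hhmm("8am"), to_hhmm("9pm")))
# 2100 800 13.0
```

Converting to minutes before subtracting avoids the trap of treating HHMM values as plain numbers (2130 − 2100 is 30 minutes, not 0.3 hours), and the modulo handles bedtimes after midnight.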

Cookiecutter

Organizing your project as a well-structured Python package can streamline development and distribution. The Cookiecutter package generates new projects from templates, which helps keep your projects consistent and following best practices. Install it with pip, then generate a project from a template; here I use Simon Willison’s click-app cookiecutter template.

pip install cookiecutter
cookiecutter gh:simonw/click-app

You can see what this looks like after running it in this demo. Running the command and answering a few prompts will create:

  1. The directory structure described above

  2. The __init__.py, __main__.py, and cli.py files with some boilerplate to get started.

  3. A pyproject.toml file based on your answers to the prompts.

  4. A tests directory with boilerplate for writing tests with pytest.

  5. A README with badges pointing to a future PyPI release, changelog from your GitHub releases, license, and test status.

  6. GitHub actions for publishing your tool as a package to PyPI (requires additional configuration as described here).
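The generated tests use pytest. As a hedged sketch (not the template's actual boilerplate), a test for the simple demo command from earlier could use Click's CliRunner to invoke it in-process and check the output. The command is copied here so the snippet runs on its own; in the real package you would import it instead, e.g. from caffeinated.cli import caffeinated.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("-c", "--caffeine", default=100)
@click.option("-b", "--bedtime", default=2100)
def caffeinated(caffeine, bedtime):
    click.echo(f"Caffeine consumed: {caffeine} mg")
    click.echo(f"Your bedtime is:   {bedtime}")


def test_defaults():
    result = CliRunner().invoke(caffeinated, [])
    assert result.exit_code == 0
    assert "Caffeine consumed: 100 mg" in result.output


def test_options():
    result = CliRunner().invoke(caffeinated, ["-c", "200", "-b", "2100"])
    assert result.exit_code == 0
    assert result.output == "Caffeine consumed: 200 mg\nYour bedtime is:   2100\n"


def test_rejects_non_integer():
    # Click infers INTEGER from the default, so "9pm" is a usage error (exit code 2).
    result = CliRunner().invoke(caffeinated, ["-b", "9pm"])
    assert result.exit_code == 2
```

Saved as something like tests/test_cli.py, pytest will discover and run the test_ functions. The last test is a reminder of why the real app needs its own time parsing: with a plain integer option, Click rejects “9pm” outright.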

The final caffeinated code made from Simon’s click-app cookiecutter template is at github.com/stephenturner/caffeinated.

Packaging

setuptools, Flit, Poetry, Hatch, uv

I’ve written scores of R packages for fun and profit. For R packages there’s really only one build toolchain that everyone uses: devtools, with Roxygen documentation and liberal assistance from usethis.

The Python documentation has a good guide on Packaging Python Projects. The build backend ecosystem in Python is more diverse. I really wanted to take a closer look at Flit, Poetry, Hatch, and others, but the cookiecutter template I used created a pyproject.toml using setuptools by default, so I just ran with that.

setuptools is probably the oldest and most widely used packaging tool in Python, with good documentation and community support. Standards such as PEP 517, PEP 518, and PEP 621 moved the configuration into a simpler, declarative pyproject.toml rather than the old setup.py. You can see my pyproject.toml for caffeinated here. It’s pretty simple, and one key feature is the readme="README.md" entry, which populates the PyPI landing page (pypi.org/project/caffeinated/) with the README.md from the project root, so you don’t have to duplicate the documentation.

Flit (flit.pypa.io) looks like a very minimal, very simple build backend for packaging plain Python code. I also took a look at Poetry (python-poetry.org), because if I were building something more complex I think I’d want something to help me manage dependencies instead of having to add them to the pyproject.toml by hand. Poetry helps with this. Finally, there’s a lot of interest in uv (docs.astral.sh/uv) right now. It’s a Python package and project manager written in Rust, and the benchmarks are impressive. See the “uv: Unified Python packaging” blog post for more. uv doesn’t yet have a build backend, but that’s in the works at astral-sh/uv#3957.

Building Python packages with setuptools vs. Flit vs. Poetry vs. uv might be the subject of a future post, but for now I’m just using setuptools+build.

Building and deploying with setuptools+build+twine

The pyproject.toml created by the cookiecutter template I used had just about everything I needed to build the package, so from here it was simple. After installing the build and twine tools, python -m build creates a .whl (wheel) binary and a .tar.gz source distribution in a dist/ folder.

pip install build twine
python -m build

After that, it’s fairly straightforward to upload the package to PyPI. But before uploading to the production pypi.org, you should probably upload to the testing repository (test.pypi.org) first, so you don’t pollute PyPI with broken or test packages.

twine upload -r testpypi dist/*

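Uploading to the real PyPI follows the same convention. PyPI now requires API tokens for uploads rather than your account username and password, so I recommend putting a token in your .pypirc file instead of typing credentials at the prompt.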

twine upload dist/*

A few seconds later your package will be on PyPI, and you can install it with pip install like you would any other package. The caffeinated package is on PyPI at https://pypi.org/project/caffeinated/.

If you’ve ever tried getting a package onto CRAN, you know how onerous the process can be and how strict the CRAN maintainers can be,2 so you may be shocked to see how easy it is to get a package onto PyPI. There is no curation or review process with PyPI: upload your source and wheel files with twine and your package is live.

Docker

Once the app is on PyPI, it’s easy to create a Docker container. There’s a Dockerfile in the repo that looks like this:

FROM python:3.11-alpine
RUN pip install caffeinated
WORKDIR /files
ENTRYPOINT ["caffeinated"]
CMD ["--help"]

You can build it like this from the directory containing the Dockerfile (replace stephenturner with your Docker Hub username):

docker build -t stephenturner/caffeinated .

And push it up to Docker Hub:

docker push stephenturner/caffeinated 

This container is on Docker Hub at stephenturner/caffeinated. Now you can run it:

$ docker run stephenturner/caffeinated -c 200 -b 9pm

You would have 35.4mg of caffeine in your system if you went to bed at 9:00pm (in 15.0 hours).
That's like having 39% of a cup of coffee before bed.

Alternatively, you can easily create an image with your new tool and additional tools you might want in the same container using Seqera containers (see the video linked further below for more details). I created this image that includes both caffeinated and cowsay, and I’m piping the result of caffeinated into cowsay.

$ docker run --rm community.wave.seqera.io/library/pip_caffeinated_python-cowsay:5a33eb2abfe4e6a5 sh -c 'caffeinated --caffeine 200 --bedtime 9pm --start-time 8am | cowsay'
 __________________________________________ 
/ You would have 44.5mg of caffeine in     \
| your system if you went to bed at 9:00pm |
| (in 13.0 hours). That's like having 49%  |
\ of a cup of coffee before bed.           /
 ------------------------------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Learning more

This is part of my TIL / Learning in Public series, which I wrote about recently:

For a quick intro on Click, see the official Click intro video. It’s over 10 years old at this point, but it’s mostly still applicable.

This video from NeuralNine demonstrates how to create Click groups to create CLI utilities with subcommands:

This video from ArjanCodes explains how to create a Python package and publish it on PyPI. It uses the old setup.py instead of the more modern pyproject.toml convention, but it’s still good for understanding the steps in the process.

Finally, a little more about easily creating containers with multiple tools for multiple architectures using Seqera Containers:

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.