Monday, November 26, 2012

Copy Text to the Local Clipboard from a Remote SSH Session

This is an issue that has bugged me for years, and I've finally found a good solution on osxdaily and Stack Overflow. I'm using Terminal on my OSX laptop to connect to a headless Linux machine over SSH, and I want to copy the output from a command (on the remote server) to the local clipboard on my Mac using only the keyboard (i.e., a pipe). In essence:

commandThatMakesOutput | sendToLocalClipboard

If I'm working from the command line on my local machine, the pbcopy command does this for me. For example, while working on my local mac, if I wanted to copy all the filenames in the directory to my clipboard, I could use:

ls | pbcopy

To do this for output generated over SSH, the general solution works like this:

commandThatMakesOutput | ssh desktop pbcopy

When you're SSH'd into a remote host, this will take the output of commandThatMakesOutput and pipe it to the pbcopy command of the desktop machine that you ssh into from the remote host. In other words, you ssh back into your local machine from the remote host, passing output from commandThatMakesOutput back to pbcopy on your local machine.

There is one required step and three recommendations.

First, you must configure your local desktop as an SSH server. I'm assuming you're aware of any security risks you might be taking when doing this. Open system preferences, sharing. Enable remote login, preferably allowing only you to log in, for added security. You'll set up SSH keys later.

Here, take note of how to SSH into your machine. I've blurred out my IP address where it says "To log into this computer..." This brings up the first recommendation: you need a static IP address for this to work well, otherwise you'll have to constantly look up your IP address and modify the command you'll use to SSH back into your local machine from the remote host. Most institutions simply hand out static IPs if you ask nicely. 

The second recommendation is that you create an SSH key pair, storing the public key on your mac in ~/.ssh/authorized_keys and your private key on the remote host, so that you don't have to type a password. Make sure you chmod these files appropriately. I'll leave this up to you and Google.

Finally, you want to alias the command on your remote host to some quick command, like "cb". Preferably you'll add this to your .bashrc, so it's sourced every time you log in. E.g.

alias cb="ssh user@123.456.7.8 pbcopy"

So now, if you're logged into a remote host and you want to copy the output of pwd (on the remote host) to your local clipboard, you can simply use:

pwd | cb

I don't really know what this would do with anything other than small bits of ASCII text, so if you pipe binary, or the output of gzip -c to pbpaste, you're voiding your nonexistent warranty. Presumably there's a way to do this with Windows - I don't know of a command-line utility to give you access to the clipboard, but if you can think of a way how, please leave a comment.

Wednesday, November 7, 2012

RegulomeDB: Identify DNA Features and Regulatory Elements in Non-Coding Regions

Many papers have noted the challenges associated with assigning function to non-coding genetic variation, and since the majority of GWAS hits for common traits are non-coding, resources for providing some mechanism for these associations are desperately needed.

Boyle and colleagues have constructed a database called RegulomeDB to provide functional assignments to variants using data from manual curation, CHiP-seq data, chromatin state information, eQTLs across multiple cell lines, and some computational predictions generated from DNase footprinting and transcription factor binding motifs.

RegulomeDB implements a tiered category system (1-6) where category 1 has an eQTL association in addition to other ENCODE sources of data, 2 -5 have some ENCODE data only with no eQTL associations, and category 6 has evidence of a binding motif change only. As you might imagine, the annotation density increases as you increase category numbers.

Their simple, but impressive interface will accept RS numbers, or whole BED, GFF, or VCF files for annotation. The resulting output (example above) is downloadable, providing both specifics of the annotation (such as the transcription factor binding to the area) and the functional score for the variant.

Friday, November 2, 2012

STAR: ultrafast universal RNA-seq aligner

There's a new kid on the block for RNA-seq alignment.

Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics (2012).

Aligning RNA-seq data is challenging because reads can overlap splice junctions. Many other RNA-seq alignment algorithms (e.g. Tophat) are built on top of DNA sequence aligners. STAR (Spliced Transcripts Alignment to a Reference) is a standalone RNA-seq alignment algorithm that uses uncompressed suffix arrays and a mapping algorithm similar to those used in large-scale genome alignment tools to align RNA-seq reads to a genomic reference. STAR is over 50 times faster than any other previously published RNA-seq aligner, and outperforms other aligners in both sensitivity and specificity using both simulated and real (replicated) RNA-seq data.

The notable increase in speed comes at the price of a larger memory requirement. STAR requires ~27GB RAM to align reads to a human genome - a moderate amount, but not atypical on most modern servers. STAR aligns ~45 million paired reads per hour per processor, and scales nearly linearly with the number of processors (without appreciably increasing RAM usage). Notably, the STAR algorithm is also capable of handling longer reads such as those from PacBio and the upcoming Oxford Nanopore technologies. STAR is free and open source software.

Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics (2012).

STAR software on Google Code

(This post adapted from my review on F1000).
Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.