Tuesday, December 31, 2013

Jeff Leek's non-comprehensive list of awesome things other people did in 2013

Jeff Leek, biostats professor at Johns Hopkins and instructor of the Coursera Data Analysis course, recently posted on Simly Statistics this list of awesome things other people accomplished in 2013 in genomics, statistics, and data science.

At risk of sounding too meta, I'll say that this list itself is one of the awesome things that was put together in 2013. You should go browse the entire post for yourself, but I'll highlight a few that I saved to my reading list:


This only a sample of what's posted on Jeff's blog. Go read the full post below.

Simply Statistics: A non-comprehensive list of awesome things other people did this year.

Wednesday, December 18, 2013

Curoverse raises $1.5M to develop & support an open-source bioinformatics data analysis platform



Boston-based startup Curoverse has announced $1.5 million in funding to develop and support the open-source Arvados platform for cloud-based bioinformatics & genomics data analysis.

The Arvados platform was developed in George Church's lab by scientists and engineers led by Alexander Wait Zaranek, now scientific director at Curoverse. According to the Arvados wiki:

Arvados is a platform for storing, organizing, processing, and sharing genomic and other biomedical big data. The platform is designed to make it easier for bioinformaticians to develop analyses, developers to create genomic web applications and IT administers to manage large-scale compute and storage genomic resources. The platform is designed to run on top of "cloud operating systems" such as Amazon Web Services and OpenStack. Currently, there are implementations that work on AWS and Xen+Debian/Ubuntu. ... A set of relatively low-level compute and data management functions are consistent across a wide range of analysis pipelines and applications that are being built for genomic data. Unfortunately, every organization working with these data have been forced to build their own custom systems for these low level functions. At the same time, there are proprietary platforms emerging that seek to solve these same problems. Arvados was created to provide a common solution across a wide range of applications that would be free and open source.

A few questions should be apparent: What value does Arvados provide over other more widely used platforms (namely, Galaxy) that also aim to enable reproducibility, transparency, sharing, collaboration, and data/workflow management with biological big data? And how does Curoverse distinguish itself from other cloud-based bioinformatics services like Seven Bridges, DNA Nexus, and the next implement-GATK-on-Amazon-and-sell-it-back-to-me service provider that pops up? I understand that there are real costs with free software, but will the service that Curoverse provides be valuable and cost-effective enough to overcome the activation energy and make up for the "switching costs" that the average bioinformatician faces on adopting a new way of doing things? While the platform and the support model sound potentially very useful, these are all questions that the Curoverse team will need to carefully consider in attracting new users.

Arvados open-source bioinformatics analysis platform: https://arvados.org/

Curoverse: https://curoverse.com/

Press Release: http://www.prweb.com/releases/2013/12/prweb11424292.htm

Monday, December 9, 2013

Biostar Tutorial: Cheat sheet for one-based vs zero-based coordinate systems

Obi Griffith over at Biostar put together this excellent cheat sheet for dealing with one-based and zero-based genomic coordinate systems. The cheat sheet visually explains the difference between zero and one-based coordinate systems, as well as how to indicate a position, SNP, range, or indel using both coordinate systems.

Biostar Tutorial: Cheat sheet for one-based vs zero-based coordinate systems


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.