Wednesday, June 1, 2016

Covcalc: Shiny App for Calculating Coverage Depth or Read Counts for Sequencing Experiments

How many reads do I need? What's my sequencing depth? I get these questions all the time. Calculating how much sequence data you need to hit a target depth of coverage, or the inverse, the coverage depth you'll get from a set amount of sequencing, takes only some basic algebra. Given either one, plus the genome size and read length/configuration, you can calculate the other. Covcalc, the app described here, was inspired by a similar calculator written by James Hadfield, and was an opportunity for me to create my first Shiny app.

Check out the app here:
http://apps.bioconnector.virginia.edu/covcalc/

And the source code on GitHub:
https://github.com/stephenturner/covcalc

Give it your read length, tell it whether you're using single- or paired-end sequencing, and select a genome or enter your own. Then, select whether you want to calculate (a) the number of reads you need to hit a target depth of coverage, or (b) the coverage depth you'll hit given a set number of sequencing reads. Once you make the selection, use the slider to adjust either the desired coverage or the number of reads sequenced, and the output text below updates automatically.
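The algebra behind both directions can be sketched in a few lines. This is an illustration in Python, not the app's actual source, and it rests on two assumptions that may not match the app exactly: coverage is the total number of sequenced bases divided by the genome size, and "number of reads" counts sequenced fragments, so a paired-end fragment contributes two reads' worth of bases. The function names and the round 3 Gb genome size are made up for the example.

```python
import math


def bases_per_fragment(read_length: int, paired_end: bool) -> int:
    """Bases produced per sequenced fragment.

    Assumption: a paired-end fragment yields two reads of read_length each.
    """
    return read_length * (2 if paired_end else 1)


def coverage_depth(n_fragments: int, read_length: int, genome_size: int,
                   paired_end: bool = False) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_fragments * bases_per_fragment(read_length, paired_end) / genome_size


def fragments_needed(target_depth: float, read_length: int, genome_size: int,
                     paired_end: bool = False) -> int:
    """Fragments required to reach target_depth, rounded up to a whole fragment."""
    per_fragment = bases_per_fragment(read_length, paired_end)
    return math.ceil(target_depth * genome_size / per_fragment)


if __name__ == "__main__":
    genome = 3_000_000_000  # hypothetical round figure for a human-sized genome
    needed = fragments_needed(30, 100, genome, paired_end=True)
    print(f"2x100 bp, 30x target: {needed:,} fragments")
    print(f"Depth from those fragments: {coverage_depth(needed, 100, genome, paired_end=True):.1f}x")
```

Note that this gives the average depth assuming every sequenced base lands on the genome; it does not account for duplicates, unmapped reads, or off-target reads.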


Shiny App: Coverage / Read Count Calculator

3 comments:

  1. Good app. May I suggest some improvements to account for real-life scenarios? I would like to see this calculator take into account on-target coverage, instead of all coverage. We know that in NGS experiments many reads/bases fall in flanking introns, so the real coverage for a given output is lower than what is expected from the calculation above. Taking insert size into account would greatly help in getting the actual output needed for on-target coverage. Is it possible to perform this calculation based on the distribution of the inserts over a target region? Usually symmetrical, but the slope may vary? Or one may use empirical data on on-target coverage, as a ratio of on-target vs. total? Additional improvements include accounting for duplicates and for mapped reads vs. all reads. These could also be included as proportions of total reads, which users can input based on empirical data from their labs (depending on the sequencing instrument, chemistry, etc.).

    1. Thanks for the feedback, srirangan. Would you mind posting this as an issue on the GitHub page: https://github.com/stephenturner/covcalc/issues

  2. Hi Stephen,
    Do you think this could be used to calculate the number of samples needed for sequencing a reference population to obtain a certain imputation coverage using genotype data? Meaning, if I have a project and my samples are not represented well in 1000G, how many would I need to sequence at 30x coverage to get imputation coverage (r2 >= 0.80)?
    Janina



Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.