I really liked how they framed the multiple-testing problem that routinely plagues large-scale genetic studies, where a nominal significance threshold, applied across many hypothesis tests, can yield a large number of false positives:
However, true hypotheses are true, and false hypotheses are false, regardless of how many are tested. As such, the actual 'multiple testing burden' depends on the proportion of true and false hypotheses in any given set: that is, the 'prior probability' that any given hypothesis is true, rather than the number of tests per se. This challenge can thus be viewed as a 'naive hypothesis testing' problem — that is, when in reality only one or a few variants are causal for a given phenotype, but all (or many) variants are a priori equally likely candidates, the prior probability of any given variant being causal is minuscule. As a consequence, extremely convincing data are required to support causality, which is potentially unachievable for some true positives.
Defining the challenge in terms of hypothesis quality rather than quantity, however, points to a solution. Specifically, experimental or computational approaches that provide assessments of variant function can be used to better estimate the prior probability that any given variant is phenotypically important, and these approaches thereby boost discovery power.
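To see why the prior matters so much, here is a quick Bayesian back-of-the-envelope sketch (my own illustration, not taken from the review). If a fraction pi of tested variants are truly causal, then a test with power (1 - beta) and per-test false-positive rate alpha flags roughly pi * (1 - beta) true positives and (1 - pi) * alpha false positives, so the posterior probability that a 'significant' variant is causal is pi * (1 - beta) / (pi * (1 - beta) + (1 - pi) * alpha). The function name and all numbers below (the priors, alpha = 0.05, power = 0.8) are illustrative assumptions, not figures from the paper.

```python
def posterior_causal(prior, alpha=0.05, power=0.8):
    """Posterior probability that a variant passing the significance
    threshold is truly causal, given the prior probability of causality,
    the per-test false-positive rate (alpha), and the test's power."""
    true_hits = prior * power          # causal variants expected to reach significance
    false_hits = (1 - prior) * alpha   # non-causal variants expected to reach significance
    return true_hits / (true_hits + false_hits)

# 'Naive' prior: every one of ~1 million tested variants is an equally
# likely candidate, so the prior for any single variant is tiny.
naive = posterior_causal(prior=1e-6)

# 'Informed' prior: functional evidence (illustratively) narrows the
# plausible candidates ~1000-fold, raising the per-variant prior.
informed = posterior_causal(prior=1e-3)

print(f"naive prior:    posterior P(causal | significant) = {naive:.6f}")
print(f"informed prior: posterior P(causal | significant) = {informed:.6f}")
```

With the naive prior, a nominally significant hit is almost certainly a false positive; raising the prior through functional evidence, or demanding much stronger data (a far smaller alpha), pushes the posterior up. That is exactly the review's point about hypothesis quality rather than quantity.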
Check out the full review at Nature Reviews Genetics:
Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data