Friday, April 16, 2010

My thoughts on King's "Genetic Heterogeneity" essay in Cell

Update Thursday, April 29, 2010: See further commentary at a newer post here.


Just finished reading Jon McClellan and Mary-Claire King's Genetic Heterogeneity in Human Disease essay in Cell. It's definitely one of the most forthright and compelling essays I've read on the subject of the inadequacy of GWAS for identifying genes that cause complex human disease. The essay starts with an evolutionary perspective. Most human variation is relatively ancient - originating in ancient human populations long before the migration out of Africa. Yet new alleles arise constantly, and because of the relatively recent human population growth, we can be certain that most alleles are actually recent and rare. For a common allele to remain in the population it must withstand evolutionary pressure. If the variation is pathogenic, it must either (1) lead to disease later in life so as not to affect fitness (e.g. Alzheimer's Disease, AMD), or (2) be balanced by positive selection (e.g. hemoglobin genes which cause sickle cell anemia are balanced by positive selection from malaria resistance).

The authors then dive into heterogeneity, citing many examples of human diseases which display both locus heterogeneity (mutations in many different genes lead to the same disease), and allelic heterogeneity (many mutations in the same gene cause the same disease). The authors discuss early-onset breast and ovarian cancer, inherited hearing loss, genetics of lipid metabolism, and severe mental illnesses such as autism or schizophrenia.

Next comes a very nice discussion of the common-disease-common-variant (CDCV) hypothesis and GWAS. Thousands of "risk variants" have been identified from GWAS, yet most of these have no apparent biological function. Since most genotyping platforms select for common variants, and because evolution has ensured that most common variants are neutral, it follows that most GWAS findings are neutral, stemming from factors other than a true association with disease risk.

For one, the authors cite a problem we're all well aware of: population stratification. We tend to think that if we eliminate ethnic outliers or control for stratification with PCA or the like, then we've eliminated the problem. Yet the authors point to an autism GWAS recently published in Nature that provides a striking example of the problem hypervariable alleles can cause. That study found an association with a SNP which had a frequency in cases of 0.65, and a frequency in controls of 0.61. All cases and controls were of European descent. Yet the frequency of the risk variant varies from 0.21 to 0.77 across European populations! (N.b. - see the discussion of this point in a newer post). This spread in frequency across European populations is 14 times larger than the frequency difference between cases and controls! Even very minimal differences in ancestry between cases and controls could have explained this association rather than true association with autism.
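To make the "14 times" figure concrete, here is a minimal sketch of the arithmetic using only the four frequencies quoted above. The function name `frequency_spread_ratio` is my own, purely for illustration; it is not from the essay or from any GWAS package.

```python
def frequency_spread_ratio(case_freq, control_freq, pop_min, pop_max):
    """Ratio of the allele-frequency range across populations
    to the case/control allele-frequency difference."""
    case_control_diff = abs(case_freq - control_freq)
    population_range = pop_max - pop_min
    return population_range / case_control_diff


# Frequencies as quoted in the paragraph above.
ratio = frequency_spread_ratio(
    case_freq=0.65, control_freq=0.61, pop_min=0.21, pop_max=0.77
)
print(f"Population range is about {ratio:.0f}x the case/control difference")
```

This only restates the comparison the essay makes: a range of 0.56 across populations against a difference of 0.04 between cases and controls. It says nothing about whether the quoted range is itself well estimated.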

The authors do give a few examples of where common variants truly affect a common disease (hemoglobin genes and sickle-cell anemia, autoimmune disorders and the MHC region, Alzheimer's disease and APOE, lactose intolerance and alleles in the lactase gene enhancer region). Yet these examples prove two points. (1) All the variants in these genes have a demonstrable effect on the protein or its expression, as opposed to most GWAS findings, and (2) back to the evolutionary perspective, all of these genes have reason to remain common because of their evasion of evolutionary pressure, because they either do not affect reproductive fitness, or are balanced by positive selection.

The authors conclude by offering potential paths going forward, utilizing high-throughput sequencing technologies. One of the problems with sequencing data is not just finding potentially deleterious mutations, but determining which of the many potentially deleterious mutations actually play a role in human disease. One of the most promising strategies is to use next-gen sequencing to trace coinheritance of potential disease-causing alleles with disease within affected families - essentially linkage analysis. Finally, the authors assert that replication in genetics studies should focus on the identification and confirmation of multiple biologically relevant mutations in the same gene. This would provide both biological and epidemiological support for the causality of the gene or pathway in the pathogenesis of the disease.

This essay is definitely worth a read.

Cell: Genetic Heterogeneity in Human Disease



Update Tuesday, April 27, 2010: Keep an eye out over at Genetic Future for an upcoming post pointing out some of the problems with this paper I didn't consider here.

23 comments:

  1. I think you've got a logical fallacy in here. You say (in bold, no less) "GWAS study common SNPs", "evolution has made most common SNPs neutral" therefore "most GWAS hits are neutral". That's only true if GWAS hits are a random selection of common SNPs, and the whole point is that they're not: they represent part of the small minority of common SNPs which aren't neutral.

    It's like saying, "I'm doing a survey about people with pet cats", "Most pet cats are small housecats" therefore "The pet cats in my survey are mostly housecats". But what if my survey (GWAS) were explicitly about man eating tigers? Because I'm focusing on the tiny subset of people who own man eating tigers, the properties of my subset (associated variants) are not the same as the properties of the larger group (all common variants).

  2. Jeff - so you're saying that you think all your GWAS hits are man eating tigers? I don't agree.

  3. Thank you both for your comments. I don't think Jeff necessarily meant that GWAS preferentially targets non-neutral variants with extremely large effect. Jeff's got a point here - some GWAS hits are non-neutral. The authors of this paper give lots of examples of truly non-neutral biologically plausible common variants that consistently associate with a complex trait. And there are certainly dozens more GWAS hits that the authors fail to mention, which also consistently replicate, and are likely non-neutral.

    However, the point still remains - (1) genotyping platforms DO preferentially genotype common variants, although that's starting to change, and if you believe that (2) most common variants are neutral (see Kimura), thus the probability of any one common variant being non-neutral and truly associated is infinitesimally small, and (3) that there are many, many things that can cause an apparent association to a truly neutral variant (chance, stratification, etc), then I don't think it's a logical fallacy to say that MOST (not ALL, there are exceptions) GWAS hits are truly neutral. The fact that MOST GWAS hits do not replicate is evidence to support this statement (although there are lots of reasons for nonreplication as well).

  4. Where is your evidence that "MOST GWAS hits do not replicate"? If you consider published GWAS hits which meet accepted levels of statistical significance (e.g. 5e-8), then nearly all replicate in well powered follow-up studies. That's the victory of GWAS over complex disease linkage or candidate gene mapping -- we finally are publishing true associations rather than spurious claims due to the confounders you mention.

    This is what bugs me about this McClellan and King piece and similar commentaries: unsubstantiated claims that most GWAS hits aren't replicated, when in fact they almost all are.

  5. 1. JAMA. 2010 Feb 17;303(7):631-7.

    Association between a literature-based genetic risk score and cardiovascular
    events in women.

    Paynter NP, Chasman DI, Paré G, Buring JE, Cook NR, Miletich JP, Ridker PM.

    Center for Cardiovascular Disease Prevention and the Divisions of Preventive
    Medicine and Cardiovascular Diseases, Brigham and Women's Hospital, Boston,
    Massachusetts 02215, USA. npaynter@partners.org

    CONTEXT: While multiple genetic markers associated with cardiovascular disease
    have been identified by genome-wide association studies, their aggregate effect
    on risk beyond traditional factors is uncertain, particularly among women.
    OBJECTIVE: To test the predictive ability of a literature-based genetic risk
    score for cardiovascular disease. DESIGN, SETTING, AND PARTICIPANTS: Prospective
    cohort of 19,313 initially healthy white women in the Women's Genome Health Study
    followed up over a median of 12.3 years (interquartile range, 11.6-12.8 years).
    Genetic risk scores were constructed from the National Human Genome Research
    Institute's catalog of genome-wide association study results published between
    2005 and June 2009. MAIN OUTCOME MEASURE: Incident myocardial infarction, stroke,
    arterial revascularization, and cardiovascular death. RESULTS: A total of 101
    single nucleotide polymorphisms reported to be associated with cardiovascular
    disease or at least 1 intermediate cardiovascular disease phenotype at a
    published P value of less than 10(-7) were identified and risk alleles were added
    to create a genetic risk score. During follow-up, 777 cardiovascular disease
    events occurred (199 myocardial infarctions, 203 strokes, 63 cardiovascular
    deaths, 312 revascularizations). After adjustment for age, the genetic risk score
    had a hazard ratio (HR) for cardiovascular disease of 1.02 per risk allele (95%
    confidence interval [CI], 1.00-1.03/risk allele; P = .006). This corresponds to
    an absolute cardiovascular disease risk of 3% over 10 years in the lowest tertile
    of genetic risk (73-99 risk alleles) and 3.7% in the highest tertile (106-125
    risk alleles). However, after adjustment for traditional factors, the genetic
    risk score did not improve discrimination or reclassification (change in c index
    from Expert Panel on Detection, Evaluation, and Treatment of High Blood
    Cholesterol in Adults [ATP III] risk score, 0; net reclassification improvement,
    0.5%; [P = .24]). The genetic risk score was not associated with cardiovascular
    disease risk (ATP III-adjusted HR/allele, 1.00; 95% CI, 0.99-1.01). In contrast,
    self-reported family history remained significantly associated with
    cardiovascular disease in multivariable models. CONCLUSION: After adjustment for
    traditional cardiovascular risk factors, a genetic risk score comprising 101
    single nucleotide polymorphisms was not significantly associated with the
    incidence of total cardiovascular disease.


    PMCID: PMC2845522 [Available on 2011/2/17]
    PMID: 20159871 [PubMed - indexed for MEDLINE]

  6. 1: Paynter NP, Chasman DI, Paré G, Buring JE, Cook NR, Miletich JP, Ridker PM.
    Association between a literature-based genetic risk score and cardiovascular
    events in women. JAMA. 2010 Feb 17;303(7):631-7. PubMed PMID: 20159871; PubMed
    Central PMCID: PMC2845522.


    2: Ioannidis JP. Non-replication and inconsistency in the genome-wide association
    setting. Hum Hered. 2007;64(4):203-13. Epub 2007 Jun 6. Review. PubMed PMID:
    17551261.


    3: Ioannidis JP. Why most published research findings are false. PLoS Med. 2005
    Aug;2(8):e124. Epub 2005 Aug 30. PubMed PMID: 16060722; PubMed Central PMCID:
    PMC1182327.


    4: Ott J. Association of genetic loci: Replication or not, that is the question.
    Neurology. 2004 Sep 28;63(6):955-8. PubMed PMID: 15452283.


    5: Hirschhorn JN, Altshuler D. Once and again-issues surrounding replication in
    genetic association studies. J Clin Endocrinol Metab. 2002 Oct;87(10):4438-41.
    Review. PubMed PMID: 12364414.


    6: Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of
    genetic association studies. Genet Med. 2002 Mar-Apr;4(2):45-61. Review. PubMed
    PMID: 11882781.

  7. OK, you just cited 5 papers which were published before the GWAS era. I fully concur that complex disease genetics was plagued with non replicating results before 2007.

    See:

    Hirschhorn JN. NEJM, 2009
    Altshuler, Daly, Lander. Science, 2008

    The JAMA paper states in the abstract that they had available: " (199 myocardial infarctions, 203 strokes, 63 cardiovascular deaths, 312 revascularizations)". The effects they were seeking to replicate are tiny -- far too small to be detected with this number of samples (I think this is on your checklist of statistical issues in a separate post?). Sensible people accept that GWAS hits have very little predictive value, but that doesn't mean they aren't real associations. The point of GWAS isn't to predict who will have a heart attack, it's to elucidate the biology of disease.

    GWAS aren't perfect, and they won't solve all of human genetics, but to maintain that they haven't yielded hundreds of genuine associations between phenotype and genotype is being willfully obtuse. I suspect I won't convince anyone in a blog comment, but come find me at the next meeting we're both at and we can discuss over a pint.

  8. I fully agree with Jeff here: while failure to replicate was an absolutely massive problem in the candidate gene era, the vast majority of GWAS hits with genome-wide significance have successfully replicated in independent samples. There's absolutely no basis whatsoever to the claim that "MOST GWAS hits do not replicate".

    In addition, it's extremely unlikely that many of these replicated associations are due to population stratification: that would require exactly the same pattern of confounding to pop up independently in multiple samples (often taken from different European countries). While the case of the autism variant does look suspicious, most GWAS hit SNPs display nothing like this level of population differentiation.

    It's true that GWAS have captured a relatively small fraction of heritable disease risk for most complex diseases (with some exceptions, e.g. AMD and T1D). But they have nonetheless identified hundreds of novel regions associated with these diseases, the majority of which are likely genuine, and in the process have generated some genuine new insights into disease biology.

    It's good to cast a skeptical eye over scientific results - but McClellan and King totally overstep the mark in their criticisms, and IMO their claims should be treated with a great deal more caution than is displayed in this post.

  9. Jeff & Daniel - Thanks for your comments and insight. I'm writing a very short summary of this paper for F1000 Bio, where I'll make sure to include some of the concerns you've mentioned here. Maybe we can grab that pint at the next ASHG where I'll be presenting my own GWAS results that (hopefully) replicate.

  10. I think that there are some problems with the Paynter paper - 101 SNPs? How carefully did they establish that they were all valid? Not very according to the methods - probably many were not real which would dilute the effect of any real - see also http://bit.ly/bB2Efd

    I am not competent to judge the claims in MCK paper but I do think, agreeing with Daniel, that it went too far. It made some strong claims but with no real analysis or data to back what seemed to be not much more than assertions. Of course for a pure geneticist the hunt for many rare variants with nice new next-next-gen machines sounds a lot more fun than trying to tease out gene-gene and gene-environment interactions that might also be contributing to apparent low impact of these GWAS SNPs

  11. Make sure to keep an eye out over at Genetic Future for an upcoming discussion on this paper that'll hopefully point out some of the flaws I missed here!

  12. I noted several objections I have to the MKC paper over at Genetic Future, but two concerning stratification bear re-iteration:

    - many of the GWA studies include family data (trios or sibships) that corroborate the associations. These, by definition, are robust to population stratification.

    - Pop strat is even worse for rare variants; strategies for sample matching really aren't clear, either.

    So if one were unkind, one could simply dismiss all rare variant associations as being due to subtle population stratification.

    I propose calling that academic technique "pulling a King".

  13. I agree with Chris and Jeff,

    if you are in a robust setting and look at genome wide significant hits (at the 5×10^-8 level), I know of very few examples that do not replicate so GWAS is a huge success and teaches us about the underlying biology of complex disorders. I think an interesting and perhaps provocative question is why we believe the heritability estimates in the first place? They usually are done in quite small studies (there are of course excellent exceptions to this) and the heritability varies hugely between the different studies (of which most are done in a non-standardized way, using different designs, covariates etc.) - obesity as defined by BMI is a great example where heritability estimates vary between ~40-70% (ref Stunkard, A.J. et al. JAMA 256, 51-4 (1986) & Maes, H.H., et al. Behav Genet 27, 325-51 (1997)).

    It is also true that rare variants will be more sensitive to population stratification and might prove difficult to replicate (as they might be private or extremely rare).

    Good to get the discussion going though!

  14. @Jeff:
    I'm with you regarding the replicability of GWAS hits, but I do not understand the issues you are having with such hits being neutral. It doesn't mean that they are not real. On the other hand, if common disease alleles are non-neutral (I assume you mean deleterious) over a substantial period of time, they are, by definition, declining in frequency.
    This leads to a testable hypothesis, that the risk alleles would be ancestral. This doesn't seem to be the case, at least in one of your papers - only 4 out of the 11 "previously published" risk alleles in http://www.nature.com/ng/journal/v40/n8/fig_tab/ng.175_T2.html are ancestral so many alleles are likely neutral (or have been deleterious for only a short while).

    @ChrisC:
    Transmission tests are a great argument, when available. Rare-allele stratification, however, is not so bad if you consider multiple rare alleles, rather than a single one.

    @Cecilia:
    Whether GWAS had been successful depends on where you set the bar. If you judge the outcome of the experiment according to the original hypotheses, those actually often talked about odds ratios of 1.5 and well-powered studies of 2000 samples. IMHO the real success had been in brilliant management of investigation in light of rejection of such hypotheses in many studies, management that involved expanding the study sizes to score against the moved goal posts. As for heritability estimates, even if they are off by 2-fold, the GWAS-explained part still doesn't look good, and besides, high heritability has been and still is a major justification for GWAS, so this argument shoots yourself in the foot, at least to some extent.

  15. McClellan et al recently published an essay on human genetics in the journal Cell with a large section criticizing the utility of genome-wide association studies (GWAS). Discussion on at least three diseases in the paper (hearing loss, SCA and autism) cited some of my published papers, and I therefore decided to post my comments on Internet, to set the records straight. Although I whole-heartedly agree that rare variants play a substantial role in human diseases, I also think that the section on GWAS reflects misunderstandings of the concept of GWAS, ignorance of standard practices in GWAS, misinterpretation of published primary research data, and as a result, is misinforming the general readership of Cell. These issues need to be rectified for the good of the scientific community, and for the healthy development of methodology and practice of human genetic research.

    For impatient readers, these are the bullet points: (1) GWAS interrogate disease loci through linkage disequilibrium, so the lack of known biological function on GWAS SNPs does not justify the attack against GWAS by McClellan et al; (2) Methods for adjusting population stratification are well established in the GWAS community; it is not a valid argument to explain most GWAS signals (with odds ratio less than 2) by stratification, especially if family-based study design is used (including the autism GWAS); (3) McClellan et al used rs4307059 (from autism GWAS) as a “particularly dramatic” example of stratification because its frequency varies across Europe and it is monoallelic in Africa, which is not scientifically and statistically justified. In fact, it is the nature of SNPs to have differing allele frequencies across populations, and almost half of the SNPs in Illumina array have higher Fst population divergence values than rs4307059 (that is, half the SNPs are more variable than rs4307059 across human populations). Below I elaborate these points more specifically for interested readers.

  16. (1) McClellan et al use the fact that most detected SNPs in GWAS are from intergenic regions to question the utility and the reliability of GWAS, and raised a serious question “How did genome-wide association studies come to be populated by risk variants with no known function?”. In fact, GWAS do not attempt to identify functional SNPs, but rather identify approximate location of loci that harbor disease variants. This is possible due to the extensive linkage disequilibrium (LD) between segregating sites in a given human population. Most SNPs in SNP arrays have unknown biological function, only because most SNPs in HapMap are outside of coding regions and because manufacturers of SNP arrays usually do not select SNPs by known function. Unfortunately, this fact may not be well known outside of the GWAS community, such as most readers of the journal Cell. McClellan et al did mention LD but they did not recognize that GWAS do not attempt to interrogate causal variants in the first place. More interestingly, they discussed the SCA GWAS and hearing loss GWAS that I published; the hits in both GWAS are actually outside but close to the causal gene (HBB and GJB2), yet they tag exonic variants in the causal gene, representing two particularly vivid and classic examples on how GWAS work through LD. It is unclear how McClellan et al can discuss these two examples extensively by ignoring the basic facts that both non-coding hits indeed faithfully tag the causal variants in causal genes through the magic of LD. For readers not familiar with GWAS, I need to also emphasize that GWAS variants were typically referred to as “risk variants” only because of convention of published literature, not because they are the actual functional variants that confer risk. 
Unlike what some readers may think based on McClellan et al, the fact that 100% of Africans carry a risk allele does not suggest that all subjects of African descent are predisposed to risk; it merely suggests that LD patterns in European and African populations at a locus are different. One cannot interpret GWAS results without acknowledging these basic facts.

    (2) McClellan et al erroneously attributed many published GWAS hits as caused by population stratification, as if GWAS used similar strategies as candidate gene association studies. Without any scientific support, they even claimed that “an odds ratio of 3.0, or even of 2.0 depending on population allele frequencies” would be robust to be interrogated in GWAS. In fact, the beauty of whole-genome SNP data is that inflation of test statistics due to population substructure can be identified and adjusted. Populations do not differ in one or two SNPs; they differ in many loci and that explains why whole-genome data helps identify stratification, and several recent studies already show how extremely fine-scale sub-populations in Europe can be separated by whole-genome data. The GWAS community has established methods to deal with population stratification and these methods are fairly effective for common variants without any controversy in the field. There are certainly some challenges on analyzing rare variants or recently admixed populations, and these are research topics that we are actively studying. McClellan et al failed to inform readers of the standard practices of genomic control, EigenStrat, multi-dimensional scaling or many dozens of other approaches for addressing stratification, which are now commonly used in case/control GWAS. Furthermore, family-based study design in GWAS has the advantage of protecting against stratification, which should be emphasized to readers. For example, McClellan et al attack our autism paper as false positive due to population stratification, but our paper is largely driven and replicated by family-based cohorts, not case/control cohorts. Therefore, their general claim lacks scientific support, ignores massive amounts of work by the statistical genetics community in developing stratification adjustment methods, and reflects unrealistic speculation and unfamiliarity with standard GWAS practices.

  17. (3) McClellan et al mistakenly treated GWAS hits as “false positive”, if their allele frequencies vary across European populations or HapMap populations. The allele frequency variation for ANY (I mean it, ANY!) SNP across populations is not something that should be surprising to researchers with substantial GWAS knowledge. Of course, it is the very nature of ANY SNP to have variable allele frequencies across human populations, so that Asians, Caucasians and Africans differ from each other. I have no idea what McClellan et al are surprised about, as they probably thought that most SNPs should have similar allele frequencies in all populations. Specifically, they treated the SNP rs4307059, reported by us to be associated with autism, as a “particularly dramatic example of the perils of cryptic population stratification”. Their reasoning on “stratification” is that the frequency of the proposed risk variant varies from 0.21 to 0.77 across European populations and that it is monomorphic in African populations. In reality, the allele frequency of rs4307059 is fairly consistent among large cohorts of European Americans (MAF=39%), WTCCC (MAF=38%), POPRES British (MAF=39%), POPRES Spanish (MAF=37%). In HGDP data, I did confirm that the allele frequency differs in Tuscany (MAF=75% in 7 samples, yes you read it right, SEVEN) and Orcadian (MAF=25% in 15 samples), but readers should be aware that a frequency estimate depends on the sample size (seriously, mathematically, what would you expect from 7 or 15 samples, and how much do these two populations contribute to genes in European Americans?).
Furthermore, assuming that allele frequency measures are indeed accurate, if we want to do science rigorously, we need appropriate control experiments, so let us compare this SNP with others in the same genomic region: there is no evidence of increased population differentiation for this particular SNP in the 2Mb genomic region across human populations (chr5:25500000..26499999 in the HGDP browser http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/). Finally, if we examine the SNP in the context of the whole genome, based on HGDP browser, we can see that 44% of SNPs (-log(0.44)/log(10)=0.35 for rs4307059 in the “Fst” track, raw data http://hgdp.uchicago.edu/Browser_tracks/FST/) in the Illumina array have more extreme Fst values than this SNP, so about half of the SNPs have stronger population divergence than this SNP. One cannot just take a random SNP from the MIDDLE of a ranked list and claim it as a “particularly striking” example of population stratification. Any such claim needs to be made in the context of comparative analysis with other SNPs, otherwise it is not a scientifically rigorous practice and serves a purpose solely to misinform readers outside of the field.

  18. (4) McClellan et al mistakenly interpreted the hearing loss GWAS and SCA GWAS that we published in PLoS Biology. Interestingly, they even have a somewhat opposite interpretation of the primary research data presented in our paper: our original purpose is to demonstrate how rare variants may contribute to human diseases (and may show up in GWAS through LD with common SNPs in Illumina arrays), so our paper should really be interpreted as supporting the arguments for studying rare variants in their paper. For readers, I need to clarify that SCA is a classic example of heterozygosity advantage in any genetic textbook, and our study demonstrates how rare alleles under balancing selection can show up in GWAS. On the other hand, hearing loss is known to be caused by many genes but the major cause is GJB2 mutation, so the GWAS demonstrates that moderately rare alleles (MAF=1.2%) can be picked up by GWAS without balancing selection. I simply do not understand what they are trying to get by “had inherited hearing loss been investigated in a region where it is more common (e.g., in the Middle East), ……”, as any GWAS should be focused on a specific ethnicity group, and I cannot just combine Caucasians with Middle East people together and of course this will dilute the signal in GWAS. Why would I even bother to apply GWAS “in heterogeneous populations of common diseases” at all, as suggested by McClellan et al, when the very power of GWAS comes from examination of LD? I do not understand how they can take the exactly same results and re-interpret the data and get a drastically different interpretation from the data.


    (5) McClellan et al’s interpretation of the autism locus is wrong. McClellan et al utilized this as an example of “false positive”, without any valid scientific evidence (differences of allele frequencies in Tuscany and Africans does NOT suggest false positive in European Americans!). Another study (Weiss et al) cited by McClellan et al was not able to garner evidence for this SNP, but the study has very small non-overlapping sample size and therefore little power to “replicate” loci with moderate effect sizes. Furthermore, Weiss et al used family-based association test (TDT test), so there is no comparison of case/control allele frequencies as mentioned by McClellan et al. I seriously doubt whether McClellan et al actually read either paper carefully, otherwise I do not see where a gross mis-interpretation of primary research data could come from. Due to power issues and sample comparability issues, Weiss and Arking (both are nice people who I know) faithfully described their research results in the paper without comments, yet McClellan et al mistakenly interpolate these primary results without scientific support and attach a “false positive” label that completely misled the scientific community. On the other hand, McClellan et al failed to mention another companion study identifying this same locus purely by family-based cohorts (Annals of Human Genetics). In addition, a paper in press shows that the SNP also functions as a quantitative trait locus for autistic traits in ~8000 children in a single UK city born in the same year, which pretty much blows away any concern on stratification in case/control studies. For me, this is compelling evidence that population stratification does not explain the signal, though I think that functional studies are certainly necessary to identify causal variants and to study their roles.
In summary, their criticism on the autism locus lacks any rigorous scientific support whatsoever, and can probably be better explained by non-scientific reasons.

  19. I will send a shortened version of my comments to Cell. I cannot predict what will be the outcome of this appeal, but I would appreciate comments from readers of this post and I will try to address them. I wonder what is the appropriate balance between academic freedom and scientific responsibility for researchers to make comments on subjects outside of their expertise in the absence of rigorous scientific support; I also wonder what is the appropriate standard for basic fact checking for journals to publish especially strong claims, even for non-research articles (essays/commentary/review), and what is the appropriate response from well-respected journals to recognize and rectify these mistakes. Let us wait and see.

  20. Sorry if I join this discussion only now but I've just discovered this beautiful blog. This paper (doi:10.1371/journal.pone.0007927) reports, even if with a trivial approach, an enrichment for genes with a high level of population differentiation among genes associated with complex diseases. It is well recognized that population differentiation is one possible signature of natural selection. Within the limits of this approach, I think the authors gave reasonable support to the CDCV hypothesis. Despite this, I absolutely don't believe in a black-or-white world... Raise a hand if you actually believe that one hypothesis between CDRA and CDCV is totally wrong.

  21. It certainly appears that those who adhere to the dogma of GWAS (since I am unable to detect it in the above discussion) cannot provide a plausible rebuttal to the fundamental and inherent limitation of genome wide association studies. The strength of an association between a SNP and a causative mutation is only as good as the relative ages of the two loci. Recurring mutations on different genetic backgrounds and recombination events will only degrade the association with time. It may be true that a statistically significant association may be detected at one time; but the association will be attenuated as a function of time.

  22. Agreed Dr. F and I also believe that anticipation pedigree charts and AS/PWS mutations are nothing short of a B****


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.