Comments on Getting Genetics Done: Mapping SNPs to Genes for GWAS Enrichment Analysis
Stephen Turner (feed updated 2023-09-25)

2014-01-23 04:45
Hi Stephen
Thanks for the great website, and all the tutorials.
I have a question, if you could assist me: I have the output from a GWAS, and now want to do the permutation-based set-based test in Plink to get a gene-based analysis. For that I need a list with all rs numbers assigned to each gene. After seeing all these steps I am not sure the result from here is what I want. Am I doing something wrong, or could you advise which other tools I should use?

Thank you in advance,
my best regards

Nuno
Posted by Anonymous

2013-05-13 13:59
Hi Stephen
Thanks very much for sharing experiences.
I am a big fan of Getting Genetics Done. It is really helpful to me.
However, I found that GWAS enrichment resources for some species such as pig are very poor, while many overlapping packages focus only on human data. The only way we can proceed is to use Biomart to get all GO terms and KEGG to get all pathways, then perform the tests ourselves.
Do you have any suggestions in this case?
Best regards
DDO fc

2013-04-03 11:17
Hi Stephen!
I have been using this annotation method and I wanted to share an issue I came upon.
When I download the *.bed from UCSC, the chromosome column format is, e.g., "chr4", whereas in the list you provide the format is simply "4". If we execute bedtools this way, it will not retrieve any results because the chromosome names do not match. Even if it is a straightforward thing to correct, it could complicate things at first until you realize what is happening.
Apart from this, I have to say that the whole approach you suggest works fine. And the gene lists are especially appreciated... ;)

Best,

juan
Posted by Anonymous

2013-03-11 10:42
I've figured out the solution: the intersection is empty because the first columns with the chromosome numbers don't match (in one file it's chr1, in the other it's just 1). We can fix the problem by adding chr to the first column of entrezgenes.bed. This command creates the file entrezgeneschr.bed, where the prefix chr is added:
sed -e 's/^/chr/' entrezgenes.bed > entrezgeneschr.bed
Great thanks for the blog!
Posted by Anonymous

2013-03-11 08:24
Hi Rish, I have the same problem now. Had you figured out a solution?
Posted by Anonymous

2012-12-05 15:35
This data is now citeable.
See citation details at http://dx.doi.org/10.6084/m9.figshare.103113.
Posted by Stephen Turner

2012-09-04 14:24
thanks for providing the gene files, i am going to use them.
Posted by raha

2012-01-26 09:14
Oh goodness, I had not even thought of it that way. Thank you. I will dig more and see where I go. I appreciate your help...
Posted by Lekki

2012-01-26 09:03
Lekki - please see some of the links in the first paragraph of this post. You have to be careful with pathway or enrichment analysis of GWAS data. You have a problem here that you don't have with microarrays - namely, that you have many SNPs per gene, different gene sizes, and different SNP densities in genes. Think of the problem like this. Imagine the null is true - you have random numbers for your phenotype. A large, SNP-dense gene, say a gene with 100 SNPs, will have about 5 SNPs with a nominal p-value of <0.05. Compare this to a gene with only one SNP - it probably won't be <0.05 by chance. So you have this problem that large, SNP-dense genes are more likely by chance to have a significantly associated SNP than smaller, SNP-sparse genes. If you condense entire genes down to a single best p-value, you're biasing any kind of downstream pathway or enrichment analysis towards larger genes with more SNPs.
And this doesn't even get into the LD problem. Methods like ALLIGATOR have permutation procedures built into the method to account for this problem. So do some of the SNPath functions. Take a look at some of those to get you started.
Posted by Stephen Turner

2012-01-26 07:49
I am perhaps too much of a dilettante to be meandering this board... but all people messing with genetics data should at least try to do the best analysis, right? So, I would like to do a gene-based analysis from GWAS data, i.e. get a p-value per gene (preferably corrected for multiple testing), not per SNP. I am at the point where I have just my GWAS output: a list of rs numbers from the Affy 6.0, their chr, position and the p-value for association with a continuous trait. Any ideas how I might be able to go from this to a gene-based p-value set? I have no a priori set of genes that I am interested in, so anything that involves looking things up by hand is going to be somewhat unfeasible.

Thanks for any pointers anyone can give,

Lekki
Posted by Lekki

2011-12-16 10:11
Hi, Thank you for providing such user-friendly tools! I tried to use your code to map a single SNP to multiple genes within a certain distance range, but I have the following problems:

1) The following command will produce negative starting positions for some SNPs and it seems your tool "intersectBed" doesn't allow negative positions.
awk '{printf("%d\t%d\t%d\t%s\n", $1, $2-10000, $3+10000, $4); }' ensemblgenes.bed

2) After I corrected the negative positions by replacing them with 0s in the bed file, the following command ran but produced an empty result.
intersectBed -a SNPs.bed -b entrezgenes.bed -wa -wb | awk '{printf("%s\t%s\n", $4, $8);}'
Posted by Wen

2011-12-03 11:53
Hello, This post is very helpful; however, when I run the command, it does not print anything to the screen or create a new file, so I'm not sure it is running properly. I followed all the instructions carefully. I even tried intersecting with all three gene lists, with no luck.

Where would this new tab-delimited file be created?

I am not even getting any errors when running the command. Would this just mean that my list of SNPs is not mapping to any gene?

Thank you so much for your help
Posted by Rish

2011-07-29 15:13
[Full disclosure: coauthor of following rec]
You might also want to look at DAPPLE, which defines enriched networks from protein interaction data rather than testing previously defined pathways.

http://www.broadinstitute.org/mpg/dapple/dapple.php

Cheers
Chris
Posted by Chris Cotsapas

2011-07-20 14:14
That's not an ignorant question at all! First, I believe the HUGO file has 20975 (not 22975) -- still, there are ~3000 genes that don't overlap. When building these lists, we examined the agreement between what Ensembl calls a "gene" and what Entrez calls a "gene". The 17685 from Entrez represents the set of genes where the two resources agree. We actually pulled the HUGO symbols from Ensembl and didn't really restrict that set at all. Most of the genes that aren't common between the sets are computational predictions or genes that aren't characterized very well and thus aren't likely to be useful for enrichment analyses.
Posted by Will

2011-07-20 07:37
Thanks for the helpful blog, it's such a great resource!

One quick question. I can see that some HUGO genes don't have a corresponding Entrez ID (22,975 HUGO genes in the genelist vs 17,685 Entrez IDs). Forgive the ignorant question, but I am very keen on using these genelists, so just wondering why this is?
Posted by Anonymous

2011-07-13 01:57
What great timing!
I was just sitting down to figure out how to do this, and now I see you've done it already!
Posted by Anonymous

2011-07-09 07:50
Hi.
We also have a tool, PfSNP (http://pfs.nus.edu.sg), which can easily map SNPs to genes. It requires the rs number and gives extra information, such as whether the SNP is potentially functional or not.
Posted by Jingbo Wang

2011-07-07 10:25
Thanks Gerome! I haven't used FORGE yet, but I'll add it to the list -- I really like the idea of a gene-based statistic for enrichment analyses. It seems like a better way to handle the biases due to gene size, etc.
Posted by Will

2011-07-07 07:16
Hi,
We have a perl utility which can make these files using the Ensembl Perl API as part of our tool FORGE. This link gives a very short tutorial:

https://github.com/inti/FORGE/wiki/Tutorials-1%3A-Make-snp-to-gene-mapping.

It can be downloaded from the other link below, along with the rest of the package, to conduct gene-based analysis of GWAS and subsequent pathway analysis.

https://github.com/inti/FORGE

Regards

Gerome Breen gerome.breen AT kcl.ac.uk
Posted by Gerome Breen
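
Two problems come up repeatedly in the comments above: chromosome names that differ between files ("chr1" vs "1"), which makes intersectBed return nothing, and negative start positions after padding genes by 10 kb. Below is a minimal sketch of one way to handle both in a single awk pass, assuming a four-column, tab-separated BED file. The file names genes_demo.bed and genes_demo_padded.bed, the demo gene names, and the pad_genes helper are made up for illustration and do not come from the original post.

```shell
#!/bin/sh
# Tiny made-up input so the sketch runs on its own (hypothetical genes).
printf '1\t5000\t20000\tGENE_A\nX\t150000\t160000\tGENE_B\n' > genes_demo.bed

# pad_genes FILE: hypothetical helper, not part of bedtools.
# Pads each interval by 10 kb on both sides, clamps negative starts to 0,
# and adds the "chr" prefix only when it is missing.
pad_genes() {
  awk -v pad=10000 'BEGIN { FS = OFS = "\t" }
  {
    chrom = ($1 ~ /^chr/) ? $1 : "chr" $1   # normalize chromosome naming
    start = $2 - pad
    if (start < 0) start = 0                 # avoid negative BED starts
    print chrom, start, $3 + pad, $4         # chromosome kept as a string
  }' "$1"
}

pad_genes genes_demo.bed > genes_demo_padded.bed
cat genes_demo_padded.bed
```

The padded file could then serve as the -b input to the intersectBed command quoted in the thread, provided the SNP file uses the same "chr" naming. Unlike the sed one-liner above, this version adds the prefix only when it is missing, so a file that already uses "chr1" is not turned into "chrchr1".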