Tuesday, December 22, 2009
Sync files across multiple computers with Dropbox (PC, Mac & Linux too!)
Do you ever find yourself switching back and forth between your work computer, your laptop, and your home computer? This happens to me all the time when I'm writing. Rather than carry all your files on a USB stick and risk losing it or corrupting your data, give Dropbox a try. It's dead simple, and it works on PC, Mac, and Linux too.
Once you sign up and install Dropbox on all your computers, you'll have a special folder: anything you save there on one computer is automatically created and kept synchronized in the same folder on all your other computers. What's more, if you use someone else's computer, you can access all your files through a web interface because they're all securely backed up online. I've been using this for a while now to sync all the papers I'm working on, RefMan/EndNote databases, config files, and R functions I reuse all the time. You can also create "public" folders: put something there, and you get a direct link to the file online to share with other folks. For example, here's a link to some R code I wrote that uses ggplot2 to make manhattan plots and QQ-plots for every PLINK output file in the current directory (I'm hoping to clean this code up and include it with some other functions I've written in a package on CRAN soon).
I can't recommend this little app enough. If you're still not convinced, check out this short video that explains what Dropbox is all about and shows off just how simple it is to use.
You get a whopping 2GB for free, but if you use the registration link provided below, you'll get an extra 1/4GB for free. Happy holidays from GGD, and I'll catch up with you all next week!
Dropbox - Secure online backup and synchronization
Tags:
Productivity,
Software,
Writing
Thursday, December 17, 2009
Review: The challenges of sequencing by synthesis
A tip of the hat to a commenter on my previous coverage of a next-gen sequencing paper for pointing out this detailed and perhaps more technically oriented review on sequencing by synthesis, recently published in Nature Biotechnology. Thanks, Clive.
Review: The challenges of sequencing by synthesis (Nature Biotechnology)
Tags:
Recommended Reading,
Technology
Wednesday, December 16, 2009
Recent improvements to Pubget
If you've never heard of it before, check out my previous coverage of Pubget. It's like PubMed, but you get the PDFs right away. Pubget has recently implemented a number of improvements.
1. Citation matching. Pubget's citation matcher seems to work better than PubMed's most of the time. Try going to Pubget and pasting any of these random citations into the search bar:
J Biol Chem 277: 30738-30745
Nucleic Acids Res 2004;32:4812-20.
Evol. Biol. 7, 214 (2007).
2. The PaperPlane bookmarklet. Go here and drag the link to your bookmark toolbar. Now, if you're searching from PubMed, click the bookmarklet for one-click access to the PDF.
3. If you have a long list of PMIDs, separate them with commas and you can paste them directly into the search bar.
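If your PMIDs live in a text file, one per line, a one-liner can build the comma-separated string for you. Here's a minimal sketch, assuming a POSIX shell with tr, grep, and paste available. join_pmids is a hypothetical helper name, and the IDs below are placeholders rather than real references.

```shell
# join_pmids: read PMIDs from stdin (one per line) and print them
# as a single comma-separated line, ready to paste into a search bar.
# Hypothetical helper; the IDs used below are placeholders.
join_pmids() {
  tr -d ' \t\r' |      # strip stray spaces, tabs, and Windows line endings
    grep -v '^$' |     # drop blank lines
    paste -s -d, -     # join the remaining lines with commas
}

printf '11111111\n22222222\n\n33333333\n' | join_pmids
# prints 11111111,22222222,33333333
```

With a real list you would run something like `join_pmids < pmids.txt`, where pmids.txt is whatever file holds your IDs.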
Pubget (Vanderbilt institutional link)
Pubget (If you're anywhere else)
Tuesday, December 15, 2009
Seminar announcement: A Multivariate Methodology for Analyzing Genome-wide Association Studies
This looks interesting.
Department of Biostatistics Seminar/Workshop Series: A Multivariate Methodology for Analyzing Genome-wide Association Studies, by Janice Brodsky, PhD, UCLA.
Wednesday, December 16, 1:30-2:30pm, MRB III Conference Room 1220
Intended Audience: Persons interested in applied statistics, statistical theory, epidemiology, health services research, clinical trials methodology, statistical computing, statistical graphics, R users or potential users
In the last few years, high-dimensional genome-wide association (GWA) studies have become a common tool in genetics for investigating which genes are associated with physical traits. However, the results of many GWA studies have fewer genes than expected or even no genes at all. This does not necessarily indicate that there are no genetic associations in the data: genes with weaker associations or which only work in groups will be missed with the standard GWA statistical analysis. We present a multivariate methodology for analyzing GWA data which is designed to handle weaker signals, dependent data, and multicollinearity. We applied this method to a large GWA study, and the results were consistent with previously performed studies. We also discuss extensions of the methodology.
Tags:
Announcements
Follow GGD on Twitter @genetics_blog
GGD is now on Twitter! I'll be linking to all of our posts on the Twitter page, and occasionally posting something there that may not make its way into a full-length post here on the blog. You can follow us on Twitter here @genetics_blog.
Tags:
Announcements,
Twitter
Browse R Graphics with the R Graph Gallery and the R Graphical Manual
One of R's biggest strengths is its unparalleled graphing capabilities. Just see any of our previous posts on ggplot2, visualization, or other posts tagged with R. R has several fundamentally different systems for plotting, including base graphics, lattice, and ggplot2. Furthermore, many add-on packages come with their own functions for producing problem-domain specific graphics. For example, see GenABEL, a very nice R package for GWAS analysis, which has functions for producing manhattan plots, LD plots, etc.
Now let's say you've seen a certain graphic before, and you want to find the package you need to download and which function you should use to make the plot. That's where the R Graph Gallery and the R Graphical Manual can become very useful. Both sites give you thumbnail previews of graphics produced by functions bundled with certain R packages, code for producing the graphic, and which R packages you need to download for the functions used to create the graphic. The R Graphical Manual is much more comprehensive, and is categorized based on CRAN Task Views (CTV) categories (check out all 29 pages of graphics in the Genetics task view).
R Graphical Manual
R Graph Gallery
Tags:
R,
Visualization
Monday, December 14, 2009
Sequencing technologies — the next generation
Following up on last week's coverage of the Genotyping Portal, check out this new review article on next-generation sequencing in Nature Reviews Genetics. One major theme of the paper is that the next-generation sequencing platforms each use fundamentally different technologies. Because of this, multiple platforms will likely coexist in the marketplace, each with clear advantages over the others for particular biological applications. The paper has some nice figures illustrating how the technologies work: sequencing by reversible terminators used by Illumina/Solexa and Helicos BioSciences, emulsion PCR used by Life/APG's SOLiD ligation platform and the Roche/454 pyrosequencing system, and the highly anticipated real-time single-molecule sequencing from Pacific Biosciences. There's also a table giving the pros, cons, biological applications, cost, read length, run time, and references for each of the next-gen sequencing platforms. Finally, a revealing piece of information from the last table, which gives sequencing statistics on personal genomes: sequencing Stephen Quake's genome with Helicos a few months ago cost only $48,000, a decrease of roughly three orders of magnitude compared to the sequencing of J. Craig Venter's genome (Sanger), which cost an estimated $70,000,000 just a few years ago.
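As a quick back-of-the-envelope check on the size of that drop, here is a small sketch that uses only the two cost figures quoted above. It assumes a POSIX shell with awk, and cost_drop is a hypothetical helper name.

```shell
# cost_drop BEFORE AFTER: print the fold change between two costs and
# the same change expressed in orders of magnitude (log base 10).
# Hypothetical helper for illustration only.
cost_drop() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    ratio = before / after
    printf "%.0f-fold cheaper, about %.1f orders of magnitude\n", ratio, log(ratio) / log(10)
  }'
}

cost_drop 70000000 48000
# prints 1458-fold cheaper, about 3.2 orders of magnitude
```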
The author of the paper, Michael L. Metzker, is an associate professor of genetics at Baylor College of Medicine, a senior manager at the Human Genome Sequencing Center at Baylor, and President & CEO of LaserGen, Inc., Houston, TX.
Sequencing Technologies - The Next Generation (NRG AOP)
Tags:
Recommended Reading,
Technology
Tuesday, December 8, 2009
Genotyping Portal: A comprehensive (and freely available) online resource about methods for DNA genotyping, screening and sequencing
Diego Forero has compiled a comprehensive list of primary publications on commonly used SNP genotyping and DNA sequencing technologies (including SNP arrays, Sequenom, TaqMan, Pyrosequencing, Molecular Beacons, FP-TDI, Invader, xMAP, SNaPshot, SNPlex, Sanger, 454, Illumina, Helicos, SOLiD, Complete Genomics, Bisulfite sequencing, and others). Also included here are links to review articles, protocols, and links to manufacturers of reagents and equipment. Where available, links are included to open access versions of the papers on PubMed Central.
This is an excellent resource for anyone who is generally interested in how these technologies work. If you're a 2nd-year grad student at Vanderbilt, you will be asked about some of these technologies on your qualifying exam!
Genotyping Portal: A comprehensive (and freely available) online resource about methods for DNA genotyping, screening and sequencing
Tags:
Technology
Monday, December 7, 2009
Use PuTTY and XMing to see Linux graphics via SSH on your Windows computer
Do you use SSH to connect to a remote Linux machine from your local Windows computer? Ever needed to run a program on that Linux machine that displays graphical output or uses a GUI? I was in this position last week, trying to make ggplot2 figures in R from a GWAS analysis that required a 64-bit Linux machine with more RAM than my 32-bit Windows machine can see.
If you try plotting something in R on a Linux machine in an SSH session, you'll get this nasty error message:
Error in function (display = "", width, height, pointsize, gamma, bg,:
X11 I/O error while opening X11 connection to 'localhost:10.0'
Turns out there's a very easy way to see graphical output over your SSH terminal. First, if you're not already using PuTTY for SSH, download putty.exe from here. Next, download, install, and run Xming. While Xming is running in your system tray, log into the Linux server as you normally would using PuTTY. Then type this command at the terminal to log into the Linux server of your choice (here, pepperjack), with the -X (uppercase) option to enable X11 forwarding.
ssh -X pepperjack.mc.vanderbilt.edu
If all goes well, you should now be able to use programs with graphical output or interfaces that run on the remote Linux machine rather than on your local Windows computer.
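If plots still fail with the X11 error, one thing worth checking is whether the remote session actually has a display to talk to. Here's a minimal sketch, assuming a POSIX shell on the Linux side. check_x11 is a hypothetical helper name, not part of PuTTY or Xming. When X11 forwarding has been negotiated, sshd normally sets the DISPLAY variable to something like the localhost:10.0 seen in the error message above.

```shell
# check_x11: report whether this session appears to have X11 forwarding.
# Hypothetical helper; it only inspects the DISPLAY environment variable
# and does not try to open a window.
check_x11() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding looks active: DISPLAY=$DISPLAY"
  else
    echo "DISPLAY is not set: X11 forwarding is probably off for this session"
  fi
}

check_x11
```

If DISPLAY turns out to be empty, my assumption is that forwarding was dropped somewhere along the chain. In that case, check that Xming is still running and that "Enable X11 forwarding" is ticked in PuTTY under Connection > SSH > X11 before reconnecting.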
Xming - PC X Server
Xming download link on SourceForge
Tags:
Linux,
Visualization
Tuesday, December 1, 2009
Get Started with Machine Learning in R
A Beautiful WWW put together a great set of resources for getting started with machine learning in R. First, they recommend the previously mentioned free book, The Elements of Statistical Learning. Then there's a link to a list of dozens of machine learning and statistical learning packages for R. Next, you'll need data. Hundreds of free real datasets are available at the UCI machine learning repository. Each dataset, such as this breast cancer dataset from Wisconsin, has its own page giving a summary, links to publications of major findings, and detailed descriptions of the variables in the data. If you want to simulate genetic data, check out our software, genomeSIMLA, which can simulate gene-gene interactions in case-control and family-based GWAS-sized datasets with realistic patterns of linkage disequilibrium. If you're interested, check out the genomeSIMLA paper. Finally, if time is not an issue, consider taking MIT's OpenCourseWare Machine Learning course. Alternatively, check out the course taught by Stanford Engineering professor Andrew Ng: all his lectures are available on YouTube. Here's the first lecture.
For more, check out the link below.
A beautiful WWW: Guide to Getting Started in Machine Learning
Tags:
Machine Learning,
R







Will Bush, contributor, is a postdoctoral fellow in the 