Monday, May 9, 2011

Accessing Databases From R

Jeffrey Breen put together a useful slideshow on accessing databases from R. I use RODBC every single day to access my own local MySQL server from R. I've had trouble with RMySQL, so I've always used RODBC instead, after setting up my localhost MySQL server as a Windows data source. Once you get accustomed to pulling your data directly with SQL queries rather than dumping it to flat files, you'll wonder why you waited so long. After you get a handle on SQL, you can take it one step further and run SQL queries on data.frames with the sqldf package for quick and easy data management and group summaries (which happens to be way faster than plyr, aggregate, doBy, and data.table for simple grouping tasks).
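To give a rough idea of what the sqldf approach looks like, here is a minimal sketch of a group summary written as an SQL query against an ordinary data.frame, with base R's aggregate() alongside for comparison. The data frame `pheno` and its columns are hypothetical, made up for the example, and the sketch assumes the sqldf package (with its default RSQLite backend) is installed. It only shows the syntax; it makes no claim about relative speed.

```r
# Minimal sketch: an SQL-style group summary on a data.frame using sqldf.
# Assumes sqldf and its default SQLite backend (RSQLite) are installed.
library(sqldf)

# Hypothetical example data: a genotype group and a quantitative trait.
pheno <- data.frame(
  genotype = c("AA", "AA", "AB", "AB", "BB", "BB"),
  trait    = c(1.0, 3.0, 2.0, 4.0, 5.0, 7.0),
  stringsAsFactors = FALSE
)

# sqldf copies the data.frame named in the query into a temporary
# SQLite database, runs the SQL there, and returns a data.frame.
by_geno <- sqldf("SELECT genotype, COUNT(*) AS n, AVG(trait) AS mean_trait
                  FROM pheno
                  GROUP BY genotype
                  ORDER BY genotype")
print(by_geno)

# The same group means with base R's aggregate(), for comparison.
base_means <- aggregate(trait ~ genotype, data = pheno, FUN = mean)
print(base_means)
```

The SQL version reads the same whether the table lives in MySQL (via RODBC) or in memory as a data.frame, which is much of the appeal once you are used to writing queries.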



Also, if you're new to databases, check out Will's previous post on how to store and organize results from a genetic study using MySQL, or take a look at the w3schools SQL tutorial.

Greater Boston UseR Group Files (via @RevoDavid)

2 comments:

  1. How does RODBC compare to RJDBC? We are forced to use the latter at work; it's very, very slow, but eventually it works.



Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.