Getting Genetics Done: PubMed


Tuesday, October 22, 2013

PubMed Commons: One post-publication peer review forum to rule them all?

Several post-publication peer review forums already exist, such as Faculty of 1000 or PubPeer, that facilitate discussion of papers after they have already been published. F1000 only allows a small number of "faculty" to comment on articles, and access to read commentary requires a paid subscription. PubPeer and similar startup services lack a critical mass of participants to make such a community truly useful. And while the Twitter-/blogosphere space is great for discussing scientific research, commentary is fragmented, scattered across many blogs, Google+, and Twitter feeds (not to mention, discussion content is owned by Twitter, Google, and other dot-com services with no guarantee of permanence).

Enter PubMed Commons.

PubMed Commons is a system that enables researchers to share their opinions about scientific publications. Researchers can comment on any publication indexed by PubMed, and read the comments of others. PubMed Commons is a forum for open and constructive criticism and discussion of scientific issues. It will thrive with high quality interchange from the scientific community. PubMed Commons is currently in a closed pilot testing phase, which means that only invited participants can add and view comments in PubMed.

PubMed Commons is currently in an invitation-only pilot phase. But if this catches on, it could easily become the post-publication peer review forum that solves many of the problems mentioned above. PubMed Commons would be open to anyone who registers with a My NCBI ID, meaning discussion would not be limited to a select few "faculty." Discussion would be collated and hosted by NCBI/NLM, which I like to think has a longer half-life than Google's latest foray into social networking or other dot-com ventures. Most importantly, PubMed use is ubiquitous. Whether you use PubMed's search directly or you land on a PubMed abstract from Google Scholar, you'll almost always link out to a paper from a PubMed abstract. This means that the momentum and critical mass to make a forum like this actually useful already exist.

The platform for publishing comments looks pretty simple (and even supports Markdown syntax!).

Another critical feature is the ability to invite the authors of the paper to join the discussion.

Right now PubMed Commons is invitation-only, but I'm optimistic about the public launch and hope to see this catch on.

PubMed Commons: http://www.ncbi.nlm.nih.gov/pubmedcommons/

Friday, February 17, 2012

Your Publications (with PMCID) as a PubMed Query

I'm updating my CV and biosketch for a few grant applications, and for some time now, NIH has required you to include the PubMed Central ID for each article you publish that arose from NIH support. I only have a dozen or so papers indexed in PubMed, but I still wanted a way to do this automatically. If you have scores of publications, looking up all the PMCIDs could easily become a hassle.

First, create an account at My NCBI. Under your bibliography, click "Manage My Bibliography." Then click "Add citation," then in the new window that comes up, select "Citation from PubMed" and hit the "Go To PubMed" button.

Now the trick here is constructing a PubMed query that will get your publications only. There are lots of Stephen D. Turners out there, so I had to get creative. This query construction tip comes to me by way of my colleague here at UVA, Aaron Mackey:

For many people, simple PubMed author searches suffice, e.g. "Pearson WR"[Author]. For some, such name-based searches get it mostly right, but may include a few spurious hits. For these cases, it's easy enough to exclude those false hits explicitly (e.g. "Mackey AJ"[Author] NOT 9850730[PMID] NOT 10730495[PMID] gets rid of the two AJ Mackey publications that are not, in fact, mine). For others, simple author searches do not suffice at all, but usually adding an institution and/or departmental affiliation does narrow the results sufficiently (e.g. for Jeff Smith, Biochemistry: "Smith JS"[au] AND "University of Virginia"[Affiliation] AND "Biochemistry"[Affiliation] identifies the 16 articles for which Jeff Smith is the senior author; Jeff could also add a few collaborative publications by adding those PubMed IDs to the search, i.e. adding "OR 17482543[PMID]" to the end of his query).

When I did this for myself, I searched by author, AND (any of my institutional affiliations separated by ORs), but NOT (any of the PMIDs that were not mine, separated by ORs). Apparently there was once another Stephen D. Turner at UVA in the department of Urology. Here are the results returned by my unique query:

"Turner SD"[Au] AND ("James Madison"[Affiliation] OR Vanderbilt[Affiliation] OR Hawaii[Affiliation] OR "University of Virginia"[Affiliation]) NOT (11514333[PMID] OR 11058553[PMID])

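If you'd rather assemble a query like this programmatically, here is a minimal Python sketch. The function name build_author_query and its parameters are my own invention for illustration, not part of any PubMed tool; it simply glues together the author, affiliation, and PMID-exclusion pieces described above.

```python
def build_author_query(author, affiliations=(), exclude_pmids=()):
    """Compose a PubMed query string for one author's publications.

    Hypothetical helper: it only builds the text of the query; you still
    paste the result into the PubMed search box yourself.
    """
    def quote(term):
        # Quote multi-word terms so PubMed treats them as a phrase.
        return f'"{term}"' if " " in term else term

    query = f'"{author}"[Au]'
    if affiliations:
        affil = " OR ".join(f"{quote(a)}[Affiliation]" for a in affiliations)
        query += f" AND ({affil})"
    if exclude_pmids:
        excluded = " OR ".join(f"{pmid}[PMID]" for pmid in exclude_pmids)
        query += f" NOT ({excluded})"
    return query


print(build_author_query(
    "Turner SD",
    affiliations=["James Madison", "Vanderbilt", "Hawaii", "University of Virginia"],
    exclude_pmids=[11514333, 11058553],
))
```

Called with the values above, this reproduces the query shown in this post. Keeping the exclusions in a list makes it easy to add another PMID the next time a namesake publishes.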
The final step is clicking the "Send to" link at the top right, and sending the results of your query to My Bibliography.

Now, when you are back at My NCBI, you should see a list of all your publications, complete with both the PMID and PMCID, ready to go in your biosketch.

You can then export this bibliography as text, or simply copy/paste. Finally, you have the option of making your bibliography public (example).

Friday, May 20, 2011

Using NCBI E-Utilities

NCBI has put a lot of effort into unifying their data access and retrieval system -- whether you are searching for a gene, protein, or publication, the results are returned in a similar fashion.

What most people don't realize is that this Entrez system is easily adapted for programmatic access (there are lots of details here). For example, I was recently interested in building a co-authorship network for a few investigators in our center, and rather than searching for and exporting this information using the PubMed website, I used the Entrez E-utilities inside a Perl script. Python, Ruby, and other scripting languages work great too, but I have gotten used to Perl for tasks like this. If you don't have access to a Linux distribution with Perl installed, you can use Strawberry Perl on Windows.

To start, we need a web retrieval library called LWP::Simple. If for some reason you don't have this installed by default, you should be able to find it in a CPAN search.

use LWP::Simple;

Then, I set up the base url for the entrez utilities.

my $esearch = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" . "db=pubmed&retmax=10000&usehistory=y&term=";

In the above line, you can change the db= to any of the databases listed here. The retmax= value is the maximum number of results to return. The term= value is the collection of search terms you wish to use. In my case, I used an author's last name, initials, and our home institution, Vanderbilt. We then execute the query.

my $q = "Bush WS Vanderbilt";
my $esearch_result = get($esearch . $q);
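Since Python works just as well for this, here is a rough Python sketch of the same URL construction. It only builds the request URL and does not contact NCBI. The function name esearch_url is made up for illustration, and the https base URL is my assumption about the current E-utilities address rather than something taken from the Perl above.

```python
from urllib.parse import urlencode

# Assumed base address for the E-utilities (note https, unlike the Perl above).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="pubmed", retmax=10000):
    """Build an esearch URL that asks NCBI to keep the results on its history server."""
    params = {
        "db": db,            # which Entrez database to search
        "retmax": retmax,    # maximum number of results to return
        "usehistory": "y",   # ask for a WebEnv ID and query key
        "term": term,        # the search terms themselves
    }
    # urlencode escapes spaces and other unsafe characters in the term.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("Bush WS Vanderbilt"))
```

Letting urlencode handle the escaping means search terms with spaces, quotes, or brackets can be passed in as-is.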

So here, we use a two-step process --

1. First, the program submits a search to the system. When this happens, NCBI's web servers accept the search request, tag it with a WebEnv ID (which the web-dev geeks would call a session variable) and a query key, then conduct the search to find identifiers that match the search request. Since we searched the pubmed database, the identifiers are all PubMed IDs. This list of IDs is stored on the NCBI servers for a brief time until it expires.

To do anything useful with our list of identifiers sitting on the NCBI servers out there, we need to pull the WebEnv ID and the QueryKey from the esearch result. The following code will yank these out of the XML stuff the web server sends back, and it also gives us a count of the records our query found.

$esearch_result =~
m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s;

my $Count = $1;
my $QueryKey = $2;
my $WebEnv = $3;

To see these, you can print them if you like:

print "Count = $Count; QueryKey = $QueryKey; WebEnv = $WebEnv\n";

2. Next, our program must submit a fetch request to fish out the details for each of these identifiers. We do this using their eSummary engine, which works like so:

# db= must match the database we searched in step 1 (pubmed)
my $efetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&query_key=$QueryKey&WebEnv=$WebEnv";

my $efetch_result = get($efetch);
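For comparison, the hand-off between the two steps can be sketched in Python using an XML parser instead of a regular expression. Everything here is illustrative: the sample response is a trimmed, made-up stand-in for what esearch returns (real WebEnv strings are much longer), the function names are hypothetical, and the https base URL is my assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base address for the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hypothetical, trimmed esearch response used only for this example.
SAMPLE_ESEARCH = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Pull the record count, query key, and WebEnv ID out of an esearch result."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "query_key": root.findtext("QueryKey"),
        "webenv": root.findtext("WebEnv"),
    }


def esummary_url(query_key, webenv, db="pubmed"):
    """Build the esummary URL for results already stored on the history server."""
    # db must be the same database that was searched in step 1.
    params = {"db": db, "query_key": query_key, "WebEnv": webenv}
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params)


history = parse_esearch(SAMPLE_ESEARCH)
print(history["count"])
print(esummary_url(history["query_key"], history["webenv"]))
```

Using a parser rather than a regex is just a design preference on my part; it keeps working if NCBI reorders the elements in the response.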

Now within Perl, you can parse through this result to pull out any relevant information you might want. In case you don't know, Perl is great for parsing text -- all the slashes and squigglies are for doing regular expression pattern matching. For my example, I was curious to see how many people I've co-authored with, and on how many publications. I used the following to pull each author/PubMed ID combination for a given search term.

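# Split the eSummary XML into lines and walk through each <DocSum> record.
my @lines = split(/\n/, $efetch_result);
my $opendoc = 0;   # 1 while we are inside a <DocSum> block
my $id = 0;        # PubMed ID of the current record

foreach my $line (@lines)
{
    if ($line =~ /<DocSum>/)
    {
        $opendoc = 1;
    }

    if ($line =~ /<\/DocSum>/)
    {
        $opendoc = 0;
    }

    # The record's PubMed ID
    if ($opendoc == 1 && $line =~ /<Id>(\d+)<\/Id>/)
    {
        $id = $1;
    }

    # One <Item> per author: print a tab-separated PubMed ID/author pair
    if ($opendoc == 1 && $line =~ /<Item Name="Author" Type="String">(.*)<\/Item>/)
    {
        print "$id\t$1\n";
    }
}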

For the sake of brevity, I'll skip a protracted discussion of the parsing logic I used, but if there is interest, I can elaborate.
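In the meantime, here is how the same extraction might look in Python with an XML parser doing the bookkeeping. The sample document is a made-up, heavily trimmed imitation of an eSummary response (placeholder IDs and author names), and author_rows is a hypothetical name. The layout of one DocSum per paper, with the authors as nested Item elements named "Author", is my understanding of the PubMed eSummary format, so check it against a real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed eSummary response with placeholder IDs and names.
SAMPLE_ESUMMARY = """<eSummaryResult>
  <DocSum>
    <Id>100</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Author A</Item>
      <Item Name="Author" Type="String">Author B</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Author B</Item>
  </DocSum>
  <DocSum>
    <Id>200</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Author A</Item>
      <Item Name="Author" Type="String">Author C</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Author C</Item>
  </DocSum>
</eSummaryResult>"""


def author_rows(xml_text):
    """Return one (pubmed_id, author) pair per author of each paper."""
    rows = []
    root = ET.fromstring(xml_text)
    for doc in root.iter("DocSum"):
        pmid = doc.findtext("Id")
        # iter() also visits the Items nested inside the AuthorList.
        for item in doc.iter("Item"):
            if item.get("Name") == "Author":
                rows.append((pmid, item.text))
    return rows


for pmid, author in author_rows(SAMPLE_ESUMMARY):
    print(f"{pmid}\t{author}")
```

Matching on the Name attribute skips fields like LastAuthor, so each author is listed once per paper.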

In case you are wondering, I loaded this into a database table, joined that table to itself matching on pubmed id, and imported this into Gephi to build our co-authorship network. This was a big hit at the faculty meeting!
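If you don't have a database handy, the self-join can be approximated in memory. This is a Python sketch of the idea, not the SQL I actually ran: the rows are invented placeholders, and coauthor_edges is a hypothetical name. It groups authors by PubMed ID, then counts every pair of authors who share a paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (pubmed_id, author) rows with placeholder values.
SAMPLE_ROWS = [
    ("100", "Author A"), ("100", "Author B"),
    ("200", "Author A"), ("200", "Author B"), ("200", "Author C"),
]


def coauthor_edges(rows):
    """Count shared papers for every pair of authors (an in-memory self-join on pubmed_id)."""
    authors_by_paper = {}
    for pmid, author in rows:
        authors_by_paper.setdefault(pmid, set()).add(author)

    edges = Counter()
    for authors in authors_by_paper.values():
        # Sorting makes (A, B) and (B, A) count as the same edge.
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges


for (a, b), weight in sorted(coauthor_edges(SAMPLE_ROWS).items()):
    print(f"{a}\t{b}\t{weight}")
```

Each printed line is one edge of the network, with the number of shared papers as its weight, which is the kind of edge list a graph tool can import.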

Tuesday, November 30, 2010

Abstract Art with PubMed2Wordle

While preparing for my upcoming defense, I found a cool little web app called pubmed2wordle that turns a pubmed query into a word cloud using text from the titles and abstracts returned by the query. Here are the results for a pubmed query for me ("turner sd AND vanderbilt"):

And quite different results for where I'm planning to do my postdoc:

Looks useful to quickly get a sense of what other people work on.

http://www.pubmed2wordle.appspot.com/

Thursday, January 14, 2010

FreeMyPDF.com unlocks PDFs for submitting to PubMed Central

Do you submit manuscripts to journals that are not indexed in PubMed? This can make it difficult for others to find your publications, especially if they don't have a subscription to the journal. This often happens to us when we publish in computer science journals. Using the NIH manuscript submission system you can upload your manuscript to PubMed Central, which provides free open access, and is indexed in PubMed. This takes less than 5 minutes to do per manuscript, and it makes it much easier for you and any other interested parties to access your publications. Furthermore, if you use NIH funding, you are required by law to make any publications resulting from this funding free and publicly available. Make sure you're not breaching any copyright agreements first by checking with your publisher or the journal's editor.

I've uploaded a few of my own papers, and a snag I often run into is that the publisher will often "lock" the PDF by enabling security which prevents software from extracting data from the PDF file. FreeMyPDF.com will liberate your PDF from data extraction, printing, and other security restrictions, making it compatible with the NIH manuscript submission system.

NIH Manuscript Submission System

NIH Open Access Policy

FreeMyPDF.com - Removes security from viewable PDFs

Wednesday, December 16, 2009

Recent improvements to Pubget

If you've never heard of it before, check out my previous coverage on Pubget. It's like PubMed, but you get the PDFs right away. Pubget has recently implemented a number of improvements.

1. Citation matching. Pubget's citation matcher seems to work better than PubMed's most of the time. Try going to Pubget and pasting any of these random citations into the search bar:

J Biol Chem 277: 30738-30745
Nucleic Acids Res 2004;32:4812-20.
Evol. Biol. 7, 214 (2007).

2. The PaperPlane bookmarklet. Go here and drag the link to your bookmark toolbar. Now, if you're searching from PubMed, click the bookmarklet for one-click access to the PDF.

3. If you have a long list of PMIDs, separate them with commas and you can paste them directly into the search bar.

Pubget (Vanderbilt institutional link)

Pubget (If you're anywhere else)

Wednesday, August 5, 2009

Pubget = Pubmed on Steroids

I've used this a little bit recently. Pubget indexes essentially everything that PubMed does, except you get the PDF you're looking for right away. There are lots of other useful tools as well. I sent one email to the Pubget team and CC'd the biomedical library, and a few days later they had worked it out so that Pubget recognizes Vanderbilt's subscriptions. If you're at Vanderbilt, go to http://vanderbilt.pubget.com/, otherwise just use http://pubget.com/, and select your institution from the dropdown list, or email them if it's not there.

The one thing I've found is that they don't index things as quickly as PubMed, so you might have a hard time finding Advance Online Publications using Pubget.

Wednesday, May 20, 2009

Would a gene by any other name be just as significant?

So you have found significant SNPs from a study, and you are investigating the region. Browsing through Ensembl or Entrez-Gene, you find a coding region nearby. Atop this coding region, you see a collection of letters that are commonly used to refer to this gene, let's say "MYLK". So you begin a PubMed search to find publications that describe the function of this gene, searching with "MYLK". Seems reasonable, right?

Beware! Unfortunately, gene names or acronyms are NOT a standardized way of identifying coding regions. According to Gene Cards, the coding region with the symbol "MYLK" has 14 different symbol aliases, and four unique descriptions! To be complete, conduct a PubMed search using all of these terms. For example, searching PubMed for MYLK retrieves only 30 articles, mostly involving muscle contraction. Searching for MLCK, on the other hand, retrieves 847 articles! These references have much more emphasis on the neural activities of the gene, so perhaps different groups of investigators use different symbols.

To make matters worse, according to Entrez-Gene, MYLK is the "official" gene symbol, yet fewer than 5% of the PubMed articles use that designation! If possible, use the Entrez-Gene or Ensembl gene ID when referencing a gene in the literature to help avoid this confusion.
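One way to avoid missing papers is to search for all of a gene's symbols at once by joining them with OR. Here is a tiny Python sketch of that. The function name alias_query is made up, and the alias list is whatever you collect yourself from a resource like Gene Cards; only the two symbols discussed in this post are used here.

```python
def alias_query(symbols):
    """Join gene symbols into a single PubMed query that matches any of them."""
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(symbols))
    return "(" + " OR ".join(unique) + ")"


# Only the two symbols mentioned in this post; add the rest of the aliases yourself.
print(alias_query(["MYLK", "MLCK"]))
```

The parentheses let you safely append further restrictions, such as an AND with a disease term, to the end of the query.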

Monday, March 2, 2009

Pubmed Searches as an RSS feed

As Stephen nicely posted earlier, RSS feeds are a very powerful way to keep up with the literature -- they "push" the information to you. In addition to subscribing to individual journals, you can subscribe to a PubMed search! This will let you keep up with ALL PubMed indexed journals.

To subscribe to a PubMed search, first go to www.pubmed.org and enter your search terms. Once you retrieve a search listing, you'll see a bar that says

Display Summary Show 20 Sort By Send to

The SEND TO drop down box will allow you to select an RSS Feed. Once you select this, you'll be taken to a page with a button that says "Create Feed". When you click this, you'll get a new page with a little orange XML button. Click it and your browser will give you the option to subscribe to the feed. Once you subscribe, there are lots of ways to read RSS Feeds, which we'll probably get to in another post.

Enjoy!
