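There was a thread on the BioC mailing list titled "dbSNP - getting details of Pubmed ID for a SNP", so here are my notes.

The idea is to find the literature that mentions a given SNP by using its rs SNP ID as the query and fetching the linked PubMed IDs through NCBI eUtils.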

[BioC] dbSNP - getting details of Pubmed ID for a SNP

Hi,

I have a set of SNP IDs (e.g. 'rs11805303'). Is there any package that will let me access the dbSNP database and return the PubMed IDs of articles that have cited a particular SNP ID? For example, for 'rs11805303', it would return PubMed IDs: 17554300,18533027,19522770

thanks!

Hi, Paul. There may be other ways to do this, but you could use
eUtils from NCBI:

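snp2linkedPubmed <-
function(id) {
 # id is the numeric part of the rs number, e.g. 11805303 for rs11805303
 id = as.integer(id)
 if(length(id)!=1) stop("id should be an integer vector of length 1")
 library(XML)
 # elink maps a dbSNP record to the PubMed records that cite it
 url = sprintf("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=snp&db=pubmed&linkname=snp_pubmed_cited&id=%d",id)
 # fetch with readLines, since the XML package cannot read https URLs itself
 doc = xmlParse(paste(readLines(url,warn=FALSE),collapse="\n"),asText=TRUE)
 return(as.integer(sapply(getNodeSet(doc,'//Link/Id'),xmlValue)))
}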

# note that this uses the integer value of the rs
> snp2linkedPubmed(11805303)
[1] 19522770 18533027 17554300
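
Since the question started from IDs written like 'rs11805303', a small conversion step may be handy before calling the function. The helper below is my own sketch, not part of the original reply; the name rs2int is hypothetical, and it assumes every ID is the literal prefix "rs" followed by digits.

```r
# Hypothetical helper (not from the mailing-list post):
# strip the "rs" prefix and return the numeric part as an integer.
rs2int <- function(rs) {
  rs <- as.character(rs)
  # assumption: IDs look like "rs" + digits; anything else is rejected
  if (!all(grepl("^rs[0-9]+$", rs))) stop("expected IDs like 'rs11805303'")
  as.integer(sub("^rs", "", rs))
}

rs2int("rs11805303")
```

The result can then be passed straight to snp2linkedPubmed().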

As it stands, this function can take only 1 rs# at a time. If you
need to do this for many rs#, DO NOT simply use this in a large loop.
If that is what you want to do, then you will need to modify the
function to take in a vector of values and search in blocks, and then
parse the returned XML. If this is your intent, I can provide further
code.
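
The follow-up code was not posted in the thread, so here is a rough sketch of what "take a vector, search in blocks, parse the returned XML" might look like. Everything in it is my own assumption rather than the original author's code: the function names (elinkUrl, parseLinkSets, snps2linkedPubmed), the block size of 100, and the 0.5 second pause are all hypothetical choices. It relies on elink returning one LinkSet per SNP when the id parameter is repeated in the URL.

```r
library(XML)

# Hypothetical helper: build one elink URL for a block of rs numbers.
# Repeating the id parameter asks elink for a separate LinkSet per SNP.
elinkUrl <- function(ids) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
  query <- "dbfrom=snp&db=pubmed&linkname=snp_pubmed_cited"
  paste0(base, "?", query, paste0("&id=", ids, collapse = ""))
}

# Hypothetical helper: turn elink XML text into a named list with
# one integer vector of PubMed IDs per SNP (empty if nothing is linked).
parseLinkSets <- function(xmlText) {
  doc <- xmlParse(xmlText, asText = TRUE)
  sets <- getNodeSet(doc, "//LinkSet")
  rs <- vapply(sets, function(s)
    xmlValue(getNodeSet(s, "./IdList/Id")[[1]]), character(1))
  pmids <- lapply(sets, function(s)
    as.integer(sapply(getNodeSet(s, "./LinkSetDb/Link/Id"), xmlValue)))
  names(pmids) <- paste0("rs", rs)
  pmids
}

# Query in blocks instead of one request per SNP.
# blockSize and pause are assumed values, not NCBI recommendations.
snps2linkedPubmed <- function(ids, blockSize = 100, pause = 0.5) {
  ids <- as.integer(ids)
  blocks <- split(ids, ceiling(seq_along(ids) / blockSize))
  out <- list()
  for (block in blocks) {
    xmlText <- paste(readLines(elinkUrl(block), warn = FALSE), collapse = "\n")
    out <- c(out, parseLinkSets(xmlText))
    Sys.sleep(pause)  # pause between requests to go easy on NCBI
  }
  out
}
```

Calling snps2linkedPubmed(c(11805303, 2066844)) should give a list with one element per SNP, named by rs number. Fetching and parsing are kept in separate functions so that the parsing step can be checked against a saved XML response without touching the network.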

Mailing-list posts that I find handy are hard to keep in bookmarks, so I am saving this one on Hatena Diary.