GenBank records typically come with links to related records in other NCBI databases, such as Taxonomy and PubMed, but not all sequences have PubMed records. For example, sequence DQ343272 has the following publication record:
REFERENCE   1  (bases 1 to 563)
  AUTHORS   Schubart,C.D., Cannicci,S., Vannini,M. and Fratini,S.
  TITLE     Molecular phylogeny of grapsoid crabs (Decapoda, Brachyura) and
            allies based on two mitochondrial genes and a proposal for
            refraining from current superfamily classification
  JOURNAL   J. Zoolog. Syst. Evol. Res. 44 (3), 193-199 (2006)
Ideally, every publication would have a GUID, and the GenBank record would be linked to that GUID. As a first step toward this, bioGUID provides a simple web service that parses the JOURNAL field and looks for a DOI. The web service uses the open-source ParaTools to extract metadata from the citation, then calls CrossRef's OpenURL resolver to search for a DOI.
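The metadata-extraction step can be pictured as matching the JOURNAL line against a citation template. The following is a minimal sketch of that idea using a single hand-written regular expression. It is not ParaTools itself, and the pattern and the function name `parse_journal` are assumptions made for illustration. It covers only citations of the form publication, volume, issue in parentheses, page range, year in parentheses.

```python
import re

# Hypothetical stand-in for one citation template (not the ParaTools code):
# publication volume (issue), spage-epage (year)
JOURNAL_TEMPLATE = re.compile(
    r"^(?P<publication>.+?)\s+"         # journal name, matched lazily
    r"(?P<volume>\d+)\s+"               # volume number
    r"\((?P<issue>[^)]+)\),\s*"         # issue in parentheses, then a comma
    r"(?P<spage>\d+)-(?P<epage>\d+)\s+" # start and end page
    r"\((?P<year>\d{4})\)\s*$"          # four-digit year in parentheses
)


def parse_journal(citation):
    """Return a dict of citation fields, or None if the template does not match."""
    m = JOURNAL_TEMPLATE.match(citation.strip())
    return m.groupdict() if m else None


fields = parse_journal("J. Zoolog. Syst. Evol. Res. 44 (3), 193-199 (2006)")
print(fields)
```

A real parser would try many such templates in turn. This sketch returns None for any citation that does not fit its one pattern.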
Returning to the example above, if we append the citation J. Zoolog. Syst. Evol. Res. 44 (3), 193-199 (2006) to http://bioguid.info/cgi-bin/paracite?q=, we get this XML result:
<?xml version="1.0" encoding="UTF-8"?>
<paracite result="parsed">
<issue>3</issue>
<date>2006</date>
<year>2006</year>
<publication>J. Zoolog. Syst. Evol. Res.</publication>
<volume>44</volume>
<match>_PUBLICATION_ _VOLUME_ (_ISSUE_), _SPAGE_-_EPAGE_ (_YEAR_)
</match>
<epage>199</epage>
<title>J. Zoolog. Syst. Evol. Res.</title>
<spage>193</spage>
<ref>J. Zoolog. Syst. Evol. Res. 44 (3), 193-199 (2006)</ref>
<openurl>sid=paracite&spage=193...year=2006 </openurl>
<doi>10.1111/j.1439-0469.2006.00354.x</doi>
</paracite>
If the result attribute of the paracite tag is parsed, then the service found a template that matches the citation (shown in the match tag) and extracted the metadata. If it didn't match a template, the attribute is set to failed and no metadata is returned.
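A client could build the request URL and interpret the response roughly as follows. This is a sketch under stated assumptions. The function names are hypothetical, percent-encoding the citation before appending it is an assumption about what the service accepts, and the sample response is a trimmed copy of the example above rather than live output. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Endpoint as quoted in the text.
SERVICE = "http://bioguid.info/cgi-bin/paracite?q="


def build_query_url(citation):
    """Append the percent-encoded citation to the service URL (encoding is assumed)."""
    return SERVICE + quote(citation)


def read_response(xml_text):
    """Return (parsed_ok, metadata) for a paracite response document."""
    root = ET.fromstring(xml_text)
    if root.get("result") != "parsed":
        # result="failed": no template matched, so there is no metadata.
        return False, {}
    # Each child element of <paracite> holds one metadata field.
    metadata = {child.tag: (child.text or "").strip() for child in root}
    return True, metadata


# Trimmed sample based on the response shown above.
sample = """<paracite result="parsed">
<volume>44</volume>
<spage>193</spage>
<doi>10.1111/j.1439-0469.2006.00354.x</doi>
</paracite>"""

ok, meta = read_response(sample)
url = build_query_url("J. Zoolog. Syst. Evol. Res. 44 (3), 193-199 (2006)")
print(url)
print(ok, meta.get("doi"))
```

Checking the result attribute first keeps the failure case explicit, so callers never mistake an empty response for a citation without a DOI.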
Any metadata found is used to construct an OpenURL query, which is sent to CrossRef. In this example, the reference has the DOI doi:10.1111/j.1439-0469.2006.00354.x, which gives us a GUID to link the sequence to. This is an example of finding an existing GUID based on metadata, and thereby adding value to a GenBank record.
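The OpenURL query string could be assembled from the extracted metadata along these lines. This is a hedged sketch. Only the sid, spage and year keys are visible in the openurl element of the example response, so the other key names, their order, and the function name `build_openurl_query` are assumptions. The resolver's address is left out because the source does not give it.

```python
from urllib.parse import urlencode


def build_openurl_query(metadata, sid="paracite"):
    """Build an OpenURL-style query string from extracted citation metadata.

    Key names other than sid, spage and year are assumed, not taken from the source.
    """
    keys = ("title", "volume", "issue", "spage", "epage", "year")
    params = [("sid", sid)]
    # Include only the fields that were actually extracted.
    params += [(k, metadata[k]) for k in keys if metadata.get(k)]
    return urlencode(params)


query = build_openurl_query({
    "title": "J. Zoolog. Syst. Evol. Res.",
    "volume": "44",
    "issue": "3",
    "spage": "193",
    "epage": "199",
    "year": "2006",
})
print(query)
```

Skipping empty fields means a partially parsed citation still yields a usable, if less specific, query.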