Tuesday 29 May 2007

OpenURL resolver for Psyche



Following on from first efforts with Journal of Arachnology described on iPhylo, I've added Psyche to my OpenURL resolver. Psyche is the journal of the Cambridge Entomological Club. Three things make it easy to add: full-text PDFs are available for most articles; Jonathan Rees has created a series of XML files for each issue (e.g., 103.xml) that list metadata for each article; and the URLs for the PDFs are easy to construct from the metadata. So it was simply a case of harvesting the XML files, extracting the metadata, and adding it to a local MySQL database.
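As a rough illustration of the extract-and-store step (not the code actually used here), the sketch below parses one issue file and loads its articles into a table. The element and attribute names (issue, article, title, author, spage, epage) are assumptions, since the real layout of the Psyche XML files isn't shown in this post, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical issue file: the element names in the real Psyche XML may differ.
SAMPLE_ISSUE = """<issue volume="72" year="1965">
  <article>
    <title>Observations on three species of Phidippus jumping spiders (Araneae: Salticidae)</title>
    <author>Gardner, B.</author>
    <spage>133</spage>
    <epage>147</epage>
  </article>
</issue>"""


def extract_articles(xml_text):
    """Return one metadata dict per <article> element in an issue file."""
    issue = ET.fromstring(xml_text)
    rows = []
    for article in issue.iter("article"):
        rows.append({
            "journal": "Psyche",
            "volume": issue.get("volume"),
            "year": issue.get("year"),
            "title": article.findtext("title", default=""),
            "author": article.findtext("author", default=""),
            "spage": article.findtext("spage", default=""),
            "epage": article.findtext("epage", default=""),
        })
    return rows


def store_articles(conn, rows):
    """Create the article table if needed and insert the extracted rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        "journal TEXT, volume TEXT, year TEXT, title TEXT, "
        "author TEXT, spage TEXT, epage TEXT)"
    )
    conn.executemany(
        "INSERT INTO article VALUES "
        "(:journal, :volume, :year, :title, :author, :spage, :epage)",
        rows,
    )
    conn.commit()


# SQLite in memory is only a stand-in for the local MySQL database.
conn = sqlite3.connect(":memory:")
store_articles(conn, extract_articles(SAMPLE_ISSUE))
```

Keeping the journal, volume and starting page as separate columns is a design choice that makes a later OpenURL lookup a simple equality match.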

There were some minor "gotchas", such as the presence of entities in the XML (e.g., &eacute; for the character é). These weren't declared, so I needed to add the following declarations to each XML file:

<!ENTITY aelig "&#230;" >
<!ENTITY ldquo "&#8220;">
<!ENTITY rdquo "&#8221;">
<!ENTITY lsquo "&#8216;">
<!ENTITY rsquo "&#8217;">
<!ENTITY ouml "&#246;">
<!ENTITY uuml "&#252;">
<!ENTITY mdash "&#8212;">
<!ENTITY eacute "&#233;">
<!ENTITY euml "&#235;">
<!ENTITY oelig "&#339;">
<!ENTITY OElig "&#338;">
<!ENTITY AElig "&#198;">
<!ENTITY acirc "&#226;">
<!ENTITY oacute "&#243;">
<!ENTITY iacute "&#237;">
<!ENTITY aacute "&#225;">
<!ENTITY ndash "&#8211;">
<!ENTITY atilde "&#227;">
<!ENTITY uacute "&#250;">
<!ENTITY auml "&#228;">
<!ENTITY ocirc "&#244;">
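One way to get declarations like those listed above into a file is to wrap them in a DOCTYPE with an internal subset before parsing. The following is a minimal sketch under some assumptions: the function name add_entity_declarations and the root element name "issue" are hypothetical, only three of the entities are included for brevity, and the file is assumed not to have a DOCTYPE already.

```python
import re
import xml.etree.ElementTree as ET

# A few of the entities from the list above, as name -> code point.
ENTITIES = {"eacute": 233, "ouml": 246, "mdash": 8212}


def add_entity_declarations(xml_text, root_name, entities=ENTITIES):
    """Insert a DOCTYPE whose internal subset declares the given entities."""
    declarations = "\n".join(
        f'<!ENTITY {name} "&#{code};">' for name, code in entities.items()
    )
    doctype = f"<!DOCTYPE {root_name} [\n{declarations}\n]>\n"
    # The DOCTYPE has to follow the XML declaration, if the file has one.
    match = re.match(r"<\?xml[^>]*\?>\s*", xml_text)
    if match:
        return xml_text[:match.end()] + doctype + xml_text[match.end():]
    return doctype + xml_text


# Hypothetical fragment using an undeclared entity.
raw = '<?xml version="1.0"?>\n<issue><title>Andr&eacute;</title></issue>'
fixed = add_entity_declarations(raw, "issue")
title = ET.fromstring(fixed).findtext("title")
```

With the declarations in place the parser expands &eacute; itself, so the extracted title comes back as ordinary text and no further clean-up is needed before it goes into the database.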

Now my OpenURL resolver checks whether you are trying to resolve a link to an article in Psyche, and if it knows where the PDF is, it takes you there. For example, this link http://bioguid.info/openurl.php?sid=paracite&aulast=Gardner&aufirst=B&atitle=Observations on three species of Phidippus jumping spiders (Araneae: Salticidae)&title=Psyche, Camb.&date=1965&year=1965&volume=72&spage=133&epage=147 goes to the PDF of this paper on Phidippus.
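The lookup itself can be sketched roughly as follows. This is not the resolver's actual code (which is PHP, going by the URL): the function resolve, the KNOWN_PDFS table and the example.org address are all placeholders, and matching on journal, volume and starting page is an assumption about what is enough to identify an article.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table standing in for the MySQL database. The PDF URL
# is a placeholder, not the real location of the article.
KNOWN_PDFS = {
    ("psyche", "72", "133"): "http://example.org/psyche/72/72-133.pdf",
}


def resolve(openurl):
    """Return the PDF URL for an OpenURL link, or None if it is not known."""
    params = parse_qs(urlsplit(openurl).query)

    def first(key):
        # parse_qs returns a list per key; take the first value or "".
        return params.get(key, [""])[0]

    # Journal titles arrive in several forms ("Psyche", "Psyche, Camb."),
    # so compare only the leading word, lower-cased.
    words = first("title").replace(",", " ").split()
    journal = words[0].lower() if words else ""
    return KNOWN_PDFS.get((journal, first("volume"), first("spage")))


link = (
    "http://bioguid.info/openurl.php?sid=paracite&aulast=Gardner&aufirst=B"
    "&atitle=Observations on three species of Phidippus jumping spiders "
    "(Araneae: Salticidae)&title=Psyche, Camb.&date=1965&year=1965"
    "&volume=72&spage=133&epage=147"
)
pdf_url = resolve(link)
```

A real resolver would answer with an HTTP redirect to the PDF when the lookup succeeds, and fall through to its other sources when it returns nothing.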