Using R to parse time (and taxon names) with GBIF’s API

Update: The taxize package for R now includes a gbif_parse function.

GBIF has recently made a bunch of handy tools available via their revamped API. These tools include a species name parser, which seems very useful for cleaning long lists of taxon names.

Here’s a simple R function that takes a vector of taxon names and parses them using GBIF’s API. Among other details, it extracts the genus, species, infraspecific rank and epithet, nothorank (i.e., the taxonomic rank at which hybridisation is indicated), and authorship.

I’ve created a gist of this function, so you can grab it from GitHub with source_url('') (requires the devtools package), or you can just copy and paste it from here.

It’s a bit awkward to include wide tabular output here, but I’ve provided a few examples of the function’s use on GitHub. I haven’t tested the API thoroughly (and the stable version hasn’t been released yet; it’s expected at the end of 2013), so I’m interested to hear whether it “parses” your tests.

EDIT: updated to point to GBIF API v1.0.
WARNING: currently returns an error when input strings contain certain diacritics.
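The gist isn’t reproduced in this post, so here is a rough sketch of the same idea for anyone who wants to see the moving parts. It is not the function from the gist: the helper names gbif_parser_url() and gbif_parse_names() are hypothetical, it uses jsonlite rather than RJSONIO and plyr, and it assumes that GET requests to https://api.gbif.org/v1/parser/name accept repeated name parameters and return a JSON array of parsed names with fields such as genusOrAbove, specificEpithet and authorship. Those field names are assumptions worth checking against GBIF’s API documentation. Along the lines of the suggestion in the comments below, it stops with an informative error if the required package is missing.

```r
# Hypothetical sketch: parse taxon names with GBIF's name parser (API v1).
# Assumes GET https://api.gbif.org/v1/parser/name accepts repeated `name`
# parameters and returns a JSON array with one object per input name.

# Build the request URL, percent-encoding each name after converting it to
# UTF-8 (non-ASCII characters are encoded byte by byte).
gbif_parser_url <- function(taxa,
                            base = "https://api.gbif.org/v1/parser/name") {
  stopifnot(is.character(taxa), length(taxa) > 0, !anyNA(taxa))
  encoded <- vapply(enc2utf8(taxa), URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0("name=", encoded, collapse = "&"))
}

# Fetch and parse; returns a data frame with one row per input name.
gbif_parse_names <- function(taxa) {
  # Fail early with an informative message if jsonlite is not installed.
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Package 'jsonlite' is required: install.packages('jsonlite')",
         call. = FALSE)
  }
  # readLines() opens the url() connection and closes it when done.
  txt <- paste(readLines(url(gbif_parser_url(taxa), encoding = "UTF-8"),
                         warn = FALSE), collapse = "\n")
  parsed <- jsonlite::fromJSON(txt)
  if (!is.data.frame(parsed)) {
    stop("Unexpected response format from the GBIF parser", call. = FALSE)
  }
  # Assumed field names; keep only those actually present in the response.
  keep <- c("scientificName", "genusOrAbove", "specificEpithet",
            "rankMarker", "infraSpecificEpithet", "notho", "authorship")
  parsed[, intersect(keep, names(parsed)), drop = FALSE]
}
```

Usage would look like gbif_parse_names(c("Abies alba Mill.", "Pinus sylvestris")). Because every name goes into a single URL, very long vectors may need to be split into batches.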


6 thoughts on “Using R to parse time (and taxon names) with GBIF’s API”

    • Sounds good, Scott. I figured rOpenSci would get to coding it all up into an updated package before long. I just figured I’d write a quick post since I was playing with this stuff anyway. Happy to be involved, or for you to just pinch it – nothing particularly creative here! 😉

      • If interested, want to fork and send a pull request to the rgbif repo (“newapi” branch)? Or I can easily add in for you, either way.

  1. Would be safer to wrap the require calls in a conditional that exits with an informative error message if the packages are not yet installed. Perhaps: if (!all(c(require(RJSONIO), require(plyr)))) { stop("packages missing") } else { …. the rest of your function

    • Thanks David. Good call – when I wrote the function I had in-package use in mind, where the required packages would be dependencies. I’ve changed require calls to library calls to ensure the function exits on error.

  2. Pingback: Use R to get gbif data into a GRASS database – Ecostudies
