Using R to parse time (and taxon names) with GBIF’s API

Update: The taxize package for R now includes a gbif_parse function.


GBIF has recently made a bunch of handy tools available via their revamped API. These tools include a species name parser, which seems very useful for cleaning long lists of taxon names.

Here’s a simple R function that takes a vector of taxon names and parses them using GBIF’s API. Among other details, it extracts the genus, species, infraspecific rank and epithet, nothorank (i.e., the taxonomic rank at which hybridisation is indicated), and authorship.
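The function itself lives in the gist, so it isn't reproduced here. As a rough idea of how such a function can talk to the parser, here is a minimal sketch. It assumes the v1 endpoint is `https://api.gbif.org/v1/parser/name` and that it accepts one or more `name` query parameters via GET; check the GBIF API documentation before relying on that. The names `gbif_parser_url` and `gbif_parse_sketch` are hypothetical, and the fetch step assumes the jsonlite package and a network connection.

```r
# Sketch only: build a GET request for GBIF's name parser.
# ASSUMPTION: endpoint and repeated `name=` parameters as described in the lead-in.
gbif_parser_url <- function(names,
                            base = "https://api.gbif.org/v1/parser/name") {
  # Convert to UTF-8 first so that characters with diacritics are
  # percent-encoded as UTF-8 bytes rather than native-encoding bytes.
  names <- enc2utf8(as.character(names))
  # reserved = TRUE also encodes characters such as '&' and '=' that would
  # otherwise break the query string; spaces become %20.
  encoded <- vapply(names, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, "?", paste0("name=", encoded, collapse = "&"))
}

# Hypothetical wrapper: fetch and decode the JSON response.
# Extra arguments (e.g. simplifyVector = FALSE) are passed to jsonlite::fromJSON.
gbif_parse_sketch <- function(names, ...) {
  jsonlite::fromJSON(gbif_parser_url(names), ...)
}
```

For example, `gbif_parser_url(c("Quercus robur L.", "Abies alba"))` gives `https://api.gbif.org/v1/parser/name?name=Quercus%20robur%20L.&name=Abies%20alba`. Very long name lists would produce very long URLs, so batching the names (or using a POST request, if the endpoint supports one) may be safer. The explicit UTF-8 conversion is a guess at one possible cause of the diacritics error mentioned in the warning below, not a confirmed fix.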

I’ve created a gist of this function, so you can grab it from GitHub with source_url('https://gist.github.com/johnbaums/6971353/raw/gbif_parse.R') (requires the devtools package), or you can just copy and paste it from here.

It’s a bit awkward to include wide tabular output here, but I’ve provided a few examples of the function’s use on GitHub. I haven’t tested the API thoroughly (and the stable version hasn’t been released yet; it’s expected by the end of 2013), so I’d be interested to hear whether it “parses” your tests.
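Because the output is essentially one wide row per name, one way to get there from the raw JSON is to flatten each parsed record into a data frame row. The helper below is hypothetical. It assumes the records come from `jsonlite::fromJSON(url, simplifyVector = FALSE)` (a list of named lists), and that the field names match GBIF's ParsedName JSON (`genusOrAbove`, `specificEpithet`, `rankMarker`, and so on). Those names are an assumption, so adjust `fields` to whatever the API actually returns. Missing fields become `NA`.

```r
# Hypothetical helper: one row per parsed name, one column per field.
# ASSUMPTION: field names follow GBIF's ParsedName JSON; adjust as needed.
parsed_to_df <- function(records,
                         fields = c("scientificName", "genusOrAbove",
                                    "specificEpithet", "rankMarker",
                                    "infraSpecificEpithet", "notho",
                                    "authorship")) {
  rows <- lapply(records, function(rec) {
    vals <- lapply(fields, function(f) {
      v <- rec[[f]]
      # Absent fields become NA; keep only the first value if several are returned
      if (is.null(v)) NA_character_ else as.character(v)[1]
    })
    names(vals) <- fields
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  # Stack the single-row data frames (returns NULL for an empty input list)
  do.call(rbind, rows)
}
```

Returning `NA` rather than dropping columns keeps every name on the same set of columns, so results from different batches can be bound together.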


EDIT: updated to point to GBIF API v1.0.
WARNING: currently returns an error when input strings contain certain diacritics.

Getting rasters into shape from R

Update: Windows people should use the modified version of the function provided by Francisco Rodriguez-Sanchez (mentioned in this comment), which is available as a Gist here.


Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep my code together so I can easily keep track of what I’ve done, so it made sense to do the conversion in R as well. The fantastic raster package has a function that can do this (rasterToPolygons), but it seems to take a very, very long time and chew up all system resources (at least on my not-particularly-souped-up X200 laptop).

We can cut down the time taken for conversion by calling a GDAL utility, gdal_polygonize.py, directly from R using system2(). GDAL needs to be installed on your system, but you’ll probably want it installed anyway if you’re planning to work with spatial data (in fact, you may find it’s already installed, as it is required by a bunch of other GIS software packages). Together with its cousin OGR, GDAL allows reading and writing of a wide range of raster and vector geospatial data formats.

Once GDAL is installed, you have at your disposal a bunch of GDAL Utilities that can be run from the terminal, including gdal_polygonize.py. For polygonize and a couple of others, you’ll also need Python installed (again, you may already have it, so check first!). Finally, it helps if the directory containing gdal_polygonize.py is listed in the PATH environment variable. To check whether this is the case, run the following in R:

Sys.which("gdal_polygonize.py")

If this returns an empty string, either gdal_polygonize.py doesn’t exist on your system, or the path to its containing directory is missing from the PATH variable. In the latter case, you can either use the pypath argument accepted by the function below to specify the path, or modify your PATH variable. The PATH variable can be modified by following these instructions for Windows and Mac. I suspect Linux users typically won’t run into this problem, as the GDAL executables are usually installed to directories that are already in PATH (e.g. /usr/bin).
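If you’d rather not change PATH system-wide, a middle ground is to change it for the current R session only. The sketch below prepends a directory to PATH using `Sys.setenv()`, so that `Sys.which()` and `system2()` in the same session can find executables there. The helper name `add_to_path` is hypothetical, and the directory you pass should be wherever gdal_polygonize.py actually lives on your machine.

```r
# Hypothetical helper: prepend a directory to PATH for this R session only.
# Returns the previous PATH invisibly so it can be restored afterwards.
add_to_path <- function(dir) {
  old <- Sys.getenv("PATH")
  # .Platform$path.sep is ";" on Windows and ":" elsewhere
  Sys.setenv(PATH = paste(normalizePath(dir, mustWork = FALSE), old,
                          sep = .Platform$path.sep))
  invisible(old)
}
```

For example, `old <- add_to_path("/path/to/gdal/bin")` followed by `Sys.which("gdal_polygonize.py")` should then find the script if it is in that directory, and `Sys.setenv(PATH = old)` undoes the change. Prepending rather than appending means this directory takes priority over any other copy of the script already on PATH.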

Anyhow… let’s compare rasterToPolygons with gdal_polygonize. The function included below (gdal_polygonizeR) borrows from code provided in a post by Lyndon Estes on R-sig-geo. Thanks mate!
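Before the full function, here is a stripped-down sketch of the core idea. It is not the gdal_polygonizeR function itself. It builds the command-line arguments for gdal_polygonize.py (input raster, `-f` output format, output file, and optionally a layer and field name) and runs the script with `system2()`. It assumes GDAL and Python are installed. The names `polygonize_args` and `run_polygonize` are hypothetical.

```r
# Hypothetical helper: arguments for
#   gdal_polygonize.py raster_file -f "ESRI Shapefile" out_file [layer] [fieldname]
polygonize_args <- function(raster_file, out_shp, layer = NULL, field = "DN") {
  args <- c(shQuote(raster_file), "-f", shQuote("ESRI Shapefile"),
            shQuote(out_shp))
  # The field name is positional and must follow the layer name
  if (!is.null(layer)) args <- c(args, layer, field)
  args
}

# Hypothetical runner: locate the script (or use pypath) and call it.
run_polygonize <- function(raster_file, out_shp, pypath = NULL) {
  if (is.null(pypath)) pypath <- Sys.which("gdal_polygonize.py")
  if (!nzchar(pypath)) stop("gdal_polygonize.py not found; set pypath or PATH")
  status <- system2(pypath, args = polygonize_args(raster_file, out_shp))
  if (status != 0) stop("gdal_polygonize.py exited with status ", status)
  invisible(out_shp)
}
```

A rough timing comparison could then wrap `run_polygonize("myraster.tif", "myraster.shp")` and `raster::rasterToPolygons(raster::raster("myraster.tif"), dissolve = TRUE)` in `system.time()`; the file names are placeholders. On Windows, a .py script may not be launchable directly, and you may need to call python with the script path as its first argument (an assumption on my part; see the Windows-friendly version mentioned in the update above).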
