The end of the line for error bars in R

When plotting in R, I often use the segments function to add lines representing confidence intervals. It’s a very simple way to draw lines between pairs of points, from (x0, y0) to (x1, y1).

Recently I discovered that, by default, segments are drawn with rounded line caps, which extend each line slightly beyond its end points. This means, of course, that confidence intervals are drawn slightly wider than intended.

R provides three styles of line ending – round, butt and square – which can be specified by the lend argument. The figure here shows the outcome of using each line ending, with vertical lines indicating actual end-points of segments. Both round and square line ends overshoot these points, while butt ends represent them correctly.

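# Compare R's three line-end styles (lend) on thick segments
par(mar=c(1, 4, 1, 1))  # set margins before starting the plot
plot.new()
plot.window(xlim=c(0, 1), ylim=c(0.5, 3.5))
axis(2, 1:3, c('round', 'butt', 'square'), las=1)
box(lwd=2)
# Each segment runs from x = 0.1 to x = 0.9; the wide lwd exaggerates the caps
segments(0.1, 1, 0.9, 1, lwd=20, lend='round')   # the default; same as lend=0
segments(0.1, 2, 0.9, 2, lwd=20, lend='butt')    # same as lend=1
segments(0.1, 3, 0.9, 3, lwd=20, lend='square')  # same as lend=2
# Vertical reference lines mark the true end points
abline(v=c(0.1, 0.9))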

Line end styles applied to segments plotted in R. Only 'butt' accurately represents end points.

The effect is slight, but it grows with line width: round and square caps each extend beyond the end point by half the line width. Regardless, it’s a good idea to routinely add lend='butt' (or lend=1) to your segments function calls.
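If you’d rather not remember lend on every call, two options are sketched below for base graphics. par(lend = "butt") changes the default for the current device, and ci_segments is a hypothetical wrapper (not part of R) for drawing vertical interval bars with butt caps. The estimates and intervals are made up for illustration.

```r
# Option 1: make butt caps the default for everything drawn on this device.
# par() settings are per-device, so this resets when a new device is opened.
old_par <- par(lend = "butt")

# Option 2: a hypothetical wrapper for vertical confidence intervals.
# x: x positions; lower, upper: interval limits; ...: passed on to segments()
ci_segments <- function(x, lower, upper, ..., lend = "butt") {
  segments(x0 = x, y0 = lower, x1 = x, y1 = upper, lend = lend, ...)
}

# Usage with made-up estimates and intervals
est   <- c(2.1, 3.4, 2.8)
lower <- est - 0.5
upper <- est + 0.6
plot(1:3, est, xlim = c(0.5, 3.5), ylim = range(lower, upper), pch = 19,
     xlab = "Group", ylab = "Estimate")
ci_segments(1:3, lower, upper, lwd = 3)

par(old_par)  # restore the previous line-end style
```

The wrapper keeps lend as an overridable argument, so ci_segments(..., lend = "round") still works if you ever want the old look.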

A secondary benefit is that lines will appear crisper than when plotted with the default round caps.

Using R to parse time (and taxon names) with GBIF’s API

GBIF has recently made a bunch of handy tools available via their revamped API. These tools include a species name parser, which seems very useful for cleaning long lists of taxon names.

Here’s a simple R function that takes a vector of taxon names and parses them using GBIF’s API, extracting, among other details, the genus, species, infraspecific rank and epithet, nothorank (i.e., indicating the taxonomic rank of hybridisation), and authorship.

I’ve created a gist of this function, so you can grab it from github with source_url('https://gist.github.com/johnbaums/6971353/raw/gbif_parse.R') (requires the devtools package), or you can just copy and paste it from here.

It’s a bit awkward to include wide tabular output here, but I’ve provided a few examples of the function’s use on github. I haven’t tested the API thoroughly (and the stable version hasn’t yet been released – expected end of 2013), so I’m interested to hear if it “parses” your tests.
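The gist isn’t reproduced here, but the core of such a function is building a request URL from a vector of names. The sketch below assumes GBIF’s current v1 name-parser endpoint (https://api.gbif.org/v1/parser/name), which accepts one or more name query parameters; the API this post was written against was still in beta, so its URL may have differed. gbif_parser_url is a hypothetical helper, not the function from the gist.

```r
# Hypothetical helper: build a GET URL for GBIF's name parser.
# Assumes the v1 endpoint accepts repeated 'name' query parameters.
gbif_parser_url <- function(names,
                            base = "https://api.gbif.org/v1/parser/name") {
  # Percent-encode each name (spaces become %20, etc.)
  enc <- vapply(names, utils::URLencode, character(1), reserved = TRUE,
                USE.NAMES = FALSE)
  paste0(base, "?", paste0("name=", enc, collapse = "&"))
}

url <- gbif_parser_url(c("Abies alba Mill.", "Pinus"))
url
# Fetching and flattening the JSON response needs a JSON parser, e.g.
# (assuming the jsonlite package is installed and you are online):
# parsed <- jsonlite::fromJSON(url)
```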


EDIT: I’ve added a modified version of this function to the dev version of rOpenSci’s rgbif package (thanks, Scott!).

Getting rasters into shape from R

Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep all my code in one place so I can easily keep track of what I’ve done, which made it sensible to do the conversion in R as well. The fantastic raster package has a function that can do this, rasterToPolygons, but it seems to take a very, very long time and chew up all system resources (at least on my not-particularly-souped-up X200 laptop).

We can cut down the time taken for conversion by calling a GDAL utility, gdal_polygonize.py, directly from R using system2(). GDAL needs to be installed on your system, but you’ll probably want it installed anyway if you’re planning to work with spatial data (in fact, you might find you already have it, as it is required by a bunch of other GIS software packages). Together with its cousin OGR, GDAL allows reading and writing of a wide range of raster and vector geospatial data formats.

Once GDAL is installed, you have at your disposal a bunch of GDAL Utilities that can be run from the terminal, including gdal_polygonize.py. For polygonize and a couple of others, you’ll also need Python installed (again, you may already have it, so check first!). Finally, it helps if the directory containing gdal_polygonize.py is listed in the PATH environment variable. To check whether this is the case, run the following in R:

Sys.which("gdal_polygonize.py")

If this returns an empty string, either gdal_polygonize.py doesn’t exist on your system, or the directory containing it is missing from the PATH variable. In the latter case, you can either specify the path with the pypath argument of the gdal_polygonizeR function (introduced below), or modify your PATH variable. The PATH variable can be modified by following these instructions for Windows and Mac. I suspect Linux users will rarely run into this problem, as the GDAL executables are usually installed to directories that are already on the PATH (e.g. /usr/bin).
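For the current R session only, you can also append a directory to PATH from within R. This is a minimal sketch: add_to_path is a hypothetical helper, and the example directory is a placeholder you would replace with wherever gdal_polygonize.py lives on your machine.

```r
# Hypothetical helper: append a directory to PATH for this R session only.
add_to_path <- function(dir) {
  dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  if (!dir %in% dirs) {
    Sys.setenv(PATH = paste(c(dirs, dir), collapse = .Platform$path.sep))
  }
  invisible(Sys.getenv("PATH"))
}

# Placeholder directory: replace with the folder containing gdal_polygonize.py
add_to_path("/path/to/gdal/bin")
Sys.which("gdal_polygonize.py")  # non-empty once the script can be found
```

Because Sys.setenv only affects the running R process, this needs to be repeated each session (or placed in your .Rprofile).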

Anywho… let’s compare rasterToPolygons with gdal_polygonize. The function included below (gdal_polygonizeR) borrows from code provided in a post by Lyndon Estes on R-sig-geo. Thanks mate!
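The gdal_polygonizeR function itself isn’t reproduced here. As a rough sketch of the idea, assuming your raster is already saved to a GDAL-readable file and that a python executable is on your PATH, the call boils down to running gdal_polygonize.py with an input raster, an output format (-f) and an output file. polygonize_file and its arguments are hypothetical, not the function from the post.

```r
# Hypothetical sketch: polygonize a raster file by calling gdal_polygonize.py.
# Assumes GDAL's Python utilities are installed and 'python' is on the PATH.
polygonize_file <- function(rastpath, outshape,
                            pypath = Sys.which("gdal_polygonize.py"),
                            format = "ESRI Shapefile",
                            python = "python", dry_run = FALSE) {
  if (!nzchar(pypath)) stop("gdal_polygonize.py not found; set pypath")
  args <- c(shQuote(pypath), shQuote(rastpath),
            "-f", shQuote(format), shQuote(outshape))
  if (dry_run) return(c(python, args))  # just return the command to be run
  status <- system2(python, args)       # system2 returns the exit status
  if (status != 0) stop("gdal_polygonize.py exited with status ", status)
  invisible(outshape)
}

# Show the command that would be run (paths are placeholders)
polygonize_file("my_raster.tif", "my_polygons.shp",
                pypath = "/usr/bin/gdal_polygonize.py", dry_run = TRUE)
```

If your raster only exists in memory, write it to disk first (e.g. with raster::writeRaster), and wrap each approach in system.time() to compare them on your own data.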

R functions to filter rjags results

A while back I was running a bunch of JAGS models through R, using the rjags (written by Martyn Plummer) and R2jags (by Yu-Sung Su) packages. These packages provide a great interface to the JAGS software, which allows analysis of Bayesian models (written in the BUGS language) through Markov chain Monte Carlo simulation.

Running a JAGS model using these tools returns an rjags object which, when printed to the screen, summarises the posterior distribution of each monitored node: its mean and standard deviation, a range of quantiles, and its Gelman-Rubin convergence diagnostic (Rhat), which compares the variance among chains with the variance within chains (values close to 1 suggest the chains have converged). The summary is great, but when monitoring a large number of nodes, printing it to the screen can cause R to hang and can exceed the screen buffer (not to mention making it painful to find the nodes you’re immediately interested in).
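To make the diagnostic concrete, here is the basic form of the Gelman-Rubin statistic for a single node, written in R. It’s a minimal sketch: it omits the degrees-of-freedom correction and chain splitting used by some implementations, so values reported by R2jags or coda may differ slightly.

```r
# Basic Gelman-Rubin potential scale reduction factor (Rhat) for one node.
# draws: matrix with one row per iteration and one column per chain.
psrf <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(draws))     # between-chain variance
  V_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(V_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(4000), ncol = 4)              # 4 well-mixed chains
stuck <- sweep(mixed, 2, c(0, 0, 5, 5), FUN = "+")  # 2 chains offset by 5
psrf(mixed)  # close to 1
psrf(stuck)  # well above 1
```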

To help deal with this I wrote a couple of simple R functions:

  • jagsresults: return a matrix containing summary results for just the nodes you are interested in (using regular expression pattern-matching, if desired).
  • rhats: sort the output by the nodes’ Rhat values, making it easy to show the n least converged nodes.
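As a rough illustration of the idea only, the sketch below filters and sorts the summary matrix that R2jags stores in fit$BUGSoutput$summary (one row per node, with columns including mean, sd and Rhat). filter_nodes and worst_rhats are hypothetical names, not the jagsresults and rhats functions described above, and the toy matrix stands in for real model output.

```r
# Accept either an R2jags fit or its summary matrix directly.
get_summary <- function(x) if (is.matrix(x)) x else x$BUGSoutput$summary

# Rows whose node names match a regular expression (extra args go to grepl)
filter_nodes <- function(x, pattern, ...) {
  s <- get_summary(x)
  s[grepl(pattern, rownames(s), ...), , drop = FALSE]
}

# The n nodes with the largest (i.e. least converged) Rhat values
worst_rhats <- function(x, n = 10) {
  s <- get_summary(x)
  head(s[order(s[, "Rhat"], decreasing = TRUE), , drop = FALSE], n)
}

# Toy summary matrix standing in for fit$BUGSoutput$summary
m <- cbind(mean = c(1, 2, 3), sd = c(0.1, 0.2, 0.3), Rhat = c(1.01, 1.5, 1.2))
rownames(m) <- c("beta[1]", "beta[2]", "sigma")
filter_nodes(m, "^beta")
worst_rhats(m, n = 2)
```

Passing fixed = TRUE through to grepl is handy for node names containing brackets, e.g. filter_nodes(m, "beta[1]", fixed = TRUE).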

I, Rbot: Tweeting from R

Model status update. A Tweet from the Rbot.

Over the past few weeks I’ve been running batches of JAGS simulations from R. Although these models typically converged within an hour or so, more complex models can take days, or even weeks, to converge. Because we, as humans, are required to bathe, feed, socialise and attend weddings, there will inevitably come a time when you must leave the safety of your modelling den while your simulation is still chugging away. It would be nice, however, to keep in touch with your models’ progress, to be made instantly aware of any errors that occur in your absence, and to be advised upon successful task completion. This would be decidedly more satisfying than returning from a week-long holiday to find that your model broke on Day 1, just after you closed the door on your way out.

Enter Twidge, a command line Twitter client that humbly fills this neglected niche by allowing you to send a short Tweet to yourself from R via the system() function. The tweet can, of course, be a message pasted together from R objects, which permits dynamic tweet content and means your R-Tweeting power is really only limited by your imagination (and the 140 character message cap).
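A minimal sketch of the idea, assuming Twidge is installed and already authorised (via twidge setup) and that its update command posts a status. tweet_status and run_and_report are hypothetical helpers; with dry_run = TRUE they return the Twidge arguments instead of calling it, so the example runs without Twidge.

```r
# Hypothetical helper: send a status update through the twidge CLI.
tweet_status <- function(msg, max_chars = 140, dry_run = FALSE) {
  if (nchar(msg) > max_chars) {
    msg <- paste0(substr(msg, 1, max_chars - 3), "...")  # respect the cap
  }
  args <- c("update", shQuote(msg))
  if (dry_run) return(args)
  system2("twidge", args)
}

# Hypothetical wrapper: run a job, then tweet either success or the error
run_and_report <- function(job, dry_run = FALSE) {
  tryCatch({
    job()
    tweet_status(paste("Model finished at", format(Sys.time(), "%H:%M")),
                 dry_run = dry_run)
  }, error = function(e) {
    tweet_status(paste("Model failed:", conditionMessage(e)), dry_run = dry_run)
  })
}

# Stand-in for a JAGS run that breaks
run_and_report(function() stop("chains failed to initialise"), dry_run = TRUE)
```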

OpenBUGS error messages. Something went right.

Although my relationship with OpenBUGS is still in the early stages, one glaringly obvious and quite infuriating shortcoming has been the lack of detail in its error messages. The initiated amongst you are no doubt familiar with the phrase “Sorry, something went wrong in procedure [blah]”. These error messages had me chasing my tail for hours in vain attempts to find offending segments of code. Surely, I thought, there must be a way to display more detailed information about the cause of an error.

Default OpenBUGS error messages

As it happens, there is.