Getting rasters into shape from R

Update: Windows people should use the modified version of the function provided by Francisco Rodriguez-Sanchez (mentioned in this comment), which is available as a Gist here.


Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep my code together so I can easily keep track of what I’ve done, so it made sense to do the conversion in R as well. The fantastic raster package has a function that can do this (rasterToPolygons), but it seems to take a very, very long time and chew up all system resources (at least on my not particularly souped up X200 laptop).

We can cut down the time taken for conversion by calling a GDAL utility,  gdal_polygonize.py, directly from R using system2(). GDAL needs to be installed on your system, but you’ll probably want it installed anyway if you’re planning to talk to spatial data (in fact, you might find you actually already have it installed, as it is required for a bunch of other GIS software packages). Together with it’s cousin OGR, GDAL allows reading and writing of a wide range of raster and vector geospatial data formats. Once installed, you have at your disposal a bunch of GDAL Utilities that can be run from the terminal, including gdal_polygonize.py. For polygonize and a couple of others, you’ll also need Python installed (again, you may already have it installed, so check first!). Finally, it helps if the path to the gdal_polygonize.py exists in the global PATH variable. To check if this is the case, run the following in R:

Sys.which("gdal_polygonize.py")

If this returns an empty string, it looks like gdal_polygonize.py either doesn’t exist on your system, or the path to its containing directoy is missing from the PATH variable. In the latter case, you can either use the pypath argument accepted by the function below to specify the path, or modify your PATH variable. The PATH variable can be modified by following these instructions for Windows and Mac. I suspect Linux users should typically not run into this problem, as the GDAL executables are usually installed to paths that already exist in PATH (e.g. /usr/bin).

Anyhow… let’s compare rasterToPolygons with gdal_polygonize. The function included below (gdal_polygonizeR) borrows from code provided in a post by Lyndon Estes on R-sig-geo. Thanks mate!

[EDIT: I’ve given this post a considerable overhaul, including correction of the source code such that it should work across platforms. Please feel free to leave me a comment if you have any problems or suggestions.]

First, we’ll import a raster data set

Here’s one I prepared earlier. It’s the result of a conversion of a polygon shapefile of country boundaries (from Natural Earth, a great source of public domain, physical/cultural spatial data) to a raster data set.

download.file('http://dl.dropbox.com/u/1058849/blog/NEcountries.asc.zip',
              destfile={f <- tempfile()}, quiet=TRUE, cacheOK=FALSE)
unzip(f, exdir={d <- tempdir()})
library(rasterVis)
r <- raster(file.path(d, 'NEcountries.asc'), crs=CRS('+proj=longlat'))
levelplot(r, margin=FALSE, col.regions=rainbow)
Result using rasterToPolygons

The raster object resulting from using rasterToPolygons. Click to see it in all it’s pixelated glory.

Using rasterToPolygons in the raster package

p <- rasterToPolygons(r, dissolve=TRUE)

…time passes… Ok, ouch. I decided to kill the process after 6 hours.

Using gdal_polygonize.py

Note that the following takes an argument outshape, which specifies the path where the converted polygon shapefile will reside. Default is NULL, which saves the shapefile to a temporary file. Either way, the function returns a SpatialPolygonsDataFrame representation of the shapefile, unless readpoly is FALSE, in which case the function returns NULL.

## Define the function
gdal_polygonizeR <- function(x, outshape=NULL, gdalformat = 'ESRI Shapefile',
                             pypath=NULL, readpoly=TRUE, quiet=TRUE) {
  if (isTRUE(readpoly)) require(rgdal)
  if (is.null(pypath)) {
    pypath <- Sys.which('gdal_polygonize.py')
  }
  if (!file.exists(pypath)) stop("Can't find gdal_polygonize.py on your system.")
  owd <- getwd()
  on.exit(setwd(owd))
  setwd(dirname(pypath))
  if (!is.null(outshape)) {
    outshape <- sub('\\.shp$', '', outshape)
    f.exists <- file.exists(paste(outshape, c('shp', 'shx', 'dbf'), sep='.'))
    if (any(f.exists))
      stop(sprintf('File already exists: %s',
                   toString(paste(outshape, c('shp', 'shx', 'dbf'),
                                  sep='.')[f.exists])), call.=FALSE)
  } else outshape <- tempfile()
  if (is(x, 'Raster')) {
    require(raster)
    writeRaster(x, {f <- tempfile(fileext='.tif')})
    rastpath <- normalizePath(f)
  } else if (is.character(x)) {
    rastpath <- normalizePath(x)
  } else stop('x must be a file path (character string), or a Raster object.')
  system2('python', args=(sprintf('"%1$s" "%2$s" -f "%3$s" "%4$s.shp"',
                                  pypath, rastpath, gdalformat, outshape)))
  if (isTRUE(readpoly)) {
    shp <- readOGR(dirname(outshape), layer = basename(outshape), verbose=!quiet)
    return(shp)
  }
  return(NULL)
}

Time to polygonizeR the raster!

Firstly, using a raster object…

system.time(p <- gdal_polygonizeR(r))

Creating output /tmp/RtmpdnHOc7/filec7f2acf5d36.shp.
0...10...20...30...40...50...60...70...80...90...100 - done.
   user  system elapsed
 19.833   0.336  20.464

And now referring directly to the raster file, and disabling the importing of the resulting shapefile by specifying readpoly=FALSE.

system.time(
  gdal_polygonizeR(file.path(d, 'NEcountries.asc'), readpoly=F)
)

Creating output /tmp/RtmpdnHOc7/filec7f4ae4d60d.shp.
0...10...20...30...40...50...60...70...80...90...100 - done.
   user  system elapsed
  2.864   0.124   3.023

Just over 20 seconds if we begin with a raster object, and only 3 seconds if we already have the raster saved as a file on disk. That’s around 1/7000 of the time spent unsuccessfully attempting the same feat with rasterToPolygons.

spplot(p, col.regions=rainbow(200))
Result using gdal_polygonizeR

The SpatialPolygonsDataFrame resulting from gdal_polygonizeR.

Don’t get me wrong… I’m a big fan of raster as well as Robert’s other packages. This is just one place where it is not particularly efficient. Also note that I’m dissolving the polygons when using rasterToPolygons, and as this relies on rgeos, the problem might lie there.

Advertisements

36 thoughts on “Getting rasters into shape from R

  1. Hi there,

    Thanks for sharing this!

    Do you know how to avoid trouble when using osgeo4 installation of python and gdal/ogr. I get an error “ImportError: No module named ‘gdal'”. However, when I open the osgeo shell (I’m on OS Win7) an run python from there, like:

    C:\OSGeo4W64>python
    Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win
    32
    Type “help”, “copyright”, “credits” or “license” for more information.
    >>> from osgeo import gdal

    the import works as it should. I tried to alter the call to system in your function – but without success.

    Thanks,
    Kay

  2. Hi John,

    Thank you for posting this – very helpful! Also the most updated version here: https://gist.github.com/johnbaums/26e8091f082f2b3dd279.

    However, I couldn’t make the function work in Windows with an OSGeo4W installation of GDAL. After a few hours of struggle, I found an easy solution that I’m linking to in case more people find the same problems.

    Basically, I substituted the call to ‘python’ in system2 by the path to OSGeo4W shell (‘C:\\OSGeo4W64\\OSGeo4W.bat’ in my computer), so that gdal_polygonize is run from the OSGeo4W shell. Then the ‘pypath’ argument is just “gdal_polygonize”. I also had to comment out the line checking for the existence of gdal_polygonize file.

    For the record, the modified function is available here: https://gist.github.com/Pakillo/c6b076eceb0ef5a70e3b.

    Hope it’s useful, and thanks again for posting this material!

    Cheers,

    F. Rodriguez-Sanchez
    @frod_san

    • Thank you for the modification!

      The conversion from raster to polygons works well, but I don’t know how to specify “outshape” so that the polygons get saved as .shp (ESRI shapefile) to a given directory (or working directory).

      Can you please help me? Thanks in advance 🙂
      Kathi

      • It’s many months later and I’ve only just seen these WordPress notifications (sorry!). For future reference, you should just be able to enter the desired path, with or without the .shp extension. For example: polygonizer('myfile.tif', outshape='c:/output/whatever/myshapefile.shp').

  3. I’ve finally managed to convert my raster to polygons thanks to your explanation and function!
    But now I’m left with a shapefile consisting of 80000 polygons. Is there a possibility to include the dissolution into your function?

    • gdal_polygonize creates a polygon for each contiguous (4-connected, by default) block of raster cells that share a common cell value. My guess is that your data are continuous, and few adjacent cells have the same value. You might consider, e.g., rounding decimal places, or binning the raster into a number of classes (with cut, perhaps).

      That said, you can dissolve SpatialPolygons objects with either aggregate from the sp package, or gUnaryUnion from the rgeos package. However, this will probably create a single rectangular polygon equal to the extent of your raster (because polygons are continuous across the extent of the raster, and adjoining polygons are merged by these functions), which is almost certainly not what you want.

    • I also had the same problem. The gdal_pologonizeR generated more polygons than was the correct. It does this because the feature in raster format often have “thin” parts composed of only 1 pixel (or sequences of 1 pixels), but that integrate the feature as a whole. The function transforms these pixels into independent polygons, with the same identifier (called "DN"). I solved this problem by applying the unionSpatialPolygons function of the Maptools package after vectoring. My example command:

      tmp2 = unionSpatialPolygons (tmp2, tmp2$DN)

      Note: tmp2 was my vector file generated by gdal_pologonizeR

  4. Thank you for the code. We are wrangling with a lot of raster files that need to be converted into polygons. I am getting this error with our raster files:

    Error in .startAsciiWriting(x, filename, …) :
    x has unequal horizontal and vertical resolutions. Such data cannot be stored in arc-ascii format

    My raster is already in R:

    shark_richness
    class : RasterLayer
    dimensions : 3000, 3000, 9e+06 (nrow, ncol, ncell)
    resolution : 11578.35, 4826.238 (x, y)
    extent : -17367530, 17367530, -7212962, 7265753 (xmin, xmax, ymin, ymax)
    coord. ref. : +proj=cea +lon_0=0 +lat_ts=30 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
    data source : /Users/ara/Documents/GIS/Ocean_preds_project/richness_maps/shark_richness.tif
    names : shark_richness
    values : 0, 71 (min, max)

    Is there a way around the writing to the ascii format?

    Thanks
    ara

    • Thanks for pointing this out. I’ve modified the code above such that it writes out the intermediate raster file as a geoTIFF rather than an ASCII. Now it allows unequal vertical and horizontal res.

  5. Is there a way to do this without dissolving based on the raster values? So you’d just get a polygon grid with the same cell size as the raster?

    • A simple way to do this is to assign unique values to the raster cells, then polygonize, and finally add a new attribute to the polygons to restore the original cell values. Be aware that if you try this with a large raster, you’ll end up with a huge SpatialPolygonsDataFrame.

      Something like this:

       # decrease resolution for this example
      r2 <- aggregate(r, fact=20)
      # assign unique values to non-NA cells
      r3 <- mask(init(r2), r2) 
      # polygonize
      p <- polygonizer(r3) 
      # create an attribute for country, 
      # and fill it with values from the 
      # original raster
      p$country <- na.omit(r2[])
      

      Note that above I use polygonizer, which is the version of the function that I maintain as a Gist here. You can source it with devtools, using devtools::source_gist('26e8091f082f2b3dd279').

  6. Thanks a lot for the code. Is there a way to ignore NA cells into your function ? I have NA cells in my raster and by using your function, I obtain polygons with gaps. Thank you very for your help.

    • What behaviour would you prefer? Polygons are delineated by changes in value. The function wouldn’t know what value to assign to NA cells. If you’re referring to NA cells that are completely surrounded by cells of a constant value, then you could remove the “hole” polygons after polygonizing the raster. For example:

      # below, I draw on Roger Bivand's suggestion here:
      # https://stat.ethz.ch/pipermail/r-sig-geo/2014-January/020139.html
      
      library(raster)
      devtools::source_gist('26e8091f082f2b3dd279')
      r <- raster(matrix(c(1, 1, 1, 
                           1, NA, 1, 
                           1, 1, 1, 
                           2, 2, 2, 
                           2, 2, NA, 
                           2, 2, 2), 3), 
                  xmn=0, xmx=6, ymn=0, ymx=3)
      
      p <- polygonizer(r)
      pp <- slot(p, 'polygons')
      holes <- lapply(pp, function(x) sapply(slot(x, 'Polygons'), slot, 'hole'))
      res <- lapply(seq_along(pp), function(i) slot(pp[[i]], 'Polygons')[!holes[[i]]])
      p2 <- SpatialPolygonsDataFrame(
        SpatialPolygons(
          lapply(seq_along(res), function(i) Polygons(res[[i]], ID=row.names(p)[i])), 
          proj4string=CRS(proj4string(p))),
        as.data.frame(p)
      )
      

      Compare p and p2.

      Note that NA cells at the edge of a polygon are not considered holes, so will retain their NA value. With a bit of thought and Googling you should be able to find ways to fill those cells (probably before polygonizing) according to some other rule you specify.

  7. Hello! Thank you for posting about a different way to change raster to spatial polygon since rasterToPolygon takes forever. However, I keep getting the same error message and I don’t know how to debug the code right now.

    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
    Cannot open data source
    In addition: Warning message:
    running command ‘”python” “C:\Users\Admin\DOWNLO~1\RELEAS~1\bin\gdal\python\scripts\GDAL_P~2.PY” “C:\Users\Admin\AppData\Local\Temp\Rtmp2XfxOl\file1a503f605027.tif” -f “ESRI Shapefile” “C:\Users\Admin\AppData\Local\Temp\Rtmp2XfxOl\file1a50102c72ff.shp”‘ had status 127

    this is the error message that is keep appearing. Also, I ran gdal_polygonize.PY on python and I get DLL Load Failed: Specified Module cannot be found.

    Do you know how to fix these errors? Thank you in advance.

    • Are you using the version from this post, or the polygonizer function from here? If the former, try the latter. If that doesn’t work, then try debugonce(polygonizer) before trying again, and work out what exactly is throwing the error.

      • I am getting the same error with the latter polygonizer.py right now. I will try to debug using debugonce to see which part causes the error.

    • Hi Daniel,

      This is probably a little late now, but I would though I comment to help the troops following up the rear with the same issue.

      So, I was getting the exact same error, the reason this is happening is because for the Windows version to work you have to have GDAL installed through this tool: https://trac.osgeo.org/gdal/wiki/OSGeo4W

      Once you have installed this you just need to add “C:\OSGeo4W64” to the environmental PATH variable and you should be good to go.

      Hope this helps 🙂

  8. With regards to the rasterToPolygons function, it seems to crash if there are any NA values. Simply set them to zero before running the function: r[is.na(r)] <- 0

    Would be interesting if you compared the runtime after doing this.

    Cheers

  9. Hi John, great post. Thank you! I’m writing an R package that currently uses rasterToPolygons(), and I’d love to be able to improve its performance. However, I’d like for the package users to not have to worry about downloading extra software and setting path variables and such. Is there any way of doing this with the ‘rgdal’ package, by any chance?

  10. Hi, I was reviewing the code of the RasterToPolygons function of the raster package and modifying it for some of my own needs. The question is if you have an idea why it takes so long with large rasters and that can be done, because the solution you propose is very good but requires having GDAL installed and in addition to python and modify the PATH variable of the system. I’m wondering why I’m creating an R-package to facilitate the transformation of some scientific data formats for spatiotemporal and geospatial datasets, but my package already depends on some other packages (including raster and rgdal) and I do not want it to have Which will also depend on having GDAL and python installed, much less having to modify the PATH variable. Thank you for sharing with us and apologize for my English is not very good I’m Spanish speaking.

    • Hi Yuniel.
      I’m afraid I haven’t profiled RasterToPolygons to see where the bottlenecks are, nor have I thought about how to resolve them. If you get anywhere with this, I’d be interested in your findings.
      Cheers

  11. Pingback: Connectivity Analysis for “Ch. 3” – Jessica Gorzo

  12. Pingback: How To Create A Tableau Shape File From An Image - Your Favourite Story

  13. Hi All-
    I ran into an issue when I was working with this today that I struggled with for several hours and ultimately was able to overcome. For some reason for Mac it seems like GDAL has removed it’s inherent linkage with python, so I was coming up with an error of “No module named osgeo” and the function wouldn’t run. I solved this by installing the second most recent version of GDAL (2.1) and the Numpy installation that is included with that, which wasn’t included in the most recent version. I don’t have any background knowledge of python, but this was a workaround for me that seems to have functioned.

    Thanks John for a great function!

  14. Hi John,

    it is my third day trying to resolve something but I am not able to do it, so thought of trying to ask you in here. I am using the Francisco modified version and I get the following error:

    Error in setwd(dirname(pypath)) : cannot change working directory

    When I search for pypath after:
    pypath <- Sys.which(“gdal_polygonize.py”)

    This is what I get: pypath
    gdal_polygonize.py
    “”

    Do you think it is related? and could you somehow help me? haha.
    I installed the gdal_polygonize.py and everything else in OSGeo4W64 and I checked that the file exists in there.

    I appreciate your help!
    Oscar

    • Hi Oscar. Sorry it took me so long to reply. Maybe this is no longer relevant. My guess is that you are not providing pypath to the function. In this case, it will try to determine the location of gdal_polygonize.py by using Sys.which('gdal_polygonize.py'). If you try that command in R, you’ll probably see an empty string as a result. This empty string is then passed to setwd(), leading to setwd(''), which returns that error. So either you don’t have gdal_polygonize.py on your system, or it’s containing folder is not included in the PATH variable. I mention this in the third paragraph of this post. Good luck getting it to work!

  15. Hi John,

    Thank you for this function. I am unfortunately having some problems when using it and recive these error messages:
    running command ‘”C:\OSGeo4W64\OSGeo4W.bat” “C:\Users\Filip\Desktop\gdal_polygonize.py” “C:\Users\Filip\AppData\Local\Temp\RtmpOutu6N\file282c363e9d9.tif” -f “ESRI_Shapefile” “C:\Users\Filip\AppData\Local\Temp\RtmpOutu6N\file282c67e9485d.shp”‘ had status 127
    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
    Cannot open data source

    Status 127 means that the function gdal_polygonize could not be found in the OSGeo4W shell so I think that the problem is that my path to the script can not be read eventhough that “C:\Users\Filip\Desktop\gdal_polygonize.py” is the location of the script. I am not sure which data source that the second error is referring to, so I am unsure where to start to try and solve it.

    If someone could point me in the right direction I would greatly appreciate it.

  16. Hello! Sorry if my question is super-näive, but… can you tell me in which part of the code may be the error, so I get this message all the time:

    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
    Cannot open data source

    I am just trying to apply the function above to a raster (.tif) dataset.

    Thanks a lot!

    • Hi Ester. I guess you’re passing a file path to the function – are you sure that file path is correct? Does file.exists('myfile') (where myfile is the path you’re passing to the function) return TRUE?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s