Remove small raster polygons — sieveFilter • gdalraster

sieveFilter() is a wrapper for GDALSieveFilter() in the GDAL Algorithms API. It removes raster polygons smaller than a provided threshold size (in pixels) and replaces them with the pixel value of the largest neighbour polygon.

Usage

sieveFilter(
  src_filename,
  src_band,
  dst_filename,
  dst_band,
  size_threshold,
  connectedness,
  mask_filename = "",
  mask_band = 0L,
  options = NULL,
  quiet = FALSE
)

Arguments

src_filename: Filename of the source raster to be processed.
src_band: Band number in the source raster to be processed.
dst_filename: Filename of the output raster. It may be the same as src_filename to update the source file in place.
dst_band: Band number in dst_filename to write output. It may be the same as src_band to update the source raster in place.
size_threshold: Integer. Raster polygons with sizes (in pixels) smaller than this value will be merged into their largest neighbour.
connectedness: Integer. Either 4 indicating that diagonal pixels are not considered directly adjacent for polygon membership purposes, or 8 indicating they are.
mask_filename: Optional filename of raster to use as a mask.
mask_band: Band number in mask_filename to use as a mask. All pixels in the mask band with a value other than zero will be considered suitable for inclusion in polygons.
options: Algorithm options as a character vector of name=value pairs. None currently supported.
quiet: Logical scalar. If TRUE, a progress bar will not be displayed. Defaults to FALSE.

Value

Logical indicating success (invisible TRUE). An error is raised if the operation fails.

Details

Polygons are determined as regions of the raster where the pixels all have the same value, and that are contiguous (connected). Pixels determined to be "nodata" per the mask band will not be treated as part of a polygon regardless of their pixel values. Nodata areas will never be changed nor affect polygon sizes. Polygons smaller than the threshold with no neighbours that are as large as the threshold will not be altered. Polygons surrounded by nodata areas will therefore not be altered.

The algorithm makes three passes over the input file to enumerate the polygons and collect limited information about them. Memory use is proportional to the number of polygons (roughly 24 bytes per polygon), but is not directly related to the size of the raster. So very large raster files can be processed effectively if there aren't too many polygons. But extremely noisy rasters with many one pixel polygons will end up being expensive (in memory) to process.

The input dataset is read as integer data which means that floating point values are rounded to integers.

Examples

## remove single-pixel polygons from the vegetation type layer (EVT)
evt_file <- system.file("extdata/storml_evt.tif", package="gdalraster")

# create a blank raster to hold the output
evt_mmu_file <- file.path(tempdir(), "storml_evt_mmu2.tif")
rasterFromRaster(srcfile = evt_file,
                 dstfile = evt_mmu_file,
                 init = 32767)
#> initializing destination raster...
#> done

# create a mask to exclude water pixels from the algorithm
# recode water (7292) to 0
expr <- "ifelse(EVT == 7292, 0, EVT)"
mask_file <- calc(expr = expr,
                  rasterfiles = evt_file,
                  var.names = "EVT")
#> calculating from 1 input layer(s)...
#> ================================================================================
#> output written to: /tmp/Rtmp3mX7RE/rastcalc23b8d575ebc.tif

# create a version of EVT with two-pixel minimum mapping unit
sieveFilter(src_filename = evt_file,
            src_band = 1,
            dst_filename = evt_mmu_file,
            dst_band = 1,
            size_threshold = 2,
            connectedness = 8,
            mask_filename = mask_file,
            mask_band = 1)
#> 0...10...20...30...40...50...60...70...80...90...100 - done.