Overview
Configuration options are essentially global variables the user can set. They are used to alter the default behavior of certain raster format drivers, and in some cases the GDAL core. A large number of configuration options are available. An overall discussion along with full list of available options and where they apply is in the GDAL documentation at https://gdal.org/user/configoptions.html.
This quick reference covers a small subset of configuration options
that may be useful in common scenarios, with links to topic-specific
documentation provided by the GDAL project. Options can be set from R
with gdalraster::set_config_option()
. Note that specific
usage is context dependent. Passing value = ""
(empty
string) will unset a value previously set by
set_config_option()
:
library(gdalraster)
set_config_option("GDAL_NUM_THREADS", "ALL_CPUS")
# unset:
set_config_option("GDAL_NUM_THREADS", "")
General options
GDAL doc: https://gdal.org/user/configoptions.html#general-options
GDAL_RASTERIO_RESAMPLING
The $read()
method of a GDALRaster
object
will perform automatic resampling if the specified output size
(out_xsize * out_ysize
) is different than the size of the
source region being read (xsize * ysize
). In that case,
resampling can be configured to override the default NEAR
to one of BILINEAR
, CUBIC
,
CUBICSPLINE
, LANCZOS
, AVERAGE
,
MODE
, RMS
, or GAUSS
:
# bilinear interpolation (2x2 neighborhood of pixels)
set_config_option("GDAL_RASTERIO_RESAMPLING", "BILINEAR")
CPL_TMPDIR
By default, temporary files are written into the current working directory. This can be changed with:
set_config_option("CPL_TMPDIR", "<dirname>") # tmpdir to use
Performance and caching
GDAL doc: https://gdal.org/user/configoptions.html#performance-and-caching
GDAL_NUM_THREADS
Sets the number of worker threads to be used by GDAL operations that
support multithreading. This affects several different parts of GDAL
including multi-threaded compression for GeoTiff and SOZip, and
multithreaded computation during warp()
(see topics
below).
GDAL_CACHEMAX
The size limit of the block cache is set upon first use (first I/O).
Setting GDAL_CACHEMAX
after that point will not resize the
cache. It is a per-session setting. If GDAL_CACHEMAX
has
not been set upon first use of the cache, then the default cache size
(5%
of physical RAM) will be in effect for the current
session. See also GDAL
Block Cache.
# set to a specific size in MB
set_config_option("GDAL_CACHEMAX", "800")
# or percent of physical RAM
set_config_option("GDAL_CACHEMAX", "10%")
GDAL_MAX_DATASET_POOL_SIZE
The default number of datasets that can be opened simultaneously by
the GDALProxyPool
mechanism (used by VRT for example) is
100
. This can be increased to get better random I/O
performance with VRT mosaics made of numerous underlying raster files.
Note: on Linux systems, the number of file handles that can be opened by
a process is generally limited to 1024
. This is currently
clamped between 2
and 1000
. Also note that gdalwarp
increases the pool size to 450
:
# default is 100
set_config_option("GDAL_MAX_DATASET_POOL_SIZE", "450")
PG_USE_COPY
This configures PostgreSQL/PostGIS to use COPY
for
inserting data which is significantly faster than INSERT
.
This can increase performance substantially when using
gdalraster::polygonize()
to write polygons to PostGIS
vector. See also GDAL
configuration options for PostgreSQL.
# use COPY for inserting to PostGIS
set_config_option("PG_USE_COPY", "YES")
SQLITE_USE_OGR_VFS
For the SQLite-based formats GeoPackage (.gpkg) and Spatialite
(.sqlite), setting SQLITE_USE_OGR_VFS
enables extra
buffering/caching by the GDAL/OGR I/O layer and can speed up I/O. Be
aware that no file locking will occur if this option is activated, so
concurrent edits may lead to database corruption. This setting may
increase performance substantially when using
gdalraster::polygonize()
to write polygons to a vector
layer in these formats. Additional configuration and performance hints
for SQLite databases are in the driver documentation at: https://gdal.org/drivers/vector/sqlite.html#configuration-options.
# SQLite: GPKG (.gpkg) and Spatialite (.sqlite)
# enable extra buffering/caching by the GDAL/OGR I/O layer
set_config_option("SQLITE_USE_OGR_VFS", "YES")
OGR_SQLITE_JOURNAL
SQLite is a transactional DBMS. When many INSERT
statements are executed in close sequence, application code may group
them into large batches within transactions in order to get optimal
performance. By default, if no transaction is explicitly started, SQLite
will autocommit on every statement which will be slow.
The OGR_SQLITE_JOURNAL
option configures operation of
the rollback journal that implements transactions in SQLite. The SQLite
documentation describes the default operation:
The DELETE journaling mode is the normal behavior. In the DELETE mode, the rollback journal is deleted at the conclusion of each transaction. Indeed, the delete operation is the action that causes the transaction to commit.
The DELETE
mode requires file system I/O so performance
is degraded if many INSERT
s are autocommitted individually.
Using MEMORY
journaling mode (or even OFF
) can
be much faster in this case:
The MEMORY journaling mode stores the rollback journal in volatile RAM. This saves disk I/O but at the expense of database safety and integrity. If the application using SQLite crashes in the middle of a transaction when the MEMORY journaling mode is set, then the database file will very likely go corrupt.
See the SQLite documentation for all available journal
modes. This setting also applies when using
gdalraster::polygonize()
to write polygons to a vector
layer in GeoPackage (.gpkg) or Spatialite (.sqlite) formats (see
SQLITE_USE_OGR_VFS
above).
# configure SQLite to store the rollback journal in RAM
set_config_option("OGR_SQLITE_JOURNAL", "MEMORY")
Networking
GDAL doc: https://gdal.org/user/configoptions.html#networking-options
CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE
Whether to use a local temporary file to support random writes in
certain virtual file systems. The temporary file will be located in
CPL_TMPDIR
(see above).
# YES|NO to use a temp file
set_config_option("CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE", "YES")
PROJ
GDAL doc: https://gdal.org/user/configoptions.html#proj-options
OSR_DEFAULT_AXIS_MAPPING_STRATEGY
This option can be set to either TRADITIONAL_GIS_ORDER
or AUTHORITY_COMPLIANT
. GDAL >= 3.5 defaults to
AUTHORITY_COMPLIANT
. Determines whether to honor the
declared axis mapping of a CRS or override it with the traditional GIS
ordering (x = longitude, y = latitude).
OSR_WKT_FORMAT
As of GDAL 3.0, the default format for exporting a spatial reference definition to Well Known Text is WKT 1. This can be overridden with:
# SFSQL/WKT1_SIMPLE/WKT1/WKT1_GDAL/WKT1_ESRI/WKT2_2015/WKT2_2018/WKT2/DEFAULT
set_config_option("OSR_WKT_FORMAT", "WKT2")
Warp
GDAL doc: https://gdal.org/programs/gdalwarp.html#memory-usage
The performance and caching
topic above generally applies to processing with
gdalraster::warp()
(reproject/resample/crop/mosaic).
GDAL_NUM_THREADS
Multithreaded computation in warp()
can be enabled
with:
# note this also affects several other parts of GDAL
set_config_option("GDAL_NUM_THREADS", "4") # number of threads or ALL_CPUS
Increasing the memory available to warp()
may also
increase performance (i.e., the options passed in cl_arg
include a value like c("-wm", "1000")
). The warp memory
specified by "-wm"
is shared among all threads. It is
especially beneficial to increase this value when running
warp()
with multithreading enabled.
Multithreading could also be enabled by including a GDAL warp option
in cl_arg
with
c("-wo", "NUM_THREADS=<value>")
greater than 1, which
is equivalent to setting the GDAL_NUM_THREADS
configuration
option as shown above.
This option can be combined with the -multi
command-line argument passed to warp()
in
cl_arg
. With -multi
, two threads will be used
to process chunks of the raster and perform input/output operation
simultaneously, whereas the GDAL_NUM_THREADS
configuration
option affects computation separately.
GDAL_CACHEMAX
Increasing the size of the I/O block cache may also help. This can be
done by setting GDAL_CACHEMAX
as described in the performance and caching topic
above.
GeoTIFF
GDAL doc: https://gdal.org/drivers/raster/gtiff.html#configuration-options
The behavior of the GTiff driver is highly configurable, including
with respect to overview creation. For full discussion, see the link
above and also the documentation for the gdaladdo
command-line utility.
GDAL_NUM_THREADS
The GTiff driver supports multi-threaded compression (default is
compression in the main thread). GDAL documentation states that it is
worth it for slow compression algorithms such as DEFLATE
or
LZMA
. Starting with GDAL 3.6, this option also enables
multi-threaded decoding when read requests intersect several
tiles/strips:
# specify the number of worker threads or ALL_CPUS
# note this also affects several other parts of GDAL
set_config_option("GDAL_NUM_THREADS", "ALL_CPUS")
COMPRESS_OVERVIEW
Raster overviews (a.k.a. pyramids) can be built with the
$buildOverviews()
method of a GDALRaster
object. It may be desirable to compress the overviews when building:
# applies to external overviews (.ovr), and internal overviews if GDAL >= 3.6
# LZW is a good default but several other compression algorithms are available
set_config_option("COMPRESS_OVERVIEW", "LZW")
PREDICTOR_OVERVIEW
Sets the predictor to use for overviews with LZW
,
DEFLATE
and ZSTD
compression. The default is
1
(no predictor), 2
is horizontal differencing
and 3
is floating point prediction.
PREDICTOR=2
is only supported for 8, 16, 32 and 64 bit
samples (support for 64 bit was added in libtiff > 4.3.0).
PREDICTOR=3
is only supported for 16, 32 and 64 bit
floating-point data.
# horizontal differencing
set_config_option("PREDICTOR_OVERVIEW", "2")
HTTP/HTTPS
GDAL doc: /vsicurl/ (HTTP/HTTPS random access)
GDAL_HTTP_CONNECTTIMEOUT
Maximum delay for connection to be established before being aborted.
# max delay for connection establishment in seconds
set_config_option("GDAL_HTTP_CONNECTTIMEOUT", "<seconds>")
GDAL_HTTP_TIMEOUT
Maximum delay for the whole request to complete before being aborted.
# max delay for whole request completion in seconds
set_config_option("GDAL_HTTP_TIMEOUT", "<seconds>")
CPL_VSIL_CURL_CHUNK_SIZE
Partial downloads (requires the HTTP server to support random reading) are done with a 16 KB granularity by default. The chunk size can be configured with this option.
If the driver detects sequential reading, it will progressively
increase the chunk size up to 128 times
CPL_VSIL_CURL_CHUNK_SIZE
(so 2 MB by default) to improve
download performance. When increasing the value of
CPL_VSIL_CURL_CHUNK_SIZE
to optimize sequential reading, it
is recommended to increase CPL_VSIL_CURL_CACHE_SIZE
as well
to 128 times the value of CPL_VSIL_CURL_CHUNK_SIZE
.
# chunk size in bytes
set_config_option("CPL_VSIL_CURL_CHUNK_SIZE", "<bytes>")
CPL_VSIL_CURL_CACHE_SIZE
A global least-recently-used cache of 16 MB shared among all
downloaded content is used, and content in it may be reused after a file
handle has been closed and reopen, during the life-time of the process
or until vsi_curl_clear_cache()
is called. The size of this
global LRU cache can be modified with:
# size in bytes defaults to 16 MB
set_config_option("CPL_VSIL_CURL_CACHE_SIZE", "<bytes>")
AWS S3 buckets
GDAL doc: /vsis3/ (AWS S3 file system handler)
AWS_NO_SIGN_REQUEST
Request signing can be disabled for public buckets that do not require an AWS account:
# public bucket no AWS account required
set_config_option("AWS_NO_SIGN_REQUEST", "YES")
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_REQUEST_PAYER
If authentication is required, configure credentials with:
set_config_option("AWS_ACCESS_KEY_ID", "<value>") # key ID
set_config_option("AWS_SECRET_ACCESS_KEY", "<value>") # secret access key
# used for validation if using temporary credentials:
set_config_option("AWS_SESSION_TOKEN", "<value>") # session token
# if requester pays:
set_config_option("AWS_REQUEST_PAYER", "<value>") # requester
AWS_REGION
Sets the AWS region to which requests should be sent. Defaults to
us-east-1
.
# specify region
set_config_option("AWS_REGION", "us-west-2")
Google Cloud Storage
GDAL doc: /vsigs/ (Google Cloud Storage files)
Microsoft Azure Blob
GDAL doc: /vsiaz/ (Microsoft Azure Blob files)
Recognized filenames are of the form
/vsiaz/container/key
, where container
is the
name of the container and key
is the object “key”, i.e. a
filename potentially containing subdirectories.
AZURE_NO_SIGN_REQUEST
Controls whether requests are signed.
# public access
set_config_option("AZURE_NO_SIGN_REQUEST", "YES")
AZURE_STORAGE_CONNECTION_STRING
Credential string provided in the Access Keys section of the administrative interface, containing both the account name and a secret key.
set_config_option("AZURE_STORAGE_CONNECTION_STRING", "<my_connection_string>")
AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_ACCESS_TOKEN
AZURE_STORAGE_ACCESS_KEY
AZURE_STORAGE_SAS_TOKEN
Whereas an Azure connection string contains both the account name and
key, the storage account name might be set using
AZURE_STORAGE_ACCOUNT
along with one of:
-
AZURE_STORAGE_ACCESS_TOKEN
: value obtained using Microsoft Authentication Library (MSAL) -
AZURE_STORAGE_ACCESS_KEY
: value is the secret key associated withAZURE_STORAGE_ACCOUNT
-
AZURE_STORAGE_SAS_TOKEN
: value is a Shared Access Signature -
AZURE_NO_SIGN_REQUEST=YES
to disable request signing
The AZURE_STORAGE_SAS_TOKEN
is used, for example, with
Microsoft Planetary Computer as documented at: https://planetarycomputer.microsoft.com/docs/concepts/sas/
SAS token can be requested via API with the token
endpoint:
https://planetarycomputer.microsoft.com/api/sas/v1/token/{collection_id}
or
https://planetarycomputer.microsoft.com/api/sas/v1/token/{storage_account}/{container}
# e.g., Planetary Computer access to STAC items as geoparquet datasets
# https://planetarycomputer.microsoft.com/docs/quickstarts/stac-geoparquet/
set_config_option("AZURE_STORAGE_ACCOUNT", "pcstacitems")
# SAS token is the value of "token" in the JSON returned by:
# https://planetarycomputer.microsoft.com/api/sas/v1/token/pcstacitems/items
set_config_option("AZURE_STORAGE_SAS_TOKEN", "<token>")
Other authentication methods are possible for Azure. See the GDAL documentation for details.
Microsoft Azure Data Lake
GDAL doc: /vsiadls/ (Microsoft Azure Data Lake Storage Gen2)
SOZip
GDAL doc: /vsizip/ (Seek-Optimized ZIP files, GDAL >= 3.7)
The function gdalraster::addFilesInZip()
can be used to
create new or append to existing ZIP files, potentially using the seek
optimization extension. Function arguments are available for the options
below, or the configuration options can be set to change the default
behavior.
GDAL_NUM_THREADS
The GDAL_NUM_THREADS
configuration option can be set to
ALL_CPUS
or an integer value to specify the number of
threads to use for SOZip-compressed files. This option is similarly
described above for compression in GeoTiff. Note
that this option also affects several other parts of GDAL.
CPL_SOZIP_ENABLED
Defaults to AUTO
. Determines whether the SOZip
optimization should be enabled. If AUTO
, SOZip will be
enabled for uncompressed files larger than
CPL_SOZIP_MIN_FILE_SIZE
.
# SOZip optimization defaults to AUTO
set_config_option("CPL_SOZIP_ENABLED", "YES")
CPL_SOZIP_MIN_FILE_SIZE
Defaults to 1M
. Determines the minimum file size for
SOZip to be automatically enabled. Specified in bytes, or
K
, M
or G
suffix can be used
respectively to specify a value in kilobytes, megabytes or
gigabytes.
# SOZip minimum file size
set_config_option("CPL_SOZIP_MIN_FILE_SIZE", "100K")