FIESTA’s Small Area (SA) module was set up as a platform to integrate with current Small Area Estimators available on CRAN, including the JoSAE (Breidenbach 2015), sae (Molina and Marhuenda 2015), and hbsae (Boonstra 2012) packages. These packages use unit-level and area-level models such as the Empirical Best Linear Unbiased Prediction (EBLUP) estimation strategy and the hierarchical Bayesian estimation strategy. Rao (2003) discusses the benefits of the EBLUP for balancing the potential bias of synthetic estimators against the instability of a direct estimator. White et al. (2021) discuss the benefits of Small Area Estimation in a hierarchical Bayesian context, especially for forestry data. The module includes functional steps for checking, compiling, and formatting FIA plot data and auxiliary spatial information for input to these R packages, and translates the integrated package output to FIESTA output format.
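To make the trade-off between bias and instability concrete, the sketch below computes a composite estimate in the area-level (Fay-Herriot) form, where each domain's estimate is a weighted average of its direct and synthetic estimates. All numbers, including the between-area variance, are hypothetical and chosen only for illustration; the packages above estimate the variance components from the data rather than taking them as given.

```r
# Hypothetical inputs for three domains (not FIESTA output)
direct    <- c(0.65, 0.39, 0.75)   # direct estimates
direct_se <- c(0.12, 0.11, 0.09)   # standard errors of the direct estimates
synthetic <- c(0.62, 0.59, 0.60)   # model-based (synthetic) predictions
sigma2_v  <- 0.01                  # assumed between-area variance

# Shrinkage weight: closer to 1 when the direct estimate is precise
gamma <- sigma2_v / (sigma2_v + direct_se^2)

# Composite (EBLUP-style) estimate: weighted average of the two
composite <- gamma * direct + (1 - gamma) * synthetic

data.frame(direct, synthetic,
           gamma = round(gamma, 3),
           composite = round(composite, 3))
```

Domains with noisier direct estimates get a smaller weight and are pulled further toward the synthetic prediction.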
Functions in FIESTA used for fitting Small Area Estimators include the modSAarea function for area estimates and modSAtree for tree estimates. The modSApop function is used to get population data needed for small area estimation. Below is a description and table of contents for the sections related to these functions:
FUNCTION | DESCRIPTION |
---|---|
modSApop | Creates population data for small area estimation. |
modSAarea | Produces area level estimates through small area estimation. |
modSAtree | Produces tree level estimates through small area estimation. |
The main objective of this tutorial is to demonstrate how to use FIESTA for generating estimates using estimators from the JoSAE, sae, and hbsae R packages.
The following examples are for generating estimates and estimated
variances using standard FIA Evaluation data from FIA’s National
database, with custom Estimation unit and Stratification information.
The examples use data from three inventory years of field measurements
in the state of Wyoming, from FIADB_1.7.2.00, last updated June 20,
2018, downloaded on June 25, 2018 and stored as internal data objects in
FIESTA.
Data Frame | Description |
---|---|
WYplt | WY plot-level data |
WYcond | WY condition-level data |
WYtree | WY tree-level data |
WYseed | WY seedling-level data |
External data | Description |
---|---|
WYbighorn_adminbnd.shp | Polygon shapefile of WY Bighorn National Forest Administrative boundary* |
WYbighorn_districtbnd.shp | Polygon shapefile of WY Bighorn National Forest District boundaries** |
WYbighorn_forest_nonforest_250m.tif | GeoTIFF raster of predicted forest/nonforest (1/0) for stratification*** |
WYbighorn_dem_250m.img | Erdas Imagine raster of elevation change, in meters**** |
*USDA Forest Service, Automated Lands Program (ALP). 2018. S_USA.AdministrativeForest (). Description: An area encompassing all the National Forest System lands administered by an administrative unit. The area encompasses private lands, other governmental agency lands, and may contain National Forest System lands within the proclaimed boundaries of another administrative unit. All National Forest System lands fall within one and only one Administrative Forest Area.
**USDA Forest Service, Automated Lands Program (ALP). 2018. S_USA.RangerDistrict (http://data.fs.usda.gov/geodata/edw). Description: A depiction of the boundary that encompasses a Ranger District.
***Based on MODIS-based classified map resampled from 250m to 500m resolution and reclassified from 3 to 2 classes: 1:forest; 2:nonforest. Projected in Albers Conical Equal Area, Datum NAD27 (Ruefenacht et al. 2008). Clipped to extent of WYbighorn_adminbnd.shp.
****USGS National Elevation Dataset (NED), resampled from 30m resolution to 250m. Projected in Albers Conical Equal Area, Datum NAD27 (U.S. Geological Survey 2017). Clipped to boundary of WYbighorn_adminbnd.shp.
First, you’ll need to load the FIESTA library:
library(FIESTA)
Next, you’ll need to set up an “outfolder”. This is just a file path to a folder where you’d like FIESTA to send your data output. For this vignette, we save the path to a temporary directory in the outfolder object. We also set a few default options preferred for this vignette.
outfolder <- tempdir()
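A temporary directory is removed when the R session ends. As a minimal sketch, assuming you would rather collect output in a dedicated subfolder, the base R calls below create one if it does not exist. The folder name and the outfolder_example object are hypothetical and are not used elsewhere in this vignette.

```r
# Build a path to a dedicated output folder inside the session's
# temporary directory ("fiesta_sa_output" is a hypothetical name)
outfolder_example <- file.path(tempdir(), "fiesta_sa_output")

# Create the folder only if it is not already there
if (!dir.exists(outfolder_example)) {
  dir.create(outfolder_example, recursive = TRUE)
}

dir.exists(outfolder_example)
```

Replacing tempdir() with a path of your own gives a folder that persists after the session.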
Now that we’ve loaded FIESTA and set up our outfolder, we can retrieve the data needed to run the examples. First, we point to some external data and predictor layers stored in FIESTA and derive new predictor layers using the terra package.
# File names for external spatial data
WYbhfn <- system.file("extdata", "sp_data/WYbighorn_adminbnd.shp", package="FIESTA")
WYbhdistfn <- system.file("extdata", "sp_data/WYbighorn_districtbnd.shp", package="FIESTA")
WYbhdist.att <- "DISTRICTNA"
fornffn <- system.file("extdata", "sp_data/WYbighorn_forest_nonforest_250m.tif", package="FIESTA")
demfn <- system.file("extdata", "sp_data/WYbighorn_dem_250m.img", package="FIESTA")
# Derive new predictor layers from dem
library(terra)
dem <- rast(demfn)
slpfn <- paste0(outfolder, "/WYbh_slp.img")
slp <- terra::terrain(dem,
v = "slope",
unit = "degrees",
filename = slpfn,
overwrite = TRUE,
NAflag = -99999.0)
aspfn <- paste0(outfolder, "/WYbh_asp.img")
asp <- terra::terrain(dem,
v = "aspect",
unit = "degrees",
filename = aspfn,
overwrite = TRUE,
NAflag = -99999.0)
Next, we set up our small area domains with FIESTA::spGetSAdoms. For more information on how to use this function, please see the sp vignette included with FIESTA. The code below stores the district boundary file and the name of its domain attribute for later use.
smallbnd <- WYbhdistfn
smallbnd.domain <- "DISTRICTNA"
Next, we can get our FIA plot data and set up our auxiliary data. We can get our FIA plot data with the spGetPlots function from FIESTA, which can access data through FIA’s DataMart. Here, datsource = "obj" points the function at the internal WY data objects instead, and the plots are then subset to the boundary.
SApltdat <- spGetPlots(bnd = WYbhdistfn,
xy_datsource = "obj",
xy = WYplt,
xy_opts = list(xy.uniqueid = "CN", xvar = "LON_PUBLIC",
yvar = "LAT_PUBLIC", xy.crs = 4269),
datsource = "obj",
istree = TRUE,
isseed = TRUE,
dbTabs = list(plot_layer = WYplt, cond_layer = WYcond,
tree_layer = WYtree, seed_layer = WYseed),
eval = "custom",
eval_opts = list(invyrs = 2011:2013),
showsteps = TRUE,
returnxy = TRUE,
savedata_opts = savedata_options(outfolder = outfolder))
## ================================================================================
str(SApltdat, max.level = 1)
## List of 11
## $ spxy :Classes 'sf' and 'data.frame': 56 obs. of 9 variables:
## ..- attr(*, "sf_column")= chr "geometry"
## ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: 3 3 3 3 3 3 3 3
## .. ..- attr(*, "names")= chr [1:8] "PLT_CN" "INVYR" "STATECD" "UNITCD" ...
## $ tabs :List of 4
## $ tabIDs :List of 4
## $ pltids :'data.frame': 56 obs. of 8 variables:
## $ bnd :Classes 'sf' and 'data.frame': 3 obs. of 5 variables:
## ..- attr(*, "sf_column")= chr "geometry"
## ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA
## .. ..- attr(*, "names")= chr [1:4] "REGION" "FORESTNUMB" "DISTRICTNU" "DISTRICTNA"
## $ puniqueid : chr "CN"
## $ xy.uniqueid: chr "PLT_CN"
## $ pjoinid : chr "CN"
## $ states : chr "Wyoming"
## $ invyrs : int [1:3] 2011 2012 2013
## $ args :List of 13
Finally, we must generate the dataset with predictors for small area estimation. We can do this with the spGetAuxiliary function from FIESTA. Again, see the sp vignette for further information on this function.
rastlst.cont <- c(demfn, slpfn, aspfn)
rastlst.cont.name <- c("dem", "slp", "asp")
rastlst.cat <- fornffn
rastlst.cat.name <- "fornf"
unit_layer <- WYbhdistfn
unitvar <- "DISTRICTNA"
auxdat <- spGetAuxiliary(
xyplt = SApltdat$spxy,
uniqueid = "PLT_CN",
unit_layer = unit_layer,
unitvar = "DISTRICTNA",
rastlst.cont = rastlst.cont,
rastlst.cont.name = rastlst.cont.name,
rastlst.cont.stat = "mean",
rastlst.cont.NODATA = 0,
rastlst.cat = rastlst.cat,
rastlst.cat.name = rastlst.cat.name,
asptransform = TRUE,
rast.asp = aspfn,
keepNA = FALSE,
showext = FALSE,
savedata = FALSE
)
names(auxdat)
str(auxdat, max.level = 1)
## List of 12
## $ unitvar : chr "DISTRICTNA"
## $ pltassgn :'data.frame': 56 obs. of 15 variables:
## $ pltassgnid : chr "PLT_CN"
## $ unitarea :'data.frame': 3 obs. of 2 variables:
## $ areavar : chr "ACRES_GIS"
## $ unitzonal :'data.frame': 3 obs. of 8 variables:
## $ inputdf :Classes 'data.table' and 'data.frame': 4 obs. of 7 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## $ prednames : chr [1:5] "dem" "slp" "asp_cos" "asp_sin" ...
## $ zonalnames : chr [1:7] "dem" "slp" "asp_cos" "asp_sin" ...
## $ predfac : chr "fornf"
## $ npixelvar : chr "npixels"
## $ predfac.levels:List of 1
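The asp_cos and asp_sin entries in prednames appear because asptransform = TRUE was set. Aspect is a circular variable, so 359 degrees and 0 degrees describe nearly the same direction even though the numbers are far apart. The sketch below shows the usual cosine and sine transformation on hypothetical aspect values; that FIESTA computes exactly these quantities is an assumption based on the output names, not something shown in this vignette.

```r
# Aspect in degrees, clockwise from north (hypothetical values)
asp_deg <- c(0, 90, 180, 270, 359)

# Convert to radians, then take the cosine and sine
asp_rad <- asp_deg * pi / 180
asp_cos <- cos(asp_rad)   # near 1 for north-facing, near -1 for south-facing
asp_sin <- sin(asp_rad)   # near 1 for east-facing, near -1 for west-facing

round(data.frame(asp_deg, asp_cos, asp_sin), 3)
```

After the transformation, 0 and 359 degrees have almost identical values, which a linear model can use directly.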
modSApop
We can now create our population data for small area estimation. To do so, we use the modSApop function in FIESTA.
We must assign our plot data with the pltdat argument, the auxiliary dataset with the auxdat argument, and set information for our small areas with the smallbnd and smallbnd.domain arguments. The spGetPlots and spGetAuxiliary functions have done much of the hard work for us so far, so we can just run a simple call to modSApop:
SApopdat <- modSApop(pltdat = SApltdat,
auxdat = auxdat,
smallbnd = WYbhdistfn,
smallbnd.domain = smallbnd.domain)
Note that the modSApop function returns a list with lots of information and data for us to use. For a quick look at what this list includes we can use the str function:
str(SApopdat, max.level = 1)
## List of 27
## $ module : chr "SA"
## $ smallbnd :Classes 'sf' and 'data.frame': 3 obs. of 6 variables:
## ..- attr(*, "sf_column")= chr "geometry"
## ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
## .. ..- attr(*, "names")= chr [1:5] "REGION" "FORESTNUMB" "DISTRICTNU" "DISTRICTNA" ...
## $ smallbnd.domain: chr "DISTRICTNA"
## $ condx :Classes 'data.table' and 'data.frame': 66 obs. of 4 variables:
## ..- attr(*, "sorted")= chr [1:2] "PLT_CN" "CONDID"
## ..- attr(*, ".internal.selfref")=<externalptr>
## $ pltcondx :Classes 'data.table' and 'data.frame': 66 obs. of 30 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## ..- attr(*, "sorted")= chr "DSTRBCD1"
## $ pltassgnx :Classes 'data.table' and 'data.frame': 56 obs. of 9 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## ..- attr(*, "sorted")= chr "PLT_CN"
## $ pltassgnid : chr "PLT_CN"
## $ cuniqueid : chr "PLT_CN"
## $ condid : chr "CONDID"
## $ ACI.filter : chr "COND_STATUS_CD == 1"
## $ dunitarea :Classes 'data.table' and 'data.frame': 3 obs. of 2 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## ..- attr(*, "sorted")= chr "DOMAIN"
## $ areavar : chr "ACRES_GIS"
## $ areaunits : chr "acres"
## $ dunitvar : chr "DOMAIN"
## $ dunitlut :Classes 'data.table' and 'data.frame': 3 obs. of 9 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## ..- attr(*, "sorted")= chr "DOMAIN"
## $ plotsampcnt :'data.frame': 2 obs. of 3 variables:
## $ condsampcnt :'data.frame': 4 obs. of 3 variables:
## $ states : chr "Wyoming"
## $ invyrs :List of 1
## $ estvar.area : chr "CONDPROP_ADJ"
## $ adj : chr "plot"
## $ treex :'data.frame': 1691 obs. of 22 variables:
## $ tuniqueid : chr "PLT_CN"
## $ adjtree : logi FALSE
## $ seedx :'data.frame': 102 obs. of 11 variables:
## $ prednames : chr [1:5] "dem" "slp" "asp_cos" "asp_sin" ...
## $ predfac : chr "fornf"
Now that we’ve created our population dataset, we can move on to estimation.
modSAarea
First, we can set up our predictors as a vector:
all_preds <- c("slp", "dem", "asp_cos", "asp_sin", "fornf")
Next, we fit the unit-level EBLUP with the JoSAE R package.
area1 <- modSAarea(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit" # est - method of small area estimation. Either "unit" or "area"
)
## REML estimate of variance ratio: 0.09281
## numerical integration of f(x): 24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
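The log lines above are internally consistent: the reported posterior mean of the variance ratio is close to the ratio of the two integrals, which is the usual way to get a posterior mean from an unnormalized density f(x). The sketch below checks that arithmetic with the printed values and then repeats the calculation on a toy density. The toy density is hypothetical, and reading the log this way is an interpretation of the printed numbers rather than a documented description of the fitting code.

```r
# Values copied from the log above
int_f  <- 24.85   # numerical integration of f(x)
int_xf <- 16.63   # numerical integration of x*f(x)
int_xf / int_f    # close to the reported posterior mean, 0.6693

# The same calculation on a toy unnormalized density (hypothetical):
# the kernel of a Gamma distribution with shape 3 and rate 3, whose mean is 1
f <- function(x) x^2 * exp(-3 * x)
post_mean <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value /
  integrate(f, lower = 0, upper = Inf)$value
post_mean
```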
The modSAarea function outputs a list, and we can see our estimates and estimation method.
str(area1, max.level = 1)
## List of 3
## $ est :Classes 'data.table' and 'data.frame': 3 obs. of 3 variables:
## ..- attr(*, ".internal.selfref")=<externalptr>
## ..- attr(*, "sorted")= chr "DOMAIN"
## $ raw :List of 13
## $ multest:'data.frame': 3 obs. of 23 variables:
area1$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 224320.1 16.68
## 2: Powder River Ranger District 149729.4 26.23
## 3: Tongue Ranger District 284851.4 13.15
area1$raw$SAmethod
## [1] "unit"
We can also look further into the raw list below:
str(area1$raw, max.level = 1)
## List of 13
## $ dunit_totest :'data.frame': 3 obs. of 18 variables:
## $ domdat :'data.frame': 66 obs. of 15 variables:
## $ module : chr "SA"
## $ esttype : chr "AREA"
## $ SApackage : chr "JoSAE"
## $ SAmethod : chr "unit"
## $ estnm : chr "est"
## $ predselect.unit:'data.frame': 1 obs. of 8 variables:
## $ predselect.area:'data.frame': 1 obs. of 8 variables:
## $ SAobjlst :List of 1
## $ estvar : chr "AREA_ADJ"
## $ areaunits : chr "acres"
## $ estunits : chr "acres"
In this example, we fit an area-level EBLUP with JoSAE, using only slp as a predictor. We use a single predictor because the area-level dataset has only three rows, one per district. An intercept and one slope already use two of those three observations, leaving a single degree of freedom for the random effect term, so the model can have at most one predictor without being exactly singular.
area2 <- modSAarea(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = "slp", # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "area", # est - method of small area estimation. Either "unit" or "area"
multest = TRUE
)
## REML estimate of variance ratio: 0.09973
## numerical integration of f(x): 23.3 with absolute error < 1.1e-05
## numerical integration of x*f(x): 16.04 with absolute error < 7.3e-06
## posterior mean for variance ratio: 0.6883
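The one-predictor limit described above can be checked with ordinary least squares on three rows. This is only a sketch of the degrees-of-freedom argument: the data are hypothetical, and lm stands in for the fixed-effects part of the area-level model rather than reproducing the EBLUP fit.

```r
# Three hypothetical domain-level rows: one response and two predictors
dom <- data.frame(
  y   = c(0.65, 0.39, 0.75),
  slp = c(12.1, 8.4, 10.3),
  dem = c(2500, 2100, 2700)
)

# Intercept + one predictor: one residual degree of freedom remains
fit1 <- lm(y ~ slp, data = dom)
df.residual(fit1)

# Intercept + two predictors: the fit is saturated, so nothing is left
# over to estimate the between-area variance
fit2 <- lm(y ~ slp + dem, data = dom)
df.residual(fit2)
```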
We can again see our estimates. Notably, the percent sampling errors are slightly larger than those of the unit-level model fit above (area1). This is likely because the area-level model can incorporate only one predictor’s worth of information.
area2$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 246496.3 19.80
## 2: Powder River Ranger District 141310.9 29.94
## 3: Tongue Ranger District 295969.3 14.01
Since FIESTA will attempt to fit all models when running modSAarea, we can look at all the different modeling approaches and their estimates with the multest object.
area2$multest
## DOMAIN LARGEBND NBRPLT DIR DIR.se JU.Synth
## 1 Medicine Wheel Ranger District 1 16 0.6510417 0.11592491 0.6162340
## 2 Powder River Ranger District 1 19 0.3947368 0.11034865 0.5908995
## 3 Tongue Ranger District 1 21 0.7500000 0.09128709 0.6022887
## JU.GREG JU.GREG.se JU.EBLUP JU.EBLUP.se.1 hbsaeU hbsaeU.se JFH
## 1 0.6572931 0.11784552 0.6371466 0.1006813 0.6396496 0.10232397 0.6762099
## 2 0.4003265 0.11170013 0.4654312 0.1221846 0.4596133 0.10878798 0.4226600
## 3 0.7434288 0.08878553 0.6986757 0.1061928 0.7021458 0.09930096 0.7152836
## JFH.se JA.synth JA.synth.se saeA saeA.se hbsaeA hbsaeA.se
## 1 0.1339159 0.7295675 0.2534818 0.6762099 0.1339159 NA NA
## 2 0.1265380 0.4879924 0.2458881 0.4226600 0.1265380 NA NA
## 3 0.1001961 0.5965936 0.2044029 0.7152836 0.1001961 NA NA
## NBRPLT.gt0 AOI AREAUSED
## 1 11 1 364526.4
## 2 9 1 334337.0
## 3 17 1 413778.9
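The est and multest tables report the same fit on different scales. In the output above, the JFH columns (proportions of each district's area) multiplied by AREAUSED reproduce area2$est, and 100 * JFH.se / JFH reproduces the percent sampling error. The sketch below checks this with values copied from the printed output. The reading of JFH as the JoSAE area-level fit is inferred from the matching numbers, not taken from the FIESTA documentation.

```r
# Columns copied from area2$multest above
multest <- data.frame(
  DOMAIN   = c("Medicine Wheel Ranger District",
               "Powder River Ranger District",
               "Tongue Ranger District"),
  JFH      = c(0.6762099, 0.4226600, 0.7152836),
  JFH.se   = c(0.1339159, 0.1265380, 0.1001961),
  AREAUSED = c(364526.4, 334337.0, 413778.9)
)

# Total acres = estimated proportion * district area
est <- multest$JFH * multest$AREAUSED

# Percent sampling error = 100 * standard error / estimate
pse <- 100 * multest$JFH.se / multest$JFH

data.frame(DOMAIN = multest$DOMAIN,
           Estimate = round(est, 1),
           PSE = round(pse, 2))
```

Small differences in the last printed digit of the totals come from the rounding of the copied inputs.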
Notably, the area-level hbsae model (the hbsaeA columns) returned NAs here, likely due to computational issues with the integral it computes; the unit-level hbsaeU columns were estimated. Not to worry, though: we will fit models with hbsae in the next example.
FIESTA also supports the use of hierarchical Bayesian (HB) models through the hbsae package as an alternative to EBLUPs. These models use the same model specification as the EBLUP; however, they fit the model using a hierarchical Bayesian framework and get parameter estimates through numerical integration. Luckily, we do not have to take an integral ourselves to fit these models; we can just change the SApackage argument.
area3 <- modSAarea(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "hbsae", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
multest = TRUE
)
## REML estimate of variance ratio: 0.09281
## numerical integration of f(x): 24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
We can again check our estimates, small area method, and small area package.
area3$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 226061.8 16.01
## 2: Powder River Ranger District 147433.8 23.40
## 3: Tongue Ranger District 285934.0 12.89
area3$raw$SAmethod
## [1] "unit"
area3$raw$SApackage
## [1] "hbsae"
Notably, we can also set priors on the ratio of between- and within-area variation with hbsae. By default, FIESTA uses a weakly informative half-Cauchy prior on this parameter, as suggested by White et al. (2021). In this example we fit the same model as before, but with a flat prior.
area4 <- modSAarea(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "hbsae", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
prior = function(x) 1 # est - prior on ratio of between and within area variation
)
## REML estimate of variance ratio: 0.09281
## numerical integration of f(x):
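To see how the two priors differ, the sketch below writes both as R functions of the variance ratio. The half-Cauchy scale of 1 is a hypothetical choice, since the scale FIESTA uses is not stated here, and the flat prior is written to return one value per input. These functions illustrate the shapes only; they are not the code FIESTA or hbsae runs.

```r
# Half-Cauchy density on x >= 0; scale = 1 is a hypothetical choice
half_cauchy <- function(x, scale = 1) {
  ifelse(x >= 0, 2 / (pi * scale * (1 + (x / scale)^2)), 0)
}

# Flat (improper) prior: the same weight for every variance ratio
flat <- function(x) rep(1, length(x))

# The half-Cauchy is a proper density, so it integrates to 1
integrate(half_cauchy, lower = 0, upper = Inf)$value

# Relative prior weight at a few variance ratios
x <- c(0.1, 1, 10)
data.frame(x, half_cauchy = round(half_cauchy(x), 4), flat = flat(x))
```

The half-Cauchy gives less weight to large variance ratios, whereas the flat prior treats all ratios alike and has no finite integral.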
Let’s check our results against area3 (the same model with the half-Cauchy prior):
area3$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 226061.8 16.01
## 2: Powder River Ranger District 147433.8 23.40
## 3: Tongue Ranger District 285934.0 12.89
area4$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District NA NA
## 2: Powder River Ranger District NA NA
## 3: Tongue Ranger District NA NA
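In the run shown here, the flat-prior fit returned NA for every domain; the log above stops at the numerical integration of f(x), which suggests the integral could not be evaluated under this prior. When the fit succeeds, the estimates can look identical to those of area3 because of the rounding we do in FIESTA, even though the underlying values differ slightly. Such differences can be inspected in the model objects supplied in the output list from FIESTA (see the SAobjlst item of the raw list).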
JoSAE unit level EBLUP
FIESTA supports model variable selection via the elastic net. To use model selection, we set the modelselect argument to TRUE.
area5 <- modSAarea(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
modelselect = TRUE # est - elastic net variable selection
)
## REML estimate of variance ratio: 0.09281
## numerical integration of f(x): 24.85 with absolute error < 7.9e-06
## numerical integration of x*f(x): 16.63 with absolute error < 8.2e-06
## posterior mean for variance ratio: 0.6693
We can now look at the estimates and at the predictors that were selected. Here every coefficient is nonzero, so all predictors were kept and the estimates match those of area1.
area5$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 224320.1 16.68
## 2: Powder River Ranger District 149729.4 26.23
## 3: Tongue Ranger District 284851.4 13.15
area5$raw$predselect.unit
## LARGEBND LARGEBND TOTAL slp dem asp_cos asp_sin
## 1 SApopdat 1 1 0.004745915 -0.0001050103 0.03462268 -0.06178897
## fornf2
## 1 -0.3936355
modSAtree
We will set our estimate variable and filter now. We set estvar to "VOLCFNET" for net cubic foot volume, and filter with estvar.filter set to "STATUSCD == 1" so we only consider live trees in our estimation.
estvar <- "VOLCFNET"
live_trees <- "STATUSCD == 1"
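The filter is passed as a character string holding an R logical expression. As a rough sketch of the idea, the code below evaluates the same string against a small, made-up tree table and sums the response by plot. How FIESTA applies the filter internally is not shown in this vignette, so the eval(parse()) step is an illustrative assumption.

```r
# Hypothetical tree table; the values are made up for illustration
trees <- data.frame(
  PLT_CN   = c("A", "A", "B", "B", "C"),
  STATUSCD = c(1, 2, 1, 1, 2),
  VOLCFNET = c(12.4, 8.1, 20.0, 5.5, 9.9)
)

live_trees <- "STATUSCD == 1"

# Evaluate the filter string using the table's columns
keep <- eval(parse(text = live_trees), envir = trees)
live <- trees[keep, ]

# Sum the response by plot for the rows that pass the filter
aggregate(VOLCFNET ~ PLT_CN, data = live, FUN = sum)
```

Plot C has no live trees in this toy table, so it drops out of the summary.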
Now, we can look at the total net cubic-foot volume of live trees, filtered for live trees that are at least 5 inches in diameter. We use the estvar and live_trees objects defined above to set our response variable and filter, and then compute the estimates.
tree1 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = estvar, # est - net cubic-foot volume
estvar.filter = live_trees # est - live trees only
)
## REML estimate of variance ratio: 0.01551
## numerical integration of f(x): 48.34 with absolute error < 2.6e-05
## numerical integration of x*f(x): 20.58 with absolute error < 1.3e-06
## posterior mean for variance ratio: 0.4257
With both modSAtree and modSAarea, FIESTA returns the estimates requested with the SApackage and SAmethod arguments in the est item, and returns all possible estimates in the multest item. We can see these estimates below:
tree1$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 363533828 24.03
## 2: Powder River Ranger District 340071161 36.20
## 3: Tongue Ranger District 445171134 25.10
tree1$multest
## DOMAIN LARGEBND NBRPLT DIR DIR.se JU.Synth
## 1 Medicine Wheel Ranger District 1 16 1175.1682 356.2906 967.9523
## 2 Powder River Ranger District 1 19 893.9355 315.0740 1090.1786
## 3 Tongue Ranger District 1 21 1340.8077 262.0320 1035.6947
## JU.GREG JU.GREG.se JU.EBLUP JU.EBLUP.se.1 hbsaeU hbsaeU.se JFH JFH.se
## 1 1122.3465 297.2844 997.2772 239.6662 1043.9097 252.6630 NA NA
## 2 761.5703 252.7649 1017.1510 368.2101 907.4927 260.3908 NA NA
## 3 1215.3733 240.0583 1075.8671 270.0696 1134.7262 236.6279 NA NA
## JA.synth JA.synth.se saeA saeA.se hbsaeA hbsaeA.se NBRPLT.gt0 AOI
## 1 NA NA 1057.755 353.8882 NA NA 10 1
## 2 NA NA 963.751 419.8500 NA NA 8 1
## 3 NA NA 1356.026 471.4950 NA NA 16 1
## AREAUSED
## 1 364526.4
## 2 334337.0
## 3 413778.9
Notably, most of the area-level models (JFH, JA.synth, and hbsaeA) are NA for this model, as there were more predictors than degrees of freedom in the model at the area level.
We can bring the modelselect parameter into play with modSAtree as well as modSAarea. In the code below, we set modelselect = TRUE to use elastic net variable selection before fitting the model.
tree2 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = estvar, # est - net cubic-foot volume
estvar.filter = live_trees, # est - live trees only
modelselect = TRUE
)
## REML estimate of variance ratio: 0.01633
## numerical integration of f(x): 47.54 with absolute error < 2e-05
## numerical integration of x*f(x): 20.28 with absolute error < 6.4e-06
## posterior mean for variance ratio: 0.4265
We now can look at the selected predictors and estimates.
tree2$raw$predselect.unit
## LARGEBND LARGEBND TOTAL slp dem asp_cos asp_sin fornf2
## 1 SApopdat 1 1 -63.70287 0.558082 0 -549.9468 -1176.35
tree2$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 363864643 23.85
## 2: Powder River Ranger District 338811412 35.84
## 3: Tongue Ranger District 446043896 24.74
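One way to read the predselect.unit row above is that predictors with a coefficient of exactly zero were dropped by the selection step, which is the typical behavior of an elastic net. The sketch below pulls out the retained and dropped names from the printed coefficients. Treating a zero as "dropped" is an interpretation of the output, not a documented rule.

```r
# Coefficients copied from tree2$raw$predselect.unit above
coefs <- c(slp = -63.70287, dem = 0.558082, asp_cos = 0,
           asp_sin = -549.9468, fornf2 = -1176.35)

# Predictors with nonzero coefficients were retained
selected <- names(coefs)[coefs != 0]

# Predictors shrunk to exactly zero were dropped
dropped <- names(coefs)[coefs == 0]

selected
dropped
```

Here only asp_cos is shrunk to zero, which is why the tree2 estimates differ slightly from those of tree1.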
JoSAE
We can also estimate different response variables; in this example we choose basal area. We also return titles by setting returntitle = TRUE.
tree3 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = "BA", # est - basal area
estvar.filter = live_trees, # est - live trees only
returntitle = TRUE
)
## REML estimate of variance ratio: 0.03877
## numerical integration of f(x): 21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
Now we can take a look at our estimates:
tree3$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 22043124 21.61
## 2: Powder River Ranger District 18865169 32.06
## 3: Tongue Ranger District 29693977 19.54
We can also see our title list, since we set returntitle to TRUE.
tree3$titlelst
## $title.estpse
## [1] "Basal area of live trees (at least 1 inch dia), in square feet, and percent sampling error on forest land DOMAIN"
##
## $title.yvar
## [1] "Basal area, in square feet"
##
## $title.estvar
## [1] "Basal area of live trees (at least 1 inch dia)"
##
## $title.unitvar
## [1] "DOMAIN"
##
## $title.ref
## [1] "Wyoming, 2011-2013"
##
## $outfn.estpse
## [1] "tree_BA_forestland"
##
## $outfn.rawdat
## [1] "tree_BA_forestland_rawdata"
##
## $title.tot
## [1] "Basal area of live trees (at least 1 inch dia), in square feet, on forest land; Wyoming, 2011-2013"
##
## $title.unitsn
## [1] "square feet"
sae
Now, we can of course fit a different model to estimate basal area. In this case, we choose to use dem to predict basal area with an area-level EBLUP from the sae package.
tree4 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = "dem", # est - character vector of predictors to be used in the model
SApackage = "sae", # est - character string of the R package to do the estimation
SAmethod = "area", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = "BA", # est - basal area
estvar.filter = live_trees, # est - live trees only
returntitle = TRUE
)
## REML estimate of variance ratio: 0.02124
## numerical integration of f(x): 24.4 with absolute error < 6e-06
## numerical integration of x*f(x): 10.37 with absolute error < 5.6e-06
## posterior mean for variance ratio: 0.4248
Now we can take a look at our estimates.
tree4$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 22014434 32.81
## 2: Powder River Ranger District 17546600 41.96
## 3: Tongue Ranger District 35374035 27.57
You may want to save FIESTA output to your computer, rather than just having it live in the R environment. FIESTA makes this easy with the savedata and savedata_opts arguments. If savedata = TRUE, the output is saved to your working directory by default, but we can set an outfolder in savedata_opts to choose where the data should be saved. There are many other savedata_opts, which are listed in the help file for the savedata_options function (use help(savedata_options) with FIESTA loaded in your R environment).
tree5 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "JoSAE", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = "BA", # est - basal area
estvar.filter = live_trees, # est - live trees only
savedata = TRUE,
savedata_opts = savedata_options(
outfolder = outfolder
)
)
## REML estimate of variance ratio: 0.03877
## numerical integration of f(x): 21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
We can of course also use a different model to predict basal area; in this case we use an HB unit-level model from hbsae. We again save to an outfolder, this time giving a file name prefix with the outfn.pre argument.
tree6 <- modSAtree(
SApopdatlst = SApopdat, # pop - population calculations for WY, post-stratification
prednames = all_preds, # est - character vector of predictors to be used in the model
SApackage = "hbsae", # est - character string of the R package to do the estimation
SAmethod = "unit", # est - method of small area estimation. Either "unit" or "area"
landarea = "FOREST", # est - forest land filter
estvar = "BA", # est - basal area
estvar.filter = live_trees, # est - live trees only
savedata = TRUE,
savedata_opts = savedata_options(
outfolder = outfolder,
outfn.pre = "HB_unit"
)
)
## REML estimate of variance ratio: 0.03877
## numerical integration of f(x): 21.9 with absolute error < 8.2e-06
## numerical integration of x*f(x): 10.85 with absolute error < 4.8e-06
## posterior mean for variance ratio: 0.4955
We can see the files in the outfolder here:
list.files(outfolder, pattern = "HB")
## character(0)
And the estimates here:
tree6$est
## Key: <DOMAIN>
## DOMAIN Estimate Percent Sampling Error
## <char> <num> <num>
## 1: Medicine Wheel Ranger District 22549224 21.66
## 2: Powder River Ranger District 17699101 26.52
## 3: Tongue Ranger District 30499650 17.10
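If a plain CSV copy of an estimate table is wanted independently of savedata, base R can write one directly. The sketch below does this for the values printed above. The file name is hypothetical, and this is a manual alternative rather than a description of what savedata writes.

```r
# Estimates copied from tree6$est above
est <- data.frame(
  DOMAIN = c("Medicine Wheel Ranger District",
             "Powder River Ranger District",
             "Tongue Ranger District"),
  Estimate = c(22549224, 17699101, 30499650),
  PSE = c(21.66, 26.52, 17.10)
)

# Hypothetical file name, written to the session's temporary directory
outfile <- file.path(tempdir(), "HB_unit_tree_BA_example.csv")
write.csv(est, outfile, row.names = FALSE)

# Read the file back to confirm the round trip
check <- read.csv(outfile)
check
```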
Breidenbach J. 2018. JoSAE: Unit-Level and Area-Level Small Area Estimation.
Molina I, Marhuenda Y. 2015. sae: An R Package for Small Area Estimation. The R Journal, 7(1), 81–98. https://journal.r-project.org/archive/2015/RJ-2015-007/RJ-2015-007.pdf.
Rao JNK. 2003. Small Area Estimation. Wiley, Hoboken, New Jersey.