Stratification and sample allocation for reference burned area data

The Fire_cci team has published a new article in Remote Sensing of Environment (DOI 10.1016/j.rse.2017.06.041).

Statistical estimation protocols are a key means of ensuring that independent and objective information on product accuracy is communicated to end-users. Methods for validating burned area products have been developed based on a probability sample of a space-by-time partitioning of the population. We extend this basic methodology to improve stratification and sample allocation, key elements of a sampling design used to collect burned area reference data. We developed and evaluated an approach to partition each year and biome into low and high burned area (BA) strata. Because the threshold used to separate the sampling units into low and high BA can vary by year and biome, this approach offers a more targeted stratification than that used in previous studies, in which a common threshold was applied to all biomes. A hypothetical population of validation data was then used to quantitatively compare the precision of accuracy estimates derived from different stratification and sample size allocation options. We evaluated two options that had been previously examined in the BA validation literature, and extended previous studies by adding two new options specifically developed for ratio estimates. Stratification based on mapped BA reduced standard errors of the global burned area accuracy estimates to between one-half and one-eighth of the standard errors of simple random sampling. Stratifying by mapped BA was also found to reduce standard errors of accuracy estimates for most year-by-biome strata, indicating that this advantage of stratification and sample allocation applies generally to a range of conditions (i.e., biomes and years). The most precise estimates were obtained using a sample size per stratum allocation $n_h \propto N_h \overline{BA}_h$, where $N_h$ is the number of units in stratum $h$ and $\overline{BA}_h$ is the mean mapped BA for stratum $h$. The best sampling design from our analyses was then used to select a set of 1,000 samples from a hypothetical population of validation data, and confidence intervals were computed for each sample. Close to 95% of these confidence intervals contained the true population value, thus confirming the validity of confidence intervals produced from the estimates and standard errors.
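
The allocation rule and the confidence-interval check described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (allocate_sample, stratified_mean_ci, coverage), the two-stratum population, and all numeric values are hypothetical. The sketch also simplifies in three ways: it estimates a stratified mean rather than the ratio-based accuracy measures used in the article, it uses the same made-up value as both the mapped BA and the quantity being estimated, and it does not cap an allocation at the stratum size.

```python
import random
from math import floor, sqrt
from statistics import mean, variance


def allocate_sample(total_n, stratum_sizes, mean_mapped_ba):
    """Split total_n units across strata with n_h proportional to N_h * mean BA_h."""
    weights = [n_units * ba for n_units, ba in zip(stratum_sizes, mean_mapped_ba)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one stratum needs a positive weight")
    raw = [total_n * w / total_weight for w in weights]
    alloc = [floor(r) for r in raw]
    # Largest-remainder rounding so the allocations sum exactly to total_n
    leftover = total_n - sum(alloc)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_mean_ci(samples, stratum_sizes, z=1.96):
    """Stratified estimate of the population mean with a normal-approximation CI."""
    total_units = sum(stratum_sizes)
    estimate, var = 0.0, 0.0
    for values, n_units in zip(samples, stratum_sizes):
        weight = n_units / total_units
        n = len(values)
        estimate += weight * mean(values)
        # Finite population correction: units are drawn without replacement
        var += weight ** 2 * (1 - n / n_units) * variance(values) / n
    half_width = z * sqrt(var)
    return estimate - half_width, estimate + half_width


def coverage(population, allocation, replicates=1000, seed=0):
    """Share of replicate samples whose CI contains the true population mean."""
    rng = random.Random(seed)
    sizes = [len(stratum) for stratum in population]
    true_mean = sum(sum(stratum) for stratum in population) / sum(sizes)
    hits = 0
    for _ in range(replicates):
        samples = [rng.sample(stratum, n) for stratum, n in zip(population, allocation)]
        low, high = stratified_mean_ci(samples, sizes)
        hits += low <= true_mean <= high
    return hits / replicates


# Hypothetical two-stratum population (values are made up for illustration)
rng = random.Random(42)
population = [
    [rng.uniform(0.0, 2.0) for _ in range(900)],    # low-BA stratum
    [rng.uniform(20.0, 60.0) for _ in range(100)],  # high-BA stratum
]
sizes = [len(stratum) for stratum in population]
mean_ba = [mean(stratum) for stratum in population]
allocation = allocate_sample(100, sizes, mean_ba)
print(allocation)                        # sample size per stratum
print(coverage(population, allocation))  # fraction of CIs covering the true mean
```

As a worked case, allocate_sample(100, [900, 100], [1.0, 40.0]) returns [18, 82]: the small high-BA stratum receives most of the sample because its weight, 100 × 40, far exceeds that of the low-BA stratum, 900 × 1.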