A simulated ‘sandbox’ for exploring the modifiable areal unit problem in aggregation and disaggregation
We present a spatial testbed of simulated boundary data based on a set of very high-resolution census-based areal units surrounding Guadalajara, Mexico. From these input areal units, we simulated 10 levels of spatial resolution, ranging from 5,515 to 52,388 units per level, with 100 simulated zonal configurations for each level, totalling 1,000 simulated sets of areal units. These data facilitate interrogating various realizations of the data and the effects of spatial coarseness and zonal configuration, i.e. the Modifiable Areal Unit Problem (MAUP), on applications such as model training, model prediction, disaggregation, and aggregation processes. Further, these data can facilitate the production of spatially explicit, non-parametric estimates of confidence intervals via bootstrapping. We provide a pre-processed version of these 1,000 simulated sets of areal units, meta- and summary data to assist in their use, and a code notebook with the means to alter and/or reproduce these data.
Assessing the influence of landscape conservation and protected areas on social wellbeing using random forest machine learning
The urgency of interconnected social-ecological dilemmas such as rapid biodiversity loss, habitat loss and fragmentation, and the escalating climate crisis has led to increased calls for the protection of ecologically important areas of the planet. Protected areas (PAs) are considered critical to address these dilemmas, although growing divides in wellbeing can exacerbate conflict around PAs and undermine their effectiveness. We investigate the influence of proximity to PAs on wellbeing outcomes. We develop a novel multi-dimensional index of wellbeing for households across Africa and use Random Forest machine learning techniques to assess the importance score of households’ proximity to protected areas for their wellbeing outcomes, compared with the importance scores of an array of other social, environmental, and local and national governance factors. This study makes important contributions to the conservation literature, first by expanding the ways in which wellbeing is measured and operationalized, and second by providing additional empirical support for recent evidence that proximity to PAs is an influential factor affecting observed wellbeing outcomes, albeit likely through different pathways than the current literature suggests.
Determining Global Population Distribution: Methods, Applications and Data
Evaluating the total numbers of people at risk from infectious disease
in the world requires not just tabular population data, but data that
are spatially explicit and global in extent at a moderate resolution.
This review describes the basic methods for constructing estimates of
global population distribution with attention to recent advances in
improving both spatial and temporal resolution. To evaluate the optimal
resolution for the study of disease, the native resolution of the
data inputs as well as that of the resulting outputs are discussed.
Assumptions used to produce different population data sets are also
described, with their implications for the study of infectious disease.
Lastly, the application of these population data sets in studies to
assess disease distribution and health impacts is reviewed. The data
described in this review are distributed in the accompanying DVD.
Gridded population maps informed by different built settlement products
The spatial distribution of humans on the earth is critical knowledge that informs many disciplines and is available in a spatially explicit manner through gridded population techniques. While many approaches exist to produce specialized gridded population maps, little has been done to explore how remotely sensed, built-area datasets might be used to dasymetrically constrain these estimates. This study assesses the effectiveness of three different high-resolution built-area datasets for producing gridded population estimates through the dasymetric disaggregation of census counts in Haiti, Malawi, Madagascar, Nepal, Rwanda, and Thailand. Modeling techniques include a binary dasymetric redistribution, a random forest with a dasymetric component, and a hybrid of the previous two. The relative merits of these approaches and the data are discussed with regard to studying human populations and related spatially explicit phenomena. Results showed that the accuracy of random forest and hybrid models was comparable in five of six countries.
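The binary dasymetric redistribution named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `binary_dasymetric`, the toy grids, the census totals, and the fallback rule for zones without built cells are all assumptions made for illustration.

```python
import numpy as np

def binary_dasymetric(zone_ids, built_mask, zone_totals):
    """Spread each zone's census count evenly over its built cells.

    zone_ids    : 2-D integer array, the census zone of every grid cell
    built_mask  : 2-D array, nonzero where a cell is classed as built-up
    zone_totals : dict mapping zone id -> census population count
    """
    zone_ids = np.asarray(zone_ids)
    built = np.asarray(built_mask, dtype=bool)
    population = np.zeros(zone_ids.shape, dtype=float)
    for zone, total in zone_totals.items():
        in_zone = zone_ids == zone
        target = in_zone & built
        if not target.any():
            # Assumption: a zone with no built cells falls back to an
            # even spread over all of its cells, so no people are lost.
            target = in_zone
        n_cells = target.sum()
        if n_cells:
            population[target] = total / n_cells
    return population

# Hypothetical 2 x 3 grid with two census zones.
zone_ids = np.array([[1, 1, 2],
                     [1, 2, 2]])
built_mask = np.array([[1, 0, 0],
                       [1, 0, 0]])
grid = binary_dasymetric(zone_ids, built_mask, {1: 100, 2: 30})
print(grid)
```

In this toy case zone 1 has two built cells that share its 100 people, while zone 2 has none and is spread evenly; the zone totals are preserved either way, which is the defining property of a dasymetric disaggregation.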
Evaluating nighttime lights and population distribution as proxies for mapping anthropogenic CO2 emission in Vietnam, Cambodia and Laos
Tracking spatiotemporal changes in GHG emissions is key to successful implementation of the United Nations Framework Convention on Climate Change (UNFCCC). While emission inventories often provide a robust tool to track emission trends at the country level, subnational emission estimates are often not reported, or reports vary in robustness because the estimates depend on the spatial modeling approach and ancillary data used to disaggregate the emission inventories. Assessing the errors and uncertainties of the subnational emission estimates is fundamentally challenging due to the lack of physical measurements at the subnational level. To begin addressing the current performance of modeled gridded CO2 emissions, this study compares two common proxies used to disaggregate CO2 emission estimates. We use a known gridded CO2 model based on satellite-observed nighttime light (NTL) data (Open Source Data Inventory for Anthropogenic CO2, ODIAC) and a gridded population dataset driven by a set of ancillary geospatial data. We examine the association of these two datasets at multiple spatial scales for three countries in Southeast Asia (Vietnam, Cambodia and Laos) and characterize the spatiotemporal similarities and differences for 2000, 2005, and 2010. We specifically highlight areas of potential uncertainty in the ODIAC model, which relies solely on NTL data for disaggregation of the non-point emissions estimates. Results show how, over time, an NTL-based emissions disaggregation tends to concentrate CO2 estimates in different ways than population-based estimates at the subnational level. We discuss important considerations in the disconnect between the two modeled datasets and argue that the spatial differences between data products can be useful to identify areas affected by the errors and uncertainties associated with the NTL-based downscaling in a region with uneven urbanization rates.
Towards an improved large-scale gridded population dataset: a Pan-European study on the integration of 3D settlement data into population modelling
Large-scale gridded population datasets available at the global or continental scale have become an important source of information in applications related to sustainable development. In recent years, the emergence of new population models has leveraged the inclusion of more accurate and spatially detailed proxy layers describing the built-up environment (e.g., built-area and building footprint datasets), enhancing the quality, accuracy and spatial resolution of existing products. However, due to the consistent lack of vertical and functional information on the built-up environment, large-scale gridded population datasets that rely on existing built-up land proxies still report large errors of under- and overestimation, especially in areas with predominantly high-rise buildings or industrial/commercial areas, respectively. This research investigates, for the first time, the potential contributions of the new World Settlement Footprint 3D (WSF3D) dataset in the field of large-scale population modelling. First, we combined a Random Forest classifier with spatial metrics derived from the WSF3D to predict the industrial versus non-industrial use of settlement pixels at the Pan-European scale. We then examined the effects of including volume and settlement use information in frameworks of dasymetric population modelling. We found that the proposed classification method can predict industrial and non-industrial areas with an overall accuracy of ~84% and a kappa coefficient of 0.68. Additionally, we found that integrating both volume and settlement use information considerably increased the accuracy of population estimates, by between 10% and 30%, over commonly employed models (e.g., based on a binary settlement mask as input), mainly by eliminating systematic large overestimations in industrial/commercial areas.
While the proposed method shows strong promise for overcoming some of the main limitations in large-scale population modelling, future research should focus on improving the quality of the WSF3D dataset and the classification method alike, to avoid the false detection of built-up settlements and to reduce misclassification errors of industrial and high-rise buildings.
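For readers unfamiliar with the two classification metrics quoted in this abstract, the sketch below shows how overall accuracy and Cohen's kappa are computed from a confusion matrix. The function name `accuracy_and_kappa` and the counts are hypothetical and chosen for easy arithmetic; they are not the study's data and do not reproduce its reported values.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    # Observed agreement: share of samples on the diagonal.
    observed = np.trace(m) / n
    # Chance agreement: product of matching row and column marginals.
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / n**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical counts for industrial vs non-industrial pixels.
confusion = [[40, 10],
             [5, 45]]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
print(overall_accuracy, kappa)
```

Kappa discounts the agreement expected by chance from the class marginals, which is why it sits below the overall accuracy whenever that chance agreement is greater than zero.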
Global Infrastructure: The Potential of SRTM Data to Break New Ground
The Shuttle Radar Topography Mission (SRTM) data set presents a unique opportunity to obtain a global, cloud-transparent instantaneous snapshot of imagery from urban and suburban regions, together with collocated topographic information, which can be used to characterize building types, land use, and other key population variables. The SRTM data set in synergy with other global remote sensing data sets, such as Landsat 7, ASTER and DMSP-OLS nighttime imagery, can be used to derive a number of major (but not all) parameters constituting a significant part of the global infrastructure data set consistently at the time period during which the SRTM data set was collected (February 2000). The long-term goal is to create a database of global infrastructure which is useful for a multitude of physical and social scientific and operational applications. The general definition of infrastructure can be quite wide and includes all man-made structures and/or natural structures that are modified for human use. The resulting product will ultimately present a unique resource which can be used as a global reference of the state of the world in the year 2000 for future studies of urbanization, infrastructure, population and land use change.
Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets
Multi-temporal, globally consistent, high-resolution human population datasets provide consistent and comparable population distributions in support of mapping sub-national heterogeneities in health, wealth, and resource access, and monitoring change in these over time. The production of more reliable and spatially detailed population datasets is increasingly necessary due to the importance of improving metrics at sub-national and multi-temporal scales. This is in support of measurement and monitoring of UN Sustainable Development Goals and related agendas. In response to these agendas, a method has been developed to assemble and harmonise a unique, open-access archive of geospatial datasets. Datasets are provided as global, annual time series, where pertinent at the timescale of population analyses and where data are available, for use in the construction of population distribution layers. The archive includes sub-national census-based population estimates, matched to a geospatial layer denoting administrative unit boundaries, and a number of co-registered gridded geospatial factors that correlate strongly with population presence and density. Here, we describe these harmonised datasets and their limitations, along with the production workflow. Further, we demonstrate applications of the archive by producing multi-temporal gridded population outputs for Africa and using these to derive health and development metrics.