5,559 research outputs found

    Assessment of hydrological and seasonal controls over the nitrate flushing from a forested watershed using a data mining technique

    The M5 regression tree algorithm, a data mining technique, was used to examine how hydrological and seasonal settings jointly control streamwater nitrate flushing during hydrological events in a forested watershed in southwestern Slovenia, which is characterized by a distinctive flushing, almost torrential hydrological regime. The research was based on an extensive dataset of continuous, high-frequency measurements of seasonal meteorological conditions, watershed hydrological responses and streamwater nitrate concentrations. The dataset contained 16 recorded hydrographs occurring in different seasonal and hydrological conditions. Using predefined regression tree pruning criteria, a regression tree model was obtained that is comprehensible in terms of domain knowledge and adequately describes most of the variation in streamwater nitrate concentration (RMSE = 1.02 mg/l-N; r = 0.91). The attributes found to be most descriptive of streamwater nitrate concentrations were the antecedent precipitation index (API) and air temperatures in the preceding periods. The model was most successful in describing streamwater concentrations in the range 1-4 mg/l-N, which covers a large proportion of the dataset. Model performance was slightly worse during the high streamwater nitrate concentration peaks of the summer hydrographs (up to 7 mg/l-N) but poor during the autumn hydrograph (up to 14 mg/l-N), which was related to highly variable hydrological conditions and would require a less robust regression tree model based on an extended dataset
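The abstract names the antecedent precipitation index (API) as a key attribute but does not state how it was computed. As a rough illustration only, the sketch below uses one commonly cited recursive form, API_t = k * API_(t-1) + P_t. The function name, the decay constant `decay=0.9`, and the sample rainfall values are assumptions, not details from the study.

```python
def antecedent_precipitation_index(precip_mm, decay=0.9, initial=0.0):
    """Return the API series for a sequence of daily precipitation totals (mm).

    Assumed recursive form: API_t = decay * API_(t-1) + P_t.
    The decay constant is a hypothetical value, not taken from the abstract.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    api = []
    previous = initial
    for p in precip_mm:
        # Yesterday's wetness decays, today's rainfall is added on top.
        previous = decay * previous + p
        api.append(previous)
    return api


if __name__ == "__main__":
    # Hypothetical five-day rainfall record in millimetres.
    print(antecedent_precipitation_index([10.0, 0.0, 0.0, 5.0, 0.0]))
```

A series like this could serve as one input column to a regression tree; a smaller `decay` makes the index forget earlier rainfall faster.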

    Predictive mapping of wetland types and associated soils through digital elevation model analyses in the Canadian Prairie Pothole Region

    Effective management strategies are needed to control phosphorus loading of prairie watersheds that contribute to the eutrophication issues of Lake Winnipeg. Prairie Pothole Region (PPR) wetlands provide many ecosystems services including reducing nutrient mobility. Preferential conservation of PPR wetlands with calcium carbonate (CaCO3)-enriched soils may be a more effective strategy for controlling phosphorus loading, as these soils have greater potential to retain phosphorus from agricultural runoff. The spatial distribution of CaCO3-enriched wetland soils is controlled by hydrologic processes that may be modellable using high-resolution digital elevation models (DEMs). Two modelling approaches were tested to map spatial distributions of wetlands and wetland soils expected to be enriched with CaCO3. The models were trained and tested with wetland salinity and soil profile information collected at three Saskatchewan PPR sites, near Swift Current, St. Denis, and Smith Creek. The first model was developed to approximate landscape-scale hydrologic processes from high-resolution DEMs to predict the distributions of fresh and solute-rich wetlands; the solute-rich wetlands represent wetlands expected to have CaCO3-enriched soils. Spill channel connections between wetlands were modelled to characterize wetlands in terms of the runoff contributions they receive, their potential for contributing runoff downslope, and their relative position within the landscape; solute-richness predictions were based on these characteristics. This model was successful and achieved acceptable predictive accuracies based on external validation tests. Digital soil mapping (DSM) methodologies were tested for predicting the spatial distribution of wetland soil classes within PPR landscapes. Target soil classes were defined by hydropedological units that reflect differences in soil CaCO3 enrichment. 
Multiple machine-learning techniques were tested, which incorporated many topographic attributes derived from the DEMs as predictor variables, including knowledge-based topographic attributes developed specifically to characterize the PPR’s morphology. Certain DSM models achieved acceptable predictive accuracy based on external validation tests and mapped soils in expected distributions, but none predicted the occurrence of wetlands with CaCO3-enriched soils distributed throughout their basins. Both modelling approaches could potentially be used to 1) identify wetlands with CaCO3-enriched soils to target for conservation efforts to maximize phosphorus retention and 2) create upscaled estimates of phosphorus retention across the PPR

    Characterizing Clustering Models of High-dimensional Remotely Sensed Data Using Subsampled Field-subfield Spatial Cross-validated Random Forests

    Clustering models are regularly used to construct meaningful groups of observations within complex datasets, and they are an exceptional tool for spatial exploratory analysis. The clusters detected in a recent spatio-temporal cluster analysis of leaf area index (LAI) in the Columbia River Basin (CRB) require further investigation since they are derived using only a single greenness metric. It is of great interest to further understand how greening indices can be used to determine separation of sites across an array of remotely sensed environmental attributes. In this prior work, highly localized minority clusters were detected to be most dissimilar from the remaining clusters, as determined by annual variation in remotely sensed LAI. The objective of this study is to discern what other environmental factors are important predictors of cluster allocation from the mentioned cluster analysis, and secondarily, to construct a predictive model that prioritizes minority clusters. A random forest classification is considered to examine the importance of various site attributes in predicting cluster allocation. To satisfy these objectives, I propose an application-specific process that integrates spatial sub-sampling and cross-validation to improve the interpretability and utility of random forests for spatially autocorrelated, highly localized, and unbalanced class-size response variables. The final random forest model identifies that the cluster allocation, using only LAI, separates sites significantly across many other environmental attributes, and further that elevation, slope, and water storage potential are the most important predictors of cluster allocation. Most importantly, the class error rates are lowest for the clusters that are most dissimilar, as detected by the cluster model, which fulfills the secondary objective of aligning the priorities of a predictive model with a prior cluster model

    Overcoming data scarcity in earth science.

    The Data Scarcity problem is repeatedly encountered in environmental research. This may induce an inadequate representation of the complexity of any environmental system's response to any input/change (natural and human-induced). In such a case, before engaging in new expensive studies to gather and analyze additional data, it is reasonable first to understand what enhancement in estimates of system performance would result if all the available data could be well exploited. The purpose of this Special Issue, "Overcoming Data Scarcity in Earth Science" in the Data journal, is to draw attention to the body of knowledge that leads to improving the capacity to exploit the available data to better represent, understand, predict, and manage the behavior of environmental systems at meaningful space-time scales. This Special Issue contains six publications (three research articles, one review, and two data descriptors) covering a wide range of environmental fields: geophysics, meteorology/climatology, ecology, water quality, and hydrology

    Impervious surface estimation using remote sensing images and GIS: how accurate is the estimate at subdivision level?

    Impervious surface has long been accepted as a key environmental indicator linking development to its impacts on water. Many have suggested that there is a direct correlation between degree of imperviousness and both quantity and quality of water. Quantifying the amount of impervious surface, however, remains difficult and tedious especially in urban areas. Lately more efforts have been focused on the application of remote sensing and GIS technologies in assessing the amount of impervious surface and many have reported promising results at various pixel levels. This paper discusses an attempt at estimating the amount of impervious surface at subdivision level using remote sensing images and GIS techniques. Using Landsat ETM+ images and GIS techniques, a regression tree model is first developed for estimating pixel imperviousness. GIS zonal functions are then used to estimate the amount of impervious surface for a sample of subdivisions. The accuracy of the model is evaluated by comparing the model-predicted imperviousness to digitized imperviousness at the subdivision level. The paper then concludes with a discussion on the convenience and accuracy of using the method to estimate imperviousness for large areas

    Smart Classifiers and Bayesian Inference for Evaluating River Sensitivity to Natural and Human Disturbances: A Data Science Approach

    Excessive rates of channel adjustment and riverine sediment export represent societal challenges; impacts include: degraded water quality and ecological integrity, erosion hazards to infrastructure, and compromised public safety. The nonlinear nature of sediment erosion and deposition within a watershed and the variable patterns in riverine sediment export over a defined timeframe of interest are governed by many interrelated factors, including geology, climate and hydrology, vegetation, and land use. Human disturbances to the landscape and river networks have further altered these patterns of water and sediment routing. An enhanced understanding of river sediment sources and dynamics is important for stakeholders, and will become more critical under a nonstationary climate, as sediment yields are expected to increase in regions of the world that will experience increased frequency, persistence, and intensity of storm events. Practical tools are needed to predict sediment erosion, transport and deposition and to characterize sediment sources within a reasonable measure of uncertainty. Water resource scientists and engineers use multidimensional data sets of varying types and quality to answer management-related questions, and the temporal and spatial resolution of these data are growing exponentially with the advent of automated samplers and in situ sensors (i.e., “big data”). Data-driven statistics and classifiers have great utility for representing system complexity and can often be more readily implemented in an adaptive management context than process-based models. Parametric statistics are often of limited efficacy when applied to data of varying quality, mixed types (continuous, ordinal, nominal), censored or sparse data, or when model residuals do not conform to Gaussian distributions. 
Data-driven machine-learning algorithms and Bayesian statistics have advantages over Frequentist approaches for data reduction and visualization; they allow for non-normal distribution of residuals and greater robustness to outliers. This research applied machine-learning classifiers and Bayesian statistical techniques to multidimensional data sets to characterize sediment source and flux at basin, catchment, and reach scales. These data-driven tools enabled better understanding of: (1) basin-scale spatial variability in concentration-discharge patterns of instream suspended sediment and nutrients; (2) catchment-scale sourcing of suspended sediments; and (3) reach-scale sediment process domains. The developed tools have broad management application and provide insights into landscape drivers of channel dynamics and riverine solute and sediment export

    Predicting flood insurance claims with hydrologic and socioeconomic demographics via machine learning: exploring the roles of topography, minority populations, and political dissimilarity

    Current research on flooding risk often focuses on understanding hazards, de-emphasizing the complex pathways of exposure and vulnerability. We investigated the use of both hydrologic and social demographic data for flood exposure mapping with Random Forest (RF) regression and classification algorithms trained to predict both parcel- and tract-level flood insurance claims within New York State, US. Topographic characteristics best described flood claim frequency, but RF prediction skill was improved at both spatial scales when socioeconomic data was incorporated. Substantial improvements occurred at the tract-level when the percentage of minority residents, housing stock value and age, and the political dissimilarity index of voting precincts were used to predict insurance claims. Census tracts with higher numbers of claims and greater densities of low-lying tax parcels tended to have low proportions of minority residents, newer houses, and less political similarity to state level government. We compared this data-driven approach and a physically-based pluvial flood routing model for prediction of the spatial extents of flooding claims in two nearby catchments of differing land use. The floodplain we defined with physically based modeling agreed well with existing federal flood insurance rate maps, but underestimated the spatial extents of historical claim generating areas. In contrast, RF classification incorporating hydrologic and socioeconomic demographic data likely overestimated the flood-exposed areas. Our research indicates that quantitative incorporation of social data can improve flooding exposure estimates

    A ranking of hydrological signatures based on their predictability in space

    Hydrological signatures are now used for a wide range of purposes, including catchment classification, process exploration and hydrological model calibration. The recent boost in the popularity and number of signatures has however not been accompanied by the development of clear guidance on signature selection. Here we propose that exploring the predictability of signatures in space provides important insights into their drivers and their sensitivity to data uncertainties, and is hence useful for signature selection. We use three complementary approaches to compare and rank 15 commonly‐used signatures, which we evaluate in 671 US catchments from the CAMELS data set (Catchment Attributes and MEteorology for Large‐sample Studies). Firstly, we employ machine learning (random forests) to explore how attributes characterizing the climatic conditions, topography, land cover, soil and geology influence (or not) the signatures. Secondly, we use simulations of a conceptual hydrological model (Sacramento) to benchmark the random forest predictions. Thirdly, we take advantage of the large sample of CAMELS catchments to characterize the spatial auto‐correlation (using Moran's I) of the signature field. These three approaches lead to remarkably similar rankings of the signatures. We show i) that signatures with the noisiest spatial pattern tend to be poorly captured by hydrological simulations, ii) that their relationships to catchment attributes are elusive (in particular, they are not correlated with climatic indices) and iii) that they are particularly sensitive to discharge uncertainties. We suggest that a better understanding of their drivers and better characterization of their uncertainties would increase their value in hydrological studies
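The abstract characterizes spatial autocorrelation with Moran's I but gives no computational detail. The sketch below shows the standard global statistic, I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean and W the sum of all weights. The chain-shaped neighbour matrix and the sample values are hypothetical stand-ins; how the study defined neighbouring catchments is not stated in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values with a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    sum_sq = sum(d * d for d in dev)
    if total_weight == 0 or sum_sq == 0:
        raise ValueError("weights and values must both have non-zero variation")
    # Weighted cross-products of deviations between every pair of sites.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * (cross / sum_sq)


def chain_weights(n):
    """Hypothetical neighbour matrix: sites on a line, adjacent sites linked."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


if __name__ == "__main__":
    w = chain_weights(4)
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient: positive I
    print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # alternating values: negative I
```

Values near zero indicate a spatially noisy field, which is the property the abstract associates with poorly predictable signatures.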

    LONG-TERM FOREST GROWTH AND DECADAL SOIL CHEMISTRY CHANGE ACROSS THE WHITE MOUNTAIN NATIONAL FOREST

    Forest ecosystems are subject to a variety of stressors including human land use, air pollution and climate change. A challenge for detecting temporal change, however, is disentangling heterogeneity at multiple spatial scales. Therefore, we need a better understanding of the mechanisms influencing forest growth and soil formation, of how to improve existing long-term sample designs, and of how to quantify variability at multiple spatial scales. Identifying the areal extent of bedrock outcrops and shallow soils has important implications in understanding the spatial dynamics of surrounding vegetation, stream chemistry gradients, and soil properties. Manual methods of delineating bedrock outcrops and associated shallow soil are still commonly employed in the northeastern US and retain numerous limitations associated with the geometry of polygon units. Chapter 2 objectives were to assess the accuracy of visually interpreted high-resolution relief maps for locating bedrock outcrops and associated shallow soil as well as to automate their delineation using predictive analytics. Visual interpretation of lidar-derived 1 m shaded relief maps at Hubbard Brook Experimental Forest resulted in 92% accuracy in distinguishing the presence of bedrock outcrops and shallow soil. A generalized additive model had 88.1% overall accuracy using independent validation data and 90.1% overall accuracy predicting bedrock outcrops-shallow soil presence and absence in a second validation watershed 16 km northwest. Chapter 3 objectives were to predict the asymptotic range of stand relative density and biomass as well as explore the influence of topographic metrics as proxies for site quality. Predicting the asymptotic range of stand relative density and biomass has important implications on silvicultural practices and understanding forest carbon pools in late successional forests across northeastern US. 
In addition, quantifying the influence of site quality on carrying capacity in a mixed species forest is a long-standing challenge and has not been thoroughly tested with long-term longitudinal data. Logistic and Chapman-Richards growth functions were fitted to eight decades of Bartlett Experimental Forest, New Hampshire inventory measurements from 1931-32, 1939-40, 1991-92, 2001-03, and 2015-17 as nonlinear mixed effects models. The variability associated with the plot-level random effects suggested broad differences in structure among the plots could be accounted for with topographic metrics. The variance components of predicted stand relative density and biomass asymptotes in this study influenced by a topographic covariate highlighted the importance of incorporating landscape-soil-water dynamics when characterizing stand dynamics and growth. Finally, Chapter 4 objectives were to investigate overall change in soil chemistry of mid-elevation, northern hardwood Spodosols across the White Mountain National Forest, calculate the variance components at multiple spatial scales, and stratify sampling sites by dominant hydrologic pathways to determine if groundwater influenced soils were more responsive to acidification recovery processes compared to soils developed vertically via unsaturated flow. Forty permanent plots were sampled across the White Mountain National Forest (WMNF), USA in 2001-02 and resampled in 2014. Paired t-tests detected significant increases in carbon and base cations and a decrease in Al in the Oa horizon while base cations decreased and Al increased in some mineral horizons. Additionally, within-site variability was comparable to overall variability across the WMNF. 
When study sites were stratified into hydrologic groups, we found a strong signal of increasing carbon and base cation concentrations from 2001-02 to 2014 for the Oa horizon, suggesting that soils influenced by shallow groundwater contributions from upslope were more responsive to acidification recovery than soils influenced only by vertical percolation. The combined approach to hydrologic stratification and estimating variance components simultaneously at the landscape and within-plot scales is crucial for calculating sample size needed to detect temporal change
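The abstract above mentions fitting logistic and Chapman-Richards growth functions to find asymptotes of stand relative density and biomass. The sketch below only evaluates commonly used parameterizations of the two curves to show how each approaches its asymptote. It does not reproduce the nonlinear mixed effects fitting, and the parameter names and values (`asymptote`, `rate`, `shape`, `offset`) are illustrative assumptions rather than estimates from the study.

```python
import math


def chapman_richards(age, asymptote, rate, shape):
    """Chapman-Richards curve: asymptote * (1 - exp(-rate * age)) ** shape."""
    return asymptote * (1.0 - math.exp(-rate * age)) ** shape


def logistic(age, asymptote, offset, rate):
    """Logistic curve: asymptote / (1 + offset * exp(-rate * age))."""
    return asymptote / (1.0 + offset * math.exp(-rate * age))


if __name__ == "__main__":
    # Hypothetical parameters; units and magnitudes are placeholders.
    for age in (0, 20, 40, 80, 160):
        cr = chapman_richards(age, asymptote=250.0, rate=0.03, shape=2.0)
        lg = logistic(age, asymptote=250.0, offset=9.0, rate=0.05)
        print(age, round(cr, 1), round(lg, 1))
```

Both curves level off at `asymptote`, which plays the role of the carrying capacity discussed in the abstract; in a mixed effects setting that parameter could vary by plot or with a topographic covariate.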