10,674 research outputs found

    Ensemble decision tree models using RUSBoost for estimating risk of iron failure in drinking water distribution systems

    Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven year period across the whole of a UK water company (serving three million people). From this a novel data driven tool for assessment of iron risk was developed based on a yearly update and ranking procedure, for a subset of the best quality data. To avoid a ‘black box’ output, and provide an element of explanatory (human readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: ‘Nowcast’ (situation at end of calendar year) and ‘Futurecast’ (predict end of next year situation from this year’s data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance
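The abstract above reports True Positive Rate and True Negative Rate for a classifier trained on heavily imbalanced data. The sketch below illustrates only two of the ingredients it names: the random undersampling step that gives RUSBoost its name, and the TPR/TNR calculation. It is not the authors' model and does not implement boosting. The function names and the 0/1 label encoding (1 = iron failure) are assumptions made for illustration.

```python
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of a 0/1 label list.

    All minority-class samples are kept, and an equally sized random draw
    (without replacement) is taken from the majority class.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    return sorted(minority + rng.sample(majority, len(minority)))


def tpr_tnr(y_true, y_pred):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)
```

In a boosting loop, a balanced subset like this would be redrawn before each weak learner is fitted, so that the rare failures are not swamped by the majority class.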

    Artificial neural networks as emulators of process-based models to analyse bathing water quality in estuaries

    This study aims to provide a method for developing artificial neural networks in estuaries as emulators of process-based models to analyse bathing water quality and its variability over time and space. The methodology forecasts the concentration of faecal indicator organisms, integrating the accuracy and reliability of field measurements, the spatial and temporal resolution of process-based modelling, and the decrease in computational costs by artificial neural networks whilst preserving the accuracy of results. Thus, the overall approach integrates a coupled hydrodynamic-bacteriological model previously calibrated with field data at the bathing sites into a low-order emulator by using artificial neural networks, which are trained by the process-based model outputs. The application of the method to the Eo Estuary, located on the northwestern coast of Spain, demonstrated that artificial neural networks are viable surrogates of highly nonlinear process-based models and highly variable forcings. The results showed that the process-based model and the neural networks conveniently reproduced the measurements of Escherichia coli (E. coli) concentrations, indicating a slightly better fit for the process-based model (R² = 0.87) than for the neural networks (R² = 0.83). This application also highlighted that during the model setup of both predictive tools, the computational time of the process-based approach was 0.78 times lower than that of the artificial neural networks (ANNs) approach due to the additional time spent on ANN development. Conversely, the computational costs of forecasting are considerably reduced by the neural networks compared with the process-based model, with a decrease in hours of 25, 600, 3900, and 31633 times for forecasting 1 h, 1 day, 1 month, and 1 bathing season, respectively. Therefore, the longer the forecasting period, the greater the reduction in computational time by artificial neural networks
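The fit statistics quoted above are coefficients of determination. As a minimal sketch, and assuming the common definition of one minus the ratio of residual to total sum of squares (the abstract does not say which definition was used), such a score can be computed as follows. The function name and sample values are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes equal-length sequences and observations that are not all equal.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

A perfect emulator scores 1, while a model that always predicts the observed mean scores 0.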

    Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems

    Particulate material accumulates over time as cohesive layers on internal pipeline surfaces in water distribution systems (WDS). When mobilised, this material can cause discolouration. This paper explores factors expected to be involved in this accumulation process. Two complementary machine learning methodologies are applied to significant amounts of real world field data from both a qualitative and a quantitative perspective. First, Kohonen self-organising maps were used for integrative and interpretative multivariate data mining of potential factors affecting accumulation. Second, evolutionary polynomial regression (EPR), a hybrid data-driven technique, was applied that combines genetic algorithms with numerical regression for developing easily interpretable mathematical model expressions. EPR was used to explore producing novel simple expressions to highlight important accumulation factors. Three case studies are presented: UK national and two Dutch local studies. The results highlight bulk water iron concentration, pipe material and looped network areas as key descriptive parameters for the UK study. At the local level, a significantly increased third data set allowed K-fold cross validation. The mean cross validation coefficient of determination was 0.945 for training data and 0.930 for testing data for an equation utilising amount of material mobilised and soil temperature for estimating daily regeneration rate. The approach shows promise for developing transferable expressions usable for pro-active WDS management
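The K-fold cross validation mentioned above partitions the data into K folds and holds each one out in turn for testing. The following is a minimal sketch of the index bookkeeping only. The function name, the shuffling seed and the interleaved fold assignment are assumptions for illustration, not details taken from the paper.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test
```

A regression expression would be fitted on each `train` subset and scored on the matching `test` subset, and the K scores averaged to give mean training and testing values like those reported.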

    Addressing Uncertainty in TMDLS: Short Course at Arkansas Water Resources Center 2001 Annual Conference

    Management of a critical natural resource like water requires information on the status of that resource. The US Environmental Protection Agency (EPA) reported in the 1998 National Water Quality Inventory that more than 291,000 miles of assessed rivers and streams and 5 million acres of lakes do not meet State water quality standards. This inventory represents a compilation of State assessments of 840,000 miles of rivers and 17.4 million acres of lakes; a 22 percent increase in river miles and 4 percent increase in lake acres over their 1996 reports. Siltation, bacteria, nutrients and metals were the leading pollutants of impaired waters, according to EPA. The sources of these pollutants were presumed to be runoff from agricultural lands and urban areas. EPA suggests that the majority of Americans, over 218 million, live within ten miles of a polluted waterbody. This seems to contradict the recent proclamations of the success of the Clean Water Act, the Nation's water pollution control law. EPA also claims that, while water quality is still threatened in the US, the amount of water safe for fishing and swimming has doubled since 1972, and that the number of people served by sewage treatment plants has more than doubled

    Self-Organizing Maps For Knowledge Discovery From Corporate Databases To Develop Risk Based Prioritization For Stagnation 

    Stagnation or low turnover of water within water distribution systems may result in water quality issues, even for relatively short durations of stagnation / low turnover if other factors such as deteriorated aging pipe infrastructure are present. As leakage management strategies, including the creation of smaller pressure management zones, are implemented, increasingly more dead ends are being created within networks and hence potentially there is an increasing risk to water quality due to stagnation / low turnover. This paper presents results of applying data driven tools to the large corporate databases maintained by UK water companies. These databases include multiple information sources such as asset data, regulatory water quality sampling, customer complaints etc. A range of techniques exists for exploring the interrelationships between various types of variables, with a number of studies successfully using Artificial Neural Networks (ANNs) to probe complex data sets. Self Organising Maps (SOMs), a class of unsupervised ANN that perform dimensionality reduction of the feature space to yield topologically ordered maps, have been used successfully for problems similar to that posed here. Notably for this application, SOMs are trained in an unsupervised fashion, without class labels attached. Training combines competitive learning (learning the position of a data cloud) and co-operative learning (self-organising of neighbourhoods). Specifically, in this application SOMs performed multidimensional data analysis of a case study area (covering a town for an eight year period). The visual output of the SOM analysis provides a rapid and intuitive means of examining covariance between variables and exploring hypotheses for increased understanding. For example, water age (time from system entry, from hydraulic modelling) in combination with high pipe specific residence time and old cast iron pipe were found to be strong explanatory variables. This derived understanding could ultimately be captured in a tool providing risk based prioritisation scores
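The two training ingredients named above, competitive learning (finding the best-matching map node) and co-operative learning (pulling that node's grid neighbours along with it), can be sketched in a few lines. This is a generic textbook-style SOM, not the configuration used in the paper. The grid size, linear decay schedules, Gaussian neighbourhood and all function names are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: return the grid node whose prototype is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Co-operative step: nodes near the BMU on the grid move more.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each record can be assigned to its best-matching node, and per-variable maps of the prototype values give the kind of visual comparison between variables that the abstract describes. Inputs would normally be scaled to a common range first.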

    Integration of artificial neural network and geographic information system applications in simulating groundwater quality

    Background: Because experiments on water quality are time consuming and expensive, models are often employed as a supplement to simulate water quality. Artificial neural network (ANN) is an efficient tool in hydrologic studies, yet on its own it cannot deliver its results in the form of maps and geo-referenced data. Methods: In this study, ANN was applied to simulate groundwater quality and a geographic information system (GIS) was used as a pre-processing and post-processing tool in simulating water quality in the Mazandaran Plain (Caspian southern coasts, Iran). Groundwater quality was simulated using a multilayer perceptron (MLP) network. The determination of the groundwater quality index (GWQI) and the estimation of effective factors in groundwater quality were also undertaken. After modeling in ANN, the model validation was carried out. Also, the study area was divided into 1×1 km pixels (raster format) in the GIS environment. Then, the model input layers were combined and a raster layer comprising the model input values and geographic coordinates was generated. Using the geographic coordinates, the pixel values (model inputs) were fed into the ANN (Neuro Solutions software). Groundwater quality was simulated using the validated optimum network at the sites without water quality experiments. In the next step, the results of the ANN simulation were entered into the GIS environment and a groundwater quality map was generated based on the simulated results. Results: The results revealed that the integration of the capabilities of ANN and GIS has high accuracy and efficiency in the simulation of groundwater quality. Conclusion: This method can be employed in an extensive area to simulate hydrologic parameters. Keywords: Water quality, GWQI, MLP, Mazandaran Plai

    INTEGRATED AQUIFER VULNERABILITY ASSESSMENT OF NITRATE CONTAMINATION IN CENTRAL INDIANA

    Groundwater is not easily contaminated, but it is difficult to restore once contaminated. Therefore, groundwater management is important to prevent pollutants from reaching groundwater. A common step in developing groundwater management plans is assessment of aquifer risk using computational models. Groundwater modeling with a geographic information system (GIS) for efficient groundwater management can provide maps of regions where groundwater is contaminated or may be vulnerable and also can help select the optimal number of groundwater monitoring locations

    Representing local dynamics within water resource systems through a data-driven emulation approach

    Growing population and socio-economic activities along with looming effects of climate change have led to enormous pressures on water resource systems. To diagnose and quantify potential vulnerabilities, effective tools are required to represent the interactions between limited water availability and competing water demands across a range of spatial and temporal scales. Despite significant progress in integrated modeling of water resource systems, the majority of existing models are still unable to fully describe the contemplating dynamics within and between elements of water resource systems across all relevant scales and/or variables. Here, a data-driven approach is suggested to represent local details of a water resource system through emulating an existing water resource system model, in which these details have been missed. This is done by devising a set of interconnected functional mappings, i.e. integrated emulators, which are parameterized using the simulation results of the existing model at a common scale and/or variable but can support process representation with finer resolution and/or details. The proposed approach is applied to a complex water resource system in Southern Alberta, Canada, to provide a detailed understanding of the system’s dynamics at the Oldman Reservoir, which is the key to provision of effective water resource management in this semi-arid and already stressed cold region. By proposing a rigorous setup/falsification procedure, a set of alternative hypotheses for emulators describing the local dynamics of local irrigation demand and withdrawals along with reservoir release and evaporation is developed. Findings show that emulators formed using Artificial Neural Networks mainly outperform simpler emulators developed for the variables considered.
The non-falsified emulators are then coupled to represent the local dynamics of the water resource system at the reservoir location, considering the underlying interplays with hydro-climatological conditions and human decision on the irrigation area. It is found that emulators with input variables identified through expert knowledge can outperform fully data-driven emulators in which proxies were selected based on an input variable selection method. The top non-falsified coupled models are able to capture the dynamics of lake evaporation, water withdrawal, irrigation demand, reservoir release and storage with coefficients of determination of 0.80 to 0.82, 0.45 to 0.55, 0.52 to 0.59, 0.98 to 0.99 and 0.72 to 0.88, respectively. The practical utility of the proposed approach is demonstrated through an impact assessment study by analysing four performance criteria, corresponding to reservoir’s storage, local irrigation demand, number of spill events and median reservoir release, in three stress-tests. These stress tests assess the local sensitivity of the water resource system at the Oldman Reservoir at three different levels, corresponding to (1) changing incoming streamflow to the basin in a bottom-up approach; (2) joint scenario of changing streamflow and warming climate, using a coupled bottom-up/top-down approach; and (3) specific changes in incoming streamflow, climate and irrigation area in a heuristic approach. For the first experimentation, weekly realizations for possible water availability are stochastically reconstructed and fed into the top non-falsified integrated emulator. By defining warm/dry, historical and cold/wet flow conditions, we found that, moving from dry to wet regime conditions, the expected number of low storage duration does not change and the expected annual water deficit declines. Moreover, the expected number of spill events increases whereas median reservoir release increases.
In the next impact assessment study, different scenarios of warming climate are obtained from NASA-NEX downscaled global climate projections, and the joint impact of changing streamflow and temperature on the system’s behaviour is evaluated. This assessment demonstrated that in warmer climate, the expected number of low storage duration in dry condition increases whereas in historical and wet conditions, the low storage duration does not change. In addition, the expected annual water deficit increases while the expected number of spill events decreases in the three flow regime conditions. Moreover, the expected median reservoir release increases in the dry, historical and wet regime conditions. In the final level of assessment, vulnerability of the system under changing streamflow, climate including temperature and precipitation and changing irrigation area is assessed. Results show that increasing irrigation area combined with declining inflow can considerably increase the duration of low reservoir storage in the Oldman Reservoir. Increasing temperature can lead to decline in both reservoir storage and outflow. In addition, when combined with declining inflow, increasing temperature can severely increase the annual water deficit for the irrigation sector. Furthermore, it is noted that although the performance of the unfalsified models is identical in representing the dynamics of the Oldman Reservoir under the historical data, the assessments can be slightly to moderately different depending on the defined scenarios of change. This is due to the choice of model configuration and can address the uncertainty regarding the system’s behaviour. Our study shows the promise of the data-driven emulation approach as a tool for developing more enhanced water resource system models to face emerging management problems in the era of change