37 research outputs found

    Self-Organizing Maps For Knowledge Discovery From Corporate Databases To Develop Risk Based Prioritization For Stagnation 

    Stagnation or low turnover of water within water distribution systems may result in water quality issues, even for relatively short durations of stagnation or low turnover, if other factors such as deteriorated, aging pipe infrastructure are present. As leakage management strategies, including the creation of smaller pressure management zones, are implemented, more dead ends are being created within networks, and hence there is potentially an increasing risk to water quality due to stagnation or low turnover. This paper presents results of applying data-driven tools to the large corporate databases maintained by UK water companies. These databases include multiple information sources such as asset data, regulatory water quality sampling and customer complaints. A range of techniques exists for exploring the interrelationships between various types of variables, with a number of studies successfully using Artificial Neural Networks (ANNs) to probe complex data sets. Self-Organising Maps (SOMs), a class of unsupervised ANN that performs dimensionality reduction of the feature space to yield topologically ordered maps, have been used successfully for problems similar to that posed here. Notably for this application, SOMs are trained in an unsupervised fashion, without class labels attached. Training combines competitive learning (learning the position of a data cloud) and co-operative learning (self-organising of neighbourhoods). In this application, SOMs performed multidimensional data analysis of a case study area (covering a town over an eight-year period). The visual output of the SOM analysis provides a rapid and intuitive means of examining covariance between variables and exploring hypotheses for increased understanding. For example, water age (time from system entry, from hydraulic modelling) in combination with high pipe specific residence time and old cast iron pipe were found to be strong explanatory variables. This derived understanding could ultimately be captured in a tool providing risk-based prioritisation scores.
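
The two training ingredients named in the abstract above, competitive learning and co-operative learning, can be illustrated with a minimal sketch. This is not the paper's implementation: the one-dimensional map, the Gaussian neighbourhood, the linear decay schedules and every name and default value (`som_update`, `train_som`, `lr0`, `sigma0`) are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: index of the node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def som_update(weights, x, lr, sigma):
    """One SOM step on a 1-D map; updates `weights` in place and returns the winner."""
    bmu = best_matching_unit(weights, x)
    for i in range(len(weights)):
        # Co-operative step: neighbours of the winner on the map are pulled
        # towards x as well, weighted by a Gaussian of their map distance.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
        weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return bmu


def train_som(data, n_nodes=5, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small 1-D SOM (assumed linear decay of learning rate and radius)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # keep the radius above zero
        for x in rng.sample(data, len(data)):  # visit samples in random order
            som_update(weights, x, lr, sigma)
    return weights
```

After training, records that map to the same or neighbouring nodes are similar in the input space, which is what makes the visual inspection of variable planes described in the abstract possible.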

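    Forecasting the water table depth of a cranberry field using two decision tree modelling approaches (Prévision de la profondeur de la nappe phréatique d'un champ de canneberges à l'aide de deux approches de modélisation des arbres de décision)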

    Integrated groundwater management is a major challenge for industrial, agricultural and domestic activities. In some agricultural production systems, optimized water table management is a significant factor in improving crop yields and water use. Predicting water table depth (WTD) therefore becomes an important means of enabling real-time planning and management of groundwater resources. This study proposes a decision-tree-based modelling approach for forecasting WTD as a function of precipitation, previous WTD values and evapotranspiration, with applications in groundwater resources management for cranberry farming. First, two decision-tree-based models, Random Forest (RF) and Extreme Gradient Boosting (XGB), were parameterized and compared for predicting WTD up to 48 hours ahead for a cranberry farm located in Québec, Canada. Second, the importance of the predictor variables was analyzed to determine their influence on the WTD simulation results. WTD measurements at three observation wells within a cranberry field, for the growing period from July 8, 2017 to August 30, 2017, were used for training and testing the models. Statistical measures such as the mean squared error, the coefficient of determination and the Nash-Sutcliffe efficiency coefficient were used to measure model performance. The results show that the XGB algorithm outperformed the RF model for predictions of WTD and was selected as the optimal model. Among the predictor variables, the antecedent WTD was the most important for water table depth simulation, followed by precipitation. Based on the most important variables and the optimal model, the prediction error over the entire WTD range was within ± 5 cm for 1-, 12-, 24-, 36- and 48-hour predictions. The XGB model can provide useful information on WTD dynamics and a rigorous simulation for irrigation planning and management in cranberry fields.
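
The three performance measures named in the abstract above can be written out as a short sketch. The function names are hypothetical, and the coefficient of determination is computed here as the squared Pearson correlation, which is a common convention in hydrology but an assumption about what the study used.

```python
def mse(obs, sim):
    """Mean squared error between observed and simulated values."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation (assumed)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)
```

Reporting both measures is informative because they penalise different things: a simulation that is perfectly correlated with the observations but biased still scores a squared correlation of 1, while its Nash-Sutcliffe efficiency drops, possibly below zero.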

    Ensemble decision tree models using RUSBoost for estimating risk of iron failure in drinking water distribution systems

    Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven-year period across the whole of a UK water company (serving three million people). From this, a novel data-driven tool for assessment of iron risk was developed, based on a yearly update and ranking procedure, for a subset of the best quality data. To avoid a ‘black box’ output and to provide an element of explanatory (human-readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm, which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: ‘Nowcast’ (situation at end of calendar year) and ‘Futurecast’ (predict end of next year situation from this year’s data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 model achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance.
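
As a rough illustration of the quantities in the abstract above, the sketch below computes TPR and TNR from binary labels and shows the random undersampling idea that gives RUSBoost its name. It is not the RUSBoost algorithm itself, which repeats the resampling inside each boosting round. The function names and the label convention (1 = iron failure, 0 = no failure) are assumptions.

```python
import random


def tpr_tnr(y_true, y_pred):
    """True positive rate and true negative rate for binary labels (1 = failure).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Reporting TPR and TNR separately matters with rare failures: a model that never predicts a failure can reach high overall accuracy while its TPR is zero.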

    Artificial neural network to estimate an index of water quality

    The artificial neural network (ANN) is a computational model that emulates the biological neural system in information processing. The resulting models are suited to describing long-term characteristics as well as nonlinear relationships. This tool is used to predict the physical, chemical and microbiological parameters that influence water quality. The United States National Sanitation Foundation proposed a water quality index, known as the NSF WQI. This article describes the design, training and use of a three-layer perceptron neural network model for the calculation of the NSF WQI of the Utcubamba River and its tributaries. Using the Matlab software and applying the Levenberg-Marquardt training algorithm, the optimal ANN architecture was found to be 6-12-1, with 70 %, 10 % and 20 % of the data assigned to the training, validation and test sets, respectively. ANN performance was evaluated using the root mean square error (RMSE) and the correlation coefficient (R). High correlations (greater than 0.94) were obtained between the measured and predicted values. Finally, the proposed ANN offers a useful alternative for the calculation and prediction of the water quality index from dissolved oxygen (DO), biochemical oxygen demand (BOD), nitrates, fecal coliforms, hydrogen ion potential (pH) and turbidity.
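
To make the 6-12-1 architecture and the 70/10/20 split above concrete, here is a minimal sketch. The study used Matlab with Levenberg-Marquardt training; this sketch only shows the data split and an untrained forward pass. The tanh hidden activation, the linear output, the random initial weights and all function names are assumptions.

```python
import math
import random


def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices and split them into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def init_mlp(n_in=6, n_hidden=12, seed=0):
    """Random weights for a 6-12-1 network: six inputs, twelve hidden units, one output."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """Forward pass: tanh hidden layer (assumed) followed by a linear output unit."""
    w1, b1, w2, b2 = params
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

The six inputs would correspond to the six water quality parameters listed at the end of the abstract, and the single output to the index value.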

    Predicting water allocation trade prices using a hybrid Artificial Neural Network-Bayesian modelling approach

    This paper proposes an integrated (hybrid) Artificial Neural Network-Bayesian (ANN-B) modelling approach to improve the accuracy of predicting seasonal water allocation prices in Australia’s Murray Irrigation Area, which is part of one of the world’s largest interconnected water markets. Three models (basic, intermediate and full), accommodating different levels of data availability, were considered. Data were analyzed using both ANN and hybrid ANN-B approaches. Using the ANN-B modelling approach, which can simulate complex and non-linear processes, water allocation prices were predicted with a high degree of accuracy (R_BASIC = 0.93, R_INTER = 0.96 and R_FULL = 0.99); this was a higher level of accuracy than that achieved using ANN alone. This approach can potentially be integrated with online data systems to predict water allocation prices, enable better water allocation trade decisions, and improve the productivity and profitability of irrigated agriculture.

    On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization

    Models play a pivotal role in advancing our understanding of Earth's physical nature and environmental systems, aiding in their efficient planning and management. The accuracy and reliability of these models heavily rely on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, how this partitioning is done is often not justified, even though it determines what model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we shed light on the paramount importance of meticulously considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy to effectively confront the “elephant in the room”, leading to improved model generalization capabilities.

    Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems

    Particulate material accumulates over time as cohesive layers on internal pipeline surfaces in water distribution systems (WDS). When mobilised, this material can cause discolouration. This paper explores factors expected to be involved in this accumulation process. Two complementary machine learning methodologies are applied to significant amounts of real-world field data from both a qualitative and a quantitative perspective. First, Kohonen self-organising maps were used for integrative and interpretative multivariate data mining of potential factors affecting accumulation. Second, evolutionary polynomial regression (EPR), a hybrid data-driven technique that combines genetic algorithms with numerical regression, was applied to develop easily interpretable mathematical model expressions. EPR was used to explore producing novel simple expressions to highlight important accumulation factors. Three case studies are presented: one UK national study and two Dutch local studies. The results highlight bulk water iron concentration, pipe material and looped network areas as key descriptive parameters for the UK study. At the local level, a significantly increased third data set allowed K-fold cross validation. The mean cross validation coefficient of determination was 0.945 for training data and 0.930 for testing data for an equation utilising amount of material mobilised and soil temperature for estimating daily regeneration rate. The approach shows promise for developing transferable expressions usable for pro-active WDS management.
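
The K-fold cross validation mentioned in the abstract above can be sketched as an index generator. The shuffling, the way folds are formed and the function name are assumptions, and the abstract does not state the value of K that was used.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

A model would be refitted on each `train` list and scored on both lists. Averaging the scores over the K folds gives mean training and testing values of the kind reported in the abstract.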