2 research outputs found

    Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data

    Rapid changes in microbial water quality in surface waters pose challenges for the production of safe drinking water. If not treated to an acceptable level, microbial pathogens present in the drinking water can result in severe consequences for public health. The aim of this paper was to evaluate the suitability of data-driven models of different complexity for predicting the concentrations of E. coli in the river Göta älv at the water intake of the drinking water treatment plant in Gothenburg, Sweden. The objectives were to (i) assess how the complexity of the model affects the model performance; and (ii) identify relevant factors and assess their effect as predictors of E. coli levels. To forecast E. coli levels one day ahead, the data on laboratory measurements of E. coli and total coliforms, Colifast measurements of E. coli, water temperature, turbidity, precipitation, and water flow were used. The baseline approaches included Exponential Smoothing and ARIMA (Autoregressive Integrated Moving Average), which are commonly used univariate methods, and a naive baseline that used the previous observed value as its next prediction. Models common in the machine learning domain were also included: LASSO (Least Absolute Shrinkage and Selection Operator) Regression, Random Forest, and TPOT (Tree-based Pipeline Optimization Tool), a tool for optimising machine learning pipelines. In addition, a multivariate autoregressive model, VAR (Vector Autoregression), was included. The models that included multiple predictors performed better than univariate models. Random Forest and TPOT resulted in higher performance but showed a tendency to overfit. Water temperature, microbial concentrations upstream and at the water intake, and precipitation upstream were shown to be important predictors.
Data-driven modelling enables water producers to interpret the measurements in the context of what concentrations can be expected based on the recent historic data, and thus identify unexplained deviations warranting further investigation of their origin.
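The naive baseline mentioned in the abstract (predict tomorrow's value as today's observed value) can be sketched as follows. This is only an illustration and not the authors' code: the function names and the daily E. coli counts are hypothetical, and mean absolute error is used here simply as one possible way to score the baseline.

```python
def naive_forecast(series):
    """One-day-ahead persistence forecast: each prediction is the previous observation."""
    # The forecast for day t is the value observed on day t-1,
    # so the last observation has no matching target and is dropped.
    return list(series[:-1])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily E. coli counts (illustrative values, not data from the paper).
ecoli = [120, 135, 90, 300, 280, 150]

predictions = naive_forecast(ecoli)   # forecasts for days 2..6
actuals = ecoli[1:]                   # observations for days 2..6
baseline_mae = mean_absolute_error(actuals, predictions)
```

Any more complex model would need to beat `baseline_mae` on the same days to justify its added complexity.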

    Comparison of Adaptive Neuro-Fuzzy Inference System (ANFIS) and Gaussian Process for Machine Learning (GPML) Algorithms for the Prediction of Norovirus Concentration in Drinking Water Supply

    Monitoring Norovirus in drinking water supplies is a complicated and rather expensive process. Norovirus is a leading cause of acute gastroenteritis in most developed countries. Modeling of general microbial occurrence in drinking water is a very active field of study and provides reliable information for predicting microbial risks in drinking water. In this work, the adaptive neuro-fuzzy inference system (ANFIS) and Gaussian Process for Machine Learning (GPML) are proposed as models for predicting the total number of Norovirus in raw surface water from water quality parameters such as pH, turbidity, conductivity, temperature and rain. The predictive models were based on data from the Nedre Romerike Vannverk water treatment plant in Oslo, Norway. Based on the model performance indices used in this study, the GPML model showed accuracy comparable to that of the ANFIS model. However, the ANFIS model generally demonstrated superior ability to predict the number of Norovirus in drinking water, with lower MSE and MAE values than the GPML model. In addition, the ability of the ANFIS model to explain potential effects of interactions among the water quality variables on the number of Norovirus in the raw water makes the technique more efficient for use in water quality modeling.
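For reference, the two error measures named in the abstract are conventionally defined as below. The abstract does not spell them out, so the symbols are assumptions introduced here: $y_i$ is the observed Norovirus count for sample $i$, $\hat{y}_i$ is the model prediction, and $n$ is the number of samples. Lower values indicate better predictive accuracy for both measures.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```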