2 research outputs found

    Towards better process management in wastewater treatment plants: Process analytics based on SHAP values for tree-based machine learning methods

    No full text
    Understanding the mechanisms of pollutant removal in Wastewater Treatment Plants (WWTPs) is crucial for controlling effluent quality efficiently. However, the numerous treatment units, operational factors, and the underlying interactions between these units and factors usually obscure a comprehensive and precise understanding of the processes. We have previously proposed a machine learning (ML) framework to uncover complex cause-and-effect relationships in WWTPs. However, only one interpretable ML model, Random Forest (RF), was studied, and the interpretation method was not granular enough to reveal very detailed relationships between operational factors and effluent parameters. Thus, in this paper, we present an upgraded framework involving three interpretable tree-based models (RF, XGBoost and LightGBM), three metrics (R², Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE)) and a more advanced interpretation system, SHapley Additive exPlanations (SHAP). Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden. Results show that, for both labels TSSe (Total suspended solids in effluent) and PO4e (Phosphate in effluent), the XGBoost models perform best whereas the RF models perform worst, due to overfitting and polarized fitting. This study has yielded multiple new and significant findings with respect to the control of TSSe and PO4e in the Umeå WWTP and other similarly configured WWTPs. Additionally, this study has produced two important generic findings relating to ML applications for WWTPs (or even other process industries) in terms of cause-and-effect investigations. First, the model comparison should be carried out from multiple perspectives to ensure that underlying details are fully revealed and examined. Second, using a precise, robust, and granular (feature attribution available for individual instances) explanation method can bring extra insight into both model comparison and model interpretation. SHAP is recommended as we found it to be of great value in this study.
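
    The "granular" attribution mentioned in the abstract above, one contribution per feature for each individual instance, can be illustrated with a brute-force Shapley computation. This is only a minimal sketch under stated assumptions: `toy_model`, the three-feature `instance`, and the `background` rows are hypothetical and are not data or models from the paper. The SHAP library uses a much faster tree-specific algorithm; the exhaustive enumeration here is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values for a single instance.

    Features outside a coalition are filled in from each background row,
    and the resulting predictions are averaged.
    """
    n = len(instance)

    def coalition_value(coalition):
        total = 0.0
        for row in background:
            mixed = [instance[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(members | {i}) - coalition_value(members)
                phi[i] += weight * gain
    return phi


def toy_model(x):
    # Hypothetical stand-in for a trained model: one additive term
    # plus one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical reference rows and the single instance to explain.
background = [[1.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
instance = [3.0, 2.0, 4.0]

phi = shapley_values(toy_model, instance, background)
baseline = sum(toy_model(row) for row in background) / len(background)
```

    By construction, the attributions in `phi` sum to the difference between the prediction for `instance` and the average prediction over `background`, which is what makes per-instance explanations comparable across models.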

    A machine learning framework to improve effluent quality control in wastewater treatment plants

    No full text
    Due to the intrinsic complexity of wastewater treatment plant (WWTP) processes, it is always challenging to respond promptly and appropriately to the dynamic process conditions in order to ensure the quality of the effluent, especially when operational cost is a major concern. Machine Learning (ML) methods have therefore been used to model WWTP processes in order to avoid various shortcomings of conventional mechanistic models. However, to the best of the authors' knowledge, no ML applications have focused on investigating how operational factors can affect effluent quality. Additionally, the time lags between process steps have always been neglected, making it difficult to explain the relationships between operational factors and effluent quality. Therefore, this paper presents a novel ML-based framework designed to improve effluent quality control in WWTPs by clarifying the relationships between operational variables and effluent parameters. The framework consists of Random Forest (RF) models, Deep Neural Network (DNN) models, Variable Importance Measure (VIM) analyses, and Partial Dependence Plot (PDP) analyses, and uses a novel approach to account for the impact of time lags between processes. Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden involving a large number of samples (105,763) representing the full scale of the plant's operations. Two effluent parameters, Total Suspended Solids in effluent (TSSe) and Phosphate in effluent (PO4e), and thirty-two operational variables are studied. RF models are developed, validated using DNN models as references, and shown to be suitable for VIM and PDP analyses. VIM identifies the variables that most strongly influence TSSe and PO4e, while PDP elucidates their specific effects on TSSe and PO4e. The major findings are: (1) Influent temperature is the most influential variable for both TSSe and PO4e, but it affects them in different ways; (2) PO4e depends strongly on the TSS in aeration basins: higher TSS concentrations in aeration basins generally promote PO4 removal, but excess TSS can have negative effects; (3) In general, the impact of TSS in aeration basins on TSSe and PO4e increases with the distance of the basin from the merging outlet, so more attention should be paid to the TSS concentration in the third or fourth aeration basins than in the first and second ones; (4) Returning excessive amounts of sludge through the second return sludge pipe should be avoided because of its adverse impact on TSSe removal. These results could support the development of more advanced control strategies to increase control precision and reduce running costs in the Umeå WWTP and other similarly configured WWTPs. The framework could also be applied to other parameters in WWTPs and industrial processes in general if sufficient high-resolution data are available.
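
    The partial dependence analysis named in the abstract above can be sketched generically: fix one input at each value on a grid, leave the other inputs as observed, and average the model's predictions. The snippet below is a minimal illustration under assumptions, not the authors' implementation; `toy_response`, `rows`, and `grid` are hypothetical and chosen only so that the curve has a turning point, loosely echoing a "helpful up to a point, harmful in excess" relationship.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the data set is untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_response(x):
    # Hypothetical model: U-shaped in feature 0, linear in feature 1.
    return (x[0] - 2.0) ** 2 + x[1]


# Hypothetical observations (two features per row) and evaluation grid.
rows = [[0.0, 1.0], [0.0, 3.0]]
grid = [1.0, 2.0, 3.0]

curve = partial_dependence(toy_response, rows, 0, grid)
```

    Plotting `curve` against `grid` gives the partial dependence plot for feature 0; with a real model, `predict` would wrap the trained model's prediction call.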