Machine learning methods for detecting and correcting data errors in water level telemetry systems

Abstract

Water level data from telemetry stations can be used for early warning to prevent risk situations, such as floods and droughts. However, there is a possibility that the equipment in the telemetry station may fail, which will lead to errors in the data, resulting in false alarms or no warning of true alarms. Manually examining data is time-consuming and require expertise. As a result, the automated system is required. There are several algorithms available for detecting and correcting anomalous data, but the question remains as to which algorithm would be most suitable for telemetry data. To investigate and identify such an algorithm, statistical models, machine learning models, deep learning models, and reinforcement learning models are implemented and evaluated. For anomaly detection, we first evaluated statistical models using our modified sliding window algorithm called Only Normal Sliding Windows (ONSW) to assess their performance. We then proposed Deep Reinforcement Learning (DRL) models and compared them to Deep Learning models to determine their suitability for the task. Additionally, we developed a feature extraction approach that combines the saliency map and nearest neighbor extracted feature (SM+NNFE) to improve model performance. Various ensemble approaches were also implemented and compared to other competitive methods. For data imputation, we developed the Full Subsequence Matching (FSM) technique, which fills in missing values by imitating values from the most similar subsequence. Based on the results, machine learning models with ONSW are the best option for identifying abnormalities in telemetry water level data. Additionally, a deep reinforcement learning model could be used to identify abnormalities in crucial stations requiring further attention. Regarding data imputation, our technique outperforms other competitive approaches when dealing with water level data influenced by tides. However, relying solely on a single or limited number of models may be risky, as their performance could deteriorate in the future without being realized. Therefore, building models using ensemble techniques is a viable option for reducing errors caused by this issue

    Similar works