
    A survey of machine learning methods applied to anomaly detection on drinking-water quality data

    Abstract: Traditional machine learning (ML) techniques such as support vector machines, logistic regression, and artificial neural networks have been applied most frequently in water quality anomaly detection tasks. This paper presents a review of the progress and advances made in detecting anomalies in water quality data using ML techniques. The review encompasses both traditional ML and deep learning (DL) approaches. Our findings indicate that: 1) DL approaches generally outperform traditional ML techniques in feature learning accuracy and produce lower false positive rates; however, it is difficult to make a fair comparison between studies because of the different datasets, models, and parameters employed. 2) Despite the advances made and the advantages of the extreme learning machine (ELM), ELM remains sparsely exploited in this domain. This study also proposes a hybrid DL-ELM framework as a possible solution that could be investigated further and used to detect anomalies in water quality data.
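The survey proposes the hybrid DL-ELM framework only at a high level. To recall what makes ELM attractive (a random, untrained hidden layer plus a closed-form solve for the output weights), here is a minimal regression-style ELM sketch in pure Python; all names and parameters are illustrative, not taken from the paper:

```python
import math
import random

def gauss_solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def elm_train(X, y, n_hidden=20, ridge=1e-3, seed=0):
    """ELM: hidden weights are random and never trained; only the output
    weights beta are fit, via the ridge-regularized normal equations."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    # Hidden-layer activations for every training sample.
    H = [[math.tanh(sum(W[h][j] * x[j] for j in range(d)) + b[h])
          for h in range(n_hidden)] for x in X]
    # Solve (H^T H + ridge*I) beta = H^T y in closed form.
    A = [[sum(H[k][i] * H[k][j] for k in range(len(H)))
          + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(H[k][i] * y[k] for k in range(len(H))) for i in range(n_hidden)]
    return W, b, gauss_solve(A, rhs)

def elm_predict(model, x):
    W, b, beta = model
    h = [math.tanh(sum(w[j] * x[j] for j in range(len(x))) + bi)
         for w, bi in zip(W, b)]
    return sum(bh * hh for bh, hh in zip(beta, h))
```

Because no hidden-layer training occurs, fitting is a single linear solve, which is the speed advantage the survey alludes to. Trained on clean readings, such a predictor's residual could serve as an anomaly score, though the paper does not prescribe this.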

    LSTM Networks for Detection and Classification of Anomalies in Raw Sensor Data

    In order to ensure the validity of sensor data, it must be thoroughly analyzed for various types of anomalies. Traditional machine learning methods of anomaly detection in sensor data are based on domain-specific feature engineering. A typical approach is to use domain knowledge to analyze sensor data and manually create statistics-based features, which are then used to train the machine learning models to detect and classify the anomalies. Although this methodology is used in practice, it has a significant drawback: feature extraction is usually labor-intensive and requires considerable effort from domain experts. An alternative approach is to use deep learning algorithms. Research has shown that modern deep neural networks are very effective at automatically extracting abstract features from raw data in classification tasks. Long short-term memory networks, or LSTMs for short, are a special kind of recurrent neural network capable of learning long-term dependencies. These networks have proved especially effective in the classification of raw time-series data in various domains. This dissertation systematically investigates the effectiveness of the LSTM model for anomaly detection and classification in raw time-series sensor data. As a proof of concept, this work used time-series data from sensors that measure blood glucose levels. A large number of time-series sequences were created based on a genuine medical diabetes dataset. Anomalous series were constructed by six methods that interspersed patterns of common anomaly types in the data. An LSTM network model was trained with k-fold cross-validation on both anomalous and valid series to classify raw time-series sequences into one of seven classes: non-anomalous, and classes corresponding to each of the six anomaly types.
As a control, the detection and classification accuracy of the LSTM was compared to that of four traditional machine learning classifiers: support vector machines, random forests, naive Bayes, and shallow neural networks. The performance of all the classifiers was evaluated on nine metrics: precision, recall, and the F1-score, each measured from the micro, macro, and weighted perspectives. While the traditional models were trained on feature vectors, derived from the raw data, that were based on knowledge of common sources of anomaly, the LSTM was trained on raw time-series data. Experimental results indicate that the performance of the LSTM was comparable to the best traditional classifiers, achieving 99% in all nine metrics. The model requires no labor-intensive feature engineering, and the fine-tuning of its architecture and hyper-parameters can be done in a fully automated way. This study therefore finds LSTM networks an effective solution to anomaly detection and classification in sensor data.
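The abstract names six anomaly-injection methods without detailing them. Two common anomaly patterns, a point spike and a gradual drift, give a flavor of how anomalous series can be constructed from genuine data; both functions are illustrative, not taken from the dissertation:

```python
import random

def inject_spike(series, index=None, magnitude=5.0, rng=None):
    """Point anomaly: one reading jumps far outside the normal range."""
    rng = rng or random.Random(0)
    s = list(series)
    i = index if index is not None else rng.randrange(len(s))
    s[i] += magnitude
    return s, i

def inject_drift(series, start, slope=0.1):
    """Drift anomaly: readings ramp away from the true signal over time."""
    s = list(series)
    for j in range(start, len(s)):
        s[j] += slope * (j - start)
    return s
```

Labeling each injected series with its anomaly type yields exactly the kind of multi-class training set the dissertation describes (one non-anomalous class plus one class per injection method).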

    Detection of water quality failure events at treatment works using a hybrid two-stage method with CUSUM and random forest algorithms

    This is the final version, available from IWA Publishing via the DOI in this record. Data cannot be made publicly available; readers should contact the corresponding author for details. Near-real-time event detection is crucial for water utilities to detect failure events in their water treatment works (WTW) quickly and efficiently. This paper presents a new method for automated, near-real-time recognition of failure events at WTWs through the combined application of statistical process control and machine-learning techniques. The resulting novel hybrid CUSUM event recognition system (HC-ERS) uses two distinct detection methodologies: one for fault detection at the level of individual water quality signals and a second for the recognition of faulty processes at the WTW level. HC-ERS was tested and validated on historical failure events at a real-life UK WTW. The new methodology proved effective in detecting failure events, achieving a high true-detection rate of 82% combined with a low false-alarm rate (on average 0.3 false alarms per week) and reaching a peak F1 score of 84% as a measure of accuracy. The new method also demonstrated higher accuracy than the CANARY detection methodology. When applied to real-world data, the HC-ERS method showed the capability to detect faulty processes at WTWs automatically and reliably, and hence potential for practical application in the water industry. Funded by the Engineering and Physical Sciences Research Council (EPSRC).
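The abstract does not give the CUSUM configuration used in HC-ERS. For the signal-level stage, a standard two-sided tabular CUSUM looks like the following; the `target`, `k` (allowance), and `h` (decision threshold) parameters are illustrative:

```python
def cusum(values, target, k=0.5, h=5.0):
    """Two-sided tabular CUSUM: accumulate deviations above/below a target,
    less an allowance k, and alarm when either sum exceeds threshold h."""
    s_hi = s_lo = 0.0
    alarms = []
    for i, x in enumerate(values):
        s_hi = max(0.0, s_hi + (x - target - k))
        s_lo = max(0.0, s_lo + (target - x - k))
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0  # reset after raising an alarm
    return alarms
```

Because small deviations accumulate, CUSUM catches sustained shifts in a water quality signal that a fixed per-sample threshold would miss; in a hybrid design like HC-ERS, such signal-level alarms would then feed the second, process-level classifier.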

    MSFC Skylab Orbital Workshop, volume 5

    The various programs involved in the development of the Skylab Orbital Workshop are discussed. The subjects considered include the following: (1) reliability program, (2) system safety program, (3) testing program, (4) engineering program management, (5) mission operations support, and (6) aerospace applications.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery, as well as upgraded safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting a model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges like complex latent patterns, concept drift, and overfitting that may mislead a model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into opaque, inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious regions of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use related context to reclassify observations and prune unjustified events after detection.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.
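As a concrete baseline for the prediction-plus-scoring pipeline the thesis describes, a rolling z-score detector flags observations that deviate strongly from recent behavior. This is a minimal stand-in, not the thesis's method; the window size and threshold are illustrative:

```python
from collections import deque

def stream_zscore_alarms(stream, window=30, threshold=3.0):
    """Flag points whose deviation from the rolling mean exceeds
    `threshold` rolling standard deviations of the recent window."""
    buf = deque(maxlen=window)
    alarms = []
    for i, x in enumerate(stream):
        if len(buf) == window:
            mean = sum(buf) / window
            var = sum((v - mean) ** 2 for v in buf) / window
            std = var ** 0.5
            if std > 0 and abs(x - mean) / std > threshold:
                alarms.append(i)
        buf.append(x)  # the new point joins the reference window either way
    return alarms
```

Because the reference window slides, the detector adapts to slow concept drift; the abstract's point is precisely that richer context and post-hoc pruning are needed on top of such a score to keep the false alarm rate low.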

    Smart Urban Water Networks

    This book presents the paper form of the Special Issue (SI) on Smart Urban Water Networks. The number and topics of the papers in the SI confirm the growing interest of operators and researchers in the new paradigm of smart networks, as part of the more general smart city. The SI showed that digital information and communication technology (ICT), with the implementation of smart meters and other digital devices, can significantly improve the modelling and management of urban water networks, contributing to a radical transformation of the traditional paradigm of water utilities. The paper collection in this SI covers crucial topics such as the reliability, resilience, and performance of water networks, innovative demand management, and the novel challenge of real-time control and operation, along with their implications for cyber-security. The SI collected fourteen papers that provide a wide perspective on solutions, trends, and challenges in the context of smart urban water networks. Some solutions have already been implemented in pilot sites (i.e., for water network partitioning, cyber-security, and water demand disaggregation and forecasting), while further investigation is required for other methods, e.g., data-driven approaches for real-time control. In all cases, a new deal between academia, industry, and governments must be embraced to start the new era of smart urban water systems.

    Behavioral modeling for anomaly detection in industrial control systems

    In the 1990s, industry demanded the interconnection of corporate and production networks. Thus, Industrial Control Systems (ICSs) evolved from the proprietary, closed hardware and software of the 1970s to today's Commercial Off-The-Shelf (COTS) devices. Although this transformation carries several advantages, such as simplicity and cost-efficiency, the use of COTS hardware and software introduces multiple Information Technology vulnerabilities. Specially tailored worms like Stuxnet, Duqu, Night Dragon, and Flame showed their potential to damage ICSs and extract information from them. Anomaly Detection Systems (ADSs) are considered suitable security mechanisms for ICSs due to the repetitiveness and static architecture of industrial processes. ADSs base their operation on behavioral models that require attack-free training data or an extensive description of the process for their creation. This thesis proposes a new approach to analyze binary industrial protocol payloads and automatically generate behavioral models synthesized into rules. We also develop a method to generate realistic network traffic in laboratory conditions without the need for a real ICS installation. This contribution establishes the basis of a future ADS and supports experimentation through the recreation of realistic traffic in simulated environments. Furthermore, a new approach to correcting delay and jitter issues is proposed. This proposal improves the quality of time-based ADSs by reducing the false positive rate. We experimentally validate the proposed approaches with several statistical methods and ADS quality measures, and by comparing the results with traffic taken from a real installation. We show that a payload-based ADS is possible without needing to understand the payload data, that the generation of realistic network traffic in laboratory conditions is feasible, and that delay and jitter correction improves the quality of behavioral models.
As a conclusion, the presented approaches provide both an ADS able to work with private industrial protocols and a method to create behavioral models for open ICS protocols that does not require training data.
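The thesis argues that a payload-based ADS is possible without understanding the payload semantics. A toy per-position byte model, purely illustrative and far simpler than the thesis's rule synthesis, conveys the idea: learn which byte values appear at each offset in attack-free traffic, then flag payloads containing values never seen there:

```python
def train_byte_model(payloads):
    """Record, per byte offset, the set of values seen in clean traffic.
    A minimal stand-in for rule synthesis over binary protocol payloads."""
    model = {}
    for p in payloads:
        for pos, byte in enumerate(p):
            model.setdefault(pos, set()).add(byte)
    return model

def score_payload(model, payload):
    """Count offsets whose byte value was never observed during training;
    a nonzero score marks the payload as anomalous."""
    return sum(1 for pos, byte in enumerate(payload)
               if byte not in model.get(pos, set()))
```

The model treats the payload as opaque bytes, so it needs no protocol specification, which is exactly the property claimed for private, undocumented industrial protocols; it does, however, rely on attack-free training traffic, the limitation the thesis's traffic-generation method is meant to address.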