    Distributed Real-time Anomaly Detection in Networked Industrial Sensing Systems

    Reliable real-time sensing plays a vital role in ensuring the reliability and safety of industrial cyber-physical systems (CPSs) such as wireless sensor and actuator networks. For many reasons, such as harsh industrial environments, fault-prone sensors, or malicious attacks, sensor readings may be abnormal or faulty, which can lead to serious performance degradation or even catastrophic failure. Current anomaly detection approaches are either centralized and complicated or restricted by strict assumptions, which makes them unsuitable for practical large-scale networked industrial sensing systems (NISSs), where sensing devices are connected via digital communications, such as wireless sensor networks or smart grid systems. In this paper, we introduce a fully distributed general anomaly detection (GAD) scheme, which uses graph theory and exploits spatiotemporal correlations of physical processes to carry out real-time anomaly detection for general large-scale NISSs. We formally prove the scalability of our GAD approach and evaluate its performance for two industrial applications: building structure monitoring and smart grids. Extensive trace-driven simulations validate our theoretical analysis and demonstrate that our approach can significantly outperform state-of-the-art approaches in terms of detection accuracy and efficiency.
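The abstract does not spell out the GAD algorithm, so the following is only a minimal sketch of the general idea of using spatial correlation on a sensor graph: each node compares its own reading with the median of its neighbors' readings. The function name `flag_anomalies`, the threshold, and the sample data are assumptions made for illustration, and the temporal part of the correlation is left out.

```python
from statistics import median

def flag_anomalies(readings, neighbors, threshold):
    """Return the ids of nodes whose reading deviates from the median
    of their neighbors' readings by more than `threshold`."""
    flagged = set()
    for node, value in readings.items():
        peer_values = [readings[p] for p in neighbors.get(node, []) if p in readings]
        if not peer_values:
            continue  # isolated node: no spatial evidence to compare against
        if abs(value - median(peer_values)) > threshold:
            flagged.add(node)
    return flagged

# Hypothetical cluster of five temperature sensors that all hear each other;
# node "c" reports a faulty value.
readings = {"a": 20.1, "b": 20.4, "c": 35.0, "d": 20.2, "e": 19.9}
neighbors = {n: [m for m in readings if m != n] for n in readings}

print(flag_anomalies(readings, neighbors, threshold=3.0))
```

The median is used rather than the mean so that a single faulty neighbor does not drag the reference value; with very small neighborhoods (two neighbors, say) that protection disappears, which is one reason real schemes combine spatial and temporal evidence.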

    Network anomaly detection research: a survey

    Data analysis to identify attacks and anomalies is a crucial task in anomaly detection, and network anomaly detection is itself an important issue in network security. Researchers have developed methods and algorithms to improve anomaly detection systems, and survey papers on anomaly detection research are already available. Nevertheless, this paper attempts to analyze further and to provide an alternative taxonomy of anomaly detection research, focusing on methods, types of anomalies, data repositories, outlier identity, and the most used data type. In addition, this paper summarizes information on the application network categories of the existing studies.

    Lightweight Anomaly Detection Scheme Using Incremental Principal Component Analysis and Support Vector Machine

    Wireless Sensor Networks (WSNs) have been the focus of significant attention from research and development due to their applications in collecting data from various fields such as smart cities, power grids, transportation systems, medical sectors, military, and rural areas. Accurate and reliable measurements for insightful data analysis and decision-making are the ultimate goals of sensor networks for critical domains. However, the raw data collected by WSNs are usually unreliable and inaccurate due to the imperfect nature of WSNs. Identifying misbehaviours or anomalies in the network is important for providing reliable and secure functioning of the network. However, due to resource constraints, a lightweight detection scheme is a major design challenge in sensor networks. This paper aims at designing and developing a lightweight anomaly detection scheme that improves efficiency by reducing computational complexity and communication overhead and improving memory utilization while maintaining high accuracy. To achieve this aim, one-class learning and dimension reduction concepts were used in the design. The One-Class Support Vector Machine (OCSVM) with hyper-ellipsoid variance was used for anomaly detection due to its advantage in classifying unlabelled and multivariate data. Various OCSVM formulations were investigated, and the Centred-Ellipsoid formulation was adopted in this study because it was the most effective among the formulations studied. To decrease the computational complexity and improve memory utilization, the dimensions of the data were reduced using the Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA) algorithm. Extensive experiments were conducted to evaluate the proposed lightweight anomaly detection scheme. Results in terms of detection accuracy, memory utilization, computational complexity, and communication overhead show that the proposed scheme is effective and efficient compared with the few existing schemes evaluated. The proposed anomaly detection scheme achieved accuracy higher than 98%, with O(nd) memory utilization and no communication overhead.
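To make the dimension-reduction step more concrete, here is a minimal sketch of the commonly published CCIPCA update for the first principal direction only, assuming zero-mean samples. It is not the paper's implementation: the function name, the `amnesic` parameter default, and the synthetic data are assumptions, and the OCSVM stage is not shown.

```python
import math

def ccipca_first_component(samples, amnesic=0.0):
    """Estimate the first principal direction of zero-mean samples one
    sample at a time, without ever forming a covariance matrix."""
    v = None
    for n, u in enumerate(samples, start=1):
        if v is None:
            v = list(u)  # initialise the estimate with the first sample
            continue
        norm = math.sqrt(sum(x * x for x in v))
        proj = sum(ui * vi for ui, vi in zip(u, v)) / norm  # u^T v / ||v||
        w_old = (n - 1 - amnesic) / n   # weight of the previous estimate
        w_new = (1 + amnesic) / n       # weight of the new sample's contribution
        v = [w_old * vi + w_new * proj * ui for vi, ui in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Synthetic zero-mean 2-D data stretched along the (1, 1) direction,
# with a small alternating perturbation across it.
samples = []
for k in range(-50, 51):
    if k == 0:
        continue
    t = k / 10
    e = 0.1 if k % 2 else -0.1
    samples.append((t + e, t - e))

direction = ccipca_first_component(samples)
print(direction)
```

Only the current estimate `v` is stored, so memory grows with the data dimension rather than with the number of samples, which is the property that makes incremental PCA attractive on constrained sensor nodes.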

    Outlier detection in wireless sensor network based on time series approach

    Sensory data in a Wireless Sensor Network (WSN) are not always reliable because of open environmental factors such as noise, weak received signal strength, or intrusion attacks. The process of detecting highly noisy data and noisy sensor nodes is called outlier detection. Outlier detection is one of the fundamental tasks of time series analysis and relates to predictive modeling, cluster analysis, and association analysis. It has been widely researched in various disciplines besides WSNs. The challenge of noise detection in a WSN arises when it has to be done inside a sensor with limited computational and communication capabilities. Furthermore, there are only a few outlier detection techniques for WSNs, and there are no algorithms that both detect outliers on real data locally with a high level of accuracy and select the most effective neighbors for collaborative detection globally. Hence, this research designed local and global time series outlier detection for WSNs. The Local Outlier Detection Algorithm (LODA) was developed as a decentralized noise detection algorithm that runs on each sensor node; it identifies intrinsic features, determines the memory size of the data histogram to make effective use of available memory, and performs classification to predict outlier data. Next, the Global Outlier Detection Algorithm (GODA) was developed using adaptive Gray Coding and Entropy techniques to select the best neighbors for spatial correlation amongst sensor nodes. In addition, GODA adopts the Adaptive Random Forest algorithm for best results. Finally, this research developed a Compromised Sensor Node Detection Algorithm (CSDA), a centralized algorithm processed at the base station that detects compromised sensor nodes regardless of the specific cause of the anomalies. To measure the effectiveness and accuracy of these algorithms, a comprehensive scenario was simulated. Noisy data were injected randomly into the data and the sensor nodes. The results showed that LODA achieved 89% accuracy in predicting outliers, GODA detected anomalies with up to 99% accuracy, and CSDA accurately identified up to 80% of the compromised sensor nodes. In conclusion, the proposed algorithms proved effective for local and global anomaly detection and for compromised sensor node detection in WSNs.

    Performance Evaluation of Network Anomaly Detection Systems

    Nowadays, there is a huge and growing concern about security in information and communication technology (ICT) among the scientific community because any attack or anomaly in the network can greatly affect many domains such as national security, private data storage, social welfare, economic issues, and so on. Therefore, the anomaly detection domain is a broad research area, and many different techniques and approaches for this purpose have emerged through the years. Attacks, problems, and internal failures, when not detected early, may badly harm an entire network system. Thus, this thesis presents an autonomous profile-based anomaly detection system based on the statistical method Principal Component Analysis (PCADS-AD). This approach creates a network profile called Digital Signature of Network Segment using Flow Analysis (DSNSF) that denotes the predicted normal behavior of network traffic activity through historical data analysis. That digital signature is used as a threshold for volume anomaly detection to detect disparities in the normal traffic trend. The proposed system uses seven traffic flow attributes: Bits, Packets, and Number of Flows to detect problems, and Source and Destination IP addresses and Ports to provide the network administrator with the information needed to solve them. Through evaluation techniques, the addition of a different anomaly detection approach, and comparisons with other methods performed in this thesis using real network traffic data, the results showed good traffic prediction by the DSNSF and encouraging false alarm generation and detection accuracy for the detection scheme. The observed results seek to contribute to the advance of the state of the art in methods and strategies for anomaly detection that aim to surpass some challenges that emerge from the constant growth in complexity, speed, and size of today's large-scale networks, also providing high-value results for better detection in real time.
    Currently, there is an enormous and growing concern about security in information and communication technology (ICT) among the scientific community. This is because any attack or anomaly in the network can affect quality, interoperability, availability, and integrity in many domains, such as national security, private data storage, social welfare, economic issues, and so on. Therefore, anomaly detection is a broad research area, and many different techniques and approaches for this purpose have emerged over the years. Attacks, problems, and internal failures, when not detected early, can seriously harm an entire network system. Thus, this thesis presents an autonomous profile-based anomaly detection system using the statistical method Principal Component Analysis (PCADS-AD). This approach creates a network profile called Digital Signature of Network Segment using Flow Analysis (DSNSF), which denotes the predicted normal behavior of network traffic activity through the analysis of historical data. This digital signature is used as a threshold for volume anomaly detection and to identify disparities in the normal traffic trend. The proposed system uses seven traffic flow attributes: bits, packets, and number of flows to detect problems, plus source and destination IP addresses and ports to give the network administrator the information needed to solve them. Through the use of evaluation metrics, the addition of a detection approach distinct from the main proposal, and comparisons with other methods carried out in this thesis using real network traffic data, the results showed good traffic predictions by the DSNSF and encouraging results regarding false alarm generation and detection accuracy. With the results observed in this thesis, this doctoral work seeks to contribute to advancing the state of the art in anomaly detection methods and strategies, aiming to overcome some challenges that emerge from the constant growth in complexity, speed, and size of today's large-scale networks, while also delivering high performance. In addition, the low complexity and agility of the proposed system allow it to be applied to real-time detection.
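The idea of using a traffic profile as a threshold for volume anomalies can be sketched as follows. This is a simplified stand-in, not the thesis's method: the profile here is a plain per-interval average of past days rather than a PCA-derived DSNSF, and the function names, the `tolerance` parameter, and the traffic figures are all hypothetical.

```python
def build_profile(history):
    """Average several days of per-interval traffic volumes into one
    expected-volume profile (a simple stand-in for a DSNSF)."""
    days = len(history)
    return [sum(day[i] for day in history) / days for i in range(len(history[0]))]

def volume_anomalies(observed, profile, tolerance=0.5):
    """Return the indices of intervals whose volume differs from the profile
    by more than `tolerance`, expressed as a fraction of the expected volume."""
    return [i for i, (obs, exp) in enumerate(zip(observed, profile))
            if abs(obs - exp) > tolerance * exp]

# Hypothetical packet counts for five intervals on three past days.
history = [
    [100, 120, 300, 280, 150],
    [110, 130, 310, 270, 140],
    [90, 110, 290, 290, 160],
]
profile = build_profile(history)

today = [105, 125, 900, 275, 20]
print(volume_anomalies(today, profile))  # → [2, 4]
```

Both a surge (interval 2) and a drop (interval 4) are flagged, since either kind of disparity from the normal trend may indicate a problem. In the system described above, attributes such as IP addresses and ports would then be reported for the flagged intervals to help the administrator diagnose them.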

    Machine Learning for Internet of Things Data Analysis: A Survey

    Rapid developments in hardware, software, and communication technologies have facilitated the emergence of Internet-connected sensory devices that provide observations and data measurements from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As these numbers grow and technologies become more mature, the volume of data being published will increase. The technology of Internet-connected devices, referred to as Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interactions between the physical and cyber worlds. In addition to an increased volume, the IoT generates big data characterized by its velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this big data are the key to developing smart IoT applications. This article assesses the various machine learning methods that deal with the challenges presented by IoT data by considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying a Support Vector Machine (SVM) to Aarhus smart city traffic data is presented for a more detailed exploration

    Distributed Anomaly Detection Using Minimum Volume Elliptical Principal Component Analysis

    Principal component analysis combined with the residual error is an effective anomaly detection technique. In an environment where anomalies are present in the training set, the derived principal components can be skewed by the anomalies. A further aspect of anomaly detection is that data might be distributed across different nodes in a network and their communication to a centralized processing unit is prohibited due to communication cost. Current solutions to distributed anomaly detection rely on a hierarchical network infrastructure to aggregate data or models; however, in this environment, links close to the root of the tree become critical and congested. In this paper, an algorithm is proposed that is more robust in its derivation of the principal components of a training set containing anomalies. A distributed form of the algorithm is then derived where each node in a network can iterate towards the centralized solution by exchanging small matrices with neighboring nodes. Experimental evaluations on both synthetic and real-world data sets demonstrate the superior performance of the proposed approach in comparison to principal component analysis and alternative anomaly detection techniques. In addition, it is shown that in a variety of network infrastructures, the distributed form of the anomaly detection model is able to derive a close approximation of the centralized model.
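As a point of reference for the baseline this abstract builds on, the sketch below shows plain (non-robust, centralized) PCA with the residual error used as an anomaly score, for 2-D data only. It does not implement the minimum volume elliptical variant or the distributed exchange of matrices; the function names, the power-iteration approach, and the data are assumptions for illustration.

```python
import math

def top_component(points, iterations=100):
    """Mean and leading principal direction of 2-D points, found by
    power iteration on the sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    v = (1.0, 0.5)  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return (mx, my), v

def residual_error(point, mean, direction):
    """Distance from `point` to the line through `mean` along `direction`,
    i.e. the part of the point the principal component cannot explain."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    along = dx * direction[0] + dy * direction[1]
    return math.hypot(dx - along * direction[0], dy - along * direction[1])

# Hypothetical training data lying close to the line y = 2x.
train = [(float(t), 2.0 * t + (0.1 if t % 2 else -0.1)) for t in range(10)]
mean, direction = top_component(train)

print(residual_error((4.0, 8.0), mean, direction))   # close to the line: small score
print(residual_error((4.0, 14.0), mean, direction))  # far from the line: large score
```

If `train` itself contained a few points like `(4.0, 14.0)`, the estimated direction would tilt towards them, which is the skew the abstract's robust estimator is designed to resist.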

    Improved hybrid teaching learning based optimization-jaya and support vector machine for intrusion detection systems

    Most existing intrusion detection systems (IDSs) use machine learning algorithms to detect network intrusion, and such algorithms have recently been widely adopted to enhance IDS performance. While the effectiveness of some machine learning algorithms in detecting certain types of network intrusion has been ascertained, no single method currently achieves consistent results when employed for the detection of multiple attack types. Hence, the detection of network attacks on computer systems has remained a relevant field of research for some time. The support vector machine (SVM) is one of the most powerful machine learning algorithms, with excellent learning performance characteristics. However, SVM suffers from many problems that affect its performance, such as high rates of false positive alerts and low detection rates for rare but dangerous attacks; feature selection and parameter optimization are important operations needed to increase the performance of SVM. The aim of this work is to develop an improved optimization method for IDSs that is efficient and effective in subset feature selection and parameter optimization. To achieve this goal, an improved Teaching Learning-Based Optimization (ITLBO) algorithm was proposed for subset feature selection. Meanwhile, an improved parallel Jaya (IPJAYA) algorithm was proposed for searching for the best values of the SVM parameters (C, gamma). Hence, a hybrid classifier called ITLBO-IPJAYA-SVM was developed in this work to improve the efficiency of network intrusion detection on data sets that contain multiple types of attacks. The performance of the proposed approach was evaluated on the NSL-KDD and CICIDS intrusion detection datasets, and the results showed that it performed excellently in the processing of large datasets. The results also showed that the SVM optimization algorithm achieved accuracy values of 0.9823 for the NSL-KDD dataset and 0.9817 for the CICIDS dataset, which were higher than the accuracy of most existing paradigms for classifying network intrusion detection datasets. In conclusion, this work has presented an improved optimization algorithm that can improve the accuracy of IDSs in the detection of various types of network attack.

    Distributed anomaly detection models for industrial wireless sensor networks

    Wireless Sensor Networks (WSNs) are firmly established as an integral technology that enables automation and control through pervasive monitoring for many industrial applications. These range from environmental and healthcare applications to major industrial monitoring applications such as infrastructure and structural monitoring. The key features common to such applications are large amounts of data, dynamic observation environments, non-homogeneous data distributions with evolving patterns, and sensing functionality leading to data-driven control. In addition, a major requirement in most industrial applications is near real-time decision support. Accordingly, there is a vital need for a secure, continuous, and reliable sensing mechanism in integrated WSNs where the integrity of the data is assured. However, in practice WSNs are vulnerable to different security attacks, faults, and malfunctions due to inherent resource constraints, the openly commoditised wireless technologies employed, and naive modes of implementation. Misbehaviour resulting from such threats manifests as anomalies in the sensed data streams, critically compromising the systems. Therefore, it is vital that effective techniques are introduced to accurately detect anomalies and assure the integrity of the data. This research focuses on investigating such models for large-scale industrial wireless sensor networks. Focusing on achieving an anomaly detection framework that is adaptable and scalable, a hierarchical data partitioning approach with fuzzy data modelling is introduced first. In this model, unsupervised data partitioning is performed in a distributed manner by adapting fuzzy c-means clustering in an incremental model over a hierarchical node topology. It is found that non-parametric and non-probabilistic determination of anomalies can be done by evaluating the fuzzy membership scores and inter-cluster distances adaptively over the node hierarchy. Considering heterogeneous data distributions with evolving patterns, a granular anomaly detection model that uses an entropy criterion to dynamically partition the data is proposed next. This successfully overcomes the issue of determining the proper number of expected clusters in a dynamic manner. In this approach, the data is partitioned into different cohesive regions using cumulative point-wise entropy directly. The effect of differential density distributions when relying on an entropy criterion is mitigated by introducing an average relative density measure to segregate isolated outliers prior to the partitioning. The combination of these two factors is shown to be significantly successful in determining anomalies adaptively in a fully dynamic manner. The need for near real-time anomaly evaluation is addressed next in this thesis. Building upon the entropy-based data partitioning model, a Point-of-View (PoV) entropy evaluation model is developed. This employs an incremental data processing model as opposed to batch-wise data processing. Three unique points of view are introduced as the reference points over which point-wise entropy is computed to evaluate its relative change as the data streams evolve. Overall, this thesis proposes efficient unsupervised anomaly detection models that employ distributed in-network data processing for accurate determination of anomalies. The resource-constrained environment is taken into account in each of the models, with innovations made to achieve non-parametric and non-probabilistic detection.
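A rough sketch of how fuzzy membership scores and inter-cluster distances might be combined to flag anomalies is given below. The membership formula is the standard fuzzy c-means one; everything else (1-D data, fixed cluster centers, the function names, and the two thresholds in `is_anomalous`) is an assumption for illustration and not the thesis's incremental, hierarchical model.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of a 1-D point in each cluster:
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))."""
    dists = [abs(point - c) for c in centers]
    hits = dists.count(0.0)
    if hits:  # point coincides with one or more centers
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]

def is_anomalous(point, centers, min_membership=0.8, distance_ratio=0.5):
    """Hypothetical rule (needs at least two centers): flag a point that
    belongs clearly to no cluster, or that lies further from its nearest
    center than a fraction of the smallest inter-cluster distance."""
    memberships = fcm_memberships(point, centers)
    nearest = min(abs(point - c) for c in centers)
    gap = min(abs(a - b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return max(memberships) < min_membership or nearest > distance_ratio * gap

centers = [10.0, 20.0]  # assumed to come from a previous clustering pass
for reading in (10.5, 15.0, 0.0, 19.0):
    print(reading, is_anomalous(reading, centers))
```

The distance check is there because membership alone can be misleading: a reading of 0.0 is far from both clusters yet still has a fairly high membership in the nearer one, since memberships only express relative closeness.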