
    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering, inter alia, rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining, and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual, temporally distributed events within a multiple-data-stream environment is explored, and a range of techniques is examined, covering model-based approaches, `programmed' AI, and machine-learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the inability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
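    The report itself contains no code; the following is a minimal sketch of the rule-based event-correlation idea it describes: a rule fires when an ordered pattern of event kinds occurs across data streams within a time window. All names, event kinds, and thresholds here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    stream: str      # which data stream the event came from
    kind: str        # event type, e.g. "login_fail"
    timestamp: float

@dataclass
class Rule:
    name: str
    pattern: list    # ordered event kinds that constitute the misuse signature
    window: float    # max seconds between first and last event of the pattern

def correlate(events, rules):
    """Report every rule whose event pattern occurs, in order, within its time window."""
    events = sorted(events, key=lambda e: e.timestamp)
    alerts = []
    for rule in rules:
        idx, start = 0, None
        for ev in events:
            if ev.kind == rule.pattern[idx]:
                if idx == 0:
                    start = ev.timestamp
                if ev.timestamp - start <= rule.window:
                    idx += 1
                    if idx == len(rule.pattern):
                        alerts.append(rule.name)
                        break
                else:
                    idx, start = 0, None   # window exceeded: restart the match
    return alerts

# Hypothetical usage: three failed logins followed by a call reroute within 60 s
rules = [Rule("toll-fraud", ["login_fail", "login_fail", "login_fail", "reroute"], 60.0)]
log = [Event("switch-1", kind, t) for t, kind in
       [(0.0, "login_fail"), (5.0, "login_fail"), (9.0, "login_fail"), (20.0, "reroute")]]
print(correlate(log, rules))  # -> ['toll-fraud']
```

    The sketch also makes the report's main criticism concrete: the rule can only match the signature it encodes, so any unseen misuse pattern passes silently (a false negative).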

    NeuDetect: A neural network data mining system for wireless network intrusion detection

    This thesis proposes an Intrusion Detection System, NeuDetect, which applies a neural network technique to wireless network packets captured through hardware sensors for real-time detection of anomalous packets. To address the problem of the high false alarm rate confronting current wireless intrusion detection systems, the thesis presents a method of applying artificial neural networks to wireless network intrusion detection. The proposed approach finds normal and anomalous patterns in preprocessed wireless packet records by comparing them with training data using the back-propagation algorithm. An anomaly score is assigned to each packet by calculating the difference between the output error and a threshold. If the anomaly score is positive, the wireless packet is flagged as anomalous; if it is negative, the packet is flagged as normal. If the anomaly score is zero or close to zero, the packet is flagged as an unknown attack and is sent back to the training process for re-evaluation.
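    A minimal sketch of the decision rule stated above, assuming the network has already been trained by back-propagation; the threshold, the "close to zero" band, and the example error values are illustrative assumptions, not values from the thesis.

```python
import numpy as np

THRESHOLD = 0.15   # hypothetical error threshold learned from training data
EPSILON = 0.02     # hypothetical band around zero treated as "unknown"

def classify_packet(output_error: float) -> str:
    """Apply the scoring rule: anomaly score = output error minus threshold."""
    score = output_error - THRESHOLD
    if abs(score) <= EPSILON:
        return "unknown"   # sent back to the training process for re-evaluation
    return "anomalous" if score > 0 else "normal"

# Hypothetical per-packet output errors from the trained network
errors = np.array([0.03, 0.16, 0.40])
print([classify_packet(e) for e in errors])  # ['normal', 'unknown', 'anomalous']
```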

    Statistical anomaly denial of service and reconnaissance intrusion detection

    This dissertation presents the architecture, methods, and results of the Hierarchical Intrusion Detection Engine (HIDE) and the Reconnaissance Intrusion Detection System (RIDS); the former is a denial-of-service (DoS) attack detector, while the latter is a scan-and-probe (P&S) reconnaissance detector; both are statistical anomaly systems. HIDE is a packet-oriented, observation-window-based, hierarchical, multi-tier, anomaly-based network intrusion detection system that monitors several network traffic parameters simultaneously, constructs a 64-bin probability density function (PDF) for each, statistically compares it to a reference PDF of normal behavior using a similarity metric, and then combines the results into an anomaly status vector that is classified by a neural network. Three data sets were used to test the performance of HIDE: OPNET simulation data, the DARPA'98 intrusion detection evaluation data, and the CONEX TESTBED attack data. The results showed that HIDE can reliably detect DoS attacks with high accuracy and very low false alarm rates on all data sets. In particular, the investigation using the DARPA'98 data set yielded an overall misclassification rate of 0.13%, a false negative rate of 1.42%, and a false positive rate of 0.090%; the latter implies only about 2.6 false alarms per day. RIDS is a session-oriented statistical tool that relies on training to model the parameters of its algorithms and is capable of detecting even distributed stealthy reconnaissance attacks. It consists of two main functional modules, or stages: the Reconnaissance Activity Profiler (RAP) and the Reconnaissance Alert Correlater (RAC). The RAP is a session-oriented module capable of detecting stealthy scanning and probing attacks, while the RAC is an alert-correlation module that fuses RAP alerts into attack scenarios and discovers distributed stealthy attack scenarios. RIDS was evaluated against two data sets: (a) the DARPA'98 data, and (b) three weeks of experimental data generated using the CONEX TESTBED network. RIDS achieved remarkable success; the false positive, false negative, and misclassification rates found are low, less than 0.1%, for most reconnaissance attacks; they rise to about 6% for distributed, highly stealthy attacks, the most challenging type of attack, which had been difficult to detect effectively until now.
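    A sketch of HIDE's core comparison step under stated assumptions: the dissertation specifies a 64-bin PDF per traffic parameter and a similarity metric against a reference PDF, but the particular metric below (the Bhattacharyya coefficient) and the synthetic traffic numbers are illustrative choices, not taken from the text.

```python
import numpy as np

BINS = 64  # HIDE builds a 64-bin PDF per monitored traffic parameter

def empirical_pdf(samples: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Histogram of one traffic parameter over an observation window, normalised to a PDF."""
    counts, _ = np.histogram(samples, bins=BINS, range=(lo, hi))
    return counts / max(counts.sum(), 1)

def similarity(p: np.ndarray, q: np.ndarray) -> float:
    """Bhattacharyya coefficient: 1.0 for identical PDFs, 0.0 for disjoint ones."""
    return float(np.sum(np.sqrt(p * q)))

# Hypothetical example: compare the current packet-rate PDF against the normal reference
reference = empirical_pdf(np.random.normal(500, 50, 10_000), 0, 1000)
observed = empirical_pdf(np.random.normal(800, 60, 1_000), 0, 1000)  # flood-like shift
anomaly_status = 1.0 - similarity(reference, observed)  # one entry of the status vector
print(round(anomaly_status, 3))  # near 1.0 => highly anomalous window
```

    In the full system, one such entry is computed per monitored parameter, and the resulting anomaly status vector is handed to the neural network classifier.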

    Oil and Gas flow Anomaly Detection on offshore naturally flowing wells using Deep Neural Networks

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

    The Oil and Gas industry faces multiple challenges as never before. It is criticised for being dirty and polluting, and hence demand for green alternatives grows. Nevertheless, the world still relies heavily on hydrocarbons, the most traditional and stable source of energy, as opposed to the extensively promoted hydro, solar, or wind power. Major operators are challenged to produce oil more efficiently, to counteract newly arising energy sources, with less of a climate footprint and more closely scrutinised expenditure, while facing high scepticism regarding the industry's future. It has to become greener, and hence to act in a manner not required previously. While most of the tools used by the hydrocarbon E&P industry are expensive and have been in use for many years, it is paramount for the industry's survival and prosperity to apply predictive maintenance technologies that foresee potential failures, making production safer, lowering downtime, increasing productivity, and diminishing maintenance costs. Many efforts have been applied to defining the most accurate and effective predictive methods; however, data scarcity limits the speed and capacity for further experimentation. Since it would be highly beneficial for the industry to invest in Artificial Intelligence, this research explores, in depth, the subject of anomaly detection, using the open public data from Petrobras, which was curated by experts. For this research, deep neural networks, namely Recurrent Neural Networks with LSTM and GRU backbones, were implemented for multi-class classification of undesirable events on naturally flowing wells. Further, several hyperparameter optimisation tools were explored, mainly focusing on Genetic Algorithms as among the most advanced methods for such tasks. The research concluded with a best-performing algorithm of 2 stacked GRUs and the hyperparameter vector [1, 47, 40, 14], which stands for a timestep of 1, 47 hidden units, 40 epochs, and a batch size of 14, producing an F1 score of 0.97. As the world faces many issues, one of which is the detrimental effect of heavy industry on the environment and the resulting adverse global climate change, this project is an attempt to contribute to the field of applying Artificial Intelligence in the Oil and Gas industry, with the intention of making it more efficient, transparent, and sustainable.
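    A minimal Keras sketch of the best-performing configuration reported above: two stacked GRU layers with 47 hidden units, trained for 40 epochs with batch size 14 and a timestep of 1. The feature and class counts are assumptions for illustration, not figures from the dissertation.

```python
import tensorflow as tf

TIMESTEPS, N_FEATURES, N_CLASSES = 1, 8, 9  # feature and class counts are assumptions

model = tf.keras.Sequential([
    tf.keras.layers.GRU(47, return_sequences=True,
                        input_shape=(TIMESTEPS, N_FEATURES)),  # first stacked GRU
    tf.keras.layers.GRU(47),                                   # second stacked GRU
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),    # one class per event type
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=40, batch_size=14)  # epochs/batch from the thesis
```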

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated, with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists from identifying the research issues in ocean science while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey summarising existing STDM studies for the ocean. Concretely, we first summarise the widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in the ocean.

    Application of Hierarchical Temporal Memory to Anomaly Detection of Vital Signs for Ambient Assisted Living

    This thesis presents the development of a framework for anomaly detection of vital signs in an Ambient Assisted Living (AAL) health monitoring scenario. It is driven by spatiotemporal reasoning over vital signs, which Cortical Learning Algorithms (CLA) based on Hierarchical Temporal Memory (HTM) theory undertake in an AAL health monitoring scenario to detect anomalous data points preceding cardiac arrest. The thesis begins with a literature review of the existing Ambient Intelligence (AmI) paradigm, AAL technologies, and anomaly detection algorithms used in health monitoring. The research revealed the significance of temporal and spatial reasoning in vital signs monitoring, as the spatiotemporal patterns of vital signs provide a basis for detecting irregularities in the health status of elderly people. HTM theory has yet to be adequately deployed in an AAL health monitoring scenario; hence HTM theory and the network and core operations of the CLA are explored. Because the standard implementation of HTM theory comprises a single-level hierarchy, multiple vital signs, and specifically the correlations between them, are not sufficiently considered. This insufficiency is particularly significant given that vital signs are correlated in time and space, which health monitoring applications exploit for diagnosis and prognosis tasks. This research proposes a novel framework consisting of multi-level HTM networks. The lower level consists of four models allocated to the four vital signs, Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), Heart Rate (HR), and peripheral capillary oxygen saturation (SpO2), in order to learn the spatiotemporal patterns of each vital sign. Additionally, a higher level is introduced to learn the spatiotemporal patterns of the anomalous data points detected from the four vital signs. The proposed hierarchical organisation improves the model's performance by using a semantically richer representation of the sensed data, because patterns learned at each level of the hierarchy are reused when combined in novel ways at higher levels. To investigate and evaluate the performance of the proposed framework, several data selection techniques are studied, and accordingly, a total of 247 elderly patients' records are extracted from the MIMIC-III clinical database. The performance of the proposed framework is evaluated and compared against several state-of-the-art anomaly detection algorithms using both online and traditional metrics. The proposed framework achieved an 83% NAB score, which outperforms the HTM and k-NN algorithms by 15%, the HBOS and INFLO SVD by 16%, and the k-NN PCA by 21%, while the SVM scored 34%. The results show that multiple HTM networks can achieve better performance when dealing with multi-dimensional data, i.e., data collected from more than one source/sensor.
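    A structural sketch only of the two-level arrangement described above. `HTMModel` is a hypothetical stand-in for a real HTM/CLA implementation (e.g., one built on an HTM library), and the way the lower-level scores are combined before being fed to the higher level is an assumption, not the thesis's method.

```python
class HTMModel:
    """Placeholder for an HTM/CLA network; a real one would come from an HTM library."""
    def detect(self, value: float) -> float:
        """Return an anomaly likelihood in [0, 1] for one reading."""
        raise NotImplementedError

VITAL_SIGNS = ["SBP", "DBP", "HR", "SpO2"]

class TwoLevelFramework:
    def __init__(self, make_model):
        self.lower = {vs: make_model() for vs in VITAL_SIGNS}  # one model per vital sign
        self.upper = make_model()  # learns spatiotemporal patterns of the anomalies

    def step(self, reading: dict) -> float:
        # Level 1: per-sign spatiotemporal anomaly likelihoods
        scores = [self.lower[vs].detect(reading[vs]) for vs in VITAL_SIGNS]
        # Level 2: the combined lower-level output becomes a new input stream, so
        # correlations between vital signs are learned at the higher level
        # (averaging the scores is an illustrative assumption)
        return self.upper.detect(sum(scores) / len(scores))
```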

    Detecting Anomalies From Big Data System Logs

    Nowadays, big data systems (e.g., Hadoop and Spark) are widely adopted in many domains, such as manufacturing, healthcare, education, and media, for delivering effective data solutions. A common problem in big data systems is the anomaly, e.g., a status deviating from normal execution, which degrades computation performance or kills running programs. Detecting anomalies and analysing their causes is becoming a necessity, and an effective and economical approach is to analyse system logs. Big data systems produce numerous unstructured logs that contain buried valuable information; however, manually detecting anomalies in system logs is a tedious and daunting task. This dissertation proposes four approaches that can accurately and automatically analyse anomalies from big data system logs without extra monitoring overhead. Moreover, to detect abnormal tasks in Spark logs and analyse root causes, we design a utility to conduct fault injection and collect logs from multiple compute nodes. (1) Our first method is a statistical approach that can locate abnormal tasks and calculate the weights of factors for analysing root causes. In the experiments, four potential root causes are considered, i.e., CPU, memory, network, and disk I/O. The experimental results show that the proposed approach is accurate both in detecting abnormal tasks and in finding the root causes. (2) To give a more reasonable probability result and avoid ad hoc calculation of factor weights, we propose a neural network approach to analyse the root causes of abnormal tasks. We leverage a General Regression Neural Network (GRNN) to identify root causes for abnormal tasks; the likelihood of each reported root cause is presented to users according to the factor weights from the GRNN. (3) To further improve anomaly detection by avoiding manual feature extraction, we propose a novel approach leveraging Convolutional Neural Networks (CNNs). The proposed model can automatically learn event relationships in system logs and detect anomalies with high accuracy. The deep neural network consists of logkey2vec embeddings, three 1D convolutional layers, a dropout layer, and max pooling. In our experiments, this CNN-based approach achieves better accuracy than approaches using Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) networks at detecting anomalies in Hadoop Distributed File System (HDFS) logs. (4) To analyse system logs more accurately, we extend the CNN-based approach with two attention schemes, which focus on different features of the CNN's output, to detect anomalies in system logs. We evaluate our approaches on several benchmarks, and the attention-based CNN model shows the best performance among all state-of-the-art methods.
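    A hedged Keras sketch of approach (3) as named above: logkey2vec embeddings, three 1D convolutional layers, dropout, and max pooling. The vocabulary size, sequence length, filter counts, and the parallel arrangement of the three convolutions are illustrative assumptions, not the dissertation's exact architecture.

```python
import tensorflow as tf

VOCAB, SEQ_LEN, EMB_DIM = 29, 50, 8  # assumed: HDFS has a small log-key vocabulary

inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")       # sequence of log-key ids
x = tf.keras.layers.Embedding(VOCAB, EMB_DIM)(inputs)          # logkey2vec-style embedding
pooled = []
for k in (3, 4, 5):                                            # three 1D conv layers
    c = tf.keras.layers.Conv1D(32, k, activation="relu")(x)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))     # max pooling
x = tf.keras.layers.Concatenate()(pooled)
x = tf.keras.layers.Dropout(0.5)(x)                            # dropout layer
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)    # normal vs. anomalous

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```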

    Deep graph learning for anomalous citation detection

    Anomaly detection is one of the most active research areas in various critical domains, such as healthcare, fintech, and public security. However, little attention has been paid to scholarly data, that is, to anomaly detection in citation networks. Citation is considered one of the most crucial metrics for evaluating the impact of scientific research, and it can be gamed in multiple ways; anomaly detection in citation networks is therefore of significant importance for identifying the manipulation and inflation of citations. To address this open issue, we propose a novel deep graph learning model, graph learning for anomaly detection (GLAD), to identify anomalies in citation networks. GLAD incorporates text semantic mining into network representation learning by adding both node attributes and link attributes via graph neural networks (GNNs). It exploits not only the relevance of citation contents but also hidden relationships between papers. Within the GLAD framework, we propose an algorithm called Citation PUrpose (CPU) to discover the purpose of a citation based on its citation context. The performance of GLAD is validated on a simulated anomalous citation dataset, and the experimental results demonstrate the effectiveness of GLAD on the anomalous citation detection task.
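    This is not the authors' released code, but a minimal PyTorch Geometric sketch of the general idea: a GNN over a citation graph whose node features come from paper text and whose edge weights carry a citation-context attribute (standing in for a CPU-style relevance signal). All dimensions, names, and the toy graph are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class CitationAnomalyGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.score = torch.nn.Linear(hidden, 1)  # per-paper anomaly score

    def forward(self, x, edge_index, edge_weight):
        # Two rounds of message passing; edge_weight carries the link attribute
        h = F.relu(self.conv1(x, edge_index, edge_weight))
        h = F.relu(self.conv2(h, edge_index, edge_weight))
        return torch.sigmoid(self.score(h)).squeeze(-1)

# Hypothetical toy graph: 4 papers, 3 citations with context-derived weights
x = torch.randn(4, 128)                            # text embeddings per paper
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])  # citing -> cited
edge_weight = torch.tensor([0.9, 0.1, 0.8])        # assumed relevance of each citation
model = CitationAnomalyGNN(in_dim=128)
print(model(x, edge_index, edge_weight))           # anomaly score per paper
```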