1,139 research outputs found

    Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey

    Get PDF
    This survey aims to deliver an extensive and well-constructed overview of using machine learning for the problem of detecting anomalies in streaming datasets. The objective is to provide the effectiveness of using Hoeffding Trees as a machine learning algorithm solution for the problem of detecting anomalies in streaming cyber datasets. In this survey we categorize the existing research works of Hoeffding Trees which can be feasible for this type of study into the following: surveying distributed Hoeffding Trees, surveying ensembles of Hoeffding Trees and surveying existing techniques using Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper and were selected based on their relation to streaming data and the flexibility of their techniques for use within different domains of streaming data. We discuss the relevance of how combining the techniques of the proposed research works within these compositions can be used to address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem, anomaly detection

    A Hierarchical Temporal Memory Sequence Classifier for Streaming Data

    Get PDF
    Real-world data streams often contain concept drift and noise. Additionally, it is often the case that due to their very nature, these real-world data streams also include temporal dependencies between data. Classifying data streams with one or more of these characteristics is exceptionally challenging. Classification of data within data streams is currently the primary focus of research efforts in many fields (i.e., intrusion detection, data mining, machine learning). Hierarchical Temporal Memory (HTM) is a type of sequence memory that exhibits some of the predictive and anomaly detection properties of the neocortex. HTM algorithms conduct training through exposure to a stream of sensory data and are thus suited for continuous online learning. This research developed an HTM sequence classifier aimed at classifying streaming data, which contained concept drift, noise, and temporal dependencies. The HTM sequence classifier was fed both artificial and real-world data streams and evaluated using the prequential evaluation method. Cost measures for accuracy, CPU-time, and RAM usage were calculated for each data stream and compared against a variety of modern classifiers (e.g., Accuracy Weighted Ensemble, Adaptive Random Forest, Dynamic Weighted Majority, Leverage Bagging, Online Boosting ensemble, and Very Fast Decision Tree). The HTM sequence classifier performed well when the data streams contained concept drift, noise, and temporal dependencies, but was not the most suitable classifier of those compared against when provided data streams did not include temporal dependencies. Finally, this research explored the suitability of the HTM sequence classifier for detecting stalling code within evasive malware. The results were promising as they showed the HTM sequence classifier capable of predicting coding sequences of an executable file by learning the sequence patterns of the x86 EFLAGs register. The HTM classifier plotted these predictions in a cardiogram-like graph for quick analysis by reverse engineers of malware. This research highlights the potential of HTM technology for application in online classification problems and the detection of evasive malware

    A Systematic Review of Learning based Notion Change Acceptance Strategies for Incremental Mining

    Get PDF
    The data generated contemporarily from different communication environments is dynamic in content different from the earlier static data environments. The high speed streams have huge digital data transmitted with rapid context changes unlike static environments where the data is mostly stationery. The process of extracting, classifying, and exploring relevant information from enormous flowing and high speed varying streaming data has several inapplicable issues when static data based strategies are applied. The learning strategies of static data are based on observable and established notion changes for exploring the data whereas in high speed data streams there are no fixed rules or drift strategies existing beforehand and the classification mechanisms have to develop their own learning schemes in terms of the notion changes and Notion Change Acceptance by changing the existing notion, or substituting the existing notion, or creating new notions with evaluation in the classification process in terms of the previous, existing, and the newer incoming notions. The research in this field has devised numerous data stream mining strategies for determining, predicting, and establishing the notion changes in the process of exploring and accurately predicting the next notion change occurrences in Notion Change. In this context of feasible relevant better knowledge discovery in this paper we have given an illustration with nomenclature of various contemporarily affirmed models of benchmark in data stream mining for adapting the Notion Change

    Automating Cyberdeception Evaluation with Deep Learning

    Get PDF
    A machine learning-based methodology is proposed and implemented for conducting evaluations of cyberdeceptive defenses with minimal human involvement. This avoids impediments associated with deceptive research on humans, maximizing the efficacy of automated evaluation before human subjects research must be undertaken. Leveraging recent advances in deep learning, the approach synthesizes realistic, interactive, and adaptive traffic for consumption by target web services. A case study applies the approach to evaluate an intrusion detection system equipped with application-layer embedded deceptive responses to attacks. Results demonstrate that synthesizing adaptive web traffic laced with evasive attacks powered by ensemble learning, online adaptive metric learning, and novel class detection to simulate skillful adversaries constitutes a challenging and aggressive test of cyberdeceptive defenses
    corecore