469 research outputs found

    A new data stream mining algorithm for interestingness-rich association rules

    Frequent itemset mining and association rule generation are challenging tasks over data streams. Although various algorithms have been proposed to address them, frequency alone does not determine the interestingness of the mined itemsets, and hence of the association rules. This has motivated algorithms that mine association rules based on utility, i.e., the proficiency of the mined rules. However, few algorithms in the literature deal with utility, as most focus on reducing the complexity of frequent-itemset and association-rule mining. Moreover, those few algorithms consider only the overall utility of the association rules, not the consistency of the rules across a defined number of periods. To address this issue, this paper proposes an enhanced association rule mining algorithm. The algorithm introduces a new weightage validation into conventional association rule mining to validate the utility of the mined rules and its consistency. Utility is validated by an integrated calculation of the cost/price efficiency of the itemsets and their frequency. Consistency is validated at every defined number of windows using the probability distribution function, assuming the weights are normally distributed. The resulting rules are thus frequent and utility-efficient, and their interestingness is distributed throughout the entire time period. The algorithm is implemented, and the resulting rules are compared against those obtained from conventional mining algorithms
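As a rough illustration of the two validations the abstract describes, the sketch below scores an itemset by frequency times unit utility and checks the consistency of those scores across windows under the stated normality assumption. All names, thresholds, and numbers here are illustrative, not the paper's actual formulation.

```python
import statistics
from math import erf, sqrt

def weighted_utility(frequency, unit_utility):
    """Hypothetical integrated score: itemset frequency times its
    cost/price efficiency (unit utility)."""
    return frequency * unit_utility

def is_consistent(window_weights, tail=0.05):
    """Check an itemset's weights across windows, assuming the weights
    are normally distributed (as the abstract states): every window's
    weight must avoid both extreme tails of the fitted normal."""
    mu = statistics.mean(window_weights)
    sigma = statistics.pstdev(window_weights) or 1e-9  # avoid divide-by-zero

    def cdf(x):  # standard normal CDF via the error function
        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

    return all(tail < cdf(w) < 1 - tail for w in window_weights)

# Per-window (frequency, unit_utility) observations for one itemset.
weights = [weighted_utility(f, u) for f, u in [(10, 2.5), (12, 2.4), (11, 2.6)]]
print(is_consistent(weights))
```

A rule would then be reported only if it is frequent, utility-efficient, and passes this consistency check in every evaluation period.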

    Design Tools for Dynamic, Data-Driven, Stream Mining Systems

    The proliferation of sensing devices and cost- and energy-efficient embedded processors has contributed to an increasing interest in adaptive stream mining (ASM) systems. In this class of signal processing systems, knowledge is extracted from data streams in real time as the data arrives, rather than in a store-now, process-later fashion. The evolution of machine learning methods in many application areas has contributed to demands for efficient and accurate information extraction from streams of data arriving at distributed, mobile, and heterogeneous processing nodes. To enhance accuracy and meet the stringent constraints under which they must be deployed, it is important for ASM systems to be effective in adapting knowledge extraction approaches and processing configurations based on data characteristics and operational conditions. In this thesis, we address these challenges in the design and implementation of ASM systems. We develop systematic methods and supporting design tools for ASM systems that integrate (1) foundations of dataflow modeling for high-level signal processing system design, and (2) the paradigm of Dynamic Data-Driven Application Systems (DDDAS). More specifically, the contributions of this thesis can be broadly categorized into three major directions: 1. We develop a new design framework that systematically applies dataflow methodologies for high-level signal processing system design and adaptive stream mining based on dynamic topologies of classifiers. In particular, we introduce a new design environment, called the lightweight dataflow for dynamic data driven application systems environment (LiD4E). LiD4E provides formal semantics, rooted in dataflow principles, for the design and implementation of a broad class of stream mining topologies. Using this novel application of dataflow methods, LiD4E facilitates the efficient and reliable mapping and adaptation of classifier topologies into implementations on embedded platforms. 2. We introduce new design methods for data-driven digital signal processing (DSP) systems that are targeted to resource- and energy-constrained embedded environments, such as unmanned aerial vehicles (UAVs), mobile communication platforms, and wireless sensor networks. We develop a design and implementation framework for multi-mode, data-driven embedded signal processing systems, where application modes with complementary trade-offs are selected, configured, executed, and switched dynamically, in a data-driven manner. We demonstrate the utility of our proposed design methods on an energy-constrained, multi-mode face detection application. 3. We introduce new methods for multiobjective, system-level optimization that have been incorporated into the LiD4E design tool described previously. More specifically, we develop new methods for integrated modeling and optimization of real-time stream mining constraints, multidimensional stream mining performance (e.g., precision and recall), and energy efficiency. Using a design methodology centered on data-driven control of, and coordination between, alternative dataflow subsystems for stream mining (classification modes), we develop systematic methods for exploring the complex, multidimensional design spaces associated with dynamic stream mining systems and deriving sets of Pareto-optimal system configurations that can be switched among based on data characteristics and operating constraints
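The Pareto-optimal configuration sets mentioned in the third contribution can be sketched in a few lines: keep only the configurations that no other configuration beats on every objective. The mode names and objective values below are hypothetical, not results from the thesis.

```python
def dominates(a, b):
    """Config a dominates b if it is no worse in every objective and
    strictly better in at least one (all objectives maximized here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(configs):
    """Keep only configurations not dominated by any other."""
    return [c for c in configs
            if not any(dominates(o["obj"], c["obj"]) for o in configs)]

# Hypothetical classification modes scored on (precision, recall, energy efficiency).
modes = [
    {"name": "fast",     "obj": (0.80, 0.70, 0.95)},
    {"name": "accurate", "obj": (0.95, 0.90, 0.60)},
    {"name": "balanced", "obj": (0.90, 0.85, 0.80)},
    {"name": "legacy",   "obj": (0.78, 0.65, 0.90)},  # worse than "fast" everywhere
]
print([m["name"] for m in pareto_front(modes)])  # "legacy" is dominated and dropped
```

A run-time controller could then switch among the surviving configurations as data characteristics and operating constraints change.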

    Graph-based feature enrichment for online intrusion detection in virtual networks

    The increasing number of connected devices providing the required ubiquitousness of the Internet of Things paves the way for distributed network attacks at an unprecedented scale. Graph theory, strengthened by machine learning techniques, improves the automatic discovery of group behavior patterns of network threats that traditional security systems often miss. Furthermore, Network Function Virtualization is an emergent technology that accelerates the provisioning of on-demand security function chains tailored to an application. Therefore, repeatable compliance tests and performance comparisons of such function chains are mandatory. The contributions of this dissertation are divided into two parts. First, we propose an intrusion detection system for online threat detection enriched by a graph-learning analysis. We develop a feature enrichment algorithm that infers metrics from a graph analysis. Using different machine learning techniques, we evaluated our algorithm on three network traffic datasets. We show that the proposed graph-based enrichment improves threat detection accuracy by up to 15.7% and significantly reduces the false-positive rate. Second, we aim to evaluate intrusion detection systems deployed as virtual network functions. To this end, we propose and develop SFCPerf, a framework for automatic performance evaluation of service function chaining. To demonstrate SFCPerf's functionality, we design and implement a prototype of a security service function chain, composed of our intrusion detection system and a firewall. We show the results of an SFCPerf experiment that evaluates the chain prototype on top of the Open Platform for Network Function Virtualization (OPNFV)
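The graph-based feature enrichment idea can be sketched minimally: build a traffic graph from flow records, compute a per-node metric, and append it to each flow's feature vector. The dissertation's actual metrics and datasets are richer; the use of node degree and the function names below are illustrative assumptions.

```python
from collections import defaultdict

def enrich_flows(flows):
    """Enrich (src, dst) flow records with a graph metric per endpoint.
    Here the metric is simply the node degree in the traffic graph."""
    degree = defaultdict(int)
    for src, dst in flows:
        degree[src] += 1
        degree[dst] += 1
    # Enriched feature vector: original endpoints plus each endpoint's degree.
    return [(src, dst, degree[src], degree[dst]) for src, dst in flows]

flows = [("10.0.0.1", "10.0.0.9"), ("10.0.0.2", "10.0.0.9"), ("10.0.0.3", "10.0.0.9")]
for row in enrich_flows(flows):
    print(row)
```

A classifier trained on the enriched vectors can then exploit group behavior (here, many sources converging on one destination) that per-flow features alone would miss.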

    Cloud intrusion detection systems: fuzzy logic and classifications

    Cloud Computing (CC), as defined by the National Institute of Standards and Technology (NIST), is a new technology model for enabling convenient, on-demand network access to a shared pool of configurable computing resources, such as networks, servers, storage, applications, and services, that can be rapidly provisioned and released with minimal management effort or service-provider interaction. CC is a fast-growing field; yet there are major concerns regarding the detection of security threats, which have urged experts to explore solutions to improve its security performance through conventional approaches such as Intrusion Detection Systems (IDS). Two of the most successful and most widely used IDS tools in the literature are Snort and Suricata; however, these tools are not flexible with respect to the uncertainty of intrusions. The aim of this study is to explore novel approaches that improve CC security performance by combining a Type-1 fuzzy logic (T1FL) technique with an IDS, compared to an IDS alone. All experiments in this thesis were performed within a virtual cloud built in an experimental environment. By combining the fuzzy logic technique (FL System) with the IDSs SnortIDS and SuricataIDS, each detection system was used twice (with and without FL) to create four detection systems (FL-SnortIDS, FL-SuricataIDS, SnortIDS, and SuricataIDS), evaluated on the Intrusion Detection Evaluation Dataset (ISCX). ISCX comprises two types of traffic (normal and threats); the latter was classified into four classes: Denial of Service, User-to-Root, Root-to-Local, and Probing. Sensitivity, specificity, accuracy, false alarms, and detection rate were compared among the four detection systems. A Fuzzy Intrusion Detection System model in CC (FIDSCC) was then designed based on the results of these four detection systems. 
    The FIDSCC model comprises two individual systems: pre- and post-threat detection systems (pre-TDS and post-TDS). The pre-TDS was designed based on the number of threats in the aforementioned classes to assess the detection rate (DR). Based on this DR and the false positives of the four detection systems, the post-TDS was designed to assess CC security performance. To assure the validity of the results, classifier algorithms (CAs) were applied to each of the four detection systems and four threat classes for further comparison. The classifier algorithms were OneR, Naive Bayes, Decision Tree (DT), and K-nearest neighbour. The comparison was based on specific measures including accuracy, incorrectly classified instances, mean absolute error, false-positive rate, precision, recall, and ROC area. The empirical results showed that FL-SnortIDS was superior to FL-SuricataIDS, SnortIDS, and SuricataIDS in terms of sensitivity, while no significant difference was found in specificity, false alarms, or accuracy among the four detection systems. Furthermore, among the four CAs, the combination of FL-SnortIDS and DT was the best detection method. These results show that the FIDSCC model can provide a better alternative for detecting threats and reducing false-positive rates than other conventional approaches
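The core of a Type-1 fuzzy layer on top of an IDS can be sketched with triangular membership functions that map a crisp alert score onto overlapping linguistic categories. The set boundaries and category names below are illustrative assumptions, not the thesis's actual rule base.

```python
def tri(x, a, b, c):
    """Triangular type-1 membership function: rises from a to a peak at b,
    falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_threat_level(alert_score):
    """Map a raw IDS alert score in [0, 10] onto fuzzy threat categories.
    Overlapping sets let one alert be partly 'medium' and partly 'high',
    which is how fuzzy logic handles the uncertainty of intrusions."""
    return {
        "low":    tri(alert_score, -1, 0, 5),
        "medium": tri(alert_score, 2, 5, 8),
        "high":   tri(alert_score, 5, 10, 11),
    }

print(fuzzy_threat_level(6.5))
```

A downstream rule base would then combine these memberships (e.g., with max-min inference) before defuzzifying to a final verdict, rather than thresholding the raw score directly.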

    Big-Data Streaming Applications Scheduling Based on Staged Multi-armed Bandits

    Several techniques have recently been proposed to adapt Big-Data streaming applications to existing many-core platforms. Among these techniques, online reinforcement learning methods learn how to adapt, at run-time, the throughput and the resources allocated to the various streaming tasks, depending on dynamically changing data stream characteristics and the desired application performance (e.g., accuracy). However, most state-of-the-art techniques consider only a single input stream in their application model and assume that the system knows how many resources to allocate to each task to achieve a desired performance. To address these limitations, this paper proposes a new systematic and efficient methodology, with associated algorithms, for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many-core systems with resource constraints. We formalize multi-stream scheduling as a staged decision problem in which the performance obtained for the various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques that we refer to as staged multi-armed bandits (S-MAB). Our scheduler learns online which processing method to assign to each stream and how to allocate its resources over time in order to maximize performance on the fly, at run-time, without access to any offline information. Applied to a face detection streaming application without any offline information, the proposed scheduler achieves performance similar to an optimal semi-online solution with full knowledge of the input stream: the differences in throughput, observed quality, resource usage, and energy efficiency are less than 1%, 0.3%, 0.2%, and 4%, respectively
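A single stage of such a bandit-based scheduler can be illustrated with a minimal UCB1 learner choosing among candidate processing methods for one stream. The paper's S-MAB chains decisions across stages; this sketch shows only one stage, and the reward values are hypothetical.

```python
import math
import random

class UCB1:
    """Minimal UCB1 bandit: balances trying each processing method
    (exploration) against reusing the best one seen so far (exploitation)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.t = 0

    def select(self):
        self.t += 1
        for arm in range(len(self.counts)):  # play every arm once first
            if self.counts[arm] == 0:
                return arm
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

random.seed(0)
bandit = UCB1(n_arms=3)            # three candidate processing methods
true_reward = [0.2, 0.5, 0.8]      # hypothetical per-method success rates
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, random.random() < true_reward[arm])

print(bandit.counts)  # the best method accumulates most of the pulls
```

In the full scheduler, the "reward" would fold in observed quality, throughput, and energy, and a second stage would pick the resource allocation for the chosen method.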

    Real-time big data processing for anomaly detection : a survey

    The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting anomalies in networks are not effective enough, particularly in real time. This inefficacy is mainly due to the massive volumes of data amassed through connected devices. It is therefore crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper addresses the issue of detecting anomalies in real time by surveying the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins by explaining essential contexts and a taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Ltd
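To make the real-time requirement concrete, the sketch below shows the simplest possible streaming detector: a sliding-window z-score test that flags each arriving value as it is observed. It is a generic illustration of one-pass, bounded-memory detection, not a method drawn from any specific surveyed system; window size and threshold are arbitrary.

```python
from collections import deque
import statistics

class ZScoreDetector:
    """Streaming anomaly detector over a fixed-size sliding window:
    constant memory, one pass, a decision per arriving value."""
    def __init__(self, window=50, threshold=3.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, x):
        is_anomaly = False
        if len(self.buf) >= 10:  # warm up before scoring
            mu = statistics.mean(self.buf)
            sigma = statistics.stdev(self.buf) or 1e-9
            is_anomaly = abs(x - mu) / sigma > self.threshold
        self.buf.append(x)
        return is_anomaly

det = ZScoreDetector()
stream = [10 + 0.1 * (i % 5) for i in range(30)] + [200.0]  # spike at the end
flags = [det.observe(x) for x in stream]
print(flags[-1])
```

Real-time big data frameworks distribute exactly this kind of per-record logic across partitions of the stream, which is why bounded state and single-pass operation are the recurring constraints in the surveyed systems.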
