
    A local feature engineering strategy to improve network anomaly detection

    The dramatic increase in devices and services that has characterized modern societies in recent decades, driven by the exponential growth of ever-faster network connections and the predominant use of wireless technologies, has created a crucial security challenge. Anomaly-based intrusion detection systems, which have long been among the most effective solutions for detecting intrusion attempts on a network, must now face this new and more complicated scenario. Well-known problems, such as the difficulty of distinguishing legitimate activities from illegitimate ones due to their similar characteristics and high degree of heterogeneity, have become even more complex given the increase in network activity. After providing an extensive overview of the scenario under consideration, this work proposes a Local Feature Engineering (LFE) strategy that addresses these problems through a data preprocessing step that reduces the number of possible network event patterns while increasing their characterization. Unlike canonical feature engineering approaches, which take the entire dataset into account, it operates locally in the feature space of each single event. Experiments conducted on real-world data show that this strategy, based on the introduction of new features and the discretization of their values, improves the performance of canonical state-of-the-art solutions.
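
    A minimal sketch of the per-event ("local") feature engineering idea described in the abstract: new summary features are derived from each event's own feature vector, with no dataset-wide fit, and all values are then discretized into a small number of bins. The specific derived features, bin count, and value range below are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative sketch of local feature engineering: each event is expanded
# with features computed only from its own values and then discretized.
import numpy as np

def local_feature_engineering(event, n_bins=10, lo=0.0, hi=1.0):
    """Expand a single event with local summary features and discretize."""
    event = np.asarray(event, dtype=float)
    # New features computed only from this event's values (no global statistics).
    extended = np.concatenate([event, [event.mean(), event.std(), event.min(), event.max()]])
    # Discretize every value into one of `n_bins` fixed-width bins over [lo, hi].
    edges = np.linspace(lo, hi, n_bins + 1)
    return np.clip(np.digitize(extended, edges[1:-1]), 0, n_bins - 1)

# Example: two similar network events with values already scaled to [0, 1].
events = np.array([[0.10, 0.72, 0.30], [0.11, 0.74, 0.31]])
patterns = np.array([local_feature_engineering(e) for e in events])
print(patterns)  # the two similar events map to the same discrete pattern
```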

    Leveraging the Training Data Partitioning to Improve Events Characterization in Intrusion Detection Systems

    The ever-increasing use of services based on computer networks, even in crucial areas unthinkable until a few years ago, has made the security of these networks essential for everyone, especially in light of the increasingly sophisticated techniques and strategies available to attackers. In this context, Intrusion Detection Systems (IDSs) play a primary role, since they are responsible for analyzing and classifying each network activity as legitimate or illegitimate, allowing the necessary countermeasures to be taken at the appropriate time. However, these systems are not infallible for several reasons, the most important of which are the constant evolution of attacks (e.g., zero-day attacks) and the fact that many attacks exhibit behavior similar to that of legitimate activities, making them very hard to identify. This work relies on the hypothesis that subdividing the training data used to define the IDS classification model into a certain number of partitions, in terms of both events and features, can improve the characterization of network events and thus the system performance. The non-overlapping data partitions are used to train independent classification models, and each event is classified according to a majority-voting rule. A series of experiments conducted on a benchmark real-world dataset supports the initial hypothesis, showing a performance improvement with respect to the canonical training approach.
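
    A hedged sketch of the partition-and-vote scheme described above: the training set is split into non-overlapping blocks of events and features, one classifier is trained per block, and test events are labelled by majority vote. The partition counts, the base classifier, and the synthetic data are assumptions chosen for illustration only.

```python
# Train independent models on non-overlapping event/feature partitions and
# classify by majority vote (illustrative sketch, not the paper's setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_partitioned(X, y, n_event_parts=3, n_feature_parts=2, seed=0):
    rng = np.random.default_rng(seed)
    event_parts = np.array_split(rng.permutation(len(X)), n_event_parts)
    feature_parts = np.array_split(rng.permutation(X.shape[1]), n_feature_parts)
    models = []
    for rows in event_parts:
        for cols in feature_parts:
            clf = DecisionTreeClassifier(random_state=0).fit(X[np.ix_(rows, cols)], y[rows])
            models.append((cols, clf))
    return models

def predict_majority(models, X):
    votes = np.stack([clf.predict(X[:, cols]) for cols, clf in models])
    # Majority vote across the independent models (ties broken toward class 0).
    return (votes.mean(axis=0) > 0.5).astype(int)

# Toy usage with random binary-labelled data.
X, y = np.random.rand(300, 10), np.random.randint(0, 2, 300)
models = train_partitioned(X, y)
print(predict_majority(models, X[:5]))
```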

    A Region-based Training Data Segmentation Strategy to Credit Scoring

    Rating users who request financial services is a growing task, especially in the historical period of the COVID-19 pandemic, characterized by a dramatic increase in online activities, mainly related to e-commerce. This kind of assessment, performed manually in the past, today needs to be carried out by automatic credit scoring systems, given the enormous number of requests to process. Such systems therefore play a crucial role for financial operators, as their effectiveness is directly related to monetary gains and losses. Despite the huge financial and human resources devoted to their development, state-of-the-art solutions are affected by well-known problems that make building credit scoring systems a challenging task, mainly the imbalance and heterogeneity of the involved data, compounded by the scarcity of public datasets. The Region-based Training Data Segmentation (RTDS) strategy proposed in this work revolves around a divide-and-conquer approach, where the user classification depends on the results of several sub-classifications. In more detail, the training data are divided into regions bounding different users and features, which are used to train several classification models that lead to the final classification through a majority-voting rule. Such a strategy relies on the consideration that the independent analysis of different users and features can lead to a more accurate classification than that offered by a single evaluation model trained on the entire dataset. The validation process, carried out using three public real-world datasets that differ in number of features, number of samples, and degree of data imbalance, demonstrates the effectiveness of the proposed strategy, which outperforms the canonical training approach on all the datasets.
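
    In the same spirit as the partition-and-vote sketch above, the following fragment illustrates the region-based idea for credit scoring: users and features are split into non-overlapping regions, each region trains its own scorer, and a request is classified by majority vote, alongside a single "canonical" model trained on all data for comparison. The region counts, base model, and synthetic data are illustrative assumptions, not the paper's configuration, and the printed numbers carry no claim about real performance.

```python
# Region-based training sketch: per-region logistic regressions plus a
# majority vote, compared against one model trained on the whole dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 12))                       # 600 users, 12 features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=600) > 0).astype(int)

user_regions = np.array_split(np.arange(600), 3)     # regions over users
feat_regions = np.array_split(np.arange(12), 2)      # regions over features

region_models = [
    (cols, LogisticRegression(max_iter=1000).fit(X[np.ix_(rows, cols)], y[rows]))
    for rows in user_regions for cols in feat_regions
]
canonical = LogisticRegression(max_iter=1000).fit(X, y)

votes = np.stack([m.predict(X[:, cols]) for cols, m in region_models])
rtds_pred = (votes.mean(axis=0) >= 0.5).astype(int)  # majority-voting rule
print("region-based accuracy:", (rtds_pred == y).mean())
print("canonical accuracy:   ", (canonical.predict(X) == y).mean())
```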

    FLAGS: a methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning

    Anomalies and faults can be detected, and their causes verified, using both data-driven and knowledge-driven techniques. Data-driven techniques can adapt their internal functioning based on the raw input data but fail to explain why a detection occurred. Knowledge-driven techniques inherently deliver the cause of the faults that were detected but require too much human effort to set up. In this paper, we introduce FLAGS, the Fused-AI interpretabLe Anomaly Generation System, and combine both techniques in one methodology to overcome their limitations and optimize them based on limited user feedback. Semantic knowledge is incorporated into a machine learning technique to enhance expressivity. At the same time, feedback about the faults and anomalies that occurred is provided as input to increase adaptiveness using semantic rule mining methods. This new methodology is evaluated on a predictive maintenance case for trains. We show that our method reduces train downtime and provides more insight into frequently occurring problems.
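
    A much-simplified sketch of the general fusion idea: a data-driven detector flags anomalous sensor readings, expert rules attach a candidate root cause, and operator feedback is folded back in as a new rule. FLAGS itself relies on semantic (ontology-based) knowledge and semantic rule mining; the dictionary-style rules, thresholds, and synthetic readings below are illustrative assumptions only, not the FLAGS implementation.

```python
# Combine a data-driven anomaly detector with knowledge-driven cause rules,
# then extend the rule base from user feedback (loose illustrative analogy).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
readings = np.concatenate([rng.normal(50, 2, 200), [85.0, 12.0]]).reshape(-1, 1)

detector = IsolationForest(random_state=0).fit(readings)
flags = detector.predict(readings) == -1              # data-driven detection

rules = [("overheating", lambda t: t > 80), ("sensor dropout", lambda t: t < 20)]

def explain(temp):
    for cause, condition in rules:
        if condition(temp):
            return cause                               # knowledge-driven cause
    return "unknown (ask operator)"

for temp in readings[flags, 0]:
    print(f"{temp:5.1f} -> {explain(temp)}")

# Feedback loop: an operator labels an unexplained anomaly, and the new rule
# becomes part of the rule base used by later calls to explain().
rules.append(("loose contact", lambda t: 60 < t <= 80))
```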

    Computational Optimizations for Machine Learning

    The present book contains the 10 articles accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various classes of machine learning, such as supervised, unsupervised and reinforcement learning, as well as deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference learning, deep adversarial networks and more. It is hoped that the book will be interesting and useful both to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning, and to those with the appropriate mathematical background who wish to become familiar with recent advances in the computational optimization mathematics of machine learning, which has nowadays permeated almost all sectors of human life and activity.

    A discretized extended feature space (DEFS) model to improve the anomaly detection performance in network intrusion detection systems

    The unbreakable bond that exists today between devices and network connections makes the security of the latter a crucial element for our society. For this reason, recent decades have witnessed an exponential growth in research efforts aimed at identifying increasingly efficient techniques able to tackle this type of problem, such as Intrusion Detection Systems (IDSs). While an IDS plays a key role, since it is designed to classify network events as normal or intrusion, it has to face several well-known problems that reduce its effectiveness. The most important of these is the high number of false positives related to its inability to detect event patterns that have not occurred in the past (i.e., zero-day attacks). This paper introduces a novel Discretized Extended Feature Space (DEFS) model that offers a twofold advantage: first, through a discretization process it reduces the number of event patterns by grouping those with similar feature values, mitigating the issues related to the classification of unknown events; second, it balances this discretization by extending the event patterns with a series of meta-information that characterizes them well. The approach has been evaluated using a real-world dataset (NSL-KDD) and by adopting both in-sample/out-of-sample and time series cross-validation strategies, in order to prevent the evaluation from being biased by overfitting. The experimental results show that the proposed DEFS model improves classification performance in the most challenging scenarios (unbalanced samples) with respect to canonical state-of-the-art solutions.
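
    The discretization and meta-feature extension of DEFS is analogous in spirit to the LFE sketch earlier in this list; the fragment below instead sketches the evaluation protocol named in the abstract, a hold-out (in-sample/out-of-sample) split plus forward-chaining time series cross-validation, so that each fold trains only on earlier events. The classifier and synthetic data are assumptions; the paper evaluates on NSL-KDD.

```python
# Hold-out split plus forward-chaining time series cross-validation
# (illustrative sketch of the evaluation strategy, not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, train_test_split

rng = np.random.default_rng(1)
X = rng.random((1000, 8))
y = (X[:, 0] > 0.7).astype(int)                      # moderately imbalanced toy labels

# In-sample / out-of-sample hold-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))

# Time series cross-validation: each fold trains only on earlier events.
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    clf = RandomForestClassifier(random_state=1).fit(X[train_idx], y[train_idx])
    print(f"fold {fold}: accuracy = {clf.score(X[test_idx], y[test_idx]):.3f}")
```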

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 41 full papers presented in these proceedings were carefully reviewed and selected from 141 submissions. The set also contains 7 tool papers, 6 tool demo papers, and 9 SV-COMP competition papers. The papers are organized in topical sections as follows. Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-COMP Tool Competition Papers.