
    Efficient Intrusion Detection Model Using Ensemble Methods

    Ensemble methods, or combination models, train multiple learners to solve classification or regression problems. Unlike ordinary learning approaches, which construct a single learner from the training data, they construct a set of learners and combine them. Boosting is one of the most important recent developments in classification methodology. It belongs to a family of algorithms capable of converting a group of weak learners into a strong learner. Boosting works sequentially: each new classifier is trained on the training samples with their updated weights, and the sequence of classifiers is combined by majority voting. The boosting method combines weak models to produce a powerful one and reduces the bias of the combined model. AdaBoost is the most influential boosting algorithm; it efficiently combines weak learners to generate a strong classifier that can classify the training data with better accuracy. AdaBoost differs from other existing boosting methods in detection accuracy, error cost minimization, computational time and detection rate. Detection accuracy and computational cost are the two main metrics used to analyze the performance of the AdaBoost classification algorithm. The simulation results show that AdaBoost achieves high detection accuracy with less computational time and minimal cost compared to a single classifier. We propose a predictive model that classifies traffic into a normal class and an attack class, together with an online inference engine that either allows or denies access to a network.
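The sequential reweighting and weighted voting described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the weak learner (a one-feature threshold "stump"), the toy data set and all function names are assumptions chosen for illustration, and the update rule is the standard discrete AdaBoost formulation for labels in {-1, +1}.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The rule predicts `sign` when the feature is <= thr, else `-sign`.
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples after each weak learner."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # guard against log(0) for a perfect stump
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this learner
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels that no single threshold can separate.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(ensemble, row) for row in X])
```

On this toy set the first stump alone misclassifies two of the six samples, while the three-round ensemble fits all of them, which is the "weak learners to strong learner" effect the abstract refers to.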

    Privacy-Preserving intrusion detection over network data

    Effective protection against cyber-attacks requires constant monitoring and analysis of system data in an IT infrastructure, such as log files and network packets, which may contain sensitive information. To this end, security operation centers (SOCs) are established to detect, analyze, and respond to cyber-security incidents. Security officers at a SOC are not necessarily trusted with handling sensitive and private information, especially when SOC services are outsourced because maintaining in-house cyber-security expertise and capability is expensive. Therefore, an end-to-end security solution is needed for the system data. A SOC often applies detection models, built either for known types of attacks or for anomalies, to the collected data to detect cyber-security incidents. The models are usually constructed, e.g., using machine learning techniques, from historical data that contains records of attacks and of the normal functioning of the monitored IT infrastructure. The SOC is also motivated to keep its models confidential for three reasons: i) to capitalize on the models, which are its proprietary expertise; ii) to protect its detection strategies against adversarial machine learning, in which intelligent and adaptive adversaries carefully manipulate their attack strategy to avoid detection; and iii) because the models might have been trained on sensitive information, so that revealing them can violate certain laws and regulations. Therefore, detection models are also private. In this dissertation, we propose a scenario in which the privacy of both system data and detection models is protected and information leakage is either prevented altogether or quantifiably decreased. Our main approach is to provide end-to-end encryption for system data and detection models using lattice-based cryptography that allows homomorphic operations over the encrypted data. 
Assuming that the detection models have previously been obtained from training data by the SOC, we apply the models to system data homomorphically, with the model encrypted. We use three different machine learning algorithms to extract intrusion models by training on historical data. Using different data sets (two recent ones, and one outdated but widely used in the intrusion detection literature), the performance of each algorithm is evaluated via the following metrics: i) the time it takes to extract the rules, ii) the time it takes to apply the rules to data homomorphically, iii) the accuracy of the rules in detecting intrusions, and iv) the number of rules. Our experiments demonstrate that the proposed privacy-preserving intrusion detection system (IDS) is feasible in terms of execution times and reliable in terms of accuracy.

    Discriminative models for multi-instance problems with tree-structure

    Modeling network traffic is gaining importance as a way to counter modern threats of ever-increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data, because of the variety and complexity of signals that no human can interpret in full. Obtaining training data with a sufficiently large and varied body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are human verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show that the classifier performs with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. In the following, we implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers

    Innovative machine learning techniques for security detection problems

    University of Technology, Sydney. Faculty of Engineering and Information Technology. Most of the currently available network security techniques cannot cope with the dynamic and increasingly complex nature of attacks on distributed computer systems. Therefore, an automated and adaptive defensive tool is imperative for computer networks. Alongside existing techniques for preventing intrusions, such as encryption and firewalls, Intrusion Detection System (IDS) technology has established itself as an emerging field that is able to detect unauthorized access to and abuse of computer systems by both internal users and external offenders. Most of the novel approaches in this field have adopted Artificial Intelligence (AI) technologies such as Artificial Neural Networks (ANN) to improve detection performance. The true power and advantage of ANN lie in its ability to represent both linear and non-linear underlying functions and to learn these functions directly from the data being modeled. However, ANN is computationally expensive due to its demanding processing power, and this leads to the overfitting problem, i.e. the network is unable to extrapolate accurately once the input is outside the training data range. These limitations leave security systems with a low detection rate, a high false alarm rate and excessive computation cost. In this research, a novel Machine Learning (ML) algorithm is developed to alleviate those difficulties of the conventional detection techniques used in available IDS. By implementing Adaptive Boosting and semi-parametric radial-basis-function neural networks, this model aims at minimizing learning bias (how well the model fits the available sample data) and generalization variance (how stable the model is for unseen instances) at an affordable cost of computation. The proposed method is applied to a set of Security Detection Problems which aim to detect security breaches within computer networks. 
In particular, we consider two benchmarking problems: intrusion detection and anti-spam filtering. It is empirically shown that our technique outperforms other state-of-the-art predictive algorithms in both of the problems, with significantly increased detection accuracy, minimal false alarms and relatively low computation

    Machine Learning

    Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience

    Time series classification based on fractal properties

    The article considers the task of classifying fractal time series with meta-algorithms based on decision trees. Binomial multiplicative stochastic cascades are used as input time series. A comparative analysis of classification approaches based on different features is carried out. The results indicate the advantage of machine learning methods over traditional estimation of the degree of self-similarity. Comment: 4 pages, 2 figures, 3 equations, 1 table
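For readers unfamiliar with the input series named in this abstract, the following is a minimal sketch of one common way to generate a conservative binomial multiplicative cascade. The abstract does not state how the weights are drawn, so the symmetric beta distribution, its `alpha` parameter and the function name are assumptions for illustration only.

```python
import random

def binomial_cascade(levels, alpha=2.0, seed=None):
    """Generate 2**levels values by repeatedly splitting mass into two parts.

    At every level each interval's mass m is split into m*w and m*(1-w),
    where w is drawn from Beta(alpha, alpha) (an assumed weight distribution).
    Because the two weights sum to 1, the total mass is conserved.
    """
    rng = random.Random(seed)
    masses = [1.0]                       # unit mass on the whole interval
    for _ in range(levels):
        nxt = []
        for m in masses:
            w = rng.betavariate(alpha, alpha)
            nxt.extend((m * w, m * (1.0 - w)))
        masses = nxt
    return masses

series = binomial_cascade(levels=10, seed=42)
print(len(series), sum(series))
```

Smaller values of `alpha` push the weights towards 0 and 1 and produce a burstier series, so in this sketch `alpha` plays the role of the parameter that controls the fractal properties a classifier would try to recover.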

    Intrusion Detection: Embedded Software Machine Learning and Hardware Rules Based Co-Designs

    Security of innovative technologies in future generation networks, such as Cyber Physical Systems (CPS) and Wi-Fi, has become a critical universal issue for individuals, the economy, enterprises, organizations and governments. The rate of cyber-attacks has increased dramatically, and the tactics used by attackers continue to evolve and have become ingenious. Intrusion detection is one of the solutions against these attacks. One approach to designing an intrusion detection system (IDS) is software-based machine learning. Such an approach can predict and detect threats before they result in major security incidents. However, despite the considerable research on machine learning based designs, there is still a relatively small body of literature concerned with imbalanced class distributions from the intrusion detection system perspective. In addition, it is necessary to have an effective performance metric that can compare multiple multi-class as well as binary-class systems with respect to class distribution. Furthermore, the expected detection techniques must be able to distinguish real attacks from random defects, ingrained defects in the design, misconfigurations of the system devices, system faults, human errors, and software implementation errors. Moreover, a lightweight IDS that is small, real-time, flexible and reconfigurable enough to be used as a permanent element of the system's security infrastructure is essential. The main goal of the current study is to design an effective and accurate intrusion detection framework with a minimal set of features that are more discriminative and representative. Three publicly available datasets representing different networking environments are adopted, which also reflect realistic imbalanced class distributions as well as updated attack patterns. 
The presented intrusion detection framework is composed of three main modules: feature selection and dimensionality reduction, handling imbalanced class distributions, and classification. The feature selection mechanism utilizes searching algorithms and correlation based subset evaluation techniques, whereas the feature dimensionality reduction part utilizes principal component analysis and auto-encoder as an instance of deep learning. Various classifiers, including eight single-learning classifiers, four ensemble classifiers, one stacked classifier, and five imbalanced class handling approaches are evaluated to identify the most efficient and accurate one(s) for the proposed intrusion detection framework. A hardware-based approach to detect malicious behaviors of sensors and actuators embedded in medical devices, in which the safety of the patient is critical and of utmost importance, is additionally proposed. The idea is based on a methodology that transforms a device's behavior rules into a state machine to build a Behavior Specification Rules Monitoring (BSRM) tool for four medical devices. Simulation and synthesis results demonstrate that the BSRM tool can effectively identify the expected normal behavior of the device and detect any deviation from its normal behavior. The performance of the BSRM approach has also been compared with a machine learning based approach for the same problem. The FPGA module of the BSRM can be embedded in medical devices as an IDS and can be further integrated with the machine learning based approach. The reconfigurable nature of the FPGA chip adds an extra advantage to the designed model in which the behavior rules can be easily updated and tailored according to the requirements of the device, patient, treatment algorithm, and/or pervasive healthcare application

    Combining K-Means and XGBoost Models for Anomaly Detection Using Log Datasets

    Abstract: Computing and networking systems traditionally record their activity in log files, which have been used for multiple purposes, such as troubleshooting, accounting, post-incident analysis of security breaches, capacity planning and anomaly detection. In earlier systems those log files were processed manually by system administrators, or with the support of basic applications for filtering, compiling and pre-processing the logs for specific purposes. However, as the volume of these log files continues to grow (more logs per system, more systems per domain), it is becoming increasingly difficult to process those logs using traditional tools, especially for less straightforward purposes such as anomaly detection. On the other hand, as systems continue to become more complex, the potential of using large datasets built from logs of heterogeneous sources to detect anomalies without prior domain knowledge grows. Anomaly detection tools for such scenarios face two challenges: first, devising appropriate data analysis solutions for effectively detecting anomalies from large data sources, possibly without prior domain knowledge; second, adopting data processing platforms able to cope with the large datasets and complex data analysis algorithms required for such purposes. In this paper we address those challenges by proposing an integrated scalable framework that aims at efficiently detecting anomalous events in large amounts of unlabeled log data. Detection is supported by clustering and classification methods that take advantage of parallel computing environments. We validate our approach using the well-known NASA Hypertext Transfer Protocol (HTTP) log datasets. Fourteen features were extracted in order to train a k-means model for separating anomalous and normal events into highly coherent clusters. 
A second model, which uses the XGBoost system implementing a gradient tree boosting algorithm, takes the previously clustered binary data and produces a set of simple interpretable rules. These rules represent the rationale for generalizing its application over a massive number of unseen events in a distributed computing environment. The classified anomaly events produced by our framework can be used, for instance, as candidates for further forensic and compliance auditing analysis in security management.
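The two-stage idea in this abstract (cluster unlabeled events, then learn readable rules from the cluster labels) can be sketched in a few lines. This is a simplified stand-in, not the paper's framework: a single threshold rule replaces XGBoost, the two features and all values are hypothetical rather than the fourteen features the authors extracted, and treating the smaller cluster as anomalous is an assumption.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_two_clusters(points, iters=20):
    """Plain k-means with k=2 and a deterministic start (smallest and largest point)."""
    centers = [list(min(points)), list(max(points))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda k: dist2(p, centers[k])) for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def learn_threshold_rule(rows, labels):
    """Find the rule 'feature j > thr means anomalous' that best matches the labels."""
    best = None  # (accuracy, feature index, threshold)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            acc = sum((r[j] > thr) == bool(l) for r, l in zip(rows, labels)) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, j, thr)
    return best

def is_anomalous(rule, row):
    _, j, thr = rule
    return row[j] > thr

# Hypothetical per-client features: (requests per minute, fraction of error responses).
events = [(10, 0.01), (12, 0.02), (9, 0.0), (11, 0.03),
          (13, 0.02), (10, 0.01), (250, 0.6), (300, 0.7)]

cluster_ids = kmeans_two_clusters(events)
minority = min((0, 1), key=cluster_ids.count)      # assumption: rare cluster = anomalies
anomaly_labels = [1 if c == minority else 0 for c in cluster_ids]
rule = learn_threshold_rule(events, anomaly_labels)
print("anomalous if feature", rule[1], ">", rule[2])
```

The learned rule can then be applied to unseen events without rerunning the clustering, which mirrors the role the interpretable rules play in the abstract.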