8 research outputs found

    Multi-Model Network Intrusion Detection System Using Distributed Feature Extraction and Supervised Learning

    Get PDF
    Intrusion Detection Systems (IDSs) monitor network traffic and system activities to identify any unauthorized or malicious behaviors. These systems usually leverage the principles of data science and machine learning to detect any deviations from normalcy by learning from the data associated with normal and abnormal patterns. The IDSs continue to suffer from issues like distributed high-dimensional data, inadequate robustness, slow detection, and high false-positive rates (FPRs). We investigate these challenges, determine suitable strategies, and propose relevant solutions based on the appropriate mathematical and computational concepts. To handle high-dimensional data in a distributed network, we optimize the feature space in a distributed manner using the PCA-based feature extraction method. The experimental results display that the classifiers built upon the features so extracted perform well by giving a similar level of accuracy as given by the ones that use the centrally extracted features. This method also significantly reduces the cumulative time needed for extraction. By utilizing the extracted features, we construct a distributed probabilistic classifier based on Naïve Bayes. Each node counts the local frequencies and passes those on to the central coordinator. The central coordinator accumulates the local frequencies and computes the global frequencies, which are used by the nodes to compute the required prior probabilities to perform classifications. Each node, being evenly trained, is capable of detecting intrusions individually to improve the overall robustness of the system. We also propose a similarity measure-based classification (SMC) technique that works by computing the cosine similarities between the class-specific frequential weights of the values in an observed instance and the average frequency-based data centroid. An instance is classified into the class whose weights for the values in it share the highest level of similarity with the centroid. SMC contributes alongside Naïve Bayes in a multi-model classification approach, which we introduce to reduce the FPR and improve the detection accuracy. This approach utilizes the similarities associated with each class label determined by SMC and the probabilities associated with each class label determined by Naïve Bayes. The similarities and probabilities are aggregated, separately, to form new features that are used to train and validate a tertiary classifier. We demonstrate that such a multi-model approach can attain a higher level of accuracy compared with the single-model classification techniques. The contributions made by this dissertation to enhance the scalability, robustness, and accuracy can help improve the efficacy of IDSs

    Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

    Full text link
    We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and they only differ by the random seeds used in the initialization. We show that ensemble/knowledge distillation in Deep Learning works very differently from traditional learning theory (such as boosting or NTKs, neural tangent kernels). To properly understand them, we develop a theory showing that when data has a structure we refer to as ``multi-view'', then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model by training a single model to match the output of the ensemble instead of the true label. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and how the ``dark knowledge'' is hidden in the outputs of the ensemble and can be used in distillation. In the end, we prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.Comment: v2/V3 polishes writin

    Development of unsupervised learning methods with applications to life sciences data

    Get PDF
    Machine Learning makes computers capable of performing tasks typically requiring human intelligence. A domain where it is having a considerable impact is the life sciences, allowing to devise new biological analysis protocols, develop patients’ treatments efficiently and faster, and reduce healthcare costs. This Thesis work presents new Machine Learning methods and pipelines for the life sciences focusing on the unsupervised field. At a methodological level, two methods are presented. The first is an “Ab Initio Local Principal Path” and it is a revised and improved version of a pre-existing algorithm in the manifold learning realm. The second contribution is an improvement over the Import Vector Domain Description (one-class learning) through the Kullback-Leibler divergence. It hybridizes kernel methods to Deep Learning obtaining a scalable solution, an improved probabilistic model, and state-of-the-art performances. Both methods are tested through several experiments, with a central focus on their relevance in life sciences. Results show that they improve the performances achieved by their previous versions. At the applicative level, two pipelines are presented. The first one is for the analysis of RNA-Seq datasets, both transcriptomic and single-cell data, and is aimed at identifying genes that may be involved in biological processes (e.g., the transition of tissues from normal to cancer). In this project, an R package is released on CRAN to make the pipeline accessible to the bioinformatic Community through high-level APIs. The second pipeline is in the drug discovery domain and is useful for identifying druggable pockets, namely regions of a protein with a high probability of accepting a small molecule (a drug). Both these pipelines achieve remarkable results. Lastly, a detour application is developed to identify the strengths/limitations of the “Principal Path” algorithm by analyzing Convolutional Neural Networks induced vector spaces. This application is conducted in the music and visual arts domains

    Model selection and the vectorial misspecification-resistant information criterion in multivariate time series

    Get PDF
    The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a criterion introduced recently that solves Akaike’s original research problem posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such a widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties; and the cases of small samples, misspecification, further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for: univariate models (AR, ARMA) in the time and frequency domain, parametric multivariate (VARMA, VAR); nonparametric nonlinear (NAR); and high-dimensional models. The MRIC answers Akaike’s original question on efficient criteria, for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME), for Least Squares multi-step prediction with univariate regressor. Chapter 5 extends the VMRIC to the general multiple regressor case, by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency. The chapter concludes with a digression on the conditions for PM VARX models
    corecore