3,385 research outputs found

    A traffic classification method using machine learning algorithm

    Get PDF
    Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

    Maximum Margin Clustering for State Decomposition of Metastable Systems

    Full text link
    When studying a metastable dynamical system, a prime concern is how to decompose the phase space into a set of metastable states. Unfortunately, the metastable state decomposition based on simulation or experimental data is still a challenge. The most popular and simplest approach is geometric clustering which is developed based on the classical clustering technique. However, the prerequisites of this approach are: (1) data are obtained from simulations or experiments which are in global equilibrium and (2) the coordinate system is appropriately selected. Recently, the kinetic clustering approach based on phase space discretization and transition probability estimation has drawn much attention due to its applicability to more general cases, but the choice of discretization policy is a difficult task. In this paper, a new decomposition method designated as maximum margin metastable clustering is proposed, which converts the problem of metastable state decomposition to a semi-supervised learning problem so that the large margin technique can be utilized to search for the optimal decomposition without phase space discretization. Moreover, several simulation examples are given to illustrate the effectiveness of the proposed method

    Nonparametric Hierarchical Clustering of Functional Data

    Full text link
    In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set

    Unsupervised Discretization by Two-dimensional MDL-based Histogram

    Full text link
    Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which result in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalised maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighbouring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to its closest competitor IPD; and 4) is self-adaptive with regard to both sample size and local density structure of the data despite being parameter-free. Finally, we apply our algorithm to two geographic datasets to demonstrate its real-world potential.Comment: 30 pages, 9 figure

    Classification methods for Hilbert data based on surrogate density

    Get PDF
    An unsupervised and a supervised classification approaches for Hilbert random curves are studied. Both rest on the use of a surrogate of the probability density which is defined, in a distribution-free mixture context, from an asymptotic factorization of the small-ball probability. That surrogate density is estimated by a kernel approach from the principal components of the data. The focus is on the illustration of the classification algorithms and the computational implications, with particular attention to the tuning of the parameters involved. Some asymptotic results are sketched. Applications on simulated and real datasets show how the proposed methods work.Comment: 33 pages, 11 figures, 6 table
    • …
    corecore