3,385 research outputs found
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in IT industry, this idea has been developed to design
a Traffic Classification Method using Data Mining techniques at the intersection of Machine
Learning Algorithm, Which will classify the normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by IT industry. The notion of traffic classification
is not a new concept; plenty of work has been done to classify the network traffic for
heterogeneous application nowadays. Existing techniques such as (payload based, port based
and statistical based) have their own pros and cons which will be discussed in this
literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now
Maximum Margin Clustering for State Decomposition of Metastable Systems
When studying a metastable dynamical system, a prime concern is how to
decompose the phase space into a set of metastable states. Unfortunately, the
metastable state decomposition based on simulation or experimental data is
still a challenge. The most popular and simplest approach is geometric
clustering which is developed based on the classical clustering technique.
However, the prerequisites of this approach are: (1) data are obtained from
simulations or experiments which are in global equilibrium and (2) the
coordinate system is appropriately selected. Recently, the kinetic clustering
approach based on phase space discretization and transition probability
estimation has drawn much attention due to its applicability to more general
cases, but the choice of discretization policy is a difficult task. In this
paper, a new decomposition method designated as maximum margin metastable
clustering is proposed, which converts the problem of metastable state
decomposition to a semi-supervised learning problem so that the large margin
technique can be utilized to search for the optimal decomposition without phase
space discretization. Moreover, several simulation examples are given to
illustrate the effectiveness of the proposed method
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curves clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists in
optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly this measure is none other than the
sum of the Kullback-Leibler divergences between clusters distributions before
and after the merges. The practical interest of the approach for functional
data exploratory analysis is presented and compared with an alternative
approach on an artificial and a real world data set
Unsupervised Discretization by Two-dimensional MDL-based Histogram
Unsupervised discretization is a crucial step in many knowledge discovery
tasks. The state-of-the-art method for one-dimensional data infers locally
adaptive histograms using the minimum description length (MDL) principle, but
the multi-dimensional case is far less studied: current methods consider the
dimensions one at a time (if not independently), which result in
discretizations based on rectangular cells of adaptive size. Unfortunately,
this approach is unable to adequately characterize dependencies among
dimensions and/or results in discretizations consisting of more cells (or bins)
than is desirable. To address this problem, we propose an expressive model
class that allows for far more flexible partitions of two-dimensional data. We
extend the state of the art for the one-dimensional case to obtain a model
selection problem based on the normalised maximum likelihood, a form of refined
MDL. As the flexibility of our model class comes at the cost of a vast search
space, we introduce a heuristic algorithm, named PALM, which partitions each
dimension alternately and then merges neighbouring regions, all using the MDL
principle. Experiments on synthetic data show that PALM 1) accurately reveals
ground truth partitions that are within the model class (i.e., the search
space), given a large enough sample size; 2) approximates well a wide range of
partitions outside the model class; 3) converges, in contrast to its closest
competitor IPD; and 4) is self-adaptive with regard to both sample size and
local density structure of the data despite being parameter-free. Finally, we
apply our algorithm to two geographic datasets to demonstrate its real-world
potential.Comment: 30 pages, 9 figure
Classification methods for Hilbert data based on surrogate density
An unsupervised and a supervised classification approaches for Hilbert random
curves are studied. Both rest on the use of a surrogate of the probability
density which is defined, in a distribution-free mixture context, from an
asymptotic factorization of the small-ball probability. That surrogate density
is estimated by a kernel approach from the principal components of the data.
The focus is on the illustration of the classification algorithms and the
computational implications, with particular attention to the tuning of the
parameters involved. Some asymptotic results are sketched. Applications on
simulated and real datasets show how the proposed methods work.Comment: 33 pages, 11 figures, 6 table
- …