Search CORE

3,385 research outputs found

A traffic classification method using machine learning algorithm

Author: Chishti Hamayoun Rauf
Publication venue: University of Bedfordshire
Publication date: 01/01/2013
Field of study

Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

University of Bedfordshire Repository

Maximum Margin Clustering for State Decomposition of Metastable Systems

Author: Allwein
Becker
Berglund
Biancalani
Bowman
Boyd
Chema
Chodera
Chodera
Chodera
Crammer
Daura
Deuflhard
Deuflhard
Elmer
Genova
Glättli
Groningen
Hao Wu
Hastie
Horn
Jain
Keller
Kellogg
Kloeden
Kwak
McGibbon
Mehrmann
Noé
Noé
Noé
Noé
Noé
Nüske
Prinz
Pryor
Pérez-Hernández
Rahimi
Sarich
Schwantes
Shalev-Shwartz
Shao
Sorin
Swope
Vapnik
Wu
Xu
Yao
Zhang
Publication venue
Publication date: 31/12/2014
Field of study

When studying a metastable dynamical system, a prime concern is how to decompose the phase space into a set of metastable states. Unfortunately, the metastable state decomposition based on simulation or experimental data is still a challenge. The most popular and simplest approach is geometric clustering which is developed based on the classical clustering technique. However, the prerequisites of this approach are: (1) data are obtained from simulations or experiments which are in global equilibrium and (2) the coordinate system is appropriately selected. Recently, the kinetic clustering approach based on phase space discretization and transition probability estimation has drawn much attention due to its applicability to more general cases, but the choice of discretization policy is a difficult task. In this paper, a new decomposition method designated as maximum margin metastable clustering is proposed, which converts the problem of metastable state decomposition to a semi-supervised learning problem so that the large margin technique can be utilized to search for the optimal decomposition without phase space discretization. Moreover, several simulation examples are given to illustrate the effectiveness of the proposed method

arXiv.org e-Print Archive

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Nonparametric Hierarchical Clustering of Functional Data

Author: C. Abraham
D.M. Blei
F. Chamroukhi
G. Delaigle
G. Hébrail
J. Rissanen
M. Abramowitz
P. Hansen
R.M. Neal
T. Cover
T. Gasser
X. Nguyen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set

arXiv.org e-Print Archive

Crossref

HAL-Paris1

Unsupervised Discretization by Two-dimensional MDL-based Histogram

Author: Baratchi Mitra
van Leeuwen Matthijs
Yang Lincen
Publication venue
Publication date: 02/06/2020
Field of study

Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which result in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalised maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighbouring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to its closest competitor IPD; and 4) is self-adaptive with regard to both sample size and local density structure of the data despite being parameter-free. Finally, we apply our algorithm to two geographic datasets to demonstrate its real-world potential.Comment: 30 pages, 9 figure

arXiv.org e-Print Archive

Classification methods for Hilbert data based on surrogate density

Author: Bongiorno Enea G.
Goia Aldo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

An unsupervised and a supervised classification approaches for Hilbert random curves are studied. Both rest on the use of a surrogate of the probability density which is defined, in a distribution-free mixture context, from an asymptotic factorization of the small-ball probability. That surrogate density is estimated by a kernel approach from the principal components of the data. The focus is on the illustration of the classification algorithms and the computational implications, with particular attention to the tuning of the parameters involved. Some asymptotic results are sketched. Applications on simulated and real datasets show how the proposed methods work.Comment: 33 pages, 11 figures, 6 table

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale