Enhance density peak clustering algorithm for anomaly intrusion detection system
This paper proposes a new variant of the Density Peak Clustering (DPC) algorithm to improve the clustering of intrusion attacks. An Anomaly Intrusion Detection System (AIDS) built on the original density peak clustering algorithm gives results stable enough to be applied in the data-mining module of an intrusion detection system. The proposed system has two objectives. The first is to analyze the disadvantages of DPC and address them: we propose a novel improvement of the DPC algorithm that computes the local density from cosine similarity instead of the cut-off distance parameter, to improve the selection of peak points. The second is to use a Gaussian kernel measure as the distance metric instead of Euclidean distance, to improve the clustering of high-dimensional, complex, nonlinearly inseparable network traffic data and to reduce noise. The experiments were evaluated on the NSL-KDD dataset
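The two modifications described in this abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the local density is taken as the sum of cosine similarities to all other points, and the Gaussian kernel measure is taken as the kernel-induced distance sqrt(2 - 2k(x, y)). The function names and the `sigma` parameter are hypothetical, and this is not the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def gaussian_kernel_distance(a, b, sigma=1.0):
    """Distance induced by a Gaussian kernel: sqrt(2 - 2 * k(a, b)).

    Assumed form; the abstract only says a Gaussian kernel measure
    replaces Euclidean distance.
    """
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    k = math.exp(-sq / (2.0 * sigma ** 2))
    return math.sqrt(max(0.0, 2.0 - 2.0 * k))

def dpc_scores(points, sigma=1.0):
    """Return (rho, delta) for each point, as in density peak clustering.

    rho[i]   : assumed local density, the sum of cosine similarities
               between point i and every other point (no cut-off distance).
    delta[i] : smallest kernel distance to any point of higher density;
               for a point with no denser neighbour, the largest distance.
    """
    n = len(points)
    rho = [
        sum(cosine_similarity(points[i], points[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        to_denser = [
            gaussian_kernel_distance(points[i], points[j], sigma)
            for j in range(n)
            if rho[j] > rho[i]
        ]
        if to_denser:
            delta.append(min(to_denser))
        else:
            delta.append(max(
                (gaussian_kernel_distance(points[i], points[j], sigma)
                 for j in range(n) if j != i),
                default=0.0,
            ))
    return rho, delta

if __name__ == "__main__":
    pts = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 2.0], [1.0, 1.0]]
    rho, delta = dpc_scores(pts)
    for p, r, d in zip(pts, rho, delta):
        print(p, round(r, 3), round(d, 3))
```

In standard DPC, points with both a large rho and a large delta are the candidate peaks (cluster centres). How the paper ranks or thresholds them is not stated in the abstract.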
Hierarchical clustering using level sets
Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept that is used to develop a recursive version of DBSCAN capable of hierarchical clustering, called Level-Set Clustering (LSC). A level-set is the subset of points of a data-set whose densities are greater than some threshold, 't'. When the size of each level-set is graphed against its respective 't', indents appear in the line graph that correspond to clusters in the data-set, because the points in a cluster have very similar densities. The new algorithm produces its clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed
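The level-set definition in this abstract can be made concrete with a short sketch. The density estimate used here (the number of other points within a radius `eps`, in the spirit of DBSCAN) is an assumption, as are the function names. The brute-force neighbour count is O(n^2), so this illustrates the definition only, not the O(n log n) algorithm the paper describes.

```python
import math

def neighbor_counts(points, eps):
    """Assumed density estimate: number of other points within distance eps."""
    counts = []
    for p in points:
        within = sum(1 for q in points if math.dist(p, q) <= eps)
        counts.append(within - 1)  # do not count the point itself
    return counts

def level_set(points, densities, t):
    """Points whose density is strictly greater than the threshold t."""
    return [p for p, d in zip(points, densities) if d > t]

def level_set_sizes(densities, thresholds):
    """(t, size of the level-set at t) for each threshold, ready to plot."""
    return [(t, sum(1 for d in densities if d > t)) for t in thresholds]

if __name__ == "__main__":
    # One tight square of four points plus two isolated points.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (20, 20)]
    dens = neighbor_counts(pts, eps=1.5)
    print(dens)                                    # [3, 3, 3, 3, 0, 0]
    print(level_set_sizes(dens, [-1, 0, 1, 2, 3]))
    # [(-1, 6), (0, 4), (1, 4), (2, 4), (3, 0)]
```

The size stays flat at 4 for thresholds 0 to 2 and then drops to 0, because the four clustered points share the same density. A flat stretch followed by a drop is the kind of feature the abstract refers to as an indent in the line graph.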
A non-parametric and scale-independent method for cluster analysis II: the multivariate case
A general method is described for detecting and analysing galaxy systems. The
multivariate geometrical structure of the sample is studied by using an
extension of the method which we introduced in a previous paper. The method is
based on an estimate of the probability density underlying a data sample. The
density is estimated by using an iterative and adaptive kernel estimator. The
kernels used have spherical symmetry; however, we describe a method to
estimate the locally optimal shape of the kernels. We use the results of the
geometrical structure analysis to study the effects that it has on the
cluster parameter estimate. This suggests a possible way to distinguish between
structure and substructure within a sample. The method is tested by using
simulated numerical models and applied to two galaxy samples taken from the
literature. The results obtained for the Coma cluster suggest a core-halo
structure formed by a large number of geometrically independent systems. A
different conclusion is suggested by the results for the Cancer cluster
indicating the presence of at least two independent structures both containing
substructure. The dynamical consequences of the results obtained from the
geometrical analysis will be described in a later paper. Further applications
of the method are suggested and are currently in progress.
Comment: To appear in Monthly Notices of R.A.S., 50 pages of text, latex file,
aasms style, figures are available on request from the Autho
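As a rough illustration of an iterative, adaptive kernel density estimator with spherically symmetric kernels, the sketch below uses a Silverman-style rule: each point's bandwidth is scaled by (pilot density / geometric mean of pilot densities)^(-alpha). This rule, the Gaussian kernel, and the names `h`, `alpha` and `iterations` are assumptions chosen for illustration. The abstract does not specify the estimator, and the locally optimal kernel shape it mentions is not modelled here.

```python
import math

def gaussian_kde(x, data, bandwidths):
    """Density at x from spherical Gaussian kernels, one bandwidth per data point."""
    dim = len(x)
    total = 0.0
    for p, h in zip(data, bandwidths):
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        norm = (2.0 * math.pi * h * h) ** (dim / 2.0)
        total += math.exp(-sq / (2.0 * h * h)) / norm
    return total / len(data)

def adaptive_bandwidths(data, h, alpha=0.5, iterations=1):
    """Iteratively refine per-point bandwidths from a pilot density estimate.

    Starts from the fixed bandwidth h; each pass widens kernels in sparse
    regions and narrows them in dense regions (assumed Silverman-style rule).
    """
    bw = [h] * len(data)
    for _ in range(iterations):
        pilot = [gaussian_kde(p, data, bw) for p in data]
        # Geometric mean of the pilot densities (each is > 0, since every
        # point contributes its own kernel).
        g = math.exp(sum(math.log(v) for v in pilot) / len(pilot))
        bw = [h * (v / g) ** (-alpha) for v in pilot]
    return bw

if __name__ == "__main__":
    sample = [(0.0,), (0.1,), (0.2,), (5.0,)]  # three close points, one outlier
    bw = adaptive_bandwidths(sample, h=0.5, iterations=2)
    print([round(b, 3) for b in bw])
    print(gaussian_kde((0.1,), sample, bw))
```

With this rule the isolated point ends up with a wider kernel than the clustered points, which smooths the estimate in sparse regions while preserving detail in dense ones.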
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis both for traversing the landscape and for isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and showed its superiority over other analytical methods.
Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics
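The idea of grouping events around modes can be illustrated with a toy one-dimensional hill-climbing sketch. This is an assumption-laden simplification and not flowScape's method: the abstract describes high-dimensional data, a hierarchical structure and ridgeline analysis, none of which is modelled here. The function name and the histogram input are hypothetical.

```python
def modal_clusters(counts):
    """Label each histogram bin with the index of the local mode it climbs to.

    From each bin, repeatedly step to the adjacent bin with the highest
    count until no neighbour is strictly higher. Bins that reach the same
    peak form one modal cluster. Ties are not merged: on a plateau each
    bin is treated as its own mode.
    """
    n = len(counts)
    labels = []
    for start in range(n):
        i = start
        while True:
            best = i
            for j in (i - 1, i + 1):
                if 0 <= j < n and counts[j] > counts[best]:
                    best = j
            if best == i:
                break  # local maximum reached
            i = best
        labels.append(i)
    return labels

if __name__ == "__main__":
    histogram = [1, 4, 9, 3, 2, 6, 8, 5]  # event counts per bin, two peaks
    print(modal_clusters(histogram))      # [2, 2, 2, 2, 6, 6, 6, 6]
```

The valley at bin 4 separates the two modal clusters, which is the one-dimensional analogue of dense regions divided by a sparse region.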