32,043 research outputs found

    Enhance density peak clustering algorithm for anomaly intrusion detection system

    This paper proposes a new model of the Density Peak Clustering (DPC) algorithm to improve the clustering of intrusion attacks. An Anomaly Intrusion Detection System (AIDS) built on the original density peak clustering algorithm gives results stable enough to be applied to the data-mining module of the intrusion detection system. The proposed system has two objectives. The first is to analyze the disadvantages of DPC; to address them, we propose a novel improvement of the DPC algorithm that computes local density from cosine similarity instead of the cut-off distance parameter, which improves the selection of peak points. The second is to use the Gaussian kernel measure as the distance metric instead of Euclidean distance, to improve the clustering of high-dimensional, complex, nonlinearly inseparable network traffic data and to reduce noise. The experiments were evaluated on the NSL-KDD dataset
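
The two modifications described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the density formula, so the sketch assumes local density is the mean cosine similarity to the `k` most similar points, and it assumes the "Gaussian kernel measure" is the standard kernel-induced distance sqrt(2 - 2k(x, y)). The function names, `k`, `sigma`, and the toy data are all hypothetical, and feature vectors are assumed non-zero and non-negative.

```python
from math import dist, exp, sqrt

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def gaussian_kernel_distance(x, y, sigma=1.0):
    """Distance induced by a Gaussian kernel k: sqrt(k(x,x) + k(y,y) - 2k(x,y))."""
    k = exp(-dist(x, y) ** 2 / (2 * sigma ** 2))
    return sqrt(max(0.0, 2.0 - 2.0 * k))

def density_peak_clustering(points, n_clusters, k=2, sigma=1.0):
    n = len(points)
    # ASSUMPTION: local density = mean cosine similarity to the k most similar points.
    rho = []
    for i, p in enumerate(points):
        sims = sorted((cosine_similarity(p, q)
                       for j, q in enumerate(points) if j != i), reverse=True)
        rho.append(sum(sims[:k]) / k)
    d = [[gaussian_kernel_distance(p, q, sigma) for q in points] for p in points]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first, ties by index
    rank = {i: r for r, i in enumerate(order)}
    delta, parent = [0.0] * n, [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(d[i])  # densest point: distance to the farthest point
        else:
            # Nearest point among those ranked denser than i.
            parent[i] = min(order[:r], key=lambda j: d[i][j])
            delta[i] = d[i][parent[i]]
    # Peaks combine high density with a large distance to any denser point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    peaks = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(peaks):
        labels[i] = c
    for i in order:  # remaining points inherit the label of their denser neighbor
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return peaks, labels

# Hypothetical toy data: two groups of vectors pointing in different directions.
points = [(1.0, 0.0), (1.0, 0.1), (1.0, 0.2), (0.0, 1.0), (0.1, 1.0), (0.2, 1.0)]
peaks, labels = density_peak_clustering(points, n_clusters=2)
```

On this toy input the middle vector of each group is selected as a peak and the remaining points join the cluster of their nearest denser neighbor. Real NSL-KDD records would need feature encoding and scaling first, which the sketch leaves out.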

    HIERARCHICAL CLUSTERING USING LEVEL SETS

    Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept that is used to develop a recursive version of DBSCAN, called Level-Set Clustering (LSC), which can successfully perform hierarchical clustering. A level-set is the subset of points of a data-set whose densities are greater than some threshold, ‘t’. Graphing the size of each level-set against its respective ‘t’ produces indents in the line graph that correspond to clusters in the data-set, since the points in a cluster have very similar densities. The new algorithm produces its clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed
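
The level-set definition in this abstract can be illustrated with a short sketch. It is not the LSC algorithm itself: it assumes a DBSCAN-style density (the number of other points within a radius `eps`), uses a brute-force O(n^2) neighbor count rather than the indexed search needed for O(n log n), and the function names and toy data are hypothetical.

```python
from math import dist

def local_densities(points, eps):
    """ASSUMED density: number of other points within distance eps of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]

def level_set(points, densities, t):
    """Points whose density is strictly greater than the threshold t."""
    return [p for p, d in zip(points, densities) if d > t]

def level_set_sizes(densities):
    """(t, size of the level-set at t) for every integer threshold up to the max density."""
    return [(t, sum(1 for d in densities if d > t))
            for t in range(max(densities) + 1)]

# Hypothetical toy data: one tight group of four points and two isolated points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (20, 20)]
densities = local_densities(points, eps=1.5)
sizes = level_set_sizes(densities)
```

Here `sizes` stays flat at 4 for thresholds 0 to 2 and drops to 0 at threshold 3. A run of thresholds over which the level-set size barely changes is the kind of feature in the size-versus-threshold graph that the abstract associates with a cluster of similar-density points.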

    A non-parametric and scale-independent method for cluster analysis II: the multivariate case

    A general method is described for detecting and analysing galaxy systems. The multivariate geometrical structure of the sample is studied by using an extension of the method which we introduced in a previous paper. The method is based on an estimate of the probability density underlying a data sample. The density is estimated by using an iterative and adaptive kernel estimator. The kernels used have spherical symmetry; however, we describe a method for estimating the locally optimal shape of the kernels. We use the results of the geometrical structure analysis to study the effects that it has on the cluster parameter estimate. This suggests a possible way to distinguish between structure and substructure within a sample. The method is tested by using simulated numerical models and applied to two galaxy samples taken from the literature. The results obtained for the Coma cluster suggest a core-halo structure formed by a large number of geometrically independent systems. A different conclusion is suggested by the results for the Cancer cluster, indicating the presence of at least two independent structures, both containing substructure. The dynamical consequences of the results obtained from the geometrical analysis will be described in a later paper. Further applications of the method are suggested and are currently in progress. Comment: To appear in Monthly Notices of R.A.S., 50 pages of text, latex file, aasms style, figures are available on request from the Autho
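
An adaptive kernel density estimate with spherically symmetric kernels, as mentioned in this abstract, might look like the sketch below. It follows the textbook two-stage scheme (a fixed-bandwidth pilot estimate, then per-point bandwidths scaled by the pilot density relative to its geometric mean) and performs a single adaptation pass; whether the paper uses exactly this scheme, and how it iterates, is not stated in the abstract. The function names, `h`, `alpha`, and the toy sample are hypothetical.

```python
from math import dist, exp, log, pi

def gaussian_kde(x, data, bandwidths):
    """Spherical Gaussian kernel estimate at x, with one bandwidth per data point."""
    dim = len(x)
    total = 0.0
    for xi, h in zip(data, bandwidths):
        norm = (2 * pi * h * h) ** (-dim / 2)
        total += norm * exp(-dist(x, xi) ** 2 / (2 * h * h))
    return total / len(data)

def adaptive_bandwidths(data, h, alpha=0.5):
    """Widen kernels in sparse regions and narrow them in dense ones."""
    # Pilot estimate with the same bandwidth h at every data point.
    pilot = [gaussian_kde(x, data, [h] * len(data)) for x in data]
    # Geometric mean of the pilot densities, computed in log space.
    g = exp(sum(log(f) for f in pilot) / len(pilot))
    return [h * (f / g) ** (-alpha) for f in pilot]

# Hypothetical toy sample: a dense clump of four points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
h = 0.5
bandwidths = adaptive_bandwidths(data, h)
density_clump = gaussian_kde((0.05, 0.05), data, bandwidths)
density_far = gaussian_kde((3.0, 3.0), data, bandwidths)
```

In this example the isolated point receives a bandwidth larger than `h` and the clump points receive smaller ones, so the estimate stays smooth in the sparse region while resolving the dense one.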

    A computational framework to emulate the human perspective in flow cytometric data analysis

    Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation. Results: To address this, we developed a new framework flowScape for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape begins with creating a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and show its superiority over other analytical methods. Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics