
    Improved Density Peak Clustering Algorithm Based on Choosing Strategy Automatically for Cut-off Distance and Cluster Centre

    Get PDF
    Because the clustering-by-fast-search-of-density-peaks algorithm requires the cut-off distance to be set and the cluster centres to be circled manually, a density peak clustering algorithm based on an automatic choosing strategy for the cut-off distance and cluster centres (CSA-DP) is proposed. The algorithm determines the cut-off distance and cluster centres from the approximate distance between the maximum-density and minimum-density sample points and from the variation in similarity among the points that may be cluster centres. First, the density of each sample point is obtained from its k nearest neighbours and the samples are sorted by their distance to the maximum-density point; then the turning position of the density trend is found and the cut-off distance is determined from it; finally, following the density peak clustering procedure, the data points that may be cluster centres are found, their similarities are compared, and the final cluster centres are determined. Simulation results show that the improved algorithm can automatically determine the cut-off distance, circle the centres, and produce more accurate clustering results. Finally, the paper applies the improved algorithm to an empirical analysis of the stocks of 147 listed bio-pharmaceutical companies, providing a reliable basis for the classification and evaluation of listed companies and demonstrating wide applicability.
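
    As a rough illustration only (not the CSA-DP code), the Python sketch below shows the generic ingredients the abstract refers to: a k-nearest-neighbour density estimate, sorting by distance to the highest-density point, a crude "largest density drop" stand-in for the turning position, and the usual density-peak decision values. The function names, the turning-position heuristic, and the assignment rule are assumptions.

```python
# Hedged sketch of a density-peak pass with a k-NN density estimate and a
# crude automatic cut-off choice; NOT the CSA-DP algorithm itself.
import numpy as np

def knn_density(X, k=8):
    """Density of each point as the inverse mean distance to its k nearest neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(D, axis=1)[:, 1:k + 1]            # drop the zero self-distance
    rho = 1.0 / (knn.mean(axis=1) + 1e-12)
    return rho, D

def density_peaks(X, k=8, n_centres=3):
    rho, D = knn_density(X, k)
    peak = int(np.argmax(rho))                       # highest-density point
    order = np.argsort(D[peak])                      # samples sorted by distance to it
    drops = -np.diff(rho[order])                     # stand-in for the "turning position":
    turn = int(np.argmax(drops))                     # position of the largest density drop
    dc = D[peak, order[turn]]                        # cut-off read off at that position
    # (in CSA-DP the cut-off would feed back into the procedure; here it is only reported)
    delta = np.full(len(X), D.max())                 # distance to nearest higher-density point
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = D[i, higher].min()
    centres = np.argsort(rho * delta)[-n_centres:]   # candidate centres by decision value
    labels = -np.ones(len(X), dtype=int)
    labels[centres] = np.arange(n_centres)
    for i in np.argsort(-rho):                       # assign the rest, highest density first
        if labels[i] != -1:
            continue
        higher = np.where(rho > rho[i])[0]
        j = higher[np.argmin(D[i, higher])] if higher.size else centres[np.argmin(D[i, centres])]
        labels[i] = labels[j]
    return labels, dc
```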

    Improved Fruit Fly Optimization Algorithm-based density peak clustering and its applications

    Get PDF
    As a density-based algorithm, the Density Peak Clustering (DPC) algorithm has the advantage of clustering by finding density peaks. However, its cut-off distance and cluster centres have to be set somewhat arbitrarily, which influences the clustering outcome. Fruit flies find the best food by local and global searching; the food found corresponds to the extreme value of a parameter computed by the Fruit Fly Optimization Algorithm (FOA). Building on the rapid search and fast convergence of FOA, it is possible to compensate for this arbitrariness in DPC. An improved fruit fly optimization-based density peak clustering algorithm, FOA-DPC, is proposed, intended to be more efficient and effective than DPC. The results of seven simulation experiments on UCI data sets validate that the proposed algorithm not only has better clustering performance but also comes closer to the true numbers of clusters. Furthermore, FOA-DPC was applied to practical financial data analysis, where it also proved effective.
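
    The sketch below illustrates only the fruit-fly search loop, and in simplified form (the canonical FOA tracks fly coordinates and uses the reciprocal of their distance to the origin as the candidate value); in FOA-DPC the objective would be a clustering-quality score obtained by running DPC with the candidate cut-off distance dc, for which a toy objective stands in here so the code is self-contained. It is an assumption-laden illustration, not the FOA-DPC algorithm.

```python
# Hedged sketch of a fruit-fly-style search: candidates scatter around the
# best position found so far, and the "smell" of each candidate is its
# objective value.  The toy objective is a stand-in for a DPC quality score.
import numpy as np

def toy_objective(dc):
    # hypothetical stand-in for "clustering quality as a function of dc"
    return -(dc - 1.7) ** 2

def foa_minimal(objective, x0=0.5, flies=20, iters=50, step=0.2, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_val = x0, objective(x0)
    for _ in range(iters):
        # local + global search: each fly takes a random step around the current best
        candidates = best_x + rng.normal(0.0, step, size=flies)
        for x in candidates:
            val = objective(x)                 # smell concentration at this position
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

print(foa_minimal(toy_objective))              # converges near dc = 1.7 for the toy objective
```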

    Multi-level algorithms for modularity clustering

    Full text link
    Modularity is one of the most widely used quality measures for graph clusterings. Maximizing modularity is NP-hard, and the runtime of exact algorithms is prohibitive for large graphs. A simple and effective class of heuristics coarsens the graph by iteratively merging clusters (starting from singletons), and optionally refines the resulting clustering by iteratively moving individual vertices between clusters. Several heuristics of this type have been proposed in the literature, but little is known about their relative performance. This paper experimentally compares existing and new coarsening- and refinement-based heuristics with respect to their effectiveness (achieved modularity) and efficiency (runtime). Concerning coarsening, it turns out that the most widely used criterion for merging clusters (modularity increase) is outperformed by other simple criteria, and that a recent algorithm by Schuetz and Caflisch is no improvement over simple greedy coarsening for these criteria. Concerning refinement, a new multi-level algorithm is shown to produce significantly better clusterings than conventional single-level algorithms. A comparison with published benchmark results and algorithm implementations shows that combinations of coarsening and multi-level refinement are competitive with the best algorithms in the literature. (Comment: 12 pages, 10 figures; see http://www.informatik.tu-cottbus.de/~rrotta/ for downloading the graph clustering software.)
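
    As a hedged illustration of the coarsening phase discussed above, the sketch below merges singleton clusters greedily by modularity increase, the very criterion the paper reports is outperformed by other simple criteria; refinement is omitted, and the helper operates on the original graph rather than on an explicitly coarsened one. The function name and stopping rule are assumptions.

```python
# Hedged sketch of greedy coarsening: repeatedly merge the pair of connected
# clusters whose merge gives the largest modularity increase.
import itertools
import networkx as nx
from networkx.algorithms.community import modularity

def greedy_coarsen(G, target_clusters=2):
    clusters = [{v} for v in G.nodes]                   # start from singletons
    while len(clusters) > target_clusters:
        base = modularity(G, clusters)
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            # only consider merges of clusters joined by at least one edge
            if not any(G.has_edge(u, v) for u in clusters[i] for v in clusters[j]):
                continue
            merged = clusters[:i] + clusters[i + 1:j] + clusters[j + 1:] + [clusters[i] | clusters[j]]
            gain = modularity(G, merged) - base
            if best is None or gain > best[0]:
                best = (gain, merged)
        if best is None or best[0] <= 0:
            break                                       # no connected merge improves modularity
        clusters = best[1]
    return clusters

G = nx.karate_club_graph()
print([sorted(c) for c in greedy_coarsen(G, target_clusters=4)])
```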

    Fast, scalable, Bayesian spike identification for multi-electrode arrays

    Get PDF
    We present an algorithm to identify individual neural spikes observed on high-density multi-electrode arrays (MEAs). Our method can distinguish large numbers of distinct neural units, even when spikes overlap, and accounts for intrinsic variability of spikes from each unit. As MEAs grow larger, it is important to find spike-identification methods that are scalable, that is, the computational cost of spike fitting should scale well with the number of units observed. Our algorithm accomplishes this goal, and is fast, because it exploits the spatial locality of each unit and the basic biophysics of extracellular signal propagation. Human intervention is minimized and streamlined via a graphical interface. We illustrate our method on data from a mammalian retina preparation and document its performance on simulated data consisting of spikes added to experimentally measured background noise. The algorithm is highly accurate.
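
    The following sketch is not the authors' Bayesian model; it only illustrates the spatial-locality idea the abstract emphasises: a candidate spike detected on one electrode is compared only against unit templates whose electrode footprint contains that electrode, via a least-squares amplitude fit. All names, array shapes, and thresholds are assumptions.

```python
# Hedged sketch of spatially local template matching for spike identification.
# `traces` is assumed to be an (electrodes x samples) array, `templates` a dict
# unit -> (electrodes x window) array, `footprints` a dict unit -> electrode indices.
import numpy as np

def detect_crossings(traces, thresh=-4.0):
    """Return (electrode, sample) pairs where the z-scored signal crosses the threshold."""
    z = (traces - traces.mean(axis=1, keepdims=True)) / traces.std(axis=1, keepdims=True)
    elec, t = np.where((z[:, 1:] < thresh) & (z[:, :-1] >= thresh))
    return list(zip(elec, t + 1))

def best_matching_unit(snippet, templates, footprints, electrode):
    """Least-squares match of a multi-electrode snippet against templates whose
    footprint contains the triggering electrode (spatial locality)."""
    best_unit, best_err = None, np.inf
    for unit, tpl in templates.items():
        if electrode not in footprints[unit]:
            continue                                   # skip units far from this electrode
        chans = sorted(footprints[unit])
        a = np.vdot(tpl[chans], snippet[chans]) / np.vdot(tpl[chans], tpl[chans])
        err = np.sum((snippet[chans] - a * tpl[chans]) ** 2)
        if err < best_err:
            best_unit, best_err = unit, err
    return best_unit, best_err
```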

    MetAssign: probabilistic annotation of metabolites from LC–MS data using a Bayesian clustering approach

    Get PDF
    Motivation: The use of liquid chromatography coupled to mass spectrometry (LC–MS) has enabled the high-throughput profiling of the metabolite composition of biological samples. However, the large amount of data obtained can be difficult to analyse and often requires computational processing to understand which metabolites are present in a sample. This paper looks at the dual problem of annotating peaks in a sample with a metabolite, together with putatively annotating whether a metabolite is present in the sample. The starting point of the approach is a Bayesian clustering of peaks into groups, each corresponding to putative adducts and isotopes of a single metabolite.
    Results: The Bayesian modelling introduced here combines information from the mass-to-charge ratio, retention time and intensity of each peak, together with a model of the inter-peak dependency structure, to increase the accuracy of peak annotation. The results inherently contain a quantitative estimate of confidence in the peak annotations and allow an accurate trade-off between precision and recall. Extensive validation experiments using authentic chemical standards show that this system is able to produce more accurate putative identifications than other state-of-the-art systems, while at the same time giving a probabilistic measure of confidence in the annotations.
    Availability: The software has been implemented as part of the mzMatch metabolomics analysis pipeline, which is available for download at http://mzmatch.sourceforge.net/
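
    As a rough sketch of the kind of likelihood such a clustering model combines (not the MetAssign model itself), the code below scores a peak as a putative adduct of a candidate metabolite with Gaussian error models on m/z (in ppm) and retention time; the adduct table, tolerances, and independence assumption are illustrative.

```python
# Hedged sketch: log-likelihood that an observed peak is a given adduct of a
# candidate metabolite, combining m/z error (in ppm) and retention-time proximity.
import math

PROTON = 1.007276          # mass of a proton in Da

ADDUCTS = {                # adduct name -> (mass shift in Da, charge)
    "[M+H]+":   (PROTON, 1),
    "[M+Na]+":  (22.989218, 1),
    "[M+2H]2+": (2 * PROTON, 2),
}

def adduct_mz(monoisotopic_mass, adduct):
    shift, charge = ADDUCTS[adduct]
    return (monoisotopic_mass + shift) / charge

def log_likelihood(peak_mz, peak_rt, metabolite_mass, cluster_rt, adduct,
                   ppm_sigma=5.0, rt_sigma=10.0):
    """Log-likelihood that a peak is `adduct` of a metabolite eluting near cluster_rt."""
    expected = adduct_mz(metabolite_mass, adduct)
    ppm_error = 1e6 * (peak_mz - expected) / expected
    ll_mz = -0.5 * (ppm_error / ppm_sigma) ** 2 - math.log(ppm_sigma * math.sqrt(2 * math.pi))
    ll_rt = -0.5 * ((peak_rt - cluster_rt) / rt_sigma) ** 2 - math.log(rt_sigma * math.sqrt(2 * math.pi))
    return ll_mz + ll_rt

# e.g. a peak at m/z 181.0707, RT 312 s, scored as [M+H]+ of glucose (180.0634 Da):
print(log_likelihood(181.0707, 312.0, 180.0634, 310.0, "[M+H]+"))
```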

    Data clustering using a model granular magnet

    Full text link
    We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures it is completely ordered: all spins are aligned. At very high temperatures the system does not exhibit any ordering; in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins, and the corresponding data points, into clusters. We demonstrate how the method works on three synthetic and three real data sets. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method. (Comment: 46 pages, postscript, 15 ps figures included.)
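
    A hedged sketch of the recipe the abstract describes, with simplifications: Potts spins on the data points, nearest-neighbour couplings that decay with distance, single-spin Metropolis updates at a fixed intermediate temperature (rather than cluster updates such as Swendsen-Wang), and clusters read off from the estimated spin-spin correlation. Parameter values and helper names are assumptions.

```python
# Hedged sketch of superparamagnetic-style clustering with Potts spins and
# Metropolis sweeps; a simplified illustration, not the authors' procedure.
import numpy as np

def potts_clusters(X, q=20, k=10, T=0.5, sweeps=400, corr_thresh=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    a = np.mean(np.sort(D, axis=1)[:, 1:k + 1])          # local length scale
    neighbours = np.argsort(D, axis=1)[:, 1:k + 1]        # k-nearest-neighbour graph
    J = np.exp(-D ** 2 / (2 * a ** 2))                    # couplings decay with distance
    spins = rng.integers(q, size=n)
    same_count = np.zeros((n, n))
    for sweep in range(sweeps):
        for i in range(n):                                 # Metropolis single-spin updates
            new = rng.integers(q)
            dE = sum(J[i, j] * (int(spins[j] == spins[i]) - int(spins[j] == new))
                     for j in neighbours[i])
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i] = new
        if sweep >= sweeps // 2:                           # accumulate correlations after burn-in
            same_count += (spins[:, None] == spins[None, :])
    corr = same_count / (sweeps - sweeps // 2)
    # link neighbouring points whose spins were aligned often enough, take components
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], current
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if labels[v] == -1 and corr[u, v] > corr_thresh:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```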