    A META CLUSTERING APPROACH FOR ENSEMBLE PROBLEM

    A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a superior clustering result. Leveraging advanced graph partitioning techniques, we solve this problem by reducing it to a graph partitioning problem. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both the instances and the clusters of the ensemble simultaneously as vertices. Our approach retains all of the information provided by a given ensemble, allowing the similarity among instances and the similarity among clusters to be considered collectively in forming the clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance than its competitors.
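The reduction described above can be sketched in a few lines. This is a minimal illustration, assuming each base clustering is given as a list of hard labels (one per instance); the function name `build_bipartite_graph` and the vertex encoding are our own hypothetical choices, not the authors'. Instances and clusters both become vertices, and an edge joins an instance to every cluster that contains it, so the graph keeps all of the ensemble's membership information. Partitioning the resulting graph is not shown.

```python
from collections import defaultdict


def build_bipartite_graph(ensemble):
    """Build an instance-cluster bipartite graph from a cluster ensemble.

    Assumption: ensemble[m][i] is the hard label that base clustering m
    assigns to instance i. Instance vertices are ("x", i); cluster vertices
    are ("c", m, label), so equal labels from different base clusterings
    stay distinct vertices.
    """
    adjacency = defaultdict(set)
    cluster_vertices = set()
    for m, labels in enumerate(ensemble):
        for i, label in enumerate(labels):
            instance = ("x", i)
            cluster = ("c", m, label)
            cluster_vertices.add(cluster)
            # Undirected edge: the instance belongs to this cluster.
            adjacency[instance].add(cluster)
            adjacency[cluster].add(instance)
    return cluster_vertices, dict(adjacency)
```

For an ensemble such as `[[0, 0, 1, 1], [0, 1, 1, 1]]`, every instance vertex has degree 2 (one edge per base clustering), and the four cluster vertices are kept distinct even though both clusterings reuse the labels 0 and 1.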

    Qjets: A Non-Deterministic Approach to Tree-Based Jet Substructure

    Jet substructure is typically studied using clustering algorithms, such as kT, which arrange the jets' constituents into trees. Instead of considering a single tree per jet, we propose that multiple trees should be considered, weighted by an appropriate metric. Each jet in each event then produces a distribution for an observable rather than a single value. Advantages of this approach include: 1) observables have significantly increased statistical stability; and 2) new observables, such as the variance of the distribution, provide new handles for signal and background discrimination. For example, we find that employing a set of trees substantially reduces the observed fluctuations in the pruned mass distribution, enhancing the likelihood of new particle discovery for a given integrated luminosity. Furthermore, the resulting pruned mass distributions for (background) QCD jets are found to be substantially wider than those for (signal) jets with intrinsic mass scales, e.g. jets containing a W decay. A cut on this width yields a substantial enhancement in significance relative to a cut on the standard pruned jet mass alone. In particular, the luminosity needed for a given significance requirement decreases by a factor of two relative to standard pruning.
    Comment: Minor changes to match journal version.
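The core idea, replacing the single deterministic merge at each clustering step with a weighted random choice, can be sketched as follows. This assumes an exponential weighting w_ij = exp(-α (d_ij - d_min) / d_min), where d_ij is the clustering-algorithm distance for candidate pair (i, j), d_min is the smallest such distance at that step, and α is a rigidity parameter; the distance measure itself (e.g. kT) is left abstract, and the function names are hypothetical. Repeating the full clustering many times with such random choices would yield the set of trees, and hence the per-jet distribution of an observable, that the abstract describes.

```python
import math
import random


def qjet_merge_weights(distances, alpha):
    """Weight each candidate merge relative to the closest pair.

    Assumed form: w_ij = exp(-alpha * (d_ij - d_min) / d_min), with all
    distances positive. Larger alpha makes the choice more deterministic.
    """
    d_min = min(distances.values())
    return {
        pair: math.exp(-alpha * (d - d_min) / d_min)
        for pair, d in distances.items()
    }


def choose_merge(distances, alpha, rng):
    """Randomly pick the next pair to merge, with probability proportional to its weight."""
    weights = qjet_merge_weights(distances, alpha)
    pairs = list(weights)
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]
```

As α grows, the weights of all but the minimum-distance pair vanish and the procedure recovers ordinary deterministic clustering.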

    How to Control Clustering Results?

    One of the most important and challenging questions in the area of clustering is how to choose the best-fitting algorithm and parameterization to obtain an optimal clustering for the data under consideration. The clustering aggregation concept tries to bypass this problem by generating a set of separate, heterogeneous partitionings of the same data set, from which an aggregate clustering is derived. At present, almost every existing aggregation approach combines given crisp clusterings on the basis of pairwise similarities. In this paper, we consider an input set of soft clusterings and show that it contains additional information that can be used efficiently for the aggregation. Our approach introduces an extension of these pairwise similarities, allowing control and adjustment of the aggregation process and its result. Our experiments show that our flexible approach offers adaptive results, improved identification of structures, and high usability.
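The abstract does not spell out its extension of pairwise similarities, so the following is only an illustrative assumption: a common way to lift co-association to soft clusterings is to score a pair of instances by the inner product of their membership vectors, averaged over the ensemble. For crisp (one-hot) memberships this reduces to the usual fraction of clusterings that put the two instances together. The function name `soft_coassociation` is hypothetical.

```python
def soft_coassociation(soft_clusterings):
    """Average inner-product similarity over an ensemble of soft clusterings.

    Assumption: each element U of soft_clusterings is a membership matrix with
    U[i][k] = degree to which instance i belongs to cluster k (rows sum to 1).
    Clusterings may have different numbers of clusters.
    """
    m = len(soft_clusterings)
    n = len(soft_clusterings[0])
    sim = [[0.0] * n for _ in range(n)]
    for U in soft_clusterings:
        for i in range(n):
            for j in range(n):
                # Probability that i and j land in the same cluster if each
                # is assigned independently according to its memberships.
                sim[i][j] += sum(a * b for a, b in zip(U[i], U[j])) / m
    return sim
```

Unlike the crisp case, the diagonal need not equal 1: an instance split evenly between two clusters has self-similarity 0.5, which is one sign of the extra information soft memberships carry.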

    Combining states without scale hierarchies with ordered parton showers

    We present a parameter-free scheme to combine fixed-order multi-jet results with parton-shower evolution. The scheme produces jet cross sections with leading-order accuracy in the complete phase space of multiple emissions, resumming large logarithms when appropriate, while not arbitrarily enforcing ordering on momentum configurations beyond the reach of the parton-shower evolution equation. This requires the development of a matrix-element correction scheme for complex phase spaces including ordering conditions, as well as a systematic scale-setting procedure for unordered phase-space points. The resulting algorithm does not require a merging-scale parameter. We implement the new method in the Vincia framework and compare to LHC data.
    Comment: Updated to version published in EPJ

    Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization

    Separating active regions that are quiet from potentially eruptive ones is a key issue in Space Weather applications. Traditional classification schemes such as Mount Wilson and McIntosh have been effective in relating an active region's large-scale magnetic configuration to its ability to produce eruptive events. However, their qualitative nature prevents, for example, systematic studies of an active region's evolution. We introduce a new clustering of active regions that is based on the local geometry observed in line-of-sight magnetogram and continuum images. We use a reduced-dimension representation of an active region that is obtained by factoring the corresponding data matrix, composed of local image patches. Two factorizations can be compared via the definition of appropriate metrics on the resulting factors. The distances obtained from these metrics are then used to cluster the active regions. We find that these metrics result in natural clusterings of active regions. The clusterings are related to large-scale descriptors of an active region such as its size, its local magnetic field distribution, and its complexity as measured by the Mount Wilson classification scheme. We also find that including data focused on the neutral line of an active region can result in an increased correspondence between our clustering results and other active region descriptors such as the Mount Wilson classifications and the R value. We provide some recommendations for which metrics, matrix factorization techniques, and regions of interest to use to study active regions.
    Comment: Accepted for publication in the Journal of Space Weather and Space Climate (SWSC). 33 pages, 12 figures.

    A novel ensemble clustering for operational transients classification with application to a nuclear power plant turbine

    The objective of the present work is to develop a novel approach for combining multiple base clusterings of operational transients of industrial equipment in an ensemble, when the number of clusters in the final consensus clustering is unknown. A measure of pairwise similarity is used to quantify the co-association matrix that describes the similarity among the different base clusterings. Then a spectral clustering technique from the literature, embedding the unsupervised K-Means algorithm, is applied to the co-association matrix to find the optimum number of clusters of the final consensus clustering, based on the Silhouette validity index. The proposed approach is developed with reference to an artificial case study, designed to mimic the signal trend behavior of a Nuclear Power Plant (NPP) turbine during shutdown. The results of the artificial case have been compared with those achieved by a state-of-the-art approach known as Cluster-based Similarity Partitioning and Serial Graph Partitioning and Fill-reducing Matrix Ordering Algorithms (CSPA-METIS). The comparison shows that the proposed approach is able to identify a final consensus clustering that classifies the transients with better accuracy and robustness than the CSPA-METIS approach. The approach is then validated on an industrial case concerning 149 shutdown transients of an NPP turbine.
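The first and last steps of this pipeline, building the co-association matrix and scoring a candidate consensus partition with the Silhouette index, can be sketched without any libraries. This assumes the common definition of co-association (the fraction of base clusterings that place two transients in the same cluster) and uses 1 minus that fraction as the distance; the spectral clustering step that would generate candidate partitions for each number of clusters is not reproduced here, and the function names are hypothetical.

```python
def co_association(base_clusterings):
    """Fraction of base clusterings that put instances i and j in the same cluster."""
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    ca = [[0.0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0 / m
    return ca


def mean_silhouette(dist, labels):
    """Mean silhouette of a partition, given a precomputed distance matrix."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # convention: singleton clusters score 0
        a = sum(own) / len(own)  # mean distance within i's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

On the toy ensemble `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]`, with distances `1 - co_association(...)`, the partition `[0, 0, 1, 1]` scores about 0.79 while `[0, 0, 0, 1]` scores about 0.21, so a silhouette-based selection would prefer two balanced clusters here.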

    Element-centric clustering comparison unifies overlaps and hierarchy

    Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests a far-reaching impact of our framework throughout all areas of science.
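The element-centric idea, comparing each element's view of the clustering rather than matching clusters, can be illustrated for the simplest case of two disjoint partitions. The affinity used here (each element spreads unit weight uniformly over the members of its own cluster) is a simplification chosen for illustration; it is not the authors' affinity, which is designed to also handle overlapping and hierarchical clusterings. Per-element similarity is taken as 1 minus half the L1 distance between the two affinity vectors, and the overall score is its mean; the function names are hypothetical.

```python
def element_affinity(labels, i):
    """Simplified affinity (assumption, not the authors' affinity):
    element i spreads unit weight uniformly over its own cluster."""
    members = [j for j, lab in enumerate(labels) if lab == labels[i]]
    return {j: 1.0 / len(members) for j in members}


def element_centric_similarity(labels_a, labels_b):
    """Mean over elements of 1 - (1/2) * L1 distance between their affinity vectors."""
    n = len(labels_a)
    total = 0.0
    for i in range(n):
        pa = element_affinity(labels_a, i)
        pb = element_affinity(labels_b, i)
        l1 = sum(abs(pa.get(j, 0.0) - pb.get(j, 0.0)) for j in set(pa) | set(pb))
        total += 1.0 - 0.5 * l1
    return total / n
```

Identical partitions score 1.0, while comparing `[0, 0, 1, 1]` with a single all-encompassing cluster gives 0.5, because half of each element's affinity in the single-cluster partition falls on elements it was not grouped with in the other partition.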