A Meta Clustering Approach for Ensemble Problem
A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a superior clustering result. Leveraging advanced graph partitioning techniques, we solve this problem by reducing it to a graph partitioning problem. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both the instances and the clusters of the ensemble simultaneously as vertices. Our approach retains all of the information provided by a given ensemble, allowing the similarity among instances and the similarity among clusters to be considered collectively in forming the clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance than its competitors.
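The bipartite reduction can be illustrated with a small incidence matrix W: one row per instance, one column per cluster of every base clustering, and W[i][c] = 1 when instance i belongs to cluster c. The graph's adjacency is then [[0, W], [W^T, 0]], so instance similarity (W W^T, counts of shared clusters) and cluster similarity (W^T W, counts of shared instances) are both implicit in one graph. This is a minimal sketch assuming each base clustering is a list of integer labels; the helper names are hypothetical, and the partitioning step itself is not shown.

```python
def incidence_matrix(labelings):
    """W[i][c] = 1 if instance i is in cluster vertex c; c = (clustering index, label)."""
    clusters = sorted({(m, lab) for m, labels in enumerate(labelings) for lab in labels})
    index = {c: k for k, c in enumerate(clusters)}
    n = len(labelings[0])
    W = [[0] * len(clusters) for _ in range(n)]
    for m, labels in enumerate(labelings):
        for i, lab in enumerate(labels):
            W[i][index[(m, lab)]] = 1
    return W, clusters

def transpose(M):
    return [list(col) for col in zip(*M)]

def gram(rows):
    """Pairwise dot products of rows (W W^T for instances, W^T W for clusters)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def bipartite_adjacency(W):
    """Symmetric adjacency [[0, W], [W^T, 0]]: instances first, then cluster vertices."""
    n, c = len(W), len(W[0])
    top = [[0] * n + row for row in W]
    bottom = [row + [0] * c for row in transpose(W)]
    return top + bottom
```

For example, with the ensemble [[0, 0, 1, 1], [0, 1, 1, 1]], `gram(W)[2][3]` is 2 because instances 2 and 3 share a cluster in both base clusterings, while `gram(transpose(W))` shows how strongly the clusters of different base clusterings overlap.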
Qjets: A Non-Deterministic Approach to Tree-Based Jet Substructure
Jet substructure is typically studied using clustering algorithms, such as
kT, which arrange the jets' constituents into trees. Instead of considering a
single tree per jet, we propose that multiple trees should be considered,
weighted by an appropriate metric. Then each jet in each event produces a
distribution for an observable, rather than a single value. Advantages of this
approach include: 1) observables have significantly increased statistical
stability; and, 2) new observables, such as the variance of the distribution,
provide new handles for signal and background discrimination. For example, we
find that employing a set of trees substantially reduces the observed
fluctuations in the pruned mass distribution, enhancing the likelihood of new
particle discovery for a given integrated luminosity. Furthermore, the
resulting pruned mass distributions for (background) QCD jets are found to be
substantially wider than those for (signal) jets with intrinsic mass scales,
e.g., jets containing a W decay. A cut on this width yields a substantial
enhancement in significance relative to a cut on the standard pruned jet mass
alone. In particular, the luminosity needed for a given significance requirement
decreases by a factor of two relative to standard pruning.
Comment: Minor changes to match journal version.
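The idea of sampling many trees per jet can be sketched with a randomized agglomerative clustering: at each step, every candidate merge gets a weight, one merge is drawn at random, and repeating the whole clustering yields a distribution of an observable instead of a single value. The abstract does not specify the weighting metric, so this sketch assumes an exponential weight exp(-alpha * (d_ij - d_min) / d_min), a toy one-dimensional kT-like distance, and the distance of the final merge as a stand-in observable (not the pruned mass). All function names and parameters are hypothetical.

```python
import math
import random
import statistics

def kt_distance(a, b):
    """Toy kT-like distance for (pT, y) pairs: min(pT)^2 * (delta y)^2 (an assumption)."""
    return min(a[0], b[0]) ** 2 * (a[1] - b[1]) ** 2

def merge(a, b):
    """Combine two objects: pT adds, position is the pT-weighted average."""
    pt = a[0] + b[0]
    return (pt, (a[0] * a[1] + b[0] * b[1]) / pt)

def sample_tree(particles, alpha, rng):
    """Build one random tree; return the distance of the final merge as a toy observable."""
    objs = list(particles)
    last = 0.0
    while len(objs) > 1:
        pairs = [(i, j) for i in range(len(objs)) for j in range(i + 1, len(objs))]
        dists = [kt_distance(objs[i], objs[j]) for i, j in pairs]
        d_min = min(dists)
        scale = d_min if d_min > 0 else 1.0          # guard against division by zero
        weights = [math.exp(-alpha * (d - d_min) / scale) for d in dists]
        k = rng.choices(range(len(pairs)), weights=weights)[0]
        i, j = pairs[k]
        last = dists[k]
        merged = merge(objs[i], objs[j])
        objs = [o for idx, o in enumerate(objs) if idx not in (i, j)] + [merged]
    return last

def observable_distribution(particles, alpha, n_trees=200, seed=0):
    """Mean and population variance of the toy observable over n_trees sampled trees."""
    rng = random.Random(seed)
    values = [sample_tree(particles, alpha, rng) for _ in range(n_trees)]
    return statistics.mean(values), statistics.pvariance(values)
```

A very large alpha recovers a single deterministic tree (variance zero), while smaller alpha spreads the weight over alternative merges; the variance of the resulting distribution is the kind of extra handle the abstract describes.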
How to Control Clustering Results?
One of the most important and challenging questions in clustering is how to choose the best-fitting algorithm and parameterization to obtain an optimal clustering for the data at hand. The clustering aggregation concept tries to bypass this problem by generating a set of separate, heterogeneous partitionings of the same data set, from which an aggregate clustering is derived. To date, almost every existing aggregation approach combines given crisp clusterings on the basis of pairwise similarities. In this paper, we consider an input set of soft clusterings and show that it contains additional information that can be used efficiently for the aggregation. Our approach introduces an extension of these pairwise similarities, allowing control and adjustment of the aggregation process and its result. Our experiments show that our flexible approach offers adaptive results, improved identification of structures, and high usability.
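To make the difference between crisp and soft inputs concrete, the sketch below computes one common soft generalization of the pairwise co-association similarity: for each soft clustering with membership matrix U, the similarity of instances i and j is the sum over clusters of U[i][k] * U[j][k], averaged over all clusterings. This is an assumption for illustration and not the extension proposed in the paper; with one-hot (crisp) memberships it reduces to the usual fraction of clusterings that group i and j together.

```python
def soft_coassociation(soft_clusterings):
    """Average over clusterings of sum_k U[i][k] * U[j][k].

    Each element of soft_clusterings is a membership matrix U (one row per
    instance, rows summing to 1). The value for (i, j) is the probability that
    i and j fall in the same cluster if each is assigned independently
    according to its membership row.
    """
    n = len(soft_clusterings[0])
    m = len(soft_clusterings)
    S = [[0.0] * n for _ in range(n)]
    for U in soft_clusterings:
        for i in range(n):
            for j in range(n):
                S[i][j] += sum(a * b for a, b in zip(U[i], U[j])) / m
    return S
```

Note that a soft instance is not fully similar to itself: a row of [0.5, 0.5] gives a self-similarity of 0.5, which is exactly the kind of extra information crisp inputs discard.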
Combining states without scale hierarchies with ordered parton showers
We present a parameter-free scheme to combine fixed-order multi-jet results
with parton-shower evolution. The scheme produces jet cross sections with
leading-order accuracy in the complete phase space of multiple emissions,
resumming large logarithms when appropriate, while not arbitrarily enforcing
ordering on momentum configurations beyond the reach of the parton-shower
evolution equation. This requires the development of a matrix-element
correction scheme for complex phase spaces including ordering conditions as
well as a systematic scale-setting procedure for unordered phase-space points.
The resulting algorithm does not require a merging-scale parameter. We
implement the new method in the Vincia framework and compare to LHC data.
Comment: Updated to the version published in EPJ.
Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization
Separating active regions that are quiet from potentially eruptive ones is a
key issue in Space Weather applications. Traditional classification schemes
such as Mount Wilson and McIntosh have been effective in relating an active
region's large-scale magnetic configuration to its ability to produce eruptive
events. However, their qualitative nature prevents, for example, systematic
studies of an active region's evolution. We introduce a new clustering of active
regions that is based on the local geometry observed in line-of-sight
magnetogram and continuum images. We use a reduced-dimension representation of
an active region that is obtained by factoring the corresponding data matrix
composed of local image patches. Two factorizations can be compared via the
definition of appropriate metrics on the resulting factors. The distances
obtained from these metrics are then used to cluster the active regions. We
find that these metrics result in natural clusterings of active regions. The
clusterings are related to large-scale descriptors of an active region such as
its size, its local magnetic field distribution, and its complexity as measured
by the Mount Wilson classification scheme. We also find that including data
focused on the neutral line of an active region can result in an increased
correspondence between our clustering results and other active region
descriptors such as the Mount Wilson classifications and the value. We
provide some recommendations on which metrics, matrix factorization
techniques, and regions of interest to use to study active regions.
Comment: Accepted for publication in the Journal of Space Weather and Space
Climate (SWSC). 33 pages, 12 figures.
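The pipeline above has three steps: build a data matrix from local image patches, factor it into a reduced-dimension representation, and compare two factorizations with a metric on the factors. The sketch below assumes a single 2-D image array per active region, uses a mean-centered truncated SVD as the factorization, and uses the Grassmannian geodesic distance (from principal angles) between the resulting patch subspaces as the metric. The paper compares several factorizations and metrics; these particular choices and the function names are ours.

```python
import numpy as np

def patch_matrix(image, p):
    """Stack every p x p patch of a 2-D image as one row of the data matrix."""
    h, w = image.shape
    rows = [image[r:r + p, c:c + p].ravel()
            for r in range(h - p + 1) for c in range(w - p + 1)]
    return np.array(rows)

def patch_subspace(image, p, rank):
    """Reduced representation: top-`rank` right singular vectors of the centered patch matrix."""
    X = patch_matrix(image, p)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T                        # shape (p*p, rank), orthonormal columns

def grassmann_distance(U, V):
    """Geodesic distance between the column spaces of U and V via principal angles."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.sqrt((theta ** 2).sum()))
```

Computing `grassmann_distance` for every pair of active regions gives a distance matrix that any distance-based clustering method could then use.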
A novel ensemble clustering for operational transients classification with application to a nuclear power plant turbine
The objective of the present work is to develop a novel approach for combining, in an ensemble, multiple base clusterings of operational transients of industrial equipment when the number of clusters in the final consensus clustering is unknown. A measure of pairwise similarity is used to compute the co-association matrix, which describes the agreement among the different base clusterings. Then, a Spectral Clustering technique from the literature, embedding the unsupervised K-Means algorithm, is applied to the co-association matrix to find the optimum number of clusters of the final consensus clustering, based on the Silhouette validity index. The proposed approach is developed with reference to an artificial case study, designed to mimic the signal trend behavior of a Nuclear Power Plant (NPP) turbine during shutdown. The results of the artificial case are compared with those achieved by a state-of-the-art approach, the Cluster-based Similarity Partitioning Algorithm combined with Serial Graph Partitioning and Fill-reducing Matrix Ordering (CSPA-METIS). The comparison shows that the proposed approach identifies a final consensus clustering that classifies the transients with better accuracy and robustness than CSPA-METIS. The approach is then validated on an industrial case concerning 149 shutdown transients of an NPP turbine.
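The consensus procedure (co-association matrix, spectral clustering with K-Means, number of clusters chosen by the Silhouette index) can be sketched as follows. Assumptions, all ours: base clusterings are integer label vectors over the same transients, the spectral step uses the Ng-Jordan-Weiss normalized affinity with row normalization, K-Means uses a deterministic farthest-point initialization, silhouettes are computed on the distance 1 - co-association, and candidate k values must not exceed the number of transients.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings that place each pair of items in the same cluster."""
    L = np.asarray(labelings)                 # shape (n_clusterings, n_items)
    same = L[:, :, None] == L[:, None, :]     # (m, n, n) boolean agreement tensor
    return same.mean(axis=0)

def spectral_embed(A, k):
    """Top-k eigenvectors of D^-1/2 A D^-1/2, rows normalized (Ng-Jordan-Weiss style)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    X = vecs[:, -k:]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)

def kmeans(X, k, iters=100):
    """Lloyd's K-Means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(D, labels):
    """Mean silhouette width for a precomputed distance matrix D (singletons score 0)."""
    ids = set(labels.tolist())
    if len(ids) < 2:
        return -1.0
    scores = []
    for i in range(len(labels)):
        mates = labels == labels[i]
        mates[i] = False
        if not mates.any():
            scores.append(0.0)
            continue
        a = D[i, mates].mean()
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

def consensus(labelings, k_range=range(2, 6)):
    """Return (silhouette, k, labels) for the k with the best silhouette on 1 - co-association."""
    A = co_association(labelings)
    D = 1.0 - A
    best = None
    for k in k_range:
        labels = kmeans(spectral_embed(A, k), k)
        s = silhouette(D, labels)
        if best is None or s > best[0]:
            best = (s, k, labels)
    return best
```

With three base clusterings that agree on two groups of three transients (one of them splitting a single transient off), this selects k = 2 and recovers the two groups, since any finer split lowers the mean silhouette.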
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests a far-reaching impact of our framework throughout all areas
of science.
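A simplified picture of element-centric comparison: each element gets an affinity distribution over the other elements induced by the cluster structure, and two clusterings are compared element by element through these distributions. The general framework also covers overlapping and hierarchical clusterings; this sketch handles only disjoint clusterings and assumes each element spreads unit affinity uniformly over the members of its own cluster. Per-element similarity is taken as 1 - 0.5 * (L1 distance between the two affinity distributions), and the overall score is the mean. Function names are hypothetical.

```python
from collections import Counter

def affinities(labels):
    """Assumed disjoint-case affinity: element i spreads mass 1/|C(i)| over its own cluster."""
    sizes = Counter(labels)
    return [{j: 1.0 / sizes[labels[i]] for j in range(len(labels)) if labels[j] == labels[i]}
            for i in range(len(labels))]

def element_scores(labels_a, labels_b):
    """Per-element similarity in [0, 1]: 1 - half the L1 distance between affinity rows."""
    scores = []
    for a, b in zip(affinities(labels_a), affinities(labels_b)):
        keys = set(a) | set(b)
        l1 = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
        scores.append(1.0 - 0.5 * l1)
    return scores

def element_centric_similarity(labels_a, labels_b):
    """Average of the per-element scores."""
    scores = element_scores(labels_a, labels_b)
    return sum(scores) / len(scores)
```

Because the score is defined per element, the intermediate `element_scores` list shows where two clusterings disagree: comparing [0, 0, 1, 1] with [0, 0, 0, 1] gives the lowest score to element 2, the one that changed cluster.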