A Clustering Algorithm Based on an Ensemble of Dissimilarities: An Application in the Bioinformatics Domain

Author: Alonso Vidal
Ferreras Antonio
López Rivero Alfonso José
Martín Merino Manuel
Vallejo Marcelo
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 13/12/2022

Clustering algorithms such as k-means depend heavily on choosing an appropriate distance metric that accurately reflects object proximities. A wide range of dissimilarities may be defined, and they often lead to different clustering results. Choosing the best dissimilarity is an ill-posed problem, and learning a general distance from the data is a complex task, particularly for high-dimensional problems. Therefore, an appealing approach is to learn an ensemble of dissimilarities. In this paper, we have developed a semi-supervised clustering algorithm that learns a linear combination of dissimilarities, considering incomplete knowledge in the form of pairwise constraints. The minimization of the loss function is based on a robust and efficient quadratic optimization algorithm. In addition, a regularization term controls the complexity of the learned distance metric, which helps avoid overfitting. The algorithm has been applied to the identification of tumor samples using gene expression profiles, where domain experts often provide incomplete knowledge in the form of pairwise constraints. We report that the proposed algorithm outperforms a standard semi-supervised clustering technique available in the literature, as well as clustering results based on a single dissimilarity. The improvement is particularly relevant for applications with a high level of noise.
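The idea of a linear combination of dissimilarities scored against pairwise constraints can be illustrated with a minimal sketch. The abstract does not state the loss function, so the base measures (Euclidean and Manhattan), the function names, and the simple "must-link minus cannot-link plus L2 penalty" loss below are all assumptions chosen for illustration, not the authors' formulation.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def combined_dissimilarity(a, b, measures, weights):
    """Weighted linear combination of the base dissimilarities."""
    return sum(w * d(a, b) for d, w in zip(measures, weights))

def constraint_loss(points, must_link, cannot_link, measures, weights, reg):
    """Hypothetical loss (an assumption, not taken from the paper):
    small when must-link pairs are close and cannot-link pairs are far,
    with an L2 penalty on the weights acting as the regularization term."""
    ml = sum(combined_dissimilarity(points[i], points[j], measures, weights)
             for i, j in must_link)
    cl = sum(combined_dissimilarity(points[i], points[j], measures, weights)
             for i, j in cannot_link)
    return ml - cl + reg * sum(w * w for w in weights)

if __name__ == "__main__":
    points = [(0, 0), (3, 4), (0, 1)]
    measures = [euclidean, manhattan]
    weights = [0.5, 0.5]
    print(combined_dissimilarity(points[0], points[1], measures, weights))
    print(constraint_loss(points, [(0, 2)], [(0, 1)], measures, weights, 1.0))
```

A learning procedure would search for the weights that minimize such a loss; the abstract says this is done with a quadratic optimization algorithm, which is not reproduced here.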

Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization

Author: De Visscher Ruben
Delouille Veronique
Hero III Alfred O.
Li Jimmy J.
Moon Kevin R.
Watson Fraser
Publication venue: 'EDP Sciences'
Publication date: 10/12/2015

Separating active regions that are quiet from potentially eruptive ones is a key issue in Space Weather applications. Traditional classification schemes such as Mount Wilson and McIntosh have been effective in relating an active region's large-scale magnetic configuration to its ability to produce eruptive events. However, their qualitative nature prevents systematic studies of, for example, an active region's evolution. We introduce a new clustering of active regions that is based on the local geometry observed in Line of Sight magnetogram and continuum images. We use a reduced-dimension representation of an active region that is obtained by factoring the corresponding data matrix comprised of local image patches. Two factorizations can be compared via the definition of appropriate metrics on the resulting factors. The distances obtained from these metrics are then used to cluster the active regions. We find that these metrics result in natural clusterings of active regions. The clusterings are related to large-scale descriptors of an active region such as its size, its local magnetic field distribution, and its complexity as measured by the Mount Wilson classification scheme. We also find that including data focused on the neutral line of an active region can result in an increased correspondence between our clustering results and other active region descriptors such as the Mount Wilson classifications and the R value. We provide some recommendations for which metrics, matrix factorization techniques, and regions of interest to use to study active regions. Comment: Accepted for publication in the Journal of Space Weather and Space Climate (SWSC). 33 pages, 12 figures

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

A quantum geometric model of similarity

Author: Busemeyer J. R.
Pothos E. M.
Trueblood J. S.
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2013

No other study has had as great an impact on the development of the similarity literature as that of Tversky (1977), which provided compelling demonstrations against all the fundamental assumptions of the popular, and extensively employed, geometric similarity models. Notably, similarity judgments were shown to violate symmetry and the triangle inequality, and also to be subject to context effects, so that the same pair of items would be rated differently depending on the presence of other items. Quantum theory provides a generalized geometric approach to similarity and can address several of Tversky's (1977) main findings. Similarity is modeled as quantum probability, so that asymmetries emerge as order effects, and the triangle inequality violations and the diagnosticity effect can be related to the context-dependent properties of quantum probability. We thus demonstrate the promise of the quantum approach for similarity and discuss the implications for representation theory in general.
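How an asymmetry can emerge as an order effect can be seen in a toy two-dimensional example. The sketch below assumes that the similarity of A to B is the squared length of the state after projecting it first onto A's subspace and then onto B's; that form, the one-dimensional subspaces, and all names and angles are illustrative assumptions, since the abstract only says that similarity is modeled as quantum probability.

```python
from math import cos, sin, pi

def unit(angle):
    """Unit vector in the plane at the given angle (radians)."""
    return (cos(angle), sin(angle))

def dot(u, v):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))

def project(state, axis):
    """Project `state` onto the 1-D subspace spanned by unit vector `axis`."""
    c = dot(axis, state)
    return tuple(c * x for x in axis)

def sequential_probability(state, first, second):
    """Squared length of the state after projecting onto `first`, then `second`."""
    v = project(project(state, first), second)
    return dot(v, v)

if __name__ == "__main__":
    psi = unit(0.0)        # hypothetical mental state
    a = unit(pi / 6)       # subspace for concept A (close to the state)
    b = unit(pi / 3)       # subspace for concept B (farther from the state)
    print("A then B:", sequential_probability(psi, a, b))
    print("B then A:", sequential_probability(psi, b, a))
```

With these angles the two orders give roughly 0.5625 and 0.1875, so swapping the order of the two concepts changes the value even though the angle between the two subspaces is the same in both cases.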

City Research Online

Procrustes Analysis of Truncated Least Squares Multidimensional Scaling

Author: Boryczko Krzysztof
Dzwinel Witold
Kurdziel Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Multidimensional Scaling (MDS) is an important class of techniques for embedding sets of patterns in Euclidean space. Most often it is used to visualize in $\mathbb{R}^3$ multidimensional data sets or data sets given by dissimilarity measures that are not distance metrics. Unfortunately, embedding $n$ patterns with MDS involves processing $O(n^2)$ pairwise pattern dissimilarities, making MDS computationally demanding for large data sets. Computational cost is a limiting factor especially in Least Squares MDS (LS-MDS) methods, which proceed by finding a minimum of a multimodal stress function. Several works therefore explored approximate MDS techniques that are less computationally expensive. These approximate methods were evaluated in terms of the correlation between Euclidean distances in the embedding and the pattern dissimilarities, or the value of the stress function. We employ Procrustes Analysis to directly quantify differences between embeddings constructed with an approximate LS-MDS method and embeddings constructed with exact LS-MDS. We then compare our findings to the results of classical analysis, i.e. that based on stress value and correlation between Euclidean distances and pattern dissimilarities. Our results demonstrate that small changes in stress value or correlation coefficient can translate to large differences between embeddings. The differences can be attributed not only to the inevitable variability resulting from the multimodality of the stress function but also to the approximation errors. These results show that approximation may have a larger impact on MDS than what was thus far revealed by analyses of stress value and correlation between Euclidean distances and pattern dissimilarities.
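To illustrate what a Procrustes comparison of two embeddings measures, here is a minimal sketch restricted to two-dimensional embeddings. It treats points as complex numbers so that the best translation, rotation, and uniform scaling have a closed form; reflections are not handled. The function names and the normalization of the disparity to the range 0 to 1 are assumptions for illustration, not the procedure used in the paper.

```python
def center(points):
    """Subtract the centroid from a list of complex points."""
    mean = sum(points) / len(points)
    return [p - mean for p in points]

def procrustes_disparity(embedding_a, embedding_b):
    """Normalized residual after optimally translating, rotating, and
    scaling embedding_a onto embedding_b (2-D only, no reflection).
    Returns 0 for embeddings that differ only by such a transform and
    values up to 1 for embeddings that cannot be aligned at all."""
    a = center([complex(x, y) for x, y in embedding_a])
    b = center([complex(x, y) for x, y in embedding_b])
    # For the least-squares fit of c * a to b, the optimal complex scalar is
    # cross / norm_a, and the explained energy is |cross|^2 / norm_a.
    cross = sum(p.conjugate() * q for p, q in zip(a, b))
    norm_a = sum(abs(p) ** 2 for p in a)
    norm_b = sum(abs(q) ** 2 for q in b)
    return 1.0 - abs(cross) ** 2 / (norm_a * norm_b)

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    # The same square rotated by 90 degrees, scaled by 2, and shifted by (5, 5).
    moved = [(5, 5), (5, 7), (3, 7), (3, 5)]
    # A genuinely different configuration.
    distorted = [(0, 0), (1, 0), (1, 1), (0, 3)]
    print(procrustes_disparity(square, moved))
    print(procrustes_disparity(square, distorted))
```

Two embeddings can have similar stress values and still yield a clearly nonzero disparity, which is the kind of difference a direct comparison of this sort is meant to expose.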

Beyond pairwise clustering

Author: Agarwal Sameer
Belongie Serge
Kriegman David
Lim Jongwoo
Perona Pietro
Zelnik-Manor Lihi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

We consider the problem of clustering in domains where the affinity relations are not dyadic (pairwise), but rather triadic, tetradic or higher. The problem is an instance of the hypergraph partitioning problem. We propose a two-step algorithm for solving this problem. In the first step we use a novel scheme to approximate the hypergraph using a weighted graph. In the second step a spectral partitioning algorithm is used to partition the vertices of this graph. The algorithm is capable of handling hyperedges of all orders including order two, thus incorporating information of all orders simultaneously. We present a theoretical analysis that relates our algorithm to an existing hypergraph partitioning algorithm and explain the reasons for its superior performance. We report the performance of our algorithm on a variety of computer vision problems and compare it to several existing hypergraph partitioning algorithms
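The first step described above, approximating a hypergraph by a weighted graph, can be sketched with a plain clique expansion, in which every hyperedge adds its weight to each pair of vertices it contains. This is a common textbook approximation and may differ from the novel scheme the paper proposes; the function name and the (vertices, weight) input format are assumptions for illustration.

```python
from itertools import combinations

def clique_expansion(num_vertices, hyperedges):
    """Approximate a weighted hypergraph by a weighted graph.

    `hyperedges` is a list of (vertices, weight) pairs, where `vertices`
    is any iterable of vertex indices. Each hyperedge contributes its
    weight to every pair of its vertices, so hyperedges of order two are
    treated like ordinary edges. Returns a symmetric adjacency matrix."""
    weights = [[0.0] * num_vertices for _ in range(num_vertices)]
    for vertices, weight in hyperedges:
        for i, j in combinations(sorted(set(vertices)), 2):
            weights[i][j] += weight
            weights[j][i] += weight
    return weights

if __name__ == "__main__":
    hyperedges = [
        ((0, 1, 2), 1.0),   # triadic affinity
        ((2, 3), 0.5),      # ordinary pairwise edge
        ((0, 1), 2.0),      # another pairwise edge
    ]
    for row in clique_expansion(4, hyperedges):
        print(row)
```

The resulting adjacency matrix could then be passed to any spectral partitioning routine for the second step, which is not shown here.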

CiteSeerX

Caltech Authors