Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

Author: Cordner Nathan
Kollios George
Publication date: 07/07/2023

Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either $k$ or $V$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $V$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$ -- a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
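As a rough illustration of the Pivot procedure this abstract refers to, the following Python sketch runs a basic, unoptimized Pivot on a set of input partitions. It is not the authors' improved algorithm and does not achieve the stated $O(|V| k)$ running time. The function names `pivot_consensus` and `count_disagreements`, the strict-majority rule for joining a pivot's cluster, and the toy input are all assumptions made for illustration.

```python
import random


def pivot_consensus(partitions, seed=0):
    """Basic Pivot on k input partitions (each a list of n cluster labels).

    Assumption: node v joins the pivot's cluster when the two share a
    cluster in a strict majority of the input partitions.
    """
    rng = random.Random(seed)
    k = len(partitions)
    n = len(partitions[0])
    order = list(range(n))
    rng.shuffle(order)  # random pivot order

    cluster_of = [None] * n
    next_id = 0
    for pivot in order:
        if cluster_of[pivot] is not None:
            continue  # already absorbed by an earlier pivot
        cluster_of[pivot] = next_id
        for v in range(n):
            if cluster_of[v] is not None:
                continue
            # Agreement is computed on the fly from the k label lists,
            # so no |V| x |V| matrix is ever stored.
            together = sum(1 for p in partitions if p[v] == p[pivot])
            if 2 * together > k:
                cluster_of[v] = next_id
        next_id += 1
    return cluster_of


def count_disagreements(labels, partitions):
    """Total number of node pairs on which `labels` disagrees with the inputs."""
    n = len(labels)
    total = 0
    for p in partitions:
        for i in range(n):
            for j in range(i + 1, n):
                if (labels[i] == labels[j]) != (p[i] == p[j]):
                    total += 1
    return total


# Toy instance: three partitions of four elements.
partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
labels = pivot_consensus(partitions, seed=1)
print(labels, count_disagreements(labels, partitions))
```

Because pairwise agreement is recomputed from the label lists whenever it is needed, the sketch uses $O(|V| k)$ memory for the inputs, but each pivot still scans all unclustered nodes, so its running time depends on the number of clusters formed.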

arXiv.org e-Print Archive

Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

Author: Gleich David
Veldt Nate
Wirth Anthony
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $\lambda$, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
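To make the role of the resolution parameter concrete, here is a small Python sketch of a LambdaCC-style disagreement cost. It assumes the commonly stated unweighted form of the objective, in which a cut edge costs $1 - \lambda$ and a non-adjacent pair placed in the same cluster costs $\lambda$; the abstract itself does not spell this out, and the function name `lambdacc_cost` and the example graph are hypothetical.

```python
from itertools import combinations


def lambdacc_cost(n, edges, labels, lam):
    """Disagreement cost of a clustering under an assumed LambdaCC weighting.

    Assumption: each edge whose endpoints are separated costs (1 - lam);
    each non-edge whose endpoints share a cluster costs lam.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for i, j in combinations(range(n), 2):
        same_cluster = labels[i] == labels[j]
        if frozenset((i, j)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam  # positive pair that was cut
        elif same_cluster:
            cost += lam  # negative pair that was merged
    return cost


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0, 0, 0, 0, 0, 0]

for lam in (0.05, 0.5):
    print(lam,
          lambdacc_cost(6, edges, two_clusters, lam),
          lambdacc_cost(6, edges, one_cluster, lam))
```

On this toy graph the single large cluster is cheaper at the small value of $\lambda$, while splitting into the two triangles is cheaper at the larger value, which matches the interpolation between sparse cuts and dense subgraphs described above.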

arXiv.org e-Print Archive

University of Melbourne Institutional Repository

Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

Author: A Arzi
A Dravnieks
A Mamlouk
AA Koulakov
AG Khan
Andreas Schaefer
Arvind Ramanathan
Chakra S. Chennubhotla
CI Bargmann
DD Lee
G Hinton
G Laurent
H Lapid
J Niessing
JA Gottfried
Jason B. Castro
JE Amoore
JP Brunet
L van der Maaten
M Berry
M Zarzo
P Lennie
P Paatero
PM Kim
PM Wise
R Haddad
RB Lotto
RM Khan
SS Schiffman
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
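For readers unfamiliar with NMF, the sketch below factorizes a small non-negative matrix $V \approx WH$ using the classic multiplicative update rules for squared error. The abstract does not say which NMF algorithm the authors used, so this choice is an assumption, as are the helper names (`matmul`, `transpose`, `squared_error`, `nmf`) and the toy matrix standing in for an odor-by-descriptor table.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF (an assumed algorithm choice)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
    return W, H


# Hypothetical 4 x 4 table with two obvious groups of rows.
V = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
W, H = nmf(V, rank=2)
print(squared_error(V, W, H))
```

Since the updates only multiply by non-negative ratios, both factors stay non-negative throughout, which is the property that makes the resulting dimensions readable as additive parts.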

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

FigShare

A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables

Author: Bellec Pierre
Marrelec Guillaume
Messé Arnaud
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, the Bayesian approach identified clusters consistent with the established outcome of standard procedures. In this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
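For reference, the mutual information between two jointly normal vectors has a standard closed form, and the "plug-in estimate" mentioned in the abstract is usually obtained by substituting empirical covariances into it. The notation below ($\Sigma_X$, $\Sigma_Y$, $\Sigma$ and their hatted estimates) is assumed for illustration and is not taken from the paper; the paper's exact Bayes factor and dimensionality correction are not reproduced here.

```latex
% Mutual information between jointly Gaussian vectors X and Y.
% \Sigma_X, \Sigma_Y: marginal covariances; \Sigma: joint covariance of (X, Y).
I(X; Y) = \frac{1}{2} \ln \frac{\det \Sigma_X \, \det \Sigma_Y}{\det \Sigma}

% Plug-in estimate: replace each covariance by its empirical counterpart.
\hat{I}(X; Y) = \frac{1}{2} \ln \frac{\det \hat{\Sigma}_X \, \det \hat{\Sigma}_Y}{\det \hat{\Sigma}}
```

In the scalar case this reduces to $-\tfrac{1}{2} \ln(1 - \rho^2)$, where $\rho$ is the correlation coefficient.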

arXiv.org e-Print Archive

CiteSeerX

HAL-Inserm

Directory of Open Access Journals

PubMed Central

FigShare