Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.
Comment: To appear in Data Mining and Knowledge Discovery; MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
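The idea of tracking time-varying proximities before static clustering can be illustrated with a minimal sketch. It is not the authors' toolbox: the function name `smooth_proximities` is hypothetical, and the smoothing parameter `alpha` is fixed by hand here, whereas the abstract describes estimating it adaptively by shrinkage. Each smoothed matrix could then be passed to any static clustering routine.

```python
def smooth_proximities(proximities, alpha):
    """Exponentially smooth a sequence of proximity matrices.

    proximities: list of square matrices (lists of lists), one per time step.
    alpha: forgetting factor in [0, 1]; ASSUMED fixed here, whereas the
           abstract describes estimating it adaptively from the data.
    Returns a list with S_0 = W_0 and S_t = alpha * S_{t-1} + (1 - alpha) * W_t.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = []
    for w in proximities:
        if not smoothed:
            # First time step: no history yet, so use the matrix as observed.
            current = [row[:] for row in w]
        else:
            prev = smoothed[-1]
            # Blend the previous smoothed matrix with the new observation.
            current = [
                [alpha * p + (1.0 - alpha) * x for p, x in zip(prev_row, row)]
                for prev_row, row in zip(prev, w)
            ]
        smoothed.append(current)
    return smoothed


# Two time steps of a 2 x 2 similarity matrix.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 1.0], [1.0, 1.0]]
result = smooth_proximities([w1, w2], alpha=0.5)
```

With `alpha = 0` the sketch reduces to static clustering of each snapshot, and larger values give more weight to history.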
Nonparametric Feature Extraction from Dendrograms
We propose a nonparametric way to extract features from dendrograms. Minimax
distance measures correspond to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and a
distance function over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features in the spirit of multi-layered
architectures to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
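The link between Minimax distances and single linkage mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's framework: the function name `minimax_distances` is hypothetical, and the level function is simply taken to be the single-linkage merge height. The Minimax distance between two objects is the smallest achievable value of the largest edge on a path between them, which equals the height at which single linkage first merges them.

```python
def minimax_distances(dist):
    """Pairwise Minimax distances from a symmetric distance matrix.

    Processes pairs in ascending order of distance (as single linkage does).
    When two components merge at height w, every cross-component pair
    receives Minimax distance w.
    """
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    comp = list(range(n))                 # component label of each object
    members = {i: [i] for i in range(n)}  # component label -> its objects
    mm = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        a, b = comp[i], comp[j]
        if a == b:
            continue  # already merged at a lower height
        for u in members[a]:
            for v in members[b]:
                mm[u][v] = mm[v][u] = w
        for v in members[b]:
            comp[v] = a
        members[a].extend(members.pop(b))
    return mm


# Four points on a line; the last one is far from the rest.
points = [0, 1, 2, 10]
dist = [[abs(p - q) for q in points] for p in points]
mm = minimax_distances(dist)
```

Here the direct distance between the first and third points is 2, while their Minimax distance is 1, because the path through the second point uses only edges of length 1.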
An overview of clustering methods with guidelines for application in mental health research
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity
by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and
increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements.
In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and
implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic
models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently
introduced. How to choose algorithms to address common issues as well as methods for pre-clustering
data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general
guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms,
we provide information on R functions and librarie
Learning the optimal scale for GWAS through hierarchical SNP aggregation
Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal
genomic variants associated with rare human diseases. The classical statistical
approach for detecting these variants is based on univariate hypothesis
testing, with healthy individuals being tested against affected individuals at
each locus. Given that an individual's genotype is characterized by up to one
million SNPs, this approach lacks precision, since it may yield a large number
of false positives that can lead to erroneous conclusions about genetic
associations with the disease. One way to improve the detection of true genetic
associations is to reduce the number of hypotheses to be tested by grouping
SNPs. Results: We propose a dimension-reduction approach which can be applied
in the context of GWAS by making use of the haplotype structure of the human
genome. We compare our method with standard univariate and multivariate
approaches on both synthetic and real GWAS data, and we show that reducing the
dimension of the predictor matrix by aggregating SNPs gives a greater precision
in the detection of associations between the phenotype and genomic regions.
Convergence and Divergence among Technology Clubs
The paper investigates cross-country differences in technology in a large sample of developed and developing economies over the 1990s. The empirical analysis indicates the existence of three technology clubs with markedly different levels of technological development: advanced, followers and marginalized countries. The technology clubs also differ with respect to their dynamics over the 1990s. While the club of followers is characterized by a process of gradual convergence towards the technological frontier, the group of marginalized has experienced an increase in its gap in terms of innovative capabilities.
Growth and development; technological change; convergence clubs; polarization