Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.
Comment: To appear in Data Mining and Knowledge Discovery. MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
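To make "adaptively estimates the optimal smoothing parameter using shrinkage estimation" concrete, here is a minimal sketch, assuming the smoothed proximity is a convex combination psi_t = alpha * psi_{t-1} + (1 - alpha) * W_t. Alpha is estimated in a shrinkage style as (total estimated noise variance of W_t) / (total squared gap between psi_{t-1} and the estimated mean of W_t, plus that variance). The block-wise mean and variance estimates taken from the previous cluster labels, and the function names `estimate_alpha` and `smooth_proximity`, are simplifying assumptions for illustration. This is not the authors' MATLAB toolbox. Any static clustering method can then be run on the returned matrix.

```python
import numpy as np

def estimate_alpha(W, prev_psi, labels):
    """Shrinkage-style forgetting factor (hypothetical, simplified estimator).

    Mean and variance of W are estimated per block (pair of clusters) from the
    current labels; cells whose block has no usable entries fall back to W itself.
    """
    W = np.asarray(W, dtype=float)
    mean = W.copy()            # fallback: treat the observation itself as the mean
    var = np.zeros_like(W)     # fallback: zero estimated noise variance
    clusters = np.unique(labels)
    for a in clusters:
        for b in clusters:
            rows, cols = labels == a, labels == b
            block = W[np.ix_(rows, cols)]
            # Within-cluster blocks: ignore self-proximities on the diagonal.
            vals = block[~np.eye(*block.shape, dtype=bool)] if a == b else block.ravel()
            if vals.size == 0:
                continue
            mean[np.ix_(rows, cols)] = vals.mean()
            var[np.ix_(rows, cols)] = vals.var(ddof=1) if vals.size > 1 else 0.0
    num = var.sum()
    den = ((prev_psi - mean) ** 2).sum() + num
    alpha = num / den if den > 0 else 0.0
    return float(np.clip(alpha, 0.0, 1.0))

def smooth_proximity(W, prev_psi, labels):
    """Return the smoothed proximity matrix and the alpha used to build it."""
    alpha = estimate_alpha(W, prev_psi, labels)
    return alpha * prev_psi + (1.0 - alpha) * np.asarray(W, dtype=float), alpha
```

When the current proximities are noiseless within each block, the estimated variance is zero, alpha becomes 0, and the sketch simply uses the new observation.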
3rd Workshop in Symbolic Data Analysis: book of abstracts
This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. Its main aim is to bring people together and to foster the exchange of ideas among the different fields that contribute to Symbolic Data Analysis, including Mathematics, Statistics, Computer Science, Engineering, and Economics.
Preprocessing Solar Images while Preserving their Latent Structure
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics
Observatory, a NASA satellite, collect massive streams of high resolution
images of the Sun through multiple wavelength filters. Reconstructing
pixel-by-pixel thermal properties based on these images can be framed as an
ill-posed inverse problem with Poisson noise, but this reconstruction is
computationally expensive and there is disagreement among researchers about
what regularization or prior assumptions are most appropriate. This article
presents an image segmentation framework for preprocessing such images in order
to reduce the data volume while preserving as much thermal information as
possible for later downstream analyses. The resulting segmented images reflect
thermal properties but do not depend on solving the ill-posed inverse problem.
This allows users to avoid the Poisson inverse problem altogether or to tackle
it on each of ~10 segments rather than on each of ~10^7 pixels,
reducing computing time by a factor of ~10^6. We employ a parametric
class of dissimilarities that can be expressed as cosine dissimilarity
functions or Hellinger distances between nonlinearly transformed vectors of
multi-passband observations in each pixel. We develop a decision-theoretic
framework for choosing the dissimilarity that minimizes the expected loss that
arises when estimating identifiable thermal properties based on segmented
images rather than on a pixel-by-pixel basis. We also examine the efficacy of
different dissimilarities for recovering clusters in the underlying thermal
properties. The expected losses are computed under scientifically motivated
prior distributions. Two simulation studies guide our choices of dissimilarity
function. We illustrate our method by segmenting images of a coronal hole
observed on 26 February 2015.
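The claim that the dissimilarities "can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors" can be illustrated with a short sketch. The power transform x**gamma used here is an assumed stand-in for the paper's nonlinear transform, and `gamma`, `cosine_dissimilarity`, and `squared_hellinger` are hypothetical names. With gamma = 1/2, one minus the cosine similarity of the square-rooted vectors equals the squared Hellinger distance (in the 1 - Bhattacharyya-coefficient convention) between the vectors normalised to sum to one.

```python
import numpy as np

def cosine_dissimilarity(x, y, gamma=1.0):
    """1 - cosine similarity of the power-transformed pixel vectors x**gamma, y**gamma.

    x and y hold nonnegative intensities of one pixel across several passbands;
    gamma is a hypothetical tuning parameter of the nonlinear transform.
    """
    u = np.asarray(x, dtype=float) ** gamma
    v = np.asarray(y, dtype=float) ** gamma
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def squared_hellinger(x, y):
    """Squared Hellinger distance between x and y after normalising each to sum to 1."""
    p = np.asarray(x, dtype=float) / np.sum(x)
    q = np.asarray(y, dtype=float) / np.sum(y)
    return 1.0 - np.sum(np.sqrt(p * q))
```

Because both measures ignore overall brightness, two pixels with proportional multi-passband spectra have zero dissimilarity. This is the kind of scale invariance that lets segments track relative passband ratios rather than raw intensity.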
An exact CP approach for the cardinality-constrained Euclidean minimum sum-of-squares clustering problem
Clustering consists of finding hidden groups in unlabeled data that are as homogeneous and well separated as possible. Some contexts impose constraints on the clustering solutions, such as restrictions on the size of each cluster; this setting is known as cardinality-constrained clustering. In this work we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves solving capability over previously proposed exact methods for this problem.
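To pin down what the problem asks for, the sketch below states the objective (sum of squared Euclidean distances of points to their cluster centroid) and solves tiny instances exactly by brute-force enumeration of all label vectors with the required cluster sizes. This is only an illustrative baseline and not the constraint programming approach described above, which relies on lower bounds, filtering, and branching to avoid this exponential enumeration. The names `sse` and `brute_force_ccmssc` are hypothetical.

```python
import itertools
import numpy as np

def sse(X, labels, k):
    """Minimum sum-of-squares objective: squared distances of points to their cluster centroid."""
    total = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            total += float(((pts - pts.mean(axis=0)) ** 2).sum())
    return total

def brute_force_ccmssc(X, sizes):
    """Exact solution for tiny instances; sizes[c] is the required cardinality of cluster c."""
    X = np.asarray(X, dtype=float)
    n, k = len(X), len(sizes)
    if sum(sizes) != n:
        raise ValueError("cluster sizes must sum to the number of points")
    best_cost, best_labels = np.inf, None
    for assign in itertools.product(range(k), repeat=n):
        labels = np.array(assign)
        # Enforce the cardinality constraint on every cluster.
        if any(np.count_nonzero(labels == c) != sizes[c] for c in range(k)):
            continue
        cost = sse(X, labels, k)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels
```

For the points 0, 1, 10, 11 with sizes (2, 2), the optimum pairs {0, 1} and {10, 11} with cost 1.0. Forcing sizes (1, 3) raises the optimal cost to 182/3, which shows how cardinality constraints can override the natural grouping.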
Component-Tree Simplification through Fast Alpha Cuts
Tree-based hierarchical image representations are commonly used in connected morphological image filtering, segmentation and multi-scale analysis. In the case of component trees, filtering is generally based on thresholding single attributes computed for all the nodes in the tree. Alternatively, so-called shapings are used, which rely on building a component tree of a component tree to filter the image. Neither method is practical when using vector attributes. In this case, more complicated machine learning methods, including clustering methods, are required. In this paper I present a simple, fast hierarchical clustering algorithm based on cuts of α-trees to simplify and filter component trees.
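The idea of an α-cut can be sketched as grouping nodes that are connected by edges whose dissimilarity is at most α. Raising α only merges groups, so the cuts form a hierarchy. The sketch assumes, for illustration only, that the edges are the parent-child links of the component tree and that each edge is weighted by the Euclidean distance between the two nodes' vector attributes. The paper's actual edge definition and its fast cut procedure may differ. `alpha_cut` is a hypothetical name, and union-find is used to collect the α-connected components.

```python
import numpy as np

def alpha_cut(parent, attrs, alpha):
    """Label component-tree nodes by their alpha-connected component.

    parent[i] is the index of node i's parent (the root is its own parent).
    Assumed edge weight: Euclidean distance between the attribute vectors of
    a node and its parent; edges with weight <= alpha are merged.
    """
    attrs = np.asarray(attrs, dtype=float)
    root = list(range(len(parent)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, p in enumerate(parent):
        if p != i and np.linalg.norm(attrs[i] - attrs[p]) <= alpha:
            root[find(i)] = find(p)
    return [find(i) for i in range(len(parent))]
```

Each returned label is the index of a representative node, and nodes that share a label would be collapsed into one node of the simplified tree.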
An exploration of methodologies to improve semi-supervised hierarchical clustering with knowledge-based constraints
Clustering algorithms with constraints (also known as semi-supervised clustering algorithms) have been introduced to the field of machine learning as a significant variant of conventional unsupervised clustering algorithms. They have been shown to achieve better performance by integrating prior knowledge into the clustering process, which enables relevant, useful information to be uncovered from the data being clustered. However, the development of semi-supervised hierarchical clustering techniques is still an open and active area of investigation. The majority of current semi-supervised clustering algorithms are partitional clustering (PC) methods, and only a few research efforts have been devoted to semi-supervised hierarchical clustering methods. The aim of this research is to enhance hierarchical clustering (HC) algorithms with prior knowledge by adopting novel methodologies. [Continues.]
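As an illustration of how knowledge-based constraints can enter hierarchical clustering, the sketch below shows a generic constrained agglomerative (average-linkage) procedure. It does not reproduce the methodologies of this thesis. Must-link pairs are merged up front, and any merge that would join a cannot-link pair is skipped. The function name and the choice of average linkage are assumptions made for the example.

```python
import itertools
import numpy as np

def constrained_average_linkage(X, k, must_link=(), cannot_link=()):
    """Agglomerative clustering down to k clusters under pairwise constraints (illustrative)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(X)
    clusters = [{i} for i in range(n)]

    def find(i):
        return next(c for c in clusters if i in c)

    # Pre-merge must-link pairs (their transitive closure).
    for a, b in must_link:
        ca, cb = find(a), find(b)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    def violates(ca, cb):
        return any((a in ca and b in cb) or (a in cb and b in ca) for a, b in cannot_link)

    while len(clusters) > k:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            if violates(clusters[i], clusters[j]):
                continue
            # Average linkage: mean pairwise distance between the two clusters.
            d = D[np.ix_(sorted(clusters[i]), sorted(clusters[j]))].mean()
            if best is None or d < best[0]:
                best = (d, i, j)
        if best is None:
            break  # no remaining merge satisfies the cannot-link constraints
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = np.empty(n, dtype=int)
    for idx, c in enumerate(clusters):
        labels[list(c)] = idx
    return labels
```

For the points 0, 1, 5, 6 with k = 2, the unconstrained run pairs {0, 1} and {5, 6}. A single cannot-link constraint between points 0 and 1 instead yields {0} and {1, 5, 6}, which shows how a small amount of prior knowledge reshapes the hierarchy.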
Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations
Multiple kernel learning is a paradigm that employs a properly constructed chain of kernel functions able to analyse simultaneously different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and of the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional roles from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
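The phrase "a linear combination of multiple kernels defined over multiple dissimilarity spaces" can be sketched as follows. Each object is embedded as its vector of dissimilarities to a set of representatives, an RBF kernel is computed in each such space, and the kernels are combined with nonnegative weights that sum to one. In the paper, the weights and representatives are optimised jointly during training. Here they are simply given as inputs. The RBF choice, `gamma`, and all function names are assumptions for illustration.

```python
import numpy as np

def dissimilarity_embedding(D, reps):
    """Represent each object by its dissimilarities to the chosen representatives."""
    return np.asarray(D, dtype=float)[:, reps]

def rbf_kernel(E, gamma):
    """Gaussian (RBF) kernel between rows of an embedding matrix."""
    sq = ((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def combined_kernel(dissims, reps, weights, gamma=1.0):
    """Convex combination of RBF kernels, one per dissimilarity space (fixed weights)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(
        w * rbf_kernel(dissimilarity_embedding(D, r), gamma)
        for D, r, w in zip(dissims, reps, weights)
    )
```

A convex combination of positive semidefinite kernels is itself positive semidefinite, so the result can be passed to any kernel classifier. After training, inspecting the weights shows which representation contributes most, in the spirit of the knowledge discovery phase described above.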