Search CORE

1,189 research outputs found

Adaptive Evolutionary Clustering

Author: AC Harvey
Alfred O. Hero III
DJ Fenn
GW Milligan
H Lütkepohl
H Ning
HW Kuhn
J Schäfer
J Shi
Kevin S. Xu
M Charikar
Mark Kliger
N Eagle
O Ledoit
PJ Mucha
S Haykin
S Tadepalli
T Hastie
T Yang
TW Anderson
U Luxburg von
Y Chen
Y Chi
YR Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naive estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox available at http://tbayes.eecs.umich.edu/xukevin/affec

arXiv.org e-Print Archive

CiteSeerX

Crossref

3rd Workshop in Symbolic Data Analysis: book of abstracts

Author: Arroyo Gallardo Javier, ed.
Brito Paula, ed.
Maté Carlos, ed.
Noirhomme-Fraiture Monique, ed.
Publication venue
Publication date: 01/01/2012
Field of study

This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to favor the meeting of people and the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis

Docta Complutense

Preprocessing Solar Images while Preserving their Latent Structure

Author: Kashyap Vinay L
Stein Nathan M
van Dyk David A
Publication venue
Publication date: 10/12/2015
Field of study

Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of

\sim

10 segments rather than on each of

\sim

^7

pixels, reducing computing time by a factor of

\sim

^6

. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

An exact CP approach for the cardinality-constrained euclidean minimum sum-of-squares clustering problem

Author: Aloise Daniel
Haouas Mohammed Najib
Pesant Gilles
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/09/2020
Field of study

Clustering consists in finding hidden groups from unlabeled data which are as homogeneous and well-separated as possible. Some contexts impose constraints on the clustering solutions such as restrictions on the size of each cluster, known as cardinality-constrained clustering. In this work we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves our solving capability over previously-proposed exact methods for this problem

PolyPublie

Component-Tree Simplification through Fast Alpha Cuts

Author: Wilkinson M.H.F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/10/2022
Field of study

Tree-based hierarchical image representations are commonly used in connected morphological image filtering, segmentation and multi-scale analysis. In the case of component trees, filtering is generally based on thresholding single attributes computed for all the nodes in the tree. Alternatively, so-called shapings are used, which rely on building a component tree of a component tree to filter the image. Neither method is practical when using vector attributes. In this case, more complicated machine learning methods are required, including clustering methods. In this paper I present a simple, fast hierarchical clustering algorithm based on cuts of α-trees to simplify and filter component trees

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

An exploration of methodologies to improve semi-supervised hierarchical clustering with knowledge-based constraints

Author: Abeer Aljohani (1257537)
Publication venue
Publication date: 01/01/2019
Field of study

Clustering algorithms with constraints (also known as semi-supervised clustering algorithms) have been introduced to the field of machine learning as a significant variant to the conventional unsupervised clustering learning algorithms. They have been demonstrated to achieve better performance due to integrating prior knowledge during the clustering process, that enables uncovering relevant useful information from the data being clustered. However, the research conducted within the context of developing semi-supervised hierarchical clustering techniques are still an open and active investigation area. Majority of current semi-supervised clustering algorithms are developed as partitional clustering (PC) methods and only few research efforts have been made on developing semi-supervised hierarchical clustering methods. The aim of this research is to enhance hierarchical clustering (HC) algorithms based on prior knowledge, by adopting novel methodologies. [Continues.

Loughborough University Institutional Repository

From Binary to Multi-Class Divisions: improvements on Hierarchical Divisive Human Activity Recognition

Author: Tomás Vieira Caldas
Publication venue
Publication date: 12/07/2019
Field of study

Repositório Aberto da Universidade do Porto

Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

Author: De Santis Enrico
Giuliani Alessandro
Martino Alessio
Rizzi Antonello
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
Field of study

Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose an hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field-experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations are drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system also able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system

Multidisciplinary Digital Publishing Institute

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Archivio della ricerca- Università di Roma La Sapienza