Search CORE

3 research outputs found

Graph construction with condition-based weights for spectral clustering of hierarchical datasets

Authors: Knoll Zsolt, Papp Dávid, Szűcs Gábor
Publication venue: 'Infocommunications Journal'
Publication date: 01/01/2020

Most unsupervised machine learning algorithms cluster data based on similarity metrics alone, ignoring other attributes or other types of connections between the data points. In hierarchical datasets, groups of points (point-sets) can be defined according to the hierarchy. Our goal was to develop a spectral clustering approach that preserves the structure of the dataset throughout the clustering procedure. The main contribution of this paper is a set of conditions for the weighted graph construction used in spectral clustering. Following the requirements given by this set of conditions ensures that the hierarchical formation of the dataset remains unchanged, and therefore the clustering of data points implies the clustering of point-sets as well. The proposed spectral clustering algorithm was tested on three datasets and the results were compared to baseline methods; it can be concluded that the algorithm with the proposed conditions always preserves the hierarchy structure.

Repository of the Academy's Library
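
The abstract above does not state the paper's actual conditions on the graph weights, so the following is only a minimal sketch of the general idea: build an affinity graph in which points sharing a point-set are tied together more strongly than any other pair, then split the graph with a standard normalized-Laplacian eigenvector. The function names, the `intra_weight` rule, and the toy data are all assumptions made for illustration, not the authors' construction.

```python
import numpy as np

def hierarchy_aware_affinity(points, set_ids, sigma=1.0, intra_weight=10.0):
    """Gaussian affinity with a large constant weight inside each point-set.

    The constant-boost rule is a hypothetical stand-in for the paper's
    conditions, which the abstract does not spell out.
    """
    points = np.asarray(points, dtype=float)
    set_ids = np.asarray(set_ids)
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    same_set = set_ids[:, None] == set_ids[None, :]
    weights[same_set] = intra_weight      # tie members of a point-set together
    np.fill_diagonal(weights, 0.0)        # no self-loops
    return weights

def spectral_bipartition(weights):
    """Two-way split from the second eigenvector of the normalized Laplacian."""
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(weights)) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt     # undo the D^(1/2) scaling
    return (fiedler > 0).astype(int)

# Toy data (assumed): four point-sets of two points each, in two spatial groups.
points = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.3], [0.3, 0.3],
          [3.0, 3.0], [3.2, 3.0], [3.0, 3.3], [3.3, 3.3]]
set_ids = [0, 0, 1, 1, 2, 2, 3, 3]
labels = spectral_bipartition(hierarchy_aware_affinity(points, set_ids))
```

In this toy case every point-set ends up entirely inside one cluster, which is the kind of hierarchy preservation the abstract describes; whether a simple boost like this guarantees it in general is not something the sketch establishes.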

Diffusion maps for Lagrangian trajectory data unravel coherent sets

Authors: Banisch Ralf, Koltai Péter
Publication date: 01/01/2017

Dynamical systems often exhibit the emergence of long-lived coherent sets, which are regions in state space that keep their geometric integrity to a high extent and thus play an important role in transport. In this article, we provide a method for extracting coherent sets from possibly sparse Lagrangian trajectory data. Our method can be seen as an extension of diffusion maps to trajectory space, and it allows us to construct “dynamical coordinates,” which reveal the intrinsic low-dimensional organization of the data with respect to transport. The only a priori knowledge about the dynamics that we require is a locally valid notion of distance, which renders our method highly suitable for automated data analysis. We show convergence of our method to the analytic transfer operator framework of coherence in the infinite data limit and illustrate its potential on several two- and three-dimensional examples as well as real-world data.

One aspect of the coexistence of regular structures and chaos in many dynamical systems is the emergence of coherent sets: if we place a large number of passive tracers in a coherent set at some initial time, then macroscopically they perform a collective motion and stay close together for a long period of time, while their surroundings can mix chaotically. Natural examples are moving vortices in atmospheric or oceanographic flows. In this article, we propose a method for extracting coherent sets from possibly sparse Lagrangian trajectory data. This is done by constructing a random walk on the data points that captures both the inherent time-ordering of the data and the idea of closeness in space, which is at the heart of coherence. In the rich data limit, we can show equivalence to the well-established functional-analytic framework of coherent sets. One output of our method is a set of “dynamical coordinates,” which reveal the intrinsic low-dimensional transport-based organization of the data.

Institutional Repository of the Freie Universität Berlin
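
One plausible reading of the random walk described in the abstract above is sketched here: build a row-stochastic Gaussian-kernel matrix for each time snapshot of the tracers, average those matrices over time, and use the leading non-trivial eigenvectors of the average as "dynamical coordinates". This is an illustrative assumption, not a transcription of the authors' method; the function names, the kernel bandwidth `epsilon`, and the two-vortex toy data are hypothetical.

```python
import numpy as np

def time_averaged_transition_matrix(trajectories, epsilon):
    """Average of per-snapshot diffusion-map transition matrices.

    trajectories has shape (n_times, n_points, dim); row i of every
    snapshot is the position of the same tracer at a different time.
    """
    trajectories = np.asarray(trajectories, dtype=float)
    n_times, n_points, _ = trajectories.shape
    averaged = np.zeros((n_points, n_points))
    for snapshot in trajectories:
        sq_dists = ((snapshot[:, None, :] - snapshot[None, :, :]) ** 2).sum(axis=-1)
        kernel = np.exp(-sq_dists / epsilon)
        averaged += kernel / kernel.sum(axis=1, keepdims=True)  # row-normalize
    return averaged / n_times

def dynamical_coordinates(transition, n_coords=1):
    """Eigenvectors for the largest eigenvalues after the trivial one (value 1)."""
    eigvals, eigvecs = np.linalg.eig(transition)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[1:1 + n_coords]].real

# Toy data (assumed): two groups of five tracers, each rotating about its own center.
angles = np.linspace(0.0, 2.0 * np.pi, 5, endpoint=False)

def vortex(center, t):
    ring = np.stack([np.cos(angles + t), np.sin(angles + t)], axis=1)
    return np.asarray(center) + 0.5 * ring

trajectories = np.stack([
    np.concatenate([vortex([0.0, 0.0], t), vortex([4.0, 0.0], t)])
    for t in np.linspace(0.0, 1.0, 4)
])
transition = time_averaged_transition_matrix(trajectories, epsilon=1.0)
coords = dynamical_coordinates(transition)[:, 0]
```

On this toy data the sign of the first coordinate separates the two rotating groups, which stay together over time and so behave like two coherent sets.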

Fast Large-Scale Spectral Clustering by Sequential Shrinkage Optimization

In many applications, we need to cluster large-scale data objects. However, some recently proposed clustering algorithms such as spectral clustering can hardly handle large-scale applications because of their computational complexity, although their effectiveness has been demonstrated in many previous works. In this paper, we propose a fast solver for spectral clustering. In contrast to traditional spectral clustering algorithms that first solve an eigenvalue decomposition problem, and then employ a clustering heuristic to obtain labels for the data points, our new approach sequentially decides the labels of relatively well-separated data points. Because the scale of the problem shrinks quickly during this process, it can be much faster than the traditional methods. Experiments on both synthetic data and a large collection of product records show that our algorithm can achieve significant improvement in speed as compared to traditional spectral clustering algorithms.

CiteSeerX
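
The abstract above gives no details of the solver, so the following is not the paper's algorithm. It is a toy illustration of the general idea of deciding confident points first and then working on a smaller problem. As a swapped-in technique, it fixes the highest-magnitude entries of a relaxed score vector, then recomputes scores for the remaining points by solving the harmonic (graph Laplace) equation restricted to them. The function name, `decide_fraction`, and the toy data are assumptions.

```python
import numpy as np

def shrinking_bipartition(weights, decide_fraction=0.5):
    """Label nodes in rounds; decided nodes leave the remaining problem."""
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    labels = np.zeros(n)  # 0 = undecided, otherwise -1 or +1
    laplacian = np.diag(weights.sum(axis=1)) - weights

    # Round 1 scores: second-smallest eigenvector of the full Laplacian.
    _, eigvecs = np.linalg.eigh(laplacian)
    scores = eigvecs[:, 1]
    free = np.arange(n)
    problem_sizes = []

    while free.size > 0:
        problem_sizes.append(int(free.size))
        n_decide = max(1, int(np.ceil(decide_fraction * free.size)))
        top = np.argsort(-np.abs(scores))[:n_decide]
        # Also decide the two extremes so round 1 yields both labels.
        extremes = [np.argmax(scores), np.argmin(scores)]
        confident = np.union1d(top, extremes)
        labels[free[confident]] = np.where(scores[confident] >= 0, 1.0, -1.0)
        free = np.delete(free, confident)
        if free.size == 0:
            break
        # Smaller problem: harmonic scores on free nodes given decided labels.
        decided = np.flatnonzero(labels)
        rhs = weights[np.ix_(free, decided)] @ labels[decided]
        scores = np.linalg.solve(laplacian[np.ix_(free, free)], rhs)
    return labels.astype(int), problem_sizes

# Toy data (assumed): two well-separated groups of four points.
demo_points = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.3], [0.3, 0.3],
                        [3.0, 3.0], [3.2, 3.0], [3.0, 3.3], [3.3, 3.3]])
sq = ((demo_points[:, None, :] - demo_points[None, :, :]) ** 2).sum(axis=-1)
demo_weights = np.exp(-sq / 2.0)
np.fill_diagonal(demo_weights, 0.0)
demo_labels, demo_sizes = shrinking_bipartition(demo_weights)
```

The returned `demo_sizes` list records how many points were still undecided at the start of each round, which makes the shrinking visible. This sketch still performs one full eigendecomposition in the first round, so it does not demonstrate the speed-up the abstract reports.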