Spectral Clustering: An Empirical Study of Approximation Algorithms and its Application to the Attrition Problem
Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a well-known clustering method that uses the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this constraint, approximation methods have been developed which aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem, where one aims to distinguish, within a set of employees, those who are likely to leave the company voluntarily from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows the applicability of these methods to an important business optimization problem.
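As a rough illustration of what "using the spectrum of the similarity matrix" can mean, the sketch below splits a small graph in two by the sign pattern of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). It is a minimal sketch under stated assumptions, not the article's method: the unnormalized Laplacian, the shifted power iteration, the function name `spectral_bipartition`, and the toy matrix `W` are all choices made here for illustration. The approximation methods the article evaluates are not reproduced.

```python
import math

def spectral_bipartition(weights, iterations=500):
    """Two-way split from the sign of the (approximate) Fiedler vector.

    `weights` is a symmetric similarity matrix with a zero diagonal.
    Assumption: unnormalized Laplacian L = D - W, solved by power iteration.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Eigenvalues of L lie in [0, 2 * max degree], so shift*I - L is
    # positive semidefinite and its top eigenvectors are L's bottom ones.
    shift = 2.0 * max(degrees)
    # Deterministic, non-constant starting vector (already mean-free).
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # y = (shift*I - L) x = (shift - d_i) x_i + sum_j w_ij x_j
        y = [(shift - degrees[i]) * x[i]
             + sum(weights[i][j] * x[j] for j in range(n))
             for i in range(n)]
        # Project out the constant vector (the trivial eigenvector of L).
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return [1 if v >= 0 else 0 for v in x]

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = spectral_bipartition(W)
print(labels)  # the two triangles receive different labels
```

The eigenvector computation is the expensive step for large similarity matrices, which is the cost that approximate spectral clustering methods try to reduce.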
A Review on Data Clustering Algorithms for Mixed Data
Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a method of dividing the data into groups of similar objects. One significant research area in data mining is the development of methods that update knowledge by drawing on existing knowledge, since this can generally improve mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper describes the efficiency of the Self-Organizing Map (SOM) algorithm in enhancing mixed data clustering.
Applications of a Graph Theoretic Based Clustering Framework in Computer Vision and Pattern Recognition
Recently, several clustering algorithms have been used to solve a variety of problems from different disciplines. This dissertation addresses several challenging tasks in computer vision and pattern recognition by casting them as clustering problems. We propose novel approaches to multi-target tracking, visual geo-localization, and outlier detection using a unified underlying clustering framework, i.e., dominant set clustering and its extensions, and report results superior to several state-of-the-art approaches. Comment: doctoral dissertation
Efficient out-of-sample extension of dominant-set clusters
Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edge-weighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated with the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach.
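The connection to continuous quadratic optimization mentioned above can be made concrete with a small sketch. Dominant sets are commonly extracted by running replicator dynamics on the affinity matrix over the standard simplex, and the support of the limit point is read off as the cluster. The code below illustrates that idea under stated assumptions: the function names, the toy affinity matrix `A`, the support threshold, and in particular the membership test in `joins_cluster` (compare a new point's weighted affinity to the cluster against the cluster's cohesiveness x'Ax) are choices made here for illustration and are not taken from the abstract.

```python
def replicator_dynamics(affinity, iterations=200):
    """Iterate x_i <- x_i * (A x)_i / (x' A x), starting at the simplex barycenter."""
    n = len(affinity)
    x = [1.0 / n] * n
    for _ in range(iterations):
        ax = [sum(affinity[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        x = [x[i] * ax[i] / total for i in range(n)]
    return x

def cohesiveness(affinity, x):
    """Quadratic form x' A x evaluated at the weight vector x."""
    n = len(x)
    return sum(x[i] * affinity[i][j] * x[j] for i in range(n) for j in range(n))

def joins_cluster(new_affinities, affinity, x):
    """Hypothetical out-of-sample test (an assumption of this sketch):
    accept the new point if its x-weighted affinity to the cluster
    exceeds the cluster's cohesiveness."""
    score = sum(a * xi for a, xi in zip(new_affinities, x))
    return score > cohesiveness(affinity, x)

# Toy affinities: points 0-2 are mutually similar, point 3 is weakly related.
A = [
    [0.0, 0.9, 0.9, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.9, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
x = replicator_dynamics(A)
support = [i for i, v in enumerate(x) if v > 1e-4]  # threshold is an assumption
near = joins_cluster([0.8, 0.85, 0.9, 0.1], A, x)   # new point close to 0-2
far = joins_cluster([0.2, 0.1, 0.3, 0.9], A, x)     # new point close to 3 only
print(support, near, far)
```

The appeal of a test of this kind is that assigning a new point needs only its affinities to the existing cluster members, so the clustering itself does not have to be rerun.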
Algorithmic Results for Clustering and Refined Physarum Analysis
In the first part of this thesis, we study the Binary $\ell_0$-Rank-$k$ problem which, given a binary matrix $A$ and a positive integer $k$, seeks to find a rank-$k$ binary matrix $B$ minimizing the number of non-zero entries of $A - B$. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant $k$ over the reals, $\mathbb{F}_2$, and the Boolean semiring. In addition, we give novel algorithms for important variants of $\ell_0$-low rank approximation.
The second part of this dissertation studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal $k$-way node partition of $G$.
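For readers unfamiliar with the quality measure named above, the following minimal sketch computes the conductance of one cluster in a weighted graph. It assumes the definition cut(S, V \ S) / vol(S), where vol(S) is the sum of weighted degrees in S; some texts divide by the smaller of vol(S) and vol(V \ S) instead. The function name and the toy graph are illustrative choices and do not come from the thesis.

```python
def conductance(weights, cluster):
    """Weight of edges leaving `cluster` divided by the cluster's volume.

    Assumption: vol(S) is the sum of weighted degrees of the nodes in S.
    """
    inside = set(cluster)
    n = len(weights)
    cut = sum(weights[i][j] for i in inside for j in range(n) if j not in inside)
    volume = sum(sum(weights[i]) for i in inside)
    return cut / volume

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
good = conductance(W, [0, 1, 2])  # one triangle: a single edge leaves it
poor = conductance(W, [0, 1])     # cuts through a triangle
print(good, poor)
```

A partition whose clusters all have small conductance keeps most edge weight inside clusters, which is the sense in which the triangle is a better cluster than the pair of nodes here.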
In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.