7 research outputs found

Spectral Clustering: An Empirical Study of Approximation Algorithms and its Application to the Attrition Problem

Author: Boutsidis C.
Cung B.
Jin T.
Needell Deanna
Ramirez J.
Thompson A.
Publication venue: Scholarship @ Claremont
Publication date: 14/11/2012

Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a well-known clustering method that uses the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this constraint, approximation methods have been developed which aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem, where one aims to distinguish, within a set of employees, those who are likely to voluntarily leave the company from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows the applicability of these methods to an important business optimization problem.

arXiv.org e-Print Archive

Scholarship@Claremont
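
The abstract above describes spectral clustering as using the spectrum of the data similarity matrix, with the eigenvector problem as the expensive step. A minimal sketch of the exact (non-approximate) two-cluster case follows. It is an illustration only and not the authors' code: the Gaussian kernel, the `sigma` parameter, and the function name `spectral_bipartition` are assumptions chosen for the example.

```python
import numpy as np

def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters by the sign of the Fiedler vector.

    Hypothetical helper for illustration; the Gaussian kernel and sigma
    are assumptions, not taken from the paper.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances.
    sq_dists = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    # Data similarity matrix with zero self-similarity.
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(pts)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Full eigendecomposition: the costly step that approximation
    # methods try to avoid on large datasets.
    _, eigvecs = np.linalg.eigh(L)
    # Eigenvector of the second-smallest eigenvalue (Fiedler vector).
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

points = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
labels = spectral_bipartition(points)
print(labels)
```

Which group receives label 0 or 1 depends on the sign of the computed eigenvector, so only the grouping itself is meaningful. For more than two clusters, a common variant runs k-means on the first k eigenvectors instead of thresholding one.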

A Review on Data Clustering Algorithms for Mixed Data

Author: Prasad D. Hari
Punithavalli Dr. M.
Publication venue: Global Journals Inc. (US)
Publication date: 12/06/2010

Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a method of dividing the data into groups of similar objects. One significant research area in data mining is the development of methods that update knowledge by using existing knowledge, since this can generally improve mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper describes the efficiency of the Self-Organized Map (SOM) algorithm in enhancing mixed data clustering.

Global Journal of Computer Science and Technology (GJCST)

Applications of a Graph Theoretic Based Clustering Framework in Computer Vision and Pattern Recognition

Author: Tesfaye Yonatan Tariku
Publication date: 07/01/2018

Recently, several clustering algorithms have been used to solve a variety of problems from different disciplines. This dissertation aims to address different challenging tasks in computer vision and pattern recognition by casting them as clustering problems. We propose novel approaches to solve multi-target tracking, visual geo-localization and outlier detection problems using a unified underlying clustering framework, i.e., dominant set clustering and its extensions, and present superior results over several state-of-the-art approaches. Comment: doctoral dissertation

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università IUAV di Venezia

Efficient out-of-sample extension of dominant-set clusters

Author: PAVAN M.
PELILLO M.
Publication venue: MIT Press - Journals
Publication date: 01/01/2005

Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edge-weighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated with the processing of very large data sets, or to efficiently deal with dynamic situations in which data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach.

Archivio Ricerca Ca'Foscari

CiteSeerX

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari
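
The dominant-set abstract above mentions a connection to continuous quadratic optimization and a simple way of grouping out-of-sample examples. The sketch below illustrates one common reading of that idea: a dominant set is extracted with replicator dynamics on a symmetric, non-negative similarity matrix with zero diagonal, and a new point is accepted when its weighted similarity to the cluster exceeds the cluster's internal cohesiveness. The function names, the toy matrix, and the exact form of the membership test are assumptions for illustration and should be checked against the paper.

```python
import numpy as np

def dominant_set(A, iters=1000, tol=1e-10):
    """Run discrete replicator dynamics from the simplex barycenter.

    Returns the characteristic vector x; its support (entries above a
    small threshold) is taken as the extracted cluster.
    """
    A = np.asarray(A, dtype=float)
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        Ax = A @ x
        new_x = x * Ax / (x @ Ax)  # x_i <- x_i * (Ax)_i / (x^T A x)
        converged = np.abs(new_x - x).sum() < tol
        x = new_x
        if converged:
            break
    return x

def joins_cluster(sims_to_old, x, A):
    """Assumed out-of-sample rule: accept if sum_h a_hj x_h > x^T A x."""
    A = np.asarray(A, dtype=float)
    return float(np.dot(sims_to_old, x)) > float(x @ A @ x)

# Toy similarity matrix: nodes 0-2 are tightly linked, nodes 3-4 are not.
A = np.array([
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
])
x = dominant_set(A)
support = [i for i, xi in enumerate(x) if xi > 1e-6]
print(support)
print(joins_cluster([0.9, 0.9, 0.9, 0.1, 0.1], x, A))
print(joins_cluster([0.1, 0.1, 0.1, 0.1, 0.1], x, A))
```

The membership test touches only the similarities between the new point and the existing ones, so under these assumptions it avoids re-running the clustering when data arrive incrementally.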

Algorithmic Results for Clustering and Refined Physarum Analysis

Author: Kolev Pavel
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2018

In the first part of this thesis, we study the Binary $\ell_0$-Rank-$k$ problem which, given a binary matrix $A$ and a positive integer $k$, seeks to find a rank-$k$ binary matrix $B$ minimizing the number of non-zero entries of $A-B$. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant $k$ over the reals, $\mathbb{F}_2$, and the Boolean semiring. In addition, we give novel algorithms for important variants of $\ell_0$-low rank approximation. The second part of this dissertation studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph $G$ into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal $k$-way node partition of $G$. In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.

Universaar

MPG.PuRe