Recent Advances in Modularity Optimization and Their Application in Retailing
In this contribution we report on three recent advances in modularity optimization, namely:
1. The randomized greedy (RG) family of modularity optimization algorithms are state-of-the-art graph clustering algorithms which are near optimal, fast, and scalable.
2. The extension of the RG family to multi-level clustering.
3. A new entropy based cluster index which allows the detection of the proper clustering levels and of stable core clusters at each level.
Last, but not least, several marketing applications of these algorithms for customer enablement and empowerment are discussed: e.g. the detection of low-level cluster structures in retail purchase data, the analysis of the co-usage structure of scientific documents for detecting multi-level category structures for scientific libraries, and the analysis of social groups from the friend relation of social network sites.
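A minimal sketch of modularity-based graph clustering on a toy graph, using NetworkX's standard greedy modularity heuristic as a stand-in for the randomized greedy (RG) family described above (the RG algorithms themselves are not part of NetworkX, and the karate-club graph stands in for real retail purchase data):

```python
# Modularity-based graph clustering with NetworkX's greedy heuristic.
# This illustrates the general idea (maximize modularity Q over a
# partition of the graph), not the RG algorithms from the abstract.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy "co-purchase" graph: nodes are products, edges link products
# that were bought together (here: the standard karate-club graph).
G = nx.karate_club_graph()

communities = greedy_modularity_communities(G)  # list of frozensets of nodes
Q = modularity(G, communities)                  # modularity of the partition

print(f"{len(communities)} communities, Q = {Q:.3f}")
```

In a retail setting the edge weights would typically be co-purchase counts, and the resulting communities would correspond to product or customer segments.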
Evolutionary star-structured heterogeneous data co-clustering
A star-structured interrelationship, a common type in real-world data, has a central object connected to the other types of objects. One of the key challenges in evolutionary clustering is the integration of historical data into current data. Traditionally, smoothness in data transitions over time is achieved by means of cost functions defined over historical and current data. These functions provide a tunable tolerance for shifts in a current-data instance by accounting for all the historical information on the corresponding instance. Once historical data is integrated into current data using cost functions, a co-clustering is obtained using co-clustering algorithms such as spectral clustering, non-negative matrix factorization, and information-theoretic clustering. Non-negative matrix factorization has proven efficient and scalable for large data and is less memory-intensive than the other approaches. It tri-factorizes the original data matrix into a row-indicator matrix, a column-indicator matrix, and a matrix that captures the correlation between the row and column clusters. However, the challenges in clustering evolving heterogeneous data have not yet been addressed. In this thesis, I propose a new algorithm for clustering a specific case of this problem, viz. star-structured heterogeneous data. The proposed algorithm provides cost functions to integrate historical star-structured heterogeneous data into current data; non-negative matrix factorization is then used to cluster the instances and features at each time step. This contribution to the field provides an avenue for the further development of higher-order evolutionary co-clustering algorithms.
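The tri-factorization step can be sketched with the standard multiplicative updates for non-negative matrix tri-factorization, X ≈ F S Gᵀ, where F indicates row clusters, G indicates column clusters, and S captures their correlation. The toy matrix, cluster counts, and iteration budget below are illustrative assumptions, not the thesis's evolutionary algorithm:

```python
# Non-negative matrix tri-factorization X ~ F @ S @ G.T via
# multiplicative updates (Lee-Seung-style) on a toy matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 12))   # toy non-negative data matrix (assumed)
k, l = 3, 2                # assumed numbers of row / column clusters
eps = 1e-9                 # guards against division by zero

F = rng.random((20, k))    # row-cluster indicator matrix
S = rng.random((k, l))     # row/column cluster correlation matrix
G = rng.random((12, l))    # column-cluster indicator matrix

for _ in range(200):
    F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
    S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)

err = np.linalg.norm(X - F @ S @ G.T)  # reconstruction error
```

Because the updates are multiplicative, all three factors stay non-negative throughout; row and column cluster assignments can then be read off from the largest entry in each row of F and G.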
Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
The kernel k-means is an effective method for data clustering which extends the commonly-used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. We then propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets.
Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
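A minimal single-machine sketch of the embed-then-cluster idea: approximate the RBF kernel with a low-dimensional Nyström embedding, then run ordinary k-means in the embedded space, avoiding the full n × n kernel matrix. The paper defines its own family of embeddings and MapReduce algorithms; scikit-learn's Nystroem transformer is used here only as one well-known instance of such an embedding, and the blob data and parameters are assumptions:

```python
# Embed-then-cluster: low-dimensional kernel embedding + plain k-means.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

# Toy data with three well-separated clusters.
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Nystroem builds phi(x) such that phi(x) . phi(z) ~= K(x, z),
# so kernel k-means reduces to k-means on the embedded points.
embed = Nystroem(kernel="rbf", gamma=0.1, n_components=50, random_state=0)
Z = embed.fit_transform(X)          # shape (500, 50), not (500, 500)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
```

Because the embedding is just a feature map, each mapper can embed and cluster its shard of points independently, which is the property that makes this scheme amenable to MapReduce-style parallelization.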