Overlapping community detection in massive social networks

Author: Whang Joyce Jiyoung
Publication date: 11/02/2016
Field of study: Computer Science

Massive social networks have become increasingly popular in recent years. Community detection is one of the most important techniques for the analysis of such complex networks. A community is a set of cohesive vertices that has more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this thesis, we propose scalable overlapping community detection algorithms that effectively identify high quality overlapping communities in various real-world networks. We first develop an efficient overlapping community detection algorithm using a seed set expansion approach. The key idea of this algorithm is to find good seeds and then greedily expand these seeds using a personalized PageRank clustering scheme. Experimental results show that our algorithm significantly outperforms other state-of-the-art overlapping community detection methods in terms of run time, cohesiveness of communities, and ground-truth accuracy. To develop more principled methods, we formulate the overlapping community detection problem as a non-exhaustive, overlapping graph clustering problem where clusters are allowed to overlap with each other, and some nodes are allowed to be outside of any cluster. To tackle this non-exhaustive, overlapping clustering problem, we propose a simple and intuitive objective function that captures the issues of overlap and non-exhaustiveness in a unified manner. To optimize the objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using a low-rank semidefinite programming technique. Our experimental results show that the new objective and the algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness.
We extend our non-exhaustive, overlapping clustering techniques to co-clustering, where the goal is to simultaneously identify a clustering of the rows as well as the columns of a data matrix. As an example application, consider recommender systems where users have ratings on items. This can be represented by a bipartite graph where users and items are denoted by two different types of nodes, and the ratings are denoted by weighted edges between the users and the items. In this case, co-clustering would be a simultaneous clustering of users and items. We propose a new co-clustering objective function and an efficient co-clustering algorithm that is able to identify overlapping clusters as well as outliers on both types of nodes in the bipartite graph. We show that our co-clustering algorithm is able to effectively capture the underlying co-clustering structure of the data, which boosts the performance of a standard one-dimensional clustering. Finally, we study the design of parallel data-driven algorithms, which enables us to further increase the scalability of our overlapping community detection algorithms. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms achieve significantly better scalability than standard PageRank implementations. The design choices affect both single-threaded performance and parallel scalability. The lessons learned from this study not only guide efficient implementations of many graph mining algorithms but also provide a framework for designing new scalable algorithms, especially for large-scale community detection.

Texas ScholarWorks
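The thesis abstract above combines two ideas that are easy to see in miniature: growing a community from a seed with personalized PageRank, and computing PageRank with a data-driven, push-based scheme. The sketch below is a single-threaded illustration under stated assumptions: an undirected, unweighted graph stored as an adjacency dict, the well-known push procedure of Andersen, Chung and Lang (in a non-lazy variant) followed by a conductance sweep cut, and `alpha`, `eps`, `ppr_push` and `sweep_cut` as hypothetical names and parameter choices. It is not the thesis's seeding strategy, objective function, or parallel implementation.

```python
from collections import deque


def ppr_push(adj, seeds, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank with a queue-driven push scheme.

    Assumption: adj maps each node to a list of neighbours (undirected graph,
    no self-loops, no isolated nodes). Pushing stops once every residual
    satisfies r[u] < eps * deg(u).
    """
    p = {}                                      # settled PageRank mass
    r = {s: 1.0 / len(seeds) for s in seeds}    # residual mass still to push
    queue = deque(s for s in seeds if r[s] >= eps * len(adj[s]))
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue                            # defensive: u is no longer active
        p[u] = p.get(u, 0.0) + alpha * ru       # keep the restart share at u
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg_u      # spread the rest over neighbours
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)                 # v has just become active
    return p


def sweep_cut(adj, p):
    """Return the prefix of nodes, ranked by p[u] / deg(u), with minimum conductance."""
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    in_set, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for u in order:
        inside = sum(1 for v in adj[u] if v in in_set)  # edges that stop being cut
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            continue                            # conductance undefined for the whole graph
        phi = cut / denom
        if phi < best_phi:
            best, best_phi = set(in_set), phi
    return best, best_phi


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = ppr_push(graph, seeds=[0])
community, phi = sweep_cut(graph, scores)
print(sorted(community), round(phi, 3))  # [0, 1, 2] 0.143
```

Ranking by `p[u] / deg(u)` rather than by raw score avoids favouring high-degree nodes, and only nodes reached by the push are ever scored, which is what keeps this kind of seed expansion local.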

Measuring Visual Complexity of Cluster-Based Visualizations

Author: Chen M.
Dasgupta A.
Duffy B.
Kosara R.
Walton S.
Publication date: 01/01/2012

Handling visual complexity is a challenging problem in visualization owing to the subjectivity of its definition and the difficulty of devising generalizable quantitative metrics. In this paper we address this challenge by measuring the visual complexity of two common forms of cluster-based visualizations: scatter plots and parallel coordinates. We conceptualize visual complexity as a form of visual uncertainty, which measures how difficult it is for humans to interpret a visual representation correctly. We propose an algorithm for estimating visual complexity for these visualizations using Allen's interval algebra. We first establish a set of primitive 2-cluster cases in scatter plots and another set for parallel coordinates based on symmetric isomorphism. We confirm that both sets are minimal and verify the correctness of their members computationally. We score the uncertainty of each primitive case based on its topological properties, including the existence of overlapping regions, splitting regions, and meeting points or edges. We compare several candidate scoring schemes against a set of subjective scores by humans, and identify the one most consistent with the subjective scores. Finally, we extend the 2-cluster measure to a k-cluster measure as a general-purpose estimator of visual complexity for these two forms of cluster-based visualization.

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive
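Allen's interval algebra, which the visual-complexity paper above builds on, distinguishes 13 possible relations between two intervals. As an illustration only, the sketch below classifies the relation between the extents of two point clusters along one axis (for example, the x-axis of a scatter plot or a single parallel-coordinates axis). Reducing a cluster to its min/max extent, and the helper names `allen_relation` and `axis_relation`, are assumptions; the paper's primitive cases and uncertainty scores are not reproduced.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Assumption: intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining cases: the intervals partially overlap.
    return "overlaps" if a1 < b1 else "overlapped-by"


def axis_relation(cluster_a, cluster_b, axis):
    """Relation between the extents of two point clusters along one axis.

    Assumption (hypothetical helper): each cluster is summarized by its
    min/max coordinate on the axis and has a non-zero extent there.
    """
    def extent(points):
        values = [point[axis] for point in points]
        return min(values), max(values)

    return allen_relation(extent(cluster_a), extent(cluster_b))


cluster_a = [(0.0, 0.0), (2.0, 1.0)]
cluster_b = [(1.0, 5.0), (3.0, 6.0)]
print(axis_relation(cluster_a, cluster_b, 0))  # overlaps
print(axis_relation(cluster_a, cluster_b, 1))  # before
```

In this example the clusters overlap in x but are separated in y, which is the kind of per-axis distinction that topological properties such as overlapping regions and meeting points are built from.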

A Dynamic Clustering and Resource Allocation Algorithm for Downlink CoMP Systems with Multiple Antenna UEs

Author: Baracca Paolo
Benvenuto Nevio
Boccardi Federico
Publication venue: Springer Science and Business Media LLC
Publication date: 20/11/2013

Coordinated multi-point (CoMP) schemes have been widely studied in recent years to tackle inter-cell interference. In practice, latency and throughput constraints on the backhaul allow the organization of only small clusters of base stations (BSs) where joint processing (JP) can be implemented. In this work we focus on downlink CoMP-JP with multiple-antenna user equipments (UEs) and propose a novel dynamic clustering algorithm. The additional degrees of freedom at the UE can be used to suppress the residual interference by using an interference rejection combiner (IRC) and to allow multistream transmission. In our proposal we first define a set of candidate clusters depending on long-term channel conditions. Then, in each time block, we develop a resource allocation scheme by jointly optimizing transmitter and receiver where: a) within each candidate cluster a weighted sum rate is estimated and then b) a set of clusters is scheduled in order to maximize the system weighted sum rate. Numerical results show that much higher rates are achieved when UEs are equipped with multiple antennas. Moreover, as this performance improvement is mainly due to the IRC, the gain achieved by the proposed approach with respect to the non-cooperative scheme decreases as the number of UE antennas increases.

Comment: 27 pages, 8 figures

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova
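Step b) of the CoMP scheme above, choosing a set of candidate clusters that maximizes the system weighted sum rate, can be pictured as a small combinatorial selection. The sketch assumes that each base station may belong to at most one scheduled cluster per time block and that step a) has already produced a weighted-sum-rate estimate for every candidate cluster. The rates in the example are made-up numbers, `schedule_clusters` is a hypothetical name, and the exhaustive search is only practical for a handful of candidates; it is not the paper's allocation algorithm.

```python
from itertools import combinations


def schedule_clusters(candidates):
    """Pick disjoint candidate clusters with the largest total weighted sum rate.

    candidates: dict mapping a frozenset of BS ids to its estimated weighted
    sum rate (assumed to come from a per-cluster estimation step).
    Assumption: a BS may serve in at most one scheduled cluster.
    Exhaustive search, so the cost grows exponentially with len(candidates).
    """
    clusters = list(candidates)
    best, best_rate = (), 0.0
    for k in range(1, len(clusters) + 1):
        for combo in combinations(clusters, k):
            used = set()
            disjoint = True
            for cluster in combo:
                if used & cluster:          # some BS would be used twice
                    disjoint = False
                    break
                used |= cluster
            if not disjoint:
                continue
            rate = sum(candidates[c] for c in combo)
            if rate > best_rate:
                best, best_rate = combo, rate
    return list(best), best_rate


# Hypothetical estimates for four base stations with ids 1..4.
candidates = {
    frozenset({1, 2}): 5.0,
    frozenset({3, 4}): 4.0,
    frozenset({2, 3}): 7.0,
    frozenset({1}): 2.0,
    frozenset({4}): 1.5,
}
chosen, total = schedule_clusters(candidates)
print(sorted(sorted(c) for c in chosen), total)  # [[1], [2, 3], [4]] 10.5
```

Here the joint cluster {2, 3} plus two single-BS clusters beats the two pairwise clusters (10.5 versus 9.0), illustrating why the scheduling step considers sets of clusters rather than ranking clusters one at a time.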

Techniques for clustering gene expression data

Author: Crane Martin
Doolan Padraig
Kerr Gráinne
Ruskin Heather J.
Publication venue: Elsevier BV
Publication date: 01/01/2007

Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choosing suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Scalable and interpretable product recommendations via overlapping co-clustering

Author: Dünner Celestine
Heckel Reinhard
Parnell Thomas
Vlachos Michail
Publication date: 01/04/2017

We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the number of input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphical Processing Units (GPUs).

Comment: In IEEE International Conference on Data Engineering (ICDE) 201

arXiv.org e-Print Archive

Serveur académique lausannois
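The interpretability claim in the co-clustering abstract above rests on each recommendation being traceable to the co-clusters that produced it. A minimal sketch, assuming the overlapping co-clusters have already been learned and are given as (users, items) set pairs; the scoring rule (number of supporting co-clusters) and the `recommend` function are hypothetical simplifications, not the authors' model or their GPU implementation.

```python
def recommend(user, interactions, coclusters, top_n=3):
    """Rank unseen items by how many of the user's co-clusters contain them.

    interactions: dict user -> set of items the user already interacted with.
    coclusters: list of (users, items) pairs of sets; co-clusters may overlap.
    Hypothetical scoring rule: more supporting co-clusters ranks higher.
    Returns (item, explanation) pairs, best first.
    """
    seen = interactions.get(user, set())
    support = {}  # item -> indices of the co-clusters that suggest it
    for idx, (users, items) in enumerate(coclusters):
        if user in users:
            for item in items - seen:
                support.setdefault(item, []).append(idx)
    # More supporting co-clusters first; ties broken by item name for determinism.
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    results = []
    for item, ids in ranked[:top_n]:
        label = ", ".join(str(i) for i in ids)
        results.append((item, f"supported by co-cluster(s) {label}"))
    return results


coclusters = [
    ({"alice", "bob"}, {"laptop", "mouse"}),
    ({"alice", "carol"}, {"mouse", "keyboard"}),
    ({"dave"}, {"monitor"}),
]
interactions = {"alice": {"laptop"}}
recs = recommend("alice", interactions, coclusters)
print(recs)
# [('mouse', 'supported by co-cluster(s) 0, 1'), ('keyboard', 'supported by co-cluster(s) 1')]
```

Because "alice" belongs to two overlapping co-clusters, "mouse" is backed by both and ranks first, and each explanation names the co-clusters that produced the suggestion.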