5,956 research outputs found
Bipartite graph partitioning and data clustering
Many data types arising from data mining applications can be modeled as
bipartite graphs, examples include terms and documents in a text corpus,
customers and purchasing items in market basket analysis and reviewers and
movies in a movie recommender system. In this paper, we propose a new data
clustering method based on partitioning the underlying bipartite graph. The
partition is constructed by minimizing a normalized sum of edge weights between
unmatched pairs of vertices of the bipartite graph. We show that an approximate
solution to the minimization problem can be obtained by computing a partial
singular value decomposition (SVD) of the associated edge weight matrix of the
bipartite graph. We point out the connection of our clustering algorithm to
correspondence analysis used in multivariate analysis. We also briefly discuss
the issue of assigning data objects to multiple clusters. In the experimental
results, we apply our clustering algorithm to the problem of document
clustering to illustrate its effectiveness and efficiency.Comment: Proceedings of ACM CIKM 2001, the Tenth International Conference on
Information and Knowledge Management, 200
Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
Graph clustering, or community detection, is the task of identifying groups
of closely related objects in a large network. In this paper we introduce a new
community-detection framework called LambdaCC that is based on a specially
weighted version of correlation clustering. A key component in our methodology
is a clustering resolution parameter, , which implicitly controls the
size and structure of clusters formed by our framework. We show that, by
increasing this parameter, our objective effectively interpolates between two
different strategies in graph clustering: finding a sparse cut and forming
dense subgraphs. Our methodology unifies and generalizes a number of other
important clustering quality functions including modularity, sparsest cut, and
cluster deletion, and places them all within the context of an optimization
problem that has been well studied from the perspective of approximation
algorithms. Our approach is particularly relevant in the regime of finding
dense clusters, as it leads to a 2-approximation for the cluster deletion
problem. We use our approach to cluster several graphs, including large
collaboration networks and social networks
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize on. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
Extracting information from informal communication
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (leaves 89-93).This thesis focuses on the problem of extracting information from informal communication. Textual informal communication, such as e-mail, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand due to lack of editing and structure. Thus, techniques which work well for formal text, such as newspaper articles, may be considered insufficient on informal text. One focus of ours is to attempt to advance the state-of-the-art for sub-problems of the information extraction task. We make contributions to the problems of named entity extraction, co-reference resolution and context tracking. We channel our efforts toward methods which are particularly applicable to informal communication. We also consider a type of information which is somewhat unique to informal communication: preferences and opinions. Individuals often expression their opinions on products and services in such communication. Others' may read these "reviews" to try to predict their own experiences. However, humans do a poor job of aggregating and generalizing large sets of data. We develop techniques that can perform the job of predicting unobserved opinions.(cont.) We address both the single-user case where information about the items is known, and the multi-user case where we can generalize opinions without external information. Experiments on large-scale rating data sets validate our approach.by Jason D.M. Rennie.Ph.D
Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) has been continuously evolving in
several areas like pattern recognition and information retrieval methods. It
factorizes a matrix into a product of 2 low-rank non-negative matrices that
will define parts-based, and linear representation of nonnegative data.
Recently, Graph regularized NMF (GrNMF) is proposed to find a compact
representation,which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure. In GNMF, an affinity graph is constructed
from the original data space to encode the geometrical information. In this
paper, we propose a novel idea which engages a Multiple Kernel Learning
approach into refining the graph structure that reflects the factorization of
the matrix and the new data space. The GrNMF is improved by utilizing the graph
refined by the kernel learning, and then a novel kernel learning method is
introduced under the GrNMF framework. Our approach shows encouraging results of
the proposed algorithm in comparison to the state-of-the-art clustering
algorithms like NMF, GrNMF, SVD etc.Comment: This paper has been withdrawn by the author due to the terrible
writin
- …