Improved Theoretical and Practical Guarantees for Chromatic Correlation Clustering
We study a natural generalization of the correlation clustering problem to graphs in which the pairwise relations between objects are categorical instead of binary. This problem was recently introduced by Bonchi et al. under the name of chromatic correlation clustering, and is motivated by many real-world applications in data mining and social networks, including community detection, link classification, and entity de-duplication. Our main contribution is a fast and easy-to-implement constant approximation framework for the problem, which builds on a novel reduction of the problem to that of correlation clustering. This result significantly advances the current state of knowledge for the problem, improving on a previous result that only guaranteed an approximation ratio linear in the input size. We complement the above result by developing a linear programming-based algorithm that achieves an improved approximation ratio of 4. Although this algorithm cannot be considered practical, it further extends our theoretical understanding of chromatic correlation clustering. We also present a fast heuristic algorithm that is motivated by real-life scenarios in which there is a ground-truth clustering that is obscured by noisy observations. We test our algorithms on both synthetic and real datasets, such as social network data. Our experiments reinforce the theoretical findings by demonstrating that our algorithms generally outperform previous approaches, both in terms of solution cost and reconstruction of an underlying ground-truth clustering.
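To make the objective concrete, here is a minimal Python sketch of the chromatic correlation clustering cost, assuming the formulation usually attributed to Bonchi et al.: every pair of objects carries either a color or a special "no relation" label (here `None`), each cluster receives one color, a pair inside a cluster costs 1 unless its label equals the cluster's color, and a pair split across clusters costs 1 unless it has no relation. The helper names are hypothetical, and the code only evaluates a given clustering; it is not the reduction or any of the algorithms described above. For a fixed cluster, the cost-minimizing color is simply the most frequent color among its internal pairs, which is what `best_cluster_color` picks.

```python
from collections import Counter
from itertools import combinations

NO_RELATION = None  # stands in for the special "no relation" label


def label_of(labels, u, v):
    """Label of the unordered pair {u, v}; unlisted pairs have no relation."""
    return labels.get(frozenset((u, v)), NO_RELATION)


def best_cluster_color(cluster, labels):
    """Most frequent color among the cluster's internal pairs (None if there is none)."""
    counts = Counter(label_of(labels, u, v) for u, v in combinations(cluster, 2))
    counts.pop(NO_RELATION, None)
    return counts.most_common(1)[0][0] if counts else None


def chromatic_cost(clusters, labels):
    """Chromatic correlation clustering cost, each cluster using its best color."""
    where = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    colors = [best_cluster_color(cluster, labels) for cluster in clusters]
    cost = 0
    for u, v in combinations(list(where), 2):
        lab = label_of(labels, u, v)
        if where[u] == where[v]:
            # Inside a cluster: pay unless the pair has the cluster's color.
            cost += lab is NO_RELATION or lab != colors[where[u]]
        else:
            # Across clusters: pay for every colored pair that gets separated.
            cost += lab is not NO_RELATION
    return cost


labels = {
    frozenset((1, 2)): "red",
    frozenset((2, 3)): "red",
    frozenset((1, 3)): "blue",
    frozenset((3, 4)): "blue",
}
clusters = [{1, 2, 3}, {4}]
print(best_cluster_color(clusters[0], labels), chromatic_cost(clusters, labels))
```

In this example the cluster {1, 2, 3} is colored red and the cost is 2: one blue pair inside it and one blue pair crossing to {4}.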
Overlapping and Robust Edge-Colored Clustering in Hypergraphs
A recent trend in data mining has explored (hyper)graph clustering algorithms
for data with categorical relationship types. Such algorithms have applications
in the analysis of social, co-authorship, and protein interaction networks, to
name a few. Many such applications naturally have some overlap between
clusters, a nuance which is missing from current combinatorial models.
Additionally, existing models lack a mechanism for handling noise in datasets.
We address these concerns by generalizing Edge-Colored Clustering, a recent
framework for categorical clustering of hypergraphs. Our generalizations allow
for a budgeted number of either (a) overlapping cluster assignments or (b) node
deletions. For each new model, we present a greedy algorithm that approximately
minimizes an edge mistake objective, as well as bicriteria approximations in which
the second approximation factor applies to the budget. Additionally, we address the
parameterized complexity of each problem, providing FPT algorithms and hardness
results.
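A minimal Python sketch of how a solution to the overlapping variant might be scored, assuming the standard Edge-Colored Clustering convention that a hyperedge is satisfied only when every one of its nodes is assigned the hyperedge's color, and assuming, purely for illustration, that the overlap budget counts each color a node receives beyond its first. The function names and this budget accounting are hypothetical; the greedy and bicriteria algorithms themselves are not reproduced here.

```python
def edge_mistakes(hyperedges, assignment):
    """Count hyperedges whose color is missing from at least one member node.

    hyperedges: list of (nodes, color) pairs
    assignment: dict mapping each node to the set of colors it is assigned
    """
    return sum(
        1
        for nodes, color in hyperedges
        if any(color not in assignment[v] for v in nodes)
    )


def overlap_used(assignment):
    """Assumed budget usage: colors assigned to a node beyond its first one."""
    return sum(max(len(colors) - 1, 0) for colors in assignment.values())


def within_budget(assignment, budget):
    """True if the assignment spends at most `budget` extra color assignments."""
    return overlap_used(assignment) <= budget


hyperedges = [((1, 2, 3), "a"), ((3, 4), "b"), ((1, 4), "a")]
assignment = {1: {"a"}, 2: {"a"}, 3: {"a", "b"}, 4: {"b"}}
print(edge_mistakes(hyperedges, assignment), overlap_used(assignment))
```

Here node 3 spends one unit of overlap budget to hold both colors, which satisfies the first two hyperedges; only {1, 4} remains a mistake, because node 4 lacks color a. With exactly one color per node, `edge_mistakes` reduces to the ordinary Edge-Colored Clustering objective.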
Efficient Correlation Clustering Methods for Large Consensus Clustering Instances
Consensus clustering (or clustering aggregation) takes as input a collection of
partitions of a given ground set, and seeks to create a single partition that
minimizes disagreement with all input partitions. State-of-the-art algorithms for
consensus clustering are based on correlation clustering methods such as the
popular Pivot algorithm. Unfortunately, these methods have not proved to be
practical for consensus clustering instances in which either the number of input
partitions or the size of the ground set gets large.
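The disagreement objective can be made concrete with a short Python sketch. Each partition is encoded as a list of cluster labels (position i holds the cluster of element i, an encoding chosen here for illustration), two partitions disagree on a pair of elements when exactly one of them puts the pair together, and the consensus cost of a candidate is its total disagreement with all input partitions. This naive version inspects every pair, so it takes time quadratic in the size of the ground set per input partition.

```python
from itertools import combinations


def pair_disagreements(p, q):
    """Number of element pairs that exactly one of the partitions p, q puts together."""
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )


def consensus_cost(candidate, inputs):
    """Total disagreement between a candidate partition and all input partitions."""
    return sum(pair_disagreements(candidate, part) for part in inputs)


inputs = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
print(consensus_cost([0, 0, 1, 1], inputs))
```

For these three inputs, the candidate [0, 0, 1, 1] matches the first input exactly and disagrees with each of the other two on three pairs, for a total cost of 6.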
In this paper we provide practical run time improvements for correlation
clustering solvers when the ground set is large. We reduce both the time
complexity and the space complexity of Pivot, a significant savings since in
practice the number of input partitions is much smaller than the size of the
ground set. We also analyze a sampling method for these algorithms when the
number of input partitions is large, bridging the gap between running Pivot on
the full set of input partitions (an expected 1.57-approximation) and choosing a
single input partition at random (an expected 2-approximation). We show
experimentally that algorithms like Pivot do obtain high-quality clustering
results in practice, even on small samples of input partitions.
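The following Python sketch illustrates the Pivot idea on a consensus instance without materializing the pairwise matrix: a pair counts as "+" when the two elements are co-clustered in a strict majority of the input partitions, and that test is recomputed from the partitions' label lists only when Pivot needs it, so memory stays proportional to the input labels rather than to the number of element pairs. The optional `sample_size` parameter runs Pivot on a uniformly random subset of the input partitions, in the spirit of the sampling method mentioned above. The function name, the majority rule, and the sampling scheme are assumptions for illustration; this is neither the paper's implementation nor necessarily the variant behind the stated approximation guarantees, and it does not attempt the paper's running-time improvements.

```python
import random


def pivot_consensus(inputs, sample_size=None, seed=None):
    """Pivot on the majority-vote graph of a consensus clustering instance.

    inputs: list of partitions, each a list whose entry i is the cluster label of element i
    sample_size: if given, run on this many input partitions chosen at random
    Returns a list of cluster ids, one per element.
    """
    rng = random.Random(seed)
    parts = inputs if sample_size is None else rng.sample(inputs, sample_size)
    n = len(parts[0])

    def positive(u, v):
        # '+' pair: co-clustered in a strict majority of the partitions in use.
        together = sum(p[u] == p[v] for p in parts)
        return 2 * together > len(parts)

    remaining = list(range(n))
    rng.shuffle(remaining)  # pivot = first remaining element of a random permutation
    result = [None] * n
    cluster_id = 0
    while remaining:
        pivot = remaining[0]
        members = {pivot} | {v for v in remaining[1:] if positive(pivot, v)}
        for v in members:
            result[v] = cluster_id
        remaining = [v for v in remaining if v not in members]
        cluster_id += 1
    return result


print(pivot_consensus([[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]], seed=0))
```

With identical inputs, every run returns the input clustering up to relabeling. With an even number of partitions in use, an exact tie counts as "-" under this majority rule.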
Cost-optimal constrained correlation clustering via weighted partial Maximum Satisfiability
Peer reviewed
A Social Network Image Classification Algorithm Based on Multimodal Deep Learning
The complex data structure and massive volume of image data in social networks pose a major challenge to mining the associations among social information. To classify social network images accurately, this paper proposes a social network image classification algorithm based on multimodal deep learning. First, a social network association clustering model (SNACM) was established and used to calculate trust and similarity, which represent the degree of similarity between users. Based on the artificial ant colony algorithm, the SNACM was subjected to weighted stacking, and the social network image association network was constructed. Next, social network images in three modes, i.e., RGB (red-green-blue) images, grayscale images, and depth images, were fused. Finally, a three-dimensional neural network (3D NN) was constructed to extract features from the multimodal social network images. Experiments demonstrated the validity and accuracy of the proposed algorithm. The results provide a reference for applying multimodal deep learning to image classification in other fields.
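The abstract does not specify how the RGB, grayscale, and depth modes are fused. One simple possibility, shown here purely as an assumption, is early fusion: stack the three RGB channels, a grayscale image derived with the ITU-R BT.601 luma weights, and the depth map into a single (slices, height, width) volume that a 3D network can convolve over. The sketch uses plain Python lists to stay dependency-free; the function names are hypothetical, and the paper's actual fusion step and 3D NN architecture may differ.

```python
def rgb_to_gray(rgb):
    """Grayscale via ITU-R BT.601 luma weights (an assumed conversion)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def fuse_modalities(rgb, depth):
    """Early fusion: return a list of 2D slices [R, G, B, gray, depth].

    rgb:   H x W nested list of (r, g, b) tuples
    depth: H x W nested list of depth values, aligned pixel-for-pixel with rgb
    """
    if len(rgb) != len(depth) or any(len(a) != len(b) for a, b in zip(rgb, depth)):
        raise ValueError("RGB and depth images must have the same shape")
    # One 2D slice per color channel.
    channels = [[[px[c] for px in row] for row in rgb] for c in range(3)]
    return channels + [rgb_to_gray(rgb), [list(row) for row in depth]]


rgb = [[(255, 255, 255), (0, 0, 0)]]
depth = [[1.0, 2.0]]
volume = fuse_modalities(rgb, depth)
print(len(volume), len(volume[0]), len(volume[0][0]))
```

Stacking along a new slice axis keeps pixels aligned across modalities, which is what lets a 3D convolution mix information between them; late fusion of separately extracted features is an equally plausible alternative.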
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging, as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then go a step further and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude fewer than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
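For a fixed vector k = (k_1, ..., k_L) with one threshold per layer, a single multilayer core can be computed by peeling, assuming the usual definition that the k-core is the largest set of nodes whose induced subgraph has minimum degree at least k_l in every layer l. The Python sketch below (function name hypothetical) performs exactly one such computation; the algorithms described above differ in how they visit the exponentially large lattice of vectors k and prune it, which this sketch does not attempt. Removing a node can only lower its neighbors' degrees, so any node that violates a threshold can be discarded immediately, and the order of removals does not change the result.

```python
from collections import deque


def multilayer_core(nodes, layers, k):
    """Nodes of the multilayer k-core under the assumed definition.

    nodes:  iterable of node ids
    layers: list of edge lists, one per layer
    k:      per-layer minimum-degree thresholds, len(k) == len(layers)
    """
    nodes = list(nodes)
    n_layers = len(layers)
    adj = [{v: set() for v in nodes} for _ in range(n_layers)]
    for i, edges in enumerate(layers):
        for u, v in edges:
            if u != v:  # ignore self-loops
                adj[i][u].add(v)
                adj[i][v].add(u)
    # deg[i][v] = number of still-alive neighbors of v in layer i
    deg = [{v: len(adj[i][v]) for v in nodes} for i in range(n_layers)]
    alive = set(nodes)

    def violates(v):
        return any(deg[i][v] < k[i] for i in range(n_layers))

    queue = deque(v for v in nodes if violates(v))
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue  # already peeled through an earlier queue entry
        alive.remove(v)
        for i in range(n_layers):
            for u in adj[i][v]:
                if u in alive:
                    deg[i][u] -= 1
                    if violates(u):
                        queue.append(u)
    return alive


nodes = [1, 2, 3, 4]
layers = [[(1, 2), (2, 3), (1, 3), (3, 4)], [(1, 2), (2, 3), (1, 3)]]
print(sorted(multilayer_core(nodes, layers, (2, 2))))
```

In the example, the triangle {1, 2, 3} is dense in both layers, so it survives k = (2, 2), while node 4 survives only threshold vectors that ask nothing of the second layer.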
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off high density against the number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting.
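A small worked computation may help with the density trade-off. Assuming the multilayer densest-subgraph objective takes the form delta(S) = max over layer subsets L' of (min over l in L' of |E_l(S)| / |S|) * |L'|^beta, where beta >= 0 controls how strongly additional layers are rewarded, the best subset of each size j consists of the j layers in which S is densest, so the maximum can be found by sorting the per-layer densities. The Python sketch below (hypothetical name) evaluates this assumed objective for a given node set; it does not search for the optimal S.

```python
def multilayer_density(S, layers, beta):
    """Evaluate the assumed multilayer density objective for node set S.

    layers: list of edge lists, one per layer (simple graphs, no duplicate edges)
    beta:   non-negative weight rewarding the number of layers used
    """
    S = set(S)
    if not S:
        raise ValueError("S must be non-empty")
    per_layer = [
        sum(1 for u, v in edges if u in S and v in S) / len(S) for edges in layers
    ]
    per_layer.sort(reverse=True)
    # Among all choices of j layers, the j densest ones maximize the minimum,
    # and that minimum is the j-th largest density, per_layer[j - 1].
    return max(d * j ** beta for j, d in enumerate(per_layer, start=1))


layers = [[(1, 2), (2, 3), (1, 3)], [(1, 2)]]
print(multilayer_density({1, 2, 3}, layers, 1), multilayer_density({1, 2, 3}, layers, 2))
```

For S = {1, 2, 3}, with a triangle in one layer and a single edge in the other, beta = 1 favors the dense layer alone (value 1.0), while beta = 2 favors using both layers (value 4/3).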
Correlation Clustering with Sherali-Adams
Given a complete graph in which each edge is labeled + or -,
the Correlation Clustering problem asks to partition the vertices into clusters to
minimize the number of + edges between different clusters plus the number of
- edges within the same cluster. Correlation Clustering has been used to model
a large number of clustering problems in practice, making it one of the most
widely studied clustering formulations. The approximability of Correlation
Clustering has been actively investigated [BBC04, CGW05, ACN08], culminating in
the approximation algorithm of [CMSY15], based on rounding the standard LP
relaxation. Since the integrality gap for this formulation is 2, it has
remained a major open question to determine whether the approximation factor of 2
can be reached, or even breached.
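The objective can be stated in a few lines of Python. In this sketch (names hypothetical), `signs` maps every unordered vertex pair of the complete graph to "+" or "-", a clustering maps each vertex to a cluster id, and the cost counts "+" pairs split across clusters plus "-" pairs placed together.

```python
from itertools import combinations


def disagreements(signs, clustering):
    """Correlation Clustering cost of `clustering` (a dict: vertex -> cluster id)."""
    cost = 0
    for u, v in combinations(list(clustering), 2):
        together = clustering[u] == clustering[v]
        positive = signs[frozenset((u, v))] == "+"
        cost += positive != together  # '+' pair split apart, or '-' pair kept together
    return cost


plus_pairs = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4)]}
signs = {
    frozenset(p): "+" if frozenset(p) in plus_pairs else "-"
    for p in combinations([1, 2, 3, 4], 2)
}
print(disagreements(signs, {1: 0, 2: 0, 3: 1, 4: 1}))
```

Here the only mistake is the "+" pair {2, 3}, which the clustering separates, so the cost is 1. Putting all four vertices in one cluster instead pays for the three "-" pairs.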
In this paper, we answer this question affirmatively by showing that there
exists an approximation algorithm with ratio below 2 based on the
Sherali-Adams hierarchy. In order to round a solution to the Sherali-Adams
relaxation, we adapt the correlated rounding originally developed for CSPs
[BRS11, GS11, RT12]. With this tool, we obtain an intermediate approximation
ratio for Correlation Clustering. To breach this ratio, we go beyond the
traditional triangle-based analysis by employing a global charging scheme that
amortizes the total cost of the rounding across different triangles.
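Triangle-based analyses of Correlation Clustering (for example, the analysis of the Pivot algorithm in [ACN08]) revolve around "bad" triangles, in which two pairs are labeled + and the third is labeled -: no clustering can satisfy all three pairs, so each bad triangle forces a cost of at least 1. The brute-force Python check below (helper name hypothetical) confirms this by trying every assignment of the three vertices to clusters. It illustrates the local, per-triangle reasoning that the global charging scheme goes beyond, not the charging scheme itself.

```python
from itertools import combinations, product


def triangle_min_cost(signs):
    """Minimum Correlation Clustering cost on vertices {0, 1, 2}.

    signs maps each pair (u, v) with u < v to '+' or '-'.
    """
    best = None
    # Giving each vertex one of 3 cluster ids covers every partition of 3 vertices.
    for assign in product(range(3), repeat=3):
        cost = sum(
            (signs[(u, v)] == "+") != (assign[u] == assign[v])
            for u, v in combinations(range(3), 2)
        )
        best = cost if best is None else min(best, cost)
    return best


bad = {(0, 1): "+", (1, 2): "+", (0, 2): "-"}
good = {(0, 1): "+", (1, 2): "+", (0, 2): "+"}
print(triangle_min_cost(bad), triangle_min_cost(good))
```

The bad triangle costs at least 1 under every clustering, while an all-"+" triangle costs 0 when its vertices share a cluster.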