Window-based Streaming Graph Partitioning Algorithm
In recent years, the scale of graph datasets has increased to such a
degree that a single machine is not capable of efficiently processing large
graphs. Efficient graph partitioning is therefore necessary for these large
graph applications. Traditional graph partitioning generally loads the whole
graph data into the memory before performing partitioning; this is not only a
time consuming task but it also creates memory bottlenecks. These issues of
memory limitation and enormous time complexity can be resolved using
stream-based graph partitioning. A streaming graph partitioning algorithm reads
each vertex once and assigns it to a partition as it arrives; this is also
called a one-pass algorithm. This paper proposes an efficient window-based
streaming graph partitioning algorithm called WStream. The WStream algorithm is
an edge-cut partitioning algorithm, which distributes vertices among the
partitions. Our results suggest that the WStream algorithm is able to partition
large graph data efficiently while keeping the load balanced across different
partitions, and communication to a minimum. Evaluation results with real
workloads also demonstrate the effectiveness of our proposed algorithm, which
achieves a significant reduction in load imbalance and edge-cut across different
ranges of datasets.
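To make the one-pass idea concrete, here is a minimal sketch of a generic greedy streaming edge-cut partitioner. It is not the WStream algorithm, whose windowing details are not given in the abstract. The function names, the `capacity` parameter, and the scoring rule (already-placed neighbours in a partition, discounted by how full that partition is) are assumptions chosen for illustration.

```python
def stream_partition(vertex_stream, num_parts, capacity):
    """Assign each vertex once, as it arrives, to one of num_parts partitions.

    vertex_stream yields (vertex, neighbors) pairs. capacity is a soft
    per-partition target size (an assumed parameter, not from the abstract).
    """
    assignment = {}
    loads = [0] * num_parts
    for vertex, neighbors in vertex_stream:
        # Count neighbours that were already placed in each partition.
        counts = [0] * num_parts
        for nb in neighbors:
            part = assignment.get(nb)
            if part is not None:
                counts[part] += 1
        # Prefer partitions holding many neighbours, discounted by fullness;
        # break ties in favour of the lighter partition.
        best = max(
            range(num_parts),
            key=lambda p: (counts[p] * (1 - loads[p] / capacity), -loads[p]),
        )
        assignment[vertex] = best
        loads[best] += 1
    return assignment


def edge_cut(edges, assignment):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = {v: [] for v in range(6)}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)

result = stream_partition(((v, adjacency[v]) for v in range(6)), 2, 3)
cut = edge_cut(edges, result)
```

Each vertex is inspected exactly once and only the assignments made so far are kept in memory, which is what distinguishes this from partitioners that load the whole graph first.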
Streaming, Memory Limited Algorithms for Community Detection
In this paper, we consider sparse networks consisting of a finite number of
non-overlapping communities, i.e. disjoint clusters, so that there is higher
density within clusters than across clusters. Both the intra- and inter-cluster
edge densities vanish when the size of the graph grows large, making the
cluster reconstruction problem noisier and hence difficult to solve. We are
interested in scenarios where the network size is very large, so that the
adjacency matrix of the graph is hard to manipulate and store. The data stream
model in which columns of the adjacency matrix are revealed sequentially
constitutes a natural framework in this setting. For this model, we develop two
novel clustering algorithms that extract the clusters asymptotically
accurately. The first algorithm is offline, as it needs to store and keep
the assignments of nodes to clusters, and requires a memory that scales
linearly with the network size. The second algorithm is online, as it may
classify a node when the corresponding column is revealed and then discard this
information. This algorithm requires a memory growing sub-linearly with the
network size. To construct these efficient streaming memory-limited clustering
algorithms, we first address the problem of clustering with partial
information, where only a small proportion of the columns of the adjacency
matrix is observed, and develop, for this setting, a new spectral algorithm
which is of independent interest. Comment: NIPS 201
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework utilizes existing community detection
algorithms that process networks imputed by our link prediction based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average, our method boosts the performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook.
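The merging step can be illustrated with a small co-association sketch: two nodes end up in the same consensus cluster when more than a chosen fraction of the input partitions put them together. This is a generic consensus technique, not necessarily the rule the authors use. The function name `consensus_partition` and the `threshold` parameter are assumptions for illustration.

```python
from itertools import combinations


def consensus_partition(partitions, threshold=0.5):
    """Merge several clusterings of the same n nodes into one.

    partitions is a list of label lists, each of length n. Nodes u and v are
    linked when the fraction of partitions that co-cluster them exceeds
    threshold; consensus clusters are the connected components of those links.
    """
    n = len(partitions[0])
    runs = len(partitions)
    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        together = sum(1 for labels in partitions if labels[u] == labels[v])
        if together / runs > threshold:
            parent[find(u)] = find(v)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


outputs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = consensus_partition(outputs)
```

In the setting described above, each entry of `outputs` would come from running an existing community detection algorithm on a differently imputed copy of the network.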
Optimal learning of joint alignments with a faulty oracle
We consider the following problem, which is useful in applications such as joint image and
shape alignment. The goal is to recover n discrete variables $g_i \in \{0, \ldots, k-1\}$ (up to some
global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$;
specifically, with probability $\frac{1}{k} + \delta$ for some $\delta > 0$ one obtains the correct answer, and with
the remaining probability one obtains a uniformly random incorrect answer. We consider a
learning-based formulation where one can perform a query to observe a pairwise difference, and
the goal is to perform as few queries as possible while obtaining the exact joint alignment.
We provide an easy-to-implement, time-efficient algorithm that performs $O\left(\frac{n \lg n}{k \delta^2}\right)$ queries and
recovers the joint alignment with high probability. We also show that our algorithm is optimal
by proving a general lower bound that holds for all non-adaptive algorithms. Our work significantly
improves on recent work by Chen and Candès [CC16], who view the problem as a constrained
principal components analysis problem that can be solved using the power method. Specifically,
our approach is simpler both in the algorithm and the analysis, and provides additional insights
into the problem structure. First author draft
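The observation model can be sketched in a few lines. The code below only simulates the faulty oracle and checks whether an estimate matches the truth up to a global offset; it does not implement the authors' query algorithm. The names `make_oracle`, `equal_up_to_offset`, and `delta` (the margin by which the correct-answer probability exceeds 1/k) are assumptions for illustration.

```python
import random


def make_oracle(g, k, delta, rng):
    """Return query(i, j) giving a noisy value of (g[i] - g[j]) mod k.

    The answer is correct with probability 1/k + delta; otherwise it is
    drawn uniformly from the k - 1 incorrect values. Requires k >= 2.
    """
    p_correct = 1.0 / k + delta

    def query(i, j):
        truth = (g[i] - g[j]) % k
        if rng.random() < p_correct:
            return truth
        # Uniform draw over {0, ..., k-1} with the true value skipped.
        wrong = rng.randrange(k - 1)
        return wrong if wrong < truth else wrong + 1

    return query


def equal_up_to_offset(estimate, truth, k):
    """True when estimate[i] - truth[i] mod k is the same for every i."""
    offsets = {(e - t) % k for e, t in zip(estimate, truth)}
    return len(offsets) <= 1


k = 4
g = [0, 1, 3, 2, 1]

# With delta = 1 - 1/k the oracle never errs, so differences against node 0
# recover g up to the global offset g[0].
exact = make_oracle(g, k, 1 - 1 / k, random.Random(0))
recovered = [exact(i, 0) for i in range(len(g))]

# With a small delta most answers are wrong, but all stay in {0, ..., k-1}.
noisy = make_oracle(g, k, 0.05, random.Random(1))
samples = [noisy(2, 0) for _ in range(200)]
```

Recovery is only defined up to a global offset because adding the same constant to every variable leaves all pairwise differences unchanged, which is why the check compares offsets rather than raw values.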