5,225 research outputs found
Distributed Data Summarization in Well-Connected Networks
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.
In the CONGEST~ model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D+ n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^{N} g(f_i) exactly in {tau_{G}} * 2^{O(sqrt{log n})} rounds where {tau_{G}} is the mixing time of G. This also has applications in computing the top k most frequent elements.
We demonstrate that there is a high similarity between the GOSSIP~ model and the CONGEST~ model in well-connected graphs. In particular, we show that each round of the GOSSIP~ model can be simulated almost perfectly in O~({tau_{G}}) rounds of the CONGEST~ model. To this end, we develop a new algorithm for the GOSSIP~ model that 1 +/- epsilon approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) roundsfor p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST~ model with a factor O~({tau_{G}}) blow-up in the number of rounds
On the Distributed Complexity of Large-Scale Graph Computations
Motivated by the increasing need to understand the distributed algorithmic
foundations of large-scale graph computations, we study some fundamental graph
problems in a message-passing model for distributed computing where
machines jointly perform computations on graphs with nodes (typically, ). The input graph is assumed to be initially randomly partitioned among
the machines, a common implementation in many real-world systems.
Communication is point-to-point, and the goal is to minimize the number of
communication {\em rounds} of the computation.
Our main contribution is the {\em General Lower Bound Theorem}, a theorem
that can be used to show non-trivial lower bounds on the round complexity of
distributed large-scale data computations. The General Lower Bound Theorem is
established via an information-theoretic approach that relates the round
complexity to the minimal amount of information required by machines to solve
the problem. Our approach is generic and this theorem can be used in a
"cookbook" fashion to show distributed lower bounds in the context of several
problems, including non-graph problems. We present two applications by showing
(almost) tight lower bounds for the round complexity of two fundamental graph
problems, namely {\em PageRank computation} and {\em triangle enumeration}. Our
approach, as demonstrated in the case of PageRank, can yield tight lower bounds
for problems (including, and especially, under a stochastic partition of the
input) where communication complexity techniques are not obvious.
Our approach, as demonstrated in the case of triangle enumeration, can yield
stronger round lower bounds as well as message-round tradeoffs compared to
approaches that use communication complexity techniques
Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs
To make machines better understand sentiments, research needs to move from
polarity identification to understanding the reasons that underlie the
expression of sentiment. Categorizing the goals or needs of humans is one way
to explain the expression of sentiment in text. Humans are good at
understanding situations described in natural language and can easily connect
them to the character's psychological needs using commonsense knowledge. We
present a novel method to extract, rank, filter and select multi-hop relation
paths from a commonsense knowledge resource to interpret the expression of
sentiment in terms of their underlying human needs. We efficiently integrate
the acquired knowledge paths in a neural model that interfaces context
representations with knowledge using a gated attention mechanism. We assess the
model's performance on a recently published dataset for categorizing human
needs. Selectively integrating knowledge paths boosts performance and
establishes a new state-of-the-art. Our model offers interpretability through
the learned attention map over commonsense knowledge paths. Human evaluation
highlights the relevance of the encoded knowledge
Ant-Inspired Density Estimation via Random Walks
Many ant species employ distributed population density estimation in
applications ranging from quorum sensing [Pra05], to task allocation [Gor99],
to appraisal of enemy colony strength [Ada90]. It has been shown that ants
estimate density by tracking encounter rates -- the higher the population
density, the more often the ants bump into each other [Pra05,GPT93].
We study distributed density estimation from a theoretical perspective. We
prove that a group of anonymous agents randomly walking on a grid are able to
estimate their density within a small multiplicative error in few steps by
measuring their rates of encounter with other agents. Despite dependencies
inherent in the fact that nearby agents may collide repeatedly (and, worse,
cannot recognize when this happens), our bound nearly matches what would be
required to estimate density by independently sampling grid locations.
From a biological perspective, our work helps shed light on how ants and
other social insects can obtain relatively accurate density estimates via
encounter rates. From a technical perspective, our analysis provides new tools
for understanding complex dependencies in the collision probabilities of
multiple random walks. We bound the strength of these dependencies using
of the underlying graph. Our results extend beyond
the grid to more general graphs and we discuss applications to size estimation
for social networks and density estimation for robot swarms
- …