Distributed anonymous discrete function computation
We propose a model for deterministic distributed function computation by a
network of identical and anonymous nodes. In this model, each node has bounded
computation and storage capabilities that do not grow with the network size.
Furthermore, each node only knows its neighbors, not the entire graph. Our goal
is to characterize the class of functions that can be computed within this
model. In our main result, we provide a necessary condition for computability
which we show to be nearly sufficient, in the sense that every function that
satisfies this condition can at least be approximated. The problem of computing
suitably rounded averages in a distributed manner plays a central role in our
development; we provide an algorithm that solves it in time that grows
quadratically with the size of the network.
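To make "suitably rounded averages" concrete, here is a minimal sketch that uses a swapped-in technique: a randomized quantized-consensus gossip rule. It is not the paper's deterministic algorithm. On a randomly chosen edge, two integer-valued nodes move one unit toward each other when their values differ by 2 or more, and swap values when they differ by exactly 1. Every update preserves the total, so any state where all values lie in [floor(avg), ceil(avg)] is a correctly rounded average. On a connected graph this rule is known to reach such a state with probability 1. The loop is still capped at `max_steps`, so callers should check the result. The function name, the edge-list input and the random edge schedule are illustrative assumptions.

```python
import random


def quantized_average(values, edges, max_steps=100_000, seed=0):
    """Randomized quantized gossip on an undirected graph (illustrative sketch).

    values: list of integers, one per node.
    edges:  list of (i, j) node-index pairs; the graph is assumed connected.
    Each step updates only the two endpoints of one random edge, so the sum
    of all values is preserved at every step.
    """
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    total = sum(x)
    lo, hi = total // n, -(-total // n)  # floor and ceiling of the average
    for _ in range(max_steps):
        # Stop once every node holds a rounded version of the average.
        if all(lo <= v <= hi for v in x):
            break
        i, j = rng.choice(edges)
        d = x[i] - x[j]
        if d >= 2:        # i is well above j: move one unit from i to j
            x[i] -= 1
            x[j] += 1
        elif d <= -2:     # j is well above i: move one unit from j to i
            x[i] += 1
            x[j] -= 1
        elif d != 0:      # values differ by exactly 1: swap them
            x[i], x[j] = x[j], x[i]
    return x


path = [(0, 1), (1, 2), (2, 3)]
print(quantized_average([0, 0, 0, 9], path))  # every entry is 2 or 3; they sum to 9
```

Each node needs only its own value and one neighbour's value per step. This echoes the bounded-memory, neighbours-only setting of the model, although the random edge choice departs from its determinism.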
An Iterative Scheme for Leverage-based Approximate Aggregation
The current data explosion makes efficient and accurate approximate
aggregation a major challenge. To address this problem, we
propose a novel approach to calculate the aggregation answers with a high
accuracy using only a small portion of the data. We introduce leverages to
reflect individual differences in the samples from a statistical perspective.
Two kinds of estimators, the leverage-based estimator, and the sketch estimator
(a "rough picture" of the aggregation answer), are in constraint relations and
iteratively improved according to the actual conditions until their difference
is below a threshold. Due to the iteration mechanism and the leverages, our
approach achieves high accuracy. Moreover, features such as not needing to
record the sampled data and being easy to extend to various execution modes
(e.g., the online mode) make our approach well suited to big data.
Experiments show that our approach performs well: compared with uniform
sampling, it achieves high-quality answers with only 1/3 of the sample size.
Comment: 17 pages, 9 figures
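The iteration between a weighted estimator and a rough "sketch" estimator can be loosely illustrated as follows. This is a hypothetical simplification, not the authors' method. The per-row `scores` stand in for the paper's leverages, which are not defined here, and must be positive. The weighted estimator is the standard Hansen-Hurwitz estimator for sampling with replacement. The rough estimator scales up a uniform sample mean. The sample grows in batches until the two estimates differ by at most a relative tolerance `tol`. All names and parameters are assumptions.

```python
import random


def weighted_sum_estimate(values, scores, m, rng):
    """Hansen-Hurwitz estimate of sum(values) from m draws with replacement,
    where row i is drawn with probability scores[i] / sum(scores) (scores > 0)."""
    total_score = sum(scores)
    idx = rng.choices(range(len(values)), weights=scores, k=m)
    # Each draw contributes value / selection probability; the average is unbiased.
    return sum(values[i] * total_score / scores[i] for i in idx) / m


def uniform_sum_estimate(values, m, rng):
    """Rough estimate: population size times the mean of a uniform sample."""
    idx = rng.choices(range(len(values)), k=m)
    return len(values) * sum(values[i] for i in idx) / m


def iterative_estimate(values, scores, batch=50, tol=0.05, max_iter=100, seed=0):
    """Grow the sample until the two estimates agree within a relative tolerance.

    Returns (weighted, rough, m), where m is the sample size behind both estimates.
    """
    rng = random.Random(seed)
    for it in range(1, max_iter + 1):
        m = batch * it
        weighted = weighted_sum_estimate(values, scores, m, rng)
        rough = uniform_sum_estimate(values, m, rng)
        if abs(weighted - rough) <= tol * abs(weighted):
            break
    return weighted, rough, m


zs = [1 + (i % 10) for i in range(1000)]   # hypothetical per-row scores
vals = [3.0 * z for z in zs]               # values proportional to the scores
weighted, rough, m = iterative_estimate(vals, zs)
print(weighted)  # 16500.0: exact here because values are proportional to scores
```

The example shows why good weights matter. When the scores track the values closely, every weighted draw contributes the same amount, so the weighted estimate has no variance. The uniform estimate still fluctuates from sample to sample.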
Algorithms for Provisioning Queries and Analytics
Provisioning is a technique for avoiding repeated expensive computations in
what-if analysis. Given a query, an analyst formulates hypotheticals, each
retaining some of the tuples of a database instance, possibly overlapping, and
she wishes to answer the query under scenarios, where a scenario is defined by
a subset of the hypotheticals that are "turned on". We say that a query admits
compact provisioning if, given any database instance and any set of
hypotheticals, one can create a sketch whose size is polynomial in the number
of hypotheticals and that can then be used to answer the query under any of
the possible scenarios without accessing the original instance.
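For contrast with compact provisioning, the naive approach below precomputes the answer for every scenario. Its sketch has one entry per subset of hypotheticals, which is exponential in their number. The sketch assumes, as a simplification, that a scenario's instance is the union of the tuples retained by its turned-on hypotheticals. The function name and the set-of-tuples representation are hypothetical.

```python
from itertools import combinations


def naive_provision(hypotheticals, query):
    """Precompute query answers for every scenario (illustrative only).

    hypotheticals: list of sets of tuples, one per hypothetical.
    query: function from a set of tuples to an answer.
    Assumption: a scenario's instance is the union of the tuples retained
    by its turned-on hypotheticals.
    """
    k = len(hypotheticals)
    sketch = {}
    for r in range(k + 1):
        for on in combinations(range(k), r):
            instance = set().union(*(hypotheticals[h] for h in on))
            sketch[frozenset(on)] = query(instance)
    return sketch  # 2**k entries: exponential, not compact


hyps = [{("a", 1), ("b", 2)}, {("b", 2), ("c", 5)}, {("d", 7)}]
sketch = naive_provision(hyps, len)   # query: COUNT of distinct tuples
print(len(sketch))                    # 8
print(sketch[frozenset({0, 1})])      # 3 (the shared tuple ("b", 2) counts once)
```

Answering a scenario is then a dictionary lookup that never touches the instance. The sketch size, however, doubles with every added hypothetical.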
In this paper, we focus on provisioning complex queries that combine
relational algebra (the logical component), grouping, and statistics/analytics
(the numerical component). We first show that queries that compute quantiles or
linear regression (as well as simpler queries that compute count and
sum/average of positive values) can be compactly provisioned to provide
(multiplicative) approximate answers to an arbitrary precision. In contrast,
exact provisioning for each of these statistics requires the sketch size to be
exponential in the number of hypotheticals. We then establish that for any
complex query whose logical
component is a positive relational algebra query, as long as the numerical
component can be compactly provisioned, the complex query itself can be
compactly provisioned. On the other hand, introducing negation or recursion in
the logical component again requires the sketch size to be exponential in the
number of hypotheticals.
While our positive results use algorithms that do not access the original
instance after a scenario is known, we prove our lower bounds even for the case
when, knowing the scenario, limited access to the instance is allowed.
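A multiplicative approximation of count can be provisioned compactly with a swapped-in technique: a mergeable k-minimum-values (KMV) distinct-count sketch. This is not necessarily the construction used in the paper. The approach keeps one fixed-size sketch per hypothetical, so the total size grows linearly with the number of hypotheticals and is independent of the instance size. A scenario's count is estimated by merging the sketches of its turned-on hypotheticals, without accessing the instance. The union semantics for scenarios, SHA-256 hashing and the sketch size `k = 256` are assumptions. The relative error shrinks roughly as 1/sqrt(k).

```python
import hashlib


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) using 53 bits of SHA-256."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def kmv_sketch(tuples, k):
    """Keep the k smallest distinct hash values of a set of tuples."""
    return sorted({hash01(t) for t in tuples})[:k]


def provision_count(hypotheticals, k=256):
    """Build one fixed-size sketch per hypothetical; the instance is not needed later."""
    return [kmv_sketch(h, k) for h in hypotheticals]


def estimate_count(sketches, scenario, k=256):
    """Approximate COUNT over the union of the turned-on hypotheticals.

    k must match the value used in provision_count.
    """
    # The k smallest hashes of a union are among the k smallest of each part.
    merged = sorted(set().union(*(sketches[h] for h in scenario)))[:k]
    if len(merged) < k:
        return len(merged)  # every hash was kept: exact, barring hash collisions
    return (k - 1) / merged[k - 1]  # standard KMV estimate


hyps = [
    {("r", i) for i in range(0, 3000)},
    {("r", i) for i in range(2000, 6000)},
    {("s", i) for i in range(10)},
]
sketches = provision_count(hyps)
print(estimate_count(sketches, {2}))     # 10 (small union: exact)
print(estimate_count(sketches, {0, 1}))  # close to the true count of 6000
```

Overlapping hypotheticals hash shared tuples to the same values, so merging counts each shared tuple once. That is the property the naive per-hypothetical counts would lack.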
Distributed anonymous function computation in information fusion and multiagent systems
We propose a model for deterministic distributed function computation by a
network of identical and anonymous nodes, with bounded computation and storage
capabilities that do not scale with the network size. Our goal is to
characterize the class of functions that can be computed within this model. In
our main result, we exhibit a class of non-computable functions, and prove that
every function outside this class can at least be approximated. The problem of
computing averages in a distributed manner plays a central role in our
development.