739 research outputs found
Sublinear-Time Algorithms for Monomer-Dimer Systems on Bounded Degree Graphs
For a graph , let be the partition function of the
monomer-dimer system defined by , where is the
number of matchings of size in . We consider graphs of bounded degree
and develop a sublinear-time algorithm for estimating at an
arbitrary value within additive error with high
probability. The query complexity of our algorithm does not depend on the size
of and is polynomial in , and we also provide a lower bound
quadratic in for this problem. This is the first analysis of a
sublinear-time approximation algorithm for a # P-complete problem. Our
approach is based on the correlation decay of the Gibbs distribution associated
with . We show that our algorithm approximates the probability
for a vertex to be covered by a matching, sampled according to this Gibbs
distribution, in a near-optimal sublinear time. We extend our results to
approximate the average size and the entropy of such a matching within an
additive error with high probability, where again the query complexity is
polynomial in and the lower bound is quadratic in .
Our algorithms are simple to implement and of practical use when dealing with
massive datasets. Our results extend to other systems where the correlation
decay is known to hold as for the independent set problem up to the critical
activity
Distributed Data Summarization in Well-Connected Networks
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.
In the CONGEST~ model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D+ n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^{N} g(f_i) exactly in {tau_{G}} * 2^{O(sqrt{log n})} rounds where {tau_{G}} is the mixing time of G. This also has applications in computing the top k most frequent elements.
We demonstrate that there is a high similarity between the GOSSIP~ model and the CONGEST~ model in well-connected graphs. In particular, we show that each round of the GOSSIP~ model can be simulated almost perfectly in O~({tau_{G}}) rounds of the CONGEST~ model. To this end, we develop a new algorithm for the GOSSIP~ model that 1 +/- epsilon approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) roundsfor p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST~ model with a factor O~({tau_{G}}) blow-up in the number of rounds
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution over . More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from
and respectively, for any ,
- query the probability mass (query access); or
- get the total mass of , i.e. (cumulative
access)
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power
05291 Abstracts Collection -- Sublinear Algorithms
From 17.07.05 to 22.07.05, the Dagstuhl Seminar
05291 ``Sublinear Algorithms\u27\u27 was held
in the International Conference and Research Center (IBFI),
Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available
Sublinear Algorithms for Approximating String Compressibility
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly.
Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity , for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.National Science Foundation (U.S.) (Award CCF-1065125)National Science Foundation (U.S.) (Award CCF-0728645)Marie Curie International Reintegration Grant PIRG03-GA-2008-231077Israel Science Foundation (Grant 1147/09)Israel Science Foundation (Grant 1675/09
- …