117 research outputs found
A Randomized Sublinear Time Parallel GCD Algorithm for the EREW PRAM
We present a randomized parallel algorithm that computes the greatest common
divisor of two integers of n bits in length with probability 1-o(1) that takes
O(n loglog n / log n) expected time using n^{6+\epsilon} processors on the EREW
PRAM parallel model of computation. We believe this to be the first randomized
sublinear time algorithm on the EREW PRAM for this problem
Polylog Depth Circuits for Integer Factoring and Discrete Logarithms
AbstractIn this paper, we develop parallel algorithms for integer factoring and for computing discrete logarithms. In particular, we give polylog depth probabilistic boolean circuits of subexponential size for both of these problems, thereby solving an open problem of Adleman and Kompella.
Existing sequential algorithms for integer factoring and discrete logarithms use a prime base which is the set of all primes up to a bound B. We use a much smaller value for B for our parallel algorithms than is typical for sequential algorithms. In particular, for inputs of length n, by setting B = nlogdn with d a positive constant, we construct
•Probabilistic boolean circuits of depth (log) and size exp[(/log)] for completely factoring a positive integer with probability 1−(1), and
•Probabilistic boolean circuits of depth (log + log) and size exp[(/log)] for computing discrete logarithms in the finite field () for a prime with probability 1−(1). These are the first results of this type for both problem
Streaming Periodicity with Mismatches
We study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n-p differs from its suffix of length n-p in at most k locations.
We give a one-pass streaming algorithm that computes the k-periods of a string S using poly(k, log n) bits of space, for k-periods of length at most n/2. We also present a two-pass streaming algorithm that computes k-periods of S using poly(k, log n) bits of space, regardless of period length. We complement these results with comparable lower bounds
Testing non-uniform k-wise independent distributions over product spaces (extended abstract)
A distribution D over Σ1× ⋯ ×Σ n is called (non-uniform) k-wise independent if for any set of k indices {i 1, ..., i k } and for any z1zki1ik, PrXD[Xi1Xik=z1zk]=PrXD[Xi1=z1]PrXD[Xik=zk]. We study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the uniform case we show an upper bound on the distance between a distribution D from the set of k-wise independent distributions in terms of the sum of Fourier coefficients of D at vectors of weight at most k. Such a bound was previously known only for the binary field. For the non-uniform case, we give a new characterization of distributions being k-wise independent and further show that such a characterization is robust. These greatly generalize the results of Alon et al. [1] on uniform k-wise independence over the binary field to non-uniform k-wise independence over product spaces. Our results yield natural testing algorithms for k-wise independence with time and sample complexity sublinear in terms of the support size when k is a constant. The main technical tools employed include discrete Fourier transforms and the theory of linear systems of congruences.National Science Foundation (U.S.) (NSF grant 0514771)National Science Foundation (U.S.) (grant 0728645)National Science Foundation (U.S.) (Grant 0732334)Marie Curie International Reintegration Grants (Grant PIRG03-GA-2008-231077)Israel Science Foundation (Grant 1147/09)Israel Science Foundation (Grant 1675/09)Massachusetts Institute of Technology (Akamai Presidential Fellowship
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length from an integer
alphabet of size so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
Approximating Properties of Data Streams
In this dissertation, we present algorithms that approximate properties in the data stream model, where elements of an underlying data set arrive sequentially, but algorithms must use space sublinear in the size of the underlying data set. We first study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n − p differs from its suffix of length n − p in at most k locations. We give algorithms to compute the k-periods of a string S using poly(k, log n) bits of space and we complement these results with comparable lower bounds. We then study the problem of identifying a longest substring of strings S and T of length n that forms a d-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings S and T are streamed at the same time and form a d-near-alignment if the distance between them in some given metric is at most d. We give several algorithms, including an exact one-pass algorithm that uses O(d2 + d log n) bits of space. We then consider the distinct elements and `p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and `p-heavy hitters that is nearly optimal in both n and c. Finally, we consider the problem of estimating the maximum weighted matching of a graph whose edges are revealed in a streaming fashion. We develop a reduction from the maximum weighted matching problem to the maximum cardinality matching problem that only doubles the approximation factor of a streaming algorithm developed for the maximum cardinality matching problem. As an application, we obtain an estimator for the weight of a maximum weighted matching in bounded-arboricity graphs and in particular, a (48 + )-approximation estimator for the weight of a maximum weighted matching in planar graphs
A distributed wheel sieve algorithm using Scheduling by Multiple Edge Reversal
Number of pages: 12This paper presents a new distributed approach for generating all prime numbers in a given interval of integers. From Eratosthenes, who elaborated the first prime sieve (more than 2000 years ago), to the current generation of parallel computers, which have permitted to reach larger bounds on the interval or to obtain previous results in a shorter time, prime numbers generation still represents an attractive domain of research and plays a central role in cryptography. We propose a fully distributed algorithm for finding all primes in the interval , based on the \emph{wheel sieve} and the SMER (\emph{Scheduling by Multiple Edge Reversal}) multigraph dynamics. Given a multigraph of arbitrary topology, having nodes, an SMER-driven system is defined by the number of directed edges (arcs) between any two nodes of , and by the global period length of all ''arc reversals'' in . The new prime number generation method inherits the distributed and parallel nature of SMER and requires at most time steps
A distributed prime sieving algorithm based on SMER
Rapport interne LIPNIn this paper, we propose a fully distributed algorithm for finding all primes in an given interval (or , more generally), based on the SMER (\textit{Scheduling by Multiple Edge Reversal}) multigraph dynamics. Given a multigraph of arbitrary topology, having nodes, the SMER-driven system is defined by the numberof directed edges (arcs) between any two nodes of , and by the global period length of all ``arc reversals'' in . In the domain of prime numbers generation, such a graph method shows quite elegant, and it also yields a totally new kind of distributed prime sieving algorithms of an entirely original design. The maximum number of steps required by the algorithm is at most . Although far beyond the steps required by the improved sequential ``wheel sieve'' algorithms, our SMER-based algorithm is fully distributed and of linear (step) complexity. The message complexity of the algorithm is at most , where denotes the maximum ``multidegree'' of the arbitrary multigraph , and the space required per process is linear
- …