Singularity analysis, Hadamard products, and tree recurrences
We present a toolbox for extracting asymptotic information on the
coefficients of combinatorial generating functions. This toolbox notably
includes a treatment of the effect of Hadamard products on singularities in the
context of the complex Tauberian technique known as singularity analysis. As a
consequence, it becomes possible to unify the analysis of a number of
divide-and-conquer algorithms, or equivalently random tree models, including
several classical methods for sorting, searching, and dynamically managing
equivalence relations.
Comment: 47 pages. Submitted for publication.
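As an illustration of the kind of statement singularity analysis provides, the block below gives the standard transfer rule for a function that is analytic in a suitable domain around its dominant singularity at z = 1, one classical instance of it, and the definition of the Hadamard product. These are textbook formulations, not necessarily the notation or the theorems of the paper itself.

```latex
% Standard transfer (singularity analysis), for \alpha \notin \{0, -1, -2, \dots\}:
f(z) \sim (1 - z)^{-\alpha} \quad (z \to 1)
\;\Longrightarrow\;
[z^n] f(z) \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)} \quad (n \to \infty).

% Example with \alpha = 1/2, using \Gamma(1/2) = \sqrt{\pi}:
(1 - z)^{-1/2} = \sum_{n \ge 0} \binom{2n}{n} 4^{-n} z^n,
\qquad
\binom{2n}{n} 4^{-n} \sim \frac{1}{\sqrt{\pi n}}.

% Hadamard product of f(z) = \sum_n f_n z^n and g(z) = \sum_n g_n z^n:
(f \odot g)(z) = \sum_{n \ge 0} f_n g_n z^n.
```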
Rational series and asymptotic expansion for linear homogeneous divide-and-conquer recurrences
Among all sequences that satisfy a divide-and-conquer recurrence, the
sequences that are rational with respect to a numeration system are certainly
the most immediate and most essential. Nevertheless, until recently they have
not been studied from the asymptotic standpoint. We show how a mechanical
process makes it possible to compute their asymptotic expansion. It is based on linear
algebra, with Jordan normal form, joint spectral radius, and dilation
equations. The method is compared with the analytic number theory approach,
based on Dirichlet series and residues, and new ways to compute the Fourier
series of the periodic functions involved in the expansion are developed. The
article comes with an extended bibliography.
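A minimal sketch of what "rational with respect to a numeration system" means, assuming the binary system and the binary sum-of-digits sequence s(n) as the example. The sequence value is obtained by multiplying one small matrix per binary digit. The names `L`, `A`, `C`, `vec_mat`, and `rational_value` are hypothetical and chosen for illustration; they are not taken from the article.

```python
def vec_mat(v, m):
    """Multiply a row vector by a matrix, both given as plain Python lists."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

# Illustrative linear representation of s(n), the number of 1 bits in n.
# The row vector tracks the pair (s(n), 1); appending a binary digit d to n
# multiplies it on the right by A[d], since s(2n) = s(n) and s(2n+1) = s(n) + 1.
L = [0, 1]                      # initial state: s(0) = 0 and the constant 1
A = {
    0: [[1, 0], [0, 1]],        # digit 0 leaves (s, 1) unchanged
    1: [[1, 0], [1, 1]],        # digit 1 maps (s, 1) to (s + 1, 1)
}
C = [1, 0]                      # final projection onto the first coordinate

def rational_value(n):
    """Evaluate L * A[w_1] * ... * A[w_k] * C, where w is the binary expansion of n."""
    v = L
    for digit in bin(n)[2:]:    # most significant digit first
        v = vec_mat(v, A[int(digit)])
    return sum(x * c for x, c in zip(v, C))

if __name__ == "__main__":
    print([rational_value(n) for n in range(8)])
```

The same pattern, a row vector, one matrix per digit, and a column vector, covers any sequence of this class; only the matrices change.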
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, its two composition constructs, parallel and
serial, are insufficient for expressing "partial dependencies" or "partial
parallelism" in a program. We propose a new dataflow composition construct to
express partial dependencies in algorithms in a processor- and cache-oblivious
way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND)
model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic-programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e., Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
processors than in the NP model. The running time of the algorithms in this
paper is expressed in terms of the cache complexity of each task, the cost of a
cache miss at each cache level, the size of each cache level, a constant, and
the number of processors in a multi-level cache hierarchy.
Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations
We introduce a framework allowing domain experts to manipulate computational terms in the interest of deriving better, more efficient implementations. It employs deductive reasoning to generate provably correct efficient implementations from a very high-level specification of an algorithm, and inductive constraint-based synthesis to improve automation. Semantic information is encoded into program terms through the use of refinement types.
In this paper, we develop the technique in the context of a system called Bellmania that uses solver-aided tactics to derive parallel divide-and-conquer implementations of dynamic programming algorithms that have better locality and are significantly more efficient than traditional loop-based implementations. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of the low-level reasoning required for parallelism.
We have used the system to generate provably correct implementations of several algorithms, including some important algorithms from computational biology, and show that the performance is comparable to that of the best manually optimized code.
Funding: National Science Foundation (U.S.) (CCF-1139056, CCF-1439084, CNS-1553510).