
    Functional and dynamic programming in the design of parallel prefix networks

    A parallel prefix network of width $n$ takes $n$ inputs $a_1, a_2, \ldots, a_n$ and computes each $y_i = a_1 \circ a_2 \circ \cdots \circ a_i$ for $1 \le i \le n$, for an associative operator $\circ$. This is one of the fundamental problems in computer science, because it gives insight into how parallel computation can be used to solve an apparently sequential problem. As parallel programming becomes the dominant programming paradigm, parallel prefix or scan is proving to be a very important building block of parallel algorithms and applications. There are many different parallel prefix networks, with different properties such as the number of operators, the depth, and the fanout allowed from the operators. In this paper, ideas from functional programming are combined with search to enable a deep exploration of parallel prefix network design. Networks that improve on the best previously known results are generated. It is argued that precise modelling in a functional programming language, together with simple visualization of the networks, gives a new, more experimental approach to parallel prefix network design, improving on the manual techniques typically employed in the literature. The programming idiom that marries search with higher-order functions may well have wider application than the network generation described here.
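    As a concrete illustration of the depth/fanout trade-off the abstract mentions, here is a minimal Python sketch (not the paper's functional-language generator) of the classical Sklansky divide-and-conquer prefix network: it achieves $O(\log n)$ depth at the cost of unbounded fanout on the pivot wire. All names are illustrative.

```python
from operator import add

def sklansky_scan(xs, op):
    """Inclusive prefix scan via the Sklansky divide-and-conquer
    network: O(log n) depth, but the pivot output fans out to
    every position in the right half."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    mid = n // 2
    left = sklansky_scan(xs[:mid], op)
    right = sklansky_scan(xs[mid:], op)
    # Combine step: the last left prefix feeds every right output.
    pivot = left[-1]
    return left + [op(pivot, r) for r in right]

assert sklansky_scan([1, 2, 3, 4, 5], add) == [1, 3, 6, 10, 15]
```

    Other classical networks (Brent-Kung, Ladner-Fischer) trade extra depth for bounded fanout and fewer operators, which is exactly the design space the paper explores by search.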

    String Matching: Communication, Circuits, and Learning

    String matching is the problem of deciding whether a given n-bit string contains a given k-bit pattern. We study the complexity of this problem in three settings. - Communication complexity. For small k, we provide near-optimal upper and lower bounds on the communication complexity of string matching. For large k, our bounds leave open an exponential gap; we exhibit some evidence for the existence of a better protocol. - Circuit complexity. We present several upper and lower bounds on the size of circuits with threshold and DeMorgan gates solving the string matching problem. Similarly to the above, our bounds are near-optimal for small k. - Learning. We consider the problem of learning a hidden pattern of length at most k relative to the classifier that assigns 1 to every string that contains the pattern. We prove optimal bounds on the VC dimension and sample complexity of this problem.
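    For readers unfamiliar with the setting, here is a minimal sketch of the containment predicate and the classifier family it induces in the learning setting; the function names are mine, not the paper's.

```python
def contains(x: str, p: str) -> bool:
    """The string-matching predicate: does the n-bit string x
    contain the k-bit pattern p as a contiguous substring?"""
    n, k = len(x), len(p)
    return any(x[i:i + k] == p for i in range(n - k + 1))

# The learning setting: a hidden pattern p induces the classifier
# f_p(x) = 1 iff x contains p; the learner sees samples (x, f_p(x))
# and must identify (or approximate) p.
assert contains("0110100", "101")
assert not contains("0000", "11")
```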

    The Parallel Complexity of Growth Models

    This paper investigates the parallel complexity of several non-equilibrium growth models. Invasion percolation, Eden growth, ballistic deposition and solid-on-solid growth are all seemingly highly sequential processes that yield self-similar or self-affine random clusters. Nonetheless, we present fast parallel randomized algorithms for generating these clusters. The running times of the algorithms scale as $O(\log^2 N)$, where $N$ is the system size, and the number of processors required scales polynomially in $N$. The algorithms are based on fast parallel procedures for finding minimum weight paths; they illuminate the close connection between growth models and self-avoiding paths in random environments. In addition to their potential practical value, our algorithms serve to classify these growth models as less complex than other growth models, such as diffusion-limited aggregation, for which fast parallel algorithms probably do not exist.
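    To fix ideas, here is a sequential sketch of one of these models, invasion percolation: the cluster repeatedly invades the minimum-weight site on its boundary. The paper's contribution is replacing this inherently step-by-step loop with polylog-time parallel minimum-weight-path computations; the helper names below are assumptions for illustration.

```python
import heapq
import random

def invasion_percolation(width, height, steps, seed=0):
    """Grow an invasion-percolation cluster on a grid by repeatedly
    invading the boundary site of minimum random weight."""
    rng = random.Random(seed)
    weight = {}

    def w(site):
        # Lazily assign each site an i.i.d. uniform random weight.
        if site not in weight:
            weight[site] = rng.random()
        return weight[site]

    start = (width // 2, height // 2)
    cluster = {start}
    frontier = []

    def push_neighbors(x, y):
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in cluster:
                heapq.heappush(frontier, (w((nx, ny)), (nx, ny)))

    push_neighbors(*start)
    for _ in range(steps):
        while frontier:
            _, site = heapq.heappop(frontier)
            if site not in cluster:  # skip stale duplicate entries
                cluster.add(site)
                push_neighbors(*site)
                break
    return cluster

print(len(invasion_percolation(50, 50, 200)))
```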

    Towards Optimal Depth-Reductions for Algebraic Formulas

    Classical results of Brent, Kuck and Maruyama (IEEE Trans. Computers 1973) and Brent (JACM 1974) show that any algebraic formula of size $s$ can be converted to one of depth $O(\log s)$ with only a polynomial blow-up in size. In this paper, we consider a fine-grained version of this result depending on the degree of the polynomial computed by the algebraic formula. Given a homogeneous algebraic formula of size $s$ computing a polynomial $P$ of degree $d$, we show that $P$ can also be computed by an (unbounded fan-in) algebraic formula of depth $O(\log d)$ and size $\mathrm{poly}(s)$. Our proof shows that this result also holds in the highly restricted setting of monotone, non-commutative algebraic formulas. This improves on previous results in the regime when $d$ is small (i.e., $d = s^{o(1)}$). In particular, for the setting of $d = O(\log s)$, along with a result of Raz (STOC 2010, JACM 2013), our result implies the same depth reduction even for inhomogeneous formulas. This is particularly interesting in light of recent algebraic formula lower bounds, which work precisely in this "low-degree" and "low-depth" setting. We also show that these results cannot be improved in the monotone setting, even for commutative formulas.
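    The simplest instance of depth reduction is rebalancing a skewed product of $d$ factors using associativity; the toy sketch below shows only this trivial case (Brent-style reductions handle arbitrary formulas, and the paper's homogeneous/monotone setting is far more delicate). All names here are illustrative.

```python
def balanced_product(factors):
    """Rebalance a product of d factors into a log-depth tree;
    associativity is the only identity used."""
    if len(factors) == 1:
        return factors[0]
    mid = len(factors) // 2
    return ("*", balanced_product(factors[:mid]),
                 balanced_product(factors[mid:]))

def depth(t):
    return 0 if isinstance(t, str) else 1 + max(depth(t[1]), depth(t[2]))

xs = [f"x{i}" for i in range(16)]
# A right-deep chain x15*(x14*(...*(x1*x0))) has linear depth.
right_deep = xs[0]
for x in xs[1:]:
    right_deep = ("*", x, right_deep)
print(depth(right_deep))            # 15: linear depth
print(depth(balanced_product(xs)))  # 4: logarithmic depth
```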

    Parallel RAM from Cyclic Circuits

    Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs) by Boolean circuits incur significant polynomial blowup, due to the need to repeatedly simulate accesses to a large main memory. Consider two modifications to Boolean circuits: (1) remove the restriction that circuit graphs are acyclic, and (2) enhance AND gates so that they output zero eagerly: if an AND gate has a zero input, it 'short circuits' and outputs zero without waiting for its second input. We call this the cyclic circuit model. Note that circuits in this model remain combinational, as they do not allow wire values to change over time. We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup from the simulation is only polylogarithmic. Consider a PRAM program $P$ that on a length-$n$ input uses an arbitrary number of processors to manipulate words of size $\Theta(\log n)$ bits and then halts within $W(n)$ work. We construct a size-$O(W(n) \cdot \log^4 n)$ cyclic circuit that simulates $P$. Suppose that on a particular input, $P$ halts in time $T$; our circuit computes the same output within $T \cdot O(\log^3 n)$ gate delay. This implies the theoretical feasibility of powerful parallel machines: cyclic circuits can be implemented in hardware, and our circuit achieves performance within polylog factors of the PRAM. Our simulated PRAM synchronizes processors by simply leveraging logical dependencies between wires.
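    The eager-AND semantics can be made concrete with a small fixed-point evaluator over three-valued wires (0, 1, unknown). This is a hedged sketch of the gate semantics described above, not the paper's PRAM construction; the encoding is my own.

```python
def eager_and(a, b):
    """AND with eager zero: emits 0 as soon as either input is 0,
    without waiting for the other input to resolve."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return None  # still waiting on an unresolved input

def settle(gates, inputs):
    """gates: {wire: (op, *operand wires)}; iterate to a fixpoint.
    Cycles are allowed; wires that never resolve stay None."""
    vals = dict(inputs)
    changed = True
    while changed:
        changed = False
        for out, (op, *args) in gates.items():
            if vals.get(out) is not None:
                continue
            if op == 'AND':
                v = eager_and(vals.get(args[0]), vals.get(args[1]))
            else:  # 'NOT'
                a = vals.get(args[0])
                v = None if a is None else 1 - a
            if v is not None:
                vals[out] = v
                changed = True
    return vals

# A cycle that still settles: w = AND(x, w). With x = 0 the eager AND
# short-circuits and w resolves to 0 despite the self-loop.
print(settle({'w': ('AND', 'x', 'w')}, {'x': 0}))
```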

    Transformers Learn Shortcuts to Automata

    Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are learned by these shallow and non-recurrent models? We find that a low-depth Transformer can represent the computations of any finite-state automaton (thus, any bounded-memory algorithm), by hierarchically reparameterizing its recurrent dynamics. Our theoretical results characterize shortcut solutions, whereby a Transformer with $o(T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$. We find that polynomial-sized $O(\log T)$-depth solutions always exist; furthermore, $O(1)$-depth simulators are surprisingly common, and can be understood using tools from Krohn-Rhodes theory and circuit complexity. Empirically, we perform synthetic experiments by training Transformers to simulate a wide variety of automata, and show that shortcut solutions can be learned via standard training. We further investigate the brittleness of these solutions and propose potential mitigations.
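    One classical route to such shortcuts, consistent with the $O(\log T)$-depth result above, is to view each input symbol as a state-transition function and compute all prefixes with an associative scan over function composition. The toy sketch below uses a two-state parity automaton; the names and encoding are mine, not the paper's.

```python
# Transitions of a 2-state automaton (parity of ones), tabulated as
# tuples mapping state -> next state, so each symbol is a function.
DELTA = {'0': (0, 1), '1': (1, 0)}  # DELTA[sym][state] = next state

def compose(f, g):
    """Composition on tabulated transitions: apply f, then g.
    Composition is associative, so it admits a parallel scan."""
    return tuple(g[f[s]] for s in range(len(f)))

def states_via_scan(word, start=0):
    # Sequential stand-in for a parallel prefix scan over `compose`;
    # a real parallel scan computes the same prefix compositions in
    # O(log T) depth, which is the shortcut being characterized.
    prefixes = []
    acc = (0, 1)  # identity transition
    for sym in word:
        acc = compose(acc, DELTA[sym])
        prefixes.append(acc[start])
    return prefixes

print(states_via_scan("11011"))  # running parity of ones: [1, 0, 0, 1, 0]
```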