75 research outputs found

    Long-Lived Counters with Polylogarithmic Amortized Step Complexity

    Get PDF
    A shared-memory counter is a well-studied and widely-used concurrent object. It supports two operations: An Inc operation that increases its value by 1 and a Read operation that returns its current value. Jayanti, Tan and Toueg [Jayanti et al., 2000] proved a linear lower bound on the worst-case step complexity of obstruction-free implementations, from read and write operations, of a large class of shared objects that includes counters. The lower bound leaves open the question of finding counter implementations with sub-linear amortized step complexity. In this paper, we address this gap. We present the first wait-free n-process counter, implemented using only read and write operations, whose amortized operation step complexity is O(log^2 n) in all executions. This is the first non-blocking read/write counter algorithm that provides sub-linear amortized step complexity in executions of arbitrary length. Since a logarithmic lower bound on the amortized step complexity of obstruction-free counter implementations exists, our upper bound is optimal up to a logarithmic factor

    Complexity, parallel computation and statistical physics

    Full text link
    The intuition that a long history is required for the emergence of complexity in natural systems is formalized using the notion of depth. The depth of a system is defined in terms of the number of parallel computational steps needed to simulate it. Depth provides an objective, irreducible measure of history applicable to systems of the kind studied in statistical physics. It is argued that physical complexity cannot occur in the absence of substantial depth and that depth is a useful proxy for physical complexity. The ideas are illustrated for a variety of systems in statistical physics.Comment: 21 pages, 7 figure

    Relaxed Queues and Stacks from Read/Write Operations

    Get PDF
    Considering asynchronous shared memory systems in which any number of processes may crash, this work identifies and formally defines relaxations of queues and stacks that can be non-blocking or wait-free while being implemented using only read/write operations. Set-linearizability and Interval-linearizability are used to specify the relaxations formally, and precisely identify the subset of executions which preserve the original sequential behavior. The relaxations allow for an item to be returned more than once by different operations, but only in case of concurrency; we call such a property multiplicity. The stack implementation is wait-free, while the queue implementation is non-blocking. Interval-linearizability is used to describe a queue with multiplicity, with the additional relaxation that a dequeue operation can return weak-empty, which means that the queue might be empty. We present a read/write wait-free interval-linearizable algorithm of a concurrent queue. As far as we know, this work is the first that provides formalizations of the notions of multiplicity and weak-emptiness, which can be implemented on top of read/write registers only

    A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

    Full text link
    A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians. In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions. Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.Comment: 34 page

    Aspects of practical implementations of PRAM algorithms

    Get PDF
    The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture independent model and provides a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme crn achieve scalable parallel performance and portable parallel software and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we presented the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimcnsional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation; balanced tree computations. Fast Fourier Transforms and matrix multiplications

    Metastability-containing circuits, parallel distance problems, and terrain guarding

    Get PDF
    We study three problems. The first is the phenomenon of metastability in digital circuits. This is a state of bistable storage elements, such as registers, that is neither logical 0 nor 1 and breaks the abstraction of Boolean logic. We propose a time- and value-discrete model for metastability in digital circuits and show that it reflects relevant physical properties. Further, we propose the fundamentally new approach of using logical masking to perform meaningful computations despite the presence of metastable upsets and analyze what functions can be computed in our model. Additionally, we show that circuits with masking registers grow computationally more powerful with each available clock cycle. The second topic are parallel algorithms, based on an algebraic abstraction of the Moore-Bellman-Ford algorithm, for solving various distance problems. Our focus are distance approximations that obey the triangle inequality while at the same time achieving polylogarithmic depth and low work. Finally, we study the continuous Terrain Guarding Problem. We show that it has a rational discretization with a quadratic number of guard candidates, establish its membership in NP and the existence of a PTAS, and present an efficient implementation of a solver.Wir betrachten drei Probleme, zunĂ€chst das PhĂ€nomen von MetastabilitĂ€t in digitalen Schaltungen. Dabei geht es um einen Zustand in bistabilen Speicherelementen, z.B. Registern, welcher weder logisch 0 noch 1 entspricht und die Abstraktion Boolescher Logik unterwandert. Wir prĂ€sentieren ein zeit- und wertdiskretes Modell fĂŒr MetastabilitĂ€t in digitalen Schaltungen und zeigen, dass es relevante physikalische Eigenschaften abbildet. Des Weiteren prĂ€sentieren wir den grundlegend neuen Ansatz, trotz auftretender MetastabilitĂ€t mit Hilfe von logischem Maskieren sinnvolle Berechnungen durchzufĂŒhren und bestimmen, welche Funktionen in unserem Modell berechenbar sind. DarĂŒber hinaus zeigen wir, dass durch Maskingregister in zusĂ€tzlichen Taktzyklen mehr Funktionen berechenbar werden. Das zweite Thema sind parallele Algorithmen die, basierend auf einer Algebraisierung des Moore-Bellman-Ford-Algorithmus, diverse Distanzprobleme lösen. Der Fokus liegt auf Distanzapproximationen unter Einhaltung der Dreiecksungleichung bei polylogarithmischer Tiefe und niedriger Arbeit. Abschließend betrachten wir das kontinuierliche Terrain Guarding Problem. Wir zeigen, dass es eine rationale Diskretisierung mit einer quadratischen Anzahl von WĂ€chterpositionen erlaubt, folgern dass es in NP liegt und ein PTAS existiert und prĂ€sentieren eine effiziente Implementierung, die es löst

    Parallel RAM from Cyclic Circuits

    Full text link
    Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs) by Boolean circuits incur significant polynomial blowup, due to the need to repeatedly simulate accesses to a large main memory. Consider two modifications to Boolean circuits: (1) remove the restriction that circuit graphs are acyclic and (2) enhance AND gates such that they output zero eagerly. If an AND gate has a zero input, it 'short circuits' and outputs zero without waiting for its second input. We call this the cyclic circuit model. Note, circuits in this model remain combinational, as they do not allow wire values to change over time. We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup from the simulation is only polylogarithmic. Consider a PRAM program PP that on a length nn input uses an arbitrary number of processors to manipulate words of size Θ(log⁡n)\Theta(\log n) bits and then halts within W(n)W(n) work. We construct a size-O(W(n)⋅log⁡4n)O(W(n)\cdot \log^4 n) cyclic circuit that simulates PP. Suppose that on a particular input, PP halts in time TT; our circuit computes the same output within T⋅O(log⁡3n)T \cdot O(\log^3 n) gate delay. This implies theoretical feasibility of powerful parallel machines. Cyclic circuits can be implemented in hardware, and our circuit achieves performance within polylog factors of PRAM. Our simulated PRAM synchronizes processors by simply leveraging logical dependencies between wires

    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France

    Get PDF

    Progress Report : 1991 - 1994

    Get PDF
    • 

    corecore