
    Metastability-Containing Circuits

    In digital circuits, metastability can cause deteriorated signals that are neither logical 0 nor logical 1, breaking the abstraction of Boolean logic. Unfortunately, any way of reading a signal from an unsynchronized clock domain or performing an analog-to-digital conversion incurs the risk of a metastable upset; no digital circuit can deterministically avoid, resolve, or detect metastability (Marino, 1981). Synchronizers, the only traditional countermeasure, exponentially decrease the odds of maintained metastability over time. They trade synchronization delay for an increased probability of resolving metastability to logical 0 or 1, but they do not guarantee success. We propose a fundamentally different approach: it is possible to contain metastability by fine-grained logical masking so that it cannot infect the entire circuit. This technique guarantees a limited degree of metastability in, and uncertainty about, the output. At the heart of our approach lies a time- and value-discrete model for metastability in synchronous clocked digital circuits. Metastability is propagated in a worst-case fashion, which allows us to derive deterministic guarantees without (and unlike) synchronizers. The proposed model permits positive results and passes the test of reproducing Marino's impossibility results. We fully classify which functions can be computed by circuits with standard registers. Regarding masking registers, we show that they become computationally strictly more powerful with each clock cycle, resulting in a non-trivial hierarchy of computable functions.
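The worst-case propagation rule can be made concrete with a three-valued logic. The Python sketch below assumes signals take the values 0, 1, or M (metastable) and that a gate outputs a stable bit only when every way of resolving its M inputs to 0 or 1 yields the same result. The helper names (`closure`, `AND`, `MUX`, and so on) are hypothetical, and the code is an illustration rather than the paper's formal model.

```python
from itertools import product

# Three signal values: 0, 1, and "M" for a metastable (unknown) level.
M = "M"

def resolutions(bits):
    """Yield every Boolean assignment obtained by replacing each M with 0 or 1."""
    choices = [(0, 1) if b == M else (b,) for b in bits]
    return product(*choices)

def closure(f):
    """Lift a Boolean function to {0, 1, M}: output a stable bit only if every
    resolution of the metastable inputs yields the same value, otherwise M."""
    def lifted(*bits):
        outs = {f(*r) for r in resolutions(bits)}
        return outs.pop() if len(outs) == 1 else M
    return lifted

AND = closure(lambda a, b: a & b)
OR = closure(lambda a, b: a | b)
NOT = closure(lambda a: 1 - a)
# Multiplexer: pass x when the select s is 0, y when s is 1.
MUX = closure(lambda s, x, y: y if s else x)

print(AND(0, M))     # 0: the stable 0 masks the metastable input
print(AND(1, M))     # M: metastability propagates
print(MUX(M, 1, 1))  # 1: both data inputs agree, so the select line is masked

# Composition matters: gate by gate, x OR (NOT x) gives M for x = M,
# although the closure of the whole function is the constant 1.
print(OR(M, NOT(M)))                      # M
print(closure(lambda x: x | (1 - x))(M))  # 1
```

The last two lines hint at why containment needs careful circuit design: masking that holds for a function as a whole can be lost when it is built naively from individual gates.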

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared-memory model of parallel computation that abstracts away from inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoreticians in the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale, general-purpose parallel computation. The emulation employs a bridging model that acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum-weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple, and it is also the first NC approximation algorithm for the task with a sublinear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings, the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable, transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.
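Bulk synchronous algorithms are usually analysed with Valiant's BSP cost model, in which a superstep costs w + h·g + l (w: maximum local work, h: maximum number of words any one processor sends or receives, g: communication gap, l: barrier latency). The sketch below assumes that standard model and uses a naive two-superstep sum reduction chosen purely for illustration. It is not one of the thesis's algorithms, and the function names and machine parameters are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Superstep:
    work: float  # maximum local computation over all processors (w)
    h: int       # maximum number of words any processor sends or receives

def bsp_cost(steps, g, l):
    """Total BSP cost: sum over supersteps of w + h*g + l."""
    return sum(s.work + s.h * g + l for s in steps)

def sum_reduction_steps(n, p):
    """Supersteps of a naive BSP sum of n values on p processors:
    local sums, then every processor sends its partial sum to processor 0."""
    return [
        Superstep(work=n / p, h=p - 1),  # about n/p additions; processor 0 receives p - 1 words
        Superstep(work=p - 1, h=0),      # processor 0 adds the p partial sums
    ]

# Hypothetical machine parameters, in units of basic operations.
g, l = 4.0, 100.0
for p in (4, 16, 64):
    print(p, bsp_cost(sum_reduction_steps(1_000_000, p), g, l))
```

In this naive scheme the h·g term grows linearly in p. A balanced-tree reduction would instead use about log2 p supersteps with h = 1 each, trading extra l terms for less communication per superstep.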

    Distributed Approximation Algorithms for Weighted Shortest Paths

    A distributed network is modeled by a graph having $n$ nodes (processors) and diameter $D$. We study the time complexity of approximating \emph{weighted} (undirected) shortest paths on distributed networks with an $O(\log n)$ \emph{bandwidth restriction} on edges (the standard synchronous CONGEST model). The question of whether approximation algorithms help speed up shortest-path computation (more precisely, distance computation) has been raised since at least 2004 by Elkin (SIGACT News 2004). The unweighted case of this problem is well understood, while its weighted counterpart is a fundamental problem in the area of distributed approximation algorithms and remains widely open. We present new algorithms for computing both single-source shortest paths (SSSP) and all-pairs shortest paths (APSP) in the weighted case. Our main result is an algorithm for SSSP. Previous results are the classic $O(n)$-time Bellman-Ford algorithm and an $\tilde O(n^{1/2+1/(2k)}+D)$-time $(8k\lceil \log (k+1) \rceil -1)$-approximation algorithm, for any integer $k\geq 1$, which follows from the result of Lenzen and Patt-Shamir (STOC 2013). (Note that Lenzen and Patt-Shamir in fact solve a harder problem, and we use $\tilde O(\cdot)$ to hide the $O(\mathrm{poly}\log n)$ term.) We present an $\tilde O(n^{1/2}D^{1/4}+D)$-time $(1+o(1))$-approximation algorithm for SSSP. This algorithm is \emph{sublinear-time} as long as $D$ is sublinear, thus yielding a sublinear-time algorithm with an almost optimal solution. When $D$ is small, our running time matches the lower bound of $\tilde \Omega(n^{1/2}+D)$ by Das Sarma et al. (SICOMP 2012), which holds even when $D=\Theta(\log n)$, up to a $\mathrm{poly}\log n$ factor. Comment: Full version of STOC 201
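The classic $O(n)$-time baseline mentioned above is distributed Bellman-Ford. The sketch below simulates it round by round in Python. It assumes synchronous rounds in which every node sends its current distance estimate over each incident edge (one small message per edge, as the CONGEST model allows) and then relaxes. The function name and graph encoding are illustrative choices. The example shows why the round count depends on the number of hops in shortest paths rather than on the diameter $D$.

```python
import math

def distributed_bellman_ford(n, edges, source):
    """Synchronous simulation of distributed Bellman-Ford on an undirected
    weighted graph given as (u, v, weight) triples. In each round every node
    sends its estimate to all neighbours and relaxes with what it received.
    Returns (distances, number of rounds in which some estimate changed)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [math.inf] * n
    dist[source] = 0
    rounds = 0
    while True:
        # All messages of a round carry the estimates from the previous round.
        new = dist[:]
        for u in range(n):
            if dist[u] < math.inf:
                for v, w in adj[u]:
                    new[v] = min(new[v], dist[u] + w)
        if new == dist:
            return dist, rounds
        dist = new
        rounds += 1

# Path 0-1-2-3 with unit weights plus a heavy shortcut edge 0-3.
dist, rounds = distributed_bellman_ford(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 10)], 0)
print(dist, rounds)  # [0, 1, 2, 3] 3
```

Here the unweighted diameter is 2, but the shortest path to node 3 has three hops, so three rounds are needed. Adding shortcut edges of weight n from node 0 to every node of an n-node unit-weight path keeps the diameter at 2 while pushing the round count to n - 1.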

    Visibility-Related Problems on Parallel Computational Models

    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics, and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads, provides a unified approach to solving these problems, and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture-independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh-connected computers, such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. The visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two fundamental template algorithms, identified as powerful tools used in almost every algorithm designed in this work, were implemented on an IBM-SP2. The code was developed in C using MPI and can easily be ported to many commercially available parallel computers.
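As a toy instance of object visibility, consider vertical segments standing on a common baseline, ordered left to right and viewed along horizontal rays from the far left. This setup is an assumption chosen for illustration, not a problem taken from the thesis. A segment is visible exactly when it is taller than every segment before it, so the problem reduces to a prefix-maximum scan, a primitive that parallelizes well. The sketch below uses recursive doubling so that each of its ⌈log2 n⌉ rounds is data-parallel. The function names are hypothetical.

```python
def prefix_max_doubling(xs):
    """Inclusive prefix maximum by recursive doubling (Hillis-Steele):
    ceil(log2 n) rounds, each of which n processors could perform in parallel."""
    out = list(xs)
    step = 1
    while step < len(out):
        # Every element reads the value `step` positions to its left from the
        # previous round only, so all updates within a round are independent.
        out = [max(out[i], out[i - step]) if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out

def visible_from_left(heights):
    """Indices of segments (common baseline, ordered left to right) that are at
    least partly visible along horizontal rays from the far left."""
    if not heights:
        return []
    pm = prefix_max_doubling(heights)
    return [i for i, h in enumerate(heights) if i == 0 or h > pm[i - 1]]

print(visible_from_left([3, 1, 4, 1, 5, 9, 2, 6]))  # [0, 2, 4, 5]
```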

    Metastability-Containing Circuits

    Communication across unsynchronized clock domains is inherently vulnerable to metastable upsets; no digital circuit can deterministically avoid, resolve, or detect metastability (Marino, 1981). Traditionally, a possibly metastable input is stored in synchronizers, decreasing the odds of maintained metastability over time. This approach costs time and does not guarantee success. We propose a fundamentally different approach: it is possible to \emph{contain} metastability by logical masking, so that it cannot infect the entire circuit. This technique guarantees a limited degree of metastability in, and uncertainty about, the output. As an application, we present a synchronizer-free, fault-tolerant clock synchronization algorithm that synchronizes clock domains and thus enables metastability-free communication. At the heart of our approach lies a model for metastability in synchronous clocked digital circuits. Metastability is propagated in a worst-case fashion, which allows us to derive deterministic guarantees without (and unlike) synchronizers. The proposed model permits positive results while at the same time reproducing established impossibility results regarding avoidance, resolution, and detection of metastability. Furthermore, we fully classify which functions can be computed by synchronous circuits with standard registers, and show that masking registers are computationally strictly more powerful.
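For contrast, the time/reliability trade-off of synchronizers is often estimated with a simplified exponential model, MTBF = exp(t_r/τ) / (T_W · f_clk · f_data). Here t_r is the time allowed for resolution, τ is the flip-flop's resolution time constant, T_W is its metastability window, and f_clk and f_data are the clock and input-transition rates. The sketch below evaluates this textbook formula with made-up parameter values, which are assumptions and not figures from the paper. Each extra cycle of waiting multiplies the MTBF by exp(1/(f_clk·τ)), yet the failure probability never reaches zero.

```python
import math

def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronizer failures under the simplified model
    MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical device parameters, chosen only for illustration.
tau = 20e-12       # resolution time constant (s)
t_window = 20e-12  # metastability window (s)
f_clk = 1e9        # receiving clock frequency (Hz)
f_data = 100e6     # rate of asynchronous input transitions (Hz)

for cycles in (1, 2, 3):
    # Assume each extra synchronizer stage adds one full clock period of
    # resolution time (flip-flop overheads are ignored in this sketch).
    t_resolve = cycles / f_clk
    print(cycles, f"{mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):.3e} s")
```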

    Lower bounds on systolic gossip

    Gossiping is an extensively investigated information dissemination process in which each processor has a distinct item of information and has to collect all the items possessed by the other processors. In this paper we provide an innovative and general lower bound technique relying on the novel notion of the delay digraph of a gossiping protocol and on the use of matrix norm methods. This technique is very powerful and allows the determination of new and significantly improved lower bounds in many cases. In fact, we derive the first general lower bound on the gossiping time of systolic protocols, i.e., protocols constituted by a periodic repetition of simple communication steps. In particular, given any network of $n$ processors and any systolic period $s$, in the directed and the undirected half-duplex cases every $s$-systolic gossip protocol takes at least $\log n/\log(1/\lambda) - O(\log\log n)$ time steps, where $\lambda$ is the unique solution between 0 and 1 of $\lambda \cdot p_{\lfloor s/2\rfloor}(\lambda) \cdot p_{\lceil s/2\rceil}(\lambda) = 1$, with $p_i(\lambda) = 1 + \lambda^2 + \cdots + \lambda^{2i-2}$ for any integer $i > 0$. We then provide improved lower bounds in the directed and half-duplex cases for many well-known network topologies, such as Butterfly, de Bruijn, and Kautz graphs. All the results are also extended to the full-duplex case. Our technique is very general: for $s \to \infty$ it allows the determination of improved results even for non-systolic protocols. In fact, for general networks, as a simple corollary it yields a lower bound only an additive $O(\log\log n)$ term away from the general one independently proved in [Proc. 1st ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1989, p. 318; Topics in Combinatorics and Graph Theory (1990) 451; SIAM Journal on Computing 21(1) (1992) 111; Discrete Applied Mathematics 42 (1993) 75] for all graphs and any (non-systolic) gossip protocol. Moreover, for specific networks, it significantly improves on the previously known results, even in the full-duplex case. Correspondingly, better lower bounds on the gossiping time of non-systolic protocols are determined in the directed, half-duplex, and full-duplex cases for Butterfly, de Bruijn, and Kautz graphs. Although in this paper we give only a limited number of examples, our technique has wide applicability and provides a general framework that often yields improved lower bounds on the gossiping time of systolic and non-systolic protocols in the directed, half-duplex, and full-duplex cases.
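The bound can be evaluated numerically. The sketch below simply implements the equation as stated in this abstract. It finds $\lambda$ by bisection and reports the coefficient c in $\log n/\log(1/\lambda) = c \cdot \log_2 n$. The helper names are hypothetical. Because the left-hand side is increasing in $\lambda$, the root is unique, and it lies in (0, 1) only for s ≥ 3; for s = 2 the equation reduces to $\lambda = 1$.

```python
import math

def p(i, lam):
    """p_i(lambda) = 1 + lambda^2 + ... + lambda^(2i-2), a sum of i terms."""
    return sum(lam ** (2 * k) for k in range(i))

def solve_lambda(s, tol=1e-12):
    """Bisection for the root in (0, 1) of
    lambda * p_floor(s/2)(lambda) * p_ceil(s/2)(lambda) = 1.
    The left-hand side increases from 0 at lambda = 0 to
    floor(s/2) * ceil(s/2) at lambda = 1, so a root in (0, 1) exists for s >= 3."""
    if s < 3:
        raise ValueError("the equation has no root in (0, 1) for s < 3")
    a, b = s // 2, (s + 1) // 2

    def g(lam):
        return lam * p(a, lam) * p(b, lam) - 1

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def leading_coefficient(s):
    """c such that log n / log(1/lambda) = c * log2(n)."""
    return 1 / math.log2(1 / solve_lambda(s))

for s in (3, 4, 6, 10):
    print(s, round(solve_lambda(s), 4), round(leading_coefficient(s), 4))
```

For s = 3 the equation reduces to λ^3 + λ - 1 = 0 (λ ≈ 0.6823), giving a leading term of roughly 1.813·log2 n.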

    Fault Secure Encoder and Decoder for NanoMemory Applications

    Memory cells have been protected from soft errors for more than a decade; due to the increase in the soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks has become susceptible to soft errors as well and must also be protected. We introduce a new approach to designing fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EG-LDPC codes, we can tolerate bit or nanowire defect rates of 10% and fault rates of $10^{-18}$ upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of $10^{11}$ bit/cm$^2$ with a nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead.
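Error detection in schemes of this kind is commonly based on the syndrome: a word c is a codeword exactly when H·c^T = 0 over GF(2), where H is the code's parity-check matrix. The sketch below illustrates only that check. To keep it small, it uses the parity-check matrix of the (7,4) Hamming code as a stand-in, which is an assumption and not one of the EG-LDPC codes from the paper. It also does not model faults inside the detector itself. The function names are hypothetical.

```python
# Parity-check matrix of the (7,4) Hamming code, used only as a small stand-in;
# it is NOT an EG-LDPC code. Column j is the binary form of j + 1 (LSB in row 0).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(H, word):
    """s = H * word^T over GF(2); an all-zero syndrome means `word` is a codeword."""
    return [sum(h & c for h, c in zip(row, word)) % 2 for row in H]

def detect_error(H, word):
    """Detector output: True if any parity check fails."""
    return any(syndrome(H, word))

codeword = [1, 1, 1, 1, 1, 1, 1]  # all-ones is a codeword of this Hamming code
print(syndrome(H, codeword))      # [0, 0, 0]
corrupted = codeword[:]
corrupted[4] ^= 1                 # flip bit 4
print(syndrome(H, corrupted), detect_error(H, corrupted))  # [1, 0, 1] True
```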

    Polyvalent Parallelizations for Hierarchical Block Matching Motion Estimation

    Block matching motion estimation algorithms are widely used in video coding schemes. In this paper, we design an efficient hierarchical block matching motion estimation (HBMME) algorithm on a hypercube multiprocessor. Unlike systolic array designs, this solution is not tied to specific values of algorithm parameters and thus offers increased flexibility. Moreover, the hypercube network can efficiently handle the non-regular data flow of the HBMME algorithm. Our techniques nearly eliminate the occurrence of “difficult” communication patterns, namely many-to-many personalized communication, by replacing them with simple shift operations. These operations have an efficient implementation on most interconnection networks, and thus our techniques can be adapted to other networks as well. Regarding the employed multiprocessor, we make no specific assumption about the amount of local memory residing in each processor. Instead, we introduce a free parameter S and assume that each processor has O(S) local memory. By doing so, we handle all classes of modern multiprocessors, that is, fine-grained, medium-grained, and coarse-grained multiprocessors, and thus our design is quite general.
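The core operation being parallelized can be sketched sequentially. The Python below assumes grayscale frames stored as lists of rows and uses the sum of absolute differences (SAD) as the matching criterion. It estimates one block's motion vector in two levels: an exhaustive search at half resolution, then a ±1 refinement at full resolution. The function names and the two-level scheme are illustrative assumptions. The sketch does not model the hypercube mapping, the shift-based communication, or the memory parameter S.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block of `cur` at (bx, by) and the block of `ref`
    displaced by (dx, dy); None if the displaced block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w):
        return None
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(bs) for j in range(bs))

def search(cur, ref, bx, by, bs, center, radius):
    """Exhaustive search of displacements within `radius` of `center`; returns the
    lowest-SAD vector (or `center` itself if no candidate fits in the frame)."""
    best, best_cost = center, None
    cx, cy = center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost is not None and (best_cost is None or cost < best_cost):
                best, best_cost = (dx, dy), cost
    return best

def downsample(frame):
    """Halve the resolution by averaging 2 x 2 pixel groups."""
    return [[(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
              + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(frame[0]) // 2)]
            for i in range(len(frame) // 2)]

def hierarchical_vector(cur, ref, bx, by, bs, radius):
    """Two-level estimate: coarse search at half resolution, then a +-1
    refinement around the scaled-up coarse vector at full resolution."""
    cdx, cdy = search(downsample(cur), downsample(ref), bx // 2, by // 2,
                      bs // 2, (0, 0), radius // 2)
    return search(cur, ref, bx, by, bs, (2 * cdx, 2 * cdy), 1)
```

Compared with a full search over radius r, the coarse level scans about a quarter as many candidates on blocks a quarter the size, and the refinement adds only nine candidates.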