23,000 research outputs found
Recommended from our members
The use of Petri nets for modeling pipelined processors
This paper discusses the use of Petri Nets for modeling and analyzing pipelined processors. Petri Nets are particularly well-suited to modeling the synchronization, buffering, resource contention and delicate timing so common in pipelined processors. Tools for simulating, animating and analyzing the behavior of the models are described. The usefulness of the tools and the analysis methods they support in evaluating the performance and analyzing the detailed timing of pipelined microprocessors is illustrated through an example
Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
useability of pipelined Krylov subspace methods.Comment: 15 pages, 5 figures, 1 table, 2 algorithm
Comparison of Various Pipelined and Non-Pipelined SCl 8051 ALUs
This paper describes the development of an 8-bit SCL 8051 ALU with two versions: SCL 8051 ALU with nsleep and sleep signals and SCL 8051 ALU without nsleep. Both versions have combinational logic (C/L), registers, and completion components, which all utilize slept gates. Both three-stage pipelined and non-pipelined designs were examined for both versions. The four designs were compared in terms of area, speed, leakage power, average power and energy per operation. The SCL 8051 ALU without nsleep is smaller and faster, but it has greater leakage power. It also has lower average power, and less energy consumption than the SCL 8051 ALU with both nsleep and sleep signals. The pipelined SCL 8051 ALU is bigger, slower, and has larger leakage power, average power and energy consumption than the non-pipelined SCL 8051 ALU
Two-level pipelined systolic array graphics engine
The authors report a VLSI design of an advanced systolic array graphics (SAG) engine built from pipelined functional units which can generate realistic images interactively for high-resolution displays. They introduce a structured frame store system as an environment for the advanced SAG engine and present the principles and architecture of the advanced SAG engine. They introduce pipelined functional units into this SAG engine to meet the performance requirements. This is done by a formal approach where the original systolic array is represented at bit level by a finite, vertex-weighted, edge-weighted, directed graph. Two architectures built from pipelined functional units are described. A prototype containing nine processing elements was fabricated in a 1.6-Âżm CMOS technolog
Synthesizing Multiple Boolean Functions using Interpolation on a Single Proof
It is often difficult to correctly implement a Boolean controller for a
complex system, especially when concurrency is involved. Yet, it may be easy to
formally specify a controller. For instance, for a pipelined processor it
suffices to state that the visible behavior of the pipelined system should be
identical to a non-pipelined reference system (Burch-Dill paradigm). We present
a novel procedure to efficiently synthesize multiple Boolean control signals
from a specification given as a quantified first-order formula (with a specific
quantifier structure). Our approach uses uninterpreted functions to abstract
details of the design. We construct an unsatisfiable SMT formula from the given
specification. Then, from just one proof of unsatisfiability, we use a variant
of Craig interpolation to compute multiple coordinated interpolants that
implement the Boolean control signals. Our method avoids iterative learning and
back-substitution of the control functions. We applied our approach to
synthesize a controller for a simple two-stage pipelined processor, and present
first experimental results.Comment: This paper originally appeared in FMCAD 2013,
http://www.cs.utexas.edu/users/hunt/FMCAD/FMCAD13/index.shtml. This version
includes an appendix that is missing in the conference versio
Pipelined genetic propagation
© 2015 IEEE.Genetic Algorithms (GAs) are a class of numerical and combinatorial optimisers which are especially useful for solving complex non-linear and non-convex problems. However, the required execution time often limits their application to small-scale or latency-insensitive problems, so techniques to increase the computational efficiency of GAs are needed. FPGA-based acceleration has significant potential for speeding up genetic algorithms, but existing FPGA GAs are limited by the generational approaches inherited from software GAs. Many parts of the generational approach do not map well to hardware, such as the large shared population memory and intrinsic loop-carried dependency. To address this problem, this paper proposes a new hardware-oriented approach to GAs, called Pipelined Genetic Propagation (PGP), which is intrinsically distributed and pipelined. PGP represents a GA solver as a graph of loosely coupled genetic operators, which allows the solution to be scaled to the available resources, and also to dynamically change topology at run-time to explore different solution strategies. Experiments show that pipelined genetic propagation is effective in solving seven different applications. Our PGP design is 5 times faster than a recent FPGA-based GA system, and 90 times faster than a CPU-based GA system
Pipelined Two-Operand Modular Adders
Pipelined two-operand modular adder (TOMA) is one of basic components used in digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers and scalers as well as within residue processing channels. The design of pipelined TOMAs is usually obtained by inserting an appriopriate number of latch layers inside a nonpipelined TOMA structure. Hence their area is also determined by the number of latches and the delay by the number of latch layers. In this paper we propose a new pipelined TOMA that is based on a new TOMA, that has the smaller area and smaller delay than other known structures. Comparisons are made using data from the very large scale of integration (VLSI) standard cell library
Pipelined Asynchronous Circuits
This thesis presents a design style for implementing communicating sequential processes (CSP) as quasi delay insensitive asynchronous circuits, based on the compilation method of [1]. Although hand compilation can always yield optimal circuits to a good designer, a restricted approach is suggested which can easily implement circuits with some slack between inputs and outputs. These circuits are fast and versatile building blocks for highly pipelined designs. The first chapter presents the implementation approach for individual cells. The second chapter investigates the time behavior of complex pipelined circuits, with the goal of adding slack where necessary and adjusting transistor sizes to optimize the overall throughput
- …