9,733 research outputs found
Adaptive Latency Insensitive Protocols
Latency-insensitive design copes with excessive delays typical of global wires in current and future IC technologies. It achieves its goal via encapsulation of synchronous logic blocks in wrappers that communicate through a latency-insensitive protocol (LIP) and pipelined interconnects. Previously proposed solutions suffer from an excessive performance penalty in terms of throughput or from a lack of generality. This article presents an adaptive LIP that outperforms previous static implementations, as demonstrated by two relevant cases — a microprocessor and an MPEG encoder — whose components we made insensitive to the latencies of their interconnections through a newly developed wrapper. We also present an informal exposition of the theoretical basis of adaptive LIPs, as well as implementation detail
Issues in Implementing Latency Insensitive Protocols
The performance of future Systems-on-Chip will be limited by the latency of long interconnects requiring more than one clock cycle for the signals to propagate. To deal with the problem L. Carloni et alii proposed the Latency Insensitive Protocols (LIP). A design that works under the assumption of zero-delay connections between functional modules is modified in a Latency Insensitive Design (LID) by encapsulating them within wrappers (“shells”) and connecting them through internally pipelined blocks (“relay stations”) complying with a protocol that guarantees identity of behavior [1]. The wrappers perform:- Data Validation: each output channel signals whether the datum therein present has still to be consumed.- Back Pressure: when the pearl is stopped the shell generates a stop signal sent in the opposite direction of inputs;- Clock Gating: a module waiting for new data and/or stopped keeps its present state. Such a protocol was implemented [2] through the introductio
Adaptive Latency Insensitive Protocols andElastic Circuits with Early Evaluation: A Comparative Analysis
AbstractLatency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: Computation is enabled if all inputs to a block are simultaneously available. Adaptive LIP's (ALIP) and EC with early evaluation (ECEE) increase the performance by relaxing the evaluation rule: Computation is enabled as soon as the subset of inputs needed at a given time is available. Their difference in terms of implementation and behavior in selected cases justifies the need for the comparative analysis reported in this paper. Results have been obtained through simple examples, a single representative case-study already used in the context of both LIP's and EC and through extensive simulations over a suite of benchmarks
On-Chip Transparent Wire Pipelining (invited paper)
Wire pipelining has been proposed as a viable mean to break the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications in order to insert it seamlessly in the current design flow. In this paper we briefly survey the methods presented by other researchers in the field and then we thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects
Indicating Asynchronous Array Multipliers
Multiplication is an important arithmetic operation that is frequently
encountered in microprocessing and digital signal processing applications, and
multiplication is physically realized using a multiplier. This paper discusses
the physical implementation of many indicating asynchronous array multipliers,
which are inherently elastic and modular and are robust to timing, process and
parametric variations. We consider the physical realization of many indicating
asynchronous array multipliers using a 32/28nm CMOS technology. The
weak-indication array multipliers comprise strong-indication or weak-indication
full adders, and strong-indication 2-input AND functions to realize the partial
products. The multipliers were synthesized in a semi-custom ASIC design style
using standard library cells including a custom-designed 2-input C-element. 4x4
and 8x8 multiplication operations were considered for the physical
implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one
(RTO) handshake protocols were utilized for data communication, and the
delay-insensitive dual-rail code was used for data encoding. Among several
weak-indication array multipliers, a weak-indication array multiplier utilizing
a biased weak-indication full adder and the strong-indication 2-input AND
function is found to have reduced cycle time and power-cycle time product with
respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further,
the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ
handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943
Survey on wireless technology trade-offs for the industrial internet of things
Aside from vast deployment cost reduction, Industrial Wireless Sensor and Actuator Networks (IWSAN) introduce a new level of industrial connectivity. Wireless connection of sensors and actuators in industrial environments not only enables wireless monitoring and actuation, it also enables coordination of production stages, connecting mobile robots and autonomous transport vehicles, as well as localization and tracking of assets. All these opportunities already inspired the development of many wireless technologies in an effort to fully enable Industry 4.0. However, different technologies significantly differ in performance and capabilities, none being capable of supporting all industrial use cases. When designing a network solution, one must be aware of the capabilities and the trade-offs that prospective technologies have. This paper evaluates the technologies potentially suitable for IWSAN solutions covering an entire industrial site with limited infrastructure cost and discusses their trade-offs in an effort to provide information for choosing the most suitable technology for the use case of interest. The comparative discussion presented in this paper aims to enable engineers to choose the most suitable wireless technology for their specific IWSAN deployment
Recommended from our members
Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems
Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions
Asynchronous techniques for system-on-chip design
SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed
Throughput-driven floorplanning with wire pipelining
The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires
- …