Search CORE

17 research outputs found

Current Trends in High-Level Synthesis of Asynchronous Circuits

Author: Sparsø Jens
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Synchronous elasticization: considerations for correct implementation and miniMIPS case study

Author: Kilada Eliyah
Stevens Kenneth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Journal ArticleLatency insensitivity is a promising design paradigm in the nanometer era since it has potential benefits of increased modularity and robustness to variations. Synchronous elasticization is one approach (among others) of transforming an ordinary clocked circuit into a latency insensitive design. This paper presents practical considerations of elasticizing reconvergent fanouts. It also investigates the suitability of previously published as well as new join and fork implementations for usage in the elastic control network. We demonstrate that elasticization comes at a cost. Measurements of a MiniMIPS processor fabricated in a 0.5 μm node show that elasticization results in area and dynamic and idle power penalties of 29%, 13% and 58.3%, respectively, without any loss in performance. These measurements do not exploit the capability of pipeline bubbles that occur if one needs to have unpredictable interface latency, or to insert extra bubbles into a pipeline due to wire delays. We finally show the architectural performance advantage of eager over lazy protocols in the presence of bubbles in the MiniMIPS

The University of Utah: J. Willard Marriott Digital Library

Synthesis of synchronous elastic architectures

Author: Cortadella Jordi
Grundmann Bill
Kishinevsky Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

A simple protocol for latency-insensitive design is presented. The main features of the protocol are the efficient implementation of elastic communication channels and the automatable design methodology. With this approach, fine-granularity elasticity can be introduced at the level of functional units (e.g. ALUs, memories). A formal specification of the protocol is defined and an efficient scheme for the implementation of elasticity that involves no datapath overhead is presented. The opportunities this protocol opens for microarchitectural design are discussed.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Performance optimization of elastic systems using buffer resizing and buffer insertion

Author: Bufistov Dmitry
Cortadella Jordi
Julvez Bueno Jorge Emilio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Buffer resizing and buffer insertion are two transformation techniques for the performance optimization of elastic systems. Different approaches for each technique have already been proposed in the literature. Both techniques increase the storage capacity and can potentially contribute to improve the throughput of the system. Each technique offers a different trade-off between area cost and latency. This paper presents a method that combines both techniques to achieve the maximum possible throughput while minimizing the cost of the implementation. The provided method is based on mixed integer linear programming. A set of experiments is designed to show the feasibility of the approach.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

Author: L'INSALATA NICOLA EUGENIO
Publication venue: 'Pisa University Press'
Publication date: 09/06/2048
Field of study

This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

Electronic Thesis and Dissertation Archive - Università di Pisa

Correct-by-construction microarchitectural pipelining

Author: Cortadella Jordi
Galcerán Oms Marc
Kam Timothy
Kishinevsky Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

This paper presents a method for correct-by-construction microarchitectural pipelining that handles cyclic systems with dependencies between iterations. Our method combines previously known bypass and retiming transformations with a few transformations valid only for elastic systems with early evaluation (namely, empty FIFO insertion, FIFO capacity sizing, insertion of anti-tokens, and introducing early evaluation multiplexors). By converting the design to a synchronous elastic form and then applying this extended set of transformations, one can pipeline a functional specification with an automatically generated distributed controller that implements stalling logic resolving data hazards off the critical path of the design. We have developed an interactive toolkit for exploring elastic microarchitectural transformations. The method is illustrated by pipelining a few simple examples of instruction set architecture ISA specifications.Peer ReviewedPostprint (published version

CiteSeerX

UPCommons. Portal del coneixement obert de la UPC

Elastic systems

Author: Cortadella Jordi
Galcerán Oms Marc
Kishinevsky Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Elastic systems provide tolerance to the variations in computation and communication delays. The incorporation of elasticity opens new opportunities for optimization using new correct-by-construction transformations that cannot be applied to rigid non-elastic systems. The basics of synchronous and asynchronous elastic systems will be reviewed. A set of behavior-preserving transformations will be presented: retiming, recycling, early evaluation, variable-latency units and speculative execution. The application of these transformations for performance and power optimization will be discussed. Finally, a novel framework for microarchitectural exploration will be introduced, showing that the optimal pipelining of a circuit can be automatically obtained by using the previous transformations.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems

Author: Carloni Luca
Collins Rebecca L.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2008
Field of study

Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions

Columbia University Academic Commons