Search CORE

7 research outputs found

Performance optimization of elastic systems using buffer resizing and buffer insertion

Author: Bufistov Dmitry
Cortadella Jordi
Julvez Bueno Jorge Emilio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Buffer resizing and buffer insertion are two transformation techniques for the performance optimization of elastic systems. Different approaches for each technique have already been proposed in the literature. Both techniques increase the storage capacity and can potentially contribute to improve the throughput of the system. Each technique offers a different trade-off between area cost and latency. This paper presents a method that combines both techniques to achieve the maximum possible throughput while minimizing the cost of the implementation. The provided method is based on mixed integer linear programming. A set of experiments is designed to show the feasibility of the approach.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

LID: Retry Relay Station and Fusion Shell

Author: Boucaron Julien
Coadou Anthony
de Simone Robert
Publication venue: HAL CCSD
Publication date: 01/01/2009
Field of study

This paper is electronically published in Electonic Notes In Theoretical Computer Science http://dx.doi.org/10.1016/j.entcs.2009.07.026This paper introduces a new variant implementation of Latency-Insensitive Design elements. It optimizes area footprint of so-called Shell-Wrappers being partially fused with their input Relay-Stations. The modified Relay-Station is called a Retry Relay-Station. We show correctness of this implementation and provide comparative results between a regular implementation and our new one on both FPGA and ASIC

INRIA a CCSD electronic archive server

All digital skew tolerant synchronous interfacing methods for high-Performance point-to-point communication in DSM SoCs

Author: Bélanger Normand
Hasan Syed Rafay
Savaria Yvon
Publication venue
Publication date: 01/12/2008
Field of study

High-performance clocking of IPs, within a skew budget, is becoming difficult in Deep Sub-Micron technologies. Therefore, the concept of local islands of independent clocks prevails in SoCs, which can communicate using various synchronous and asynchronous interfacing methodologies. However, asynchronous methods are inadequately supported in the context of conventional synchronous design flows, and are also associated with substantial failure rates. By contrast, synchronous interfacing methods often require PLL based synchronization, which requires phase correction that consumes useful bandwidth and mixed signal components. This work proposes a novel and all digital synchronous design method for point-to-point communications, using n interfacing registers and locally delayed clocks with phase adjustments. An overall improvement in skew tolerance of up to n/2 to n times, compared to conventional designs, is obtained depending on the context. This is proven analytically. The modules are assumed to have same or integer multiple frequencies. Gate-level simulations are used to validate the analytical results. A proof of concept implementation of the proposed design is demonstrated using a Virtex-II Pro FPGA from Xilinx

PolyPublie

Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies

Author: Hasan Syed Rafay
Publication venue
Publication date: 01/01/2009
Field of study

Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns

Concordia University Research Repository

Recommended from our members

Architectures and Design Automation for Photonic Networks On Chip

Author: Hendry Gilbert R.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011
Field of study

Chip-scale photonics has emerged as an exciting field which can potentially solve many of the problems plaguing the high-performance computing industry, from large-scale to embedded. In theory, photonics is a superior communication medium because of its higher bandwidth density using wave-division multiplexing and bandwidth-power translucency to distance traveled. In practice, physical-layer design and engineering issues such as optical loss, crosstalk, and packaging have slowed its entry into widespread adoption at the chip and board scale. In this work, we present these issues and potential design improvements. The major contributions, however, are the tools and methods we have developed for the design of photonic interconnection networks, including a system-level simulator and CAD and modeling environment for layout, both of which are publicly available to the research community

Columbia University Academic Commons

Synchronous Latency Insensitive Design in FPGA

Author: Sheng Cheng
Publication venue: Institutionen för systemteknik
Publication date: 01/01/2005
Field of study

A design methodology to mitigate timing problems due to long wire delays is proposed. The timing problems are taking care of at architecture level instead of layout level in this design method so that no change is needed when the whole design goes to backend design. Hence design iterations are avoided by using this design methodology. The proposed design method is based on STARI architecture, and a novel initialization mechanism is proposed in this paper. Low frequency global clock is used to synchronize the communication and PLLs are used to provide high frequency working clocks. The feasibility of new design methodology is proved on FPGA test board and the implementation details are also described in this paper. Only standard library cells are used in this design method and no change is made to the traditional design flow. The new design methodology is expected to reduce the timing closure effort in high frequency and complex digital design in deep submicron technologies

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Efficient high-speed on-chip global interconnects

Author: Caputa Peter
Publication venue: Institutionen för systemteknik
Publication date: 01/01/2006
Field of study

The continuous miniaturization of integrated circuits has opened the path towards System-on-Chip realizations. Process shrinking into the nanometer regime improves transistor performancewhile the delay of global interconnects, connecting circuit blocks separated by a long distance, significantly increases. In fact, global interconnects extending across a full chip can have a delay corresponding to multiple clock cycles. At the same time, global clock skew constraints, not only between blocks but also along the pipelined interconnects, become even tighter. On-chip interconnects have always been considered RC-like, that is exhibiting long RC-delays. This has motivated large efforts on alternatives such as on-chip optical interconnects, which have not yet been demonstrated, or complex schemes utilizing on-chip F-transmission or pulsed current-mode signaling. In this thesis, we show that well-designed electrical global interconnects, behaving as transmission lines, have the capacity of very high data rates (higher than can be delivered by the actual process) and support near velocity-of-light delay for single-ended voltage-mode signaling, thus mitigating the RC-problem. We critically explore key interconnect performance measures such as data delay, maximum data rate, crosstalk, edge rates and power dissipation. To experimentally demonstrate the feasibility and superior properties of on-chip transmission line interconnects, we have designed and fabricated a test chip carrying a 5 mm long global communication link. Measurements show that we can achieve 3 Gb/s/wire over the 5 mm long, repeaterless on-chip bus implemented in a standard 0.18 μm CMOS process, achieving a signal velocity of 1/3 of the velocity of light in vacuum. To manage the problems due to global wire delays, we describe and implement a Synchronous Latency Insensitive Design (SLID) scheme, based on source-synchronous data transfer between blocks and data re-timing at the receiving block. The SLIDtechnique not onlymitigates unknown globalwire delays, but also removes the need for controlling global clock skew. The high-performance and high robustness capability of the SLID-method is practically demonstrated through a successful implementation of a SLID-based, 5.4 mm long, on-chip global bus, supporting 3 Gb/s/wire and dynamically accepting ± 2 clock cycles of data-clock skew, in a standard 0.18 μm CMOS porcess. In the context of technology scaling, there is a tendency for interconnects to dominate chip power dissipation due to their large total capacitance. In this thesis we address the problem of interconnect power dissipation by proposing and analyzing a transition-energy cost model aimed for efficient power estimation of performancecritical buses. The model, which includes properties that closely capture effects present in high-performance VLSI buses, can be used to more accurately determine the energy benefits of e.g. transition coding of bus topologies. We further show a power optimization scheme based on appropriate choice of reduced voltage swing of the interconnect and scaling of receiver amplifier. Finally, the power saving impact of swing reduction in combination with a sense-amplifying flip-flop receiver is shown on a microprocessor cache bus architecture used in industry

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line