Search CORE

1,379 research outputs found

Adaptive Latency Insensitive Protocols

Author: Casu Mario Roberto
Macchiarulo Luca
Publication venue: IEEE
Publication date: 01/01/2007
Field of study

Latency-insensitive design copes with excessive delays typical of global wires in current and future IC technologies. It achieves its goal via encapsulation of synchronous logic blocks in wrappers that communicate through a latency-insensitive protocol (LIP) and pipelined interconnects. Previously proposed solutions suffer from an excessive performance penalty in terms of throughput or from a lack of generality. This article presents an adaptive LIP that outperforms previous static implementations, as demonstrated by two relevant cases — a microprocessor and an MPEG encoder — whose components we made insensitive to the latencies of their interconnections through a newly developed wrapper. We also present an informal exposition of the theoretical basis of adaptive LIPs, as well as implementation detail

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Adaptive Latency Insensitive Protocols andElastic Circuits with Early Evaluation: A Comparative Analysis

Author: Carloni
Casu
Luca Macchiarulo
Mario R. Casu
Murata
Publication venue: Elsevier
Publication date: 01/01/2009
Field of study

AbstractLatency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: Computation is enabled if all inputs to a block are simultaneously available. Adaptive LIP's (ALIP) and EC with early evaluation (ECEE) increase the performance by relaxing the evaluation rule: Computation is enabled as soon as the subset of inputs needed at a given time is available. Their difference in terms of implementation and behavior in selected cases justifies the need for the comparative analysis reported in this paper. Results have been obtained through simple examples, a single representative case-study already used in the context of both LIP's and EC and through extensive simulations over a suite of benchmarks

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Half-buffer retiming and token cages for synchronous elastic circuits

Author: Ampalam
Austin
Bañeres
Blaauw
Bowman
Brej
Bufistov
Carloni
Carloni
Carmona
Casu
Collins
Cortadella
Cortadella
Ghosh
Jacobson
Júlvez
Kam
Li
Li
Lu
Lu
M.R. Casu
Reese
Publication venue: IET
Publication date: 01/01/2011
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Issues in Implementing Latency Insensitive Protocols

Author: Casu Mario Roberto
Macchiarulo Luca
Publication venue: IEEE Computer Society
Publication date: 01/01/2004
Field of study

The performance of future Systems-on-Chip will be limited by the latency of long interconnects requiring more than one clock cycle for the signals to propagate. To deal with the problem L. Carloni et alii proposed the Latency Insensitive Protocols (LIP). A design that works under the assumption of zero-delay connections between functional modules is modified in a Latency Insensitive Design (LID) by encapsulating them within wrappers (“shells”) and connecting them through internally pipelined blocks (“relay stations”) complying with a protocol that guarantees identity of behavior [1]. The wrappers perform:- Data Validation: each output channel signals whether the datum therein present has still to be consumed.- Back Pressure: when the pearl is stopped the shell generates a stop signal sent in the opposite direction of inputs;- Clock Gating: a module waiting for new data and/or stopped keeps its present state. Such a protocol was implemented [2] through the introductio

CiteSeerX

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Improving Synchronous Elastic Circuits: Token Cages and Half-Buffer Retiming

Author: Casu Mario Roberto
Publication venue: IEEE
Publication date: 01/01/2010
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Comprehensive characterization of an open source document search engine

Author: Antoniou Georgia
Hadjilambrou Zacharias
Kleanthous Marios
Portero Antoni
Sazeides Yiannakis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation ands the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.Web of Science162art. no. 1

A general model for performance optimization of sequential systems

Author: Bufistov Dmitry
Cortadella Jordi
Kishinevsky Michael
Sapatnekar Sachin S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Retiming, c-slow retiming and recycling are different transformations for the performance optimization of sequential circuits. For retiming and c-slow retiming, different models that provide exact solutions have already been proposed. An exact model for recycling was yet unknown. This paper presents a general formulation that covers the combination of the three schemes for performance optimization. It provides an exact model based on integer linear programming that resorts to the structural theory of marked graphs. A set of experiments has been designed to show the benefits in performance obtained by combining retiming and recycling. The results also show the applicability of the method in large circuits.Peer ReviewedPostprint (published version

Balancing static islands in dynamically scheduled circuits using continuous petri nets

Author: Cheng J
Constantinides G
Fraca Santamaria E
Wickerson J
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 28/06/2023
Field of study

High-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A key challenge in HLS is scheduling, i.e. determining the start time of all the operations in the untimed program. A major shortcoming of existing approaches to scheduling – whether they are static (start times determined at compile-time), dynamic (start times determined at run-time), or a hybrid of both – is that the static analysis cannot efficiently explore the run-time hardware behaviours. Existing approaches either assume the timing behaviour in extreme cases, which can cause sub-optimal performance or larger area, or use simulation-based approaches, which take a long time to explore enough program traces. In this article, we propose an efficient approach using probabilistic analysis for HLS tools to efficiently explore the timing behaviour of scheduled hardware. We capture the performance of the hardware using Timed Continous Petri nets with immediate transitions, allowing us to leverage efficient Petri net analysis tools for making HLS decisions. We demonstrate the utility of our approach by using it to automatically estimate the hardware throughput for balancing the throughput for statically scheduled components (also known as static islands) computing in a dynamically scheduled circuit. Over a set of benchmarks, we show that our approach on average incurs a 2% overhead in area-delay product compared to optimal designs by exhaustive search

Spiral - Imperial College Digital Repository