Search CORE

16,801 research outputs found

Spatial parallelism in the routers of asynchronous on-chip networks

Author: Edwards Douglas
Song Wei
Publication venue
Publication date: 01/01/2011
Field of study

State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

OpenGrey Repository

Spatial parallelism in the routers of asynchronous on-chip networks

Author: Edwards Douglas
Song Wei
Publication venue
Publication date: 01/01/2011
Field of study

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing

OpenGrey Repository

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Asynchronous techniques for system-on-chip design

Author: Martin Alain J.
Nyström Mika
Publication venue
Publication date: 01/06/2006
Field of study

SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed

Caltech Authors

Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

Author: Alex Kondratyev
Christos Sotiriou
Jordi Cortadella
Lavagno Luciano
Publication venue
Publication date: 01/01/2006
Field of study

Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Recommended from our members

Noise shaping Asynchronous SAR ADC based time to digital converter

Author: Katragadda Sowmya
Publication venue
Publication date: 06/08/2018
Field of study

Time-to-digital converters (TDCs) are key elements for the digitization of timing information in modern mixed-signal circuits such as digital PLLs, DLLs, ADCs, and on-chip jitter-monitoring circuits. Especially, high-resolution TDCs are increasingly employed in on-chip timing tests, such as jitter and clock skew measurements, as advanced fabrication technologies allow fine on-chip time resolutions. Its main purpose is to quantize the time interval of a pulse signal or the time interval between the rising edges of two clock signals. Similarly to ADCs, the performance of TDCs are also primarily characterized by Resolution, Sampling Rate, FOM, SNDR, Dynamic Range and DNL/INL. This work proposes and demonstrates 2nd order noise shaping Asynchronous SAR ADC based TDC architecture with highest resolution of 0.25 ps among current state of art designs with respect to post-layout simulation results. This circuit is a combination of low power/High Resolution 2nd Order Noise Shaped Asynchronous SAR ADC backend with simple Time to Amplitude converter (TAC) front-end and is implemented in 40nm CMOS technology. Additionally, special emphasis is given on the discussion on various current state of art TDC architectures.Electrical and Computer Engineerin

Texas ScholarWorks

Atari games and Intel processors

Author: Adamski Robert
Grel Tomasz
Klimek Maciej
Michalewski Henryk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/05/2017
Field of study

The asynchronous nature of the state-of-the-art reinforcement learning algorithms such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, given the fact that deep reinforcement learning often deals with interpreting visual information, a large part of the train and inference time is spent performing convolutions. In this work we present our results on learning strategies in Atari games using a Convolutional Neural Network, the Math Kernel Library and TensorFlow 0.11rc0 machine learning framework. We also analyze effects of asynchronous computations on the convergence of reinforcement learning algorithms

arXiv.org e-Print Archive

Crossref

Design of an FPGA-based smart camera and its application towards object tracking : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Electronics and Computer Engineering at Massey University, Manawatu, New Zealand

Author: Contreras Miguel
Publication venue: 'Massey University'
Publication date: 01/01/2016
Field of study

Smart cameras and hardware image processing are not new concepts, yet despite the fact both have existed several decades, not much literature has been presented on the design and development process of hardware based smart cameras. This thesis will examine and demonstrate the principles needed to develop a smart camera on hardware, based on the experiences from developing an FPGA-based smart camera. The smart camera is applied on a Terasic DE0 FPGA development board, using Terasic’s 5 megapixel GPIO camera. The algorithm operates at 120 frames per second at a resolution of 640x480 by utilising a modular streaming approach. Two case studies will be explored in order to demonstrate the development techniques established in this thesis. The first case study will develop the global vision system for a robot soccer implementation. The algorithm will identify and calculate the positions and orientations of each robot and the ball. Like many robot soccer implementations each robot has colour patches on top to identify each robot and aid finding its orientation. The ball is comprised of a single solid colour that is completely distinct from the colour patches. Due to the presence of uneven light levels a YUV-like colour space labelled YC1C2 is used in order to make the colour values more light invariant. The colours are then classified using a connected components algorithm to segment the colour patches. The shapes of the classified patches are then used to identify the individual robots, and a CORDIC function is used to calculate the orientation. The second case study will investigate an improved colour segmentation design. A new HSY colour space is developed by remapping the Cartesian coordinate system from the YC1C2 to a polar coordinate system. This provides improved colour segmentation results by allowing for variations in colour value caused by uneven light patterns and changing light levels

Massey Research Online