Search CORE

15 research outputs found

Functional and Performance Analysis of Network-on-Chips Using Actor-based Modeling and Formal Verification

Author: Mohammadi Siamak
Mosaffa Mahdi
sharifi zeinab
Sirjani Marjan
Publication venue: European Association of Software Science and Technology
Publication date: 10/02/2014
Field of study

Network on Chip (NoC) has emerged as a promising architecture paradigmfor todays many-core systems. As complexity grows in NoCs, functional verificationand performance prediction in the early stages of the design process are suggestedas ways to reduce the fabrication cost. Formal methods have gained moreattention as alternative ways for analyzing NoC designs. In this paper we propose amethod to model different characteristics of the system, and also verify various functionaland performance properties by generating the full state space of the model fordifferent scenarios. We present a formal model for two-dimensional mesh GloballyAsynchronous Locally Synchronous (GALS) NoCs with four-phase handshakecommunication protocol, using the actor-based modeling language Rebeca. Functionaland timing behaviors, routing algorithm and communication protocol are capturedin the model. Deadlock freedom, message arrival, and end-to-end packet latencyare checked. In order to analyze large NoCs we propose a scalable approachbased on compositional verification for estimating maximum end-to-end packet latency.The compositional approach is specific for the XY-routing algorithm. Resultsof verification are compared and matched to simulation results of HSPICE using32nm technology

Electronic Communications of the EASST (European Association of Software Science and Technology)

Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems

Author: Strano Alessandro
Publication venue: Università degli studi di Ferrara
Publication date: 01/01/2013
Field of study

NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the design principles to efficiently implement interconnection networks in the resource-constrained on-chip setting have stabilized. On the other hand, the requirements on embedded system design are far from stabilizing. Embedded systems are composed by assembling together heterogeneous components featuring differentiated operating speeds and ad-hoc counter measures must be adopted to bridge frequency domains. Moreover, an unmistakable trend toward enhanced reconfigurability is clearly underway due to the increasing complexity of applications. At the same time, the technology effect is manyfold since it provides unprecedented levels of system integration but it also brings new severe constraints to the forefront: power budget restrictions, overheating concerns, circuit delay and power variability, permanent fault, increased probability of transient faults. Supporting different degrees of reconfigurability and flexibility in the parallel hardware platform cannot be however achieved with the incremental evolution of current design techniques, but requires a disruptive approach and a major increase in complexity. In addition, new reliability challenges cannot be solved by using traditional fault tolerance techniques alone but the reliability approach must be also part of the overall reconfiguration methodology. In this thesis we take on the challenge of engineering a NoC architectures for the next generation systems and we provide design methods able to overcome the conventional way of implementing multi-synchronous, reliable and reconfigurable NoC. Our analysis is not only limited to research novel approaches to the specific challenges of the NoC architecture but we also co-design the solutions in a single integrated framework. Interdependencies between different NoC features are detected ahead of time and we finally avoid the engineering of highly optimized solutions to specific problems that however coexist inefficiently together in the final NoC architecture. To conclude, a silicon implementation by means of a testchip tape-out and a prototype on a FPGA board validate the feasibility and effectivenes

Archivio istituzionale della ricerca - Università di Ferrara

Master of Science

Author: Takur Dheeraj singh
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects

The University of Utah: J. Willard Marriott Digital Library

Comparative Analysis of NoCs for Two-Dimensional Versus Three-Dimensional SoCs Supporting Multiple Voltage and Frequency Islands

Author: Benini Luca
De Micheli Giovanni
Murali Srinivasan
Seiculescu Ciprian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/07/2010
Field of study

In many of today’s system-on-chip (SoC) designs, the cores are partitioned into multiple voltage and frequency islands (VFIs), and the global interconnect is implemented using a packetswitched network on chip (NoC). In such VFI-based designs, the benefits of 3-D integration in reducing the NoC power or delay are unclear, as a significant fraction of power is spent in link-level synchronization, and stacked designs may impose many synchronization boundaries. In this brief, we show the quantitative benefits of the 3-D technology on NoC power and delay values for such application-specific designs. We show a design flow for building application-specific NoCs for both 2-D and 3-D SoCs with multiple VFIs. We present a detailed case study of NoCs designed using the flow for a mobile platform. Our results show that power savings strongly depend on the number of VFIs used (up to 32% reduction). This motivates the need for an early architectural space exploration, as allowed by our flow. Our experiments also show that the reduction in delay is only marginal when moving from 2-D to 3-D systems (up to 11%), if both are designed efficiently

Infoscience - École polytechnique fédérale de Lausanne

An Asynchronous Network-On-Chip Router with Low Standby Power

Author: Elshennawy Amr
Publication venue
Publication date: 28/04/2015
Field of study

The Network-on-Chip (NoC) paradigm is now widely used to interconnect the processing elements (PEs) in a chip multiprocessor (CMP). It has been reported that the NoC consumes about a third of the total power consumption of the multi-core processor. To address this, asynchronous NoC routers have been proposed, to eliminate the clocking power associated with the NoC implementation, which is typically a large fraction of the NoC power consumption. In this work, we present a technique to reduce the standby power of a state-of-the-art asynchronous NoC router. In our approach, the router is put in a known input state when idle, and each gate in the unmodified router is replaced by a logically equivalent gate whose supply pin is connected to a PMOS device with a high threshold voltage in case its output in the idle state was 0. On the other hand, if the output of the unmodified gate in the idle state was 1, it is replaced by a logically equivalent gate whose ground terminal is connected to a NMOS device with a high threshold voltage. Our router is inserted in an NoC, and verified logically for correct routing functionality. We also simulated it at the circuit level using a 45nm fabrication technology, and show that it has a low wake-up time from sleep, and a minimal steady-state routing delay (13%) and area (23%) overhead, and a 8.1× lower standby power, when compared to an unmodified asynchronous NoC router, which was also implemented. Our leakage improvement is achieved in part by using a novel method to control the leakage of the inverter chain used to drive the sleep signal, something which that is not possible with traditional leakage reduction techniques

Texas A&M Repository

CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS

Author: Tatenguem Fankem Hervé
Publication venue
Publication date: 01/01/2014
Field of study

This thesis provides a whole set of design methods to enable and manage the runtime heterogeneity of features-rich industry-ready Tile-Based Networkon- Chips at different abstraction layers (Architecture Design, Network Assembling, Testing of NoC, Runtime Operation). The key idea is to maintain the functionalities of the original layers, and to improve the performance of architectures by allowing, joint optimization and layer coordinations. In general purpose systems, we address the microarchitectural challenges by codesigning and co-optimizing feature-rich architectures. In application-specific NoCs, we emphasize the event notification, so that the platform is continuously under control. At the network assembly level, this thesis proposes a Hold Time Robustness technique, to tackle the hold time issue in synchronous NoCs. At the network architectural level, the choice of a suitable synchronization paradigm requires a boost of synthesis flow as well as the coexistence with the DVFS. On one hand this implies the coexistence of mesochronous synchronizers in the network with dual-clock FIFOs at network boundaries. On the other hand, dual-clock FIFOs may be placed across inter-switch links hence removing the need for mesochronous synchronizers. This thesis will study the implications of the above approaches both on the design flow and on the performance and power quality metrics of the network. Once the manycore system is composed together, the issue of testing it arises. This thesis takes on this challenge and engineers various testing infrastructures. At the upper abstraction layer, the thesis addresses the issue of managing the fully operational system and proposes a congestion management technique named HACS. Moreover, some of the ideas of this thesis will undergo an FPGA prototyping. Finally, we provide some features for emerging technology by characterizing the power consumption of Optical NoC Interfaces

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

Recommended from our members

Design and performance optimization of asynchronous networks-on-chip

Author: Jiang Weiwei
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems. In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style. The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions. First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains. Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow. Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel. In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement

Columbia University Academic Commons

Spatial parallelism in the routers of asynchronous on-chip networks

Author: Edwards Douglas
Song Wei
Publication venue
Publication date: 01/01/2011
Field of study

State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing

OpenGrey Repository

Spatial parallelism in the routers of asynchronous on-chip networks

Author: Edwards Douglas
Song Wei
Publication venue
Publication date: 01/01/2011
Field of study

OpenGrey Repository

Doctor of Philosophy

Author: Vij Vikas S.
Publication venue: University of Utah
Publication date: 01/12/2013
Field of study

dissertationAsynchronous design has a very promising potential even though it has largely received a cold reception from industry. Part of this reluctance has been due to the necessity of custom design languages and computer aided design (CAD) flows to design, optimize, and validate asynchronous modules and systems. Next generation asynchronous flows should support modern programming languages (e.g., Verilog) and application specific integrated circuits (ASIC) CAD tools. They also have to support multifrequency designs with mixed synchronous (clocked) and asynchronous (unclocked) designs. This work presents a novel relative timing (RT) based methodology for generating multifrequency designs using synchronous CAD tools and flows. Synchronous CAD tools must be constrained for them to work with asynchronous circuits. Identification of these constraints and characterization flow to automatically derive the constraints is presented. The effect of the constraints on the designs and the way they are handled by the synchronous CAD tools are analyzed and reported in this work. The automation of the generation of asynchronous design templates and also the constraint generation is an important problem. Algorithms for automation of reset addition to asynchronous circuits and power and/or performance optimizations applied to the circuits using logical effort are explored thus filling an important hole in the automation flow. Constraints representing cyclic asynchronous circuits as directed acyclic graphs (DAGs) to the CAD tools is necessary for applying synchronous CAD optimizations like sizing, path delay optimizations and also using static timing analysis (STA) on these circuits. A thorough investigation for the requirements of cycle cutting while preserving timing paths is presented with an algorithm to automate the process of generating them. A large set of designs for 4 phase handshake protocol circuit implementations with early and late data validity are characterized for area, power and performance. Benchmark circuits with automated scripts to generate various configurations for better understanding of the designs are proposed and analyzed. Extension to the methodology like addition of scan insertion using automatic test pattern generation (ATPG) tools to add testability of datapath in bundled data asynchronous circuit implementations and timing closure approaches are also described. Energy, area, and performance of purely asynchronous circuits and circuits with mixed synchronous and asynchronous blocks are explored. Results indicate the benefits that can be derived by generating circuits with asynchronous components using this methodology

The University of Utah: J. Willard Marriott Digital Library