Search CORE

322 research outputs found

Comparison of multi-layer bus interconnection and a network on chip solution

Author: Huttula N. (Niko)
Publication venue: University of Oulu
Publication date: 01/04/2019
Field of study

Abstract. This thesis explains the basic subjects that are required to take in consideration when designing a network on chip solutions in the semiconductor world. For example, general topologies such as mesh, torus, octagon and fat tree are explained. In addition, discussion related to network interfaces, switches, arbitration, flow control, routing, error avoidance and error handling are provided. Furthermore, there is discussion related to design flow, a computer aided designing tools and a few comprehensive researches. However, several networks are designed for the minimum latency, although there are also versions which trade performance for decreased bus widths. These designed networks are compared with a corresponding multi-layer bus interconnection and both synthesis and register transfer level simulations are run. For example, results from throughput, latency, logic area and power consumptions are gathered and compared. It was discovered that overall throughput was well balanced with the network on chip solutions, although its maximum throughput was limited by protocol conversions. For example, the multi-layer bus interconnection was capable of providing a few times smaller latencies and higher throughputs when only a single interface was injected at the time. However, with parallel traffic and high-performance requirements a network on chip solution provided better results, even though the difference decreased when performance requirements were lower. Furthermore, it was discovered that the network on chip solutions required approximately 3–4 times higher total cell area than the multi-layer bus interconnection and that resources were mainly located at network interfaces and switches. In addition, power consumption was approximately 2–3 times higher and was mostly caused by dynamic consumption.Monitasoisen väyläarkkitehtuurin ja tietokoneverkkomaisen ratkaisun vertailua. Tiivistelmä. Tutkielmassa käsitellään tärkeimpiä aihealueita, jotka tulee huomioida suunniteltaessa tietokoneverkkomaisia väyläratkaisuja puolijohdemaailmassa. Esimerkiksi yleiset rakenteet, kuten verkko-, torus-, kahdeksankulmio- ja puutopologiat käsitellään lyhyesti. Lisäksi alustetaan verkon liitäntäkohdat, kytkimet, vuorottelu, vuon hallinta, reititys, virheiden välttely ja -käsittely. Lopuksi kerrotaan suunnitteluvuon oleellisimmat välivaiheet ja niihin soveltuvia kaupallisia työkaluja, sekä käsitellään lyhyesti muutaman aiemman julkaisun tuloksia. Tutkielmassa käytetään suunnittelutyökalua muutaman tietokoneverkkomaisen ratkaisun toteutukseen ja tavoitteena on saavuttaa pienin mahdollinen latenssi. Toisaalta myös hieman suuremman latenssin versioita suunnitellaan, mutta pienemmillä väylänleveyksillä. Lisäksi suunniteltuja tietokoneverkkomaisia ratkaisuja vertaillaan perinteisempään monitasoiseen väyläarkkitehtuuriin. Esimerkiksi synteesi- ja simulaatiotuloksia, kuten logiikan vaatimaa pinta-alaa, tehonkulutusta, latenssia ja suorituskykyä, vertaillaan keskenään. Tutkielmassa selvisi, että suunnittelutyökalulla toteutetut tietokoneverkkomaiset ratkaisut mahdollistivat tasaisemman suorituskyvyn, joskin niiden suurin saavutettu suorituskyky ja pienin latenssi määräytyivät protokollan käännöksen aiheuttamasta viiveestä. Tutkielmassa havaittiin, että perinteisemmillä menetelmillä saavutettiin noin kaksi kertaa suurempi suorituskyky ja pienempi latenssi, kun verkossa ei ollut muuta liikennettä. Rinnakkaisen liikenteen lisääntyessä tietokoneverkkomainen ratkaisu tarjosi keskimäärin paremman suorituskyvyn, kun sille asetetut tehokkuusvaateet olivat suuret, mutta suorituskykyvaatimuksien laskiessa erot kapenivat. Lisäksi huomattiin, että tietokoneverkkomaisten ratkaisujen käyttämä pinta-ala oli noin 3–4 kertaa suurempi kuin monitasoisella väyläarkkitehtuurilla ja että resurssit sijaitsivat enimmäkseen verkon liittymäkohdissa ja kytkimissä. Lisäksi tehonkulutuksen huomattiin olevan noin 2–3 kertaa suurempi, joskin sen havaittiin koostuvan pääosin dynaamisesta kulutuksesta

University of Oulu Repository - Jultika

Design and Verification of a Round-Robin Arbiter

Author: Toe Aung
Publication venue: RIT Scholar Works
Publication date: 01/08/2018
Field of study

As the number of bus masters increases in chip, the performance of a system largely depends on the arbitration scheme. The throughput of the system is affected by the arbiter circuit which controls the grant for various requestors. An arbitration scheme is usually chosen based on the application. A memory arbiter decides which CPU will get access for each cycle. A packet switch uses an arbiter to decide which input packet will be scheduled to the output. This paper introduces a Round-robin arbitration with adjustable weight of resource access time. The Round-robin arbiter mechanism is useful when no starvation of grants is allowed. The arbiter quantizes time shares each requestor is allowed to have. A minimal fairness is guaranteed by granting requestors in Round-robin manner. The requestors can prioritize their time shares by the weight. For example, if requestor A has a weight of two and requestor B has a weight of four, arbiter will allocate requestor B with time slice two times longer than that of requestor A’s. The verification of the design is carried out using SystemVerilog. The inputs of the arbiter are randomized, outputs are predicted in a software model and verification coverage is collected. The work in this paper includes design and verification of a weighted Round-robin arbiter

RIT Scholar Works

Simplifying the Creation of Multi-core Processors: An Interconnection Architecture and Tool Framework

Author: Grossman Samuel Robert
Publication venue: 'University of Waterloo'
Publication date: 01/01/2012
Field of study

The contribution of this thesis is two-fold: an on-chip interconnection architecture designed specifically for multi-core processors and a tool framework that simplifies the process of designing a multi-core processor. Both contributions primarily target ASIC fabrication, though prototyping on an FPGA is also supported. SG-Multi, the on-chip interconnection architecture, distinguishes itself from other interconnection architectures by emphasizing universal adaptability; that is, a primary design goal is to ensure compatibility with industry-supplied cores originally intended for other architectures. This goal is achieved through the use of bus adapters and without introducing clock cycle latency. SG-Multi is a multi-bus architecture that uses slave-side arbitration and supports multiple simultaneous transactions between independent devices. All transactions are pipelined in two stages, an address phase and a data phase, and for improved performance slave devices must signal their status for a given clock cycle at the beginning of that cycle. SG-Multi Designer, the tool framework which builds systems that use SG-Multi, provides a higher level of abstraction compared to other competing system-building solutions; the set of components with which a designer must be concerned is much more limited, and low-level details such as hardware interface compatibility are removed from active consideration. Experimental results demonstrate that the hardware cost of using SG-Multi is reasonable compared to using a processor's native bus architecture, although the current implementation of arbitration is identifiable as an area for future improvement. It is also shown that SG-Multi is scalable; the reference systems grow linearly with respect to the number of cores when tested for ASIC fabrication and slightly sublinearly when tested for FPGA prototyping, and the maximum achievable clock frequency remains almost constant as the number of cores grows beyond four. Because the reference systems tested are an accurate reflection of the types of systems SG-Multi Designer produces, it is concluded that the abstraction model used by SG-Multi Designer does not over-simplify the design process in a way that causes excessive performance degradation or increased hardware resource consumption

University of Waterloo's Institutional Repository

Design of an asynchronous processor

Author: Sotiriou Christos Panagiotis
Publication venue: The University of Edinburgh
Publication date: 01/01/2001
Field of study

Edinburgh Research Archive

Design Space Exploration of FPGA-Based NoC Routers

Author: Imbewa Abdelrazag
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2012
Field of study

Currently, FPGAs serve as Field–Programmable–Systems–on–Chip (FPSoCs) and are widely used to implement computationally intensive applications. As the number of components in FPSoCs increases, the interconnect schemes based on Network–on–Chip (NoC) approach are increasingly used. Routers greatly impact the performance and cost of NoCs. In this thesis, we explore the design space of FPGA–based NoC routers. We implement three types of packet switched NoC routers on a Stratix II FPGA using parameterized VHDL models. To reduce the area and increase the speed, we use novel techniques. Buffer size is decreased by minimizing the number of control fields in a packet. Both edges of the clock are utilized, and credit based flow control is used to accelerate the router. The proposed routers were evaluated based on area, frequency, and zero load latency. Synthesis results and zero load latency evaluations show that they are significantly superior to widely referenced, previously proposed routers

Scholarship at UWindsor

Recommended from our members

Behavioral synthesis from VHDL using structured modeling

Author: Gajski Daniel D.
Lis Joseph S.
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

This dissertation describes work in behavioral synthesis involving the development of a VHDL Synthesis System VSS which accepts a VHDL behavioral input specification and performs technology independent synthesis to generate a circuit netlist of generic components. The VHDL language is used for input and output descriptions. An intermediate representation which incorporates signal typing and component attributes simplifies compilation and facilitates design optimization.A Structured Modeling methodology has been developed to suggest standard VHDL modeling practices for synthesis. Structured modeling provides recommendations for the use of available VHDL description styles so that optimal designs will be synthesized.A design composed of generic components is synthesized from the input description through a process of Graph Compilation, Graph Criticism, and Design Compilation. Experiments were performed to demonstrate the effects of different modeling styles on the quality of the design produced by VSS. Several alternative VHDL models were examined for each benchmark, illustrating the improvements in design quality achieved when Structured Modeling guidelines were followed

eScholarship - University of California

Enabling Task Level Parallelism in HandelC

Author: AbuYasin Thamer S.
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2007
Field of study

HandelC is a programming language used to target hardware and is similar in syntax to ANSI-C. HandelC offers constructs that allow programmers to express instruction level parallelism. Also, HandelC offers primitives that allow task level parallelism. However, HandelC does not offer any runtime support that enables programmers to express task level parallelism efficiently. This thesis discusses this issue and suggests a support library called HCthreads as a solution. HCthreads offers a subset of Pthreads functionality and interface relevant to the HandelC environment. This study offers means to identify the best configuration of HCthreads to achieve the highest speedups in real systems. This thesis investigates the issue of integrating HandelC within platforms not supported by Celoxica. A support library is implemented to solve this issue by utilizing the high level abstractions offered by Hthreads. This support library abstracts away any HWTI specific synchronization making the coding experience quite close to software. HCthreads is proven effective and generic for various algorithms with different threading behaviors. HCthreads is an adequate method to implement recursive algorithms even if no task level parallelism is warranted. Not only HCthreads offers such versatility, it achieves modest speedups over instruction level parallelism ad-hoc approaches. The Hthreads support library served its intended purpose by allowing HCthreads real system tests to proceed on a third party platform. No major issues were reported while conducting these tests, still additional investigation and verification is required

KU ScholarWorks