322 research outputs found

    Comparison of multi-layer bus interconnection and a network on chip solution

    Abstract. This thesis explains the basic topics that must be taken into consideration when designing network-on-chip solutions in the semiconductor world. General topologies such as mesh, torus, octagon and fat tree are explained. In addition, network interfaces, switches, arbitration, flow control, routing, error avoidance and error handling are discussed, as are the design flow, computer-aided design tools and a few comprehensive studies. Several networks are designed for minimum latency, along with versions that trade performance for narrower bus widths. The designed networks are compared with a corresponding multi-layer bus interconnection, and both synthesis and register-transfer-level simulations are run. Results for throughput, latency, logic area and power consumption are gathered and compared. Overall throughput was found to be well balanced with the network-on-chip solutions, although their maximum throughput was limited by protocol conversions. The multi-layer bus interconnection provided several times lower latency and higher throughput when only a single interface injected traffic at a time. With parallel traffic and high performance requirements, however, a network-on-chip solution provided better results, although the difference decreased when performance requirements were lower. Furthermore, the network-on-chip solutions required approximately 3–4 times more total cell area than the multi-layer bus interconnection, with the resources located mainly in network interfaces and switches. Power consumption was approximately 2–3 times higher and was mostly dynamic.

    Monitasoisen väyläarkkitehtuurin ja tietokoneverkkomaisen ratkaisun vertailua (Comparison of a multi-layer bus architecture and a computer-network-like solution). Abstract, translated from Finnish. The thesis covers the most important topics that must be considered when designing computer-network-like bus solutions in the semiconductor world. Common structures such as mesh, torus, octagon and tree topologies are covered briefly. In addition, network interfaces, switches, arbitration, flow control, routing, and error avoidance and handling are introduced. Finally, the most essential intermediate steps of the design flow and commercial tools suited to them are described, and the results of a few earlier publications are discussed briefly. The thesis uses a design tool to implement a few computer-network-like solutions, with the goal of achieving the lowest possible latency. Versions with slightly higher latency but narrower bus widths are also designed. The designed solutions are then compared with a more traditional multi-layer bus architecture, using synthesis and simulation results such as logic area, power consumption, latency and throughput. The thesis found that the computer-network-like solutions implemented with the design tool gave more even throughput, although their highest achieved throughput and lowest latency were determined by the delay caused by protocol conversion. The more traditional methods achieved roughly twice the throughput and lower latency when there was no other traffic in the network. As parallel traffic increased, the computer-network-like solution offered better throughput on average when the performance requirements placed on it were high, but the differences narrowed as the performance requirements dropped.
    In addition, the area used by the computer-network-like solutions was approximately 3–4 times larger than that of the multi-layer bus architecture, with the resources located mostly in the network interfaces and switches. Power consumption was approximately 2–3 times higher, although it consisted mainly of dynamic consumption.
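
As a rough illustration of why the mesh and torus topologies named in the abstract behave differently, the sketch below computes minimal hop counts between two routers on a 2D grid. The function names, the `(x, y)` coordinate convention and the 4x4 grid size are assumptions made for this example; they do not come from the thesis.

```python
from typing import Tuple

Coord = Tuple[int, int]  # hypothetical (x, y) router coordinate


def mesh_hops(src: Coord, dst: Coord) -> int:
    """Minimal hop count in a 2D mesh: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src: Coord, dst: Coord, width: int, height: int) -> int:
    """Minimal hop count in a 2D torus, where each row and column wraps around."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    # In each dimension, take the shorter of the direct path and the wrap-around path.
    return min(dx, width - dx) + min(dy, height - dy)


# Corner-to-corner traffic on an assumed 4x4 grid.
corner_mesh = mesh_hops((0, 0), (3, 3))
corner_torus = torus_hops((0, 0), (3, 3), width=4, height=4)
```

On this assumed grid the wrap-around links shorten the corner-to-corner path from six hops to two, at the cost of the extra long wires a torus needs.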

    Design and Verification of a Round-Robin Arbiter

    As the number of bus masters on a chip increases, system performance depends largely on the arbitration scheme. System throughput is affected by the arbiter circuit, which controls the grants issued to the various requestors. An arbitration scheme is usually chosen based on the application: a memory arbiter decides which CPU gets access in each cycle, while a packet switch uses an arbiter to decide which input packet is scheduled to the output. This paper introduces a round-robin arbiter with adjustable weights for resource access time. Round-robin arbitration is useful when no requestor may be starved of grants. The arbiter quantizes the time share each requestor is allowed to have. Minimal fairness is guaranteed by granting requestors in round-robin order, and requestors can prioritize their time shares through their weights. For example, if requestor A has a weight of two and requestor B has a weight of four, the arbiter allocates requestor B a time slice twice as long as requestor A's. The design is verified using SystemVerilog: the arbiter inputs are randomized, the outputs are predicted by a software model, and verification coverage is collected. The work covers both the design and the verification of a weighted round-robin arbiter.
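
The weighting rule described in this abstract can be sketched as a small behavioral model. This is only one plausible reading, written in Python rather than SystemVerilog: the class name, the one-grant-per-cycle interface and the policy of ending a slice early when its owner stops requesting are all assumptions, not the paper's design.

```python
from typing import List, Optional


class WeightedRoundRobinArbiter:
    """Behavioral sketch: a requestor may hold the grant for `weight` consecutive cycles."""

    def __init__(self, weights: List[int]) -> None:
        if not weights or any(w < 1 for w in weights):
            raise ValueError("every requestor needs a weight of at least 1")
        self.weights = list(weights)
        self.current = 0                    # requestor that owns the current time slice
        self.remaining = self.weights[0]    # cycles left in that slice

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the requestor granted this cycle, or None if nobody is requesting."""
        n = len(self.weights)
        if len(requests) != n:
            raise ValueError("one request flag per requestor is expected")
        # Keep the current owner while it still requests and has cycles left.
        if not (requests[self.current] and self.remaining > 0):
            # Otherwise pick the next requestor in circular order (assumed policy).
            for offset in range(1, n + 1):
                candidate = (self.current + offset) % n
                if requests[candidate]:
                    self.current = candidate
                    self.remaining = self.weights[candidate]
                    break
            else:
                return None
        self.remaining -= 1
        return self.current


# Requestor A (index 0) has weight 2, requestor B (index 1) has weight 4; both always request.
arbiter = WeightedRoundRobinArbiter([2, 4])
grants = [arbiter.grant([True, True]) for _ in range(12)]
```

With both requestors permanently active, B is granted twice as many cycles as A, matching the two-versus-four example in the abstract. Because the search always moves on in circular order once a slice expires, no active requestor is starved.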

    Simplifying the Creation of Multi-core Processors: An Interconnection Architecture and Tool Framework

    The contribution of this thesis is two-fold: an on-chip interconnection architecture designed specifically for multi-core processors and a tool framework that simplifies the process of designing a multi-core processor. Both contributions primarily target ASIC fabrication, though prototyping on an FPGA is also supported. SG-Multi, the on-chip interconnection architecture, distinguishes itself from other interconnection architectures by emphasizing universal adaptability; that is, a primary design goal is to ensure compatibility with industry-supplied cores originally intended for other architectures. This goal is achieved through the use of bus adapters and without introducing clock cycle latency. SG-Multi is a multi-bus architecture that uses slave-side arbitration and supports multiple simultaneous transactions between independent devices. All transactions are pipelined in two stages, an address phase and a data phase, and for improved performance slave devices must signal their status for a given clock cycle at the beginning of that cycle. SG-Multi Designer, the tool framework which builds systems that use SG-Multi, provides a higher level of abstraction compared to other competing system-building solutions; the set of components with which a designer must be concerned is much more limited, and low-level details such as hardware interface compatibility are removed from active consideration. Experimental results demonstrate that the hardware cost of using SG-Multi is reasonable compared to using a processor's native bus architecture, although the current implementation of arbitration is identifiable as an area for future improvement. It is also shown that SG-Multi is scalable; the reference systems grow linearly with respect to the number of cores when tested for ASIC fabrication and slightly sublinearly when tested for FPGA prototyping, and the maximum achievable clock frequency remains almost constant as the number of cores grows beyond four. 
Because the reference systems tested accurately reflect the types of systems SG-Multi Designer produces, it is concluded that the abstraction model used by SG-Multi Designer does not over-simplify the design process in a way that causes excessive performance degradation or increased hardware resource consumption.

    Design of an asynchronous processor


    Design Space Exploration of FPGA-Based NoC Routers

    FPGAs currently serve as Field-Programmable Systems-on-Chip (FPSoCs) and are widely used to implement computationally intensive applications. As the number of components in FPSoCs increases, interconnect schemes based on the Network-on-Chip (NoC) approach are increasingly used. Routers greatly affect the performance and cost of NoCs. In this thesis, we explore the design space of FPGA-based NoC routers. We implement three types of packet-switched NoC routers on a Stratix II FPGA using parameterized VHDL models. Several techniques are used to reduce area and increase speed: buffer size is decreased by minimizing the number of control fields in a packet, both edges of the clock are used, and credit-based flow control accelerates the router. The proposed routers were evaluated on area, frequency, and zero-load latency. Synthesis results and zero-load latency evaluations show that they are significantly superior to widely referenced, previously proposed routers.
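
The credit-based flow control mentioned in this abstract can be sketched in a few lines. This is a generic textbook-style model, not the thesis's router. The class name and buffer depth are illustrative, and credits are assumed to return to the sender instantly, which real hardware would not do.

```python
from collections import deque
from typing import Deque, Optional


class CreditLink:
    """One sender-to-receiver link: the sender may transmit only while it holds credits."""

    def __init__(self, buffer_depth: int) -> None:
        self.depth = buffer_depth
        self.credits = buffer_depth            # sender-side counter of free downstream slots
        self.buffer: Deque[str] = deque()      # receiver-side input buffer

    def try_send(self, flit: str) -> bool:
        """Send a flit if a credit is available; otherwise stall and report False."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self) -> Optional[str]:
        """Receiver forwards one flit onward and returns a credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1   # assumption: the credit returns with zero latency
        return flit


link = CreditLink(buffer_depth=2)
sent = [link.try_send(flit) for flit in ("a", "b", "c")]   # third flit stalls
first_out = link.drain()                                   # frees one slot
retry = link.try_send("c")                                 # now succeeds
```

Because the sender never transmits without a credit, the receiver's buffer cannot overflow and no flit has to be dropped or retransmitted.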

    Enabling Task Level Parallelism in HandelC

    HandelC is a programming language used to target hardware and is similar in syntax to ANSI C. HandelC offers constructs that allow programmers to express instruction-level parallelism, as well as primitives that allow task-level parallelism. However, HandelC does not offer any runtime support that enables programmers to express task-level parallelism efficiently. This thesis discusses this issue and proposes a support library called HCthreads as a solution. HCthreads offers the subset of Pthreads functionality and interface relevant to the HandelC environment. The study also offers a means of identifying the HCthreads configuration that achieves the highest speedups in real systems. In addition, the thesis investigates the integration of HandelC with platforms not supported by Celoxica. A second support library is implemented to solve this issue by using the high-level abstractions offered by Hthreads. This library abstracts away any HWTI-specific synchronization, making the coding experience quite close to software. HCthreads is shown to be effective and generic for various algorithms with different threading behaviors, and it is an adequate method for implementing recursive algorithms even when no task-level parallelism is warranted. Not only does HCthreads offer such versatility, it also achieves modest speedups over ad hoc instruction-level-parallelism approaches. The Hthreads support library served its intended purpose by allowing real-system tests of HCthreads to proceed on a third-party platform. No major issues were reported while conducting these tests; still, additional investigation and verification are required.

    Low power digital signal processing


    System-on-Chip design of a high performance low power full hardware cabac encoder in H.264/AVC

    Ph.D, DOCTOR OF PHILOSOPHY