136 research outputs found

    FPGA multi-core processors power consumption: soft- core vs. hard-core

    Get PDF
    Actually multi-core processors designs are limited in power consumption and performance. Consequently, it is not possible to optimize further the performance without increasing power consumption. The main challenge in multi-core processors is the fact that they have heterogeneous hardware components. This article will study different technologies for implementing multi-core processors in FPGA devices. The minimum requirement to ensure low power consumption is parallelism. The purpose of this study is to highlight the latest methodologies used in terms of environment, clock signal, testing, flexibility, cost, availability and power consumption

    The MANGO clockless network-on-chip: Concepts and implementation

    Get PDF

    Elastic bundles :modelling and architecting asynchronous circuits with granular rigidity

    Get PDF
    PhD ThesisIntegrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs). The complexity of designing a SoC has increased rapidly over the years due to growing process and environmental variations coupled with global clock distribution di culty. Moreover, traditional synchronous design is not apt to handle the heterogeneous timing nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed a strong revival of asynchronous design principles. A new paradigm of digital circuits emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave of recent innovations in synchronous-asynchronous CAD integration, this paradigm is showing signs of commercial adoption in future SoCs mainly due to the scope for reuse of synchronous functional blocks and IP cores, and the co-existence of synchronous and asynchronous design styles in a common EDA framework. However, there is a lack of formal methods and tools to facilitate mixed synchronousasynchronous design. In this thesis, we propose a formal model based on Petri nets with step semantics to describe these circuits behaviourally. Implication of this model in the veri cation and synthesis of mixed synchronous-asynchronous circuits is studied. Till date, this paradigm has been mainly explored on the basis of Globally Asynchronous Locally Synchronous (GALS) systems. Despite decades of research, GALS design has failed to gain traction commercially. To understand its drawbacks, a simulation framework characterising the physical and functional aspects of GALS SoCs is presented. A novel method for synthesising mixed synchronous-asynchronous circuits with varying levels of rigidity is proposed. Starting with a high-level data ow model of a system which is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity levels in the model without changing functional behaviour. The system is then partitioned into functional blocks of synchronous and asynchronous elements before being transformed into an equivalent circuit which can be synthesised using standard EDA tools

    Multi-Softcore Architecture on FPGA

    Get PDF
    To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standards intellectual properties yet maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes a FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and also some parallel applications are used for testing speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication)

    A multi-synchronous bi-directional NoC (MBiNoC) architecture with dynamic self-reconfigurable channel for the GALS infrastructure

    Get PDF
    Abstract To enhance the performance of on-chip communications of Globally Asynchronous Locally Synchronous Systems (GALS), a dynamic reconfigurable multi-synchronous router architecture is proposed to increase network on chip (NoC) efficiency by changing the path of the communication link in the runtime traffic situation. In order to address GALS issues and bandwidth requirements, the proposed multi-synchronous bidirectional NoC’s router is developed and it guarantees higher packet consumption rate, better bandwidth utilization with lower packet delivery latency. All the input/output ports of the proposed router behave as a bi-directional ports and communicate through a novel multi-synchronous first-in first-out (FIFO) buffer. The bidirectional port is controlled by a dynamic channel control module which provides a dynamic reconfigurable channel to the router itself and associated sub-modules. This proposed multi-synchronous bidirectional router architecture is synthesized using Xilinx ISE 14.7 and FPGA Virtex 6 family device XC6VLX760 is considered as target technology and its performance is evaluated in terms of power, area and delay.Peer ReviewedPostprint (author's final draft

    Programmable flexible cores for SoC applications

    Get PDF
    Tese de mestrado. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 200

    Memory hierarchy and data communication in heterogeneous reconfigurable SoCs

    Get PDF
    The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented

    Asynchronous Bypass Channels Improving Performance for Multi-synchronous Network-on-chips

    Get PDF
    Dr. Paul V. Gratz Network-on-Chip (NoC) designs have emerged as a replacement for traditional shared-bus designs for on-chip communications. As with all current VLSI design, however, reducing power consumption in NoCs is a critical challenge. One approach to reduce power is to dynamically scale the voltage and frequency of each network node or groups of nodes (DVFS). Another approach to reduce power consumption is to replace the balanced clock tree with a globally-asynchronous, locally-synchronous (GALS) clocking scheme. NoCs implemented with either of these schemes, however, tend to have high latencies as packets must be synchronized at the intermediate nodes between source and destination. In this work, we propose a novel router microarchitecture which offers superior performance versus typical synchroniz- ing router designs. Our approach features Asynchronous Bypass Channels (ABCs) at intermediate nodes thus avoiding synchronization delay. We also propose a new network topology and routing algorithm that leverage the advantages of the bypass channel offered by our router design. Our experiments show that our design improves the performance of a conventional synchronizing design with similar resources by up to 26 percent at low loads and increases saturation throughput by up to 50 percent

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

    Comparison of multi-layer bus interconnection and a network on chip solution

    Get PDF
    Abstract. This thesis explains the basic subjects that are required to take in consideration when designing a network on chip solutions in the semiconductor world. For example, general topologies such as mesh, torus, octagon and fat tree are explained. In addition, discussion related to network interfaces, switches, arbitration, flow control, routing, error avoidance and error handling are provided. Furthermore, there is discussion related to design flow, a computer aided designing tools and a few comprehensive researches. However, several networks are designed for the minimum latency, although there are also versions which trade performance for decreased bus widths. These designed networks are compared with a corresponding multi-layer bus interconnection and both synthesis and register transfer level simulations are run. For example, results from throughput, latency, logic area and power consumptions are gathered and compared. It was discovered that overall throughput was well balanced with the network on chip solutions, although its maximum throughput was limited by protocol conversions. For example, the multi-layer bus interconnection was capable of providing a few times smaller latencies and higher throughputs when only a single interface was injected at the time. However, with parallel traffic and high-performance requirements a network on chip solution provided better results, even though the difference decreased when performance requirements were lower. Furthermore, it was discovered that the network on chip solutions required approximately 3–4 times higher total cell area than the multi-layer bus interconnection and that resources were mainly located at network interfaces and switches. In addition, power consumption was approximately 2–3 times higher and was mostly caused by dynamic consumption.Monitasoisen väyläarkkitehtuurin ja tietokoneverkkomaisen ratkaisun vertailua. Tiivistelmä. Tutkielmassa käsitellään tärkeimpiä aihealueita, jotka tulee huomioida suunniteltaessa tietokoneverkkomaisia väyläratkaisuja puolijohdemaailmassa. Esimerkiksi yleiset rakenteet, kuten verkko-, torus-, kahdeksankulmio- ja puutopologiat käsitellään lyhyesti. Lisäksi alustetaan verkon liitäntäkohdat, kytkimet, vuorottelu, vuon hallinta, reititys, virheiden välttely ja -käsittely. Lopuksi kerrotaan suunnitteluvuon oleellisimmat välivaiheet ja niihin soveltuvia kaupallisia työkaluja, sekä käsitellään lyhyesti muutaman aiemman julkaisun tuloksia. Tutkielmassa käytetään suunnittelutyökalua muutaman tietokoneverkkomaisen ratkaisun toteutukseen ja tavoitteena on saavuttaa pienin mahdollinen latenssi. Toisaalta myös hieman suuremman latenssin versioita suunnitellaan, mutta pienemmillä väylänleveyksillä. Lisäksi suunniteltuja tietokoneverkkomaisia ratkaisuja vertaillaan perinteisempään monitasoiseen väyläarkkitehtuuriin. Esimerkiksi synteesi- ja simulaatiotuloksia, kuten logiikan vaatimaa pinta-alaa, tehonkulutusta, latenssia ja suorituskykyä, vertaillaan keskenään. Tutkielmassa selvisi, että suunnittelutyökalulla toteutetut tietokoneverkkomaiset ratkaisut mahdollistivat tasaisemman suorituskyvyn, joskin niiden suurin saavutettu suorituskyky ja pienin latenssi määräytyivät protokollan käännöksen aiheuttamasta viiveestä. Tutkielmassa havaittiin, että perinteisemmillä menetelmillä saavutettiin noin kaksi kertaa suurempi suorituskyky ja pienempi latenssi, kun verkossa ei ollut muuta liikennettä. Rinnakkaisen liikenteen lisääntyessä tietokoneverkkomainen ratkaisu tarjosi keskimäärin paremman suorituskyvyn, kun sille asetetut tehokkuusvaateet olivat suuret, mutta suorituskykyvaatimuksien laskiessa erot kapenivat. Lisäksi huomattiin, että tietokoneverkkomaisten ratkaisujen käyttämä pinta-ala oli noin 3–4 kertaa suurempi kuin monitasoisella väyläarkkitehtuurilla ja että resurssit sijaitsivat enimmäkseen verkon liittymäkohdissa ja kytkimissä. Lisäksi tehonkulutuksen huomattiin olevan noin 2–3 kertaa suurempi, joskin sen havaittiin koostuvan pääosin dynaamisesta kulutuksesta
    corecore