64 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    A multi-synchronous bi-directional NoC (MBiNoC) architecture with dynamic self-reconfigurable channel for the GALS infrastructure

    Get PDF
    Abstract To enhance the performance of on-chip communications of Globally Asynchronous Locally Synchronous Systems (GALS), a dynamic reconfigurable multi-synchronous router architecture is proposed to increase network on chip (NoC) efficiency by changing the path of the communication link in the runtime traffic situation. In order to address GALS issues and bandwidth requirements, the proposed multi-synchronous bidirectional NoC’s router is developed and it guarantees higher packet consumption rate, better bandwidth utilization with lower packet delivery latency. All the input/output ports of the proposed router behave as a bi-directional ports and communicate through a novel multi-synchronous first-in first-out (FIFO) buffer. The bidirectional port is controlled by a dynamic channel control module which provides a dynamic reconfigurable channel to the router itself and associated sub-modules. This proposed multi-synchronous bidirectional router architecture is synthesized using Xilinx ISE 14.7 and FPGA Virtex 6 family device XC6VLX760 is considered as target technology and its performance is evaluated in terms of power, area and delay.Peer ReviewedPostprint (author's final draft

    Architecture and Control of a Digital Frequency-Locked Loop for Fine-Grain Dynamic Voltage and Frequency Scaling in Globally Asynchronous Locally Synchronous Structures

    Get PDF
    International audienceA small area fast-reprogrammable Digital Frequency-Locked Loop (DFLL) engine is presented as a solution for the Dynamic Voltage and Frequency Scaling (DVFS) circuitry in Globally Asynchronous Locally Synchronous (GALS) architectures implemented in 32 nm CMOS technology. The DFLL control is designed so that the closed-loop system is able to cope with process variability while it rejects temperature changes and supply voltage slow variations. Therefore the DFLL is made of three main blocks, namely a Digitally Controlled Oscillator (DCO), a "sensor" that measures the frequency of the signal at the output of the DCO and a controller. A strong emphasis is set on the loop filter architecture choice and the tuning of its parameters. An analytical model of the DCO is deduced from accurate Spice simulations. The delay introduced by the sensor is also taken into account to design. From these models, an optimal and robust controller with a minimum implementation area is developed. Here, "optimal" means that the controller is computed via the minimization of a given criterion while the "robustness" capability ensures that the closed-loop system is tolerant to process and temperature variations in a given range. Therefore, performances of the closed-loop system are ensured whatever the system characteristics are in a given range

    Master of Science

    Get PDF
    thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects

    Doctor of Philosophy

    Get PDF
    dissertationThe bandwidth requirement for each link on a network-on-chip (NoC) may differ based on topology and traffic properties of the IP cores. Available bandwidth on an asynchronous NoC link will also vary depending on the wire length between sender and receiver. This work explores the benefit to NoC performance, area, and energy when this property is used to optimize bandwidth on specific links based on its bandwidth required by a target SoC design. Three asynchronous routers were designed for implementing of asynchronous NoCs. Simple routing scheme and single-flit packet format lead to performance- and area-efficient router designs. Their performance was evaluated in consideration of link wire delay. Comprehensive analysis of pipeline latch insertion in asynchronous communication links is performed in regard to link bandwidth. Optimal placement of pipeline latch for maximizing benefit to increase of bandwidth is described. Specific methods are proposed for performance, area and energy optimization, respectively. Performance optimization is achieved by increasing bandwidth of high trafficked and high utilized links in an NoC, as inserting pipeline latches in those links. Through decrease of bandwidth of links with low traffic and low utilization by halving data-path width, reduction of wire area of an NoC is accomplished. Energy optimization is performed using wide spacing between wires in links with high energy consumption. An analytical model for asynchronous link bandwidth estimation is presented. It is utilized to deploy NoC optimization methods as identifying adequate links for each optimization method. Energy and latency characteristics of an asynchronous NoC are compared to a similarly-designed synchronous NoC. The results indicate that the asynchronous network has lower energy, and link-specific bandwidth optimization has improved NoC performance. Evaluation of proposed optimization methods by employing to an asynchronous NoC shows achievements of performance enhancement, wire area reduction and wire energy saving

    Energy autonomous systems : future trends in devices, technology, and systems

    Get PDF
    The rapid evolution of electronic devices since the beginning of the nanoelectronics era has brought about exceptional computational power in an ever shrinking system footprint. This has enabled among others the wealth of nomadic battery powered wireless systems (smart phones, mp3 players, GPS, …) that society currently enjoys. Emerging integration technologies enabling even smaller volumes and the associated increased functional density may bring about a new revolution in systems targeting wearable healthcare, wellness, lifestyle and industrial monitoring applications

    Exploration and Design of Power-Efficient Networked Many-Core Systems

    Get PDF
    Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a manycore system needs efficient, on-chip communication architecture. Networkon- Chip (NoC) is a general purpose communication concept that offers highthroughput, reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues being important for designing power-efficient manycore systems at two different levels: the network-level and the router-level. From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/ Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques. From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented. Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to latest NoC architectures. The thesis concludes that careful codesigned elements from different network levels enable considerable power savings for many-core systems.Siirretty Doriast

    Doctor of Philosophy

    Get PDF
    dissertationCommunication surpasses computation as the power and performance bottleneck in forthcoming exascale processors. Scaling has made transistors cheap, but on-chip wires have grown more expensive, both in terms of latency as well as energy. Therefore, the need for low energy, high performance interconnects is highly pronounced, especially for long distance communication. In this work, we examine two aspects of the global signaling problem. The first part of the thesis focuses on a high bandwidth asynchronous signaling protocol for long distance communication. Asynchrony among intellectual property (IP) cores on a chip has become necessary in a System on Chip (SoC) environment. Traditional asynchronous handshaking protocol suffers from loss of throughput due to the added latency of sending the acknowledge signal back to the sender. We demonstrate a method that supports end-to-end communication across links with arbitrarily large latency, without limiting the bandwidth, so long as line variation can be reliably controlled. We also evaluate the energy and latency improvements as a result of the design choices made available by this protocol. The use of transmission lines as a physical interconnect medium shows promise for deep submicron technologies. In our evaluations, we notice a lower energy footprint, as well as vastly reduced wire latency for transmission line interconnects. We approach this problem from two sides. Using field solvers, we investigate the physical design choices to determine the optimal way to implement these lines for a given back-end-of-line (BEOL) stack. We also approach the problem from a system designer's viewpoint, looking at ways to optimize the lines for different performance targets. This work analyzes the advantages and pitfalls of implementing asynchronous channel protocols for communication over long distances. Finally, the innovations resulting from this work are applied to a network-on-chip design example and the resulting power-performance benefits are reported

    High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms

    Get PDF
    International audienceThe design of Software Defined Radio (SDR) equipments (terminals, base stations, etc.) is still very challenging. We propose here a design methodology for ultra-fast prototyping on heterogeneous platforms made of GPPs (General Purpose Processors), DSPs (Digital Signal Processors) and FPGAs (Field Programmable Gate Array). Lying on a component-based approach, the methodology mainly aims at automating as much as possible the design from an algorithmic validation to a multi-processing heterogeneous implementation. The proposed methodology is based on the SynDEx CAD design approach, which was originally dedicated to multi-GPPs networks. We show how this was changed so that it is made appropriate with an embedded context of DSP. The implication of FPGAs is then addressed and integrated in the design approach with very little restrictions. Apart from a manual HW/SW partitioning, all other operations may be kept automatic in a heterogeneous processing context. The targeted granularity of the components, which are to be assembled in the design flow, is roughly the same size as that of a FFT, a filter or a Viterbi decoder for instance. The re-use of third party or pre-developed IPs is a basis for this design approach. Thanks to the proposed design methodology it is possible to port "ultra" fast a radio application over several platforms. In addition, the proposed design methodology is not restricted to SDR equipment design, and can be useful for any real-time embedded heterogeneous design in a prototyping context
    • …
    corecore