1,431 research outputs found
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic
For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies.
Contributions of this work:
• Choose-Your-own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
• Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
• Detailed performance characterization of CYA
– Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
– Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimizatio
Hybrid FPGA: Architecture and Interface
Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources
with different granularities, together with domain-specific coarse-grained units. This thesis proposes
a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to
improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture,
we examine three aspects to optimise the speed and area for domain-specific applications.
First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained
elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs,
(2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of
EBs, and (5) location of additional embedded elements such as memory.
Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity
EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications.
We then propose three routing optimisation methods to meet the additional routing demand introduced
by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on
EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We
study and compare the trade-offs in delay, area and routability of these three optimisation methods.
Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors,
multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed
point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs
in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking
into account both architectural and system-level issues. Furthermore, we investigate the trade-offs
between granularities and performance by composing small FPUs into a large FPU.
The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements,
by optimising for speed, area or a combination of speed and area
Circuit design and analysis for on-FPGA communication systems
On-chip communication system has emerged as a prominently important subject in Very-Large-
Scale-Integration (VLSI) design, as the trend of technology scaling favours logics more than interconnects.
Interconnects often dictates the system performance, and, therefore, research for new
methodologies and system architectures that deliver high-performance communication services
across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable
Gate Array (FPGA), as a type of ASIC where the hardware can be programmed post-fabrication.
Communication across an FPGA will be deteriorating as a result of interconnect scaling. The programmable
fabrics, switches and the specific routing architecture also introduce additional latency
and bandwidth degradation further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs.
Communication with programmable interconnect received little attention and is inadequately understood.
This thesis is among the first to research on-chip communication systems that are built on
top of programmable fabrics and proposes methodologies to maximize the interconnect throughput
performance. There are three major contributions in this thesis: (i) an analysis of on-chip
interconnect fringing, which degrades the bandwidth of communication channels due to routing
congestions in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly
improves the interconnect throughput by exploiting the fundamental electrical characteristics
of the reconfigurable interconnect structures. This new scheme can potentially mitigate
the interconnect scaling challenges. (iii) a novel Dynamic Programming (DP)-network to provide
adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime
optimization for route planning and dynamic routing which, effectively utilizes the in-silicon
bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new
methodologies and concepts are proposed to enhance the on-FPGA communication throughput
performance that is of vital importance in new technology processes
The Design of a System Architecture for Mobile Multimedia Computers
This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies
Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices
This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results
Low-Power, High-Speed Transceivers for Network-on-Chip Communication
Networks on chips (NoCs) are becoming popular as they provide a solution for the interconnection problems on large integrated circuits (ICs). But even in a NoC, link-power can become unacceptably high and data rates are limited when conventional data transceivers are used. In this paper, we present a low-power, high-speed source-synchronous link transceiver which enables a factor 3.3 reduction in link power together with an 80% increase in data-rate. A low-swing capacitive pre-emphasis transmitter in combination with a double-tail sense-amplifier enable speeds in excess of 9 Gb/s over a 2 mm twisted differential interconnect, while consuming only 130 fJ/transition without the need for an additional supply. Multiple transceivers can be connected back-to-back to create a source-synchronous transceiver-chain with a wave-pipelined clock, operating with 6sigma offset reliability at 5 Gb/s
Interconnect yield analysis and fault tolerance for field programmable gate arrays
Imperial Users onl
- …