277 research outputs found

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation

    Full text link
    Resource balancing has been effectively used to mitigate the long-term aging effects of Negative Bias Temperature Instability (NBTI) in multi-core and Graphics Processing Unit (GPU) architectures. In this work, we investigate this strategy in Coarse-Grained Reconfigurable Arrays (CGRAs) with a novel application-to-CGRA allocation approach. By introducing important extensions to the reconfiguration logic and the datapath, we enable the dynamic movement of configurations throughout the fabric and allow overutilized Functional Units (FUs) to recover from stress-induced NBTI aging. Implementing the approach in a resource-constrained state-of-the-art CGRA reveals 2.2×2.2\times lifetime improvement with negligible performance overheads and less than 10%10\% increase in area.Comment: Please cite this as: M. Brandalero, B. N. Lignati, A. Carlos Schneider Beck, M. Shafique and M. H\"ubner, "Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.921858

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Dynamically reconfigurable asynchronous processor

    Get PDF
    The main design requirements for today's mobile applications are: · high throughput performance. · high energy efficiency. · high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with the NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signals Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high throughput applications, when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration - making these architectures particularly suitable to stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements. The clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques - methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption when compared to a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP resulted in a reduction in power consumption of 20 times compared to the ARM7 processor, and 29 times compared to the TIC64x VLIW, when running the same benchmark capped to the same throughput and for the same process technology (0.13μm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but resulted in a power reduction of up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks

    A Micro Power Hardware Fabric for Embedded Computing

    Get PDF
    Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

    Low power digital signal processing

    Get PDF

    Energy-Efficient Computing for Mobile Signal Processing

    Full text link
    Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research. SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs. Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture-Diet SODA-presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques-structural duplication and voltage/frequency margining-are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures. This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd
    corecore