424 research outputs found

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Get PDF
    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators

    Trends and Challenges in CMOS Design for Emerging 60 GHz WPAN Applications

    Get PDF
    International audienceThe extensive growth of wireless communications industry is creating a big market opportunity. Wireless operators are currently searching for new solutions which would be implemented into the existing wireless communication networks to provide the broader bandwidth, the better quality and new value-added services. In the last decade, most commercial efforts were focused on the 1-10 GHz spectrum for voice and data applications for mobile phones and portable computers (Niknejad & Hashemi, 2008). Nowadays, the interest is growing in applications that use high rate wireless communications. Multigigabit- per-second communication requires a very large bandwidth. The Ultra-Wide Band (UWB) technology was basically used for this issue. However, this technology has some shortcomings including problems with interference and a limited data rate. Furthermore, the 3-5 GHz spectrum is relatively crowded with many interferers appearing in the WiFi bands (Niknejad & Hashemi, 2008). The use of millimeter wave frequency band is considered the most promising technology for broadband wireless. In 2001, the Federal Communications Commission (FCC) released a set of rules governing the use of spectrum between 57 and 66 GHz (Baldwin, 2007). Hence, a large bandwidth coupled with high allowable transmit power equals high possible data rates. Traditionally the implementation of 60 GHz radio technology required expensive technologies based on III-V compound semiconductors such as InP and GaAs (Smulders et al., 2007). The rapid progress of CMOS technology has enabled its application in millimeter wave applications. Currently, the transistors became small enough, consequently fast enough. As a result, the CMOS technology has become one of the most attractive choices in implementing 60 GHz radio due to its low cost and high level of integration (Doan et al., 2005). Despite the advantages of CMOS technology, the design of 60 GHz CMOS transceiver exhibits several challenges and difficulties that the designers must overcome. This chapter aims to explore the potential of the 60 GHz band in the use for emergent generation multi-gigabit wireless applications. The chapter presents a quick overview of the state-of-the-art of 60 GHz radio technology and its potentials to provide for high data rate and short range wireless communications. The chapter is organized as follows. Section 2 presents an overview about 60 GHz band. The advantages are presented to highlight the performance characteristics of this band. The opportunities of the physical layer of the IEEE 802.15.3c standard for emerging WPAN applications are discussed in section 3. The tremendous opportunities available with CMOS technology in the design of 60 GHz radio is discussed in section 4. Section 5 shows an example of 60 GHz radio system link. Some challenges and trade-offs on the design issues of circuits and systems for 60 GHz band are reported in section 6. Finally, section 7 presents the conclusion and some perspectives on future directions

    VLSI Revisited - Revival in Japan

    Get PDF
    This paper describes the abundance of semiconductor consortia that have come into existence in Japan since the mid-1990s. They clearly reflect the ambition of the government - through its reorganized ministry METI and company initiatives - to regain some of the industrial and technological leadership that Japan has lost. The consortia landscape is very different in Japan compared with EU and the US. Outside Japan the universities play a much bigger and very important role. In Europe there has emerged close collaboration, among national government agencies, companies and the EU Commission in supporting the IT sector with considerable attention to semiconductor technologies. Another major difference, and possibly the most important one, is the fact that US and EU consortia include and mix partners from different areas of the semiconductor landscape including wafer makers, material suppliers, equipment producers and integrated device makers.semiconductors, Hitachi, Sony, Toshiba, Elpida, Renesas, Sematech, VLSI, JESSI, MEDEA, ASPLA, MIRAI, innovation system

    Limits to Modularity: A Review of the Literature and Evidence from Chip Design

    Get PDF
    This working paper has been prepared as part of the East-West Center's research project on Globalization of Knowledge Work: Why is Chip Design Moving to Asia. In this paper, Dieter assesses what we know about the limits to modularity and their impact on firm organization and industry structure. He focuses on evidence form chip design, drawing on interview on 2002 and 2003 with a sample of 60 companies and 15 research institutions that are involved in chip design in the US, Taiwan, Korea, China and Malaysia. It is summarized "stylized" propositions of the modularity literature that are well-established, as well as predictions that are controversial. In addition, important limits to modularity and relevant management responses were reviewed.

    VLSI REVISITED – REVIVAL IN JAPAN

    Get PDF
    This paper describes the abundance of semiconductor consortia that have come into existence in Japan since the mid-1990s. They clearly reflect the ambition of the government – through its reorganized ministry METI and company initiatives - to regain some of the industrial and technological leadership that Japan has lost. The consortia landscape is very different in Japan compared with EU and the US. Outside Japan the universities play a much bigger and very important role. In Europe there has emerged close collaboration, among national government agencies, companies and the EU Commission in supporting the IT sector with considerable attention to semiconductor technologies. Another major difference, and possibly the most important one, is the fact that US and EU consortia include and mix partners from different areas of the semiconductor landscape including wafer makers, material suppliers, equipment producers and integrated device makers.semiconductors; Hitachi; Sony; Toshiba; Elpida; Renesas; Sematech; VLSI; JESSI; MEDEA; ASPLA; MIRAI; innovation system

    Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

    Get PDF
    Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost high performance computing systems that have become omnipresent in today\u27s technology driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption. Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable overclocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. The success of this design methodology relies on the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design methodology is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance. In this dissertation, we address different aspects of timing speculation based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high performance, energy efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets. As part of this research, we extend the SPRIT3E, or Superscalar PeRformance Improvement Through Tolerating Timing Errors, framework, and characterize the extent of application dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact the operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique\u27s adaptiveness by exploring the necessary hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error rate data obtained from hardware simulations of a superscalar processor, are presented. Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of performing thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves 25% performance improvement, while lasting for 14 years when being operated within 353K. Over the past five decades, technology scaling, as predicted by Moore\u27s law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today\u27s omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated in doing so presents a non-level playing field for the competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy delay product. We present a comprehensive comparison for integer and floating point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or be competitive in the market, while porting to the next technology node. Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault tolerant aggressive system that combines soft error protection and timing error tolerance. We replicate both the pipeline registers and the pipeline stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back annotated post-layout gate-level timing simulations, using 45nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor, with forwarding logic, show that our approach, even under a severe fault injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20%, when dynamically overclocked

    Challenges and solutions for large-scale integration of emerging technologies

    Get PDF
    Title from PDF of title page viewed June 15, 2021Dissertation advisor: Mostafizur RahmanVitaIncludes bibliographical references (pages 67-88)Thesis (Ph.D.)--School of Computing and Engineering and Department of Physics and Astronomy. University of Missouri--Kansas City, 2021The semiconductor revolution so far has been primarily driven by the ability to shrink devices and interconnects proportionally (Moore's law) while achieving incremental benefits. In sub-10nm nodes, device scaling reaches its fundamental limits, and the interconnect bottleneck is dominating power and performance. As the traditional way of CMOS scaling comes to an end, it is essential to find an alternative to continue this progress. However, an alternative technology for general-purpose computing remains elusive; currently pursued research directions face adoption challenges in all aspects from materials, devices to architecture, thermal management, integration, and manufacturing. Crosstalk Computing, a novel emerging computing technique, addresses some of the challenges and proposes a new paradigm for circuit design, scaling, and security. However, like other emerging technologies, Crosstalk Computing also faces challenges like designing large-scale circuits using existing CAD tools, scalability, evaluation and benchmarking of large-scale designs, experimentation through commercial foundry processes to compete/co-exist with CMOS for digital logic implementations. This dissertation addresses these issues by providing a methodology for circuit synthesis customizing the existing EDA tool flow, evaluating and benchmarking against state-of-the-art CMOS for large-scale circuits designed at 7nm from MCNC benchmark suits. This research also presents a study on Crosstalk technology's scalability aspects and shows how the circuits' properties evolve from 180nm to 7nm technology nodes. Some significant results are for primitive Crosstalk gate, designed in 180nm, 65nm, 32nm, and 7nm technology nodes, the average reduction in power is 42.5%, and an average improvement in performance is 34.5% comparing to CMOS for all mentioned nodes. For benchmarking large-scale circuits designed at 7nm, there are 48%, 57%, and 10% improvements against CMOS designs in terms of density, power, and performance, respectively. An experimental demonstration of a proof-of-concept prototype chip for Crosstalk Computing at TSMC 65nm technology is also presented in this dissertation, showing the Crosstalk gates can be realized using the existing manufacturing process. Additionally, the dissertation also provides a fine-grained thermal management approach for emerging technologies like transistor-level 3-D integration (Monolithic 3-D, Skybridge, SN3D), which holds the most promise beyond 2-D CMOS technology. However, such 3-D architectures within small form factors increase hotspots and demand careful consideration of thermal management at all integration levels. This research proposes a new direction for fine-grained thermal management approach for transistor-level 3-D integrated circuits through the insertion of architected heat extraction features that can be part of circuit design, and an integrated methodology for thermal evaluation of 3-D circuits combining different simulation outcomes at advanced nodes, which can be integrated to traditional CAD flow. The results show that the proposed heat extraction features effectively reduce the temperature from a heated location. Thus, the dissertation provides a new perspective to overcome the challenges faced by emerging technologies where the device, circuit, connectivity, heat management, and manufacturing are addressed in an integrated manner.Introduction and motivation -- Cross talk computing overview -- Logic simplification approach for Crosstalk circuit design -- Crostalk computing scalability study: from 180 nm to 7 nm -- Designing large*scale circuits in Crosstalk at 7 nm -- Comparison and benchmarking -- Experimental demonstration of Crosstalk computing -- Thermal management challenges and mitigation techniques for transistor-level- 3D integratio
    corecore