
    Advances in parallel programming for electronic design automation

    The continued miniaturization of the technology node increases not only chip capacity but also circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. One promising direction for boosting the performance of EDA tools is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPUs to GPUs to distributed computing, for EDA applications. Multi-core processors are now prevalent from mobile devices to laptops to desktops, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, this dissertation first focuses on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to simplify the programming of parallel applications and has been successfully applied to an EDA timing analysis tool. We demonstrate Cpp-Taskflow's programming model and interface, software architecture, and execution flow. Then, we improve Cpp-Taskflow in several respects. First, we enhance its usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, making it easier for users to construct large and complex parallel patterns. Third, we add a new task type that lets users control the graph execution flow, empowering the graph model to describe complex control flow. Beyond these enhancements, we design a new scheduler that adaptively manages threads based on the available parallelism. The scheduler uses a simple and effective strategy that not only prevents resources from being underutilized but also mitigates resource over-subscription. We evaluate the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that it achieves good performance and is highly energy-efficient. Next, we study the applicability of heterogeneous computing, specifically graphics processing units (GPUs), to EDA. We demonstrate how to use GPUs to accelerate VLSI placement and show that they bring substantial performance gains. Finally, as design sizes keep increasing, a more scalable solution is distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. The framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that efficiently utilizes cluster resources to improve the framework's performance.
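
    The task-graph model that Cpp-Taskflow provides can be illustrated with a short example. The sketch below uses the public Taskflow (formerly Cpp-Taskflow) API for graph construction, composition, and execution; the EDA-flavored task names are illustrative placeholders, not code from the dissertation.

```cpp
#include <taskflow/taskflow.hpp>  // https://github.com/taskflow/taskflow

int main() {
  tf::Executor executor;   // worker threads that run task graphs
  tf::Taskflow taskflow;   // holds the task dependency graph

  // emplace() returns one task handle per callable; edges added with
  // precede()/succeed() express the dependencies between tasks.
  auto [init, fwd, bwd, report] = taskflow.emplace(
    [] { /* e.g. parse the netlist           */ },
    [] { /* e.g. forward timing propagation  */ },
    [] { /* e.g. backward timing propagation */ },
    [] { /* e.g. report the worst slack      */ }
  );

  init.precede(fwd, bwd);    // fwd and bwd may run in parallel after init
  report.succeed(fwd, bwd);  // report waits for both to finish

  // Task graph composition: reuse a whole taskflow as one module task,
  // which is the composability feature the abstract refers to.
  tf::Taskflow module;
  module.emplace([] { /* a reusable sub-graph */ });
  taskflow.composed_of(module).succeed(report);

  executor.run(taskflow).wait();  // schedule the graph, block until done
}
```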

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems that have emerged particularly within the last five years. The book presents the most prominent reliability concerns from today's point of view and briefly recapitulates the progress made by the community so far. Unlike books that focus on a single abstraction level, such as the circuit level or the system level alone, this book addresses reliability challenges across different levels, from the physical level all the way up to the system level (cross-layer approaches). It aims to demonstrate how new hardware/software co-design solutions can effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. The book provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that improve reliability through techniques pro-actively designed with respect to techniques at other layers; and explains run-time adaptation and concepts of self-organization for achieving error resiliency in complex, future many-core systems.

    Energy Efficient Spiking Neuromorphic Architectures for Pattern Recognition

    There is growing concern over the reliability, power consumption, and performance of traditional von Neumann machines, especially when dealing with complex tasks like pattern recognition, whereas the human brain addresses such problems with great ease. Brain-inspired neuromorphic computing has attracted much research interest, as it provides an appealing architectural solution to difficult tasks thanks to its energy efficiency, built-in parallelism, and potential scalability. Meanwhile, the inherent error resilience of neuro-computing offers promising opportunities for leveraging approximate computing for additional energy and silicon-area benefits. This thesis focuses on energy-efficient neuromorphic architectures that exploit parallel processing and approximate computing for pattern recognition. First, two parallel spiking neural architectures are presented. The first architecture is based on a spiking neural network with global inhibition (SNNGI), which integrates digital leaky integrate-and-fire spiking neurons that mimic their biological counterparts, together with the corresponding on-chip learning circuits implementing spike-timing-dependent plasticity rules. To achieve efficient parallelization, this work addresses a number of critical issues pertaining to memory organization, parallel processing, hardware reuse for different operating modes, and the tradeoffs between throughput, area, and power overheads for different configurations. For handwritten digit recognition, a promising training speedup of 13.5x and a recognition speedup of 25.8x over the serial SNNGI architecture are achieved. Despite its 120 MHz operating frequency, the 32-way parallel hardware design demonstrates a 59.4x training speedup over a 2.2 GHz general-purpose CPU. Besides the SNNGI, we also propose an architecture based on the liquid state machine (LSM), a recurrent spiking neural network. The LSM architecture is fully parallelized and consists of randomly connected digital neurons in a reservoir and a readout stage, the latter of which is tuned by a bio-inspired learning rule. When evaluated on the TI46 speech benchmark, the FPGA LSM system demonstrates a runtime speedup of 88x over a 2.3 GHz AMD CPU. In addition, approximate computing contributes significantly to the overall energy reduction of the proposed architectures. In particular, addition occupies a considerable portion of the power and area of the neuromorphic systems, especially in the LSM. By exploiting the built-in resilience of neuro-computing, we propose a real-time reconfigurable approximate adder for FPGA implementation that reduces energy consumption substantially. Although many mature approximate adders exist, these designs lose their advantages in terms of area, power, and delay on the FPGA platform, so a novel approximate adder dedicated to the FPGA is necessary. The proposed adder is based on a carry-skip model that reduces carry-propagation delay and power, with the resulting errors controlled by a proposed error analysis method. A real-time adjustable precision mechanism is also integrated to further reduce dynamic power consumption. Implemented on a Virtex-6 FPGA, the proposed adder consumes 18.7% and 32.6% less power than the built-in Xilinx adder in its two precision modes, respectively, and in both modes it is 1.32x faster and requires fewer FPGA resources. Besides the adders, firing-activity-based power gating for silent neurons and Booth approximate multipliers are also introduced. These three schemes have been applied to our neuromorphic systems: the approximation errors they incur are shown to be negligible, while energy reductions of up to 20% and 30.1% over exact training computation are achieved for the SNNGI and LSM systems, respectively.
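
    As a concrete illustration of the carry-skip idea behind the proposed adder, here is a behavioral C++ model of block-based approximate addition. The segment width and the carry-speculation policy are illustrative assumptions, not the thesis's FPGA design; the point is only how truncating cross-segment carry propagation bounds the carry chain at the cost of small, localized errors.

```cpp
#include <cstdint>
#include <cstdio>

// Behavioral model of a carry-skip style approximate adder: the
// operands are split into SEG-bit segments and carries are never
// propagated across segment boundaries, which bounds the carry
// chain (and thus the critical-path delay) to SEG bits. Each
// segment instead speculates its carry-in from the most
// significant bit pair of the previous segment.
constexpr int SEG = 8;  // segment width; illustrative choice

uint32_t approx_add(uint32_t a, uint32_t b) {
  uint32_t sum = 0;
  for (int i = 0; i < 32; i += SEG) {
    uint32_t mask = (SEG >= 32) ? ~0u : ((1u << SEG) - 1u);
    uint32_t sa = (a >> i) & mask;
    uint32_t sb = (b >> i) & mask;
    // Speculated carry-in: 1 only if both previous MSBs are 1.
    uint32_t cin = (i == 0) ? 0u
                 : ((a >> (i - 1)) & (b >> (i - 1)) & 1u);
    sum |= ((sa + sb + cin) & mask) << i;
  }
  return sum;
}

int main() {
  // Errors appear only when a real carry crosses a segment joint.
  printf("approx %u vs exact %u\n",
         approx_add(0x00FFu, 1u), 0x00FFu + 1u);
}
```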

    Belle II Technical Design Report

    The Belle detector at the KEKB electron-positron collider has collected almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an upgrade of KEKB, is under construction to increase the luminosity by two orders of magnitude during a three-year shutdown, with an ultimate goal of 8 × 10^35 cm^-2 s^-1. To exploit the increased luminosity, an upgrade of the Belle detector has been proposed, and a new international collaboration, Belle II, is being formed. The Technical Design Report presents the physics motivation, the basic methods of the accelerator upgrade, and the key improvements of the detector. Comment: edited by Z. Doležal and S. Un

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities.

    On-Chip Analog Circuit Design Using Built-In Self-Test and an Integrated Multi-Dimensional Optimization Platform

    The rapid development of the system-on-chip (SoC) market introduces tremendous complexity into integrated circuit (IC) design. Meanwhile, the IC fabrication process keeps scaling down, which allows a higher density of integration but makes chips more sensitive to process-voltage-temperature (PVT) variations. Delivering a successful IC product therefore not only puts great pressure on the designers, who have to handle wider variations and enforce larger design margins, but also challenges the test procedure, leading to more check points and longer test times. To ease the designers' burden and reduce the cost of testing, it is valuable to make IC chips able to test and tune themselves to some extent. In this dissertation, a fully integrated in-situ design validation and optimization (VO) hardware platform for analog circuits is proposed. It implements in-situ built-in self-test (BIST) techniques for analog circuits. Based on the data collected from BIST, the error between the measured and the desired performance of the target circuit is evaluated using a cost function. A digital multi-dimensional optimization engine adaptively adjusts the analog circuit parameters, seeking the minimum of the cost function and thereby the desired performance. To verify this concept, case studies of a 2nd/4th-order active-RC band-pass filter (BPF) and a 2nd-order Gm-C BPF, together with all BIST and optimization blocks, are implemented on-chip. Apart from the VO system, several improved BIST techniques are also proposed in this dissertation. A single-tone sinusoidal waveform generator based on a finite-impulse-response (FIR) architecture, which utilizes an optimization algorithm to enhance its spur-free dynamic range (SFDR), is proposed; it achieves an SFDR of 59 to 70 dBc from 150 to 850 MHz after the optimization procedure. A low-distortion current-steering two-tone sinusoidal signal synthesizer based on a mixing-FIR architecture is also proposed. The two-tone synthesizer extends the FIR architecture to two stages and implements an up-conversion mixer to generate the two tones, achieving better than -68 dBc IM3 below a 480 MHz LO frequency without calibration. Moreover, an on-chip RF receiver linearity BIST methodology for a continuous- and discrete-time hybrid baseband chain is proposed. The proposed receiver chain implements a charge-domain FIR filter to notch out the two excitation signals while exposing the third-order intermodulation (IM3) tones, which simplifies the linearity measurement procedure: a power detector is enough to analyze the receiver's linearity. Finally, a low-cost, fully digital built-in analog tester for linear time-invariant (LTI) analog blocks is proposed. It adopts a time-to-digital converter (TDC) to measure the delays corresponding to a ramp excitation signal and is able to estimate the pole or zero locations of a low-pass LTI system.
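
    The BIST-driven tuning loop described above can be sketched in software. Below is a minimal C++ model in which a cost function compares a mocked BIST measurement against the desired performance, and a derivative-free coordinate search adjusts the digital tuning codes. The knob names, the linear measurement model, and the step policy are all illustrative assumptions, not the dissertation's hardware.

```cpp
#include <array>
#include <cstdio>

// Hypothetical knobs: digital codes that trim the filter's analog
// parameters (e.g., bias current, capacitor bank, resistor bank).
using Codes = std::array<int, 3>;

// Stand-in for a BIST measurement of the band-pass filter's center
// frequency (MHz) as a function of the tuning codes. On silicon this
// would come from the on-chip stimulus generator and detector.
double measure_f0(const Codes& c) {
  return 95.0 + 0.8 * c[0] - 0.3 * c[1] + 0.1 * c[2];
}

// Cost: squared error between measured and desired performance.
double cost(const Codes& c, double target_f0) {
  double e = measure_f0(c) - target_f0;
  return e * e;
}

int main() {
  const double target_f0 = 100.0;  // desired center frequency (MHz)
  Codes c{0, 0, 0};                // start from default codes

  // Greedy coordinate search: try +/-1 steps on each code, keep any
  // move that lowers the cost, and stop when no single move helps.
  bool improved = true;
  while (improved) {
    improved = false;
    for (size_t k = 0; k < c.size(); ++k) {
      for (int step : {+1, -1}) {
        Codes t = c;
        t[k] += step;
        if (cost(t, target_f0) < cost(c, target_f0)) {
          c = t;
          improved = true;
        }
      }
    }
  }
  printf("codes=(%d,%d,%d)  f0=%.2f MHz\n",
         c[0], c[1], c[2], measure_f0(c));
}
```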

    CBM Progress Report 2014


    Flow-Based Optimization of Products or Devices

    Flow-based optimization of products and devices is an immature field compared to the corresponding topology optimization based on solid mechanics. However, it is an essential part of developing components that involve internal and/or external flow. The aim of this book is two-fold: (i) to provide state-of-the-art examples of flow-based optimization and (ii) to present a review of topology optimization for fluid-based problems.