32 research outputs found
Advances in parallel programming for electronic design automation
The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is via parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications.
Nowadays multi-core processors are prevalent from mobile devices to laptops to desktop, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We will demonstrate Cpp-Taskflow’s programming model and interface, software architecture and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability through restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature empowers the graph model with the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler to adaptively manage the threads based on available parallelism. The new scheduler uses a simple and effective strategy which can not only prevent resource from being underutilized, but also mitigate resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler can achieve good performance and is very energy-efficient.
Next we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use GPU to accelerate VLSI placement, and we show that GPU can bring substantial performance gain to VLSI placement. Finally, as the design size keeps increasing, a more scalable solution will be distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that can efficiently utilize cluster resource to improve the framework’s performance
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems
Energy Efficient Spiking Neuromorphic Architectures for Pattern Recognition
There is a growing concern over reliability, power consumption, and performance of traditional Von Neumann machines, especially when dealing with complex tasks like pattern recognition. In contrast, the human brain can address such problems with great ease. Brain-inspired neuromorphic computing has attracted much research interest, as it provides an appealing architectural solution to difficult tasks due to its energy efficiency, built-in parallelism, and potential scalability. Meanwhile, the inherent error resilience in neuro-computing allows promising opportunities for leveraging approximate computing for additional energy and silicon area benefits. This thesis focuses on energy efficient neuromorphic architectures which exploit parallel processing and approximate computing for pattern recognition.
Firstly, two parallel spiking neural architectures are presented. The first architecture is based on spiking neural network with global inhibition (SNNGI), which integrates digital leaky integrate-and-fire spiking neurons to mimic their biological counterparts and the corresponding on-chip learning circuits for implementing the spiking timing dependent plasticity rules. In order to achieve efficient parallelization, this work addresses a number of critical issues pertaining to memory organization, parallel processing, hardware reuse for different operating modes, as well as the tradeoffs between throughput, area, and power overheads for different configurations. For the application of handwritten digit recognition, a promising training speedup of 13.5x and a recognition speedup of 25.8x over the serial SNNGI architecture are achieved. In spite of the 120MHz operating frequency, the 32-way parallel hardware design demonstrates a 59.4x training speedup over a 2.2GHz general-purpose CPU. Besides the SNNGI, we also propose another architecture based on the liquid state machine (LSM), a recurrent spiking neural network. The LSM architecture is fully parallelized and consists of randomly connected digital neurons in a reservoir and a readout stage, the latter of which is tuned by a bio-inspired learning rule. When evaluated using the TI46 speech benchmark, the FPGA LSM system demonstrates a runtime speedup of 88x over a 2.3GHz AMD CPU.
In addition, approximate computing contributes significantly to the overall energy reduction of the proposed architectures. In particular, addition computations occupy a considerable portion of power and area in the neuromorphic systems, especially in the LSM. By exploiting the built-in resilience of neuro-computing, we propose a real-time reconfigurable approximate adder for FPGA implementation to reduce the energy consumption substantially. Although there exist many mature approximate adders, these designs lose their advantages in terms of area, power, and delay on the FPGA platform. Therefore, a novel approximate adder dedicated to the FPGA is necessary. The proposed adder is based on a carry skip model which reduces carry propagation delay and power, and the resulting errors are controlled by a proposed error analysis method. Also, a real-time adjustable precision mechanism is integrated to further reduce dynamic power consumption. Implemented on the Virtex-6 FPGA, it is shown that the proposed adder consumes 18.7% and 32.6% less power than the built-in Xilinx adder in two precision modes, respectively, and that the approximate adder in both modes is 1.32x faster and requires fewer FPGA resources. Besides the adders, the firing-activity based power gating for silent neurons and booth approximate multipliers are also introduced. These three proposed schemes have been applied to our neuromorphic systems. The approximate errors incurred by these schemes have been shown to be negligible, but energy reductions of up to 20% and 30.1% over the exact training computation are achieved for the SNNGI and LSM systems, respectively
Belle II Technical Design Report
The Belle detector at the KEKB electron-positron collider has collected
almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an
upgrade of KEKB is under construction, to increase the luminosity by two orders
of magnitude during a three-year shutdown, with an ultimate goal of 8E35 /cm^2
/s luminosity. To exploit the increased luminosity, an upgrade of the Belle
detector has been proposed. A new international collaboration Belle-II, is
being formed. The Technical Design Report presents physics motivation, basic
methods of the accelerator upgrade, as well as key improvements of the
detector.Comment: Edited by: Z. Dole\v{z}al and S. Un
Interim research assessment 2003-2005 - Computer Science
This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities
On-Chip Analog Circuit Design Using Built-In Self-Test and an Integrated Multi-Dimensional Optimization Platform
Nowadays, the rapid development of system-on-chip (SoC) market introduces
tremendous complexity into the integrated circuit (IC) design. Meanwhile, the IC
fabrication process is scaling down to allow higher density of integration but makes
the chips more sensitive to the process-voltage-temperature (PVT) variations. A
successful IC product not only imposes great pressure on the IC designers, who have
to handle wider variations and enforce more design margins, but also challenges the
test procedure, leading to more check points and longer test time. To relax the
designers’ burden and reduce the cost of testing, it is valuable to make the IC chips
able to test and tune itself to some extent.
In this dissertation, a fully integrated in-situ design validation and optimization
(VO) hardware for analog circuits is proposed. It implements in-situ built-in self-test
(BIST) techniques for analog circuits. Based on the data collected from BIST,
the error between the measured and the desired performance of the target circuit is
evaluated using a cost function. A digital multi-dimensional optimization engine is
implemented to adaptively adjust the analog circuit parameters, seeking the minimum
value of the cost function and achieving the desired performance. To verify
this concept, study cases of a 2nd/4th active-RC band-pass filter (BPF) and a 2nd
order Gm-C BPF, as well as all BIST and optimization blocks, are adopted on-chip.
Apart from the VO system, several improved BIST techniques are also proposed
in this dissertation. A single-tone sinusoidal waveform generator based on a finite-impulse-response (FIR) architecture, which utilizes an optimization algorithm to
enhance its spur free dynamic range (SFDR), is proposed. It achieves an SFDR of
59 to 70 dBc from 150 to 850 MHz after the optimization procedure. A low-distortion
current-steering two-tone sinusoidal signal synthesizer based on a mixing-FIR architecture is also proposed. The two-tone synthesizer extends the FIR architecture to
two stages and implements an up-conversion mixer to generate the two tones, achieving better than -68 dBc IM3 below 480 MHz LO frequency without calibration.
Moreover, an on-chip RF receiver linearity BIST methodology for continuous and
discrete-time hybrid baseband chain is proposed. The proposed receiver chain
implements a charge-domain FIR filter to notch the two excitation signals but expose
the third order intermodulation (IM3) tones. It simplifies the linearity measurement
procedure–using a power detector is enough to analyze the receiver’s linearity.
Finally, a low cost fully digital built-in analog tester for linear-time-invariant
(LTI) analog blocks is proposed. It adopts a time-to-digital converter (TDC) to
measure the delays corresponded to a ramp excitation signal and is able to estimate
the pole or zero locations of a low-pass LTI system
Flow-Based Optimization of Products or Devices
Flow-based optimization of products and devices is an immature field compared to the corresponding topology optimization based on solid mechanics. However, it is an essential part of component development with both internal and/or external flow. The aim of this book is two-fold: (i) to provide state-of-the-art examples of flow-based optimization and (ii) to present a review of topology optimization for fluid-based problems