14 research outputs found

    DESIGN AND IMPLEMENTATION OF ASYNCHRONOUS FIR FILTER

    This paper presents the architecture of a micropipeline asynchronous digital signal processing chain that operates on data sampled non-uniformly in time. For low-activity signals, non-uniform sampling has been shown to be a better scheme than uniform sampling: it generates fewer samples, which means less data to process and lower power consumption. In addition, asynchronous logic is well known as a low-power technology. We focus on a Finite Impulse Response (FIR) filter applied to the non-uniformly sampled signal obtained from an asynchronous analog-to-digital converter (A-ADC). The FIR filter blocks are implemented in Verilog.
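The claim that non-uniform sampling yields fewer samples for low-activity signals can be illustrated with a minimal Python sketch. It assumes a simple level-crossing (send-on-delta) rule, which the abstract does not specify; the function name `level_crossing_sample`, the threshold `delta`, and the test signal are all hypothetical choices made for illustration.

```python
import math

def level_crossing_sample(signal, times, delta):
    """Keep a sample only when the signal has moved by at least `delta`
    from the last kept value (an assumed send-on-delta rule)."""
    kept = [(times[0], signal[0])]
    for t, x in zip(times[1:], signal[1:]):
        if abs(x - kept[-1][1]) >= delta:
            kept.append((t, x))
    return kept

# Low-activity test signal: flat, a short 20 Hz burst, then flat again.
times = [i / 1000 for i in range(1000)]
signal = [math.sin(2 * math.pi * 20 * t) if 0.4 <= t < 0.5 else 0.0
          for t in times]

samples = level_crossing_sample(signal, times, delta=0.1)
print(len(signal), "uniform samples vs", len(samples), "level-crossing samples")
```

Only the burst produces samples, so the downstream filter has far fewer values to process than with the uniform grid.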

    Asynchronous Data Processing Platforms for Energy Efficiency, Performance, and Scalability

    The global technology revolution is changing the integrated circuit industry from one driven by performance to one driven by energy, scalability, and more balanced design goals. Free of clock-related issues, asynchronous circuits enable further design tradeoffs and adaptive adjustments during operation for energy efficiency. This dissertation presents a design methodology for asynchronous circuits using NULL Convention Logic (NCL) and multi-threshold CMOS techniques for energy efficiency and throughput optimization in digital signal processing circuits. Parallel homogeneous and heterogeneous platforms implementing adaptive dynamic voltage scaling (DVS), based on observation of system fullness and workload prediction, are developed for balanced control of performance and energy efficiency. Datapath control logic with NULL Cycle Reduction (NCR) and an arbitration network are incorporated in the heterogeneous platform for large-scale cascading. The platforms have been integrated with the data processing units using the IBM 130 nm 8RF process and fabricated using the MITLL 90 nm FDSOI process. Simulation and physical testing results show the energy efficiency advantage of asynchronous designs and the effectiveness of the adaptive DVS mechanism in balancing energy and performance in both platforms.

    Non-invasive power gating techniques for bursty computation workloads using micro-electro-mechanical relays

    PhD Thesis. Electrostatically-actuated Micro-Electro-Mechanical/Nano-Electro-Mechanical (MEM/NEM) relays are promising devices for overcoming the energy-efficiency limitations of CMOS transistors. Many exploratory research projects are currently under way investigating the mechanical, electrical and logical characteristics of MEM/NEM relays. One particular issue that this work addresses is the need for a scalable and accurate physical model of MEM/NEM switches that can be plugged into standard EDA software. The existing models are accurate and detailed but suffer from convergence problems, which require ad-hoc workarounds and significantly impact the designer's productivity. In this thesis we propose a new simplified Verilog-AMS model. To test the scalability of the proposed model we cross-checked it against our analysis of a range of benchmark circuits. Results show that, compared to standard models, the proposed model is sufficiently accurate, with an average error of 6%, and can handle larger designs without divergence. This thesis also investigates the modelling, design and optimization of various MEM/NEM switches using 3D Finite Element Analysis (FEA) performed with the COMSOL multiphysics simulation tool. An extensive parametric sweep simulation is performed to study the energy-latency trade-offs of MEM/NEM relays. To accurately simulate MEMS/NEMS-based digital circuits, a Verilog-AMS model is proposed based on the parameters obtained from the multiphysics simulation tool. This allows an accurate calibration of the MEM/NEM relays with a significant reduction in simulation time compared to that of 3D FEA in COMSOL. The effectiveness of two power gating approaches in asynchronous micropipelines, using MEM/NEM switches and sleep transistors, is also investigated in terms of reducing idle power dissipation at a particular target throughput. 
Sleep transistors are traditionally used to power gate idle circuits; however, these transistors have fundamental limitations in their effectiveness. Alternatively, MEM/NEM relays with zero leakage current can achieve greater energy savings under a certain data rate and design architecture. An asynchronous FIR filter using a 4-phase bundled-data handshake protocol is presented. It is implemented in a 90 nm technology node and simulated at various data rates and design complexities. The proposed approach is shown to offer a 69% energy improvement at a data rate of 1 kHz, compared to 39% for previous work. The current trend towards greater heterogeneity in future Systems-on-Chip (SoC) concerns not only their functionality but also their timing and power aspects. The increasing diversity of timing and power supply conditions, and associated concurrently operating modes, within an SoC calls for more efficient power delivery networks (PDN) for battery-operated devices. This is especially important for systems with mixed duty cycling, where some parts are required to work regularly with low throughput while other parts are activated spontaneously, i.e. in bursts. To improve their reaction time versus energy efficiency, this work proposes to incorporate a power-switching network based on MEM relays to switch the SoC power-performance state (PPS) into an active mode while eliminating the leakage current when it is idle. Results show that even with today's large and high pull-in voltages, a MEM-relay-based power switching network (PSN) can achieve a 1000x saving in energy compared to its CMOS counterpart for low duty cycles. A simple case of optimising an on-chip charge pump required to switch on the relay has been investigated and its energy-latency overhead has been evaluated. Heterogeneous many-core systems are increasingly being employed in modern embedded platforms for high throughput at low energy cost. 
These applications typically exhibit bursty workloads that provide opportunities to minimize system energy. CMOS-based power gating circuitry, typically consisting of sleep transistors, is used as an effective technique for idle energy reduction in such applications. However, these transistors contribute high leakage current when driving large capacitive loads, making effective energy minimization challenging. This thesis proposes a novel MEMS-based idle energy control approach. Core to this approach is integrated sleep mode management based on the performance-energy states and bursty workloads indicated by the performance counters. A number of PARSEC benchmark applications are used as case studies of bursty workloads, including CPU- and memory-intensive ones. These applications are exercised on an Exynos 5422 heterogeneous many-core platform, engineered with performance counter facilities, showing 55.5% energy savings compared with an on-demand governor. Furthermore, an extensive trade-off analysis demonstrates the comparative advantages of the MEMS-based controller, including zero leakage current and non-invasive implementations suitable for commercial off-the-shelf systems. Higher committee of education development in Iraq (HCED
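The 4-phase bundled-data handshake mentioned in the abstract above can be sketched as an ordered sequence of request and acknowledge events. This is a generic illustration of the return-to-zero protocol in Python, not the thesis's circuit; the function name `four_phase_transfer` and the event representation are hypothetical.

```python
def four_phase_transfer(data_items):
    """Return the signal events and received data for sending each item
    with a generic 4-phase (return-to-zero) bundled-data handshake."""
    events, received = [], []
    for item in data_items:
        bus = item                   # sender drives the data bundle first
        events.append(("req", 1))    # phase 1: sender raises request
        received.append(bus)         # receiver latches the bundled data
        events.append(("ack", 1))    # phase 2: receiver raises acknowledge
        events.append(("req", 0))    # phase 3: sender lowers request
        events.append(("ack", 0))    # phase 4: receiver lowers acknowledge
    return events, received

events, received = four_phase_transfer([7, 9])
print(events[:4], received)
```

Each transfer costs four signal transitions, and both wires return to zero between items, which is the idle state in which a stage could be power gated.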

    Design of application-specific instruction set processors with asynchronous methodology for embedded digital signal processing applications.

    Kwok Yan-lun Andy. Thesis submitted in November 2004. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 133-137). Abstracts in English and Chinese.
    Contents:
    Abstract
    Abstract (in Chinese)
    Acknowledgements
    List of Figures
    List of Tables and Examples
    Chapter 1. Introduction
        1.1. Motivation
        1.2. Objective and Approach
        1.3. Thesis Organization
    Chapter 2. Related Work
        2.1. Coverage
        2.2. ASIP Design Methodologies
        2.3. Asynchronous Technology on Processors
        2.4. Summary
    Chapter 3. Asynchronous Design Methodology
        3.1. Overview
        3.2. Asynchronous Design Style
            3.2.1. Micropipelines
            3.2.2. Fine-grain Pipelining
            3.2.3. Globally-Asynchronous Locally-Synchronous (GALS) Design
        3.3. Advantages of GALS in ASIP Design
            3.3.1. Reuse of Synchronous and Asynchronous IP
            3.3.2. Fine Tuning of Performance and Power Consumption
            3.3.3. Synthesis-based Design Flow
        3.4. Design of GALS Asynchronous Wrapper
            3.4.1. Handshake Protocol
            3.4.2. Pausible Clock Generator
            3.4.3. Port Controllers
            3.4.4. Performance of the Asynchronous Wrapper
        3.5. Summary
    Chapter 4. Platform Based ASIP Design Methodology
        4.1. Platform Based Approach
            4.1.1. The Definition of Our Platform
            4.1.2. The Definition of the Platform Based Design
        4.2. Platform Architecture
            4.2.1. The Nature of DSP Algorithms
            4.2.2. Design Space of Datapath Optimization
            4.2.3. Proposed Architecture
            4.2.4. The Strategy of Realizing an Optimized Datapath
            4.2.5. Pipeline Organization
            4.2.6. GALS Partitioning
            4.2.7. Operation Mechanism
        4.3. Overall Design Flow
        4.4. Summary
    Chapter 5. Design of the ASIP Platform
        5.1. Design Goal
        5.2. Instruction Fetch
            5.2.1. Instruction fetch unit
            5.2.2. Zero-overhead loops and Subroutines
        5.3. Instruction Decode
            5.3.1. Instruction decoder
            5.3.2. The Encoding of Parallel and Complex Instructions
        5.4. Datapath
            5.4.1. Base Functional Units
            5.4.2. Functional Unit Wrapper Interface
        5.5. Register File Systems
            5.5.1. Memory Hierarchy
            5.5.2. Register File Organization
            5.5.3. Address Generation
            5.5.4. Load and Store
        5.6. Design Verification
        5.7. Summary
    Chapter 6. Case Studies
        6.1. Objective
        6.2. Approach
        6.3. Based versus Optimized
            6.3.1. Matrix Manipulation
            6.3.2. Autocorrelation
            6.3.3. CORDIC
        6.4. Optimized versus Advanced Commercial DSPs
            6.4.1. Introduction to TMS320C62x and SC140
            6.4.2. Results
        6.5. Summary
    Chapter 7. Conclusion
        7.1. When ASIPs encounter asynchronous
        7.2. Contributions
        7.3. Future Directions
    Chapter A. Synthesis of Extended Burst-Mode Asynchronous Finite State Machine
    Chapter B. Base Instruction Set
    Chapter C. Special Registers
    Chapter D. Synthesizable Model of GALS Wrapper
    Reference

    Scalable Energy-Recovery Architectures.

    Energy efficiency is a critical challenge for today's integrated circuits, especially for high-end digital signal processing and communications that require both high throughput and low energy dissipation for extended battery life. Charge-recovery logic recovers and reuses charge using inductive elements and has the potential to achieve order-of-magnitude improvement in energy efficiency while maintaining high performance. However, the lack of large-scale high-speed silicon demonstrations and inductor area overheads are two major concerns. This dissertation focuses on scalable charge-recovery designs. We present a semi-automated design flow to enable the design of large-scale charge-recovery chips. We also present a new architecture that uses in-package inductors, eliminating the area overheads caused by the use of integrated inductors in high-performance charge-recovery chips. To demonstrate our semi-automated flow, which uses custom-designed standard-cell-like dynamic cells, we have designed a 576-bit charge-recovery low-density parity-check (LDPC) decoder chip. Functioning correctly at clock speeds above 1 GHz, this prototype is the first-ever demonstration of a GHz-speed charge-recovery chip of significant complexity. In terms of energy consumption, this chip improves over recent state-of-the-art LDPCs by at least 1.3 times with comparable or better area efficiency. To demonstrate our architecture for eliminating inductor overheads, we have designed a charge-recovery LDPC decoder chip with in-package inductors. This test-chip has been fabricated in a 65nm CMOS flip-chip process. A custom 6-layer FC-BGA package substrate has been designed with 16 inductors embedded in the fifth layer of the package substrate, yielding higher Q and significantly improving area efficiency and energy efficiency compared to their on-chip counterparts. 
From measurements, this chip achieves at least 2.3 times lower energy consumption with better area efficiency than state-of-the-art published designs. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116653/1/terryou_1.pd

    Energy autonomous systems: future trends in devices, technology, and systems

    The rapid evolution of electronic devices since the beginning of the nanoelectronics era has brought about exceptional computational power in an ever-shrinking system footprint. This has enabled, among other things, the wealth of nomadic battery-powered wireless systems (smart phones, mp3 players, GPS, …) that society currently enjoys. Emerging integration technologies enabling even smaller volumes, and the associated increase in functional density, may bring about a new revolution in systems targeting wearable healthcare, wellness, lifestyle and industrial monitoring applications.

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next-generation advances that will serve as a basis for future VLSI design.

    Circuit design for logic automata

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 143-148). The Logic Automata model is a universal distributed computing structure which pushes parallelism to the bit-level extreme. This new model differs drastically from conventional computer architectures in that it exposes, rather than hides, the physics underlying the computation by accommodating data processing and storage in a local and distributed manner. Based on Logic Automata, highly scalable computing structures for digital and analog processing have been developed, and they are verified at the transistor level in this thesis. The Asynchronous Logic Automata (ALA) model is derived by adding temporal locality, i.e., asynchrony in data exchanges, to the spatial locality of the Logic Automata model. As a demonstration of this incrementally extensible, clockless structure, we designed an ALA cell library in 90 nm CMOS technology and established a "pick-and-place" design flow for fast ALA circuit layout. The work flow gracefully aligns the description of computer programs and circuit realizations, providing a simpler and more scalable solution for Application Specific Integrated Circuit (ASIC) designs, which are currently limited by global constraints such as the clock and long interconnects. The potential of the ALA circuit design flow is tested with example applications for mathematical operations. The same Logic Automata model can also be augmented by relaxing the digital states into analog ones for interesting analog computations. 
The Analog Logic Automata (AnLA) model merges the Analog Logic principle with the Logic Automata architecture, embedding efficient processing in a scalable construction. In order to study the unique properties of this mixed-signal computing structure, we designed and fabricated an AnLA test chip in AMI 0.5 μm CMOS technology. Chip tests of an AnLA Noise-Locked Loop (NLL) circuit, as well as application tests of AnLA image processing and Error-Correcting Code (ECC) decoding, show the large potential of the AnLA structure. By Kailiang Chen. S.M.