14 research outputs found
DESIGN AND IMPLEMENTATION OF ASYNCHRONOUS FIR FILTER
This paper presents the architecture of a micropipeline asynchronous digital signal processing chain coupled to non-uniformly sampled data in time. Non-uniform sampling has been proven to be a better scheme than the uniform sampling to sample low activity signals. With such signals, it generates fewer samples, which means less data to process and lower power consumption. In addition, it is well-known that asynchronous logic is a low power technology. We focus on a Finite Impulse Response filter (FIR) applied to this non-uniform sampled signal obtained from an asynchronous analog to digital converter (A-ADC). The FIR filter blocks are implemented using verilog code
Asynchronous Data Processing Platforms for Energy Efficiency, Performance, and Scalability
The global technology revolution is changing the integrated circuit industry from the one driven by performance to the one driven by energy, scalability and more-balanced design goals. Without clock-related issues, asynchronous circuits enable further design tradeoffs and in operation adaptive adjustments for energy efficiency. This dissertation work presents the design methodology of the asynchronous circuit using NULL Convention Logic (NCL) and multi-threshold CMOS techniques for energy efficiency and throughput optimization in digital signal processing circuits. Parallel homogeneous and heterogeneous platforms implementing adaptive dynamic voltage scaling (DVS) based on the observation of system fullness and workload prediction are developed for balanced control of the performance and energy efficiency. Datapath control logic with NULL Cycle Reduction (NCR) and arbitration network are incorporated in the heterogeneous platform for large scale cascading. The platforms have been integrated with the data processing units using the IBM 130 nm 8RF process and fabricated using the MITLL 90 nm FDSOI process. Simulation and physical testing results show the energy efficiency advantage of asynchronous designs and the effective of the adaptive DVS mechanism in balancing the energy and performance in both platforms
Non-invasive power gating techniques for bursty computation workloads using micro-electro-mechanical relays
PhD ThesisElectrostatically-actuated Micro-Electro-Mechanical/Nano-Electro- Mechanical
(MEM/NEM) relays are promising devices overcoming the
energy-efficiency limitations of CMOS transistors. Many exploratory
research projects are currently under way investigating the mechanical,
electrical and logical characteristics of MEM/NEM relays. One
particular issue that this work addresses is the need for a scalable
and accurate physical model of the MEM/NEM switches that can be
plugged into the standard EDA software.
The existing models are accurate and detailed but they suffer
from the convergence problem. This problem requires finding ad-hoc
workarounds and significantly impacts the designer’s productivity. In
this thesis we propose a new simplified Verilog-AMS model. To test
scalability of the proposed model we cross-checked it against our analysis
of a range of benchmark circuits. Results show that, compared to
standard models, the proposed model is sufficiently accurate with an
average of 6% error and can handle larger designs without divergence.
This thesis also investigates the modelling, designing and optimization
of various MEM/NEM switches using 3D Finite Element Analysis
(FEA) performed by the COMSOL multiphysics simulation tool. An
extensive parametric sweep simulation is performed to study the
energy-latency trade-offs of MEM/NEM relays. To accurately simulate
MEMS/NEMS-based digital circuits, a Verilog-AMS model is
proposed based on the evaluated parameters obtained from the multiphysics
simulation tool. This allows an accurate calibration of the
MEM/NEM relays with a significant reduction in simulation speed
compared to that of 3D FEA exercised on COMSOL tool.
The effectiveness of two power gating approaches in asynchronous
micropipelines is also investigated using MEM/NEM switches and
sleep transistors in reducing idle power dissipation with a particular
target throughput. Sleep transistors are traditionally used to power
gate idle circuits, however, these transistors have fundamental limitations
in their effectiveness. Alternatively, MEM/NEM relays with zero
leakage current can achieve greater energy savings under a certain
data rate and design architecture. An asynchronous FIR filter 4 phase
bundled data handshake protocol is presented. Implementation is
accomplished in 90nm technology node and simulation exercised at
various data rates and design complexities. It was demonstrated that
our proposed approach offers 69% energy improvements at a data rate
1KHz compared to 39% of the previous work.
The current trends for greater heterogeneity in future Systems-on-
Chip (SoC) do not only concern their functionality but also their timing and power aspects. The increasing diversity of timing and power supply
conditions, and associated concurrently operating modes, within
an SoC calls for more efficient power delivery networks (PDN) for
battery operated devices. This is especially important for systems with
mixed duty cycling, where some parts are required to work regularly
with low-throughput while other parts are activated spontaneously,
i.e. in bursts. To improve their reaction time vs energy efficiency, this
work proposes to incorporate a power-switching network based on
MEM relays to switch the SoC power-performance state (PPS) into
an active mode while eliminating the leakage current when it is idle.
Results show that even with today0s large and high pull-in voltages, a
MEM-relay-based power switching network (PSN) can achieve a 1000x
savings in energy compared to its CMOS counterpart for low duty
cycle. A simple case of optimising an on-chip charge pump required
to switch-on the relay has been investigated and its energy-latency
overhead has been evaluated.
Heterogeneous many-core systems are increasingly being employed
in modern embedded platforms for high throughput at low energy cost
considerations. These applications typically exhibit bursty workloads
that provide opportunities to minimize system energy. CMOS-based
power gating circuitry, typically consisting of sleep transistors, is used
as an effective technique for idle energy reduction in such applications.
However, these transistors contribute high leakage current when
driving large capacitive loads, making effective energy minimization
challenging.
This thesis proposes a novel MEMS-based idle energy control approach.
Core to this approach is an integrated sleep mode management
based on the performance-energy states and bursty workloads
indicated by the performance counters. A number of PARSEC benchmark
applications are used as case studies of bursty workloads, including
CPU- and memory- intensive ones. These applications are
exercised on an Exynos 5422 heterogeneous many-core platform, engineered
with a performance counter facilities, showing 55.5% energy
savings compared with an on-demand governor. Furthermore, an extensive
trade-off analysis demonstrates the comparative advantages
of the MEMS-based controller, including zero-leakage current and
non-invasive implementations suitable for commercial off-the-shelf
systems.Higher committee of education development in
Iraq (HCED
Design of application-specific instruction set processors with asynchronous methodology for embedded digital signal processing applications.
Kwok Yan-lun Andy.Thesis submitted in: November 2004.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 133-137).Abstracts in English and Chinese.Abstract --- p.i摘要 --- p.iiAcknowledgements --- p.iiiList of Figures --- p.viiList of Tables and Examples --- p.xChapter 1. --- Introduction --- p.1Chapter 1.1. --- Motivation --- p.1Chapter 1.2. --- Objective and Approach --- p.4Chapter 1.3. --- Thesis Organization --- p.5Chapter 2. --- Related Work --- p.7Chapter 2.1. --- Coverage --- p.7Chapter 2.2. --- ASIP Design Methodologies --- p.8Chapter 2.3. --- Asynchronous Technology on Processors --- p.12Chapter 2.4. --- Summary --- p.14Chapter 3. --- Asynchronous Design Methodology --- p.15Chapter 3.1. --- Overview --- p.15Chapter 3.2. --- Asynchronous Design Style --- p.17Chapter 3.2.1. --- Micropipelines --- p.17Chapter 3.2.2. --- Fine-grain Pipelining --- p.20Chapter 3.2.3. --- Globally-Asynchronous Locally-Synchronous (GALS) Design --- p.22Chapter 3.3. --- Advantages of GALS in ASIP Design --- p.27Chapter 3.3.1. --- Reuse of Synchronous and Asynchronous IP --- p.27Chapter 3.3.2. --- Fine Tuning of Performance and Power Consumption --- p.27Chapter 3.3.3. --- Synthesis-based Design Flow --- p.28Chapter 3.4. --- Design of GALS Asynchronous Wrapper --- p.28Chapter 3.4.1. --- Handshake Protocol --- p.28Chapter 3.4.2. --- Pausible Clock Generator --- p.29Chapter 3.4.3. --- Port Controllers --- p.30Chapter 3.4.4. --- Performance of the Asynchronous Wrapper --- p.33Chapter 3.5. --- Summary --- p.35Chapter 4. --- Platform Based ASIP Design Methodology --- p.36Chapter 4.1. --- Platform Based Approach --- p.36Chapter 4.1.1. --- The Definition of Our Platform --- p.37Chapter 4.1.2. --- The Definition of the Platform Based Design --- p.37Chapter 4.2. --- Platform Architecture --- p.38Chapter 4.2.1. --- The Nature of DSP Algorithms --- p.38Chapter 4.2.2. --- Design Space of Datapath Optimization --- p.46Chapter 4.2.3. --- Proposed Architecture --- p.49Chapter 4.2.4. --- The Strategy of Realizing an Optimized Datapath --- p.51Chapter 4.2.5. --- Pipeline Organization --- p.59Chapter 4.2.6. --- GALS Partitioning --- p.61Chapter 4.2.7. --- Operation Mechanism --- p.63Chapter 4.3. --- Overall Design Flow --- p.67Chapter 4.4. --- Summary --- p.70Chapter 5. --- Design of the ASIP Platform --- p.72Chapter 5.1. --- Design Goal --- p.72Chapter 5.2. --- Instruction Fetch --- p.74Chapter 5.2.1. --- Instruction fetch unit --- p.74Chapter 5.2.2. --- Zero-overhead loops and Subroutines --- p.75Chapter 5.3. --- Instruction Decode --- p.77Chapter 5.3.1. --- Instruction decoder --- p.77Chapter 5.3.2. --- The Encoding of Parallel and Complex Instructions --- p.80Chapter 5.4. --- Datapath --- p.81Chapter 5.4.1. --- Base Functional Units --- p.81Chapter 5.4.2. --- Functional Unit Wrapper Interface --- p.83Chapter 5.5. --- Register File Systems --- p.84Chapter 5.5.1. --- Memory Hierarchy --- p.84Chapter 5.5.2. --- Register File Organization --- p.85Chapter 5.5.3. --- Address Generation --- p.93Chapter 5.5.4. --- Load and Store --- p.98Chapter 5.6. --- Design Verification --- p.100Chapter 5.7. --- Summary --- p.104Chapter 6. --- Case Studies --- p.105Chapter 6.1. --- Objective --- p.105Chapter 6.2. --- Approach --- p.105Chapter 6.3. --- Based versus Optimized --- p.106Chapter 6.3.1. --- Matrix Manipulation --- p.106Chapter 6.3.2. --- Autocorrelation --- p.109Chapter 6.3.3. --- CORDIC --- p.110Chapter 6.4. --- Optimized versus Advanced Commercial DSPs --- p.113Chapter 6.4.1. --- Introduction to TMS320C62x and SC140 --- p.113Chapter 6.4.2. --- Results --- p.115Chapter 6.5. --- Summary --- p.116Chapter 7. --- Conclusion --- p.118Chapter 7.1. --- When ASIPs encounter asynchronous --- p.118Chapter 7.2. --- Contributions --- p.120Chapter 7.3. --- Future Directions --- p.121Chapter A --- Synthesis of Extended Burst-Mode Asynchronous Finite State Machine --- p.122Chapter B --- Base Instruction Set --- p.124Chapter C --- Special Registers --- p.127Chapter D --- Synthesizable Model of GALS Wrapper --- p.130Reference --- p.13
Scalable Energy-Recovery Architectures.
Energy efficiency is a critical challenge for today's integrated circuits, especially for high-end digital signal processing and communications that require both high throughput and low energy dissipation for extended battery life. Charge-recovery logic recovers and reuses charge using inductive elements and has the potential to achieve order-of-magnitude improvement in energy efficiency while maintaining high performance. However, the lack of large-scale high-speed silicon demonstrations and inductor area overheads are two major concerns.
This dissertation focuses on scalable charge-recovery designs. We present a semi-automated design flow to enable the design of large-scale charge-recovery chips. We also present a new architecture that uses in-package inductors, eliminating the area overheads caused by the use of integrated inductors in high-performance charge-recovery chips.
To demonstrate our semi-automated flow, which uses custom-designed standard-cell-like dynamic cells, we have designed a 576-bit charge-recovery low-density parity-check (LDPC) decoder chip. Functioning correctly at clock speeds above 1 GHz, this prototype is the first-ever demonstration of a GHz-speed charge-recovery chip of significant complexity. In terms of energy consumption, this chip improves over recent state-of-the-art LDPCs by at least 1.3 times with comparable or better area efficiency.
To demonstrate our architecture for eliminating inductor overheads, we have designed a charge-recovery LDPC decoder chip with in-package inductors. This test-chip has been fabricated in a 65nm CMOS flip-chip process. A custom 6-layer FC-BGA package substrate has been designed with 16 inductors embedded in the fifth layer of the package substrate, yielding higher Q and significantly improving area efficiency and energy efficiency compared to their on-chip counterparts. From measurements, this chip achieves at least 2.3 times lower energy consumption with better area efficiency over state-of-the-art published designs.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116653/1/terryou_1.pd
Energy autonomous systems : future trends in devices, technology, and systems
The rapid evolution of electronic devices since the beginning of the nanoelectronics era has brought about exceptional computational power in an ever shrinking system footprint. This has enabled among others the wealth of nomadic battery powered wireless systems (smart phones, mp3 players, GPS, …) that society currently enjoys. Emerging integration technologies enabling even smaller volumes and the associated increased functional density may bring about a new revolution in systems targeting wearable healthcare, wellness, lifestyle and industrial monitoring applications
The Fifth NASA Symposium on VLSI Design
The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design
Circuit design for logic automata
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student-submitted PDF version of thesis.Includes bibliographical references (p. 143-148).The Logic Automata model is a universal distributed computing structure which pushes parallelism to the bit-level extreme. This new model drastically differs from conventional computer architectures in that it exposes, rather than hides, the physics underlying the computation by accommodating data processing and storage in a local and distributed manner. Based on Logic Automata, highly scalable computing structures for digital and analog processing have been developed; and they are verified at the transistor level in this thesis. The Asynchronous Logic Automata (ALA) model is derived by adding the temporal locality, i.e., the asynchrony in data exchanges, in addition to the spacial locality of the Logic Automata model. As a demonstration of this incrementally extensible, clockless structure, we designed an ALA cell library in 90 nm CMOS technology and established a "pick-and-place" design flow for fast ALA circuit layout. The work flow gracefully aligns the description of computer programs and circuit realizations, providing a simpler and more scalable solution for Application Specific Integrated Circuit (ASIC) designs, which are currently limited by global constraints such as the clock and long interconnects. The potential of the ALA circuit design flow is tested with example applications for mathematical operations. The same Logic Automata model can also be augmented by relaxing the digital states into analog ones for interesting analog computations. The Analog Logic Automata (AnLA) model is a merge of the Analog Logic principle and the Logic Automata architecture, in which efficient processing is embedded onto a scalable construction.(cont.) In order to study the unique property of this mixed-signal computing structure, we designed and fabricated an AnLA test chip in AMI 0.5[mu]m CMOS technology. Chip tests of an AnLA Noise-Locked Loop (NLL) circuit as well as application tests of AnLA image processing and Error-Correcting Code (ECC) decoding, show large potential of the AnLA structure.by Kailiang Chen.S.M