VLSI implementation of discrete cosine transform using a new asynchronous pipelined architecture.
Lee Chi-wai. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 191-196). Abstracts in English and Chinese.
Contents:
Abstract
Abstract (Chinese)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
  1.1 Synchronous Design
  1.2 Asynchronous Design
  1.3 Discrete Cosine Transform
  1.4 Motivation
  1.5 Organization of the Thesis
Chapter 2. Asynchronous Design Methodology
  2.1 Overview
  2.2 Background
  2.3 Past Designs
  2.4 Micropipeline
  2.5 New Asynchronous Architecture
Chapter 3. DCT/IDCT Processor Design Methodology
  3.1 Overview
  3.2 Hardware Architecture
  3.3 DCT Algorithm
  3.4 Used Architecture and DCT Algorithm
    3.4.1 Implementation on Programmable DSP Processor
    3.4.2 Implementation on Dedicated Processor
Chapter 4. New Techniques for Operating Dynamic Logic in Low Frequency
  4.1 Overview
  4.2 Background
  4.3 Traditional Technique
  4.4 New Technique - Refresh Control Circuit
    4.4.1 Principle
    4.4.2 Voltage Sensor
    4.4.3 Ring Oscillator
    4.4.4 Counter, Latch and Comparator
    4.4.5 Recalibrate Circuit
    4.4.6 Operation Monitoring Circuit
    4.4.7 Overall Circuit
Chapter 5. DCT Implementation on Programmable DSP Processor
  5.1 Overview
  5.2 Processor Architecture
    5.2.1 Arithmetic Unit
    5.2.2 Switching Network
    5.2.3 FIFO Memory
    5.2.4 Instruction Memory
  5.3 Programming
  5.4 DCT Implementation
Chapter 6. DCT Implementation on Dedicated DCT Processor
  6.1 Overview
  6.2 DCT Chip Architecture
    6.2.1 1D DCT Core
      6.2.1.1 Core Architecture
      6.2.1.2 Flow of Operation
      6.2.1.3 Data Replicator
      6.2.1.4 DCT Coefficients Memory
    6.2.2 Combination of IDCT to 1D DCT Core
    6.2.3 Accuracy
  6.3 Transpose Memory
    6.3.1 Architecture
    6.3.2 Address Generator
    6.3.3 RAM Block
Chapter 7. Results and Discussions
  7.1 Overview
  7.2 Refresh Control Circuit
    7.2.1 Implementation Results and Performance
    7.2.2 Discussion
  7.3 Programmable DSP Processor
    7.3.1 Implementation Results and Performance
    7.3.2 Discussion
  7.4 1D DCT/IDCT Core
    7.4.1 Simulation Results
    7.4.2 Measurement Results
    7.4.3 Discussion
  7.5 Transpose Memory
    7.5.1 Simulated Results
    7.5.2 Measurement Results
    7.5.3 Discussion
Chapter 8. Conclusions
Appendix
  Operations of switches in DCT implementation of programmable DSP processor
  C Program for evaluating the error in DCT/IDCT core
  Pin Assignments of the Programmable DSP Processor Chip
  Pin Assignments of the 1D DCT/IDCT Core Chip
  Pin Assignments of the Transpose Memory Chip
  Chip Microphotograph of the 1D DCT/IDCT Core
  Chip Microphotograph of the Transpose Memory
  Measured Waveforms of 1D DCT/IDCT Chip
  Measured Waveforms of Transpose Memory Chip
  Schematics of Refresh Control Circuit
  Schematics of Programmable DSP Processor
  Schematics of 1D DCT/IDCT Core
  Schematics of Transpose Memory
References
Design Libraries - CD-ROM
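The contents above describe a dedicated chip built from a 1D DCT/IDCT core plus a transpose memory, which is the row-column decomposition of the 2D DCT. The minimal floating-point sketch below shows that decomposition for an 8x8 block, assuming the orthonormal DCT-II; it is not the thesis's fixed-point, coefficient-memory design, and all function names are hypothetical. The round-trip error check is only loosely analogous to the appendix's error-evaluation program.

```python
import math

N = 8  # assumed block size (8x8)

def _scale(k, n):
    # Orthonormal DCT scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(x):
    """Orthonormal 1D DCT-II of one row (the job of a 1D DCT core)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct_1d(X):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(X)
    return [
        sum(_scale(k, n) * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
        for i in range(n)
    ]

def transpose(block):
    """Plays the role of the transpose memory: written by rows, read by columns."""
    return [list(col) for col in zip(*block)]

def separable_2d(block, transform_1d):
    rows = [transform_1d(r) for r in block]              # first pass: every row
    cols = [transform_1d(r) for r in transpose(rows)]    # second pass: every column
    return transpose(cols)                               # back to [row][col] order

def dct_2d(block):
    return separable_2d(block, dct_1d)

def idct_2d(coeffs):
    return separable_2d(coeffs, idct_1d)

# Round-trip a test block and report the worst-case reconstruction error.
block = [[(3 * r + 5 * c) % 256 for c in range(N)] for r in range(N)]
recon = idct_2d(dct_2d(block))
max_err = max(abs(recon[r][c] - block[r][c]) for r in range(N) for c in range(N))
print(f"max reconstruction error: {max_err:.1e}")
```

Because both passes use the same 1D transform, one core can in principle serve rows and columns in turn, which is why the transpose memory is the key extra block. A fixed-point version would replace `math.cos` with stored coefficients and measure its error against a floating-point reference like this one.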
Doctor of Philosophy
Dissertation. The design of integrated circuits (ICs) requires exhaustive verification and a thorough test mechanism to ensure the functionality and robustness of the circuit. This dissertation employs the theory of relative timing, which enables designers to create designs with significant power and performance advantages over traditional clocked designs. Research has been carried out to enable the relative timing approach to be supported by commercial electronic design automation (EDA) tools, which allows asynchronous and sequential designs to be built using commercial CAD tools. However, two significant gaps remain in the flow: the lack of support for timing verification and for manufacturing test. Relative timing (RT) uses circuit delays to enforce and measure event sequencing in a circuit design. Asynchronous circuits can optimize the power-performance product by adjusting the circuit timing. A thorough analysis of the timing characteristics of every timing path is required to ensure the robustness and correctness of RT designs. All timing paths have to conform to the circuit timing constraints. This dissertation addresses back-end design robustness by validating full cyclical path timing with static timing analysis and by implementing design for testability (DFT). Circuit reliability and correctness are necessary for the technology to become commercially ready. In this study, scan chains, a commercial DFT implementation, are applied to burst-mode RT designs. In addition, a novel testing approach is developed alongside the scan chain to achieve over 90% fault coverage on two fault models: the stuck-at fault model and the delay fault model. This work evaluates the cost of DFT against its coverage and then determines the best implementation.
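Fault coverage is the fraction of modeled faults that a test set detects. The sketch below computes it for the single stuck-at model by fault simulation on a tiny, hypothetical combinational netlist. It does not model scan chains (whose role in a real flow is to make sequential state controllable and observable so the logic between flip-flops can be tested this way) or the delay fault model, and it is not the dissertation's testing approach.

```python
from itertools import product

# Hypothetical combinational netlist in topological order: (output net, gate, input nets).
# It computes y = NOT((a AND b) OR c).
INPUTS = ("a", "b", "c")
OUTPUTS = ("y",)
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "OR", ("n1", "c")),
    ("y", "NOT", ("n2",)),
]
GATES = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}

def simulate(vector, fault=None):
    """Evaluate the netlist; `fault` is (net, stuck_value), or None for the good circuit."""
    values = dict(zip(INPUTS, vector))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]                      # stuck-at on a primary input
    for out, gate, ins in NETLIST:
        values[out] = GATES[gate]([values[i] for i in ins])
        if fault and fault[0] == out:
            values[out] = fault[1]                       # stuck-at on a gate output
    return tuple(values[o] for o in OUTPUTS)

def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by `vectors`, plus the undetected ones."""
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    faults = [(net, sv) for net in nets for sv in (0, 1)]
    detected = {f for f in faults if any(simulate(v, f) != simulate(v) for v in vectors)}
    return len(detected) / len(faults), [f for f in faults if f not in detected]

exhaustive = list(product((0, 1), repeat=len(INPUTS)))
print(fault_coverage(exhaustive))
print(fault_coverage([(1, 1, 0)]))
```

Exhaustive vectors are only feasible for tiny blocks; the point of the second call is that a reduced test set leaves some faults undetected, which is the cost/coverage trade-off the abstract refers to.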
Designs such as a 64-point fast Fourier transform (FFT), an I2C design, and a mixed-signal design are built to demonstrate the power, area, and performance advantages of the relative timing methodology, and they serve as a platform for developing back-end robustness. Results are verified by post-silicon timing validation and test. This work strengthens the overall relative-timed circuit flow, its reliability, and its testability.
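A relative timing constraint is commonly written in the relative-timing literature as pod ↦ poc0 ≺ poc1: an event launched at a point of divergence (pod) must reach poc0 before it reaches poc1. The minimal static-timing sketch below checks one such constraint on a hypothetical timing graph with min/max edge delays, requiring the latest arrival on the fast path plus a margin to precede the earliest arrival on the slow path. Node names, delays, and the margin are illustrative assumptions; it does not handle the full cyclical paths the dissertation validates.

```python
from graphlib import TopologicalSorter

# Hypothetical timing graph: (from, to) -> (min_delay_ns, max_delay_ns).
EDGES = {
    ("pod", "g1"): (0.10, 0.15),
    ("g1", "poc0"): (0.20, 0.30),
    ("pod", "g2"): (0.25, 0.35),
    ("g2", "g3"): (0.20, 0.25),
    ("g3", "poc1"): (0.15, 0.20),
}

def arrival_times(source):
    """Earliest and latest arrival time at each node reachable from `source`."""
    preds = {}
    for u, v in EDGES:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    earliest, latest = {source: 0.0}, {source: 0.0}
    for node in TopologicalSorter(preds).static_order():   # predecessors come first
        for (u, v), (dmin, dmax) in EDGES.items():
            if v == node and u in earliest:
                earliest[node] = min(earliest.get(node, float("inf")), earliest[u] + dmin)
                latest[node] = max(latest.get(node, float("-inf")), latest[u] + dmax)
    return earliest, latest

def check_rt(pod, fast, slow, margin):
    """Check pod |-> fast < slow: the slowest pod->fast path plus `margin` must
    still beat the fastest pod->slow path. Returns (met, slack_ns)."""
    earliest, latest = arrival_times(pod)
    slack = earliest[slow] - (latest[fast] + margin)
    return slack >= 0, slack

print(check_rt("pod", "poc0", "poc1", margin=0.10))
print(check_rt("pod", "poc0", "poc1", margin=0.20))
```

Using max delays on one path and min delays on the other is the conservative choice: the constraint holds under any combination of delays within the stated bounds.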
Continuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques
The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Owing to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as the processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs that differ from the classical scheme. The defining characteristic of all the new DSPs examined in this thesis is "adaptivity" or "adaptability": adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs and new adaptive DSPs. The main class of DSPs examined throughout the thesis is continuous-time (CT) DSPs. CT DSPs are clockless and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also avoids aliasing in the frequency domain entirely, which improves signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip can handle various types of signals, i.e., various digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; such a property is demonstrated for the first time in CT DSPs and is impossible for classical DSPs.
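The claim that event-driven CT DSPs adapt their activity to the input rate can be illustrated with a level-crossing sampler, the kind of front end the thesis's level-crossing DSP operates on. In the minimal sketch below, a dense time grid stands in for continuous time (an assumption of the simulation, since real CT hardware has no grid), and the quantization step `delta` is an illustrative parameter.

```python
import math

def level_crossing_sample(samples, times, delta):
    """Emit a (time, level) event whenever the input reaches a new quantization
    level (a multiple of `delta`) above or below the last level reached."""
    events = []
    level = round(samples[0] / delta)           # index of the last level reached
    for t, x in zip(times, samples):
        while x >= (level + 1) * delta:         # rising through one or more levels
            level += 1
            events.append((t, level * delta))
        while x <= (level - 1) * delta:         # falling through one or more levels
            level -= 1
            events.append((t, level * delta))
    return events

# Same amplitude and duration, 10x faster input: many more events (and more activity).
times = [i * 1e-4 for i in range(1000)]
slow = [math.sin(2 * math.pi * 5 * t) for t in times]
fast = [math.sin(2 * math.pi * 50 * t) for t in times]
print(len(level_crossing_sample(slow, times, 0.1)), len(level_crossing_sample(fast, times, 0.1)))
print(len(level_crossing_sample([0.3] * 1000, times, 0.1)))   # a constant input emits nothing
```

Downstream processing only runs when an event arrives, so a quiet or slowly varying input costs little, whereas a clocked DSP samples at the same rate regardless of the input.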
Unlike previous CT DSPs, which were limited to a single digital format and whose design was hard to scale to different bandwidths and bit-widths, this chip has a formal, robust and scalable design, owing to the systematic use of asynchronous design techniques. The second contribution of this thesis is a complete methodology for designing adaptive delay lines. In particular, it shows how to make the granularity, i.e., the number of stages, of a real-time delay line adaptive. Adaptive granularity significantly reduces the line's power consumption, by up to 70% in simulations of two design examples. This enhancement can have a large direct impact on the power of any CT DSP, since the delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on the fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating in the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting DSPs can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible without requiring complex arithmetic hardware.
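Why companding helps small inputs can be shown with a generic block-peak (syllabic-style) compander wrapped around a fixed word-length stage. This is a swapped-in illustration, not the thesis's MPEG companding DSP: the compressor divides each block by its peak, the "processing" is only requantization to an assumed 8-bit datapath, and the expander multiplies the gain back in. Block size, bit width, and the test signal are illustrative assumptions.

```python
import math

def quantize(x, bits):
    """Uniform quantizer on [-1, 1] with 2**(bits - 1) - 1 steps per polarity."""
    steps = 2 ** (bits - 1) - 1
    return round(max(-1.0, min(1.0, x)) * steps) / steps

def plain_path(signal, bits):
    return [quantize(x, bits) for x in signal]

def companded_path(signal, bits, block=64):
    """Compress each block to full scale, quantize (the 'processing' stage), then expand."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        gain = max(abs(x) for x in chunk) or 1.0   # block-peak envelope; avoids divide-by-zero
        out += [quantize(x / gain, bits) * gain for x in chunk]
    return out

def snr_db(ref, test):
    signal_power = sum(x * x for x in ref)
    noise_power = sum((x - y) ** 2 for x, y in zip(ref, test))
    return float("inf") if noise_power == 0 else 10 * math.log10(signal_power / noise_power)

# A quiet input (1% of full scale) through an 8-bit datapath.
quiet = [0.01 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
print(f"plain:     {snr_db(quiet, plain_path(quiet, 8)):.1f} dB")
print(f"companded: {snr_db(quiet, companded_path(quiet, 8)):.1f} dB")
```

Without companding, the quiet signal spans only about one quantization step and is badly distorted; with it, every block uses the full word length, so the signal-to-noise ratio stays roughly independent of input amplitude.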
Platforms for handling and development of audiovisual data
Internship carried out at MOG Solutions and supervised by Vítor Teixeira. Integrated master's thesis, Informatics and Computing Engineering, Faculty of Engineering, University of Porto. 200
Media gateway using a GPU
Master's in Computer and Telematics Engineering
Memristor-based design solutions for mitigating parametric variations in IoT applications
PhD Thesis. Rapid advancement of the Internet of Things (IoT) is predicated on two important factors
of electronic technology, namely device size and energy efficiency. With smaller
size comes the problem of process, voltage and temperature (PVT) variations in delays,
which are key operational parameters of devices. Parametric variability is also
an obstacle to allowing devices to work in systems with unpredictable
power sources, such as those powered by energy harvesters. Designers tackle these
problems holistically by developing new techniques such as asynchronous logic, where
mechanisms such as matching delays are widely used to adapt to delay variations. To
mitigate energy-efficiency and power-interruption issues, the matching delays should
ideally be retained in non-volatile storage. Meanwhile, a resistive memory device called
the memristor has become a promising component for power-restricted applications owing to
its inherent non-volatility. While providing non-volatility, the use of memristors in delay
matching incurs some power overhead. This creates the first challenge in
introducing memristors into IoT devices for delay matching.
Another important factor affecting the use of memristors in IoT devices is the
dependence of memristance on temperature. For example, a memristance
decoder used in memristor-based components must be able to correct the read data
without incurring significant overhead on the overall system. This creates the second
challenge: overcoming the temperature effect in the memristance decoding process.
In this research, we propose methods for improving the PVT tolerance and energy
characteristics of IoT devices from the perspective of the two main challenges above:
(i) utilising memristors to enhance the energy efficiency of the delay element (DE), and
(ii) improving the temperature awareness and energy robustness of the memristance
decoder.
For the memristor-based delay element (MemDE), we placed a memristor between two
inverters to vary the path resistance, which determines the RC delay. This saves power
owing to the small number of switching components and the absence of external delay
storage. We also investigate a solution for avoiding unintended tuning (UT) and a
timing model to estimate the proper pulse width for memristance tuning. Simulation
results based on UMC 180nm technology and the VTEAM model show that the MemDE can
provide delays between 0.55ns and 1.44ns, comparable to the 4-bit multiplexer-based
delay element (MuxDE) in the same technology, while consuming one-thirteenth of the
power. The key contribution within (i) is the development of a low-power MemDE to
mitigate the timing mismatch caused by PVT variations.
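The MemDE principle (memristance sets the RC delay, and a tuning pulse of suitable width sets the memristance) can be sketched numerically. This is a minimal model under stated assumptions: the VTEAM state equation with an ideal (flat) window and linear resistance, made-up parameter values, and a first-order ln(2)·RC delay for the node driven through the memristor. It does not reproduce the thesis's UMC 180nm results or its unintended-tuning solution; it only shows that sub-threshold voltages leave the state unchanged, the property such a solution has to protect.

```python
import math

# Illustrative VTEAM parameters (hypothetical, not the thesis's fitted values).
R_ON, R_OFF = 1e3, 10e3              # memristance bounds (ohm)
W_ON, W_OFF = 0.0, 3e-9              # state-variable bounds (m)
K_OFF, K_ON = 2e-3, -2e-3            # state-change rate constants (m/s)
V_OFF, V_ON = 0.3, -0.3              # switching thresholds (V)
ALPHA_OFF = ALPHA_ON = 3

def dw_dt(v, w):
    """VTEAM state derivative with an ideal (flat) window, so `w` is unused."""
    if v > V_OFF:
        return K_OFF * (v / V_OFF - 1) ** ALPHA_OFF
    if v < V_ON:
        return K_ON * (v / V_ON - 1) ** ALPHA_ON
    return 0.0                        # sub-threshold: state is retained

def memristance(w):
    """Linear VTEAM resistance between R_ON and R_OFF."""
    return R_ON + (R_OFF - R_ON) * (w - W_ON) / (W_OFF - W_ON)

def apply_pulse(w, v, width, dt=1e-10):
    """Forward-Euler integration of a rectangular tuning pulse of amplitude v."""
    for _ in range(int(round(width / dt))):
        w = min(W_OFF, max(W_ON, w + dw_dt(v, w) * dt))
    return w

def pulse_width_for(w_from, w_to, v):
    """Closed-form width; valid only for the flat window (constant dw/dt) and |v| above threshold."""
    return (w_to - w_from) / dw_dt(v, w_from)

def rc_delay(r_mem, r_drive=2e3, c_load=50e-15):
    """First-order 50% delay, ln(2) * R * C, of a node charged through driver plus memristor."""
    return math.log(2) * (r_drive + r_mem) * c_load

w = W_ON
width = pulse_width_for(w, W_ON + 0.5 * (W_OFF - W_ON), v=1.0)   # aim for mid-range memristance
w = apply_pulse(w, 1.0, width)
print(f"pulse {width * 1e9:.1f} ns -> R_M = {memristance(w):.0f} ohm, "
      f"delay = {rc_delay(memristance(w)) * 1e12:.0f} ps")
print(apply_pulse(w, 0.2, 1e-6) == w)   # 0.2 V stays below V_OFF: no unintended tuning
```

With a realistic (non-flat) window, dw/dt depends on the state and the closed-form pulse width no longer holds, which is one reason a dedicated timing model for the pulse width is needed.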
To estimate the temperature effect on memristance, we develop an empirical temperature
model which fits both titanium dioxide and silver chalcogenide memristors. The
temperature experiments are conducted using the latter device, and the results confirm
the validity of the proposed model with an accuracy of R-squared > 88%. The memristance
decoder is designed to deliver two key advantages. Firstly, the temperature model is
integrated into the VTEAM model to enable temperature compensation. Secondly, the decoder
supports resolution scalability to match the energy budget. Simulation results for the
2-bit decoder based on UMC 65nm technology show that its energy can be varied between
49fJ and 98fJ. This is the second major contribution, addressing challenge (ii).
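The two decoder features (temperature compensation and scalable resolution) can be sketched as follows. The thesis's empirical temperature model is not given here, so a linear temperature coefficient is used as a hypothetical placeholder, and the memristance range, coefficient, and bin layout are illustrative assumptions. The `bits` argument stands in for the resolution knob; the energy side of the trade-off is not modeled.

```python
# Placeholder linear temperature model (hypothetical, not the thesis's empirical model):
#     R(T) = R(T_REF) * (1 + ALPHA * (T - T_REF))
T_REF = 25.0                  # reference temperature (degC)
ALPHA = -2e-3                 # illustrative temperature coefficient (1/degC)
R_MIN, R_MAX = 1e3, 10e3      # programmable memristance range (ohm)

def compensate(r_read, temp_c):
    """Refer a raw reading taken at temp_c back to T_REF."""
    return r_read / (1 + ALPHA * (temp_c - T_REF))

def decode(r_read, temp_c, bits=2):
    """Map a memristance reading to a `bits`-wide code using 2**bits equal-width bins."""
    r = compensate(r_read, temp_c)
    levels = 2 ** bits
    frac = (r - R_MIN) / (R_MAX - R_MIN)
    return min(levels - 1, max(0, int(frac * levels)))

stored = 6062.5                                   # center of code 4 at 3 bits, at T_REF
raw_hot = stored * (1 + ALPHA * (85.0 - T_REF))   # what the reader would see at 85 degC
print(decode(raw_hot, 85.0, bits=3))              # compensated
print(decode(raw_hot, T_REF, bits=3))             # uncompensated (pretends it is still 25 degC)
print(decode(raw_hot, 85.0, bits=2))
```

Finer bins are more sensitive to temperature drift, so the need for compensation grows with the chosen resolution.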
This thesis also outlines future research directions toward an in-depth study of memristive
electronics as a variation-robust, energy-efficient design paradigm and its impact on
developing future IoT applications. Sponsored by the Royal Thai Government.
Language and compiler support for stream programs
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 153-166). Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling in a distributed environment. We demonstrate the feasibility of developing applications in StreamIt via a detailed characterization of our 34,000-line benchmark suite, which spans from MPEG-2 encoding/decoding to GMTI radar processing. We also present a novel dynamic analysis for migrating legacy C programs into a streaming representation. The central premise of stream programming is that it enables the compiler to perform powerful optimizations. We support this premise by presenting a suite of new transformations.
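Because every actor's input and output rates are known at compile time, a compiler can solve the synchronous dataflow balance equations to find a steady-state schedule, one of the properties that makes aggressive transformations possible. The minimal sketch below does this for a plain pipeline only (no splitjoins or feedback loops); the function name and the example rates are hypothetical, and this is the textbook SDF computation rather than StreamIt's compiler code.

```python
import math
from fractions import Fraction

def repetitions(rates):
    """Minimal steady-state firing counts for a pipeline of SDF actors.

    rates: (pop, push) per actor, in pipeline order (a source pops 0, a sink pushes 0).
    Each channel must balance: reps[i] * push[i] == reps[i + 1] * pop[i + 1].
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        reps.append(reps[-1] * push / pop)
    scale = math.lcm(*(r.denominator for r in reps))   # clear the fractions
    ints = [int(r * scale) for r in reps]
    g = math.gcd(*ints)                                 # smallest integer solution
    return [n // g for n in ints]

# Hypothetical pipeline: source pushes 3, filter pops 2 / pushes 1, sink pops 3.
pipeline = [(0, 3), (2, 1), (3, 0)]
print(repetitions(pipeline))  # [2, 3, 1]
```

Firing the source twice, the filter three times, and the sink once returns every channel to its initial occupancy, so the compiler can repeat this schedule indefinitely with statically sized buffers.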
We describe the first translation of stream programs into the compressed domain, enabling programs written for uncompressed data formats to automatically operate directly on compressed data formats (based on LZ77). This technique offers a median speedup of 15x on common video editing operations. We also review other optimizations developed in the StreamIt group, including automatic parallelization (offering an 11x mean speedup on the 16-core Raw machine), optimization of linear computations (offering a 5.5x average speedup on a Pentium 4), and cache-aware scheduling (offering a 3.5x mean speedup on a StrongARM 1100). While these transformations are beyond the reach of compilers for traditional languages such as C, they become tractable given the abundant parallelism and regular communication patterns exposed by the stream programming model. By William Thies. Ph.D.
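The compressed-domain idea can be illustrated for its simplest case: a stateless, element-wise operation applied to an LZ77-style token stream. The token format below is an assumption (not StreamIt's or any codec's actual format), and the thesis's technique covers far more general stream programs. The key observation is that back-references copy earlier output, which has already been transformed, so only literals need to be processed.

```python
def decompress(tokens):
    """Expand ('lit', value) and ('ref', distance, length) tokens."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):          # element by element, so overlapping copies work
                out.append(out[-dist])
    return out

def map_compressed(f, tokens):
    """Apply a stateless element-wise f directly to the compressed stream.

    Back-references copy earlier *output*, which is already f-transformed,
    so only literals are touched and references pass through unchanged.
    """
    return [("lit", f(t[1])) if t[0] == "lit" else t for t in tokens]

def brighten(v):
    """Hypothetical per-pixel operation: add 100, saturating at 255."""
    return min(255, v + 100)

tokens = [("lit", 10), ("lit", 20), ("ref", 2, 5), ("lit", 30)]
fast = decompress(map_compressed(brighten, tokens))
slow = [brighten(v) for v in decompress(tokens)]
print(fast == slow, fast)
```

The work done by `map_compressed` scales with the number of tokens rather than the number of decompressed elements, which is presumably where much of the speedup on highly repetitive data such as video comes from.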
Intelligent Sensor Networks
In the last decade, wireless and wired sensor networks have attracted much attention. However, most designs target general sensor network issues, including the protocol stack (routing, MAC, etc.) and security. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.