4 research outputs found

    Design of application-specific instruction set processors with asynchronous methodology for embedded digital signal processing applications.

    Get PDF
    Kwok Yan-lun Andy.Thesis submitted in: November 2004.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 133-137).Abstracts in English and Chinese.Abstract --- p.i摘芁 --- p.iiAcknowledgements --- p.iiiList of Figures --- p.viiList of Tables and Examples --- p.xChapter 1. --- Introduction --- p.1Chapter 1.1. --- Motivation --- p.1Chapter 1.2. --- Objective and Approach --- p.4Chapter 1.3. --- Thesis Organization --- p.5Chapter 2. --- Related Work --- p.7Chapter 2.1. --- Coverage --- p.7Chapter 2.2. --- ASIP Design Methodologies --- p.8Chapter 2.3. --- Asynchronous Technology on Processors --- p.12Chapter 2.4. --- Summary --- p.14Chapter 3. --- Asynchronous Design Methodology --- p.15Chapter 3.1. --- Overview --- p.15Chapter 3.2. --- Asynchronous Design Style --- p.17Chapter 3.2.1. --- Micropipelines --- p.17Chapter 3.2.2. --- Fine-grain Pipelining --- p.20Chapter 3.2.3. --- Globally-Asynchronous Locally-Synchronous (GALS) Design --- p.22Chapter 3.3. --- Advantages of GALS in ASIP Design --- p.27Chapter 3.3.1. --- Reuse of Synchronous and Asynchronous IP --- p.27Chapter 3.3.2. --- Fine Tuning of Performance and Power Consumption --- p.27Chapter 3.3.3. --- Synthesis-based Design Flow --- p.28Chapter 3.4. --- Design of GALS Asynchronous Wrapper --- p.28Chapter 3.4.1. --- Handshake Protocol --- p.28Chapter 3.4.2. --- Pausible Clock Generator --- p.29Chapter 3.4.3. --- Port Controllers --- p.30Chapter 3.4.4. --- Performance of the Asynchronous Wrapper --- p.33Chapter 3.5. --- Summary --- p.35Chapter 4. --- Platform Based ASIP Design Methodology --- p.36Chapter 4.1. --- Platform Based Approach --- p.36Chapter 4.1.1. --- The Definition of Our Platform --- p.37Chapter 4.1.2. --- The Definition of the Platform Based Design --- p.37Chapter 4.2. --- Platform Architecture --- p.38Chapter 4.2.1. --- The Nature of DSP Algorithms --- p.38Chapter 4.2.2. --- Design Space of Datapath Optimization --- p.46Chapter 4.2.3. --- Proposed Architecture --- p.49Chapter 4.2.4. --- The Strategy of Realizing an Optimized Datapath --- p.51Chapter 4.2.5. --- Pipeline Organization --- p.59Chapter 4.2.6. --- GALS Partitioning --- p.61Chapter 4.2.7. --- Operation Mechanism --- p.63Chapter 4.3. --- Overall Design Flow --- p.67Chapter 4.4. --- Summary --- p.70Chapter 5. --- Design of the ASIP Platform --- p.72Chapter 5.1. --- Design Goal --- p.72Chapter 5.2. --- Instruction Fetch --- p.74Chapter 5.2.1. --- Instruction fetch unit --- p.74Chapter 5.2.2. --- Zero-overhead loops and Subroutines --- p.75Chapter 5.3. --- Instruction Decode --- p.77Chapter 5.3.1. --- Instruction decoder --- p.77Chapter 5.3.2. --- The Encoding of Parallel and Complex Instructions --- p.80Chapter 5.4. --- Datapath --- p.81Chapter 5.4.1. --- Base Functional Units --- p.81Chapter 5.4.2. --- Functional Unit Wrapper Interface --- p.83Chapter 5.5. --- Register File Systems --- p.84Chapter 5.5.1. --- Memory Hierarchy --- p.84Chapter 5.5.2. --- Register File Organization --- p.85Chapter 5.5.3. --- Address Generation --- p.93Chapter 5.5.4. --- Load and Store --- p.98Chapter 5.6. --- Design Verification --- p.100Chapter 5.7. --- Summary --- p.104Chapter 6. --- Case Studies --- p.105Chapter 6.1. --- Objective --- p.105Chapter 6.2. --- Approach --- p.105Chapter 6.3. --- Based versus Optimized --- p.106Chapter 6.3.1. --- Matrix Manipulation --- p.106Chapter 6.3.2. --- Autocorrelation --- p.109Chapter 6.3.3. --- CORDIC --- p.110Chapter 6.4. --- Optimized versus Advanced Commercial DSPs --- p.113Chapter 6.4.1. --- Introduction to TMS320C62x and SC140 --- p.113Chapter 6.4.2. --- Results --- p.115Chapter 6.5. --- Summary --- p.116Chapter 7. --- Conclusion --- p.118Chapter 7.1. --- When ASIPs encounter asynchronous --- p.118Chapter 7.2. --- Contributions --- p.120Chapter 7.3. --- Future Directions --- p.121Chapter A --- Synthesis of Extended Burst-Mode Asynchronous Finite State Machine --- p.122Chapter B --- Base Instruction Set --- p.124Chapter C --- Special Registers --- p.127Chapter D --- Synthesizable Model of GALS Wrapper --- p.130Reference --- p.13

    Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies

    Get PDF
    Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns

    Solutions and application areas of flip-flop metastability

    Get PDF
    PhD ThesisThe state space of every continuous multi-stable system is bound to contain one or more metastable regions where the net attraction to the stable states can be infinitely-small. Flip-flops are among these systems and can take an unbounded amount of time to decide which logic state to settle to once they become metastable. This problematic behavior is often prevented by placing the setup and hold time conditions on the flip-flop’s input. However, in applications such as clock domain crossing where these constraints cannot be placed flip-flops can become metastable and induce catastrophic failures. These events are fundamentally impossible to prevent but their probability can be significantly reduced by employing synchronizer circuits. The latter grant flip-flops longer decision time at the expense of introducing latency in processing the synchronized input. This thesis presents a collection of research work involving the phenomenon of flip-flop metastability in digital systems. The main contributions include three novel solutions for the problem of synchronization. Two of these solutions are speculative methods that rely on duplicate state machines to pre-compute data-dependent states ahead of the completion of synchronization. Speculation is a core theme of this thesis and is investigated in terms of its functional correctness, cost efficacy and fitness for being automated by electronic design automation tools. It is shown that speculation can outperform conventional synchronization solutions in practical terms and is a viable option for future technologies. The third solution attempts to address the problem of synchronization in the more-specific context of variable supply voltages. Finally, the thesis also identifies a novel application of metastability as a means of quantifying intra-chip physical parameters. A digital sensor is proposed based on the sensitivity of metastable flip-flops to changes in their environmental parameters and is shown to have better precision while being more compact than conventional digital sensors

    Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip

    Get PDF
    In this work concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, as for example fine-grained ultra-fast dynamic voltage and frequency scaling (DVFS). To enable this functionality compact clock generators with low chip area, low power consumption, wide output frequency range and the capability for ultra-fast frequency changes are required. They are to be instantiated individually per core. For this purpose compact all digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65nm CMOS ADPLL is implemented, featuring a novel active current bias circuit which compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO) for reduced digital tuning effort. Additionally, a 28nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed. The core clock is generated by an open-loop method using phase-switching between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed loop ADPLL. The sensitivity of the open-loop clock generator with respect to phase mismatch is analyzed analytically and a compensation technique by cross-coupled inverter buffers is proposed. The clock generators show small area (0.0097mm2 (65nm), 0.00234mm2 (28nm)), low power consumption (2.7mW (65nm), 0.64mW (28nm)) and they provide core clock frequencies from 83MHz to 666MHz which can be changed instantaneously. The jitter performance is compliant to DDR2/DDR3 memory interface specifications. Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully by 3 testchip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.In dieser Arbeit werden Konzepte und Schaltungen zur lokalen Takterzeugung in heterogenen Multiprozessorsystemen (MPSoCs) mit geringer Verlustleistung erforscht und entwickelt. Diese Systeme besitzen eine global-asynchrone lokal-synchrone Architektur sowie FunktionalitĂ€t zum Power Management, wie z.B. das feingranulare, schnelle Skalieren von Spannung und Taktfrequenz (DVFS). Um diese FunktionalitĂ€t zu realisieren werden kompakte Taktgeneratoren benötigt, welche eine kleine ChipflĂ€che einnehmen, wenig Verlustleitung aufnehmen, einen weiten Bereich an Ausgangsfrequenzen erzeugen und diese sehr schnell Ă€ndern können. Sie sollen individuell pro Prozessorkern integriert werden. Dazu werden kompakte volldigitale Phasenregelkreise (ADPLLs) entwickelt, wobei eine bang-bang ADPLL Architektur numerisch modelliert und fĂŒr kleine Jitterakkumulation optimiert wird. Es wird eine 65nm CMOS ADPLL implementiert, welche eine neuartige Kompensationsschlatung fĂŒr den digital gesteuerten Oszillator (DCO) zur Verringerung der SensitivitĂ€t bezĂŒglich Versorgungsspannung und Temperatur beinhaltet. ZusĂ€tzlich wird eine 28nm CMOS ADPLL mit einer neuen Technik zum schnellen Einschwingen unter Nutzung eines Phasensynchronisierers realisiert. Der Prozessortakt wird durch ein neuartiges Phasenmultiplex- und Frequenzteilerverfahren erzeugt, welches es ermöglicht die Taktfrequenz sofort zu Ă€ndern um schnelles DVFS zu realisieren. Die SensitivitĂ€t dieses Frequenzgenerators bezĂŒglich Phasen-Mismatch wird theoretisch analysiert und durch Verwendung von kreuzgekoppelten TaktverstĂ€rkern kompensiert. Die hier entwickelten Taktgeneratoren haben eine kleine ChipflĂ€che (0.0097mm2 (65nm), 0.00234mm2 (28nm)) und Leistungsaufnahme (2.7mW (65nm), 0.64mW (28nm)). Sie stellen Frequenzen von 83MHz bis 666MHz bereit, welche sofort geĂ€ndert werden können. Die Schaltungen erfĂŒllen die Jitterspezifikationen von DDR2/DDR3 Speicherinterfaces. ZusĂ€tzliche können schnelle Takte fĂŒr neuartige serielle on-Chip Verbindungen erzeugt werden. Die ADPLL Schaltungen wurden erfolgreich in 3 Testchips erprobt. Sie ermöglichen die effiziente Realisierung von zukĂŒnftigen MPSoCs mit Power Management in modernsten CMOS Technologien
    corecore