19 research outputs found

    Current Sensing Completion Detection in Single-Rail Asynchronous Systems

    In this article, an alternative approach to detecting the completion of computation in combinational blocks of asynchronous digital systems is presented. The proposed methodology is based on a well-known phenomenon of digital systems fabricated in CMOS technology: such logic circuits draw significantly more current during signal transitions than in the idle state. The duration of these current peaks correlates very well with the actual computation time of the combinational block, so this fact can be exploited to separate computation activity from the static state. The paper presents the fundamental background of this alternative completion detection and its implementation in single-rail encoded asynchronous systems, the proposed current-sensing circuitry, simulation results, and a comparison to state-of-the-art completion-detection methods. The presented method promises to enhance the performance of an asynchronous circuit and, under certain circumstances, also to reduce the silicon area requirements of the completion-detection block.
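The detection principle can be sketched behaviorally: monitor the supply current and declare completion once it has settled below a quiescent threshold for long enough. A minimal sketch in Python, where the trace, threshold, and hold time are illustrative values rather than figures from the article:

```python
def completion_time(current_trace, threshold, hold_samples):
    # Index at which the supply current has stayed below `threshold`
    # for `hold_samples` consecutive samples, i.e. where a
    # current-sensing completion detector would assert "done".
    quiet = 0
    for i, i_dd in enumerate(current_trace):
        quiet = quiet + 1 if i_dd < threshold else 0
        if quiet >= hold_samples:
            return i
    return None  # block never settled within the trace

# Sampled supply current: a switching burst followed by leakage-level idle.
trace = [0.10, 0.90, 1.20, 0.80, 0.30, 0.05, 0.04, 0.05, 0.04, 0.04]
```

With a 0.2 threshold and a three-sample hold, completion is flagged at sample 7, shortly after the burst dies out; the hold time is what keeps brief dips inside the burst from being mistaken for completion.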

    Self-timed field programmable gate array architectures


    Supply Voltage Control Considering Timing Errors in Synchronous Circuits

    Master's thesis (M.S.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Kiyoung Choi. Modern embedded systems are becoming increasingly constrained by power consumption. While we require these systems to compute ever more data at higher speed, lowering energy consumption is essential to preserve battery life as well as the integrity of devices. Among the many techniques for reducing chip power consumption, such as power gating and clock gating, lowering the supply voltage (possibly reducing the chip's frequency) is known to be the most effective. However, lowering the supply voltage too far, down to near the threshold voltage of the transistors, causes the logic delay to vary exponentially with intrinsic and extrinsic variations (process variations, temperature, aging, etc.) and thus forces the designer to set an increased timing margin. This thesis proposes a technique for automatically adjusting the supply voltage to match the speed of a logic block to a given timing constraint. Depending on process and temperature variations, our technique chooses the minimum supply voltage that satisfies the timing constraint defined by the designer. This allows the designer to reduce the default supply voltage of the logic block and thus save power.
In our experiments at the 28/32nm technology node, we reduced logic-block power by 52% on average by varying the supply voltage between 0.55V and 1V, against a nominal supply voltage of 1.05V.
Contents: 1. Introduction; 2. Background (Near-Threshold Computing; Current Sensing Completion Detection); 3. Proposed Approach; 4. Experimental Setup (Intrinsic Variations; Extrinsic Variations; Control Block; Logic Block; Experimental Parameters); 5. Experimental Results (results at the TT, FF, and SS corners; Effect of Temperature; Final Power Savings); 6. Conclusion and Future Work.
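The selection rule described in the abstract (pick the lowest supply voltage whose logic delay still meets the designer's timing constraint) can be sketched as follows. The delay model and candidate voltage grid are hypothetical stand-ins for on-chip delay measurement:

```python
def pick_min_vdd(delay_at, vdd_options, t_constraint):
    # Try candidate supply voltages from lowest to highest and return
    # the first one whose measured delay meets the timing constraint.
    for vdd in sorted(vdd_options):
        if delay_at(vdd) <= t_constraint:
            return vdd
    raise ValueError("no candidate voltage meets the timing constraint")

# Toy alpha-power-law-style delay model (illustrative only):
# delay grows sharply as Vdd approaches the threshold voltage Vth.
def toy_delay(vdd, vth=0.3, k=1.0):
    return k * vdd / (vdd - vth) ** 2
```

For example, with a constraint of 3.0 (arbitrary units) over candidates {0.55, 0.70, 0.85, 1.00} V, the toy model selects 0.85 V; a slower process corner (modeled here as a larger k) pushes the choice to a higher voltage, mirroring the corner-dependent behavior the thesis measures.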

    Asynchronous memory design.

    by Vincent Wing-Yun Sit. Thesis submitted June 1997. Thesis (M.Phil.), Chinese University of Hong Kong, 1998. Includes bibliographical references. Abstract also in Chinese.
Contents: 1. Introduction (asynchronous design: advantages, methodologies, system characteristics; asynchronous memory: motivation and definition; proposed memory design: control interface, overview, handshake control protocol); 2. Theory (variable bit-line load; current sensing completion detection with general and CMOS LSD current sensors; voltage sensing completion detection; multiple-delays completion generation); 3. Implementation (1M-bit SRAM framework; control circuit and read/write state transition graphs; bit-line segmentation and memory cell; current sensing, voltage sensing, and multiple-delays completion circuits for one-bit and eight-bit data buses); 4. Simulation (environment, memory timing specifications, and bit-line load determination; benchmark simulation; results for the three completion schemes); 5. Testing (test chip design: block diagram, schematic, layout; HSPICE post-layout simulation; logic and timing measurements); 6. Discussion (results and resource-consumption comparison of the three techniques; bit-line segmentation; applications; further developments: two-phase HCP interface, data bus expansion, speed optimization, modified write completion); 7. Conclusion; 8. References; 9. Appendix (HSPICE simulation parameters for typical, fast, and slow conditions; SRAM cell layout and netlist; test chip specifications, pin assignment, timing diagrams, schematics, layouts, and microphotograph).

    An ICT image processing chip based on fast computation algorithm and self-timed circuit technique.

    by Pang Tin-Chak Johnson. Thesis (M.Phil.), Chinese University of Hong Kong, 1997. Includes bibliographical references.
Contents: 1. Introduction (asynchronous systems: motivation, hazards, circuit classes; transform coding); 2. Asynchronous Design Methodologies (self-timed systems; DCVSL gates and handshake control; micropipelines: asymmetrical delays, variable delay and delay value selection; DCVSL vs. micropipeline comparison); 3. Self-timed Multipliers (bit-serial matrix multiplier in DCVSL and micropipeline styles; modified Booth's multiplier; three test chips); 4. Current-Sensing Completion Detection (current sensor: constant current source, current mirror, current comparator; self-timed logic using CSCD; test chips and results); 5. Self-timed ICT Processor Architecture (hardware and speed comparison of a general-purpose DSP and micropipeline designs with and without the fast algorithm); 6. Implementation of the Self-timed ICT Processor (first-version 2-D processor with self-timed transpose memory; final 1-D processor with fast algorithm: I/O buffers and control, handshake control unit, integer execution unit, program memory and instruction decoder; layout and specifications); 7. Testing (pin assignment; simulation; functional and transient tests; speed, power, and optimum delay control voltage; delay elements and logic cells); 8. Conclusions.

    Core interface optimization for multi-core neuromorphic processors

    Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency and cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks or take up significant hardware resources to implement large ones. Realizing large-scale, scalable SNNs requires an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it is the Power-Performance-Area (PPA) bottleneck, especially in its arbitration architecture and routing memory. In this paper, we present an arbitration mechanism, with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces latency by more than 70% in sparse-event mode compared to state-of-the-art arbitration architectures, at lower area cost. The routing memory uses an asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput over conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. In addition, because they radically reduce the core-interface resources in multi-core neuromorphic processors, the proposed arbitration and CAM architectures can also be applied to a wide range of general asynchronous circuits and systems.
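The hierarchical arbiter-tree idea can be illustrated with a toy functional model: each tree node acts as a 2-input arbiter that forwards one pending request upward, so a single winner emerges after O(log N) stages. This sketch breaks ties toward the lower index, whereas a real mutex element resolves them nondeterministically:

```python
def arbitrate(requests):
    # Leaves carry the index of each active requester (None if idle).
    level = [i if r else None for i, r in enumerate(requests)]
    # Each round of pairing is one stage of 2-input arbiter nodes.
    while len(level) > 1:
        nxt = [a if a is not None else b
               for a, b in zip(level[::2], level[1::2])]
        if len(level) % 2:          # odd element passes through unpaired
            nxt.append(level[-1])
        level = nxt
    return level[0]                 # granted index, or None if all idle
```

In sparse-event mode most leaves are idle, so the lone request meets no competition at any node; the latency advantage reported in the paper comes from circuit-level pipelining of these stages, which this behavioral model does not capture.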

    Practical advances in asynchronous design

    Journal Article. Recent practical advances in asynchronous circuit and system design have resulted in renewed interest among circuit designers. Asynchronous systems are being viewed as an increasingly viable alternative to globally synchronous system organization. This tutorial presents the current state of the art in asynchronous circuit and system design in three areas. The first section details asynchronous control systems. The second describes a variety of approaches to asynchronous datapaths. The third covers asynchronous and self-timed circuits applied to the design of general-purpose processors.

    An integrated soft- and hard-programmable multithreaded architecture


    Dynamically reconfigurable asynchronous processor

    The main design requirements for today's mobile applications are high throughput, high energy efficiency, and high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible in these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with NREs and design times escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high-throughput applications when power and area are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration, making these architectures particularly suitable for stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However, the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements, and the clock tree takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture but uses asynchronous design techniques: methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design while also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption than a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP reduced power consumption by 20 times compared to the ARM7 processor and 29 times compared to the TI C64x VLIW, when running the same benchmark capped to the same throughput on the same process technology (0.13μm). Compared to an equivalent RICA design, DRAP was up to 22% larger but reduced power by up to 1.9 times, and it achieved up to 2.8 times higher throughputs than RICA on the same benchmarks.
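The per-cycle configuration idea behind RICA can be pictured with a toy model: instead of issuing one instruction at a time, each configuration wires up a small dataflow graph of instruction cells that is evaluated as a unit. Everything here (the cell operations, the configuration format, the names) is illustrative, not the actual RICA netlist format:

```python
OPS = {"add": lambda a, b: a + b,
       "mul": lambda a, b: a * b,
       "sub": lambda a, b: a - b}

def run_config(config, inputs):
    # One "step": evaluate every instruction cell in the configured
    # dataflow order, as if the whole datapath fired in a single cycle.
    vals = dict(inputs)
    for dst, op, a, b in config:
        vals[dst] = OPS[op](vals[a], vals[b])
    return vals

# Datapath for (x + y) * x rendered as a single configuration.
step = [("t0", "add", "x", "y"),
        ("out", "mul", "t0", "x")]
```

A stream-processing kernel would reuse one such configuration across many input samples, which is why amortizing the configuration over a whole datapath, rather than per instruction, suits these workloads.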