19 research outputs found
Current Sensing Completion Detection in Single-Rail Asynchronous Systems
In this article, an alternative approach to detecting the completion of computation in the combinatorial blocks of asynchronous digital systems is presented. The proposed methodology is based on a well-known phenomenon in digital systems fabricated in CMOS technology: such logic circuits draw significantly more current during signal transitions than in the idle state. The duration of these current peaks correlates closely with the actual computation time of the combinatorial block, so this effect can be exploited to separate computation activity from the static state. The paper presents the fundamental background of this alternative completion detection and its implementation in single-rail encoded asynchronous systems, the proposed current-sensing circuitry, the achieved simulation results, and a comparison with state-of-the-art completion detection methods. The presented method promises to enhance the performance of an asynchronous circuit and, under certain circumstances, also reduces the silicon area required by the completion detection block.
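The principle in the abstract above (current peaks while switching, near-zero quiescent current, so a sustained quiet period signals completion) can be illustrated with a toy numerical sketch. This is not the paper's circuit: the trace values, threshold, and hold window below are invented purely for illustration.

```python
# Toy model of current-sensing completion detection (CSCD).
# A combinational block draws current peaks while switching and
# near-zero current when idle, so "done" is asserted once the
# supply current stays below a threshold for a short hold window.

def detect_completion(current_trace, threshold=0.05, hold_samples=5):
    """Return the sample index at which computation is deemed complete,
    i.e. the start of the first run of `hold_samples` consecutive
    samples below `threshold`. Returns None if the trace never goes quiet."""
    quiet = 0
    for i, i_dd in enumerate(current_trace):
        quiet = quiet + 1 if i_dd < threshold else 0
        if quiet >= hold_samples:
            return i - hold_samples + 1  # first sample of the quiet window
    return None

# Synthetic supply-current trace: switching bursts, then idle.
trace = [0.9, 0.7, 0.4, 0.8, 0.6, 0.3, 0.1, 0.02, 0.01, 0.02,
         0.01, 0.02, 0.01, 0.01, 0.01]
print(detect_completion(trace))  # -> 7
```

A real sensor performs this thresholding in analog circuitry on the supply rail; the hold window plays the role of the sensor's response/deassertion delay.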
Supply Voltage Control Considering Timing Errors in Synchronous Circuits
Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Kiyoung Choi. Modern embedded systems are becoming more and more constrained by power consumption. While we require these systems to process ever more data at higher speeds, lowering energy consumption is essential to preserve battery life as well as the integrity of the devices.
Amongst the many techniques for reducing chip power consumption, such as power gating and clock gating, lowering the supply voltage (possibly together with the clock frequency) is known to be the most effective. However, lowering the supply voltage too far, down to near the threshold voltage of the transistors, causes the logic delay to vary exponentially with intrinsic and extrinsic variations (process variations, temperature, aging, etc.) and thus forces the designer to set increased timing margins.
This thesis proposes a technique for automatically adjusting the supply voltage to match the speed of a logic block to a given timing constraint. Depending on process and temperature variations, our technique chooses the minimum supply voltage that satisfies the timing constraint defined by the designer. This allows the designer to reduce the default supply voltage of the logic block and thus save power. In our experiments at the 28/32nm technology node, we reduced the logic block power by 52% on average by varying the supply voltage between 0.55V and 1V, where the nominal supply voltage is 1.05V.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Near-Threshold Computing
2.2 Current Sensing Completion Detection
Chapter 3 Proposed Approach
Chapter 4 Experimental Setup
4.1 Intrinsic Variations
4.2 Extrinsic Variations
4.3 Control Block
4.4 Logic Block
4.5 Experimental Parameters
Chapter 5 Experimental Results
5.1 Results at the TT Corner
5.2 Results at the FF Corner
5.3 Results at the SS Corner
5.4 Effect of Temperature
5.5 Final Power Savings
Chapter 6 Conclusion and Future Work
Bibliography
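The adaptive supply-voltage scheme summarized in the thesis abstract above can be sketched as a simple search: pick the lowest voltage whose worst-case delay still meets the designer's constraint. The delay model, voltage grid, and corner factors below are illustrative assumptions, not the thesis's measured data.

```python
# Hedged sketch of adaptive supply-voltage selection: choose the
# minimum Vdd (from a discrete grid) whose logic delay still meets
# the timing constraint.  The delay model is illustrative only:
# delay grows roughly exponentially as Vdd approaches the threshold
# voltage Vth, mimicking near-threshold behaviour, and
# `corner_factor` stands in for process/temperature variation.
import math

def logic_delay(vdd, vth=0.35, d0=1.0, corner_factor=1.0):
    """Toy near-threshold delay model (arbitrary time units)."""
    return corner_factor * d0 * math.exp(1.0 / (vdd - vth))

def min_supply_voltage(constraint, corner_factor=1.0, vdd_grid=None):
    """Lowest Vdd on the grid that satisfies the delay constraint."""
    if vdd_grid is None:
        vdd_grid = [round(0.55 + 0.05 * k, 2) for k in range(10)]  # 0.55..1.00 V
    for vdd in vdd_grid:  # ascending, so the first hit is the minimum
        if logic_delay(vdd, corner_factor=corner_factor) <= constraint:
            return vdd
    return None  # constraint unreachable even at the highest Vdd

# A slow (SS-like) corner needs a higher Vdd than a typical corner
# for the same timing constraint.
print(min_supply_voltage(20.0, corner_factor=1.0))  # -> 0.7
print(min_supply_voltage(20.0, corner_factor=2.0))  # -> 0.8
```

In the thesis the "delay model" is the physical logic block itself, measured at run time; the sketch only shows why a per-chip, per-temperature minimum voltage exists and how it shifts with the corner.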
Asynchronous memory design.
by Vincent Wing-Yun Sit. Thesis submitted in June 1997. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 1-4 (3rd gp.)). Abstract also in Chinese.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
Chapter 1. INTRODUCTION
1.1 ASYNCHRONOUS DESIGN
1.1.1 POTENTIAL ADVANTAGES
1.1.2 DESIGN METHODOLOGIES
1.1.3 SYSTEM CHARACTERISTICS
1.2 ASYNCHRONOUS MEMORY
1.2.1 MOTIVATION
1.2.2 DEFINITION
1.3 PROPOSED MEMORY DESIGN
1.3.1 CONTROL INTERFACE
1.3.2 OVERVIEW
1.3.3 HANDSHAKE CONTROL PROTOCOL
Chapter 2. THEORY
2.1 VARIABLE BIT LINE LOAD
2.1.1 DEFINITION
2.1.2 ADVANTAGE
2.2 CURRENT SENSING COMPLETION DETECTION
2.2.1 BLOCK DIAGRAM
2.2.2 GENERAL LSD CURRENT SENSOR
2.2.3 CMOS LSD CURRENT SENSOR
2.3 VOLTAGE SENSING COMPLETION DETECTION
2.3.1 DATA READING IN MEMORY CIRCUIT
2.3.2 BLOCK DIAGRAM
2.4 MULTIPLE DELAYS COMPLETION GENERATION
2.4.1 ADVANTAGE
2.4.2 BLOCK DIAGRAM
Chapter 3. IMPLEMENTATION
3.1 1M-BIT SRAM FRAMEWORK
3.1.1 INTRODUCTION
3.1.2 FRAMEWORK
3.2 CONTROL CIRCUIT
3.2.1 CONTROL SIGNALS
3.2.1.1 EXTERNAL CONTROL SIGNALS
3.2.1.2 INTERNAL CONTROL SIGNALS
3.2.2 READ / WRITE STATE TRANSITION GRAPHS
3.2.3 IMPLEMENTATION
3.3 BIT LINE SEGMENTATION
3.3.1 FOUR REGIONS SEGMENTATION
3.3.2 OPERATION
3.3.3 MEMORY CELL
3.4 CURRENT SENSING COMPLETION DETECTION
3.4.1 ONE BIT DATA BUS
3.4.2 EIGHT BITS DATA BUS
3.5 VOLTAGE SENSING COMPLETION DETECTION
3.5.1 ONE BIT DATA BUS
3.5.2 EIGHT BITS DATA BUS
3.6 MULTIPLE DELAYS COMPLETION GENERATION
Chapter 4. SIMULATION
4.1 SIMULATION ENVIRONMENT
4.1.1 SIMULATION PARAMETERS
4.1.2 MEMORY TIMING SPECIFICATIONS
4.1.3 BIT LINE LOAD DETERMINATION
4.2 BENCHMARK SIMULATION
4.2.1 CIRCUIT SCHEMATIC
4.2.2 RESULTS
4.3 CURRENT SENSING COMPLETION DETECTION
4.3.1 CIRCUIT SCHEMATIC
4.3.2 SENSE AMPLIFIER CURRENT CHARACTERISTICS
4.3.3 RESULTS
4.3.4 OBSERVATIONS
4.4 VOLTAGE SENSING COMPLETION DETECTION
4.4.1 CIRCUIT SCHEMATIC
4.4.2 RESULTS
4.5 MULTIPLE DELAYS COMPLETION GENERATION
4.5.1 CIRCUIT SCHEMATIC
4.5.2 RESULTS
Chapter 5. TESTING
5.1 TEST CHIP DESIGN
5.1.1 BLOCK DIAGRAM
5.1.2 SCHEMATIC
5.1.3 LAYOUT
5.2 HSPICE POST-LAYOUT SIMULATION RESULTS
5.2.1 GRAPHICAL RESULTS
5.2.2 VOLTAGE SENSING COMPLETION DETECTION
5.2.3 MULTIPLE DELAYS COMPLETION GENERATION
5.3 MEASUREMENTS
5.3.1 LOGIC RESULTS
5.3.1.1 METHOD
5.3.1.2 RESULTS
5.3.2 TIMING RESULTS
5.3.2.1 METHOD
5.3.2.2 GRAPHICAL RESULTS
5.3.2.3 VOLTAGE SENSING COMPLETION DETECTION
5.3.2.4 MULTIPLE DELAYS COMPLETION GENERATION
Chapter 6. DISCUSSION
6.1 CURRENT SENSING COMPLETION DETECTION
6.1.1 COMMENTS AND CONCLUSION
6.1.2 SUGGESTION
6.2 VOLTAGE SENSING COMPLETION DETECTION
6.2.1 RESULTS COMPARISON
6.2.1.1 GENERAL
6.2.1.2 BIT LINE LOAD
6.2.1.3 BIT LINE SEGMENTATION
6.2.2 RESOURCE CONSUMPTION
6.2.2.1 AREA
6.2.2.2 POWER
6.2.3 COMMENTS AND CONCLUSION
6.3 MULTIPLE DELAY COMPLETION GENERATION
6.3.1 RESULTS COMPARISON
6.3.1.1 GENERAL
6.3.1.2 BIT LINE LOAD
6.3.1.3 BIT LINE SEGMENTATION
6.3.2 RESOURCE CONSUMPTION
6.3.2.1 AREA
6.3.2.2 POWER
6.3.3 COMMENTS AND CONCLUSION
6.4 GENERAL COMMENTS
6.4.1 COMPARISON OF THE THREE TECHNIQUES
6.4.2 BIT LINE SEGMENTATION
6.5 APPLICATION
6.6 FURTHER DEVELOPMENTS
6.6.1 INTERFACE WITH TWO-PHASE HCP
6.6.2 DATA BUS EXPANSION
6.6.3 SPEED OPTIMIZATION
6.6.4 MODIFIED WRITE COMPLETION METHOD
Chapter 7. CONCLUSION
7.1 PROBLEM DEFINITION
7.2 IMPLEMENTATION
7.3 EVALUATION
7.4 COMMENTS AND SUGGESTIONS
Chapter 8. REFERENCES
Chapter 9. APPENDIX
9.1 HSPICE SIMULATION PARAMETERS
9.1.1 TYPICAL SIMULATION CONDITION
9.1.2 FAST SIMULATION CONDITION
9.1.3 SLOW SIMULATION CONDITION
9.2 SRAM CELL LAYOUT AND NETLIST
9.3 TEST CHIP SPECIFICATIONS
9.3.1 GENERAL SPECIFICATIONS
9.3.2 PIN ASSIGNMENT
9.3.3 TIMING DIAGRAMS AND SPECIFICATIONS
9.3.4 SCHEMATICS AND LAYOUTS
9.3.4.1 STANDARD MEMORY COMPONENTS
9.3.4.2 DVSCD AND MDCG COMPONENTS
9.3.5 MICROPHOTOGRAPH
An ICT image processing chip based on fast computation algorithm and self-timed circuit technique.
by Johnson, Tin-Chak Pang. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references.
Acknowledgments
Abstract
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Introduction
1.2 Introduction to asynchronous systems
1.2.1 Motivation
1.2.2 Hazards
1.2.3 Classes of Asynchronous circuits
1.3 Introduction to Transform Coding
1.4 Organization of the Thesis
Chapter 2. Asynchronous Design Methodologies
2.1 Introduction
2.2 Self-timed system
2.3 DCVSL Methodology
2.3.1 DCVSL gate
2.3.2 Handshake Control
2.4 Micropipeline Methodology
2.4.1 Summary of previous design
2.4.2 New Micropipeline structure and improvements
2.4.2.1 Asymmetrical delay
2.4.2.2 Variable Delay and Delay Value Selection
2.5 Comparison between DCVSL and Micropipeline
Chapter 3. Self-timed Multipliers
3.1 Introduction
3.2 Design Example 1: Bit-serial matrix multiplier
3.2.1 DCVSL design
3.2.2 Micropipeline design
3.2.3 The first test chip
3.2.4 Second test chip
3.3 Design Example 2: Modified Booth's Multiplier
3.3.1 Circuit Design
3.3.2 Simulation result
3.3.3 The third test chip
Chapter 4. Current-Sensing Completion Detection
4.1 Introduction
4.2 Current-sensor
4.2.1 Constant current source
4.2.2 Current mirror
4.2.3 Current comparator
4.3 Self-timed logic using CSCD
4.4 CSCD test chips and testing results
4.4.1 Test result
Chapter 5. Self-timed ICT processor architecture
5.1 Introduction
5.2 Comparison of different architectures
5.2.1 General purpose Digital Signal Processor
5.2.1.1 Hardware and speed estimation
5.2.2 Micropipeline without fast algorithm
5.2.2.1 Hardware and speed estimation
5.2.3 Micropipeline with fast algorithm (I)
5.2.3.1 Hardware and speed estimation
5.2.4 Micropipeline with fast algorithm (II)
5.2.4.1 Hardware and speed estimation
Chapter 6. Implementation of self-timed ICT processor
6.1 Introduction
6.2 Implementation of Self-timed 2-D ICT processor (first version)
6.2.1 1-D ICT module
6.2.2 Self-timed Transpose memory
6.2.3 Layout Design
6.3 Implementation of Self-timed 1-D ICT processor with fast algorithm (final version)
6.3.1 I/O buffers and control units
6.3.1.1 Input control
6.3.1.2 Output control
6.3.1.2.1 Self-timed Computational Block
6.3.1.3 Handshake Control Unit
6.3.1.4 Integer Execution Unit (IEU)
6.3.1.5 Program memory and Instruction decoder
6.3.2 Layout Design
6.4 Specifications of the final version self-timed ICT chip
Chapter 7. Testing of Self-timed ICT processor
7.1 Introduction
7.2 Pin assignment of Self-timed 1-D ICT chip
7.3 Simulation
7.4 Testing of Self-timed 1-D ICT processor
7.4.1 Functional test
7.4.1.1 Testing environment and results
7.4.2 Transient Characteristics
7.4.3 Comments on speed and power
7.4.4 Determination of optimum delay control voltage
7.5 Testing of delay element and other logic cells
Chapter 8. Conclusions
Bibliography
Appendices
Core interface optimization for multi-core neuromorphic processors
Hardware implementations of Spiking Neural Networks (SNNs) represent a
promising approach to edge-computing for applications that require low-power
and low-latency, and which cannot resort to external cloud-based computing
services. However, most solutions proposed so far either support only
relatively small networks or take up significant hardware resources to
implement large ones. Realizing large-scale, scalable SNNs requires an
efficient asynchronous communication and routing fabric that enables the
design of multi-core architectures. In particular, the core interface
that manages inter-core spike communication is a crucial component, as it
represents the Power-Performance-Area (PPA) bottleneck, especially in the
arbitration architecture and the routing memory. In this paper we present an
arbitration mechanism with the corresponding asynchronous encoding pipeline
circuits, based on hierarchical arbiter trees. The proposed scheme reduces the
latency by more than 70% in sparse-event mode, compared to the state-of-the-art
arbitration architectures, with lower area cost. The routing memory makes use
of asynchronous Content Addressable Memory (CAM) with Current Sensing
Completion Detection (CSCD), which saves approximately 46% energy, and achieves
a 40% increase in throughput against conventional asynchronous CAM using
configurable delay lines, at the cost of only a slight increase in area. In
addition, because they radically reduce the core interface resources of
multi-core neuromorphic processors, the arbitration and CAM architectures we
propose can also be applied to a wide range of general asynchronous circuits
and systems.
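As a rough software illustration of the hierarchical arbiter-tree idea described above: requests enter at the leaves, each internal node picks a winner between its two subtrees, and the grant propagates to the root. The paper's implementation is an asynchronous hardware pipeline; this toy model only mimics the selection logic, and the per-node "alternate sides" fairness toggle is an assumption for the sketch.

```python
# Software model of a hierarchical (binary-tree) arbiter.
# Each internal node remembers which side it last granted and
# prefers the other side, giving a simple form of fairness.

class ArbiterTree:
    def __init__(self, n_leaves):
        assert n_leaves & (n_leaves - 1) == 0, "power-of-two leaves"
        self.n = n_leaves
        self.toggle = {}  # node id -> last side granted (0=left, 1=right)

    def grant(self, requests):
        """Return the index of the granted requester, or None if no requests."""
        return self._arbitrate(1, 0, self.n, requests)

    def _arbitrate(self, node, lo, hi, requests):
        if hi - lo == 1:                      # leaf: a raw request line
            return lo if requests[lo] else None
        mid = (lo + hi) // 2
        left = self._arbitrate(2 * node, lo, mid, requests)
        right = self._arbitrate(2 * node + 1, mid, hi, requests)
        if left is None:
            return right
        if right is None:
            return left
        # Both subtrees request: alternate sides for fairness.
        side = 1 - self.toggle.get(node, 1)
        self.toggle[node] = side
        return right if side else left

arb = ArbiterTree(4)
reqs = [True, False, True, False]
print(arb.grant(reqs), arb.grant(reqs))  # -> 0 2 (alternates between pending lines)
```

In hardware, each node is a mutual-exclusion element rather than a sequential scan, so only log2(N) arbitration stages lie on the grant path; that logarithmic depth is what drives the latency advantage in sparse-event mode.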
Methods to improve the reliability and resiliency of near/sub-threshold digital circuits
Energy consumption is one of the primary bottlenecks for both large- and small-scale modern compute platforms. Reducing the operating voltage of digital circuits to the point where the supply voltage is near or below the threshold of the transistors has recently gained attention as a method to reduce the energy required for computation by as much as 6 times. However, when operating at such near/sub-threshold voltages, imperfections in transistor manufacturing, changes in temperature, and other difficult-to-predict factors cause wide variations in the timing of Complementary Metal-Oxide-Semiconductor (CMOS) circuits, owing to their increased sensitivity at lower voltages. These variations result in poor aggregate performance and increased rates of computational errors.
This work introduces several new methods to improve the reliability of near/sub-threshold circuits. The first is a design automation technique that aids low-voltage digital standard cell synthesis. Second, two circuit-level techniques are introduced that aim to improve the reliability and resiliency of digital circuits by means of completion/error detection. These techniques are shown to improve speed and lower energy consumption at low overhead compared to previous methods. Most importantly, these circuit-level methods are specifically designed to operate at low voltages and can themselves tolerate variations and operation in harsh environments. Finally, a test-chip prototype designed in 65nm CMOS demonstrates the practicality and feasibility of the proposed current sensing error detector.
Practical advances in asynchronous design
Journal Article
Recent practical advances in asynchronous circuit and system design have resulted in renewed interest among circuit designers. Asynchronous systems are being viewed as an increasingly viable alternative to globally synchronous system organization. This tutorial presents the current state of the art in asynchronous circuit and system design in three areas. The first section details asynchronous control systems. The second describes a variety of approaches to asynchronous datapaths. The third covers asynchronous and self-timed circuits applied to the design of general-purpose processors.
Dynamically reconfigurable asynchronous processor
The main design requirements for today's mobile applications are:
· high throughput performance.
· high energy efficiency.
· high programmability.
Until now, the choice of platform has often been limited to Application-Specific
Integrated Circuits (ASICs), due to their best-of-breed performance and power
consumption. The economies of scale possible with these high-volume markets have
traditionally been able to hide the high Non-Recurring Engineering (NRE) costs
required for designing and fabricating new ASICs. However, with the NREs and
design time escalating with each generation of mobile applications, this practice may
be reaching its limit.
Designers today are looking at programmable solutions, so that they can respond
more rapidly to changes in the market and spread costs over several generations of
mobile applications. However, there have been few feasible alternatives to ASICs:
Digital Signal Processors (DSPs) and microprocessors cannot meet the throughput
requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much
area and power.
Coarse-grained dynamically reconfigurable architectures offer better solutions for
high throughput applications, when power and area considerations are taken into
account. One promising example is the Reconfigurable Instruction Cell Array
(RICA). RICA consists of an array of cells with an interconnect that can be
dynamically reconfigured on every cycle. This allows quite complex datapaths to be
rendered onto the fabric and executed in a single configuration, making these
architectures particularly suitable for stream processing. Furthermore, RICA can be
programmed from C, making it a good fit with existing design methodologies.
However, the RICA architecture has a drawback: poor scalability in terms of area and
power. As the core gets bigger, the number of sequential elements in the array must
be increased significantly to maintain the ability to achieve high throughputs through
pipelining. As a result, a larger clock tree is required to synchronise the increased
number of sequential elements. The clock tree therefore takes up a larger percentage
of the area and power consumption of the core.
This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor
(DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA
architecture, but uses asynchronous design techniques - methods of designing digital
systems without clocks. The absence of a global clock signal makes DRAP more
scalable in terms of power and area overhead than its synchronous counterpart.
The DRAP architecture maintains most of the benefits of custom asynchronous
design, whilst also providing programmability via conventional high-level languages.
Results show that the DRAP processor delivers considerably lower power
consumption when compared to a market-leading Very Long Instruction Word
(VLIW) processor and a low-power ARM processor. For example, DRAP resulted in
a reduction in power consumption of 20 times compared to the ARM7 processor, and
29 times compared to the TIC64x VLIW, when running the same benchmark capped
to the same throughput and for the same process technology (0.13μm). When
compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but
resulted in a power reduction of up to 1.9 times. It was also capable of achieving up
to 2.8 times higher throughputs than RICA for the same benchmarks