261 research outputs found
Latency Optimized Asynchronous Early Output Ripple Carry Adder based on Delay-Insensitive Dual-Rail Data Encoding
Asynchronous circuits employing delay-insensitive codes for data
representation i.e. encoding and following a 4-phase return-to-zero protocol
for handshaking are generally robust. Depending upon whether a single
delay-insensitive code or multiple delay-insensitive code(s) are used for data
encoding, the encoding scheme is called homogeneous or heterogeneous
delay-insensitive data encoding. This article proposes a new latency optimized
early output asynchronous ripple carry adder (RCA) that utilizes single-bit
asynchronous full adders (SAFAs) and dual-bit asynchronous full adders (DAFAs)
which incorporate redundant logic and are based on the delay-insensitive
dual-rail code i.e. homogeneous data encoding, and follow a 4-phase
return-to-zero handshaking. Amongst various RCA, carry lookahead adder (CLA),
and carry select adder (CSLA) designs, which are based on homogeneous or
heterogeneous delay-insensitive data encodings which correspond to the
weak-indication or the early output timing model, the proposed early output
asynchronous RCA that incorporates SAFAs and DAFAs with redundant logic is
found to result in reduced latency for a dual-operand addition operation. In
particular, for a 32-bit asynchronous RCA, utilizing 15 stages of DAFAs and 2
stages of SAFAs leads to reduced latency. The theoretical worst-case latencies
of the different asynchronous adders were calculated by taking into account the
typical gate delays of a 32/28nm CMOS digital cell library, and a comparison is
made with their practical worst-case latencies estimated. The theoretical and
practical worst-case latencies show a close correlation....Comment: arXiv admin note: text overlap with arXiv:1704.0761
Asynchronous Early Output Dual-Bit Full Adders Based on Homogeneous and Heterogeneous Delay-Insensitive Data Encoding
This paper presents the designs of asynchronous early output dual-bit full
adders without and with redundant logic (implicit) corresponding to homogeneous
and heterogeneous delay-insensitive data encoding. For homogeneous
delay-insensitive data encoding only dual-rail i.e. 1-of-2 code is used, and
for heterogeneous delay-insensitive data encoding 1-of-2 and 1-of-4 codes are
used. The 4-phase return-to-zero protocol is used for handshaking. To
demonstrate the merits of the proposed dual-bit full adder designs, 32-bit
ripple carry adders (RCAs) are constructed comprising dual-bit full adders. The
proposed dual-bit full adders based 32-bit RCAs incorporating redundant logic
feature reduced latency and area compared to their non-redundant counterparts
with no accompanying power penalty. In comparison with the weakly indicating
32-bit RCA constructed using homogeneously encoded dual-bit full adders
containing redundant logic, the early output 32-bit RCA comprising the proposed
homogeneously encoded dual-bit full adders with redundant logic reports
corresponding reductions in latency and area by 22.2% and 15.1% with no
associated power penalty. On the other hand, the early output 32-bit RCA
constructed using the proposed heterogeneously encoded dual-bit full adder
which incorporates redundant logic reports respective decreases in latency and
area than the weakly indicating 32-bit RCA that consists of heterogeneously
encoded dual-bit full adders with redundant logic by 21.5% and 21.3% with nil
power overhead. The simulation results obtained are based on a 32/28nm CMOS
process technology
Reliable Low-Latency and Low-Complexity Viterbi Architectures Benchmarked on ASIC and FPGA
The Viterbi algorithm is commonly applied in a number of sensitive usage models including decoding convolutional codes used in communications such as satellite communication, cellular relay, and wireless local area networks. Moreover, the algorithm has been applied to automatic speech recognition and storage devices. In this thesis, efficient error detection schemes for architectures based on low-latency, low-complexity Viterbi decoders are presented. The merit of the proposed schemes is that reliability requirements, overhead tolerance, and performance degradation limits are embedded in the structures and can be adapted accordingly. We also present three variants of recomputing with encoded operands and its modifications to detect both transient and permanent faults, coupled with signature-based schemes. The instrumented decoder architecture has been subjected to extensive error detection assessments through simulations, and application-specific integrated circuit (ASIC) [32nm library] and field-programmable gate array (FPGA) [Xilinx Virtex-6 family] implementations for benchmark. The proposed fine-grained approaches can be utilized based on reliability objectives and performance/implementation metrics degradation tolerance
Reliable Hardware Architectures of CORDIC Algorithm with Fixed Angle of Rotations
Fixed-angle rotation operation of vectors is widely used in signal processing, graphics, and robotics. Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed for uniform rotation of vectors through known and specified angles. Nevertheless, in the presence of faults, such hardware architectures are potentially vulnerable. In this thesis, we propose efficient error detection schemes for two fixed-angle rotation designs, i.e., the Interleaved Scaling and Cascaded Single-rotation CORDIC. To the best of our knowledge, this work is the first in providing reliable architectures for these variants of CORDIC. The former is suitable for low-area applications and, hence, we propose recomputing with encoded operands schemes which add negligible area overhead to the designs. Moreover, the proposed error detection schemes for the latter variant are optimized for efficient applications which hamper the performance of the architectures negligibly. We present three variants of recomputing with encoded operands to detect both transient and permanent faults, coupled with signature-based schemes. The overheads of the proposed designs are assessed through Xilinx FPGA implementations and their effectiveness is benchmarked through error simulations. The results give confidence for the proposed efficient architectures which can be tailored based on the reliability requirements and the overhead to be tolerated
Area/latency optimized early output asynchronous full adders and relative-timed ripple carry adders
This article presents two area/latency optimized gate level asynchronous full
adder designs which correspond to early output logic. The proposed full adders
are constructed using the delay-insensitive dual-rail code and adhere to the
four-phase return-to-zero handshaking. For an asynchronous ripple carry adder
(RCA) constructed using the proposed early output full adders, the
relative-timing assumption becomes necessary and the inherent advantages of the
relative-timed RCA are: (1) computation with valid inputs, i.e., forward
latency is data-dependent, and (2) computation with spacer inputs involves a
bare minimum constant reverse latency of just one full adder delay, thus
resulting in the optimal cycle time. With respect to different 32-bit RCA
implementations, and in comparison with the optimized strong-indication,
weak-indication, and early output full adder designs, one of the proposed early
output full adders achieves respective reductions in latency by 67.8, 12.3 and
6.1 %, while the other proposed early output full adder achieves corresponding
reductions in area by 32.6, 24.6 and 6.9 %, with practically no power penalty.
Further, the proposed early output full adders based asynchronous RCAs enable
minimum reductions in cycle time by 83.4, 15, and 8.8 % when considering
carry-propagation over the entire RCA width of 32-bits, and maximum reductions
in cycle time by 97.5, 27.4, and 22.4 % for the consideration of a typical
carry chain length of 4 full adder stages, when compared to the least of the
cycle time estimates of various strong-indication, weak-indication, and early
output asynchronous RCAs of similar size. All the asynchronous full adders and
RCAs were realized using standard cells in a semi-custom design fashion based
on a 32/28 nm CMOS process technology
Efficient modular arithmetic units for low power cryptographic applications
The demand for high security in energy constrained devices such as mobiles and PDAs is growing rapidly. This leads to the need for efficient design of cryptographic algorithms which offer data integrity, authentication, non-repudiation and confidentiality of the encrypted data and communication channels. The public key cryptography is an ideal choice for data integrity, authentication and non-repudiation whereas the private key cryptography ensures the confidentiality of the data transmitted. The latter has an extremely high encryption speed but it has certain limitations which make it unsuitable for use in certain applications. Numerous public key cryptographic algorithms are available in the literature which comprise modular arithmetic modules such as modular addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic algorithms have been proposed based on modular arithmetic which are scalable, do word based operations and efficient in various aspects. The modular arithmetic modules play a crucial role in the overall performance of the cryptographic processor. Hence, better results can be obtained by designing efficient arithmetic modules such as modular addition, multiplication, exponentiation and squaring. This thesis is organized into three papers, describes the efficient implementation of modular arithmetic units, application of these modules in International Data Encryption Algorithm (IDEA). Second paper describes the IDEA algorithm implementation using the existing techniques and using the proposed efficient modular units. The third paper describes the fault tolerant design of a modular unit which has online self-checking capability --Abstract, page iv
- …