4,986 research outputs found
Indicating Asynchronous Array Multipliers
Multiplication is an important arithmetic operation that is frequently
encountered in microprocessing and digital signal processing applications, and
multiplication is physically realized using a multiplier. This paper discusses
the physical implementation of many indicating asynchronous array multipliers,
which are inherently elastic and modular and are robust to timing, process and
parametric variations. We consider the physical realization of many indicating
asynchronous array multipliers using a 32/28nm CMOS technology. The
weak-indication array multipliers comprise strong-indication or weak-indication
full adders, and strong-indication 2-input AND functions to realize the partial
products. The multipliers were synthesized in a semi-custom ASIC design style
using standard library cells including a custom-designed 2-input C-element. 4x4
and 8x8 multiplication operations were considered for the physical
implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one
(RTO) handshake protocols were utilized for data communication, and the
delay-insensitive dual-rail code was used for data encoding. Among several
weak-indication array multipliers, a weak-indication array multiplier utilizing
a biased weak-indication full adder and the strong-indication 2-input AND
function is found to have reduced cycle time and power-cycle time product with
respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further,
the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ
handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943
Frame Theory for Signal Processing in Psychoacoustics
This review chapter aims to strengthen the link between frame theory and
signal processing tasks in psychoacoustics. On the one side, the basic concepts
of frame theory are presented and some proofs are provided to explain those
concepts in some detail. The goal is to reveal to hearing scientists how this
mathematical theory could be relevant for their research. In particular, we
focus on frame theory in a filter bank approach, which is probably the most
relevant view-point for audio signal processing. On the other side, basic
psychoacoustic concepts are presented to stimulate mathematicians to apply
their knowledge in this field
Automated Circuit Approximation Method Driven by Data Distribution
We propose an application-tailored data-driven fully automated method for
functional approximation of combinational circuits. We demonstrate how an
application-level error metric such as the classification accuracy can be
translated to a component-level error metric needed for an efficient and fast
search in the space of approximate low-level components that are used in the
application. This is possible by employing a weighted mean error distance
(WMED) metric for steering the circuit approximation process which is conducted
by means of genetic programming. WMED introduces a set of weights (calculated
from the data distribution measured on a selected signal in a given
application) determining the importance of each input vector for the
approximation process. The method is evaluated using synthetic benchmarks and
application-specific approximate MAC (multiply-and-accumulate) units that are
designed to provide the best trade-offs between the classification accuracy and
power consumption of two image classifiers based on neural networks.Comment: Accepted for publication at Design, Automation and Test in Europe
(DATE 2019). Florence, Ital
Kinematically optimal hyper-redundant manipulator configurations
“Hyper-redundant” robots have a very large or infinite degree of kinematic redundancy. This paper develops new methods for determining “optimal” hyper-redundant manipulator configurations based on a continuum formulation of kinematics. This formulation uses a backbone curve model to capture the robot's essential macroscopic geometric features. The calculus of variations is used to develop differential equations, whose solution is the optimal backbone curve shape. We show that this approach is computationally efficient on a single processor, and generates solutions in O(1) time for an N degree-of-freedom manipulator when implemented in parallel on O(N) processors. For this reason, it is better suited to hyper-redundant robots than other redundancy resolution methods. Furthermore, this approach is useful for many hyper-redundant mechanical morphologies which are not handled by known methods
Design of approximate overclocked datapath
Embedded applications can often demand stringent latency requirements. While high degrees of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, by reducing the precision, the engineer introduces quantisation error into the design.
In this thesis, we describe an alternative circuit design methodology when considering trade-offs between accuracy, performance and silicon area. We compare two different approaches that could trade accuracy for performance. One is the traditional approach where the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they will cause less noise than quantisation errors.
Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. In this thesis we take a fresh look at Online Arithmetic, originally proposed for digit serial operation, and synthesize unrolled digit parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives, and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, these result in errors in least significant digits with online arithmetic, causing less impact than conventional implementations.Open Acces
A high-performance inner-product processor for real and complex numbers.
A novel, high-performance fixed-point inner-product processor based on a redundant binary number system is investigated in this dissertation. This scheme decreases the number of partial products to 50%, while achieving better speed and area performance, as well as providing pipeline extension opportunities. When modified Booth coding is used, partial products are reduced by almost 75%, thereby significantly reducing the multiplier addition depth. The design is applicable for digital signal and image processing applications that require real and/or complex numbers inner-product arithmetic, such as digital filters, correlation and convolution. This design is well suited for VLSI implementation and can also be embedded as an inner-product core inside a general purpose or DSP FPGA-based processor. Dynamic control of the computing structure permits different computations, such as a variety of inner-product real and complex number computations, parallel multiplication for real and complex numbers, and real and complex number division. The same structure can also be controlled to accept redundant binary number inputs for multiplication and inner-product computations. An improved 2's-complement to redundant binary converter is also presented
Approximate Early Output Asynchronous Adders Based on Dual-Rail Data Encoding and 4-Phase Return-to-Zero and Return-to-One Handshaking
Approximate computing is emerging as an alternative to accurate computing due
to its potential for realizing digital circuits and systems with low power
dissipation, less critical path delay, and less area occupancy for an
acceptable trade-off in the accuracy of results. In the domain of computer
arithmetic, several approximate adders and multipliers have been designed and
their potential have been showcased versus accurate adders and multipliers for
practical digital signal processing applications. Nevertheless, in the existing
literature, almost all the approximate adders and multipliers reported
correspond to the synchronous design method. In this work, we consider robust
asynchronous i.e. quasi-delay-insensitive realizations of approximate adders by
employing delay-insensitive codes for data representation and processing, and
the 4-phase handshake protocols for data communication. The 4-phase handshake
protocols used are the return-to-zero and the return-to-one protocols.
Specifically, we consider the implementations of 32-bit approximate adders
based on the return-to-zero and return-to-one handshake protocols by adopting
the delay-insensitive dual-rail code for data encoding. We consider a range of
approximations varying from 4-bits to 20-bits for the least significant
positions of the accurate 32-bit asynchronous adder. The asynchronous adders
correspond to early output (i.e. early reset) type, which are based on the
well-known ripple carry adder architecture. The experimental results show that
approximate asynchronous adders achieve reductions in the design metrics such
as latency, cycle time, average power dissipation, and silicon area compared to
the accurate asynchronous adders. Further, the reductions in the design metrics
are greater for the return-to-one protocol compared to the return-to-zero
protocol. The design metrics were estimated using a 32/28nm CMOS technology.Comment: arXiv admin note: text overlap with arXiv:1711.0233
- …