Automated Circuit Approximation Method Driven by Data Distribution
We propose an application-tailored data-driven fully automated method for
functional approximation of combinational circuits. We demonstrate how an
application-level error metric such as the classification accuracy can be
translated to a component-level error metric needed for an efficient and fast
search in the space of approximate low-level components that are used in the
application. This is possible by employing a weighted mean error distance
(WMED) metric for steering the circuit approximation process which is conducted
by means of genetic programming. WMED introduces a set of weights (calculated
from the data distribution measured on a selected signal in a given
application) determining the importance of each input vector for the
approximation process. The method is evaluated using synthetic benchmarks and
application-specific approximate MAC (multiply-and-accumulate) units that are
designed to provide the best trade-offs between the classification accuracy and
power consumption of two image classifiers based on neural networks.
Comment: Accepted for publication at Design, Automation and Test in Europe
(DATE 2019). Florence, Ital
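The weighting idea behind WMED can be illustrated with a minimal sketch. It assumes a two-operand arithmetic component, weights taken from the empirical distribution of the first operand, and a uniformly distributed second operand; the exact normalization used in the paper may differ. The helper names and the toy truncating multiplier are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def weights_from_samples(samples, n_values):
    """Empirical probability of each operand value (hypothetical helper)."""
    counts = Counter(samples)
    return [counts.get(v, 0) / len(samples) for v in range(n_values)]


def wmed(exact, approx, weights, n_values):
    """Weighted mean error distance over all operand pairs.

    Assumption: weights[a] is the importance of first operand a, and the
    second operand b is treated as uniform (hence the division by n_values).
    """
    total = 0.0
    for a, b in product(range(n_values), repeat=2):
        total += weights[a] * abs(exact(a, b) - approx(a, b))
    return total / n_values


def exact_mul(a, b):
    return a * b


def truncated_mul(a, b):
    """Toy approximate multiplier: ignores the least significant bit of a."""
    return (a & ~1) * b


# Made-up signal samples for a 3-bit operand; odd values are rare here.
observed = [0, 0, 2, 2, 4, 1]
data_weights = weights_from_samples(observed, 8)
uniform_weights = [1 / 8] * 8

wmed_data = wmed(exact_mul, truncated_mul, data_weights, 8)
wmed_uniform = wmed(exact_mul, truncated_mul, uniform_weights, 8)
```

With these made-up samples the data-driven WMED is about 0.58, against 1.75 under uniform weights. The same approximate circuit therefore looks much cheaper in error terms when its faulty input vectors are rare in the application.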
AC OPF in Radial Distribution Networks - Parts I,II
The optimal power-flow problem (OPF) has played a key role in the planning
and operation of power systems. Due to the non-linear nature of the AC
power-flow equations, the OPF problem is known to be non-convex, therefore hard
to solve. Most proposed methods for solving the OPF rely on approximations that
render the problem convex, but that may yield inexact solutions. Recently,
Farivar and Low proposed a method that is claimed to be exact for radial
distribution systems, despite no apparent approximations. In our work, we show
that it is, in fact, not exact. On the one hand, there is a misinterpretation of
the physical network model related to the ampacity constraint of the lines'
current flows. On the other hand, the proof of the exactness of the proposed
relaxation requires unrealistic assumptions related to the unboundedness of
specific control variables. We also show that the extension of this approach to
account for exact line models might provide physically infeasible solutions.
Recently, several contributions have proposed OPF algorithms that rely on the
use of the alternating-direction method of multipliers (ADMM). However, as we
show in this work, there are cases for which the ADMM-based solution of the
non-relaxed OPF problem fails to converge. To overcome the aforementioned
limitations, we propose an algorithm for the solution of a non-approximated,
non-convex OPF problem in radial distribution systems that is based on the
method of multipliers, and on a primal decomposition of the OPF. This work is
divided into two parts. In Part I, we specifically discuss the limitations of
the branch flow model (BFM) and ADMM in solving the OPF problem. In Part II, we
provide a centralized version and a distributed asynchronous version of the
proposed OPF algorithm, and we evaluate its performance using both small-scale
electrical networks and a modified IEEE 13-node test feeder.
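For readers unfamiliar with the method of multipliers mentioned in the abstract, the following is a minimal sketch on a toy problem, not the authors' OPF algorithm. It minimizes x0 + x1 subject to the non-convex equality constraint x0^2 + x1^2 = 2, whose solution is (-1, -1) with multiplier 0.5. The function names, penalty parameter, step size, and iteration counts are all illustrative assumptions.

```python
def constraint(x):
    """Non-convex equality constraint h(x) = 0 (a circle of radius sqrt(2))."""
    return x[0] ** 2 + x[1] ** 2 - 2.0


def grad_aug_lagrangian(x, lam, rho):
    """Gradient of f(x) + lam*h(x) + (rho/2)*h(x)^2 with f(x) = x0 + x1."""
    scale = lam + rho * constraint(x)
    return [1.0 + scale * 2.0 * x[0], 1.0 + scale * 2.0 * x[1]]


def method_of_multipliers(x, lam=0.0, rho=10.0, outer=20, inner=2000, step=0.01):
    """Alternate an approximate primal minimization with a multiplier update."""
    x = list(x)
    for _ in range(outer):
        # Inner loop: plain gradient descent on the augmented Lagrangian.
        for _ in range(inner):
            g = grad_aug_lagrangian(x, lam, rho)
            x = [xi - step * gi for xi, gi in zip(x, g)]
        # Outer step: first-order multiplier update.
        lam += rho * constraint(x)
    return x, lam


x_opt, lam_opt = method_of_multipliers([-0.5, -0.5])
```

The quadratic penalty keeps each inner problem well behaved near the solution even though the constraint itself is non-convex, which is the general motivation for augmented-Lagrangian schemes. How the paper combines this with a primal decomposition of the OPF is not shown here.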
Indicating Asynchronous Array Multipliers
Multiplication is an important arithmetic operation that is frequently
encountered in microprocessing and digital signal processing applications, and
is physically realized using a multiplier. This paper discusses
the physical implementation of many indicating asynchronous array multipliers,
which are inherently elastic and modular and are robust to timing, process and
parametric variations. We consider the physical realization of many indicating
asynchronous array multipliers using a 32/28nm CMOS technology. The
weak-indication array multipliers comprise strong-indication or weak-indication
full adders, and strong-indication 2-input AND functions to realize the partial
products. The multipliers were synthesized in a semi-custom ASIC design style
using standard library cells including a custom-designed 2-input C-element. 4x4
and 8x8 multiplication operations were considered for the physical
implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one
(RTO) handshake protocols were utilized for data communication, and the
delay-insensitive dual-rail code was used for data encoding. Among several
weak-indication array multipliers, a weak-indication array multiplier utilizing
a biased weak-indication full adder and the strong-indication 2-input AND
function is found to have reduced cycle time and power-cycle time product with
respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further,
the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ
handshaking for achieving enhanced optimizations of the design metrics.
Comment: arXiv admin note: text overlap with arXiv:1903.0943
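The delay-insensitive dual-rail code and the 4-phase return-to-zero protocol named in the abstract can be sketched in software as follows. This is a behavioural illustration only, under the common convention that each bit uses two wires, exactly one of which is high for valid data, and that the all-zero pair is the RTZ spacer. The rail ordering and all function names are assumptions; RTO, which uses an all-ones spacer, is not modelled.

```python
SPACER = (0, 0)  # RTZ spacer: both rails low


def encode_bit(bit):
    """Return (rail1, rail0); exactly one rail is high for valid data."""
    return (1, 0) if bit else (0, 1)


def encode_word(value, width):
    """Dual-rail encode an unsigned integer, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def is_valid_data(word):
    """True when every bit position carries data (one rail high)."""
    return all(r1 != r0 for r1, r0 in word)


def is_spacer(word):
    """True when every bit position is in the spacer state."""
    return all(pair == SPACER for pair in word)


def decode_word(word):
    """Recover the integer from a fully valid dual-rail word."""
    if not is_valid_data(word):
        raise ValueError("word is not fully valid dual-rail data")
    return sum(r1 << i for i, (r1, _) in enumerate(word))


def rtz_sequence(values, width):
    """Yield data and spacer alternately, as in 4-phase RTZ signalling."""
    for v in values:
        yield encode_word(v, width)
        yield [SPACER] * width


phases = list(rtz_sequence([3, 5], 4))
```

Because validity is visible in the data wires themselves, a receiver can detect both the arrival of data and the return to the spacer without a clock. This is what makes such circuits robust to the timing variations the abstract mentions.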
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them, and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.
Comment: Accepted to Publications of the Astronomical Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
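The BMN^2 scaling quoted in the abstract can be made concrete with a small calculation. The sketch below returns a relative load only, since the abstract gives the proportionality but not the constant factor. The function name is hypothetical, and a single beam (M = 1) is an assumption for the 16-antenna, 200-MHz deployment.

```python
def relative_correlator_load(bandwidth_hz, beams, antennas):
    """Relative computational load, proportional to B * M * N^2."""
    return bandwidth_hz * beams * antennas ** 2


# 16 antennas and 200 MHz come from the abstract; one beam is assumed.
base = relative_correlator_load(200e6, 1, 16)
doubled_antennas = relative_correlator_load(200e6, 1, 32)
doubled_bandwidth = relative_correlator_load(400e6, 1, 16)

antenna_ratio = doubled_antennas / base      # quadratic in N
bandwidth_ratio = doubled_bandwidth / base   # linear in B
```

Doubling the number of antennas quadruples the load, while doubling the bandwidth only doubles it, which is why large-N arrays dominate the processing budget.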
autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components
Approximate computing is an emerging paradigm for developing highly
energy-efficient computing systems such as various accelerators. In the
literature, many libraries of elementary approximate circuits have already been
proposed to simplify the design process of approximate accelerators. Because
these libraries contain from tens to thousands of approximate implementations
for a single arithmetic operation, it is intractable to find an optimal
combination of approximate circuits in the library even for an application
consisting of a few operations. An open problem is "how to effectively combine
circuits from these libraries to construct complex approximate accelerators".
This paper proposes a novel methodology for searching, selecting and combining
the most suitable approximate circuits from a set of available libraries to
generate an approximate accelerator for a given application. To enable fast
design space generation and exploration, the methodology utilizes machine
learning techniques to create computational models estimating the overall
quality of processing and hardware cost without performing full synthesis at
the accelerator level. Using the methodology, we construct hundreds of
approximate accelerators (for a Sobel edge detector) showing different but
relevant tradeoffs between the quality of processing and hardware cost and
identify a corresponding Pareto-frontier. Furthermore, when searching for
approximate implementations of a generic Gaussian filter consisting of 17
arithmetic operations, the proposed approach allows us to identify
approximately highly important implementations from possible
solutions in a few hours, while the exhaustive search would take four months on
a high-end processor.
Comment: Accepted for publication at the Design Automation Conference 2019
(DAC'19), Las Vegas, Nevada, US
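The Pareto-frontier step mentioned in the abstract can be sketched as a simple filter over candidate accelerators. The sketch assumes each candidate is summarized by a (quality, cost) pair, where higher quality and lower cost are better; the candidate values are made up for illustration, and the function name is hypothetical. It does not reproduce the machine-learning estimators the methodology uses to obtain those values.

```python
def pareto_front(candidates):
    """Return the non-dominated (quality, cost) pairs, ordered by cost.

    A candidate is dominated if another one is at least as good in both
    objectives and strictly better in at least one.
    """
    # Sort by ascending cost; break ties by descending quality.
    ordered = sorted(candidates, key=lambda qc: (qc[1], -qc[0]))
    front = []
    best_quality = float("-inf")
    for quality, cost in ordered:
        # Keep a point only if it beats every cheaper-or-equal point on quality.
        if quality > best_quality:
            front.append((quality, cost))
            best_quality = quality
    return front


# Made-up (quality, hardware cost) estimates for six candidate designs.
candidates = [
    (0.99, 100), (0.95, 60), (0.90, 70),
    (0.80, 30), (0.80, 45), (0.60, 30),
]
front = pareto_front(candidates)
# front == [(0.80, 30), (0.95, 60), (0.99, 100)]
```

A single sort followed by one sweep is enough for two objectives, so the filter stays cheap even when the estimators score a very large number of configurations.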