47,290 research outputs found

    Automated Circuit Approximation Method Driven by Data Distribution

    Full text link
    We propose an application-tailored data-driven fully automated method for functional approximation of combinational circuits. We demonstrate how an application-level error metric such as the classification accuracy can be translated to a component-level error metric needed for an efficient and fast search in the space of approximate low-level components that are used in the application. This is possible by employing a weighted mean error distance (WMED) metric for steering the circuit approximation process which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) determining the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units that are designed to provide the best trade-offs between the classification accuracy and power consumption of two image classifiers based on neural networks.Comment: Accepted for publication at Design, Automation and Test in Europe (DATE 2019). Florence, Ital

    AC OPF in Radial Distribution Networks - Parts I,II

    Get PDF
    The optimal power-flow problem (OPF) has played a key role in the planning and operation of power systems. Due to the non-linear nature of the AC power-flow equations, the OPF problem is known to be non-convex, therefore hard to solve. Most proposed methods for solving the OPF rely on approximations that render the problem convex, but that may yield inexact solutions. Recently, Farivar and Low proposed a method that is claimed to be exact for radial distribution systems, despite no apparent approximations. In our work, we show that it is, in fact, not exact. On one hand, there is a misinterpretation of the physical network model related to the ampacity constraint of the lines' current flows. On the other hand, the proof of the exactness of the proposed relaxation requires unrealistic assumptions related to the unboundedness of specific control variables. We also show that the extension of this approach to account for exact line models might provide physically infeasible solutions. Recently, several contributions have proposed OPF algorithms that rely on the use of the alternating-direction method of multipliers (ADMM). However, as we show in this work, there are cases for which the ADMM-based solution of the non-relaxed OPF problem fails to converge. To overcome the aforementioned limitations, we propose an algorithm for the solution of a non-approximated, non-convex OPF problem in radial distribution systems that is based on the method of multipliers, and on a primal decomposition of the OPF. This work is divided in two parts. In Part I, we specifically discuss the limitations of BFM and ADMM to solve the OPF problem. In Part II, we provide a centralized version and a distributed asynchronous version of the proposed OPF algorithm and we evaluate its performances using both small-scale electrical networks, as well as a modified IEEE 13-node test feeder

    Indicating Asynchronous Array Multipliers

    Full text link
    Multiplication is an important arithmetic operation that is frequently encountered in microprocessing and digital signal processing applications, and multiplication is physically realized using a multiplier. This paper discusses the physical implementation of many indicating asynchronous array multipliers, which are inherently elastic and modular and are robust to timing, process and parametric variations. We consider the physical realization of many indicating asynchronous array multipliers using a 32/28nm CMOS technology. The weak-indication array multipliers comprise strong-indication or weak-indication full adders, and strong-indication 2-input AND functions to realize the partial products. The multipliers were synthesized in a semi-custom ASIC design style using standard library cells including a custom-designed 2-input C-element. 4x4 and 8x8 multiplication operations were considered for the physical implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one (RTO) handshake protocols were utilized for data communication, and the delay-insensitive dual-rail code was used for data encoding. Among several weak-indication array multipliers, a weak-indication array multiplier utilizing a biased weak-indication full adder and the strong-indication 2-input AND function is found to have reduced cycle time and power-cycle time product with respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further, the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components

    Full text link
    Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Pareto-frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately 10310^3 highly important implementations from 102310^{23} possible solutions in a few hours, while the exhaustive search would take four months on a high-end processor.Comment: Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, US
    • …
    corecore