607 research outputs found

    Generalized disjunction decomposition for evolvable hardware

    Get PDF
    Evolvable hardware (EHW) refers to self-reconfiguration hardware design, where the configuration is under the control of an evolutionary algorithm (EA). One of the main difficulties in using EHW to solve real-world problems is scalability, which limits the size of the circuit that may be evolved. This paper outlines a new type of decomposition strategy for EHW, the “generalized disjunction decomposition” (GDD), which allows the evolution of large circuits. The proposed method has been extensively tested, not only with multipliers and parity bit problems traditionally used in the EHW community, but also with logic circuits taken from the Microelectronics Center of North Carolina (MCNC) benchmark library and randomly generated circuits. In order to achieve statistically relevant results, each analyzed logic circuit has been evolved 100 times, and the average of these results is presented and compared with other EHW techniques. This approach is necessary because of the probabilistic nature of EA; the same logic circuit may not be solved in the same way if tested several times. The proposed method has been examined in an extrinsic EHW system using the(1+lambda)(1 + lambda)evolution strategy. The results obtained demonstrate that GDD significantly improves the evolution of logic circuits in terms of the number of generations, reduces computational time as it is able to reduce the required time for a single iteration of the EA, and enables the evolution of larger circuits never before evolved. In addition to the proposed method, a short overview of EHW systems together with the most recent applications in electrical circuit design is provided

    Automated Circuit Approximation Method Driven by Data Distribution

    Full text link
    We propose an application-tailored data-driven fully automated method for functional approximation of combinational circuits. We demonstrate how an application-level error metric such as the classification accuracy can be translated to a component-level error metric needed for an efficient and fast search in the space of approximate low-level components that are used in the application. This is possible by employing a weighted mean error distance (WMED) metric for steering the circuit approximation process which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) determining the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units that are designed to provide the best trade-offs between the classification accuracy and power consumption of two image classifiers based on neural networks.Comment: Accepted for publication at Design, Automation and Test in Europe (DATE 2019). Florence, Ital

    High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware Accelerators

    Get PDF
    Multiplication is one of the widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are not only limited in number and have fixed locations on FPGAs but can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to provide high-performance and resource efficiency. Toward this, we present generic area-optimized, low-latency accurate, and approximate softcore multiplier architectures, which exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains to reduce the overall critical path delay (CPD) and resource utilization of multipliers. Compared to Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architecture provides up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, with our unsigned approximate multiplier architectures, a reduction of up to 51% in the CPD can be achieved with an insignificant loss in output accuracy when compared with the LogiCORE IP. For illustration, we have deployed the proposed multiplier architecture in accelerators used in image and video applications, and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is opensource and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enabling a new research direction for the FPGA community

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Radiation Mitigation and Power Optimization Design Tools for Reconfigurable Hardware in Orbit

    Get PDF
    The Reconfigurable Hardware in Orbit (RHinO)project is focused on creating a set of design tools that facilitate and automate design techniques for reconfigurable computing in space, using SRAM-based field-programmable-gate-array (FPGA) technology. In the second year of the project, design tools that leverage an established FPGA design environment have been created to visualize and analyze an FPGA circuit for radiation weaknesses and power inefficiencies. For radiation, a single event Upset (SEU) emulator, persistence analysis tool, and a half-latch removal tool for Xilinx/Virtex-II devices have been created. Research is underway on a persistence mitigation tool and multiple bit upsets (MBU) studies. For power, synthesis level dynamic power visualization and analysis tools have been completed. Power optimization tools are under development and preliminary test results are positive

    Implementation of Block-based Neural Networks on Reconfigurable Computing Platforms

    Get PDF
    Block-based Neural Networks (BbNNs) provide a flexible and modular architecture to support adaptive applications in dynamic environments. Reconfigurable computing (RC) platforms provide computational efficiency combined with flexibility. Hence, RC provides an ideal match to evolvable BbNN applications. BbNNs are very convenient to build once a library of neural network blocks is built. This library-based approach for the design of BbNNs is extremely useful to automate implementations of BbNNs and evaluate their performance on RC platforms. This is important because, for a given application there may be hundreds to thousands of candidate BbNN implementations possible and evaluating each of them for accuracy and performance, using software simulations will take a very long time, which would not be acceptable for adaptive environments. This thesis focuses on the development and characterization of a library of parameterized VHDL models of neural network blocks, which may be used to build any BbNN. The use of these models is demonstrated in the XOR pattern classification problem and mobile robot navigation problem. For a given application, one may be interested in fabricating an ASIC, once the weights and architecture of the BbNN is decided. Pointers to ASIC implementation of BbNNs with initial results are also included in this thesis
    corecore