1,134 research outputs found

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    Get PDF
    The implementation efficiency of electronic systems is a combination of conflicting requirements, as increasing volumes of computations, accelerating the exchange of data, at the same time increasing energy consumption forcing the researchers not only to optimize the algorithm, but also to quickly implement in a specialized hardware. Therefore in this work, the problem of efficient and straightforward implementation of operating in a real-time electronic intelligent systems on field-programmable gate array (FPGA) is tackled. The object of research is specialized FPGA intellectual property (IP) cores that operate in a real-time. In the thesis the following main aspects of the research object are investigated: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process of selected class dynamic artificial neural networks. In order to solve stated problem and reach the goal following main tasks of the thesis are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and its electronic intelligent system test-bed – a speaker dependent Lithuanian speech recognizer, to be created and investigated; develop dedicated technique for implementation of LLMLP class on FPGA that is based on specialized efficiency criteria for a circuitry synthesis; develop and experimentally affirm the efficiency of optimized FPGA IP cores used in Lithuanian speech recognizer. The dissertation contains: introduction, four chapters and general conclusions. The first chapter reveals the fundamental knowledge on computer-aideddesign, artificial neural networks and speech recognition implementation on FPGA. In the second chapter the efficiency criteria and technique of LLMLP IP cores implementation are proposed in order to make multi-objective optimization of throughput, LLMLP complexity and resource utilization. The data flow graphs are applied for optimization of LLMLP computations. The optimized neuron processing element is proposed. The IP cores for features extraction and comparison are developed for Lithuanian speech recognizer and analyzed in third chapter. The fourth chapter is devoted for experimental verification of developed numerous LLMLP IP cores. The experiments of isolated word recognition accuracy and speed for different speakers, signal to noise ratios, features extraction and accelerated comparison methods were performed. The main results of the thesis were published in 12 scientific publications: eight of them were printed in peer-reviewed scientific journals, four of them in a Thomson Reuters Web of Science database, four articles – in conference proceedings. The results were presented in 17 scientific conferences

    Impact of the hardened floating-point cores on HIL technology

    Full text link
    The Hardware-In-the-Loop (HIL) technique is increasingly used for testing power electronics. FPGAs (Field-Programmable Gate Array) are becoming usual in this kind of emulation due to their acceleration capabilities. But even using FPGAs, it has not been possible to reach real time simulations when small integration steps are necessary (around 100 ns or lower) if floating-point representation is used. Fixed-point has been the solution, but at a high design effort cost. With the release of FPGAs with HFP (Hardened Floating-Point) cores – dedicated floating-point blocks implemented in silicon – the minimum achievable simulation step decreases significantly. This paper presents a comparison between HFP cores, floating-point in programmable logic and fixed-point for HIL models. Results show that both HFP-based and fixed-point arithmetic achieve a simulation step around 10 ns for a full-bridge converter model. A comparison regarding resolution and accuracy is also presented, because acceleration is not the only issue when decreasing the integration step. Numerical resolution also plays an important role, and 32-bit floating-point representation finds a double barrier: acceleration marked by technology, and numerical resolution. Both are explored in this paperThis work has been supported by the Spanish Ministerio deEconomía y Competitividad under project TEC2013-43017-

    Field Programmable Gate Arrays (FPGAs) II

    Get PDF
    This Edited Volume Field Programmable Gate Arrays (FPGAs) II is a collection of reviewed and relevant research chapters, offering a comprehensive overview of recent developments in the field of Computer and Information Science. The book comprises single chapters authored by various researchers and edited by an expert active in the Computer and Information Science research area. All chapters are complete in itself but united under a common research study topic. This publication aims at providing a thorough overview of the latest research efforts by international authors on Computer and Information Science, and open new possible research paths for further novel developments

    The 1991 3rd NASA Symposium on VLSI Design

    Get PDF
    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

    Get PDF
    An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations

    A Memory-Centric Customizable Domain-Specific FPGA Overlay for Accelerating Machine Learning Applications

    Get PDF
    Low latency inferencing is of paramount importance to a wide range of real time and userfacing Machine Learning (ML) applications. Field Programmable Gate Arrays (FPGAs) offer unique advantages in delivering low latency as well as energy efficient accelertors for low latency inferencing. Unfortunately, creating machine learning accelerators in FPGAs is not easy, requiring the use of vendor specific CAD tools and low level digital and hardware microarchitecture design knowledge that the majority of ML researchers do not possess. The continued refinement of High Level Synthesis (HLS) tools can reduce but not eliminate the need for hardware-specific design knowledge. The designs by these tools can also produce inefficient use of FPGA resources that ultimately limit the performance of the neural network. This research investigated a new FPGA-based software-hardware codesigned overlay architecture that opens the advantages of FPGAs to the broader ML user community. As an overlay, the proposed design allows rapid coding and deployment of different ML network configurations and different data-widths, eliminating the prior barrier of needing to resynthesize each design. This brings important attributes of code portability over different FPGA families. The proposed overlay design is a Single-Instruction-Multiple-Data (SIMD) Processor-In-Memory (PIM) architecture developed as a programmable overlay for FPGAs. In contrast to point designs, it can be programmed to implement different types of machine learning algorithms. The overlay architecture integrates bit-serial Arithmetic Logic Units (ALUs) with distributed Block RAMs (BRAMs). The PIM design increases the size of arithmetic operations and on-chip storage capacity. User-visible inference latencies are reduced by exploiting concurrent accesses to network parameters (weights and biases) and partial results stored throughout the distributed BRAMs. Run-time performance comparisons show that the proposed design achieves a speedup compared to HLS-based or custom-tuned equivalent designs. Notably, the proposed design is programmable, allowing rapid design space exploration without the need to resynthesize when changing ML algorithms on the FPGA

    Improved Development Cycle for 8-bit FPGA-Based Soft-Macros Targeting Complex Algorithms

    Get PDF
    Developing complex algorithms on 8-bit processors without proper development tools is challenging. This paper integrates a series of novel techniques to improve the development cycle for 8-bit soft-macros such as Xilinx PicoBlaze. The improvements proposed in this paper reduce development time significantly by eliminating the required resynthesis of the whole design upon HDL source code changes. Additionally, a technique is proposed to increase the maximum supported data memory size for PicoBlaze which facilitates development of complex algorithms. Also, a general verification technique is proposed based on a series of testbenches that perform code verification using comparison method. The proposed testbench scenario integrates “Inter-Processor Communication (IPC), shared memory, and interrupt” concepts that lays out a guideline for FPGA developers to verify their own designs using the proposed method. The proposed development cycle relies on a chip that has Programmable Logic (PL) fabric (to hold the soft processor) alongside of a hardened processor (to be used as algorithm verifier), therefore, a Xilinx Zynq Ultrascale+ MPSoC is chosen which has a hardened ARM processor. The development cycle proposed in this paper targets the PicoBlaze, but it can be easily ported to other FPGA macros such as Lattice Mico8, or any non-Xilinx FPGA macros
    corecore