454 research outputs found

    Design Space Exploration of Neural Network Activation Function Circuits

    The widespread application of artificial neural networks has prompted researchers to experiment with FPGA and customized ASIC designs to speed up their computation. These implementation efforts have generally focused on weight multiplication and signal summation operations, and less on the activation functions used in these applications. Yet efficient hardware implementations of nonlinear activation functions such as Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh) are central to designing effective neural network accelerators, since these functions consume substantial hardware resources. In this paper, we explore efficient hardware implementations of activation functions using purely combinational circuits, with a focus on two widely used nonlinear activation functions, SELU and tanh. Our experiments demonstrate that neural networks are generally insensitive to the precision of the activation function. The results also indicate that the proposed combinational circuit-based approach is very efficient in terms of speed and area, with negligible accuracy loss on the MNIST, CIFAR-10 and IMAGENET benchmarks. Synopsys Design Compiler synthesis results show that circuit designs for tanh and SELU can save between 3.13-7.69 and 4.45-8.45 area compared to the LUT/memory-based implementations, and can operate at 5.14 GHz and 4.52 GHz using the 28 nm SVT library, respectively. The implementation is available at: https://github.com/ThomasMrY/ActivationFunctionDemo. Comment: 5 pages, 5 figures, 16 conferenc

    FPGA design methodology for industrial control systems—a review

    This paper reviews the state of the art of field-programmable gate array (FPGA) design methodologies with a focus on industrial control system applications. It starts with an overview of FPGA technology development, followed by a presentation of design methodologies, development tools and relevant CAD environments, including the use of portable hardware description languages and system-level programming/design tools. These enable a holistic functional approach with the major advantage of setting up a unique modeling and evaluation environment for complete industrial electronics systems. Three main design rules are then presented: algorithm refinement, modularity, and systematic search for the best compromise between the control performance and the architectural constraints. An overview of the contributions and limits of FPGAs is also given, followed by a short survey of FPGA-based intelligent controllers for modern industrial systems. Finally, two complete and timely case studies are presented to illustrate the benefits of an FPGA implementation when using the proposed system modeling and design methodology. These consist of the direct torque control for induction motor drives and the control of a diesel-driven synchronous stand-alone generator with the help of fuzzy logic.

    A Survey of Spiking Neural Network Accelerator on FPGA

    Due to their ability to implement customized topologies, FPGAs are increasingly used to deploy SNNs in both embedded and high-performance applications. In this paper, we survey state-of-the-art SNN implementations and their applications on FPGA. We collect the recent widely used spiking neuron models, network structures, and signal encoding formats, followed by an enumeration of related hardware design schemes for FPGA-based SNN implementations. Compared with previous surveys, this manuscript enumerates the application instances that applied the above-mentioned technical schemes in recent research. Based on that, we discuss the actual acceleration potential of implementing SNNs on FPGA. Building on this discussion, we outline the upcoming trends and give guidelines for further advancement in related subjects.

    Using LSTM recurrent neural networks for monitoring the LHC superconducting magnets

    The superconducting LHC magnets are coupled with an electronic monitoring system which records and analyses voltage time series reflecting their performance. The currently used system is based on a range of preprogrammed triggers which launch protection procedures when a misbehavior of the magnets is detected. All the procedures used in the protection equipment were designed and implemented according to known working scenarios of the system and are updated and monitored by human operators. This paper proposes a novel approach to monitoring and fault protection of the Large Hadron Collider (LHC) superconducting magnets which employs state-of-the-art Deep Learning algorithms. To this end, the authors examine the performance of LSTM recurrent neural networks for modeling the voltage time series of the magnets. To address this challenging task, different network architectures and hyper-parameters were used to achieve the best possible performance of the solution. The regression results were measured in terms of RMSE for different numbers of future steps and history lengths taken into account for the prediction. The best result of RMSE=0.00104 was obtained for a network of 128 LSTM cells within the internal layer and a 16-step history buffer.

    Real Time 3-D Graphics Processing Hardware Design using Field-Programmable Gate Arrays.

    Three-dimensional graphics processing requires many complex algebraic and matrix-based operations to be performed in real time. In the early stages of graphics processing, such tasks were delegated to a Central Processing Unit (CPU). Over time, as more complex graphics rendering was demanded, CPU solutions became inadequate. To meet this demand, custom hardware solutions that take advantage of pipelining and massive parallelism became preferable to CPU software-based solutions. This fact has led to the many custom hardware solutions that are available today. Since real-time graphics processing requires extremely high performance, hardware solutions using Application Specific Integrated Circuits (ASICs) are the standard within the industry. While ASICs are a more than adequate solution for implementing high-performance custom hardware, the design, implementation and testing of ASIC-based designs are becoming cost prohibitive due to the massive up-front verification effort needed as well as the cost of fixing design defects. Field Programmable Gate Arrays (FPGAs) provide an alternative to the ASIC design flow. More importantly, in recent years FPGA technology has begun to improve in performance to the point where ASIC and FPGA performance has become comparable. In addition, FPGAs address many of the issues of the ASIC design flow. The ability to reconfigure FPGAs reduces the up-front verification effort and allows design defects to be fixed easily. This thesis demonstrates that a 3-D graphics processor implementation on an FPGA is feasible by implementing both a two-dimensional and a three-dimensional graphics processor prototype. Using a Xilinx Virtex 5 ML506 FPGA development kit, a fully functional wireframe graphics rendering engine is implemented using VHDL and Xilinx's development tools. A VHDL testbench was designed to verify that the graphics engine works functionally. This is followed by synthesizing the design on real hardware and developing test applications to verify the functionality and performance of the design. This thesis provides the groundwork for pushing forward the use of FPGA technology in graphics processing applications.

    Experimental Validation of a Faithful Binary Circuit Model

    Fast digital timing simulations based on continuous-time, digital-value circuit models are an attractive and heavily used alternative to analog simulations. Models based on analytic delay formulas are particularly interesting here, as they also facilitate formal verification and delay bound synthesis of complex circuits. Recently, Függer et al. (arXiv:1406.2544 [cs.OH]) proposed a circuit model based on so-called involution channels. It is the first binary circuit model that realistically captures solvability of short-pulse filtration, a non-trivial glitch propagation problem related to building one-shot inertial delays. In this work, we address the question of whether involution channels also accurately model the delay of real circuits. Using both Spice simulations and physical measurements, we confirm that modeling an inverter chain by involution channels accurately describes reality. We also demonstrate that transitions in vanishing pulse trains are accurately predicted by the involution model. For our Spice simulations, we used both UMC-90 and UMC-65 technology, with varying supply voltages from nominal down to near sub-threshold range. The measurements were performed on a special-purpose UMC-90 ASIC that combines an inverter chain with low-intrusive high-speed on-chip analog amplifiers.

    Development, Characterization, and Analysis of Silicon Microstrip Detector Modules for the CBM Silicon Tracking System

    The future Facility for Antiproton and Ion Research (FAIR) at GSI, Germany, will enable scientists to create tiny droplets of cosmic matter in the laboratory, matter subject to extreme conditions usually found in the interior of stars or during stellar collisions. The Compressed Baryonic Matter (CBM) experiment at FAIR aims to explore the quantum chromodynamics (QCD) phase diagram at high densities and moderate temperatures. By colliding heavy ions at relativistic beam energies, the conditions inside these supermassive objects can be recreated for an exceptionally short amount of time. The CBM detector is a fixed-target multi-purpose detector designed for measuring hadrons, electrons and muons in elementary nucleon and heavy-ion collisions over the full FAIR beam energy range delivered by the SIS100 synchrotron. One of the core detectors of CBM is the Silicon Tracking System (STS), responsible for measuring the momentum and tracks of up to 700 charged particles produced in a central nucleus-nucleus collision. Due to the required momentum resolution, the material budget of the STS must be minimized. Therefore, the readout electronics and the cooling and mechanical infrastructure are placed outside the detector acceptance. The double-sided silicon microstrip sensors are connected to the self-triggering frontend electronics using low-mass flexible microcables with a length of up to 50 cm. The main goal of this thesis was to develop a high-density interconnection technology based on copper microcables. We developed a low-mass double-layered copper microcable at the edge of modern fabrication technology. Based on the copper microcable, we developed a novel high-density interconnection technology, comprising fine-grain solder paste printing on the microcable and gold stud bumping on the die. 
The gold stud-solder technology combines a high automation capability with good mechanical and electrical properties, making it an interesting technology also for future detector systems. Building on the gold stud-solder technology, a fully customized bonder machine was developed and constructed in hardware and software. Its main purpose is the realization of the challenging interconnection between the microcable and the sensor. Key components of the machine are four step motors with a sub-micron step resolution, a dual-camera pattern recognition system, a heatable, temperature-controlled bond head and sensor plate, as well as tailor-made mechanical supports for the STS detector modules. With the help of this bonder machine, a full-scale STS detector module in the copper technology was built. The noise performance of the copper module was evaluated in a bias voltage scan. Very low noise levels were observed. Measurements of the absolute value of the signal with a radioactive source allowed us to estimate the signal-to-noise ratio of the module. The results of these measurements give us confidence that STS modules based on the copper technology can achieve a satisfying performance comparable to the modules built in the aluminium technology. Another essential component of the STS detector module is the frontend electronics chip. During this work, version 2.1 of the STS-XYTER readout ASIC was extensively characterized. Noise discrepancies between odd and even channels and increasingly higher noise towards the higher channel numbers had been observed in the predecessor chip. Our measurements of the STS-XYTER2.1 verified that both issues were successfully resolved. Furthermore, the noise behavior of the ASIC with respect to input load capacitance was studied. This is essential to parametrize expected noise levels for the many kinds of detector modules employed in the STS, to which the measured noise levels can then be compared. 
Measurements of the noise levels as a function of shaping time showed that the overall noise level is practically independent of shaper peaking time. Radiation tests with 50 MeV protons were performed with copper microcables connected to the ASIC in a non-powered state. No indications of damage to the chip and interconnects could be observed. Finally, a complete STS detector module in aluminium technology was subjected to a pencil-like monochromatic beam of 2.7 GeV/c protons at the Cooling Synchrotron at the research center Jülich. Several essential performance criteria of the detector module were evaluated. The best coincidence between the STS and the reference fiber hodoscopes was established based on time information. An excellent time resolution of a few nanoseconds could be demonstrated. Based on the best coincidence, the spatial resolution of the full system was determined to be a few hundred microns. This is in line with expectations, as the resolution is limited by the fiber hodoscope resolution. Charge distributions of 1-strip clusters showed a clear separation between the noise and the proton signal peak, with a signal-to-noise ratio above 20 for the p- and n-side. The charge collection efficiency of the module was estimated to be 96%. The COSY beamtime enabled a first-time evaluation of the full analysis software chain with real data and the evaluation of the full electronic readout chain of STS. The experience gained at COSY is immensely helpful for commissioning and data analysis in more complex beam environments such as mCBM, where a subsample of the CBM detectors is exposed to the particles created in a heavy-ion collision in run-time scenarios closely resembling the final CBM environment.