1,243 research outputs found

    FPGA acceleration of a quantized neural network for remote-sensed cloud detection

    Get PDF
    The capture and transmission of remote-sensed imagery for Earth observation is both computationally and bandwidth expensive. In the analyses of remote-sensed imagery in the visual band, atmospheric cloud cover can obstruct up to two-thirds of observations, resulting in costly imagery being discarded. Mission objectives and satellite operational details vary; however, assuming a cloud-free observation requirement, a doubling of useful data downlinked with an associated halving of delivery cost is possible through effective cloud detection. A minimal-resource, real-time inference neural network is ideally suited to perform automatic cloud detection, both for pre-processing captured images prior to transmission and preventing unnecessary images being taken by larger payload sensors. Much of the hardware complexity of modern neural network implementations resides in high-precision floating-point calculation pipelines. In recent years, research has been conducted in identifying quantized, or low-integer precision equivalents to known deep learning models, which do not require the extensive resources of their floating-point, full-precision counterparts. Our work leverages existing research on binary and quantized neural networks to develop a real-time, remote-sensed cloud detection solution using a commodity field-programmable gate array. This follows on developments of the Forwards Looking Imager for predictive cloud detection developed by Craft Prospect, a space engineering practice based in Glasgow, UK. The synthesized cloud detection accelerator achieved an inference throughput of 358.1 images per second with a maximum power consumption of 2.4 W. This throughput is an order of magnitude faster than alternate algorithmic options for the Forwards Looking Imager at around one third reduction in classification accuracy, and approximately two orders of magnitude faster than the CloudScout deep neural network, deployed with HyperScout 2 on the European Space Agency PhiSat-1 mission. Strategies for incorporating fault tolerance mechanisms are expounded

    Support-reducing decomposition for FPGA mapping

    Get PDF
    Decomposition is a technology-independent process, in which a large complex function is broken into smaller, less complex functions. The costs of two-level or factored-form representations (cubes and literals) are used in most decomposition methods, as they have a high correlation with the area of cell-based designs. However, this correlation is weaker for field-programmable gate arrays (FPGAs) based on look-up tables. Furthermore, local optimizations have limited power due to the structural bias of the circuit descriptions. This paper tries to reduce the structural biasing by remapping the LUT network and decomposing the derived functions using the support as cost function. The proposed method improves the FPGA mapping results of a commercial tool for the 20 largest MCNC benchmarks, with gains of 28% in delay plus 18% in area when targeting delay, and a reduction of 28% in area plus 14% in delay with area as cost function. Results with 23% less area and 6% less delay are obtained after physical synthesis (post place-and-route). Moreover, 12 of the best known results for delay (and 3 for area) of the EPFL benchmarks are improved.Peer ReviewedPostprint (author's final draft

    Automatic design of domain-specific instructions for low-power processors

    Get PDF
    This paper explores hardware specialization of low­ power processors to improve performance and energy efficiency. Our main contribution is an automated framework that analyzes instruction sequences of applications within a domain at the loop body level and identifies exactly and partially-matching sequences across applications that can become custom instructions. Our framework transforms sequences to a new code abstraction, a Merging Diagram, that improves similarity identification, clusters alike groups of potential custom instructions to effectively reduce the search space, and selects merged custom instructions to efficiently exploit the available customizable area. For a set of 11 media applications, our fast framework generates instructions that significantly improve the energy-delay product and speed­ up, achieving more than double the savings as compared to a technique analyzing sequences within basic blocks. This paper shows that partially-matched custom instructions, which do not significantly increase design time, are crucial to achieving higher energy efficiency at limited hardware areas

    Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics

    Full text link
    We present a novel implementation of classification using the machine learning / artificial intelligence method called boosted decision trees (BDT) on field programmable gate arrays (FPGA). The firmware implementation of binary classification requiring 100 training trees with a maximum depth of 4 using four input variables gives a latency value of about 10 ns, independent of the clock speed from 100 to 320 MHz in our setup. The low timing values are achieved by restructuring the BDT layout and reconfiguring its parameters. The FPGA resource utilization is also kept low at a range from 0.01% to 0.2% in our setup. A software package called fwXmachina achieves this implementation. Our intended user is an expert of custom electronics-based trigger systems in high energy physics experiments or anyone that needs decisions at the lowest latency values for real-time event classification. Two problems from high energy physics are considered, in the separation of electrons vs. photons and in the selection of vector boson fusion-produced Higgs bosons vs. the rejection of the multijet processes.Comment: 66 pages, 27 figures, 13 tables, JINST versio

    Digital electronic predistortion for optical communications

    Get PDF
    The distortion of optical signals has long been an issue limiting the performance of communication systems. With the increase of transmission speeds the effects of distortion are becoming more prominent. Because of this, the use of methods known from digital signal processing (DSP) are being introduced to compensate for them. Applying DSP to improve optical signals has been limited by a discrepancy in digital signal processing speeds and optical transmission speeds. However high speed Field Programmable Gate Arrays (FPGA) which are sufficiently fast have now become available making DSP experiments without costly ASIC implementation possible for optical transmission experiments. This thesis focuses on Look Up Table (LUT) based digital Electronic Predistortion (EPD) for optical transmission. Because it is only one out of many possible implementations of EPD, it has to be placed in context with other EPD techniques and other distortion combating techniques in general, especially since it is possible to combine the different techniques. Building an actual transmitter means that compromises and decisions have to be made in the design and implementation of an EPD based system. These are based on balancing the desire to achieve optimal performance with technological and economic limitations. This is partly done using optical simulations to asses the performance. This thesis describes a novel experimental transmitter that has been built as part of this research applying LUT based EPD to an optical signal. The experimental transmitter consists of a digital design (using a hardware description language) for a pair of FPGAs and an analogue optical/electronic setup including two standard DAC integrated circuits. The DSP in the transmitter compensated for both chromatic dispersion and self phase modulation. We achieved transmission of 10.7 Gb/s non-return-to-zero (NRZ) signals with a +4 dBm launch power over 450 km keeping the required optical-signal-to-noise-ratio (OSNR) for a bit-error-rate of 2x10^{-3} below 11 dB. In doing so we showed experimentally, for the first time, that nonlinear effects can be compensated with this approach and that the combination of FPGA-DAC is a viable approach for an experimental setup

    Anti-Tamper Method for Field Programmable Gate Arrays Through Dynamic Reconfiguration and Decoy Circuits

    Get PDF
    As Field Programmable Gate Arrays (FPGAs) become more widely used, security concerns have been raised regarding FPGA use for cryptographic, sensitive, or proprietary data. Storing or implementing proprietary code and designs on FPGAs could result in the compromise of sensitive information if the FPGA device was physically relinquished or remotely accessible to adversaries seeking to obtain the information. Although multiple defensive measures have been implemented (and overcome), the possibility exists to create a secure design through the implementation of polymorphic Dynamically Reconfigurable FPGA (DRFPGA) circuits. Using polymorphic DRFPGAs removes the static attributes from their design; thus, substantially increasing the difficulty of successful adversarial reverse-engineering attacks. A variety of dynamically reconfigurable methodologies exist for implementation that challenge designers in the reconfigurable technology field. A Hardware Description Language (HDL) DRFPGA model is presented for use in security applications. The Very High Speed Integrated Circuit HDL (VHSIC) language was chosen to take advantage of its capabilities, which are well suited to the current research. Additionally, algorithms that explicitly support granular autonomous reconfiguration have been developed and implemented on the DRFPGA as a means of protecting its designs. Documented testing validates the reconfiguration results and compares power usage, timing, and area estimates from a conventional and DRFPGA model

    Null Convention Logic applications of asynchronous design in nanotechnology and cryptographic security

    Get PDF
    This dissertation presents two Null Convention Logic (NCL) applications of asynchronous logic circuit design in nanotechnology and cryptographic security. The first application is the Asynchronous Nanowire Reconfigurable Crossbar Architecture (ANRCA); the second one is an asynchronous S-Box design for cryptographic system against Side-Channel Attacks (SCA). The following are the contributions of the first application: 1) Proposed a diode- and resistor-based ANRCA (DR-ANRCA). Three configurable logic block (CLB) structures were designed to efficiently reconfigure a given DR-PGMB as one of the 27 arbitrary NCL threshold gates. A hierarchical architecture was also proposed to implement the higher level logic that requires a large number of DR-PGMBs, such as multiple-bit NCL registers. 2) Proposed a memristor look-up-table based ANRCA (MLUT-ANRCA). An equivalent circuit simulation model has been presented in VHDL and simulated in Quartus II. Meanwhile, the comparison between these two ANRCAs have been analyzed numerically. 3) Presented the defect-tolerance and repair strategies for both DR-ANRCA and MLUT-ANRCA. The following are the contributions of the second application: 1) Designed an NCL based S-Box for Advanced Encryption Standard (AES). Functional verification has been done using Modelsim and Field-Programmable Gate Array (FPGA). 2) Implemented two different power analysis attacks on both NCL S-Box and conventional synchronous S-Box. 3) Developed a novel approach based on stochastic logics to enhance the resistance against DPA and CPA attacks. The functionality of the proposed design has been verified using an 8-bit AES S-box design. The effects of decision weight, bitstream length, and input repetition times on error rates have been also studied. Experimental results shows that the proposed approach enhances the resistance to against the CPA attack by successfully protecting the hidden key --Abstract, page iii
    corecore