    PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

    Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work on ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated by the idea that the LUTs in an FPGA can be used to implement a much greater variety of functions than this. In this paper, we propose a novel approach to training neural networks for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by the soft logic, hiding the polynomial evaluation inside the LUTs with zero overhead. We show that by using polynomial building blocks, we can achieve the same accuracy using considerably fewer layers of soft logic than by using linear functions, leading to significant latency and area improvements. We demonstrate the effectiveness of this approach on three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
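
    The idea of hiding a polynomial inside a lookup table can be illustrated with a small sketch. It is not the authors' training method: the fan-in, bit widths, weights, and the clamping step used as the activation are all assumptions chosen for illustration. The sketch enumerates every quantized input combination of one neuron and stores the result, so that inference becomes a single table lookup regardless of the polynomial degree.

```python
from itertools import combinations_with_replacement, product

def monomials(x, degree):
    """All monomials of the inputs x up to total degree `degree` (constant 1 included)."""
    terms = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            m = 1
            for i in idx:
                m *= x[i]
            terms.append(m)
    return terms

def neuron(x, weights, degree, out_bits):
    """Polynomial neuron: weighted sum of monomials, clamped to the output range.
    The clamp stands in for a quantized activation (an assumption)."""
    acc = sum(w * m for w, m in zip(weights, monomials(x, degree)))
    return max(0, min(acc, 2**out_bits - 1))

def build_truth_table(weights, fan_in, in_bits, degree, out_bits):
    """Enumerate every input combination once; the result is what a LUT would store."""
    levels = range(2**in_bits)
    return {x: neuron(x, weights, degree, out_bits)
            for x in product(levels, repeat=fan_in)}

# Hypothetical neuron: 2 inputs of 2 bits each, degree 2, 2-bit output.
# Monomial order is 1, x0, x1, x0^2, x0*x1, x1^2.
weights = [1, 2, -1, 0, 1, 0]
table = build_truth_table(weights, fan_in=2, in_bits=2, degree=2, out_bits=2)
```

    The table has 2^(fan_in * in_bits) entries, so its size depends only on the input width and not on the degree, which is why a higher-degree function costs no extra lookup hardware in this simplified picture.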

    Synchronization Technique for OFDM-Based UWB System

    Multipartite table methods

    A unified view of most previous table-lookup-and-addition methods (bipartite tables, SBTM, STAM, and multipartite methods) is presented. This unified view allows a more accurate computation of the error entailed by these methods, which enables a wider design space exploration, leading to tables smaller than the best previously published ones by up to 50 percent. The synthesis of these multipartite architectures on Virtex FPGAs is also discussed. Compared to other methods involving multipliers, the multipartite approach offers the best speed/area tradeoff for precisions up to 16 bits. A reference implementation is available at www.ens-lyon.fr/LIP/Arenaire/
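
    As a rough illustration of the table-lookup-and-addition idea, the sketch below builds the simplest member of the family, a bipartite table, in floating point. It is not the reference implementation mentioned above: the function (sin on [0, 1)), the 4/4/4 bit split, and all names are assumptions, and table entries are left unrounded. The input is split into three bit fields x0, x1, x2; one table indexed by (x0, x1) holds function values, a second indexed by (x0, x2) holds a linear correction, and the two are added.

```python
import math

def build_bipartite(f, df, n0, n1, n2):
    """Return (tiv, to): a table of initial values and a table of offsets."""
    n = n0 + n1 + n2
    scale = 2.0 ** -n
    mid2 = (2**n2 - 1) / 2                    # centre of the x2 field
    tiv = {}
    for x0 in range(2**n0):
        for x1 in range(2**n1):
            base = (x0 << (n1 + n2)) + (x1 << n2)
            tiv[(x0, x1)] = f((base + mid2) * scale)   # value at the sub-interval centre
    to = {}
    for x0 in range(2**n0):
        slope = df((x0 + 0.5) * 2.0 ** -n0)   # one slope per x0 segment
        for x2 in range(2**n2):
            to[(x0, x2)] = slope * (x2 - mid2) * scale
    return tiv, to

def evaluate(tiv, to, X, n1, n2):
    """Approximate f(X / 2^n) with two lookups and one addition."""
    x0 = X >> (n1 + n2)
    x1 = (X >> n2) & (2**n1 - 1)
    x2 = X & (2**n2 - 1)
    return tiv[(x0, x1)] + to[(x0, x2)]

N0, N1, N2 = 4, 4, 4                          # assumed bit split of a 12-bit input
tiv, to = build_bipartite(math.sin, math.cos, N0, N1, N2)
max_err = max(abs(evaluate(tiv, to, X, N1, N2) - math.sin(X / 2**12))
              for X in range(2**12))
```

    With this split, the two tables hold 256 entries each instead of the 4096 entries of a single direct table, at the cost of a small approximation error captured in `max_err`.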

    Architecture-Preserving Provable Repair of Deep Neural Networks

    Deep neural networks (DNNs) are becoming increasingly important components of software, and are considered the state-of-the-art solution for a number of problems, such as image recognition. However, DNNs are far from infallible, and incorrect behavior of DNNs can have disastrous real-world consequences. This paper addresses the problem of architecture-preserving V-polytope provable repair of DNNs. A V-polytope defines a convex bounded polytope using its vertex representation. V-polytope provable repair guarantees that the repaired DNN satisfies the given specification on the infinite set of points in the given V-polytope. An architecture-preserving repair only modifies the parameters of the DNN, without modifying its architecture. The repair has the flexibility to modify multiple layers of the DNN, and runs in polynomial time. It supports DNNs with activation functions that have some linear pieces, as well as fully-connected, convolutional, pooling and residual layers. To the best of our knowledge, this is the first provable repair approach that has all of these features. We implement our approach in a tool called APRNN. Using MNIST, ImageNet, and ACAS Xu DNNs, we show that it has better efficiency, scalability, and generalization compared to PRDNN and REASSURE, prior provable repair methods that are not architecture preserving. Comment: Accepted paper at PLDI 2023. The tool is available at https://github.com/95616ARG/APRNN

    Fast Visualization by Shear-Warp using Spline Models for Data Reconstruction

    This work concerns the rendering of very large three-dimensional data sets. The goal is to develop fast algorithms that also apply recent, accurate volume reconstruction models in order to obtain visualizations that are as free of artifacts as possible. Part I gives a comprehensive overview of the state of the art in volume rendering. Part II is devoted to the recently developed trivariate (linear,) quadratic and cubic spline models defined on symmetric tetrahedral partitions obtained directly by slicing volumetric partitions of a three-dimensional domain. These spline models define piecewise polynomials of total degree (one,) two and three with respect to a tetrahedron; that is, the local splines have the lowest possible total degree and are adequate for efficient and accurate volume visualization. Part III describes, step by step, a fast software-based rendering algorithm called shear-warp. This algorithm is known for its ability to generate projections of volume data in real time. It attains its high rendering speed through elaborate data structures and extensive pre-computation, but at the expense of data redundancy and of the visual quality of the final rendering. To circumvent these disadvantages, a further development is specified in which new techniques and sophisticated data structures allow the fast shear-warp approach to be combined with the accurate ray-casting approach. This strategy and the new data structures not only unify the benefits of both methods but also readily admit adjustments that trade off rendering speed against precision. This further development also removes the 3-fold data redundancy known from the original shear-warp approach, allowing even larger three-dimensional data sets to be rendered more quickly. Additionally, real trivariate data reconstruction models, as discussed in Part II, are applied together with the new ideas to improve the precision of the new volume rendering method, which also leads to an algorithm one order of magnitude faster than traditional approaches using similar reconstruction models. In Part IV, a hierarchy-based rendering method is developed that uses a wavelet decomposition of the volume data, an octree structure to represent the sparse data set, the splines from Part II, and a new shear-warp visualization algorithm similar to that presented in Part III. The thesis concludes with the results collected in Part V.

    An Efficient Hardware Implementation of LDPC Decoder

    Reliable communication over a noisy channel is an old but still challenging issue for communication engineers. Low-density parity-check (LDPC) codes are linear block codes proposed by Robert G. Gallager in 1960. LDPC codes have lower complexity than Turbo codes. In the most recent wireless communication standards, LDPC codes are among the most popular forward error correction (FEC) codes due to their excellent error-correcting capability. In this thesis we focus on a hardware implementation of the LDPC code used in the Digital Video Broadcasting - Satellite - Second Generation (DVB-S2) standard, ratified in 2005. In the architecture design of the LDPC decoder, because of the structure of DVB-S2, a memory mapping scheme is used that allows 360 functional units to operate simultaneously. The functional units are optimized to reduce hardware resource utilization on an FPGA. A novel design of a range addressable lookup table (RALUT) for the hyperbolic tangent function is proposed that simplifies the LDPC decoding algorithm while the performance remains the same. Commonly, RALUTs are uniformly distributed over the input; in our proposed method, instead of representing the LUT input uniformly, we use a non-uniform scale that assigns more values to inputs near zero. A Zynq XC7Z030 FPGA device is used to evaluate the complexity of the proposed design. Synthesis results show a speed increase due to the use of the LUT method; however, the LUT demands more memory. We therefore reduce resource usage by applying the RALUT method.
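
    A range addressable lookup table can be sketched in a few lines. The example below is not the design proposed in the thesis: the number of levels, the choice of range edges, and the function names are assumptions. Each table entry covers a whole range of inputs, and the range edges are placed so that tanh changes by the same amount across every range, which makes the ranges narrow near zero (where tanh is steep) and wide in the saturated region.

```python
import bisect
import math

def build_ralut(levels):
    """Return (edges, values) for |x| >= 0.
    Range i covers the inputs whose tanh lies in [i/levels, (i+1)/levels)."""
    edges = [math.atanh(i / levels) for i in range(1, levels)]   # non-uniform input edges
    values = [(i + 0.5) / levels for i in range(levels)]         # one output per range
    return edges, values

def ralut_tanh(x, edges, values):
    """Look up tanh(x): find the range containing |x|, then restore the sign
    (tanh is odd, so only the non-negative half needs to be stored)."""
    i = bisect.bisect_right(edges, abs(x))
    return math.copysign(values[i], x)

LEVELS = 16                                   # assumed table size
edges, values = build_ralut(LEVELS)
samples = [k / 100 for k in range(-500, 501)]
max_err = max(abs(ralut_tanh(x, edges, values) - math.tanh(x)) for x in samples)
```

    By construction the error stays within half an output step, 0.5 / LEVELS, using only LEVELS stored values; in hardware the `bisect` search would correspond to a bank of range comparators rather than a sequential search.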